Myhill-Nerode Theorem : A beautiful alternative to Pumping Lemma

When I took the formal languages and automata course in my undergrad, I was taught that the only way of proving if a language is regular or non-regular is only through Pumping Lemma.

Pumping Lemma is based on the fact that all sufficiently long words in a language may be pumped, i.e. have a middle section of the word which can repeated an arbitrary number of times – to produce a new word that also lies in the same language. For example, let’s say that for any language L there exists a constant X, such that any word W in L with length X can be split into three substrings W = ABC, where B is never empty, so that the words ABBC, ABBBC, … constructed by repeating B are still in L. This process of repeating the middle substring is called “Pumping”. The Pumping Lemma also guarantees that the length of AB will be at most X, imposing a limit on the number of ways in which W can be split.

The major drawback of Pumping Lemma is that it can only tell if a given language is not in a language class. It can never tell if a language belongs to a given language class. It ends up as a necessary but not sufficient condition to determine class membership.

I personally found this very non-intuitive and difficult to apply to problems.

Then came the Myhill-Nerode theorem, the most interesting stuff I have learnt in the last month. It has a very short and simple statement:

                  “A language is non-regular if and only if there is an infinite distinctive set for it.”

The surprising fact is, it gains superiority over Pumping Lemma by using just four words – “if and only if”. By stressing on those words, it makes sure that there is a necessary and sufficient condition to prove the regularity of a language.

What is an infinite distinctive set? 

Let L be any language. If there is an infinite set of strings S such that for all X,Y belonging to S (X != Y), there exists a Z such that L(XZ) != L(YZ),  then S is called an infinite distinctive set.

The most important question – How to prove/disprove regularity using this Theorem? 

Example 1 – 

Given L = {w : number of 0’s = number of 1’s}.

Let’s take S = 0* which is infinite. Let x,y be two strings such that x = 0^n, y = 0^m where (m,n) are chosen arbitrarily and m!=n.

Let’s take a z = 1^m.

This leads to xz = 0^n 1^m and yz = 0^m 1^m.

Clearly yz belongs to L and xz does not belong to L. We can conclude that S is an infinite distinctive set.

Hence, L is non-regular.

Example 2 –

Given L = {w: number of 0’s > number of 1’s}.

Let’s take S = 0* which is infinite. Let x,y be two string such that x = 0^n, y = 0^m where (m,n) are chosen arbitrarily and m!=n.

Let’s take a z = 1^min(m,n) where min(m,n) = n if m > n and min(m,n) = m if m < n.

This leads to xz = 0^n1^min(m,n) and yz = 0^m1^min(m,n). There are two possible cases –

Case 1 – min(m,n) = m,  xz = 0^n1^m and yz = 0^m1^m. xz belongs to L and yz does not belong to L.

Case 2 – min(m,n) = n,  xz = 0^n1^n and yz = 0^m1^n. xz does not belong to L and yz belongs to L.

In both cases, we can conclude that S is an infinite distinctive set. Hence, L is non-regular.

Other problems to ponder over – 

1. If a Language L is regular, does it mean that L.L(concatenation) is always regular ?

Inspiration for this post –

My “Theory of Computation” professor Dr. Kenneth W. Regan who taught me this theorem and ended up transferring his awe for this theorem to me.

PS –

This post will be published after 3 pm today as some of the problems are my assignment problems and I am not supposed to disclose the answers before submission.