An idea that I find to be fun is square-free words. These are not words like words in spoken languages. You start with what you call an alphabet, which is a set of symbols. We will use the letters , , and for our alphabet. Then a word is any sequence of these symbols. So for our alphabet, , , and are all words. You can also have an empty word, which is represented by the greek letter epsilon, .
Words can have factors. This is basically a word inside of a word. So if the word is , then one factor of this is . We can factor a word by breaking it into factors. So if we say to let , then and are two factors of .
A word is square-free if it has no back-to-back repeating patterns. In other words, its not possible to write a word as . Examples of square-free words are , , and . Two words which are not square-free is and .
How might we list all square-free words?
We could list every single word, and then check them to see if they are square-free or not. However, this would be slow. First, the number of factors of a word does not scale nicely as the length of the words grow. And second, its more likely for a word to not be square-free. It would be ideal if there was a way to not have to check all the factors of each word, and not have to check every word.
The thing to consider that you can list all words in a tree like so.
To shorten the way that we test words there is an alternate definition of a square-free word which we need to prove is equivalent to our original one.
Let be a word. Then is square-free if and only if
- is square-free.
- every factor which ends at is not preceded by itself.
Something to keep in mind is that “a factor ending at ” means a factor all the way at the end of the word. It refers to position, not the symbol itself.
Suppose that is square-free. Every factor of a square-free word is square-free, which means that is. Also, no factor of a square-free word can be preceded by itself, so every factor which ends at is not preceded by itself.
Let be a word. Suppose that is square-free, and that every factor which ends at is not preceded by itself. There are no repeating factors in , so if there is a repeating factor it’s going to need to end at . But no factor which ends at which has a repeating factor. Then has no repeating factors and is square-free.
This goes hand in hand with the tree we’ve created. At any word in the tree, its parent is , and we know its parent is square-free. So we can use the above theorem to only test factors ending at , which cuts down on what we have to test significantly.
The following code written in Python implements this by printing out all square-free words up to a certain length.
import sys ''' Start with the rightmost letter in a word and go back one position at a time. At each position look at the word in front of it (including it), and the word behind it of equal length. This function tests if any of those pairs of words are the same. For example, if the word is abcde we test the following pairs: d e (pivot on e) bc de (pivot on d) If the word is abcdef, e f (pivot on f) cd ef (pivot on e) abc def (pivot on d) ''' def testword(word): wordl = len(word) pivot = wordl - 1 # We cut off one earlier if the word is of odd length. cutoff = (wordl % 2) + wordl / 2 while pivot >= cutoff: right = word[pivot:] left = word[len(right) * (-2):pivot] if left == right: return False pivot -= 1 return True ''' Enumerates all words by doing a breadth first search of the tree where, given a word with length l, its children are all words of length l+1 which have it as a suffix. ''' def enumerate_words(alphabet, stoplen): breadth_queue = [''] words =  # Words that are square-free. while breadth_queue: word = breadth_queue breadth_queue = breadth_queue[1:] children =  for w in alphabet: test = word + w if testword(test): children.append(test) if len(word) < stoplen: breadth_queue = breadth_queue + children words.append(word) return words if __name__ == '__main__': if len(sys.argv) < 2: size = 5 else: size = int(sys.argv) for w in enumerate_words( ['a','b','c'],size): print w