A Trie in Python in under 34 Lines*

*: For core functionality, including pretty-printing code, it's 55 lines.

I went looking for a trie implementation for wordle-squared, but I found solutions that were over-complicated and messy. Hopefully this is useful to you! A trie implementation that's about as simple as possible, pythonic, and includes some basic utilities and pretty printing.

What is a Trie? (Quickly)

A trie, otherwise known as a prefix-tree, is a datastructure for storing sequences. Instead of storing data in the nodes, a single element of the sequence is stored in each edge. The path from the root to a node then restores the sequence. This is an implementation of a trie for strings; each edge stores a character. It's great for quickly finding words that have a certain prefix or finding letters that may continue a stem. I used it to quickly generate millions of word-grids for the aforementioned wordle-squared game I developed.

If you're looking for a longer description of what a Trie is and why you should care... Try wikipedia

The Code

Without further adieu, here's the code! Of course, it's free to use without creditation.

In [1]:
from collections import defaultdict

class TrieNode(object):
    def __init__(self):
        # Maps char -> TrieNode
        self.children = defaultdict(lambda: TrieNode())
        # This node may have children, but if any words terminate here, this is true.
        self.can_terminate_word = False

    def add(self, word: str):
        if not word:
            self.can_terminate_word = True
            return
        head, *tail = word
        node = self.children[head].add(tail)
                
    def find_prefix(self, word: str):
        if word:
            head, *tail = word
            if head in self.children:
                return self.children[head].find_prefix(tail)
            else:
                print("Prefix not found in tree")
                return None
        else:
            return self

    def suffixes(self):
        if self.can_terminate_word:
            yield ""
        for letter, node in sorted(self.children.items()):
            for suffix in node.suffixes():
                yield letter + suffix
                
    def pprint(self):
        print(" +", end='')
        self._pprint()

    def _pprint(self, indent_str=""):          
        needs_indent = False
                        
        if self.can_terminate_word:
            print(".")  # Terminate with '.'
            needs_indent = True

        for ix, (letter, node) in enumerate(sorted(self.children.items())):
            last_child = ix == len(self.children)-1

            if needs_indent:
                print(indent_str + " +", end='')
            print('-' + letter, end='')
            
            child_indent = indent_str + ("  " if last_child else " |")
            node._pprint(child_indent)
            needs_indent = True

How to use it:

First, let's get some words...

In [2]:
words = """
We're no strangers to love
You know the rules and so do I
A full commitment's what I'm thinking of
You wouldn't get this from any other guy
I just wanna tell you how I'm feeling
Gotta make you understand
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
We've known each other for so long
""".replace("\n", " ").split(" ")

words[:10]
Out[2]:
['', "We're", 'no', 'strangers', 'to', 'love', 'You', 'know', 'the', 'rules']

Adding words to the trie is as simple as root_node.add()

In [3]:
root = TrieNode()
for word in words:
    root.add(word.lower())

You can pretty-print the trie using .pprint().

In [4]:
root.pprint()
 +.
 +-a.
 | +-n-d.
 | | +-y.
 | +-r-o-u-n-d.
 +-c-o-m-m-i-t-m-e-n-t-'-s.
 | +-r-y.
 +-d-e-s-e-r-t.
 | +-o.
 |   +-w-n.
 +-e-a-c-h.
 +-f-e-e-l-i-n-g.
 | +-o-r.
 | +-r-o-m.
 | +-u-l-l.
 +-g-e-t.
 | +-i-v-e.
 | +-o-n-n-a.
 | | +-o-d-b-y-e.
 | | +-t-t-a.
 | +-u-y.
 +-h-o-w.
 | +-u-r-t.
 +-i.
 | +-'-m.
 +-j-u-s-t.
 +-k-n-o-w.
 |       +-n.
 +-l-e-t.
 | +-i-e.
 | +-o-n-g.
 |   +-v-e.
 +-m-a-k-e.
 +-n-e-v-e-r.
 | +-o.
 +-o-f.
 | +-t-h-e-r.
 +-r-u-l-e-s.
 |   +-n.
 +-s-a-y.
 | +-o.
 | +-t-r-a-n-g-e-r-s.
 +-t-e-l-l.
 | +-h-e.
 | | +-i-n-k-i-n-g.
 | |   +-s.
 | +-o.
 +-u-n-d-e-r-s-t-a-n-d.
 | +-p.
 +-w-a-n-n-a.
 | +-e-'-r-e.
 | |   +-v-e.
 | +-h-a-t.
 | +-o-u-l-d-n-'-t.
 +-y-o-u.

You can jump to a part of the trie with find_prefix(str)

In [5]:
node = root.find_prefix('th')  # Find all words starting with 'th'
node.pprint()
 +-e.
 +-i-n-k-i-n-g.
   +-s.

You can find all word-endings starting from any node with .suffixes(), which returns a generator

In [6]:
print(list(node.suffixes()))  # th-e, th-inking, th-is
['e', 'inking', 'is']

Check if a node denotes a complete word by checking can_terminate_word

In [7]:
print("th:", root.find_prefix('th').can_terminate_word, "\n"+
      "the:", root.find_prefix('the').can_terminate_word)
th: False 
the: True

All of these can be used from any part of the tree!

In [8]:
root.find_prefix('t').find_prefix('h').add('ere')  # Add 'there'

root.find_prefix('th').pprint()  # Check there's there.
 +-e.
 | +-r-e.
 +-i-n-k-i-n-g.
   +-s.

I hope that was helpful!