*: For core functionality, including pretty-printing code, it's 55 lines.
I went looking for a trie implementation for wordle-squared, but I found solutions that were over-complicated and messy. Hopefully this is useful to you! A trie implementation that's about as simple as possible, pythonic, and includes some basic utilities and pretty printing.
What is a Trie? (Quickly)¶
A trie, otherwise known as a prefix-tree, is a datastructure for storing sequences. Instead of storing data in the nodes, a single element of the sequence is stored in each edge. The path from the root to a node then restores the sequence. This is an implementation of a trie for strings; each edge stores a character. It's great for quickly finding words that have a certain prefix or finding letters that may continue a stem. I used it to quickly generate millions of word-grids for the aforementioned wordle-squared game I developed.
If you're looking for a longer description of what a Trie is and why you should care... Try wikipedia
Without further adieu, here's the code! Of course, it's free to use without creditation.
from collections import defaultdict class TrieNode(object): def __init__(self): # Maps char -> TrieNode self.children = defaultdict(lambda: TrieNode()) # This node may have children, but if any words terminate here, this is true. self.can_terminate_word = False def add(self, word: str): if not word: self.can_terminate_word = True return head, *tail = word node = self.children[head].add(tail) def find_prefix(self, word: str): if word: head, *tail = word if head in self.children: return self.children[head].find_prefix(tail) else: print("Prefix not found in tree") return None else: return self def suffixes(self): if self.can_terminate_word: yield "" for letter, node in sorted(self.children.items()): for suffix in node.suffixes(): yield letter + suffix def pprint(self): print(" +", end='') self._pprint() def _pprint(self, indent_str=""): needs_indent = False if self.can_terminate_word: print(".") # Terminate with '.' needs_indent = True for ix, (letter, node) in enumerate(sorted(self.children.items())): last_child = ix == len(self.children)-1 if needs_indent: print(indent_str + " +", end='') print('-' + letter, end='') child_indent = indent_str + (" " if last_child else " |") node._pprint(child_indent) needs_indent = True
words = """ We're no strangers to love You know the rules and so do I A full commitment's what I'm thinking of You wouldn't get this from any other guy I just wanna tell you how I'm feeling Gotta make you understand Never gonna give you up Never gonna let you down Never gonna run around and desert you Never gonna make you cry Never gonna say goodbye Never gonna tell a lie and hurt you We've known each other for so long """.replace("\n", " ").split(" ") words[:10]
['', "We're", 'no', 'strangers', 'to', 'love', 'You', 'know', 'the', 'rules']
Adding words to the trie is as simple as root_node.add()
root = TrieNode() for word in words: root.add(word.lower())
You can pretty-print the trie using .pprint().
+. +-a. | +-n-d. | | +-y. | +-r-o-u-n-d. +-c-o-m-m-i-t-m-e-n-t-'-s. | +-r-y. +-d-e-s-e-r-t. | +-o. | +-w-n. +-e-a-c-h. +-f-e-e-l-i-n-g. | +-o-r. | +-r-o-m. | +-u-l-l. +-g-e-t. | +-i-v-e. | +-o-n-n-a. | | +-o-d-b-y-e. | | +-t-t-a. | +-u-y. +-h-o-w. | +-u-r-t. +-i. | +-'-m. +-j-u-s-t. +-k-n-o-w. | +-n. +-l-e-t. | +-i-e. | +-o-n-g. | +-v-e. +-m-a-k-e. +-n-e-v-e-r. | +-o. +-o-f. | +-t-h-e-r. +-r-u-l-e-s. | +-n. +-s-a-y. | +-o. | +-t-r-a-n-g-e-r-s. +-t-e-l-l. | +-h-e. | | +-i-n-k-i-n-g. | | +-s. | +-o. +-u-n-d-e-r-s-t-a-n-d. | +-p. +-w-a-n-n-a. | +-e-'-r-e. | | +-v-e. | +-h-a-t. | +-o-u-l-d-n-'-t. +-y-o-u.
You can jump to a part of the trie with find_prefix(str)
node = root.find_prefix('th') # Find all words starting with 'th' node.pprint()
+-e. +-i-n-k-i-n-g. +-s.
You can find all word-endings starting from any node with .suffixes(), which returns a generator
print(list(node.suffixes())) # th-e, th-inking, th-is
['e', 'inking', 'is']
Check if a node denotes a complete word by checking
print("th:", root.find_prefix('th').can_terminate_word, "\n"+ "the:", root.find_prefix('the').can_terminate_word)
th: False the: True
All of these can be used from any part of the tree!
root.find_prefix('t').find_prefix('h').add('ere') # Add 'there' root.find_prefix('th').pprint() # Check there's there.
+-e. | +-r-e. +-i-n-k-i-n-g. +-s.
I hope that was helpful!