*: For core functionality, including pretty-printing code, it's 55 lines.
I went looking for a trie implementation for wordle-squared, but I found solutions that were over-complicated and messy. Hopefully this is useful to you! A trie implementation that's about as simple as possible, pythonic, and includes some basic utilities and pretty printing.
What is a Trie? (Quickly)¶
A trie, otherwise known as a prefix-tree, is a datastructure for storing sequences. Instead of storing data in the nodes, a single element of the sequence is stored in each edge. The path from the root to a node then restores the sequence. This is an implementation of a trie for strings; each edge stores a character. It's great for quickly finding words that have a certain prefix or finding letters that may continue a stem. I used it to quickly generate millions of word-grids for the aforementioned wordle-squared game I developed.
If you're looking for a longer description of what a Trie is and why you should care... Try wikipedia
The Code¶
Without further adieu, here's the code! Of course, it's free to use without creditation.
from collections import defaultdict
class TrieNode(object):
def __init__(self):
# Maps char -> TrieNode
self.children = defaultdict(lambda: TrieNode())
# This node may have children, but if any words terminate here, this is true.
self.can_terminate_word = False
def add(self, word: str):
if not word:
self.can_terminate_word = True
return
head, *tail = word
node = self.children[head].add(tail)
def find_prefix(self, word: str):
if word:
head, *tail = word
if head in self.children:
return self.children[head].find_prefix(tail)
else:
print("Prefix not found in tree")
return None
else:
return self
def suffixes(self):
if self.can_terminate_word:
yield ""
for letter, node in sorted(self.children.items()):
for suffix in node.suffixes():
yield letter + suffix
def pprint(self):
print(" +", end='')
self._pprint()
def _pprint(self, indent_str=""):
needs_indent = False
if self.can_terminate_word:
print(".") # Terminate with '.'
needs_indent = True
for ix, (letter, node) in enumerate(sorted(self.children.items())):
last_child = ix == len(self.children)-1
if needs_indent:
print(indent_str + " +", end='')
print('-' + letter, end='')
child_indent = indent_str + (" " if last_child else " |")
node._pprint(child_indent)
needs_indent = True
How to use it:¶
First, let's get some words...
words = """
We're no strangers to love
You know the rules and so do I
A full commitment's what I'm thinking of
You wouldn't get this from any other guy
I just wanna tell you how I'm feeling
Gotta make you understand
Never gonna give you up
Never gonna let you down
Never gonna run around and desert you
Never gonna make you cry
Never gonna say goodbye
Never gonna tell a lie and hurt you
We've known each other for so long
""".replace("\n", " ").split(" ")
words[:10]
Adding words to the trie is as simple as root_node.add()
root = TrieNode()
for word in words:
root.add(word.lower())
You can pretty-print the trie using .pprint().
root.pprint()
You can jump to a part of the trie with find_prefix(str)
node = root.find_prefix('th') # Find all words starting with 'th'
node.pprint()
You can find all word-endings starting from any node with .suffixes(), which returns a generator
print(list(node.suffixes())) # th-e, th-inking, th-is
Check if a node denotes a complete word by checking can_terminate_word
print("th:", root.find_prefix('th').can_terminate_word, "\n"+
"the:", root.find_prefix('the').can_terminate_word)
All of these can be used from any part of the tree!
root.find_prefix('t').find_prefix('h').add('ere') # Add 'there'
root.find_prefix('th').pprint() # Check there's there.
I hope that was helpful!