Sunday, August 20, 2017

[MOOC] Implement TrieMatching

This is the third problem in assignment 1 for this class. It has been a long time since I worked on the first problem. I was confused why we have to model the trie as the graph where node is the number, edge the alphabet. Why can't we model it where node is the alphabet?

An counter-example will be: Consider a pattern 'ATAC', and we want to see if it matches to 'ATAT'. If we model the pattern as the first graph representation, it will be {'A': ['T', 'C'], 'T': ['A'], 'C': ['$']}. Then we will say 'ATAT' matches 'ATAC', which is obviously wrong.

Here is the code.


def solve (text, n, patterns):
    result = []
    # write your code here
    trie = build_trie(patterns)
    for i in range(len(text)):
        if match_pattern(text[i:], trie):
            result.append(i)
    return result

def build_trie(patterns):
    """Build trie for patterns"""
    trie = {0: {}}
    counter = 1
    for p in patterns:
        p = p + '$'
        current_node = 0
        for c in p:
            if c in trie[current_node]:
                current_node = trie[current_node][c]
            else:
                trie[current_node][c] = counter
                trie[counter] = {}
                current_node = counter
                counter += 1
    return trie

def match_pattern(text, trie):
    cur_node = 0
    for ch in text:
        if ch in trie[cur_node]:
            cur_node = trie[cur_node][ch]
            if '$' in trie[cur_node]:
                return True
        else:
            return False
    if '$' in trie[cur_node]:
        return True
    else:
        return False

No comments:

Post a Comment