leetcode-[word系列]-[79,212,126,127,139,140,290,291]

lc上有这样关键词含有word的两两成组的八道题, 分别是word break I&II, word pattern I&II, word ladder I&II, word search I&&II.
今晚重新集中做了这几道题, 然后记录一下解析.

139. word break I

描述

给一个非空字符串s和一个word字典wordDict, 检查s是否能由wordDict里的word(相当于substring)拼接成. 每个word可以用0次或多次.
例如s = "leetcode", wordDict = ["leet", "code"], s可以由leet和code拼接成.

思路

经典dp, 用一个数组dp[], dp[i]表示s[:i]这个substring能否被拼接出来. 初始值dp[0] = True空字符串默认一定能拼接出来.
对于某个i, 如果s[i+len(word)] == word 并且 dp[i] == True, 那么当前字符串可以被拼接到i+len(word)的位置, 也就是dp[i+len(word)]为True.

代码

class Solution(object):
    def wordBreak(self, s, wordDict):
        """
        :type s: str
        :type wordDict: List[str]
        :rtype: bool
        """
        dp = [False] * (len(s) + 1)
        dp[0] = True
        for i in xrange(len(s)):
            for j in xrange(len(wordDict)):
                n = len(wordDict[j])
                if dp[i] and s[i:i + n] == wordDict[j]:
                    dp[i + n] = True
        return dp[-1]

140. word break II

描述

题意基本和word break I 一样, 不同的是这个题要求返回所有可能的拼接结果. 不同的word用空格隔开.

思路

一个思路绝对正确并且(理论上)最优的bottom up方法是, 用dp数组(其实是字典)来存处理到当前位置i的时候, 对于substrings[:i]的所有可能拼接结果. 然后每次每次match到新的word, 加入对应新位置i+len(word)的结果中去.

但是test case有一个非常强…会把这种方法卡掉. 分别试了试正向和反向的bottom up, 都会挂在一个特殊case上(长这样: aaaaaaa...b, 候选字典全是各种长度的a, 因为没有b, 肯定不行)上. 一个方法是先用上个题的方法check一下是否可分, 如果不可分直接返回空list. 或者用top down.

top down的理论上要慢一些, 因为函数上下文切换和求hash(本题的bottom up也求了hash, 不过可以用list of list解决避免hash). 但是bottom up在上面那个case会出现超级多的不同组合, 导致TLE/MLE.

代码

top down(反向)

class Solution(object):
    def wordBreak(self, s, wordDict):
        """
        :type s: str
        :type wordDict: List[str]
        :rtype: List[str]
        """
        dp = collections.defaultdict(list)
        dp[len(s)].append("")
        def dfs(i):
            if i not in dp:
                for j in range(i+1, len(s)+1):
                    if s[i:j] in wordDict:
                        for tail in dfs(j):
                            if tail:
                                dp[i].append(s[i:j] + " " + tail)
                            else:
                                dp[i].append(s[i:j] + tail)
            return dp[i]
        return dfs(0)

bottom up(反向), MLE

class Solution(object):
    def wordBreak(self, s, wordDict):
        """
        :type s: str
        :type wordDict: List[str]
        :rtype: List[str]
        """
        dp = collections.defaultdict(list)
        dp[len(s)].append("")
        for i in xrange(len(s)-1, -1, -1):
            for j in xrange(i+1, len(s)+1):
                if s[i:j] in wordDict:
                    for tail in dp[j]:
                        if tail:
                            dp[i].append(s[i:j] + " " + tail)
                        else:
                            dp[i].append(s[i:j] + tail)
        return dp[0]

290. word pattern I

描述

给一个pattern和str(空格隔开的一堆word), 判断是否同构.

例如pattern = "abba", str = "dog cat cat dog"就是同构的, a -> dog, b -> cat. pattern = "aaaa", str = "dog cat cat dog"因为match不了就不是同构.

思路

水题, 直接按顺序同步处理pattern和str, assign一堆id就行. 为了缩短代码长度默认取最晚出现的坐标

代码

class Solution(object):
    def wordPattern(self, pattern, str):
        """
        :type pattern: str
        :type str: str
        :rtype: bool
        """
        str = str.split(" ")
        p2i = {c:i for i,c in enumerate(pattern)}
        s2i = {c:i for i,c in enumerate(str)}
        return [p2i[c] for c in pattern] == [s2i[c] for c in str]

或者这样也行…更短, 就是复杂度高了.

class Solution(object):
    def wordPattern(self, pattern, str):
        """
        :type pattern: str
        :type str: str
        :rtype: bool
        """
        return [pattern.index(c) for c in pattern] == [str.split(" ").index(c) for c in str.split(" ")]

291. word pattern II

描述

和上一个题不同的一点是, str中不再含有空格. 我们需要自行决定在哪里分割开来匹配pattern串. 只能暴力搜索.
例如

pattern = "abab", str = "redblueredblue", return: true
pattern = pattern = "aaaa", str = "asdasdasdasd", return: true
pattern = "aabb", str = "xyzabcxzyabc", return false
pattern = "ab", str = "xx", return false (从这可以看出来pattern和str是严格的双向映射)

思路

今年1月面pony.ai的时候准备过这个题, 当时被难的稀里哗啦抄答案也抄的一知半解…不过最近做的多了而且做过736这种更复杂的token相关的题, 终于能快速ac了.

用一个字典p2s表示当前的pattern到str的映射, 另一个字典s2p表示逆映射(主要是为了对付ab和xx这种情况, x被用来和a匹配就不能再用来匹配b了).
如果当前pattern[i]存在于p2s中, 说明被映射过了, 现在只能用之前已经存入的值p2s[pattern[i]]. 如果这个值和目前str的开头匹配不上, 直接returnFalse. 否则分别移动pattern和str递归进行.
如果当前pattern[i]不存在于p2s中, 这是一个新的pattern字符, 没有被用过, 可以用来匹配任意的str字符串. 那么枚举一下可能的substring(并且str的substring同样不能被映射过), 并递归调用. 注意更新两个字典.

代码

class Solution(object):
    def wordPatternMatch(self, pattern, str):
        """
        :type pattern: str
        :type str: str
        :rtype: bool
        """
        def dfs(i, j, p2s, s2p):
            if i == len(pattern) and j == len(str):
                return True
            if i == len(pattern) or j == len(str):
                return False
            if pattern[i] in p2s:
                if not str[j:].startswith(p2s[pattern[i]]):
                    return False
                return dfs(i+1, j + len(p2s[pattern[i]]), p2s, s2p)
            else:
                for k in xrange(j + 1, len(str) + 1):
                    if str[j:k] not in s2p:
                        tmp_p2s = dict(p2s)
                        tmp_s2p = dict(s2p)
                        tmp_p2s[pattern[i]] = str[j:k]
                        tmp_s2p[str[j:k]] = pattern[i]
                        if dfs(i+1, k, tmp_p2s, tmp_s2p):
                            return True
            return False
        
        return dfs(0, 0, {}, {})

127. word ladder I

描述

给一个起始word和终止word, 还有一个梯子wordDict. 每次更新一个字符, 最少几步从起始更新到终止.

思路

建一个跳转表, 然后层序BFS就行…不过还是WA了两发… 因为我上面描述就是错的…它要返回的是最短序列, 也就是更新步数+1:)

代码

class Solution(object):
    def ladderLength(self, beginWord, endWord, wordList):
        """
        :type beginWord: str
        :type endWord: str
        :type wordList: List[str]
        :rtype: int
        """
        if endWord not in wordList:
            return 0
        d = collections.defaultdict(set)
        for w in wordList + [beginWord]:
            for i in xrange(len(w)):
                d[w[:i] + '*' + w[i+1:]].add(w)
        q = collections.deque()
        q.append(beginWord)
        seen = set()
        seen.add(beginWord)
        res = 0
        while q:
            size = len(q)
            for _ in xrange(size):
                cur = q.popleft()
                if cur == endWord:
                    return res + 1
                for i in xrange(len(cur)):
                    mask = cur[:i] + '*' + cur[i+1:]
                    if mask in d:
                        for nei in d[mask]:
                            if nei not in seen:
                                seen.add(nei)   # 注意一下这里,下一个题对应语句在不同的位置
                                q.append(nei)
            res += 1
        return 0

126. word ladder II

描述

和上一题一样, 这次要返回所有最短(变换)序列.

思路

依旧是BFS, 往队列里塞值的时候多加一个path. 然后因为路径(可能)有交叉, visited集合(我习惯用seen)不能在每次word入队时立刻更新, 否则会阻断未来的通路. 但是可以在pop出这个word时更新, 因为如果不更新, 下次再遇到本次这个word, 所产生的结果一定是更长的(invalid).
当第一次遇到endWord时, 当前的len(path)就是最短路径. 下次遇到更长的立刻break出去.

代码

class Solution(object):
    def findLadders(self, beginWord, endWord, wordList):
        """
        :type beginWord: str
        :type endWord: str
        :type wordList: List[str]
        :rtype: List[List[str]]
        """
        if endWord not in wordList:
            return []
        d = collections.defaultdict(set)
        for w in wordList + [beginWord]:
            for i in xrange(len(w)):
                d[w[:i] + '*' + w[i+1:]].add(w)
        q = collections.deque()
        q.append([beginWord, [beginWord]])
        seen = set()
        res = []
        length = float('inf')
        while q:
            cur, path = q.popleft()
            seen.add(cur)    # here
            if len(path) > length:
                return res
            if cur == endWord:
                length = len(path)
                res.append(path)
                continue
            for i in xrange(len(cur)):
                mask = cur[:i] + '*' + cur[i+1:]
                for nei in d[mask]:
                    if nei not in seen:
                        q.append([nei, path+[nei]])
        return res

79. word search I

描述

给一个字符棋盘, 和一个word, 检查它在不在这个棋盘里. 路径不能重叠.

思路

标准DFS. 注意当前正在使用的字符要mask掉, 如果本轮搜索失败, 返回时复原.

代码

class Solution(object):
    def exist(self, board, word):
        """
        :type board: List[List[str]]
        :type word: str
        :rtype: bool
        """
        def dfs(i, j, cur):
            if cur == len(word):
                return True
            for ni,nj in ((i+1,j), (i,j+1), (i-1,j), (i, j-1)):
                if 0<=ni<len(board) and 0<=nj<len(board[0]):
                    if word[cur] == board[ni][nj]:
                        board[ni][nj] = "#"
                        if dfs(ni, nj, cur + 1):
                            return True
                        board[ni][nj] = word[cur]
            return False
        
        for i in xrange(len(board)):
            for j in xrange(len(board[0])):
                if board[i][j] == word[0]:
                    board[i][j] = "#"
                    if dfs(i, j, 1):
                        return True
                    board[i][j] = word[0]
        return False

212. word search II

描述

和上一题不同的是, 这次要搜多个word. 然后返回存在于棋盘中的那些.

思路

还想上一个题那样逐一处理会TLE. 因为没有利用已经有的信息. 比如要找abcde和abc, 在找 abcde的过程中顺便就把abc找到了, 而不用退出来再找一遍abc.

考虑到这一点, 用trie来处理. 把所有word插入trie, 完成后, trie的最外层的节点都是不同word的入口点, 当然也有些word的入口点包含在trie的路径中, 比如abcde和bcd和cde.

代码

class Solution(object):
    def findWords(self, board, words):
        """
        :type board: List[List[str]]
        :type words: List[str]
        :rtype: List[str]
        """
        trie = {}
        for w in words:
            t = trie
            for c in w:
                if c not in t:
                    t[c] = {}
                t = t[c]
            t['#'] = '#'
        res = set()

        def dfs(i, j, path, level):
            if '#' in level:
                res.add(path) # 不可以停下来, 还要继续搜
            for ni,nj in ((i+1,j), (i-1,j), (i,j-1), (i,j+1)):
                if 0<=ni<len(board) and 0<=nj<len(board[0]) and board[ni][nj] in level:
                    tmp = board[ni][nj]
                    board[ni][nj] ="@"
                    dfs(ni, nj, path+tmp, level[tmp])
                    board[ni][nj] = tmp
                    
        for i in xrange(len(board)):
            for j in xrange(len(board[0])):
                if board[i][j] in trie:
                    tmp = board[i][j]
                    board[i][j] ="@"
                    dfs(i, j, tmp, trie[tmp])
                    board[i][j] = tmp
        return list(res)