[Leetcode] Short Encoding of Words(Medium)

820 - Short Encoding of Words

A valid encoding of an array of words is any reference string s and array of indices indices such that:

  • words.length == indices.length
  • The reference string s ends with the '#' character.
  • For each index indices[i], the substring of s starting from indices[i] and up to (but not including) the next '#' character is equal to words[i].

Given an array of words, return the length of the shortest reference string s possible of any valid encoding of words.

example:

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: A valid encoding would be s = "time#bell#" and indices = [0, 2, 5].
words[0] = "time", the substring of s starting from indices[0] = 0 to the next '#' is underlined in "time#bell#"
words[1] = "me", the substring of s starting from indices[1] = 2 to the next '#' is underlined in "time#bell#"
words[2] = "bell", the substring of s starting from indices[2] = 5 to the next '#' is underlined in "time#bell#"
Input: words = ["t"]
Output: 2
Explanation: A valid encoding would be s = "t#" and indices = [0].

How can we solve this problem?

這題看上去好像很難理解它要我們解決什麼問題。但是我們可以從例子中可以觀察到他就只是單純用#來分隔每個子串。例如:time,me,bell會以這樣的方式進行Encoding,time#me#bell。但是問題要找出我們最短的是Encoding 長度為多少,也就是有重疊的Word我們無需額外加入到子串中,就剛才的例子我們可以看得出來timeme都包含了me。所以,我們Encoding會從time#me#bell縮短為time#bell#

哪我們要怎麼知道是否不用額外加入到字串中呢?

題目中有給出幾個條件(不符合條件需插入至String結尾):

  • 對於任何words[j],從string中的indices[i]#前,與words[j]相同,也就是與words[j]是string的後綴(suffix)。例如:timeme, metime的後綴。
  • words[i]的長度等於indices的長度。也就是words[i]-indices會等於words[j]。例如words[i]為4,indices為1,words[j]為2。4-1 != 2,也就是代表words[j]不是words[i]的後綴。

Solution:

我們可以透過 Tire Tree 來解決這個問題。因為我們要知道words有沒有相同後綴(suffix),所以建構Tree的時候,需要放過來插入。直到 leaf(葉子) 就會知道當前有相同後綴(suffix)的長度為多少,最後加總加一起,就是我們要的結果。

class TrieNode{
public:
    TrieNode(){
        node = vector<TrieNode*>(26);
    }
    vector<TrieNode*> node; 
};

    
int AddWord(TrieNode* root,string& w){
    int res = -1; //if some words have the same suffix,it will return 0 ,-1+'#' = 0
    for(auto i = w.rbegin();i<w.rend();i++){
        if(!root->node[*i - 'a']){
            root->node[*i - 'a'] = new TrieNode();
            res = w.length();
        }
        root = root->node[*i - 'a'];
    }
    return res + 1; //extra 
}


class Solution {
public:
    int minimumLengthEncoding(vector<string>& words) {
        int res = 0;
        sort(words.begin(),words.end(),[&](string& a, string& b){
           return a.length() > b.length();
        });
        TrieNode* root = new TrieNode();
        for(int i = 0;i<words.size();i++)
            res += AddWord(root,words[i]);
        
        // findShortWords(0,root,res);
        return res;
    }
};

TireTree

另外一種解法是我們只需要知道哪些words[j]words[i]有共同的suffix且符合條件。

  • 過濾掉重複的words,並生成unique的list
  • 大到小排序(根據字串的長度),因為我們要知道words[i]是否包含了words[j]。如果words[i].length() < words[j].length()就一定不會包含。
  • 計算長度:
    • 看看words[j]有沒有符合條件
    • words[j]可能在words[i]不只出現過一次。所以我們需要找出所有可能。直到找出符合條件的或者沒有找到符合條件的。例如:timt(i),t(j), ji出現了2次,而且有一個是符號條件的,也就是indices為3的t
class Solution {
public:
    int minimumLengthEncoding(vector<string>& words){
        set<string> unique(words.begin(),words.end()); //O(n)
        vector<string> list;
        for(auto word : unique) list.push_back(word); //O(n)
        
        //sorting by word size
        //O n log n
        sort(list.begin(),list.end(),[&](string& a,string& b){
            return a.length() > b.length();
        });
        vector<int> v(list.size(),0);
        int res = 0;

        //O n*n-1 = n^2
        for(int i = 0;i<list.size();i++){
            if(v[i]) continue;
            for(int j = i+1;j<list.size();j++){
                int index = list[i].find(list[j]);
                if(index == -1) continue; //not found,go to next one
                if(index != -1 && list[i].length() == index + list[j].length()){
                  v[j] = 1;
                }else{
                    while((index = list[i].find(list[j],index)) != -1){ 
                        if(list[i].length() == index + list[j].length()) {
                            v[j] = 1;
                            break;
                        }
                        index ++;
                    }
                }
            }
            res += list[i].length() + 1; //adding extra #
        }
    }
};