Leetcode Study Day 46

2023-11-12

Minimum Genetic Mutation

A gene string can be represented by an 8-character long string, with choices from ‘A’, ‘C’, ‘G’, and ‘T’.

Suppose we need to investigate a mutation from a gene string startGene to a gene string endGene where one mutation is defined as one single character changed in the gene string.

For example, “AACCGGTT” –> “AACCGGTA” is one mutation.
There is also a gene bank bank that records all the valid gene mutations. A gene must be in bank to make it a valid gene string.

Given the two gene strings startGene and endGene and the gene bank bank, return the minimum number of mutations needed to mutate from startGene to endGene. If there is no such a mutation, return -1.

Note that the starting point is assumed to be valid, so it might not be included in the bank.

Example 1:

Input: startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]
Output: 1
Example 2:

Input: startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]
Output: 2
 

Constraints:

0 <= bank.length <= 10
startGene.length == endGene.length == bank[i].length == 8
startGene, endGene, and bank[i] consist of only the characters ['A', 'C', 'G', 'T'].

Solution

I asked ChatGPT for get some hints and pseudo codes, in general, his idea is to use BFS to find the shortest path. He stored the bank gene into a set. Then use BFS to store and pop the mutated gene.

Here is his pseudo code:

This problem can be approached as a graph traversal problem, where each gene string represents a node and a mutation is an edge between two nodes. The goal is to find the shortest path from the startGene to the endGene. Here's a step-by-step approach using Breadth-First Search (BFS), which is well-suited for finding the shortest path in unweighted graphs:

Initialize: Create a queue for BFS and a set to keep track of visited nodes. Add the startGene to the queue.

BFS Loop:

While the queue is not empty, remove the first gene from the queue.
Check if this gene is the endGene. If yes, return the number of mutations (depth) required to reach this gene.
Otherwise, iterate over each character in the gene. For each character, try changing it to the other three possible nucleotides ('A', 'C', 'G', 'T').
For each modified gene, check if it is in the bank and not yet visited.
If so, add it to the queue and mark it as visited.
Return -1: If the queue gets empty and the endGene is not found, return -1, indicating no possible mutation path.

Here’s a pseudo-code to illustrate this:

pseudo
function minGeneticMutation(startGene, endGene, bank):
    if endGene not in bank:
        return -1

    bankSet = set(bank)
    visited = set()
    queue = [(startGene, 0)]  # (current gene, mutation steps)

    while queue is not empty:
        currentGene, steps = queue.pop(0)
        
        if currentGene == endGene:
            return steps

        for i in 0 to length(currentGene) - 1:
            for nucleotide in ['A', 'C', 'G', 'T']:
                if nucleotide != currentGene[i]:
                    mutatedGene = currentGene[:i] + nucleotide + currentGene[i+1:]
                    if mutatedGene in bankSet and mutatedGene not in visited:
                        queue.append((mutatedGene, steps + 1))
                        visited.add(mutatedGene)

    return -1

Full code

Here is my code based on the pseudo code above, one thing I learnt is that we can use substr to split the string:

class Solution {
public:
    int minMutation(string startGene, string endGene, vector<string>& bank) {
        vector <char> geneChar = {'A','C','G','T'};
        if(find(bank.begin(), bank.end(), endGene) == bank.end())
            return -1;
        set<string> bankSet;
        for (int i = 0; i < bank.size(); i++)
            bankSet.insert(bank[i]);
        set<string> visited;
        queue <pair<string, int>> bankQ;
        bankQ.push({startGene,0});
        while (!bankQ.empty()){
            string current = bankQ.front().first;
            int step = bankQ.front().second;
            bankQ.pop();
            if (current == endGene){
                return step;
            }
            for(int i = 0; i < current.size(); i++){
                for(char singleGene: geneChar){
                    string mutatedGene = current.substr(0, i) + singleGene + current.substr(i + 1, current.size());
                    if (bankSet.find(mutatedGene)!= bankSet.end() && visited.find(mutatedGene) == visited.end()){
                        bankQ.push({mutatedGene, step+1});
                        visited.insert(mutatedGene);
                    }
                }
            }
        }
        return -1;
    }
};

Word Ladder

A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> … -> sk such that:

Every adjacent pair of words differs by a single letter.
Every si for 1 <= i <= k is in wordList. Note that beginWord does not need to be in wordList.
sk == endWord
Given two words, beginWord and endWord, and a dictionary wordList, return the number of words in the shortest transformation sequence from beginWord to endWord, or 0 if no such sequence exists.

Example 1:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: 5
Explanation: One shortest transformation sequence is "hit" -> "hot" -> "dot" -> "dog" -> cog", which is 5 words long.
Example 2:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: 0
Explanation: The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.
 

Constraints:

1 <= beginWord.length <= 10
endWord.length == beginWord.length
1 <= wordList.length <= 5000
wordList[i].length == beginWord.length
beginWord, endWord, and wordList[i] consist of lowercase English letters.
beginWord != endWord
All the words in wordList are unique.

My solution

The main idea is similar to the previous one, which is to use BFS to find the shortest path. The only difference is that this time we need to find the words with only one word difference between the original one, whereas the previous one is to try to replace each character with the four characters.

Here is my code:

class Solution {
public:
    int difference(string current, string word){
        int diff = 0;
        for (int i = 0; i < current.size(); i++){
            if (current[i] != word[i])
                diff ++;
            if (diff > 1)
                break;
        }
        return diff;
    }

    int ladderLength(string beginWord, string endWord, vector<string>& wordList) {
        if(find(wordList.begin(), wordList.end(), endWord) == wordList.end())
            return 0;
        unordered_set<string> wordSet;
        for (int i = 0; i < wordList.size(); i++)
            wordSet.insert(wordList[i]);
        unordered_set<string> visited;
        queue <pair<string, int>> wordQ;
        wordQ.push({beginWord,1});
        while (!wordQ.empty()){
            string current = wordQ.front().first;
            int step = wordQ.front().second;
            wordQ.pop();
            if (current == endWord){
                return step;
            }
            for (string word: wordSet){
                if (difference(current, word) == 1 && visited.find(word) == visited.end()){
                    wordQ.push({word,step+1});
                    visited.insert(word);
                }
            }
        }
        return 0;
    }
};