433 Minimum Genetic Mutation

1. Question

A gene string can be represented by an 8-character string, where each character is one of "A", "C", "G", "T".

Suppose we need to investigate a mutation from "start" to "end", where ONE mutation is defined as ONE single character changed in the gene string.

For example, "AACCGGTT" -> "AACCGGTA" is 1 mutation.
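The definition of a single mutation can be checked directly by counting differing positions. The following is a minimal sketch of that check; the class name `MutationCheck` and the method `isOneMutation` are hypothetical names chosen for illustration and are not part of the original problem statement.

```java
final class MutationCheck {
    // Returns true when both genes have the same length and differ at exactly one position.
    static boolean isOneMutation(String from, String to) {
        if (from.length() != to.length()) {
            return false;
        }
        int diff = 0;
        for (int i = 0; i < from.length(); i++) {
            if (from.charAt(i) != to.charAt(i)) {
                diff++;
            }
        }
        return diff == 1;
    }
}
```

For instance, `MutationCheck.isOneMutation("AACCGGTT", "AACCGGTA")` is true, while comparing a gene with itself is false because zero characters changed.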

Also, there is a given gene "bank", which records all the valid gene mutations. A gene must be in the bank to make it a valid gene string.

Now, given 3 things - start, end, and bank - determine the minimum number of mutations needed to mutate from "start" to "end". If there is no such mutation, return -1.

Note:

  1. The starting point is assumed to be valid, so it might not be included in the bank.

  2. If multiple mutations are needed, every intermediate gene in the sequence must be valid.

  3. You may assume the start and end strings are not the same.

Example 1:

start: "AACCGGTT"
end:   "AACCGGTA"
bank: ["AACCGGTA"]

return: 1

Example 2:

start: "AACCGGTT"
end:   "AAACGGTA"
bank: ["AACCGGTA", "AACCGCTA", "AAACGGTA"]

return: 2

Example 3:

start: "AAAAACCC"
end:   "AACCCCCC"
bank: ["AAAACCCC", "AAACCCCC", "AACCCCCC"]

return: 3

2. Implementation

(1) BFS

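import java.util.HashSet;
import java.util.LinkedList;
import java.util.Queue;
import java.util.Set;

class Solution {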
    public int minMutation(String start, String end, String[] bank) {
        Set<String> sequences = new HashSet<>();

        char[] code = {'A', 'C', 'G', 'T'};

        for (String seq : bank) {
            sequences.add(seq);
        }

        int level = 0;

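        Set<String> visited = new HashSet<>();
        Queue<String> queue = new LinkedList<>();
        queue.add(start);
        // Mark start as visited so it is not enqueued again when it also appears in the bank.
        visited.add(start);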

        while (!queue.isEmpty()) {
            int size = queue.size();

            for (int i = 0; i < size; i++) {
                String curGene = queue.remove();

                if (curGene.equals(end)) {
                    return level;
                }

                char[] letters = curGene.toCharArray();

                for (int j = 0; j < letters.length; j++) {
                    char oldC = letters[j];
                    for (char c : code) {
                        letters[j] = c;

                        String nextGene = new String(letters);

                        if (sequences.contains(nextGene) && !visited.contains(nextGene)) {
                            visited.add(nextGene);
                            queue.add(nextGene);
                        }
                    }
                    letters[j] = oldC;
                }
            }
            ++level;
        }
        return -1;
    }
}
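As a point of comparison, the same level-by-level BFS can be written by scanning the bank for genes that differ from the current gene in exactly one position, instead of generating every candidate string. This is a sketch of a swapped-in variant, not the solution above; the names `BankScanBfs` and `differsByOne` are hypothetical. It assumes, as the problem states, that start and end differ.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class BankScanBfs {
    // Level-order BFS: each level corresponds to one more mutation from start.
    static int minMutation(String start, String end, String[] bank) {
        boolean[] used = new boolean[bank.length]; // used[k] is true once bank[k] was enqueued
        Queue<String> queue = new ArrayDeque<>();
        queue.add(start);
        int level = 0;

        while (!queue.isEmpty()) {
            int size = queue.size();
            for (int i = 0; i < size; i++) {
                String cur = queue.remove();
                if (cur.equals(end)) {
                    return level;
                }
                // Enqueue every unused bank gene that is one mutation away.
                for (int k = 0; k < bank.length; k++) {
                    if (!used[k] && differsByOne(cur, bank[k])) {
                        used[k] = true;
                        queue.add(bank[k]);
                    }
                }
            }
            level++;
        }
        return -1; // end is not reachable through valid genes
    }

    // True when a and b have equal length and differ at exactly one index.
    private static boolean differsByOne(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        int diff = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                diff++;
            }
        }
        return diff == 1;
    }
}
```

Calling `BankScanBfs.minMutation` on the three examples from the question returns 1, 2, and 3 respectively. Scanning the bank costs O(m * L) per dequeued gene, so this variant is only attractive when the bank is small; generating candidates and looking them up in a hash set scales better with m.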

(2) Bi-directional BFS

import java.util.HashSet;
import java.util.Set;

class Solution {
    public int minMutation(String start, String end, String[] bank) {
        if (start.length() != end.length()) {
            return -1;
        }

        Set<String> sequences = new HashSet<>();

        char[] code = {'A', 'C', 'G', 'T'};

        for (String seq : bank) {
            sequences.add(seq);
        }

        if (!sequences.contains(end)) {
            return -1;
        }

        int level = 0;
        Set<String> visited = new HashSet<>();
        Set<String> beginSet = new HashSet<>();
        Set<String> endSet = new HashSet<>();

        beginSet.add(start);
        endSet.add(end);

        Set<String> temp = null;

        while (!beginSet.isEmpty() && !endSet.isEmpty()) {
            // Expand from the smaller frontier: swap so that beginSet is never larger than endSet.
            if (beginSet.size() >= endSet.size()) {
                temp = beginSet;
                beginSet = endSet;
                endSet = temp;
            }

            temp = new HashSet<>();

            for (String gene : beginSet) {
                char[] letters = gene.toCharArray();

                for (int i = 0; i < letters.length; i++) {
                    char oldC = letters[i];

                    for (char c : code) {
                        if (c == oldC) continue;
                        letters[i] = c;

                        String nextGene = new String(letters);

                        if (endSet.contains(nextGene)) {
                            return level + 1;
                        }

                        if (sequences.contains(nextGene) && !visited.contains(nextGene)) {
                            visited.add(nextGene);
                            temp.add(nextGene);
                        }
                    }
                    letters[i] = oldC;
                }
            }
            beginSet = temp;
            ++level;
        }
        return -1;
    }
}

3. Time & Space Complexity

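BFS: time complexity O(4 * L * m) => O(mL), where L is the length of start and m is the size of bank; space complexity O(m). This counts the construction and set lookup of each candidate string as constant work; charging O(L) for each would add another factor of L, which is immaterial here because L is fixed at 8.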
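The per-gene factor in this bound comes from the candidate strings generated for each dequeued gene. The sketch below enumerates them for a single gene; `NeighborCount` and `neighbors` are hypothetical names used only for illustration. Unlike the BFS in section 2 (1), it skips the unchanged character, so it produces 3 * L candidates rather than 4 * L, which does not change the asymptotic bound.

```java
import java.util.ArrayList;
import java.util.List;

final class NeighborCount {
    private static final char[] CODE = {'A', 'C', 'G', 'T'};

    // Returns every string that differs from gene at exactly one position.
    static List<String> neighbors(String gene) {
        List<String> result = new ArrayList<>();
        char[] letters = gene.toCharArray();
        for (int i = 0; i < letters.length; i++) {
            char old = letters[i];
            for (char c : CODE) {
                if (c == old) {
                    continue; // same character, not a mutation
                }
                letters[i] = c;
                result.add(new String(letters));
            }
            letters[i] = old; // restore before moving to the next position
        }
        return result;
    }
}
```

For an 8-character gene this yields 3 * 8 = 24 candidates, each of which the BFS then checks against the bank set.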
