Problem

Given two strings str1 and str2, return the shortest string that has both str1 and str2 as subsequences. If there are multiple valid strings, return any of them.

A string s is a subsequence of string t if deleting some number of characters from t (possibly 0) results in the string s.

https://leetcode.com/problems/shortest-common-supersequence/

Example 1:

Input: str1 = "abac", str2 = "cab"
Output: "cabac"
Explanation:
str1 = "abac" is a subsequence of "cabac" because we can delete the first "c".
str2 = "cab" is a subsequence of "cabac" because we can delete the last "ac".
The answer provided is the shortest such string that satisfies these properties.

Example 2:

Input: str1 = "aaaaaaaa", str2 = "aaaaaaaa"
Output: "aaaaaaaa"

Constraints:

  • 1 <= str1.length, str2.length <= 1000
  • str1 and str2 consist of lowercase English letters.

Test Cases

1
2
class Solution:
def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
solution_test.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
import pytest

from solution import Solution


def verify(res: str, str1: str, str2: str, expected: str) -> None:
assert len(res) == len(expected)

m = len(str1)
n = len(str2)
i = j = 0
for c in res:
if i < m and str1[i] == c:
i += 1
if j < n and str2[j] == c:
j += 1

assert i == m
assert j == n


@pytest.mark.parametrize('str1, str2, expected', [
("abac", "cab", "cabac"),
("aaaaaaaa", "aaaaaaaa", "aaaaaaaa"),
])
@pytest.mark.parametrize('sol', [Solution()])
def test_solution(sol, str1, str2, expected):
res = sol.shortestCommonSupersequence(str1, str2)
if res != expected:
verify(res, str1, str2, expected)

Thoughts

1143. Longest Common Subsequence 本质上是同一个问题。想要公共超序列最短,就是要让公共子序列最长。设 str1 和 str2 的长度分别为 m 和 n,如果最长的公共子序列长度为 k,那么最短的公共超序列长度即为 k + (m - k) + (n - k) = m + n - k

先照搬 1143. Longest Common Subsequence 的代码,在最后会得到完整的 lcs(i, j) 表格。从 lcs(m, n) 开始就可以找出构造 LCS 的方式,而在 LCS 中把那些不相等的字符加上,就能得到最短公共超序列(SCS)。

对于 (i, j),如果 lcs(i, j) == lcs(i-1, j) 则说明 str1[i-1] 不属于 LCS,将其添加到 SCS,并继续看 (i-1, j);否则如果 lcs(i, j) == lcs(i, j-1) 则说明 str2[j-1] 不属于 LCS,将其添加到 SCS,并继续看 (i, j-1);否则(必然有 lcs(i, j) == lcs(i-1, j-1) + 1)说明 str1[i-1] == str2[j-1] 属于 LCS,将其加入 LCS 和 SCS,并继续看 (i-1, j-1)

str1 = "abac"str2 = "cab" 为例,根据 lcs(i, j) 表格求出 LCS 和 SCS 的过程如下图示:

按上边提到的规则,找到从 (m, n)(0, 0) 的路径。对于除了终点的任何一个格子,向上走的取所在行对应的字符(取自 str1),向左走的取所在列对应的字符(取自 str2),斜向左上走的说明所在行和列的字符相同。

空间复杂度 O(m * n),时间复杂度 O(m * n)

Code

solution.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
class Solution:
def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
m = len(str1)
n = len(str2)
lcs = [[0] * (n + 1) for _ in range(m + 1)]
for i, c1 in enumerate(str1, 1):
for j, c2 in enumerate(str2, 1):
if c1 == c2:
lcs[i][j] = 1 + lcs[i-1][j-1]
else:
lcs[i][j] = max(lcs[i][j-1], lcs[i-1][j])

res: list[str] = []
i, j = m, n
while i > 0 and j > 0:
if lcs[i][j] == lcs[i-1][j]:
res.append(str1[i-1])
i -= 1
elif lcs[i][j] == lcs[i][j-1]:
res.append(str2[j-1])
j -= 1
else:
res.append(str1[i-1])
i -= 1
j -= 1

while i > 0:
res.append(str1[i-1])
i -= 1

while j > 0:
res.append(str2[j-1])
j -= 1

return ''.join(reversed(res))