Problem

You are given a binary string s and an integer k.

You are also given a 2D integer array queries, where queries[i] = [l_i, r_i].

A binary string satisfies the k-constraint if either of the following conditions holds:

  • The number of 0’s in the string is at most k.
  • The number of 1’s in the string is at most k.

Return an integer array answer, where answer[i] is the number of substrings of s[li..ri] that satisfy the k-constraint.

A substring is a contiguous non-empty sequence of characters within a string.

https://leetcode.cn/problems/count-substrings-that-satisfy-k-constraint-ii/

Example 1:

Input: s = "0001111", k = 2, queries = [[0,6]]
Output: [26]
Explanation:
For the query [0, 6], all substrings of s[0..6] = "0001111" satisfy the k-constraint except for the substrings s[0..5] = "000111" and s[0..6] = "0001111".

Example 2:

Input: s = "010101", k = 1, queries = [[0,5],[1,4],[2,3]]
Output: [15,9,3]
Explanation:
The substrings of s with a length greater than 3 do not satisfy the k-constraint.

Constraints:

  • 1 <= s.length <= 10^5
  • s[i] is either '0' or '1'.
  • 1 <= k <= s.length
  • 1 <= queries.length <= 10^5
  • queries[i] == [l_i, r_i]
  • 0 <= l_i <= r_i < s.length
  • All queries are distinct.

Test Cases

1
2
class Solution:
def countKConstraintSubstrings(self, s: str, k: int, queries: List[List[int]]) -> List[int]:
solution_test.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
import pytest

from solution import Solution


@pytest.mark.parametrize('s, k, queries, expected', [
('0001111', 2, [[0,6]], [26]),
('010101', 1, [[0,5],[1,4],[2,3]], [15,9,3]),

('000', 1, [[0,0],[0,1],[0,2],[1,1],[1,2],[2,2]], [1,3,6,1,3,1])
])
class Test:
def test_solution(self, s, k, queries, expected):
sol = Solution()
assert sol.countKConstraintSubstrings(s, k, queries) == expected

Thoughts

3258. Count Substrings That Satisfy K-Constraint I 的进阶版。

s 长度为 nqueries 长度为 q

如果对 queries 中的每个值,直接用 problem 3258 的逻辑处理,时间复杂度为 O(q*n)

太慢了。多次查询时,大量计算是重复的,可缓存一些中间结果。

对于给定的查询边界 1lrn1\le l\le r\le n,k-子串(符合 k-constraint 的子串)总数是以 sl,rs_{l,r} 中每个位置开头且右端不会超出 r 的 k-子串数总和。

以 i 开头的子串,如果 si,j1s_{i,j-1} 符合 k-约束但 si,js_{i,j} 不符合,那么更长的子串也都不符合。

所以对于每一个 i,记录相应的 j,记为 jij_i。当 lirl\le i\le r 时,可以计数以 i 开头但右端不会超出 r 的 k-子串数量。

于是:

countl,r=i=lr(min(ji,r)i)count_{l,r}=\sum_{i=l}^r(min(j_i,r)-i)

辅助的空间复杂度 O(n),时间复杂度 O(n + q*t),其中 t 是 queries 的平均区间长度。

还是太慢,如果 t ≈ n,时间复杂度就还是 O(q*n)。而且对于两个重叠的 query 区间,重叠部分有些累加过程是重复的。

对于当前的 i、j,如果 si,js_{i,j} 符合 k-约束,在移动 j 之前算一下右端不超过 j 的 k-子串总数(记为 totaljtotal_j),显然等于右端不超过 j - 1 的 k-子串总数,再加上右端刚好是 j 的 k-子串数量。而后者等于 j - i + 1。于是有:

totalj=totalj1+(ji+1)total_j=total_{j-1}+(j-i+1)

对于区间 [l, r]totalrtotall1total_r-total_{l-1} 是右端点落在 [l, r] 内的 k-子串总数。

注意这并不是题目要求的区间 [l, r] 内的 k-子串总数,因为题目还要求左端点也在区间内。

对于左端点 l,以 l 开头的 k-子串最远不会到达 jlj_l。反过来看就是说所有右端能落在 jlj_l 的 k-子串,左端一定在 [l, r] 区间内。

把区间 [l, r] 分成两半:[l,jl)[l,j_l)[jl,r][j_l, r](需要小心 jlj_l 超过 r 的情况,此处略)。右端点落在右半段的 k-子串,其左端点一定在 [l, r] 内,总数为 totalrtotaljl1total_r-total_{j_l-1}。而左半段,显然其中所有的子串都符合 k-约束,共 (jll)(jll+1)/2(j_l-l)*(j_l-l+1)/2 个。

这样对于任意区间,可以用常数时间求出其中包含的 k-子串总数。那么整个问题的求解时间复杂度为 O(n+q),空间复杂度为 O(n)

Code

solution.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
from typing import List


class Solution:
def countKConstraintSubstrings(self, s: str, k: int, queries: List[List[int]]) -> List[int]:
n = len(s)
lens: list[int] = list(range(n, 0, -1)) # lens[i]: the max length of k-substrings starting at i.
totals: list[int] = [0] * (n + 1) # totals[j]: the number of k-substrings ending before j.
zero = ord('0')
digits = [0, 0] # Number of '0's and '1's.
i, j = 0, -1
while True:
if digits[0] <= k or digits[1] <= k:
j += 1
if j > 0:
totals[j] = totals[j - 1] + j - i
if j >= n:
break
digits[ord(s[j]) - zero] += 1
else:
lens[i] = j - i
digits[ord(s[i]) - zero] -= 1
i += 1

result = [0] * len(queries)
for idx, (l, r) in enumerate(queries):
left = min(lens[l], r - l + 1)
cnt1 = left * (left + 1) >> 1
cnt2 = totals[r + 1] - totals[l + left]
result[idx] = cnt1 + cnt2

return result