49. Group Anagrams

Problem

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once.

https://leetcode.com/problems/group-anagrams/

Example 1:

Input: strs = ["eat","tea","tan","ate","nat","bat"]
Output: [["bat"],["nat","tan"],["ate","eat","tea"]]
Explanation:

There is no string in strs that can be rearranged to form "bat".

The strings "nat" and "tan" are anagrams as they can be rearranged to form each other.

The strings "ate", "eat", and "tea" are anagrams as they can be rearranged to form each other.

Example 2:

Input: strs = [""]
Output: [[""]]

Example 3:

Input: strs = ["a"]
Output: [["a"]]

Constraints:

1 <= strs.length <= 10^4
0 <= strs[i].length <= 100
strs[i] consists of lowercase English letters.

Test Cases

1 2	class Solution: def groupAnagrams(self, strs: List[str]) -> List[List[str]]:

solution_test.py

import pytest

from solution import Solution


def _sort(groups: list[list[int]]):
    for group in groups:
        group.sort()

    groups.sort()

@pytest.mark.parametrize('strs, expected', [
    (["eat","tea","tan","ate","nat","bat"], [["bat"],["nat","tan"],["ate","eat","tea"]]),
    ([''], [['']]),
    (['a'], [['a']]),
])
class Test:
    def test_solution(self, strs, expected):
        sol = Solution()
        result = sol.groupAnagrams(strs)
        _sort(result)
        _sort(expected)
        assert result == expected

Thoughts

每个 group 都有共同的「字母成分特征」，即包含哪几个字母、每个字母出现几次。

可以构造一个特征值，确保同组内的字符串计算得到的特征值一致，但不同组的不一致。利用该特征值做哈希表即可。

可以对某个字符串按字母排序，排序后的字符串作为特征值。

因为只有 26 个字母，可以统计字符串中每个字母的次数，拼成一个长度为 26 的不变元组，像 Python 就可以直接以不变元组作为哈希表的 key。

如果不借助 Python 这种特性，可以把不变元组格式化为字符串，或者用多层的哈希表，每层只对一个字母的数量做哈希散列。

整体时间复杂度为 O(n * k) 或者 O(n * k * log k)，其中 k 是每个字符串的平均长度。

Code

solution.py

from typing import List


class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        groups: list[list[int]] = []
        mapping: dict[str, int] = {} # sorted(s): groups index
        for s in strs:
            key = ''.join(sorted(s))
            idx = len(groups)
            if key not in mapping:
                mapping[key] = idx
                groups.append([])
            else:
                idx = mapping[key]

            groups[idx].append(s)

        return groups