Imagine you are the class monitor of a large class, where many students know each other (are friends). You need to quickly determine which students are connected through friendships (e.g., A knows B, B knows C, so A, B, C are in the same group) and whether two students know each other (even if indirectly, through a mutual friend). What data structure efficiently solves this? The answer is the Union-Find (Disjoint Set Union, DSU).
一、What is Union-Find?¶
Union-Find is a data structure for efficiently managing element grouping. Its core operations are:
1. Union: Merge two distinct groups (sets) into one.
2. Find: Determine if two elements belong to the same group.
The name “Union-Find” directly reflects its functions: “Union” for merging, “Find” for querying.
二、Basic Principles of Union-Find¶
Union-Find uses an array parent to represent each element’s “parent node”. Each group can be visualized as a tree, where the root node (a node pointing to itself) is the group’s representative.
- Initialization: Each element is its own group, so parent[i] = i (assuming elements are numbered).
- Find Operation: Locate the root of an element (its group representative).
- Union Operation: Merge two groups by making one group’s root point to the other group’s root.
Example: Managing Friendships with Union-Find¶
Suppose we have 5 students: Xiaoming (0), Xiaohong (1), Xiaogang (2), Xiaoli (3), Xiaofang (4). Initially, each is their own group (parent = [0,1,2,3,4]).
Step 1: Xiaoming and Xiaohong are friends
- Xiaoming’s root is 0, Xiaohong’s root is 1.
- Merge: Set Xiaohong’s parent to 0 (parent[1] = 0). Now Xiaohong and Xiaoming share root 0.
Step 2: Xiaogang and Xiaohong are friends
- Xiaogang’s root is 2, Xiaohong’s root is 0 (since parent[1] = 0).
- Merge: Set Xiaogang’s parent to 0 (parent[2] = 0). Now Xiaogang, Xiaoming, Xiaohong share root 0.
Step 3: Xiaoli and Xiaoming are friends
- Xiaoli’s root is 3, Xiaoming’s root is 0.
- Merge: Set Xiaoli’s parent to 0 (parent[3] = 0). Now Xiaoli, Xiaoming, Xiaohong, Xiaogang share root 0.
Query: Is Xiaofang (4) friends with Xiaoming?
- Xiaofang’s root is 4, Xiaoming’s root is 0. Different roots → No.
三、Optimizations: Faster Union-Find¶
Naive Union-Find can slow down with repeated operations (e.g., trees degenerate into linked lists). Path Compression and Union by Rank are critical optimizations.
1. Path Compression: Shorten the Path¶
During Find, make all nodes along the path directly point to the root.
- Example: Original path A → B → C → Root becomes A → Root, B → Root, C → Root after compression.
- Analogy: Like avoiding detours by taking shortcuts, so future queries are faster.
2. Union by Rank: Attach Smaller Trees to Larger Trees¶
When merging two groups, attach the root of the smaller tree to the root of the larger tree.
- Analogy: Merge a small group into a large group to avoid “bending” the large tree.
四、Core Code (Simplified Version)¶
Here’s a Python implementation with path compression and union by rank:
class UnionFind:
def __init__(self, size):
self.parent = list(range(size)) # Parent array (root nodes initially point to themselves)
self.rank = [0] * size # Rank array (stores tree heights)
def find(self, x):
# Path compression: recursively update parent to root
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
# Union by rank: merge two groups
root_x = self.find(x)
root_y = self.find(y)
if root_x == root_y: # Already in the same group
return
# Attach smaller rank tree to larger rank tree
if self.rank[root_x] < self.rank[root_y]:
self.parent[root_x] = root_y
else:
self.parent[root_y] = root_x
if self.rank[root_x] == self.rank[root_y]:
self.rank[root_x] += 1 # Increment rank if equal
五、Applications¶
Union-Find solves grouping problems efficiently:
- Network Connectivity: Check if two devices are in the same network.
- Family Relations: Determine if two people share ancestry.
- Minimum Spanning Tree: In Kruskal’s algorithm, detect cycles when adding edges.
- Mathematical Equivalence: E.g., check if two numbers are equivalent under modular arithmetic.
六、Summary¶
Union-Find uses the parent array to manage groups, with find and union as core operations. Path compression and union by rank optimize it to near-constant time complexity per operation. It acts as an “efficient group manager,” making it indispensable for grouping problems.
When facing “grouping” challenges (e.g., checking if elements belong to the same set), Union-Find is a simple yet powerful tool!