In the world of data structures, there’s a tool called the Union-Find (Disjoint Set Union, DSU) that specializes in solving problems involving “set merging” and “element membership”. For example, if we want to know if “Xiaoming and Xiaohong are in the same circle of friends” or “whether two network nodes are connected”, Union-Find can efficiently solve these problems.
Basic Operations of Union-Find¶
Union-Find mainly has two core operations: find (to find the root node of an element’s set) and union (to merge two sets). To understand these operations, let’s start with a simple scenario:
We have 5 elements numbered 1 to 5. Initially, each element is its own “set” (like everyone being in their own circle of friends). We use an array parent to record each element’s “parent node” (the direct upper-level element). Initially, parent[i] = i (each element is its own parent).
1. find Operation: Finding the “Ancestor”¶
The goal of find(x) is to locate the “root node” (the “ancestor”) of the set that contains element x. For example, if we merge 1 and 2, then parent[2] = 1. At this point, find(2) will return 1 (since 2’s parent is 1, and 1’s parent is itself, making 1 the root).
The basic logic of the find function is:
def find(x):
while parent[x] != x: # If x is not its own parent, keep moving up
x = parent[x]
return x # Return the root node
2. union Operation: Merging Sets¶
The goal of union(x, y) is to merge the two sets containing x and y. For example, to merge the sets containing 1 and 2, we simply make the root of one set point to the root of the other.
The basic logic of the union function is:
def union(x, y):
root_x = find(x)
root_y = find(y)
if root_x != root_y:
parent[root_y] = root_x # Make y's root point to x's root
Problem with Basic Union-Find: Slow Lookups!¶
While the basic Union-Find works, it can be inefficient in some scenarios. For example, after multiple merges, the set may form a “long chain” structure:
Suppose we merge 1-2, 2-3, 3-4, 4-5 in sequence. The parent array will then be:
parent[1]=1, parent[2]=1, parent[3]=2, parent[4]=3, parent[5]=4.
When we call find(5), we need to traverse 5→4→3→2→1 to reach the root. If such a “long chain” is very long (e.g., merging 1000 elements), each find operation could take 1000 steps, leading to poor efficiency.
Path Compression: Speeding Up Lookups¶
To solve the long chain problem, Path Compression was developed. Its core idea is: during each find operation, make every node on the path directly point to the root. This “flattens” the path, making future lookups almost “instant”.
Principle of Path Compression¶
Imagine climbing a mountain with many steps to the summit (root). Path compression is like being pulled directly to the summit whenever you encounter a step, so you never need to walk those steps again.
In implementation, during the find function, after recursively finding the root, we set the parent of all nodes on the path directly to the root.
Path-Compressed find Implementation¶
Here’s the recursive version of find with path compression:
def find(x):
if parent[x] != x: # If x is not the root
# Recursively find the root and set x's parent directly to the root
parent[x] = find(parent[x])
return parent[x]
Example:
Suppose we have a long chain 5→4→3→2→1 (i.e., parent[5]=4, parent[4]=3, parent[3]=2, parent[2]=1, parent[1]=1).
-
First
find(5):
Recursively callsfind(4)→find(3)→find(2)→find(1)(root, returns 1).
Thenparent[2],parent[3],parent[4], andparent[5]are all set to 1.
Now theparentarray becomes:parent[5]=1,parent[4]=1,parent[3]=1,parent[2]=1,parent[1]=1. -
Second
find(5):
Directly returnsparent[5]=1(no traversal needed), reducing path length from 5 steps to 1 step!
Iterative Path Compression (More Intuitive)¶
For easier understanding, here’s an iterative version of path compression:
def find(x):
# Step 1: Find the root node
root = x
while parent[root] != root:
root = parent[root]
# Step 2: Compress the path by setting all nodes to point directly to the root
while parent[x] != root:
next_parent = parent[x] # Save current parent
parent[x] = root # Point directly to the root
x = next_parent # Move to the next node
return root
This iterative version first locates the root, then traverses the path from the original node to the root, setting each node’s parent directly to the root.
Effect of Path Compression: Almost O(1) Lookup¶
Path compression optimizes the time complexity of find from O(h) (where h is the tree height) in the basic version to nearly O(1). Even for sets with 1 million elements, a single find operation takes negligible time.
This is because path compression flattens the tree structure, eventually making every node point directly to the root (forming a “short and wide” tree).
Summary¶
Path compression is the “speed magic” of Union-Find. By making all nodes point directly to the root during lookup, it compresses long paths into “instant” lookups. Combined with other optimizations like “union by rank”, Union-Find achieves nearly constant-time complexity for both union and find operations, making it the core technique for solving “set problems”.
Next time you need to quickly determine element membership, remember to use Union-Find with path compression to make your code run efficiently!