Hash Functions: How Do Hash Functions Generate Hash Values? A Must-Know for Beginners

A hash function is a "translator" that converts input of arbitrary length into a fixed-length hash value, which serves as the "ID number" of the data. Its core characteristics include: fixed length (e.g., MD5 produces 32 hexadecimal characters), one-way irreversibility (original data cannot be derived from the hash value), near-uniqueness (extremely low collision probability), and the avalanche effect (minor input changes lead to drastic hash value changes). The generation process consists of three steps: input preprocessing into binary, segmented mathematical operations, and merging the results. Unlike encryption functions, hash functions are one-way and do not require a key, while encryption is reversible and requires a key. They have extensive applications: file verification (comparing hash values to prevent tampering), password storage (storing hash values for security), data indexing, and data distribution in distributed systems. As a data fingerprint, the key characteristics of hash functions make them indispensable in security and verification.
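A small illustration using Python's standard `hashlib` module (the inputs are arbitrary examples, not from the article): the fixed output length and the avalanche effect are both easy to observe.

```python
import hashlib

# Two inputs that differ by a single character.
a = "hello world"
b = "hello worle"

# MD5 always yields 32 hex characters; SHA-256 always yields 64.
print(hashlib.md5(a.encode()).hexdigest())           # 32 hex characters
print(hashlib.md5(b.encode()).hexdigest())           # completely different digest (avalanche effect)
print(len(hashlib.sha256(a.encode()).hexdigest()))   # 64, regardless of input length
```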

Read More
Insertion Sort: How Does Insertion Sort Work? A Comparison with Bubble Sort

This article introduces the fundamental importance of sorting and focuses on two simple sorting algorithms: Insertion Sort and Bubble Sort. Insertion Sort works by gradually building a sorted sequence. Starting from the second element, each element is inserted into its correct position within the already sorted portion (similar to arranging playing cards). Its average time complexity is O(n²), with a best-case complexity of O(n) when the array is already sorted. It has a space complexity of O(1) and is stable, making it suitable for small-scale or nearly sorted data. Bubble Sort, on the other hand, compares adjacent elements and "bubbles" larger elements to the end (like bubbles rising to the surface), determining the position of the largest element in each pass. It also has an average time complexity of O(n²) and a space complexity of O(1); it too is stable, but it performs more element movements, making it less commonly used in practice. Both algorithms are O(n²), with Insertion Sort being the more efficient of the two, especially when the data is nearly sorted. Understanding these algorithms is foundational for learning more complex sorting techniques.
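A minimal Insertion Sort sketch (function name and sample data are illustrative):

```python
def insertion_sort(arr):
    """Sort arr in place and return it."""
    for i in range(1, len(arr)):          # arr[:i] is already sorted
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:    # shift larger elements one slot to the right
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key                  # insert the element into its correct slot
    return arr

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```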

Read More
Binary Search: Applicable Scenarios and Learning Guide for Beginners

This article introduces the binary search algorithm, whose core is to compare against the middle element of an ordered array to gradually narrow down the search range and quickly locate the target. It is suitable for scenarios with ordered data, large data volumes, static (rarely modified) content, and the need for rapid search, such as dictionaries or configuration files. The search process uses left and right pointers to determine the middle value mid. Depending on the size of the target relative to the middle value, the pointers are adjusted: if the middle value equals the target, the search is successful; if the target is larger, left is moved right; if smaller, right is moved left, until the target is found or the range becomes invalid. The core of the Python iterative implementation uses a loop with left <= right, calculates mid = (left + right) // 2, and handles boundaries by returning -1 when the array is empty or the target does not exist. The time complexity is O(log n) (since the range is halved each time), and the space complexity is O(1) (using only constant variables). Key details include handling duplicate elements (which may require scanning further to locate the desired occurrence), handling single-element arrays directly, and returning -1 if the target is not found. The "decrease and conquer" idea behind binary search efficiently solves the problem of fast searching in large ordered datasets, making it an important tool among basic algorithms.
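A minimal version of the iterative implementation described above (variable names follow the summary; the sample array is illustrative):

```python
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:                 # the search range is still valid
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid                   # found
        elif arr[mid] < target:
            left = mid + 1               # target lies in the right half
        else:
            right = mid - 1              # target lies in the left half
    return -1                            # empty array or target absent

print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))   # -1
```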

Read More
Adjacency Matrix: Another Representation Method for Graphs and a Comparison of Advantages and Disadvantages

An adjacency matrix is a fundamental representation of a graph, essentially an n×n two-dimensional array where rows and columns correspond to the vertices of the graph, and element values indicate the existence or weight of edges between vertices. In an undirected graph, the value 1 represents the presence of an edge, and 0 represents its absence; in a weighted graph, the actual weight value is directly stored. Its advantages include: first, checking the existence of an edge takes only O(1) time, and calculating vertex degrees is efficient (for undirected graphs, it is the sum of a row, while for directed graphs, rows and columns correspond to out-degrees and in-degrees respectively); second, it is suitable for dense graphs (with edge counts close to n²), has high space utilization, and is simple to implement, making it easy for beginners to understand. Disadvantages include: a space complexity of O(n²), which wastes significant space for sparse graphs; traversing adjacent vertices requires O(n) time, making it less efficient than adjacency lists; and insufficient flexibility for dynamically adjusting the number of edges. In summary, the adjacency matrix trades space for time. It is suitable for dense graphs or scenarios requiring frequent edge queries or degree calculations, but unsuitable for sparse graphs or scenarios requiring frequent traversal of adjacent vertices. It serves as a foundational tool for understanding graph structures.
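A minimal sketch for a small undirected, unweighted graph (the vertices and edges are arbitrary examples):

```python
n = 4  # vertices 0..3
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Build the n x n matrix: 1 means an edge exists, 0 means no edge.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = 1
    matrix[v][u] = 1   # undirected graph: mirror the entry

print(matrix[0][2] == 1)                       # O(1) edge query: is 0 adjacent to 2? -> True
print(sum(matrix[2]))                          # degree of vertex 2 = sum of its row -> 3
print([w for w in range(n) if matrix[1][w]])   # neighbours of 1 need an O(n) scan -> [0, 2]
```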

Read More
Union-Find: What is Union-Find? A Method to Solve "Friendship" Problems

Union-Find (Disjoint Set Union, DSU) is an efficient data structure for managing element groups, primarily solving the "Union" (merge groups) and "Find" (check if elements belong to the same group) problems. It is ideal for scenarios requiring quick determination of element membership in a set. At its core, it uses a parent array to maintain parent-child relationships, where each group is represented as a tree with the root node as the group identifier. Initially, each element forms its own group. Key optimizations include **path compression** (shortening the path during Find to make nodes directly point to the root) and **union by rank** (attaching smaller trees to larger trees to prevent the tree from degrading into a linked list), ensuring nearly constant time complexity for operations. The core methods `find` (finds the root and compresses the path) and `union` (merges two groups by attaching the root of the smaller tree to the root of the larger tree) enable efficient group management. Widely applied in network connectivity checks, family relationship queries, minimum spanning trees (via Kruskal's algorithm), and equivalence class problems, Union-Find is a concise and powerful tool for handling grouping scenarios.
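A minimal sketch with both optimizations, path compression and union by rank (class and method names are illustrative):

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))   # each element starts as its own root (its own group)
        self.rank = [0] * n            # rough tree height, used for union by rank

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression: point straight at the root
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return                          # already in the same group
        if self.rank[rx] < self.rank[ry]:   # attach the shorter tree under the taller one
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

uf = UnionFind(5)
uf.union(0, 1)
uf.union(1, 2)
print(uf.find(0) == uf.find(2))  # True: 0 and 2 are in the same group ("friends")
print(uf.find(0) == uf.find(4))  # False: different groups
```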

Read More
Prefix Sum: How to Quickly Calculate Interval Sum Using Prefix Sum Array?

The prefix sum array is an auxiliary array used to quickly calculate the sum of intervals. It is defined as follows: for the original array A, the prefix sum array S has S[0] = 0, and for k ≥ 1, S[k] is the sum of elements from A[1] to A[k], i.e., S[k] = S[k-1] + A[k]. For example, if the original array A = [1, 2, 3, 4, 5], its prefix sum array S = [0, 1, 3, 6, 10, 15]. The core formula for calculating the sum of an interval is: the sum of elements from the i-th to the j-th element of the original array is S[j] - S[i-1]. For example, to calculate the sum of A[2] to A[4], we use S[4] - S[1] = 10 - 1 = 9, which gives the correct result. The advantages include: preprocessing the S array takes O(n) time, and each interval sum query only takes O(1) time, resulting in an overall complexity of O(n + q) (where q is the number of queries), which is much faster than the O(qn) complexity of direct calculation. It should be noted that index alignment (e.g., adjusting the formula if the original array starts from 0), interval validity, and the space-for-time tradeoff are important considerations. In short, the prefix sum array trades a single O(n) "pre-accumulation" pass for O(1) interval-sum queries.
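A minimal sketch of the construction and the query formula, using the example array from the summary (function name is illustrative):

```python
A = [1, 2, 3, 4, 5]              # treated as 1-indexed in the formulas: A[1]..A[5]
S = [0] * (len(A) + 1)
for k in range(1, len(A) + 1):
    S[k] = S[k - 1] + A[k - 1]   # S[k] = A[1] + ... + A[k]

print(S)                         # [0, 1, 3, 6, 10, 15]

def range_sum(i, j):
    """Sum of the i-th through j-th elements (1-indexed) in O(1)."""
    return S[j] - S[i - 1]

print(range_sum(2, 4))           # A[2] + A[3] + A[4] = 2 + 3 + 4 = 9
```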

Read More
Dynamic Programming: An Introduction to Dynamic Programming and Efficient Solutions for the Fibonacci Sequence

The Fibonacci sequence is defined as f(0) = 0, f(1) = 1, and for n > 1, f(n) = f(n-1) + f(n-2). When calculated directly with recursion, the time complexity is O(2^n) due to excessive repeated computations, resulting in extremely low efficiency. Dynamic programming optimizes this by trading space for time: 1. Memoization recursion: Using a memoization array to store already computed results, each subproblem is solved only once, leading to both time and space complexities of O(n). 2. Iterative method: Using only two variables for iterative computation, with time complexity O(n) and space complexity O(1), which is the optimal solution. The core characteristics of dynamic programming are overlapping subproblems (subproblems reappearing) and optimal substructure (the current solution depends on the solutions of subproblems). Its essence is to avoid redundant calculations by storing subproblem results. The Fibonacci sequence is a classic introductory case, and mastering it can be generalized to similar problems such as climbing stairs.
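A minimal sketch of both optimizations (names are illustrative; `functools.lru_cache` stands in for a hand-written memoization array):

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # memoization: each subproblem is solved once -> O(n) time, O(n) space
def fib_memo(n):
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

def fib_iter(n):                  # iterative method: O(n) time, O(1) space
    a, b = 0, 1                   # a = f(0), b = f(1)
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_memo(30), fib_iter(30))  # 832040 832040
```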

Read More
Balanced Binary Trees: Why Balance Is Needed and A Simple Explanation of Rotation Operations

Binary Search Trees (BST) may degenerate into linked lists due to extreme insertions, causing operation complexity to rise to O(n). Balanced binary trees control balance through the **balance factor** (the height difference between the left and right subtrees of a node), requiring a balance factor of -1, 0, or 1. When unbalanced, **rotation operations** (LL right rotation, RR left rotation, LR left rotation followed by right rotation, RL right rotation followed by left rotation) are used to adjust the structure, keeping the tree height at a logarithmic level (log n) and ensuring that operations such as search, insertion, and deletion maintain a stable complexity of O(log n). Rotations essentially adjust the pivot point to restore the balanced structure of the tree.
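To make the rotation concrete, here is a minimal sketch of the LL case (node and function names are illustrative): a right rotation lifts the left child up to become the new subtree root; the RR left rotation is the mirror image.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_right(y):
    """LL case: y's left subtree is too tall. Lift y.left and return the new subtree root."""
    x = y.left
    y.left = x.right      # x's former right subtree moves under y
    x.right = y           # y becomes x's right child
    return x              # x is the new root of this subtree

# Inserting 3, 2, 1 in order produces a left-leaning chain: unbalanced.
root = Node(3)
root.left = Node(2)
root.left.left = Node(1)
root = rotate_right(root)
print(root.key, root.left.key, root.right.key)  # 2 1 3: balanced again
```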

Read More
Graph: A Beginner's Guide to the Basic Concepts and Adjacency List Representation of Graphs

A graph consists of vertices (nodes) and edges (connections). Vertices are the basic units, and edges can be directed (digraph) or undirected. Weighted graphs have edges with weights (e.g., distances), while unweighted graphs only retain connection relationships. The adjacency list is an efficient representation method that solves the space waste problem of the adjacency matrix in sparse graphs (where the number of edges is much less than the square of the number of vertices). Its core is that each vertex stores a list of directly connected vertices. For an undirected graph, if vertex 0 is connected to 1, 2, and 3, its adjacency list is [1, 2, 3]. For a weighted graph, the adjacency list can store tuples of "neighbor + weight". The space complexity of an adjacency list is O(V + E) (where V is the number of vertices and E is the number of edges), making it suitable for sparse graphs. It facilitates traversal of neighboring vertices but requires traversing the adjacency list to check if an edge exists between two vertices. Mastering the adjacency list is fundamental for algorithms such as graph traversal and shortest path finding.
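A minimal adjacency-list sketch using a plain dictionary (the unweighted graph matches the example in the summary; the weighted variant is illustrative):

```python
# Undirected, unweighted graph: vertex 0 connected to 1, 2, 3; plus edge 1-2.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}

print(adj[0])          # neighbours of 0 in O(degree) time -> [1, 2, 3]
print(2 in adj[0])     # edge query needs a scan of 0's list -> True

# Weighted variant: store (neighbour, weight) tuples instead of bare vertices.
weighted = {0: [(1, 5), (2, 2)], 1: [(0, 5)], 2: [(0, 2)]}
print(weighted[0])     # [(1, 5), (2, 2)]
```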

Read More
Heap: Structure and Applications, Introduction to Min-Heap and Max-Heap

A heap is a special type of complete binary tree, characterized by the size relationship between parent and child nodes (parent ≤ child for a min-heap, parent ≥ child for a max-heap). It efficiently retrieves extreme values (with the top element being the minimum or maximum), similar to a priority queue. The underlying structure is a complete binary tree, where each level is filled as much as possible, and the last level is filled from left to right. When stored in an array, the left child index is 2i+1, the right child index is 2i+2, and the parent index is (i-1)//2. Basic operations include insertion (appending to the end and then "bubbling up") and deletion (replacing the top element with the last element and then "bubbling down"), both with a time complexity of O(log n). Heaps are widely used in priority queues (e.g., task scheduling), finding the k-th largest element, and Huffman coding. They are a critical structure for efficiently handling extreme value problems.
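A minimal array-based min-heap sketch following the index formulas above (left child 2i+1, right child 2i+2, parent (i-1)//2); the class and method names are illustrative:

```python
class MinHeap:
    def __init__(self):
        self.data = []

    def push(self, x):                       # insertion: append to the end, then bubble up
        self.data.append(x)
        i = len(self.data) - 1
        while i > 0 and self.data[(i - 1) // 2] > self.data[i]:
            self.data[(i - 1) // 2], self.data[i] = self.data[i], self.data[(i - 1) // 2]
            i = (i - 1) // 2                 # move to the parent index

    def pop(self):                           # deletion: move the last element to the top, bubble down
        top, last = self.data[0], self.data.pop()
        if self.data:
            self.data[0] = last
            i, n = 0, len(self.data)
            while True:
                smallest = i
                for c in (2 * i + 1, 2 * i + 2):           # left and right child indices
                    if c < n and self.data[c] < self.data[smallest]:
                        smallest = c
                if smallest == i:
                    break
                self.data[i], self.data[smallest] = self.data[smallest], self.data[i]
                i = smallest
        return top

h = MinHeap()
for x in [5, 1, 4, 2]:
    h.push(x)
print(h.pop(), h.pop())   # 1 2: the minimum always sits at the top
```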

Read More
Greedy Algorithm: What is the Greedy Algorithm? A Case Study on the Coin Change Problem

A greedy algorithm makes what looks like the best choice (a local optimum) at each step, in the hope of reaching a global optimum. Its core is the "greedy choice property"—that the local optimum at each step can lead to the global optimum. A classic application is the coin change problem: taking 25, 10, 5, and 1-cent coins as examples, to make 67 cents, we take 25×2 (50 cents), 10×1 (10 cents), 5×1 (5 cents), and 1×2 (2 cents) in descending order of denominations, totaling 6 coins, which is verified as optimal. However, its limitation is that if the problem does not satisfy the greedy choice property (e.g., with coin denominations [1, 3, 4] to make 6 cents), the greedy approach may fail (greedy would take 4+1+1 = 3 coins, while the optimal is 3+3 = 2 coins). Applicable scenarios include coin systems with compatible denominations (e.g., 25, 10, 5, 1) and activity scheduling (selecting the earliest-ending activities). In conclusion, the greedy algorithm is simple, intuitive, and efficient, but it only applies to problems that satisfy the greedy choice property and does not guarantee the global optimum for all problems.
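A minimal sketch of the greedy strategy (function name is illustrative), reproducing both the 67-cent example and the [1, 3, 4] counterexample:

```python
def greedy_change(amount, denominations):
    """Repeatedly pick the largest coin that still fits."""
    coins = []
    for d in sorted(denominations, reverse=True):
        while amount >= d:
            coins.append(d)
            amount -= d
    return coins

print(greedy_change(67, [25, 10, 5, 1]))  # [25, 25, 10, 5, 1, 1] -> 6 coins, optimal here
print(greedy_change(6, [1, 3, 4]))        # [4, 1, 1] -> 3 coins, but 3 + 3 = 2 coins would be better
```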

Read More
Divide and Conquer Algorithm: How Does the Divide and Conquer Idea Solve Problems? The Principle of Merge Sort

The core of the divide-and-conquer algorithm is "divide and conquer," which solves complex problems through three steps: divide (split into smaller subproblems), conquer (recursively solve subproblems), and combine (integrate results). It is suitable for scenarios with recursive structures. Taking array sum calculation as an example, the array is divided, the sum of subarrays is recursively computed, and the total sum is obtained through combination. Merge sort is a typical application: the array is first divided into individual elements (which are inherently ordered), and then the ordered subarrays are merged using the two-pointer technique. Its time complexity is O(n log n) and space complexity is O(n) (requiring a temporary array). Divide-and-conquer simplifies problems through recursion, and merge sort efficiently demonstrates its advantages. It serves as a foundation for understanding recursive and sorting algorithms.
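A minimal merge sort sketch following the divide/conquer/combine steps (names are illustrative):

```python
def merge_sort(arr):
    if len(arr) <= 1:                     # a single element is already sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])          # divide and recursively conquer each half
    right = merge_sort(arr[mid:])

    # combine: merge the two sorted halves with two pointers
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])               # append whatever remains in either half
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 3, 8, 1, 9, 2]))     # [1, 2, 3, 5, 8, 9]
```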

Read More
Recursion: What is Recursion? An Example with the Fibonacci Sequence, Explained for Beginners

This article explains the concept of recursion using everyday examples and classic cases. Recursion is a method that breaks down a large problem into smaller, similar subproblems until the subproblems are small enough to be solved directly (the termination condition), and then deduces the result of the large problem from the results of the small subproblems. The core lies in "decomposition" and "termination". Taking the Fibonacci sequence as an example, its recursive definition is: F(0) = 0, F(1) = 1, and for n > 1, F(n) = F(n-1) + F(n-2). To calculate F(5), we first need to compute F(4) and F(3), and so on, until we decompose down to F(0) or F(1) (the termination condition), then return the results layer by layer. The key points of recursion are having a clear termination condition (such as n = 0 or 1) and ensuring that each recursive call reduces the problem size; otherwise, it will lead to infinite recursion. The Python implementation is concise: `def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)`. Although recursive code is elegant, it is less efficient than the iterative method when calculating large values (e.g., F(100)) because the same subproblems are recomputed many times. Recursion embodies the idea of "stepping back in order to advance": decompose the problem first, then build the answer back up layer by layer.

Read More
Search Algorithms: Differences Between Sequential Search and Binary Search, and Which Is Faster?

The article introduces two basic search algorithms: sequential search and binary search, which are used to locate specific elements in data. Sequential search (linear search) works by comparing elements one by one. It does not require the data to be ordered, with a time complexity of O(n) (where n is the amount of data). Its advantage is simplicity, but its drawback is low efficiency, making it suitable for small data volumes or unordered data. Binary search (half-interval search) requires the data to be sorted. It halves the search range with each comparison, giving a time complexity of O(log n). It is highly efficient (e.g., only about 10 comparisons are needed when n = 1000), but it requires careful handling of boundary conditions and is suitable for large ordered datasets. Comparing the two: sequential search needs no ordering and is simple to implement but inefficient; binary search requires sorted data and is slightly trickier to implement, but much faster. The choice depends on data size and ordering: binary search for large ordered data, sequential search for small or unordered data.
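For contrast with the binary search sketch shown earlier on this page, a minimal sequential search (function name and inputs are illustrative):

```python
def sequential_search(data, target):
    """Compare elements one by one; works on unsorted data, O(n) time."""
    for i, value in enumerate(data):
        if value == target:
            return i
    return -1

print(sequential_search([7, 3, 9, 1], 9))   # 2
print(sequential_search([7, 3, 9, 1], 5))   # -1
```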

Read More
Sorting Algorithms: An Introduction to Bubble Sort, Step-by-Step Explanation + Code Examples

Bubble Sort is one of the simplest sorting algorithms in computer science. Its core idea is to repeatedly compare adjacent elements and swap their positions, allowing larger elements to gradually "bubble" toward the end of the array. The basic steps are: the outer loop controls n-1 rounds of comparisons (each round fixes the position of one large element), and the inner loop starts from the first element, comparing adjacent elements in sequence; if an element is larger than the one that follows it, the two are swapped. An optimization is that if no swaps occur in a round, the array is already sorted and the process can terminate early. In terms of time complexity, the worst case (completely reversed order) is O(n²), while the best case (already sorted, with the early-exit optimization) is O(n). The space complexity is O(1) (only constant extra space is required). This algorithm is simple to implement and easy to understand, making it suitable for sorting small-scale data and serving as a foundational entry point for sorting algorithms.
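A minimal sketch with the early-exit optimization (names and sample data are illustrative):

```python
def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):                # at most n-1 rounds
        swapped = False
        for j in range(n - 1 - i):        # the last i elements are already in place
            if arr[j] > arr[j + 1]:       # larger element "bubbles" toward the end
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                   # no swaps this round: already sorted, stop early
            break
    return arr

print(bubble_sort([5, 1, 4, 2, 8]))       # [1, 2, 4, 5, 8]
```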

Read More
Hash Table: How Does a Hash Table Store Data? A Diagram of Collision Resolution Methods

A hash table is a key-value storage structure that maps keys to array bucket positions through a hash function, enabling O(1) efficient lookup, insertion, and deletion. Its underlying structure is an array, where keys are converted into array indices (bucket positions) via a hash function (e.g., "key % array length"), and corresponding values are directly stored at these indices. Collisions occur when different keys yield the same hash value (e.g., student IDs 12 and 22 both %10 to 2 when the array length is 10). Two classic collision resolution methods exist: 1. **Chaining**: Each bucket stores a linked list, with colliding elements appended to the tail of the list. This is simple to implement but requires additional space. 2. **Open Addressing**: Linear probing is a common variant, where the algorithm searches for the next empty bucket (e.g., h → h+1 → h+2 ... for a hash value h). This uses only array operations but may cause clustering. The core components of a hash table are the hash function and collision handling logic, making it a foundational topic in data structure learning.
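A minimal sketch of chaining (class and method names are illustrative), using the "key % array length" hash and the student-ID collision from the summary:

```python
class ChainedHashTable:
    def __init__(self, size=10):
        self.buckets = [[] for _ in range(size)]   # each bucket holds a chain of [key, value] pairs

    def _index(self, key):
        return key % len(self.buckets)             # toy hash function: key % array length

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pair in bucket:
            if pair[0] == key:                     # key already present: update its value
                pair[1] = value
                return
        bucket.append([key, value])                # new key or collision: append to the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

t = ChainedHashTable()
t.put(12, "Alice")
t.put(22, "Bob")              # 12 and 22 both hash to bucket 2: chaining keeps both
print(t.get(12), t.get(22))   # Alice Bob
```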

Read More
Binary Trees: Three Traversal Methods of Binary Trees, Recursive Implementation Made Super Simple

This article introduces three classic traversal methods of binary trees (pre-order, in-order, and post-order), implemented recursively, with the core being clarifying the position of root node access. Each node in a binary tree has at most left and right subtrees. Traversal refers to visiting nodes in a specific order. Recursion is key here, similar to "matryoshka dolls," where the function calls itself with a narrowed scope until empty nodes are encountered, terminating the recursion. The differences between the three traversal orders are: - Pre-order: Root → Left → Right; - In-order: Left → Root → Right; - Post-order: Left → Right → Root. Using an example tree (root 1 with left child 2 and right child 3; node 2 has left child 4 and right child 5), the traversal results are: - Pre-order: 1 2 4 5 3; - In-order: 4 2 5 1 3; - Post-order: 4 5 2 3 1. The core of recursive implementation lies in the termination condition (returning for empty nodes) and recursively traversing left and right subtrees in the traversal order. By clarifying the root position and recursive logic, the traversal process can be clearly understood.
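A minimal recursive sketch of the three traversals on the example tree (class and function names are illustrative):

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def preorder(node):    # Root -> Left -> Right
    return [] if node is None else [node.val] + preorder(node.left) + preorder(node.right)

def inorder(node):     # Left -> Root -> Right
    return [] if node is None else inorder(node.left) + [node.val] + inorder(node.right)

def postorder(node):   # Left -> Right -> Root
    return [] if node is None else postorder(node.left) + postorder(node.right) + [node.val]

# The example tree: root 1, left child 2 (with children 4 and 5), right child 3.
root = TreeNode(1, TreeNode(2, TreeNode(4), TreeNode(5)), TreeNode(3))
print(preorder(root))    # [1, 2, 4, 5, 3]
print(inorder(root))     # [4, 2, 5, 1, 3]
print(postorder(root))   # [4, 5, 2, 3, 1]
```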

Read More
Tree: What is a Tree Structure? Easily Understand with Real-Life Examples

This article uses a life analogy to explain the "tree" in data structures. The core is that a tree is similar to a tree in life: it has a root node (starting point), child/parent nodes (branches and their source), leaf nodes (no descendants), and subtrees (nodes and their descendants), with the characteristics of being non-linear, branching, and hierarchical. Unlike linear linked lists (single path), trees can have multiple branches (e.g., the root node can have multiple child nodes). Tree structures are ubiquitous in life: family relationships take elders as the root, corporate structures take the CEO as the root, and computer file systems take the disk as the root, all reflecting hierarchical branches. The core advantage of trees is their efficient handling of hierarchical branching problems, such as database indexing, navigation path planning, and game scene construction. Understanding tree structures allows one to master the thinking of handling branching problems. In life, families, companies, and file systems are typical applications of trees.

Read More
Queue: How is the "First-In-First-Out" of Queues Implemented? A Simple Example to Illustrate

A queue is a data structure that follows the "First-In-First-Out" (FIFO) principle. It only allows insertion at the rear and deletion at the front. Key concepts include the front (earliest element) and the rear (latest element), with basic operations being Enqueue (insertion) and Dequeue (deletion). In array-based implementation, a queue requires a front pointer, a rear pointer, and a fixed-capacity array. The queue is empty when front == rear, and full when rear == max_size. During Enqueue, the rear pointer is moved forward to store the new element; during Dequeue, the front pointer is moved forward to retrieve the element. Example Demonstration: For a queue with capacity 5, initially front=0 and rear=0. After enqueuing 1, 2, 3, rear becomes 3, with the queue elements [1, 2, 3]. Dequeuing 1 makes front=1, and enqueuing 4 moves rear to 4. Enqueuing 5 results in a full queue. Dequeuing 2 (front=2) leaves the final queue as [3, 4, 5]. Applications include task scheduling, Breadth-First Search (BFS), printer queues, and network request handling, playing a critical role in data processing and task queuing scenarios.
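A minimal sketch of the fixed-capacity array queue described above (class name is illustrative), reproducing the example trace:

```python
class ArrayQueue:
    def __init__(self, max_size):
        self.data = [None] * max_size
        self.front = 0          # index of the earliest element
        self.rear = 0           # index where the next element will be stored

    def enqueue(self, x):
        if self.rear == len(self.data):      # rear == max_size: queue is full
            raise OverflowError("queue is full")
        self.data[self.rear] = x
        self.rear += 1

    def dequeue(self):
        if self.front == self.rear:          # front == rear: queue is empty
            raise IndexError("queue is empty")
        x = self.data[self.front]
        self.front += 1
        return x

q = ArrayQueue(5)
for x in (1, 2, 3):
    q.enqueue(x)
print(q.dequeue())      # 1 (first in, first out)
q.enqueue(4)
q.enqueue(5)
print(q.dequeue())      # 2; the remaining elements are 3, 4, 5
```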

Read More
Stack: What Does "Last-In-First-Out" Mean? Principle Diagram

This article uses "stacking plates" as an example to explain the core concepts of the data structure "stack". A stack is a linear list where insertions and deletions can only be performed from one end (the top), with the other end being the bottom. Its core feature is "Last-In-First-Out" (LIFO) — the last element added is the first to be removed. Basic operations of a stack include: push (adding an element to the top), pop (removing and returning the top element), top (viewing the top element), and empty (checking if the stack is empty). For example, when stacking plates, new plates are placed on top (push), and the top plate must be taken first (pop), which aligns with LIFO. Stacks are widely applied in life and programming: bracket matching (using the stack to record left brackets, popping to match right brackets), function call stacks (functions called later return first), and browser back functionality (successively popping recently visited webpages). Understanding the "LIFO" feature of stacks helps solve problems like recursion and dynamic programming, making it a foundational tool in data structures.

Read More
Linked List: Difference Between Singly Linked List and Doubly Linked List, Easy for Beginners to Understand

This article uses the example of storing a list of game players to illustrate how linked lists solve the problem of node movement required when deleting intermediate elements from an array. A linked list is a linear structure composed of nodes, where each node contains a data field and a pointer field. It is stored in non-contiguous memory, and only pointers need to be modified during insertion and deletion operations. A singly linked list is the simplest form. Each node only contains a next pointer, allowing for one-way traversal (from head to tail). When inserting or deleting elements, it is necessary to first find the predecessor node and then modify the pointer. It saves memory and is suitable for one-way scenarios (such as queues). A doubly linked list has an additional prev pointer in each node, supporting two-way traversal. During insertion and deletion, operations can be performed directly through the prev and next pointers without needing to search for the predecessor node. However, it consumes slightly more memory and is suitable for two-way operations (such as browser history and address books). Comparison of singly and doubly linked lists: the singly linked list has a simple structure and saves memory, while the doubly linked list is fully functional but slightly more memory-intensive. The choice should be based on the requirements: use a singly linked list for one-way operations and a doubly linked list for two-way or frequent operations.
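A minimal sketch of the two node types and the doubly linked deletion shortcut (class names and data are illustrative):

```python
class SinglyNode:
    def __init__(self, data):
        self.data = data
        self.next = None          # one-way: only a pointer to the successor

class DoublyNode:
    def __init__(self, data):
        self.data = data
        self.prev = None          # extra pointer to the predecessor
        self.next = None          # pointer to the successor

# Deleting node b from a doubly linked list a <-> b <-> c needs no search for
# the predecessor: just rewire the two neighbouring pointers.
a, b, c = DoublyNode("A"), DoublyNode("B"), DoublyNode("C")
a.next, b.prev = b, a
b.next, c.prev = c, b
b.prev.next = b.next      # a now points past b to c
b.next.prev = b.prev      # c now points back to a
print(a.next.data)        # C
```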

Read More
Arrays: Why Are They the Cornerstone of Data Structures? A Must-Learn for Beginners

This article introduces the core position of arrays as a basic data structure. An array is a sequence of elements of the same type, enabling random access through indices (starting from 0). It features simplicity, intuitive design, continuous storage, and efficient index-based access. As a foundational structure, arrays underpin complex data structures like stacks, queues, and hash tables (e.g., stacks use arrays for Last-In-First-Out behavior, while queues utilize circular arrays for First-In-First-Out operations). They also form the basis of multi-dimensional arrays (e.g., matrices). Arrays support fundamental operations such as traversal, search, and sorting, with a random access time complexity of O(1), significantly outperforming linked lists' O(n). However, arrays have limitations: fixed size (static arrays) and inefficient insertion/deletion (requiring element shifting). In summary, arrays serve as the "key to entry" in data structures, and mastering them lays the foundation for learning complex structures and algorithms.
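A tiny sketch of the trade-off described above (the values are arbitrary examples):

```python
scores = [90, 85, 77, 68]

print(scores[2])          # random access by index is O(1) -> 77

# Inserting in the middle forces the later elements to shift right (O(n)).
scores.insert(1, 99)
print(scores)             # [90, 99, 85, 77, 68]
```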

Read More
C++ Static Members: Shared Variables and Functions of a Class

This article introduces the concepts, usage, and precautions of static members (variables and functions) in C++. Static members address the issue that ordinary member variables cannot share data: static member variables (declared with the `static` keyword) belong to the entire class, are stored in the global data area, and are shared by all objects. They require initialization outside the class (e.g., `int Student::count = 0;`) and can be accessed via the class name or an object (e.g., `Student::count`). In the example, the `Student` class uses the static variable `studentCount` to count the number of objects, incrementing it during construction and decrementing it during destruction to demonstrate the sharing feature. Static member functions are likewise declared with `static`; they belong to the class rather than to objects and have no `this` pointer, so they can only access static members, and they can be called via the class name or an object (e.g., `Student::getCount()`). Precautions: static member variables must be initialized outside the class; static functions cannot directly access non-static members; avoid overusing static members, to reduce coupling. Summary: static members implement class-wide shared data and utility functions, enhancing data consistency, and are suitable for global state (e.g., counters), but their use should be kept within reasonable bounds.

Read More
Encapsulation in C++: Hiding Attributes and Exposing Interfaces

This article focuses on C++ encapsulation, with the core principle being "hiding internal details while exposing necessary interfaces." Encapsulation is a key principle in object-oriented programming, similar to how a mobile phone can be used without understanding its internal structure. In C++, access modifiers achieve this: `private` hides a class's internal properties (default), accessible only by the class itself; `public` exposes external interfaces for external calls. The necessity of encapsulation lies in preventing data chaos. For example, if a student class directly exposes attributes like age and scores, they might be set to negative values or out-of-range values. Encapsulation addresses this by using `private` members combined with `public` interfaces, where validation logic (e.g., age must be positive) is embedded in the interfaces to ensure data security. The core benefits of encapsulation are threefold: first, data security by preventing arbitrary external modification; second, centralized logic through unified validation rules in interfaces; third, reduced coupling, as external code only needs to focus on interface calls without understanding internal implementations. In summary, encapsulation serves as a "shield" in C++ class design. By hiding details and exposing interfaces, it ensures data security while making the code modular and easy to maintain.

Read More
C++ from Scratch: Constructors and Object Initialization

Constructors are used to automatically initialize member variables when an object is created, avoiding the trouble of manual assignment. They are special member functions with the same name as the class and no return type, and they are called automatically when an object is created. If no constructor is defined, the compiler generates an empty default constructor. If a parameterized constructor is defined, a default constructor must be written manually when one is still needed (e.g., a parameterless constructor or one whose parameters all have default values). Initializer lists initialize member variables directly, which is more efficient, and they are mandatory for const member variables. It should be noted that constructors cannot have a return type, and members are initialized in the order they are declared in the class, not in the order they appear in the initializer list. Constructors ensure that objects have a reasonable initial state, avoiding garbage values, and enhance code safety and maintainability.

Read More