Deep Dive into Heaps in Python¶
Heaps are one of the most elegant and efficient data structures in computer science, yet they're often misunderstood or overlooked. Today, we'll explore Python's heapq
module and dive deep into heap algorithms, their applications, and when to use them.
What is a Heap?¶
A heap is a specialized tree-based data structure that satisfies the heap property:
Heap Property
In a min-heap, for any given node, the node's value is less than or equal to the values of its children.
In a max-heap, for any given node, the node's value is greater than or equal to the values of its children.
Key Characteristics¶
- Complete Binary Tree: All levels are filled except possibly the last, which is filled from left to right
- Heap Property: Parent-child relationship maintains order
- Efficient Operations: Insert and extract in O(log n) time
- Array Representation: Can be efficiently stored in a list/array
Python's heapq Module¶
Python's heapq
module implements a min-heap using regular Python lists:
import heapq
# Create a heap
heap = []
# Add elements
heapq.heappush(heap, 4)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)
heapq.heappush(heap, 2)
print(heap) # [1, 2, 3, 4]
# Extract minimum
min_val = heapq.heappop(heap)
print(min_val) # 1
print(heap) # [2, 4, 3]
Essential Operations¶
import heapq
# Initialize from existing list
numbers = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heapq.heapify(numbers) # O(n) time complexity
print(numbers) # [1, 2, 3, 4, 7, 9, 10, 14, 8, 16]
# Push and pop in one operation
new_min = heapq.heapreplace(numbers, 5) # Pop min, then push 5
print(new_min) # 1
# Push then pop (more efficient than separate operations)
result = heapq.heappushpop(numbers, 0) # Push 0, then pop min
print(result) # 0
Real-World Applications¶
1. Task Scheduling with Priorities¶
import heapq
from datetime import datetime, timedelta
class Task:
def __init__(self, priority, name, deadline):
self.priority = priority
self.name = name
self.deadline = deadline
def __lt__(self, other):
return self.priority < other.priority
def __repr__(self):
return f"Task('{self.name}', priority={self.priority})"
class TaskScheduler:
def __init__(self):
self.tasks = []
def add_task(self, priority, name, deadline):
task = Task(priority, name, deadline)
heapq.heappush(self.tasks, task)
def get_next_task(self):
if self.tasks:
return heapq.heappop(self.tasks)
return None
def peek_next(self):
return self.tasks[0] if self.tasks else None
# Usage
scheduler = TaskScheduler()
scheduler.add_task(1, "Fix critical bug", datetime.now() + timedelta(hours=2))
scheduler.add_task(3, "Code review", datetime.now() + timedelta(days=1))
scheduler.add_task(2, "Update documentation", datetime.now() + timedelta(hours=6))
print(scheduler.get_next_task()) # Highest priority task
2. Finding K Largest/Smallest Elements¶
import heapq
def find_k_largest(nums, k):
"""Find k largest elements using a min-heap of size k"""
if k >= len(nums):
return sorted(nums, reverse=True)
# Use negative values for max-heap behavior
heap = []
for num in nums:
if len(heap) < k:
heapq.heappush(heap, num)
elif num > heap[0]:
heapq.heapreplace(heap, num)
return sorted(heap, reverse=True)
def find_k_smallest(nums, k):
"""Find k smallest elements efficiently"""
return heapq.nsmallest(k, nums)
# Example usage
numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(find_k_largest(numbers, 3)) # [9, 6, 5]
print(find_k_smallest(numbers, 3)) # [1, 1, 2]
3. Dijkstra's Algorithm Implementation¶
import heapq
from collections import defaultdict
def dijkstra(graph, start):
"""Find shortest paths from start to all other vertices"""
distances = defaultdict(lambda: float('inf'))
distances[start] = 0
pq = [(0, start)]
visited = set()
while pq:
current_distance, current_vertex = heapq.heappop(pq)
if current_vertex in visited:
continue
visited.add(current_vertex)
for neighbor, weight in graph[current_vertex]:
distance = current_distance + weight
if distance < distances[neighbor]:
distances[neighbor] = distance
heapq.heappush(pq, (distance, neighbor))
return dict(distances)
# Example graph
graph = {
'A': [('B', 4), ('C', 2)],
'B': [('C', 1), ('D', 5)],
'C': [('D', 8), ('E', 10)],
'D': [('E', 2)],
'E': []
}
shortest_paths = dijkstra(graph, 'A')
print(shortest_paths) # {'A': 0, 'C': 2, 'B': 4, 'D': 9, 'E': 11}
Performance Analysis¶
Operation | Time Complexity | Space Complexity |
---|---|---|
heappush |
O(log n) | O(1) |
heappop |
O(log n) | O(1) |
heapify |
O(n) | O(1) |
nlargest/nsmallest |
O(n log k) | O(k) |
When to Use Heaps¶
✅ Good for: - Priority queues - Finding k largest/smallest elements - Streaming data scenarios - Graph algorithms (Dijkstra, Prim's) - Task scheduling
❌ Not ideal for: - Random access to elements - Searching for arbitrary elements - When you need a max-heap (requires workarounds)
Advanced Techniques¶
Creating a Max-Heap¶
import heapq
class MaxHeap:
def __init__(self):
self.heap = []
def push(self, val):
heapq.heappush(self.heap, -val)
def pop(self):
return -heapq.heappop(self.heap)
def peek(self):
return -self.heap[0] if self.heap else None
def __len__(self):
return len(self.heap)
# Usage
max_heap = MaxHeap()
for val in [3, 1, 4, 1, 5]:
max_heap.push(val)
print(max_heap.pop()) # 5
print(max_heap.pop()) # 4
Heap with Custom Objects¶
import heapq
from dataclasses import dataclass, field
from typing import Any
@dataclass
class PriorityItem:
priority: int
item: Any = field(compare=False)
def __lt__(self, other):
return self.priority < other.priority
# Usage
pq = []
heapq.heappush(pq, PriorityItem(2, "Second task"))
heapq.heappush(pq, PriorityItem(1, "First task"))
heapq.heappush(pq, PriorityItem(3, "Third task"))
while pq:
priority_item = heapq.heappop(pq)
print(f"Priority {priority_item.priority}: {priority_item.item}")
Conclusion¶
Heaps are incredibly versatile data structures that excel in scenarios requiring efficient priority-based operations. Python's heapq
module makes them accessible and easy to use, though understanding the underlying principles helps you leverage their full power.
Key Takeaways:
- Use heaps for priority queues and k-largest/smallest problems
- Remember it's a min-heap - use negative values for max-heap behavior
- O(log n) operations make heaps efficient for dynamic datasets
- Perfect for algorithms like Dijkstra's and A* search
Next time you encounter a problem involving priorities, rankings, or the need to efficiently access extremes in a dataset, consider reaching for a heap!
Have questions about heaps or want to see more advanced applications? Let me know in the comments or reach out on Twitter!