In Python, you’ve probably come across terms like multi-threading, multi-processing, async and event loops. They can be confusing at first. What should we use? When? Why does Python have multiple ways to do the same thing?
In this post, I’ll break it all down in a way that actually makes sense, and to wrap it up, I’ll show you real-world code examples that demonstrate how these tools can improve performance in your system.
Multi-Threading 🧵 (Good for I/O-Bound Tasks)
Multi-threading is when you run multiple threads inside the same process. But because of Python’s Global Interpreter Lock (GIL), only one thread can execute Python bytecode at a time. This means multi-threading is NOT good for CPU-heavy tasks but can be useful for I/O-bound operations like web scraping, file I/O, and API calls.
Example: Multi-Threading for Downloading Web Pages
```python
import threading
import time

def download_page(url):
    print(f"Downloading {url} ...")
    time.sleep(2)  # Simulate network delay
    print(f"Finished {url}")

urls = ["http://example.com/page1", "http://example.com/page2", "http://example.com/page3"]

threads = [threading.Thread(target=download_page, args=(url,)) for url in urls]

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

print("All downloads complete!")
```
- We’re just waiting on the network, so using threads lets the OS switch between them while each one waits for an I/O operation to finish.
- Threads share memory, making them lightweight.
⛔ Downside: the GIL prevents true parallel execution for CPU-bound tasks. So, again, don’t use it for calculations or image/data processing.
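As a side note, the standard library’s `concurrent.futures` module offers a higher-level way to manage a pool of threads instead of starting and joining them by hand. Here is a minimal sketch of the same idea (with the delay scaled down to fractions of a second so it runs quickly):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def download_page(url):
    # Simulate network delay; the GIL is released while sleeping,
    # just as it is during real socket I/O
    time.sleep(0.2)
    return f"content of {url}"

urls = [f"http://example.com/page{i}" for i in range(1, 4)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=3) as executor:
    pages = list(executor.map(download_page, urls))
elapsed = time.perf_counter() - start

# All three simulated downloads overlap, so this takes ~0.2s, not ~0.6s
print(f"Fetched {len(pages)} pages in {elapsed:.2f}s")
```

The executor also handles joining the threads for you when the `with` block exits.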
Multi-Processing 🖥️ (Best for CPU-Bound Tasks)
Multi-processing, on the other hand, spawns multiple processes, each with its own memory space. This means Python can actually run code in parallel on multiple CPU cores.
Example: Multi-Processing for CPU-Heavy Work
```python
import multiprocessing

def compute_square(n):
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.map(compute_square, numbers)
    print("Squares:", results)
```
- Each process runs independently, bypassing the GIL.
- Ideal for CPU-heavy tasks like image processing, machine learning, and data analysis.
In multi-processing, processes don’t share memory, so communication between them requires extra effort. Each process is actually a new instance of the Python interpreter, and each one has its own private memory area. This is different from multi-threading, where threads share the same memory within a single process.
Async Event Loop ⚡ (Best for I/O-Heavy & High-Concurrency Tasks)
Async programming uses an event loop to handle thousands of tasks efficiently without blocking. Instead of waiting (like threads do), the event loop switches to another task while the current one waits for I/O. In other words, instead of relying on OS-level thread scheduling as multi-threading does, an event loop switches between tasks cooperatively: a task voluntarily gives up control when it reaches an `await` statement, and the event loop can then attend to another task until the waiting one is ready to resume. All of this happens in a single thread.
Example: Async Event Loop for Non-Blocking Tasks
```python
import asyncio

async def task():
    print("Start Task")
    await asyncio.sleep(3)  # Non-blocking wait
    print("Task Complete")

async def main():
    print("Before Task")
    await task()
    print("After Task")

asyncio.run(main())
```
Since all tasks run in the same process and thread, they can access the same global variables and objects in memory. But just as with regular Python code, if you want to share mutable data between tasks safely, you need to manage synchronization yourself, for example with locks or other concurrency-safe data structures.
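A race can occur whenever a task reads shared state, awaits, and then writes it back, because another task may run in between. A minimal sketch of guarding such a critical section with `asyncio.Lock`:

```python
import asyncio

counter = 0

async def safe_increment(lock):
    global counter
    async with lock:  # Only one task at a time enters this block
        current = counter
        # Switch point: without the lock, another task could read
        # the stale value of counter right here
        await asyncio.sleep(0)
        counter = current + 1

async def main():
    lock = asyncio.Lock()
    await asyncio.gather(*(safe_increment(lock) for _ in range(100)))
    print("Final counter:", counter)

asyncio.run(main())
```

With the lock, the final counter is 100; remove it and some increments can be lost.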
- It’s single-threaded but non-blocking.
- Ideal for web scraping, API calls, database queries, and file I/O.
⛔ Like multi-threading, it’s not good for CPU-heavy tasks (multi-processing is better for that).
Running Multiple Async Tasks (Concurrency)
Here is an example of two tasks that asyncio runs concurrently, waiting for both to complete using the gather() function.
Example: Running Multiple Async Tasks in Parallel
```python
import asyncio

async def task1():
    print("Task 1 Start")
    await asyncio.sleep(2)
    print("Task 1 Done")

async def task2():
    print("Task 2 Start")
    await asyncio.sleep(3)
    print("Task 2 Done")

async def main():
    await asyncio.gather(task1(), task2())  # Run both tasks concurrently

asyncio.run(main())
```
- Task 1 takes 2 seconds.
- Task 2 takes 3 seconds.
- Total time: Only 3 seconds instead of 5.
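You can verify that the waits overlap by timing the run yourself (delays scaled down here to fractions of a second):

```python
import asyncio
import time

async def sleeper(seconds):
    await asyncio.sleep(seconds)
    return seconds

async def main():
    start = time.perf_counter()
    # Both sleeps run concurrently, so the total is the longer one
    results = await asyncio.gather(sleeper(0.2), sleeper(0.3))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(f"Results: {results}, elapsed: {elapsed:.2f}s")
```

The elapsed time lands around 0.3 seconds (the longer sleep), not 0.5 (the sum).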
When to Use What?
| Use Multi-Threading 🧵 If: | Use Multi-Processing 🖥️ If: | Use Async ⚡ If: |
| --- | --- | --- |
| You have I/O-bound tasks | You have CPU-bound tasks | You have high-concurrency I/O tasks |
| You need lightweight concurrency | You need true parallel execution | You need thousands of async operations |
| Examples: web scraping, file I/O, database queries | Examples: machine learning, image processing, data analysis | Examples: APIs, web scraping, real-time applications |
Real-World Example: Combining Multi-Processing and Async for Heavy I/O + CPU Tasks
To help you fully grasp the advantages of using these tools in real-world scenarios, let’s look at a practical example: fetching shopping cart data from an API (I/O-bound) and then calculating the total price of each cart (CPU-heavy).
Example: Web Scraping + CPU-Intensive Processing
```python
import asyncio
import multiprocessing
import time

import aiohttp

async def fetch_cart(session, cart_id):
    url = f"https://dummyjson.com/carts/{cart_id}"
    await asyncio.sleep(cart_id)  # Simulate network delay for each cart
    async with session.get(url) as response:
        return await response.json()

def calculate_cart_total_price(cart):
    products = cart["products"]
    time.sleep(cart["id"])  # Simulate CPU-heavy work for each cart
    return cart["id"], sum(product["total"] for product in products)

async def main():
    start_time = time.time()
    cart_ids = [1, 2, 3, 4, 5]

    # Fetch all carts concurrently (I/O-bound)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_cart(session, cart_id) for cart_id in cart_ids]
        responses = await asyncio.gather(*tasks)
    fetching_elapsed_time = time.time() - start_time
    print("All carts fetched in {} seconds, instead of ~{}".format(fetching_elapsed_time, sum(cart_ids)))

    # Use multi-processing for CPU-intensive processing
    with multiprocessing.Pool(processes=5) as pool:
        results = pool.map(calculate_cart_total_price, responses)
    print("Total price of all carts:", sum(result[1] for result in results))
    processing_elapsed_time = time.time() - start_time - fetching_elapsed_time
    print("Calculation done in {} seconds instead of ~{}".format(processing_elapsed_time, sum(cart_ids)))

    total_elapsed_time = time.time() - start_time
    print("Total elapsed time: {} seconds, instead of ~{}".format(total_elapsed_time, sum(cart_ids) * 2))

if __name__ == "__main__":
    asyncio.run(main())
```
In this example, multiple shopping carts are fetched concurrently instead of waiting for each request to complete one by one. This significantly reduces the total time spent on I/O operations. Once all the data is retrieved, we use multi-processing to perform the CPU-heavy calculations in parallel across multiple processes, making full use of the available CPU cores.
If we were to fetch the carts sequentially without async, each request would block execution until it completed, resulting in a total wait time of approximately 15 seconds (1+2+3+4+5). Similarly, if we processed each cart’s total price one after another without multiprocessing, it would add another 15 seconds, leading to an overall execution time of around 30 seconds. Thanks to asyncio and multi-processing, the optimized approach reduces this to roughly 10 seconds: about 5 seconds for fetching (the longest simulated delay, since the requests overlap) and about 5 seconds for processing (the slowest cart, since the carts are handled in parallel). You can try running this code on your machine to experience firsthand how async I/O and multi-processing work together to optimize performance.
Long story short
- 🧵 Multi-threading is great for I/O-bound tasks but is limited by the GIL.
- 🖥️ Multi-processing is best for CPU-bound tasks and bypasses the GIL.
- ⚡ Async programming is perfect for high concurrency I/O (e.g., handling thousands of requests).
So next time you’re wondering “Which one should I use?”, just ask yourself:
- Is it I/O-heavy? → Use multi-threading or async.
- Is it CPU-heavy? → Use multi-processing.
- Do you need to handle thousands of concurrent I/O-heavy tasks? → Use async, because it is far more efficient than spawning thousands of threads.
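To see how cheap async tasks are compared to OS threads, you can spin up ten thousand of them in a fraction of a second; a quick sketch:

```python
import asyncio
import time

async def tiny_task(i):
    await asyncio.sleep(0)  # Yield to the event loop; no OS thread is created
    return i

async def main():
    start = time.perf_counter()
    # 10,000 coroutines are just lightweight objects scheduled on one thread
    results = await asyncio.gather(*(tiny_task(i) for i in range(10_000)))
    return len(results), time.perf_counter() - start

count, elapsed = asyncio.run(main())
print(f"Ran {count} tasks in {elapsed:.3f}s")
```

Starting 10,000 real threads would consume far more memory (each thread needs its own stack) and put much more pressure on the OS scheduler.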
Happy coding!