
Async vs. Threading vs. Multiprocessing in Network Automation



This post is part of our ongoing series on network automation best practices, grounded in the PRIME Framework and PRIME Philosophy.

Why This Blog Exists

Concurrency is essential for scalable automation—but not all concurrency models are created equal. This post explains the differences between async, threading, and multiprocessing, when to use each, and how the PRIME Framework guides safe, transparent choices.


🚦 PRIME Philosophy: Safety and Transparency

  • Safety: Choose the right concurrency model for your task; avoid race conditions and deadlocks
  • Transparency: Document why and how concurrency is used; make parallelism explicit
  • Measurability: Track outcomes, performance, and failures
  • Ownership: Your team understands and controls concurrency, not just the framework
  • Empowerment: Avoid "magic" parallelism—make it explicit and teachable

Understanding the Models: Async, Threading, Multiprocessing

  • Threading: Multiple threads in one process, good for I/O-bound tasks (e.g., SSH, file I/O). Python's GIL limits CPU-bound scaling. Beware of shared state, race conditions, and thread-unsafe libraries.
  • Multiprocessing: Multiple processes, each with its own Python interpreter. Best for CPU-bound tasks (e.g., parsing, data crunching). True parallelism, but higher memory and startup cost. Use for CPU-heavy parsing, analytics, or when you need process isolation.
  • Async: Non-blocking I/O, best for high-volume, lightweight tasks (e.g., API calls, telemetry). Uses event loop and coroutines. Requires async-capable libraries and a new way of thinking about code structure.

Key Differences:

  • Threading: Simple for I/O, but beware of shared state and race conditions
  • Multiprocessing: True parallelism, but higher memory and startup cost
  • Async: Most efficient for many small, non-blocking tasks; requires async/await code

Deep Dive: Internals & Pitfalls

  • Threading: Python's Global Interpreter Lock (GIL) means only one thread executes Python bytecode at a time. Threads are best for I/O-bound tasks, but not for CPU-bound work. Connection objects in many network libraries (Netmiko, Paramiko) are not thread-safe; give each thread its own connection.
  • Multiprocessing: Each process has its own memory space and Python interpreter, so the GIL is not a bottleneck. Use for CPU-bound tasks, but beware of serialization (pickling) costs and inter-process communication complexity.
  • Async: Async code uses an event loop to schedule coroutines. No threads or processes are created by default. Async is ideal for high-scale, non-blocking I/O, but requires async libraries (e.g., scrapli, aiohttp) and careful error handling.
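The thread-safety point above is easy to demonstrate: an unlocked read-modify-write on shared state can lose updates even under the GIL. A minimal, self-contained sketch (the counter and worker names are illustrative, not from any library):

```python
import threading

counter = 0
lock = threading.Lock()

def increment_unsafe(n):
    # Unlocked read-modify-write: two threads can load the same value,
    # and one increment is silently lost (a race condition).
    global counter
    for _ in range(n):
        counter += 1

def increment_safe(n):
    global counter
    for _ in range(n):
        with lock:  # serializes the read-modify-write
            counter += 1

def run(worker, n=100_000, threads=4):
    global counter
    counter = 0
    ts = [threading.Thread(target=worker, args=(n,)) for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return counter

# run(increment_safe) always returns threads * n;
# run(increment_unsafe) may fall short, depending on thread scheduling.
```

The same discipline applies to real automation: either protect shared state with a lock or queue, or avoid sharing it at all.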

Common Pitfalls:

  • Threading: Race conditions, deadlocks, thread-unsafe libraries, debugging complexity
  • Multiprocessing: Serialization errors, high memory usage, slow startup
  • Async: Mixing sync and async code, blocking the event loop, poor error handling
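The "blocking the event loop" pitfall deserves a concrete sketch: a synchronous call inside a coroutine stalls every other task. One common escape hatch is `asyncio.to_thread`, which runs the blocking call on a worker thread. Here `blocking_fetch` is a stand-in for a sync library call, not a real API:

```python
import asyncio
import time

def blocking_fetch(device):
    # Simulates a sync library call (e.g. a blocking SSH client).
    time.sleep(0.2)
    return f"{device}: ok"

async def collect_all(devices):
    # Calling blocking_fetch(d) directly here would stall the event loop;
    # asyncio.to_thread pushes each call onto a worker thread instead.
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_fetch, d) for d in devices)
    )

devices = ["r1", "r2", "r3"]
start = time.monotonic()
results = asyncio.run(collect_all(devices))
elapsed = time.monotonic() - start
# The three 0.2s calls overlap, so elapsed is roughly 0.2s, not 0.6s.
```

This is a bridge, not a destination: where an async-native library exists, prefer it over wrapping sync calls.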

When to Use Each Model

Model            Best For                   Example Use Case
Threading        I/O-bound, blocking        Parallel SSH sessions
Multiprocessing  CPU-bound, heavy tasks     Parsing large configs
Async            High-volume, lightweight   Telemetry collection, APIs

PRIME-Aligned Decision Tree

  1. Is the task CPU-bound?
    • Yes: Use multiprocessing
    • No: Continue
  2. Is the task I/O-bound and blocking?
    • Yes: Use threading (if library is thread-safe)
    • No: Continue
  3. Is the task high-volume, lightweight, and async-capable?
    • Yes: Use async
    • No: Re-examine requirements or refactor

Decision Checklist:

  • Is your task waiting on network or disk? (Threading or Async)
  • Is your task CPU-intensive? (Multiprocessing)
  • Do you need to scale to thousands of concurrent tasks? (Async)
  • Do you need to share state between tasks? (Threading/Async: shared memory within one process; Multiprocessing: no, requires explicit IPC)
  • Is the library you use thread- or async-safe? (Check docs!)
  • Can you tolerate non-deterministic completion order? (Threading/Async: tasks finish in any order; Pool.map returns results in input order)

Example 1: Refactoring for Async

Before (Threading):

from threading import Thread

for device in devices:
    Thread(target=collect_data, args=(device,)).start()

Pitfall: No thread join, no error handling, possible race conditions.
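If you do stay with threads, those pitfalls are fixable. A minimal sketch using ThreadPoolExecutor, which joins workers on exit and surfaces exceptions per device (`collect_data` and `devices` are placeholders here):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def collect_data(device):
    # Placeholder for the real per-device collection logic.
    return f"{device}: ok"

devices = ["r1", "r2", "r3"]
results = {}
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(collect_data, d): d for d in devices}
    for fut in as_completed(futures):
        device = futures[fut]
        try:
            results[device] = fut.result()
        except Exception as exc:
            results[device] = f"ERROR: {exc}"
# The with-block waits for (joins) all threads; errors are captured per device.
```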

After (Async):

import asyncio

async def collect_data(device):
    ...

async def main():
    return await asyncio.gather(*(collect_data(d) for d in devices))

asyncio.run(main())

Note: asyncio.run() expects a single coroutine, so the gather() call must be awaited inside one.

Advanced Async Example: Error Handling and Timeouts

import asyncio

async def collect_data(device):
    try:
        return await asyncio.wait_for(actual_collection(device), timeout=10)
    except asyncio.TimeoutError:
        return f"{device}: TIMEOUT"

async def main():
    return await asyncio.gather(*(collect_data(d) for d in devices))

results = asyncio.run(main())
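At scale you usually also want to cap how many devices are in flight at once, or you can exhaust sockets and memory. A sketch extending the timeout pattern with a semaphore (`actual_collection` is stubbed with a short sleep purely for illustration):

```python
import asyncio

async def actual_collection(device):
    # Stub for the real async I/O (e.g. a scrapli or aiohttp call).
    await asyncio.sleep(0.01)
    return f"{device}: ok"

async def collect_data(device, sem):
    async with sem:  # at most `limit` devices in flight at once
        try:
            return await asyncio.wait_for(actual_collection(device), timeout=10)
        except asyncio.TimeoutError:
            return f"{device}: TIMEOUT"

async def main(devices, limit=50):
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(collect_data(d, sem) for d in devices))

results = asyncio.run(main([f"r{i}" for i in range(200)]))
# gather() preserves input order, so results line up with the device list.
```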

Example 2: Multiprocessing for CPU-Bound Tasks

from multiprocessing import Pool

def parse_config(device):
    ...

if __name__ == "__main__":  # required on Windows/macOS (spawn start method)
    with Pool(4) as pool:
        results = pool.map(parse_config, devices)

Advanced Pattern: Process Pool with Error Handling

from multiprocessing import Pool

def safe_parse(device):
    try:
        return parse_config(device)
    except Exception as e:
        return f"{device}: ERROR {e}"

if __name__ == "__main__":
    with Pool(4) as pool:
        results = pool.map(safe_parse, devices)

Advanced Patterns: Error Handling, Debugging, and Monitoring

  • Use thread-safe data structures (queues, locks) for threading
  • Catch and log exceptions in all parallel tasks
  • Use timeouts and retries for async and threaded operations
  • Monitor resource usage (CPU, memory, open connections)
  • Test for race conditions, deadlocks, and memory leaks
  • Use concurrent.futures.ThreadPoolExecutor or ProcessPoolExecutor for modern, robust parallelism
  • For async, prefer libraries with first-class async support (e.g., scrapli, aiohttp)
  • Always document concurrency assumptions and test at scale
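The "timeouts and retries" bullet can be packaged as a small reusable wrapper. A minimal sketch (the helper name and backoff defaults are our own, not from any library):

```python
import time

def with_retries(func, attempts=3, delay=1.0, backoff=2.0):
    """Return a wrapper that retries func with exponential backoff.

    Retries on any Exception; re-raises after the final attempt.
    """
    def wrapper(*args, **kwargs):
        wait = delay
        for attempt in range(1, attempts + 1):
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == attempts:
                    raise  # out of attempts: surface the last error
                time.sleep(wait)
                wait *= backoff
    return wrapper

# Usage: reliable_fetch = with_retries(fetch_device_facts, attempts=3)
```

In production you would typically narrow the caught exception types to transient ones (timeouts, connection resets) and log each retry, so failures stay measurable.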

PRIME in Action: Choosing Safely

  • Document concurrency choices in code and runbooks
  • Test for race conditions and deadlocks
  • Monitor performance and failures
  • Review concurrency models as requirements change
  • Prefer explicit, documented parallelism over "magic" concurrency
  • Use PRIME to guide design, implementation, and review of all concurrent automation

Summary: Blog Takeaways

  • Use threading for I/O, multiprocessing for CPU, async for high-volume I/O
  • PRIME principles help you choose and document concurrency safely
  • Always test and monitor parallel automation
  • Deeply understand the limitations and risks of each concurrency model
  • Use advanced patterns (timeouts, error handling, resource monitoring) for production-grade reliability


📣 Want More?