Async vs. Threading vs. Multiprocessing in Network Automation
This post is part of our ongoing series on network automation best practices, grounded in the PRIME Framework and PRIME Philosophy.
Why This Blog Exists¶
Concurrency is essential for scalable automation—but not all concurrency models are created equal. This post explains the differences between async, threading, and multiprocessing, when to use each, and how the PRIME Framework guides safe, transparent choices.
🚦 PRIME Philosophy: Safety and Transparency¶
- Safety: Choose the right concurrency model for your task, avoid race conditions and deadlocks
- Transparency: Document why and how concurrency is used, make parallelism explicit
- Measurability: Track outcomes, performance, and failures
- Ownership: Your team understands and controls concurrency, not just the framework
- Empowerment: Avoid "magic" parallelism—make it explicit and teachable
Understanding the Models: Async, Threading, Multiprocessing¶
- Threading: Multiple threads in one process, good for I/O-bound tasks (e.g., SSH, file I/O). Python's GIL limits CPU-bound scaling. Beware of shared state, race conditions, and thread-unsafe libraries.
- Multiprocessing: Multiple processes, each with its own Python interpreter. Best for CPU-bound tasks (e.g., parsing, data crunching). True parallelism, but higher memory and startup cost. Use for CPU-heavy parsing, analytics, or when you need process isolation.
- Async: Non-blocking I/O, best for high-volume, lightweight tasks (e.g., API calls, telemetry). Uses event loop and coroutines. Requires async-capable libraries and a new way of thinking about code structure.
Key Differences:
- Threading: Simple for I/O, but beware of shared state and race conditions
- Multiprocessing: True parallelism, but higher memory and startup cost
- Async: Most efficient for many small, non-blocking tasks; requires async/await code
Deep Dive: Internals & Pitfalls¶
- Threading: Python's Global Interpreter Lock (GIL) means only one thread executes Python bytecode at a time. Threads are best for I/O-bound tasks, but not for CPU-bound work. Many network libraries (Netmiko, Paramiko) are not thread-safe.
- Multiprocessing: Each process has its own memory space and Python interpreter, so the GIL is not a bottleneck. Use for CPU-bound tasks, but beware of serialization (pickling) costs and inter-process communication complexity.
- Async: Async code uses an event loop to schedule coroutines. No threads or processes are created by default. Async is ideal for high-scale, non-blocking I/O, but requires async libraries (e.g., scrapli, aiohttp) and careful error handling.
Common Pitfalls:
- Threading: Race conditions, deadlocks, thread-unsafe libraries, debugging complexity
- Multiprocessing: Serialization errors, high memory usage, slow startup
- Async: Mixing sync and async code, blocking the event loop, poor error handling
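The "blocking the event loop" pitfall is worth a concrete illustration. The sketch below uses a hypothetical synchronous call, legacy_fetch (a stand-in for something like a classic Netmiko session), and offloads it with asyncio.to_thread (Python 3.9+) so it runs on a worker thread instead of stalling the loop:

```python
import asyncio
import time

def legacy_fetch(host: str) -> str:
    # Hypothetical synchronous, blocking call (stand-in for a
    # classic Netmiko/Paramiko session).
    time.sleep(0.05)
    return f"{host}: ok"

async def main() -> list:
    # Calling legacy_fetch(host) directly inside a coroutine would stall
    # the entire event loop; asyncio.to_thread offloads it to a thread.
    return await asyncio.gather(
        *(asyncio.to_thread(legacy_fetch, h) for h in ["r1", "r2"])
    )

results = asyncio.run(main())
```

This pattern is a bridge, not a destination: it lets legacy sync libraries coexist with an event loop, but it reintroduces threads, so the thread-safety caveats above still apply.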
When to Use Each Model¶
| Model | Best For | Example Use Case |
|---|---|---|
| Threading | I/O-bound, blocking | Parallel SSH sessions |
| Multiprocessing | CPU-bound, heavy tasks | Parsing large configs |
| Async | High-volume, lightweight | Telemetry collection, APIs |
PRIME-Aligned Decision Tree¶
- Is the task CPU-bound?
  - Yes: Use multiprocessing
  - No: Continue
- Is the task I/O-bound and blocking?
  - Yes: Use threading (if library is thread-safe)
  - No: Continue
- Is the task high-volume, lightweight, and async-capable?
  - Yes: Use async
  - No: Re-examine requirements or refactor
Decision Checklist:
- Is your task waiting on network or disk? (Threading or Async)
- Is your task CPU-intensive? (Multiprocessing)
- Do you need to scale to thousands of concurrent tasks? (Async)
- Do you need to share state between tasks? (Threading and Async share process memory, so guard threaded state with locks; Multiprocessing needs explicit IPC such as queues or pipes)
- Is the library you use thread- or async-safe? (Check docs!)
- Can you tolerate non-deterministic ordering? (All three models may complete tasks out of order; collect and sort results explicitly if ordering matters)
Example 1: Refactoring for Async¶
Before (Threading):
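A minimal sketch of the threaded pattern, with collect_output as a hypothetical stand-in for a real blocking SSH call (e.g., Netmiko's send_command):

```python
import threading

results = []  # shared list mutated by every thread (race-condition risk)

def collect_output(host):
    # Hypothetical stand-in for a real blocking SSH call;
    # a Netmiko send_command would go here.
    results.append(f"{host}: ok")

threads = [threading.Thread(target=collect_output, args=(h,))
           for h in ["r1", "r2", "r3"]]
for t in threads:
    t.start()
# Pitfalls on display: the threads are never joined, so the main thread
# can race ahead of them, and any exception raised inside
# collect_output is silently lost.
```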
Pitfall: No thread join, no error handling, possible race conditions.
After (Async):
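The same workflow sketched with coroutines; asyncio.sleep stands in for a real async device call (e.g., scrapli's AsyncScrapli):

```python
import asyncio

async def collect_output(host: str) -> str:
    # Simulated non-blocking device call; a real workflow would await
    # an async-capable library such as scrapli here.
    await asyncio.sleep(0.01)
    return f"{host}: ok"

async def main() -> list:
    hosts = ["r1", "r2", "r3"]
    # gather schedules all coroutines on one event loop concurrently
    # and returns results in submission order.
    return await asyncio.gather(*(collect_output(h) for h in hosts))

results = asyncio.run(main())
```

Note that asyncio.gather preserves input order, which sidesteps the ordering ambiguity of the threaded version.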
Advanced Async Example: Error Handling and Timeouts¶
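A sketch of per-device timeouts and exception isolation; fetch is a simulated device call, with one host deliberately slow so the timeout path is exercised:

```python
import asyncio

async def fetch(host: str) -> str:
    # Simulated per-device call; "r2" hangs to trigger the timeout.
    await asyncio.sleep(5 if host == "r2" else 0.01)
    return f"{host}: ok"

async def fetch_safe(host: str, timeout: float = 0.5) -> str:
    try:
        return await asyncio.wait_for(fetch(host), timeout=timeout)
    except asyncio.TimeoutError:
        return f"{host}: timeout"
    except Exception as exc:
        # Never let one device kill the whole run; log and continue.
        return f"{host}: error ({exc})"

async def main() -> list:
    return await asyncio.gather(*(fetch_safe(h) for h in ["r1", "r2", "r3"]))

results = asyncio.run(main())
```

Wrapping each coroutine individually, rather than putting one try/except around gather, keeps a single slow or broken device from discarding every other result.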
Example 2: Multiprocessing for CPU-Bound Tasks¶
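A sketch using multiprocessing.Pool for CPU-bound parsing. parse_config is a toy stand-in for real config parsing, and the explicit fork start method keeps the example self-contained on POSIX; on Windows, use the default spawn method with the worker function in an importable module and a __main__ guard:

```python
import multiprocessing as mp

def parse_config(text: str) -> int:
    # CPU-bound stand-in: count interface stanzas in a config blob.
    return sum(1 for line in text.splitlines()
               if line.startswith("interface"))

configs = [
    "interface Gi0/1\n shutdown\ninterface Gi0/2\n no shutdown",
    "interface Gi0/1\n description uplink",
]

# Each worker is a separate interpreter, so the GIL is not a bottleneck.
with mp.get_context("fork").Pool(processes=2) as pool:
    counts = pool.map(parse_config, configs)
```

Remember the pitfalls listed above: every argument and return value crosses a process boundary, so it must be picklable, and that serialization cost can swamp the gains for small tasks.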
Advanced Pattern: Process Pool with Error Handling¶
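A sketch combining ProcessPoolExecutor with per-task error handling; the empty config for r2 deliberately triggers the error path (fork context assumed, as above):

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor, as_completed

def parse_config(text: str) -> int:
    if not text.strip():
        raise ValueError("empty config")
    return sum(1 for line in text.splitlines()
               if line.startswith("interface"))

configs = {"r1": "interface Gi0/1\ninterface Gi0/2", "r2": ""}

results = {}
with ProcessPoolExecutor(max_workers=2,
                         mp_context=mp.get_context("fork")) as pool:
    futures = {pool.submit(parse_config, cfg): host
               for host, cfg in configs.items()}
    for fut in as_completed(futures):
        host = futures[fut]
        try:
            results[host] = fut.result()  # re-raises worker exceptions here
        except ValueError as exc:
            results[host] = f"error: {exc}"
```

Worker exceptions surface only when you call result(), so collecting every future and handling exceptions per task, as here, is what keeps one bad device from aborting the batch.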
Advanced Patterns: Error Handling, Debugging, and Monitoring¶
- Use thread-safe data structures (queues, locks) for threading
- Catch and log exceptions in all parallel tasks
- Use timeouts and retries for async and threaded operations
- Monitor resource usage (CPU, memory, open connections)
- Test for race conditions, deadlocks, and memory leaks
- Use concurrent.futures.ThreadPoolExecutor or ProcessPoolExecutor for modern, robust parallelism
- For async, prefer libraries with first-class async support (e.g., scrapli, aiohttp)
- Always document concurrency assumptions and test at scale
PRIME in Action: Choosing Safely¶
- Document concurrency choices in code and runbooks
- Test for race conditions and deadlocks
- Monitor performance and failures
- Review concurrency models as requirements change
- Prefer explicit, documented parallelism over "magic" concurrency
- Use PRIME to guide design, implementation, and review of all concurrent automation
Summary: Blog Takeaways¶
- Use threading for I/O, multiprocessing for CPU, async for high-volume I/O
- PRIME principles help you choose and document concurrency safely
- Always test and monitor parallel automation
- Deeply understand the limitations and risks of each concurrency model
- Use advanced patterns (timeouts, error handling, resource monitoring) for production-grade reliability
Related Tutorials & Deep Dives¶
- Asyncio for Network Automation (Expert) — Master Python's asyncio for scalable, event-driven workflows.
- Threading in Network Automation — When to use threading and when to avoid it.
- Deep Dive: CDP Network Audit — See real-world threaded discovery and parallel execution.
- Secure Credential Vaulting (Expert) — Avoid concurrency-related credential leaks.
📣 Want More?¶
- Threading in Network Automation: When to Use It and When to Avoid It
- Why Most Network Automation Pipelines Fail (And How to Fix Them)
- PRIME Framework Overview