Why Nornir? Understanding the Problem and Solution
"From 30 Minutes to 3 Minutes: Why Enterprise Networks Need Parallel Automation"
You've completed the Beginner Tutorials and successfully built a multi-device config backup script. It works great for 10 devices, even 50 devices. But what if your organisation has 500 devices? Or 5,000?
In this tutorial, we'll uncover the critical scalability problem with your current approach, demonstrate how it manifests in real networks, and introduce Nornir, the solution designed for enterprise automation.
Important: This tutorial is conceptual. We're NOT writing production code yet. We're understanding the problem so that Nornir's solution makes sense.
What You'll Learn
By the end of this tutorial, you'll understand:
- ✓ Why loops are fundamentally limited for device operations
- ✓ The mathematical principle of parallelization (Amdahl's Law)
- ✓ Real-world performance impact: sequential vs. parallel
- ✓ Nornir's architecture and why it's designed differently
- ✓ The cost/benefit tradeoff of adding framework complexity
- ✓ When Nornir is the right choice (and when it isn't)
The Problem: Sequential Bottleneck
Let's revisit your Beginner Tutorial #3:
# From Tutorial #3: the serial approach
for device in devices:
    hostname, filename, size, status = backup_device_config(device, backup_dir)
What this does:
- Connect to Device #1
- Retrieve config (5 seconds of network I/O)
- Save to file (1 second)
- Disconnect
- THEN move to Device #2
- Repeat...
The fundamental issue: While the script waits for Device #1's network response, your CPU sits almost completely idle, yet the script can't start fetching Device #2's config because it is blocked waiting.
Real-World Impact
Let's do some math:
Scenario: Enterprise network with 300 devices
Per-device timing:
- SSH connection: 2 seconds
- show running-config execution: 3 seconds (network latency)
- File write: 1 second
- Total per device: ~6 seconds
Sequential approach (Tutorial #3):
300 devices × 6 seconds = 1,800 seconds = 30 MINUTES
Parallel approach (Nornir, 10 concurrent connections):
10 devices are in flight at once, so each "round" of 10 devices takes ~6 seconds
300 ÷ 10 = 30 rounds
30 × 6 seconds = 180 seconds = 3 MINUTES (idealised; more workers, such as Nornir's default of 20, can cut this further)
Real-world result: The same job takes 30 minutes with your current script but only 2-3 minutes with Nornir.
That's a 10-15x speedup.
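The estimate above can be reproduced with two small helper functions. They use the same simplifying assumptions: every device takes exactly 6 seconds, and the worker pool adds no overhead. `sequential_runtime` and `parallel_runtime` are hypothetical helpers for illustration, not part of Tutorial #3 or Nornir.

```python
import math


def sequential_runtime(devices: int, seconds_per_device: float) -> float:
    """Total seconds when devices are processed one after another."""
    return devices * seconds_per_device


def parallel_runtime(devices: int, seconds_per_device: float, workers: int) -> float:
    """Idealised total seconds with `workers` devices in flight at once.

    Assumes every device takes the same time and ignores pool overhead,
    so the job runs in ceil(devices / workers) equal rounds.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return math.ceil(devices / workers) * seconds_per_device


if __name__ == "__main__":
    seq = sequential_runtime(300, 6)
    par = parallel_runtime(300, 6, workers=10)
    print(f"sequential {seq / 60:.0f} min, parallel {par / 60:.0f} min, {seq / par:.0f}x faster")
```

Real devices respond at different speeds, so treat the parallel figure as a best case rather than a prediction.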
Visualizing Sequential vs. Parallel
Sequential Execution (Tutorial #3 Approach)
Device 1: [=====...wait for network.....=====] ✓
Device 2: [=====...wait for network.....=====] ✓
Device 3: [=====...wait for network.....=====] ✓
Device 4: [=====...wait for network.....=====] ✓
Time: ████████████████████████ (~24 seconds for 4 devices at ~6 seconds each)
CPU:  ░░░░░░░░░░░░░░░░░░░░░░░░ (CPU idle ~90% of the time)
Notice: While Device 1 waits for the network, Devices 2, 3, 4 aren't even started. The CPU is idle.
Parallel Execution (Nornir Approach)
Device 1: [=====network=====]
Device 2: [→ overlapping =====network=====]
Device 3: [→ overlapping =====network=====]
Device 4: [→ overlapping =====network=====]
Time: ██████ (~6 seconds for 4 devices)
CPU:  ██████ (CPU efficiently scheduling I/O)
Notice: While Device 1 waits for the network, Devices 2, 3 and 4 are fetching at the same time. The waits overlap instead of stacking up end to end.
Task Execution Flow Comparison
Sequential Task Flow
flowchart TD
Start([Start Backup Job]) --> D1[Connect Device 1]
D1 --> F1[Fetch Config]
F1 --> S1[Save File]
S1 --> D2[Connect Device 2]
D2 --> F2[Fetch Config]
F2 --> S2[Save File]
S2 --> D3[Connect Device 3]
D3 --> F3[Fetch Config]
F3 --> S3[Save File]
S3 --> D4[Connect Device 4]
D4 --> F4[Fetch Config]
F4 --> S4[Save File]
S4 --> End([Job Complete])
style D1 fill:#ffcccc
style D2 fill:#ffcccc
style D3 fill:#ffcccc
style D4 fill:#ffcccc
style F1 fill:#ccccff
style F2 fill:#ccccff
style F3 fill:#ccccff
style F4 fill:#ccccff
Parallel Task Flow (Nornir)
flowchart TD
Start([Start Backup Job]) --> Pool["Connection Pool Initialized
(up to 10 workers)"]
Pool --> D1[Device 1:
Connect + Fetch]
Pool --> D2[Device 2:
Connect + Fetch]
Pool --> D3[Device 3:
Connect + Fetch]
Pool --> D4[Device 4:
Connect + Fetch]
D1 --> Save[Results Aggregated]
D2 --> Save
D3 --> Save
D4 --> Save
Save --> End([Job Complete])
style Pool fill:#ccffcc
style D1 fill:#ffffcc
style D2 fill:#ffffcc
style D3 fill:#ffffcc
style D4 fill:#ffffcc
style Save fill:#ccffcc
The Math: Amdahl's Law
Why doesn't this scale infinitely? There's a mathematical ceiling:
Amdahl's Law:
Speedup = 1 / [(1 - P) + (P / N)]
Where:
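P = fraction of the job that can run in parallel (e.g., 0.95 for network ops)
N = number of parallel workers (threads or concurrent connections)
For network operations, assuming ~95% of the work can run in parallel (P = 0.95):
- 10 parallel connections: ~6.9x speedup
- 20 parallel connections: ~10.3x speedup
- 50 parallel connections: ~14.5x speedup
- 100 parallel connections: ~16.8x speedup (diminishing returns clearly visible here)
- Unlimited connections: 1 / (1 - P) = 20x is the hard ceiling
(The 300-device estimate above effectively assumed P ≈ 1, which is why it reached 10x with 10 connections.)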
Practical takeaway: You get massive gains up to ~10-20 concurrent connections, then diminishing returns. But even diminishing returns beat sequential by a mile.
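The formula translates directly into a small function, which you can use to check the figures above or to try other values of P and N. `amdahl_speedup` is a hypothetical helper name, and P = 0.95 is the same illustrative assumption used above, not a measured value.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup predicted by Amdahl's Law.

    p: fraction of the job that can run in parallel (0.0 to 1.0)
    n: number of parallel workers
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    for workers in (10, 20, 50, 100):
        print(f"{workers:>3} workers: {amdahl_speedup(0.95, workers):.1f}x")
    # As n grows without bound, p / n goes to 0 and the speedup approaches 1 / (1 - p)
    print(f"ceiling: {1 / (1 - 0.95):.1f}x")
```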
Why Your Tutorial #3 Script Doesn't Scale
Your current multi-device-config-backup.py uses this pattern:
for device in devices:
    # Connect
    # Collect config
    # Save file
    # Move to next device (don't start next until this is done)
    ...
This is sequential iteration. It's simple, it's clear, and it's great for learning, but it's a dead end at enterprise scale.
The Limitations
| Aspect | Tutorial #3 | Enterprise Need |
|---|---|---|
| Max devices | 50-100 (before slowness) | 500-5000+ |
| Expected runtime | 10+ minutes | 2-3 minutes |
| Code complexity | Simple loops | Framework (Nornir) |
| Failure isolation | Per-device try/except | Unified result aggregation |
| Extensibility | Hard (one-off changes) | Easy (reusable tasks) |
| Team reusability | One script per job | Shared task library |
Interlude: Why Not Just Use Threading in Python?
You might think: "Why learn Nornir? Can't I just add threading to Tutorial #3?"
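You could, and because threads overlap network waits it would speed things up. But a naive version leaves several hard problems for you to solve. If you want the production-safe alternative, move next into Nornir Fundamentals and then Advanced Nornir Patterns.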
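import threading

def backup_with_threading(devices, backup_dir):
    # One thread per device, with no cap. Threads DO overlap the network waits
    # (the GIL is released during blocking I/O), but nothing here limits,
    # collects, or reports anything.
    threads = []
    for device in devices:
        t = threading.Thread(target=backup_device_config, args=(device, backup_dir))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # Waits for each thread; backup_device_config's return value is discarded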
Problems:
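- Python's GIL: threads can't execute Python bytecode in parallel, but the GIL is released while a thread waits on the network, so for SSH-bound backups it is not the real bottleneck
- Result aggregation: threading.Thread discards the target function's return value. Where does output go? How do you collect all results?
- Error handling: an exception in one thread is printed to stderr and then lost; there is no unified error handling
- Credentials: thread-safe password management gets complex
- Scalability: one thread per device with no cap means 500 devices open 500 SSH sessions at once, which can exhaust memory or file descriptors on the automation host
Nornir doesn't avoid threads. Its default runner is itself thread-based: tasks run on a bounded pool of worker threads (num_workers, 20 by default). What Nornir adds is everything around that pool: inventory, a per-host result object, error isolation, and one setting that controls concurrency.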
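The naive loop above lacks three things that a production-grade runner provides: a cap on concurrency, a result for every device, and per-device error isolation. Below is a minimal standard-library sketch of those three pieces. It is not Nornir's code: `run_on_all`, `DeviceResult` and `fake_backup` are hypothetical names, and `fake_backup` stands in for a real SSH backup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


@dataclass
class DeviceResult:
    """Outcome for one device (hypothetical; loosely mirrors result/failed/exception)."""
    result: Any = None
    failed: bool = False
    exception: Optional[BaseException] = None


def run_on_all(
    devices: Iterable[str], task: Callable[[str], Any], num_workers: int = 20
) -> Dict[str, DeviceResult]:
    """Run `task` once per device on a bounded pool of worker threads."""

    def run_one(device: str) -> Tuple[str, DeviceResult]:
        try:
            return device, DeviceResult(result=task(device))
        except Exception as exc:  # keep the failure local to this device
            return device, DeviceResult(failed=True, exception=exc)

    # At most num_workers devices are in flight at any moment
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return dict(pool.map(run_one, devices))


def fake_backup(device: str) -> str:
    # Hypothetical stand-in for an SSH backup; "sw47" always times out
    if device == "sw47":
        raise TimeoutError("SSH timeout")
    return f"config of {device}"


if __name__ == "__main__":
    results = run_on_all(["sw46", "sw47", "sw48"], fake_backup, num_workers=2)
    for name, outcome in results.items():
        print(name, "FAILED" if outcome.failed else "ok", outcome.exception or "")
```

Writing and maintaining this for every job, plus inventory, credentials and logging, is the work a framework takes off your hands.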
Nornir's Architecture
Nornir solves this problem by building a task-based framework instead of a script-based one.
Core Concepts
1. Tasks (not loops)
Instead of:
for device in devices:
    do_something(device)
You write:
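from nornir.core.task import Task, Result
from nornir_netmiko.tasks import netmiko_send_command

def backup_config(task: Task) -> Result:
    # Nornir calls this once per host; the runner runs many hosts in parallel
    response = task.run(task=netmiko_send_command, command_string="show running-config")
    return Result(host=task.host, result=response[0].result)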
2. Inventory (not hardcoded or CSV)
Nornir abstracts device information:
# inventory/hosts.yaml
device1:
  hostname: 192.168.1.1
  groups:
    - ios_devices
  data:
    privileged: true
device2:
  hostname: 192.168.1.2
  groups:
    - ios_devices
3. Runner (not manual iteration)
Nornir's runner automatically:
- Loads all devices from inventory
- Executes tasks in parallel
- Collects results
- Handles failures
4. Result Aggregation (not scattered output)
result = nornir.run(task=backup_config)
# Built-in AggregatedResult, keyed by host name:
result["device1"].failed        # Did any task fail for this host?
result["device1"][0].result     # The return value of backup_config
result["device1"][0].exception  # What went wrong (None on success)?
The benefit: Nornir handles all the parallel complexity for you. You focus on the business logic.
Architecture Comparison
Tutorial #3 (Sequential Script Architecture)
main()
├── read_inventory() [CSV]
├── for each device:
│   ├── backup_device_config()
│   │   ├── SSH connect
│   │   ├── send_command()
│   │   ├── Write file
│   │   └── Return (hostname, filename, size, status)
│   └── Collect results in list
└── create_backup_manifest()
Characteristics:
- Linear control flow
- One device at a time
- Results scattered (some in variables, some in files)
- Hard to reuse (tied to specific task logic)
Nornir (Task-Based Parallel Architecture)
Nornir Instance
├── Inventory Manager
│   └── Loads devices from YAML/Netbox/API
├── Tasks (plain Python functions)
│   ├── backup_config
│   ├── validate_config
│   └── compare_configs
├── Runner
│   ├── Parallel task execution (worker thread pool)
│   ├── Processors (hooks around task execution)
│   └── Result aggregation
└── Plugin system
Characteristics:
- Task-based (functional programming)
- Parallel by default
- Unified result object
- Highly reusable (tasks are libraries)
When to Use Nornir
Use Nornir When:
- ✅ Scale matters (50+ devices)
- ✅ Performance matters (tight backup windows)
- ✅ Complexity exists (multi-step workflows, compliance checks)
- ✅ Teams collaborate (shared task libraries)
- ✅ Enterprise requirements (audit trails, integration, reliability)
- ✅ Future growth (will your network grow?)
Use Tutorial #3 When:
- ✅ Quick one-off script
- ✅ Very small network (<10 devices)
- ✅ Learning automation basics (Tutorial #3 is perfect for this)
- ✅ No performance requirements
Detailed Comparison: Approaches to Multi-Device Automation
The table below breaks down how different approaches compare across real-world concerns:
| Aspect | Tutorial #3 (Sequential) | Threading (DIY) | Nornir (Framework) | Ansible (Alternative) |
|---|---|---|---|---|
| Learning curve | Easy | Moderate | Moderate | Moderate-Hard |
| Max devices | ~100 | ~50 (without careful pooling) | 500-5000+ | 1000+ |
| Runtime (100 devices) | 10 min | 2-3 min* | 1-2 min | 2-3 min |
| Code complexity | Low | High | Moderate | High |
| Error isolation | try/except per device | Hand-rolled per thread | Native (per-host) | Native (per-host) |
| Credential management | Hardcoded/env vars | Thread-safe needed | Secure pattern | Vault support |
| Team reusability | One-off scripts | Hard (threading logic) | Easy (task libraries) | Easy (playbooks) |
| Extensibility | Hard | Very hard | Easy | Easy |
| Logging | Messy in parallel | Race conditions | Clean/unified | Clean/unified |
| Integration | Manual (APIs, DBs) | Manual | Plugin system | Module system |
| Production-ready | No | Rarely | Yes | Yes |
| Maintenance burden | Low initially, high later | Very high | Moderate | Moderate |
* DIY threading performance varies widely with pool size and with how results and errors are collected
Real-World Gotchas & Edge Cases
Gotcha #1: The 3 AM Production Outage
Scenario: Your sequential script has been running fine for 6 months. Your network grows 10x. Now backups that took 30 minutes take 5 hours.
The problem: You didn't anticipate scale early.
The lesson: Planning for scale isn't premature optimisation; it's sound engineering practice.
Gotcha #2: The Failing Device That Kills Everything
Sequential script (unprotected):
for device in devices:
    backup_device(device)  # If device 47 fails, devices 48-100 never run
Real scenario: Device 47 has SSH timeout. Your backup never completes. Management asks "why weren't the other 53 devices backed up?"
Solution: Framework-level error isolation (Nornir handles this automatically)
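Even without a framework, you can fix the loop above by wrapping each device in try/except. The failure is recorded and the loop carries on. A minimal sketch follows, in which `backup_all_sequential` and `fake_backup` are hypothetical names. It isolates failures but is still sequential, so it does nothing for runtime.

```python
from typing import Callable, Dict, List, Tuple


def backup_all_sequential(
    devices: List[str], backup_device: Callable[[str], None]
) -> Tuple[List[str], Dict[str, Exception]]:
    """Back up devices one at a time; record failures instead of stopping."""
    succeeded: List[str] = []
    failed: Dict[str, Exception] = {}
    for device in devices:
        try:
            backup_device(device)
            succeeded.append(device)
        except Exception as exc:  # e.g. an SSH timeout on one device
            failed[device] = exc
    return succeeded, failed


def fake_backup(device: str) -> None:
    # Hypothetical stand-in: device47 times out, every other device succeeds
    if device == "device47":
        raise TimeoutError("SSH timeout")


if __name__ == "__main__":
    devices = [f"device{i}" for i in range(1, 101)]
    ok, errors = backup_all_sequential(devices, fake_backup)
    print(f"{len(ok)} backed up, failed: {list(errors)}")
```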
Gotcha #3: Credentials Leak Into Logs
Common mistake:
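print(f"Connecting with {username}:{password}")  # NEVER DO THIS!
In parallel environments, this becomes even more visible: the same line is written once per device, interleaved with everything else in shared logs. Nornir helps by keeping credentials in the inventory rather than in your task code, but it can't stop a task from printing them, so keep secrets out of print and log statements entirely.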
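If something might still log a secret, a `logging.Filter` can act as a last line of defence. Here is a minimal standard-library sketch. `RedactSecrets` and `demo` are hypothetical names, and the regex only catches `password=value` or `password: value` patterns, so not logging secrets in the first place remains the real fix.

```python
import io
import logging
import re


class RedactSecrets(logging.Filter):
    """Mask values written as password=... or password: ... before output."""

    PATTERN = re.compile(r"(password\s*[=:]\s*)\S+", re.IGNORECASE)

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message, mask secrets, and store it back on the record
        record.msg = self.PATTERN.sub(r"\1****", record.getMessage())
        record.args = ()  # arguments are already merged into msg above
        return True  # keep the record, just redacted


def demo() -> str:
    """Log one message through a redacting handler and return what was written."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactSecrets())
    logger = logging.getLogger("backup.demo")
    logger.propagate = False  # keep the demo out of any root handlers
    logger.addHandler(handler)
    try:
        logger.warning("Connecting to %s with password=%s", "sw1", "s3cret")
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```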
Gotcha #4: Device Dependency Chains
Real scenario: Before backing up an access switch, you need to pull its inventory from your IPAM system.
1. Call IPAM API for device list
2. Parallel: Back up each device
3. Parallel: Validate each backup
4. Merge results for compliance report
Sequential: Can't start step 2 until step 1 completes (correct!)
Threading DIY: Race conditions if not careful
Nornir: Built-in patterns for this (Advanced Nornir Patterns covers this!)
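The four-step flow above maps onto a small amount of standard-library code: one sequential call, two parallel stages, then a sequential merge. This is a sketch only. `run_pipeline` is a hypothetical name, and the lambdas stand in for the IPAM call, the backup and the validation. It shows ordering, not error handling: an exception in any stage propagates and stops the pipeline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_pipeline(
    fetch_device_list: Callable[[], List[str]],
    backup: Callable[[str], str],
    validate: Callable[[str], bool],
    num_workers: int = 10,
) -> Dict[str, dict]:
    # Step 1 (sequential): later steps need this list, so it must finish first
    devices = fetch_device_list()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Step 2 (parallel): back up every device; list() waits for all of them
        configs = list(pool.map(backup, devices))
        # Step 3 (parallel): validate every backup, only after step 2 has finished
        checks = list(pool.map(validate, configs))
    # Step 4 (sequential): merge into a single compliance report
    return {
        device: {"config": config, "valid": valid}
        for device, config, valid in zip(devices, configs, checks)
    }


if __name__ == "__main__":
    report = run_pipeline(
        fetch_device_list=lambda: ["sw1", "sw2"],  # stand-in for an IPAM API call
        backup=lambda device: f"hostname {device}",
        validate=lambda config: config.startswith("hostname"),
    )
    print(report)
```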
Gotcha #5: Memory Exhaustion with Large Device Counts
Scenario: You parallelize all 5,000 devices at once.
What happens:
- 5,000 SSH connections ร 4MB per connection = 20GB RAM
- The Python process runs out of memory and crashes (or is killed by the OS)
- Takes you 2 hours to figure out why
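The fix: Cap concurrency with a bounded worker pool. In Nornir this is the threaded runner's num_workers option (for example num_workers: 50 under runner.options in config.yaml; the default is 20).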
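The sketch below shows what a "max workers" cap does in practice. It simulates 200 device sessions on a pool of 50 threads and records the peak number open at once. `ConcurrencyGauge` and `simulate` are illustrative names, and `time.sleep` stands in for an SSH session; no real connections are made.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyGauge:
    """Count how many simulated sessions are open at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self) -> "ConcurrencyGauge":
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc) -> bool:
        with self._lock:
            self.current -= 1
        return False  # never swallow exceptions


def simulate(device_count: int, max_workers: int, seconds: float = 0.01) -> int:
    """Run fake sessions through a bounded pool and return the peak concurrency."""
    gauge = ConcurrencyGauge()

    def fake_session(_device: int) -> None:
        with gauge:  # the "connection" is open only inside this block
            time.sleep(seconds)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(fake_session, range(device_count)))
    return gauge.peak


if __name__ == "__main__":
    print("peak open sessions:", simulate(200, max_workers=50))
```

However many devices you feed in, memory use tracks the worker count, not the inventory size.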
Practical Decision Tree
Use this to decide which approach is right for you now:
Do you have network devices to manage with scripts?
│
├─ YES: How many?
│  │
│  ├─ Fewer than 10: Use Tutorial #3
│  │  (Simple is good!)
│  │
│  ├─ 10-50 devices: Use Tutorial #3 now, plan Nornir later
│  │  (You have time before performance matters)
│  │
│  └─ 50+ devices: Use Nornir now
│     (Performance matters, complexity is justified)
│
└─ ALSO CONSIDER:
   │
   ├─ Will this run more than once? → Plan for reuse
   ├─ Will your network grow? → Plan for scale
   ├─ Will your ops team use this? → Plan for maintainability
   └─ Is this business-critical? → Plan for reliability
You've Got Options, But They're Different
Honest truth: There's no "best" tool. There's the right tool for your current situation.
- Tutorial #3 is your "learn automation" tool
- Hand-rolled threading is your "avoid unless you must" tool (Nornir already manages a thread pool for you, so use it when you need controlled parallelism)
- Nornir is your "production ready" tool
- Ansible is your "infrastructure as code" tool
They're solving different problems at different scales. Nornir solves this problem (parallel network device operations) extremely well.
Interactive Learning Checkpoint
Before moving on, ask yourself:
- Do you understand why loops alone won't work for many devices?
  - If no: Re-read "The Problem: Sequential Bottleneck"
  - If yes: → Move forward
- Can you explain parallel execution to someone?
  - If no: Study the Mermaid diagrams and ASCII art above
  - If yes: → Move forward
- Do you know when you'd use Nornir vs. Tutorial #3?
  - If no: Review the "When to Use Nornir" section
  - If yes: → You're ready for Tutorial #2
Stuck? This is the moment when the concepts should click. Take 10 minutes and re-read any section that confused you. This foundation matters for everything that comes next.
The Production Reality
In real organisations, here's what happens:
Month 1: "Let's automate config backups!"
→ Build Tutorial #3 script
→ Works great!
Month 3: "We added offices in Asia and Europe. Backups now take 90 minutes."
→ "Hmm, let me add threading..."
→ Threads cause issues...
Month 6: "Can we also do compliance checking? And integrate with our ticketing system?"
→ "The script is spiraling... This needs a redesign..."
→ This is where you wish you'd started with Nornir
Under the Hood: Why Nornir Works
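Nornir's default runner executes tasks on a pool of worker threads (built on Python's concurrent.futures). The principle that makes any parallel approach fast is overlapping I/O waits, and it's easiest to see in a small asyncio example: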
# Concurrent I/O with asyncio (simplified illustration)
import asyncio

async def backup_device(device):
    # While this device waits for SSH, other devices make progress
    await asyncio.sleep(3)  # Simulates network I/O
    return f"Backed up {device}"

async def backup_all(devices):
    # Create one coroutine per device (nothing runs yet)
    tasks = [backup_device(d) for d in devices]
    # gather() runs them all concurrently and returns results in input order
    return await asyncio.gather(*tasks)

# All 4 devices finish in ~3 seconds (concurrent), not 12 seconds (sequential)
results = asyncio.run(backup_all(["r1", "r2", "r3", "r4"]))
Nornir abstracts this complexity, so you write simple task functions and Nornir handles the concurrent execution (on worker threads by default) automatically.
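Overlapping I/O does not require asyncio. The same four-device experiment works with a standard-library thread pool, which is the mechanism Nornir's default threaded runner is built on. This is a minimal sketch, not Nornir's runner code. `backup_device` and `backup_all` are hypothetical names, the 0.5-second sleep is an assumed stand-in for SSH wait time, and `num_workers=20` simply mirrors Nornir's default pool size.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def backup_device(device: str) -> str:
    # Hypothetical stand-in: sleeping mimics waiting on SSH and show running-config
    time.sleep(0.5)
    return f"Backed up {device}"


def backup_all(devices: List[str], num_workers: int = 20) -> List[str]:
    # A bounded pool of worker threads: while one thread waits on I/O,
    # the others keep going. pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(backup_device, devices))


if __name__ == "__main__":
    start = time.perf_counter()
    results = backup_all(["r1", "r2", "r3", "r4"])
    elapsed = time.perf_counter() - start
    # Expect roughly 0.5 s (overlapping waits), not about 2 s (one after another)
    print(results, f"{elapsed:.1f}s")
```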
Real Enterprise Example
Telecom company with 2,500 Cisco devices
Old approach (Tutorial #3):
Backup job scheduled: 2:00 AM
Expected completion: 4:30 AM (150 minutes)
Maintenance window: 2:00-6:00 AM ✓ Fits
With Nornir:
Backup job scheduled: 2:00 AM
Expected completion: 2:12 AM (12 minutes)
Maintenance window: 2:00-6:00 AM ✓ Fits comfortably
Plus: Can now run more audits/checks in same window!
The business value: 12 minutes of automation instead of 2.5 hours frees most of the maintenance window for other work, which is a real cost saving.
The Learning Curve
Truth: Nornir IS more complex than Tutorial #3.
But complexity serves a purpose:
Difficulty vs. Power
Tutorial Difficulty: █ (low)
Tutorial Power: █ (limited by scale)
Nornir Difficulty: ████ (moderate)
Nornir Power: ████████████████ (enterprise scale)
The cost/benefit: Adding moderate complexity early saves enormous complexity later (no threading hacks, no refactoring).
What's Coming Next
In Tutorial #2: Nornir Fundamentals, we'll:
- Install Nornir and dependencies
- Create your first inventory file
- Write your first Nornir task function
- Run it against 5+ devices in parallel
- See the performance benefit firsthand
Spoiler: You'll write basically the same logic as Tutorial #3, but Nornir will parallelize it automatically.
Key Takeaway
If you're automating networks at any significant scale:
Sequential scripts = Training wheels
Nornir = Real enterprise tool
You don't need to choose immediately. But if you're building anything more than a quick proof-of-concept, learning Nornir is an investment that pays dividends.
Your Perspective
As someone building this for the first time, here's my honest take:
- Nornir feels more complex when you first see it (it is)
- BUT it's designed specifically for your problem (parallel network ops)
- AND the payoff is huge (10-20x faster)
- AND once you understand it, it becomes your default tool
Before You Continue
Make sure you have:
- ✓ Completed all Beginner Tutorials
- ✓ Successfully run Tutorial #3 on at least 5 devices
- ✓ Observed how long it takes (30+ min for many devices)
- ✓ Understood the sequential bottleneck
When you're ready, Tutorial #2 will teach you to solve this problem with Nornir.
Questions Before Moving On?
"Do I really need Nornir?"
- If you have <20 devices and no growth expected: Probably not; Tutorial #3 is enough
- If you have >50 devices or growth expected: Absolutely yes
- If you're in between: Consider learning it so you're prepared as you scale
"Will I still use the Tutorial #3 approach?"
- Yes, for quick/one-off scripts
- But for anything you'll run more than once or scale: Nornir
"Is Nornir hard to learn?"
- Moderate difficulty (Tutorial #2 makes it accessible)
- But the concepts are universal (concurrent I/O, task-based automation)
- Worth the investment
Continue to Tutorial #2: Nornir Fundamentals →
← Back to Intermediate Tutorials
Need help applying this in a live Cisco environment?
If you want this pattern implemented, governed, or adapted for your estate, use the contact page to start a discovery conversation or review how Nautomation Prime delivers engagements.