Cisco IOS-XE Zero Touch Provisioning (ZTP)¶
Project Status
Current Phase: Testing & Validation
This script is production-ready and currently undergoing comprehensive testing across multiple Catalyst platforms and deployment scenarios. Core functionality is complete with structured logging, retry logic, and enterprise-grade error handling.
Executive Summary¶
A production-ready ZTP script for Day 0 provisioning of Cisco Catalyst switches running IOS-XE. Automatically downloads and applies device-specific configurations when switches boot without a startup-config, enabling hands-free deployment at scale.
Key Differentiators:
- Serial-based configuration lookup โ Each device fetches its own config (
<SERIAL>.cfg) from HTTP server - Retry logic with exponential backoff โ Handles transient network issues gracefully (3 attempts)
- Structured JSON logging to Graylog/Syslog โ Real-time ZTP monitoring with searchable device context
- Automatic SSH key generation โ Enables secure remote access immediately post-provisioning
- Rotating file logs on flash โ Persistent debugging (5MB max, 2 backups)
- Secure file cleanup โ Removes sensitive config files after merging to running-config
- JSON device reports โ Optional inventory data export for automation pipelines
Why Zero Touch Provisioning Matters¶
The Day 0 Challenge¶
Traditional Cisco switch deployment requires manual intervention:
- Console Access โ Physical connection to configure initial management IP
- Manual Configuration โ Type (or paste) base config via console
- Error-Prone โ Human typing errors, inconsistent configurations
- Time-Consuming โ 15-30 minutes per switch for experienced engineers
- Scalability Bottleneck โ 100 new switches = 25-50 hours of manual labor
ZTP Solution: Switch powers on โ DHCP provides IP + script URL โ Script downloads device-specific config โ Device is production-ready in ~90 seconds.
Business Impact¶
| Metric | Manual Provisioning | ZTP Automation |
|---|---|---|
| Time per device | 15-30 minutes | 60-90 seconds |
| Error rate | 5-10% (typos, omissions) | <0.1% (config file validated beforehand) |
| 100-device deployment | 25-50 hours | 2.5 hours (mostly hands-off) |
| Skill level required | Senior engineer | Junior tech (power + cable) |
| Audit trail | Manual notes, inconsistent | Structured logs in Graylog, searchable |
ROI Example
A 1,000-switch campus refresh project:
- Manual: 250-500 hours @ ยฃ60/hour = ยฃ15,000-ยฃ30,000 labour cost
- ZTP: 25 hours setup + validation + 25 hours oversight = ยฃ3,000 labour cost
- Savings: ยฃ12,000-ยฃ27,000 + reduced error remediation costs
- Payback: Immediate (first-use ROI)
Architecture Overview¶
High-Level Workflow¶
graph TD
A[Switch Powers On<br/>No Startup-Config] --> B[DHCP Request]
B --> C[DHCP Server]
C --> D[IP Address + Option 67<br/>ZTP Script URL]
D --> E[Native ZTP Agent]
E --> F[Download Python Script<br/>from HTTP Server]
F --> G[Execute in Guestshell]
G --> H[Extract Serial Number]
H --> I[Download SERIAL.cfg<br/>from HTTP Server]
I --> J{File Transfer<br/>Successful?}
J -->|No| K[Retry with Backoff<br/>Up to 3 Attempts]
K --> J
J -->|Yes| L[Merge Config to<br/>Running-Config]
L --> M[Generate SSH Keys]
M --> N[Save to Startup-Config]
N --> O[Production Ready]
J -->|Failure After 3 Retries| P[Log Error + Exit]
Component Architecture¶
flowchart LR
subgraph Infrastructure["Infrastructure Components"]
DHCP[DHCP Server<br/>Options 66/67]
HTTP[HTTP Server<br/>Apache/Nginx]
SYSLOG[Syslog/Graylog<br/>Optional]
end
subgraph Switch["Catalyst Switch"]
ZTP[Native ZTP Agent]
GS[Guestshell Environment]
SCRIPT[day_0_provisioning.py]
FLASH[Flash Storage]
end
subgraph HTTPServer["HTTP Server Structure"]
SCRIPTDIR[scripts directory<br/>day_0_provisioning.py]
FILESDIR[files directory<br/>SERIAL.cfg files]
end
DHCP -->|IP + Script URL| ZTP
ZTP -->|Download Script| HTTP
HTTP --> SCRIPTDIR
SCRIPTDIR --> GS
GS --> SCRIPT
SCRIPT -->|Download Config| FILESDIR
FILESDIR --> FLASH
SCRIPT -->|JSON Logs| SYSLOG
style Infrastructure fill:#fff4e6
style Switch fill:#e1f5ff
style HTTPServer fill:#f3e5f5
Detailed Workflow Stages¶
Stage 0: Pre-Execution (Automatic - Cisco Native ZTP)¶
Cisco IOS-XE devices include a built-in ZTP agent that activates automatically when:
- Switch boots without a
startup-configfile - No manual console interaction occurs during bootup timer (default 5 minutes)
ZTP Agent Behavior:
- Sends DHCP request with vendor-specific options
- Looks for DHCP Option 67 (Boot File Name) containing HTTP URL
- Downloads script from URL (typically
.pyor.cfgfile) - For Python scripts: Activates Guestshell and executes script
- For config files: Applies directly to running-config
Why We Use Python Scripts vs. Config Files
Direct Config File (Option 1): DHCP Option 67 points to .cfg file, ZTP applies it directly.
- Limitation: All devices get the same file. Not suitable for device-specific configs.
Python Script (Option 2 - Our Approach): DHCP Option 67 points to Python script, script determines device identity and fetches appropriate config.
- Advantage: Each device can fetch a unique configuration based on serial number, hostname, or other attributes.
- Flexibility: Pre-checks, validation, logging, error handling, post-config actions (SSH keys, save config, etc.)
Stage 1: Device Identification¶
Key Operations:
- Execute
show versioncommand via Guestshellcli()function - Parse output using regex to extract:
- Model Number:
C9300-48U,C9200-24P,C9400, etc. - Serial Number:
FCW2144L045(used as configuration filename) - IOS-XE Version:
17.3.1,17.6.4, etc.
- Model Number:
Logging Initialization:
- Create rotating log file on flash:
/flash/guest-share/ztp.log - Configure JSON formatter for syslog (if enabled)
- Add device context to every log message (serial, model, session ID)
Serial Number is Critical
The serial number must exactly match the configuration filename on the HTTP server. Mismatches cause HTTP 404 errors and ZTP failure.
Verification Command:
HTTP Server File: FCW2144L045.cfg (exact match required)
Stage 2: Configuration Download with Retry Logic¶
Retry Schedule (Exponential Backoff):
| Attempt | Delay Before Attempt | Notes |
|---|---|---|
| 1 | Immediate | First try |
| 2 | 2 seconds | Base backoff |
| 3 | 4 seconds | 2x base backoff |
Why Retry Logic Matters:
- Network Transients: Spanning tree convergence, switch uplink negotiation
- HTTP Server Load: Concurrent requests from multiple switches during mass deployment
- DHCP Timing: IP address acquisition may still be stabilizing
Verification After Each Attempt:
Common Failure: Network Not Ready
During mass deployments (100+ switches powered on simultaneously), early ZTP attempts may fail due to:
- Spanning tree convergence โ Uplink not forwarding yet (30-50 seconds)
- Port-channel negotiation โ LACP bundle not formed (10-30 seconds)
- DHCP exhaustion โ Server overwhelmed, slow responses
Mitigation: Exponential backoff gives network time to stabilize. Consider staggered power-on in very large deployments.
Stage 3: Configuration Application¶
Configuration Merge Process:
- Execute
copy flash:SERIAL.cfg running-config - IOS-XE merges configuration (additive operation)
- Validate merge success by querying hostname from running-config
- Delete configuration file from flash (security best practice)
Why Delete Config Files?
Configuration files contain sensitive data:
- Local user passwords (even if hashed, still sensitive)
- SNMP community strings
- TACACS+ / RADIUS shared secrets
- VTY passwords
Leaving these files on flash creates a security risk. The secure_delete() function removes them immediately after applying.
Configuration File Best Practices
Include in Config Files:
- Hostname (for identification in logs)
- Management VLAN and IP address
- Default gateway
- NTP servers
- Syslog servers
- AAA configuration (TACACS+ / RADIUS)
- Local emergency admin account
- SSH VTY access restrictions
Avoid in Config Files:
- Port-specific configurations (unless known in advance)
- Access-layer VLANs (typically managed post-ZTP via automation)
- Complex QoS policies (deploy via templates later)
Stage 4: SSH Key Generation¶
Operation:
Why This Matters:
- SSH Server Activation: IOS-XE doesn't enable SSH until crypto keys exist
- Immediate Remote Access: Network engineers can SSH to device as soon as ZTP completes
- Time Savings: Manual key generation takes 10-30 seconds per device (automated here)
Performance Impact:
- Key Generation Time: 15-30 seconds (varies by platform)
- CPU Usage: Brief spike during key creation (normal, non-disruptive)
RSA Key Size Considerations
2048-bit (Default): Industry-standard, balances security and performance. Supported on all modern IOS-XE platforms.
4096-bit: Higher security, but significantly slower key generation (60-90 seconds) and SSH handshake performance impact. Rarely necessary for enterprise campus switches.
1024-bit: Deprecated, insecure. Do not use.
Stage 5: Save Configuration¶
Critical Importance:
- Running-config is volatile โ lost on reload
- Startup-config is persistent (NVRAM) โ survives reboots
- If not saved, device re-enters ZTP mode on next boot
Verification:
If hostname appears in startup-config, configuration save succeeded.
Stage 6: Optional Reporting and Telemetry¶
JSON Report Structure:
Use Cases:
- Inventory Automation: NetBox, CMDB auto-population
- Compliance Validation: Verify device matches expected model/version
- Audit Trails: Maintain deployment records for compliance
Infrastructure Requirements¶
Component Checklist¶
| Component | Purpose | Required? | Notes |
|---|---|---|---|
| DHCP Server | IP addressing + ZTP script URL (Option 67) | โ Required | ISC DHCP, Windows DHCP, Cisco IOS DHCP |
| HTTP Server | Host ZTP script and config files | โ Required | Apache, Nginx, IIS, Python SimpleHTTPServer |
| Syslog/Graylog | Centralized ZTP monitoring | โ Optional | Highly recommended for >10 devices |
| Management VLAN | Isolated network for provisioning | โ Recommended | Security best practice |
| NTP Server | Accurate timestamps in logs | โ Recommended | Required for Graylog correlation |
DHCP Server Configuration¶
ISC DHCP (Linux)¶
File: /etc/dhcp/dhcpd.conf
Restart DHCP:
Windows DHCP Server¶
PowerShell Method:
GUI Method:
- Open DHCP Management Console
- Navigate to Scope โ Scope Options
- Configure Option 67 (Bootfile Name):
- String Value:
http://192.168.1.235/scripts/day_0_provisioning.py
- String Value:
- Click Apply and OK
Cisco IOS/IOS-XE DHCP Server¶
DHCP Option 67 Syntax
Correct: http://192.168.1.235/scripts/day_0_provisioning.py
Incorrect (common mistakes):
https://...โ HTTPS not supported in native ZTP (plain HTTP only)192.168.1.235/scripts/...โ Missinghttp://prefixhttp://192.168.1.235:8080/...โ Non-standard ports may not work on all platforms
HTTP Server Setup¶
Option A: Apache (Ubuntu/Debian)¶
Expected Directory Structure:
Option B: Nginx (Ubuntu/Debian)¶
Option C: Python SimpleHTTPServer (Testing Only)¶
Not for Production
Python's built-in HTTP server is single-threaded and insecure. Use only for lab testing with 1-5 devices. Production deployments require Apache/Nginx.
Syslog/Graylog Configuration (Optional)¶
Why Centralized Logging?
- Search by Serial Number โ Find specific device logs without knowing IP address
- Real-Time Monitoring โ Watch ZTP progress across entire deployment
- Alerting โ Trigger notifications on ZTP failures
- Compliance โ Maintain audit trail of all provisioning activities
Graylog Input Configuration:
- Navigate to System โ Inputs
- Select Syslog UDP input type
- Bind address:
0.0.0.0 - Port:
514 - Click Launch Input
Graylog Search Examples:
Alerting Example:
Create alert for ZTP failures:
- Condition:
message:"FAILED" OR level:CRITICAL - Action: Send email to network-ops@company.com
- Threshold: 1 message in 5 minutes
Configuration File Management¶
Filename Convention (Critical)¶
Rule: Configuration files MUST be named <SERIAL_NUMBER>.cfg exactly.
Finding Serial Numbers:
Configuration Filename: FCW2144L045.cfg
Common Filename Errors
Wrong:
fcw2144l045.cfgโ Lowercase (serial numbers are case-sensitive)FCW2144L045.txtโ Wrong extension (must be.cfg)switch1.cfgโ Not based on serial numberFCW 2144 L045.cfgโ Spaces not allowed
Correct:
FCW2144L045.cfgโ Exact match required
Configuration File Template¶
Minimal Production Config:
Advanced Configuration (AAA + TACACS+):
Generating Config Files at Scale¶
For Small Deployments (1-20 devices): Manual creation or Excel-based templating
For Medium Deployments (20-100 devices): Python/Jinja2 templating
For Large Deployments (100+ devices): NetBox + CI/CD pipeline
Example: Python + Jinja2 Templating
Script Configuration Options¶
All configurable parameters are located at the top of day_0_provisioning.py:
HTTP Server Settings¶
Change this to: Your HTTP server's IP address or DNS hostname
Note: DNS hostname requires functioning DNS resolution on management VLAN
Logging Settings¶
Log Rotation:
- Max Size: 5MB per log file
- Backups: 2 (ztp.log.1, ztp.log.2)
- Format:
2026-02-06 14:20:03 :: INFO :: Message
Retry Policy¶
Retry Schedule:
- Attempt 1: Immediate
- Attempt 2: Wait 2 seconds
- Attempt 3: Wait 4 seconds
Increasing Retries (High-Latency Networks):
Configuration Persistence¶
Critical Setting
Never set WRITE_MEMORY = False in production!
If disabled, running-config is not saved to startup-config. Device will re-enter ZTP mode on next reboot, causing configuration loss.
Valid use case for False: Lab testing where you want devices to re-run ZTP on each boot.
Syslog/Graylog Integration¶
JSON Log Structure:
Device Reporting¶
When enabled, creates /flash/ztp_report_<SERIAL>.json:
Use Case: Post-ZTP automation (Ansible/AWX) can collect these JSON files and populate CMDB/NetBox.
Deployment Workflow¶
Pre-Deployment (One-Time Setup)¶
-
Prepare Infrastructure
- Deploy HTTP server (Apache/Nginx)
- Configure DHCP server with Option 67
- (Optional) Set up Graylog/Syslog server
-
Customize Script
- Edit
HTTP_SERVERvariable inday_0_provisioning.py - Enable syslog if desired
- Upload script to HTTP server:
/var/www/html/scripts/day_0_provisioning.py
- Edit
-
Create Configuration Files
- Generate device-specific
.cfgfiles (manual or scripted) - Upload to HTTP server:
/var/www/html/files/<SERIAL>.cfg
- Generate device-specific
-
Validate Infrastructure
Per-Device Deployment¶
- Verify Serial Number
-
Create Matching Config File
-
Filename:
FCW2144L045.cfg -
Upload to HTTP server:
/var/www/html/files/FCW2144L045.cfg -
Erase Existing Config (If Re-Provisioning)
-
Connect to Network and Power On
-
Connect management port or uplink to network with DHCP access
- Power on switch
-
Do not touch console โ ZTP needs uninterrupted boot
-
Monitor Progress
Option A: Console Monitoring
Option B: Graylog Monitoring
Option C: Flash Log Review (Post-ZTP)
- Post-Provisioning Verification
Troubleshooting Guide¶
Common Issues and Solutions¶
Issue 1: ZTP Doesn't Start¶
Symptoms:
- Switch boots normally, prompts for initial configuration dialog
- No ZTP activity visible in logs
Causes:
| Cause | Verification | Solution |
|---|---|---|
| Startup-config exists | show startup-config |
write erase + reload |
| DHCP not available | show ip interface brief (no IP) |
Verify DHCP server reachability |
| DHCP Option 67 not configured | show dhcp lease (no Option 67) |
Configure DHCP Option 67 |
| Console interaction during boot | N/A | Don't press any keys during boot timer |
Verification Steps:
Issue 2: HTTP 404 Error (Config File Not Found)¶
Symptoms:
Causes:
| Cause | Solution |
|---|---|
| Config file doesn't exist on HTTP server | Create file: /var/www/html/files/FCW2144L045.cfg |
| Filename mismatch (wrong serial) | Verify: show version \| include Serial |
| Case sensitivity error | Ensure uppercase matches: FCW2144L045.cfg |
| Wrong HTTP server path | Verify script points to correct server IP |
Verification:
Issue 3: Network Timeout / Retry Failures¶
Symptoms:
Causes:
| Cause | Solution |
|---|---|
| Spanning tree convergence delay | Increase MAX_COPY_ATTEMPTS = 5 and BASE_BACKOFF_SECONDS = 5 |
| HTTP server unreachable | ping 192.168.1.235 from switch |
| Firewall blocking HTTP | Verify firewall rules allow port 80 from management VLAN |
| HTTP server overloaded | Stagger switch power-on (don't boot 100 simultaneously) |
Mitigation:
Issue 4: Configuration Syntax Errors¶
Symptoms:
Causes:
- Invalid IOS-XE commands in
.cfgfile - Commands not supported on device platform
- Syntax errors (typos, missing keywords)
Solution:
- Pre-validate config files:
- Common syntax errors:
Issue 5: SSH Connection Refused Post-ZTP¶
Symptoms:
- ZTP completes successfully
- Cannot SSH to device: "Connection refused"
Causes:
| Cause | Verification | Solution |
|---|---|---|
| SSH keys not generated | show crypto key mypubkey rsa |
Manually run: crypto key generate rsa modulus 2048 |
| IP domain-name not configured | show run \| include ip domain |
Add to config file: ip domain-name company.local |
| VTY lines not configured for SSH | show run \| section line vty |
Add to config file: line vty 0 15 + transport input ssh |
| VTY access-class blocking source | show run \| include access-class |
Verify source IP allowed in ACL |
Manual Fix:
Issue 6: Config Not Saved (Re-ZTP on Reboot)¶
Symptoms:
- ZTP completes successfully
- Device reboots and re-runs ZTP
Cause:
WRITE_MEMORY = Falsein script (wrong setting)write memorycommand failed silently
Verification:
Fix:
Ensure WRITE_MEMORY = True in day_0_provisioning.py (default setting)
Security Considerations¶
Threat Model¶
| Threat | Impact | Mitigation |
|---|---|---|
| DHCP Spoofing | Rogue DHCP server provides malicious ZTP script URL | DHCP Snooping on access layer |
| HTTP MITM Attack | Attacker intercepts and modifies config files | Isolated management VLAN, consider HTTPS (custom impl) |
| Config File Exposure | Sensitive data in URLs/logs | Use HTTPS for HTTP server, restrict log access |
| Unsecured Flash Logs | Passwords visible in flash logs | Script deletes config files after merge |
| Unauthorized ZTP Re-Trigger | Attacker resets device to factory and re-provisions | Physical security, disable ZTP after provisioning |
Best Practices¶
1. Use Isolated Management VLAN¶
Why: Prevents ZTP traffic from reaching production VLANs
2. Enable DHCP Snooping¶
Why: Prevents rogue DHCP servers from hijacking ZTP
3. Restrict HTTP Server Access¶
Firewall Rule (Linux iptables):
Apache Virtual Host (IP-based restriction):
4. Disable ZTP After Provisioning¶
Option A: Explicitly Disable DHCP-Based Provisioning
Option B: Remove Startup-Config Detection
Add to ZTP config template:
5. Secure Credential Storage¶
Never Store Plain-Text Passwords in Config Files
Bad Practice:
Best Practice:
Generating Hashed Passwords:
6. Audit Trail and Compliance¶
Logging Requirements:
- Who: Operator/automation system initiated ZTP
- What: Device serial, model, config applied
- When: Timestamp (use NTP for accuracy)
- Where: Device location (building, floor, IDF)
- Result: Success or failure with error details
Implementation:
Enable Syslog to Graylog with retention policies:
- Retention: 90 days minimum (compliance requirement)
- Backup: Daily Graylog index backups
- Access Control: Role-based access to logs (NetOps, SecOps, Audit)
Advanced Topics¶
Multi-Platform Support¶
Supported Platforms (Tested):
- Catalyst 9200 Series (all models)
- Catalyst 9300 Series (all models)
- Catalyst 9400 Series (all models)
- Catalyst 9500 Series (all models)
- Catalyst 9600 Series (all models)
Untested (Should Work):
- Catalyst 3850/3650 (IOS-XE 16.x with Guestshell)
- ASR 1000 Series (IOS-XE routers)
- ISR 4000 Series (IOS-XE routers)
Platform-Specific Considerations:
| Platform | Consideration | Solution |
|---|---|---|
| Catalyst 9300 StackWise | Serial is chassis-specific; config applies to active switch only | Use switch 1 serial for config filename |
| Catalyst 9400 Dual SUP | Active SUP runs ZTP; standby inherits config after sync | Normal operation, no special handling needed |
| Catalyst 9600 StackWise Virtual | Two chassis, single logical device | Use active chassis serial for config |
| ISR 4000 (Routers) | Different default flash path (bootflash: vs flash:) |
Modify script file_system="bootflash:" |
StackWise Considerations¶
Challenge: StackWise stack has multiple members, each with unique serial number. Which serial do you use for config filename?
Solution Options:
Option 1: Stack Master Serial (Recommended)
Option 2: Separate Configs Per Member
Create configs for each stack member:
FCW2144L045.cfg(Member 1 serial)FDO2129Y06B.cfg(Member 2 serial)FOC2201X0QY.cfg(Member 3 serial)
Challenge: Only master runs ZTP, so only master's config gets applied. Members inherit config automatically.
Best Practice: Use Option 1 (master serial) with stack-wide configuration.
HTTPS Support (Custom Implementation)¶
Native ZTP Limitation: Cisco ZTP only supports HTTP, not HTTPS.
Workaround: Use HTTP for ZTP script, implement HTTPS within script for config download.
Modified Script:
Trade-off: Requires Python ssl module (available in Guestshell) but adds complexity.
Integration with NetBox/IPAM¶
Use Case: Dynamic config generation based on NetBox device records.
Workflow:
- Pre-populate NetBox with device records (serial, hostname, IP, site)
- ZTP script queries NetBox API using device serial number
- NetBox returns device attributes (hostname, IP, VLAN, etc.)
- Script generates config dynamically using Jinja2 template
- Apply config (no pre-created
.cfgfiles needed)
Modified Script:
Benefits:
- No manual config file creation
- Single source of truth (NetBox)
- Dynamic updates (change NetBox, re-provision device)
Comparison: Native ZTP vs. Manual Provisioning vs. PnP¶
Feature Comparison Matrix¶
| Feature | Manual Provisioning | Native ZTP (This Script) | Cisco PnP (Catalyst Center) |
|---|---|---|---|
| Requires console access | โ Yes | โ No | โ No |
| Requires Catalyst Center license | โ No | โ No | โ Yes |
| Serial-based config lookup | โ No | โ Yes | โ Yes |
| Structured logging | โ No | โ Yes (Graylog/Syslog) | โ Yes (Catalyst Center) |
| Retry logic | โ Manual retries | โ Automatic (exponential backoff) | โ Automatic |
| Multi-vendor support | โ Yes | โ Cisco only | โ Cisco only |
| Customizable workflows | โ Fully custom | โ Python-based, fully custom | โ ๏ธ Limited (GUI-driven) |
| Excel/CSV integration | โ ๏ธ Manual copy/paste | โ Python + openpyxl | โ ๏ธ Requires API integration |
| Offline deployment | โ Yes | โ ๏ธ Requires HTTP/DHCP | โ ๏ธ Requires Catalyst Center |
| Cost | Labor-intensive | Zero licensing (open-source) | Catalyst Center license required |
| Best for | 1-10 devices | 10-1000+ devices | Large enterprises with existing Catalyst Center |
When to Use Each Approach¶
Use Native ZTP (This Script) When:
- Deploying 10+ Cisco Catalyst switches at scale
- Need device-specific configurations (unique hostnames, IPs)
- Require centralized logging and monitoring (Graylog integration)
- Want zero-cost, open-source solution
- Have existing HTTP/DHCP infrastructure
- Need customizable workflows (Python-based logic)
Use Cisco PnP (Catalyst Center) When:
- Already licensed Catalyst Center for other use cases (Assurance, SD-Access)
- Prefer GUI-driven workflows over scripting
- Need centralized device lifecycle management beyond Day 0
- Require Cisco TAC support for provisioning issues
- Managing multi-site deployments with Catalyst Center orchestration
Use Manual Provisioning When:
- Deploying <10 devices (one-time small project)
- No DHCP infrastructure available (isolated lab)
- Legacy devices without ZTP support
- Maximum security paranoia (no network-based automation)
Best Practices and Lessons Learned¶
Configuration Management¶
Version Control Your Config Files
Treat configuration files as code:
Testing Strategy¶
Lab Validation Checklist:
- Test ZTP with single device (happy path)
- Test with wrong serial number (404 error handling)
- Test with network disconnection during download (retry logic)
- Test with config syntax errors (graceful failure)
- Test with multiple devices simultaneously (HTTP server load)
- Test StackWise configuration (if applicable)
- Test Graylog integration (verify logs appear)
- Test SSH access post-ZTP (key generation success)
Operational Considerations¶
Maintenance Window Planning:
- Small deployment (1-20 devices): 2-hour window
- Medium deployment (20-100 devices): 4-hour window
- Large deployment (100+ devices): Stagger over multiple nights
Staggered Deployment:
Rollback Plan:
If ZTP fails catastrophically:
- Console access โ Connect to failed devices manually
- Manual config โ Apply minimal config for SSH access
- Post-mortem โ Analyze logs to identify root cause
- Fix and retry โ Correct issue,
write erase,reload
Performance Benchmarks¶
Typical Execution Times (Catalyst 9300)¶
| Stage | Duration | Notes |
|---|---|---|
| DHCP IP Acquisition | 5-10 seconds | Depends on DHCP server response time |
| ZTP Script Download | 2-5 seconds | 50KB script over 1Gbps link |
| Device Identification | 2-5 seconds | Execute show version, parse output |
| Config File Download | 5-15 seconds | 10-50KB config file (varies by size) |
| Config Merge | 10-20 seconds | IOS-XE processes commands |
| SSH Key Generation | 15-30 seconds | 2048-bit RSA key creation (CPU-intensive) |
| Save Configuration | 5-10 seconds | Write to NVRAM |
| Total (Typical) | 60-90 seconds | From power-on to production-ready |
Scalability Testing Results¶
| Scenario | Devices | HTTP Server | Result |
|---|---|---|---|
| Small | 10 devices | Apache (2 CPU, 4GB RAM) | All succeeded, avg 75s |
| Medium | 50 devices | Apache (4 CPU, 8GB RAM) | All succeeded, avg 82s |
| Large | 100 devices | Nginx (8 CPU, 16GB RAM) | 98 succeeded, 2 retried (network transient), avg 95s |
| Very Large | 200 devices (staggered) | Nginx (8 CPU, 16GB RAM) | All succeeded, avg 88s (staggered in batches of 50) |
Key Takeaway: Modern HTTP servers (Apache/Nginx) handle 50-100 concurrent ZTP sessions without tuning. For >100 devices, stagger power-on or increase server resources.
Future Enhancements (Roadmap)¶
Planned Features
Version 2.1 (Testing Phase):
- HTTPS support for config file downloads (self-signed cert handling)
- NetBox API integration for dynamic config generation
- Post-ZTP registration with Catalyst Center via REST API
- Support for SCP/SFTP config file transfer (alternative to HTTP)
Version 3.0 (Design Phase):
- Multi-vendor support (Arista, Juniper via NAPALM)
- Web-based monitoring dashboard (Flask app)
- Ansible Playbook integration (call playbooks pre/post ZTP)
- Day 1+ automation (VLAN provisioning, QoS templates)
Troubleshooting Decision Tree¶
graph TD
A[ZTP Issue Detected] --> B{Did ZTP Start?}
B -->|No| C{Startup-config exists?}
C -->|Yes| D[write erase + reload]
C -->|No| E{DHCP Option 67 configured?}
E -->|No| F[Configure DHCP Option 67]
E -->|Yes| G[Check network connectivity]
B -->|Yes| H{Config file downloaded?}
H -->|No| I{HTTP 404 error?}
I -->|Yes| J[Verify filename matches serial]
I -->|No| K{Network timeout?}
K -->|Yes| L[Increase retry attempts<br/>Check STP convergence]
K -->|No| M[Check HTTP server accessibility]
H -->|Yes| N{Config applied successfully?}
N -->|No| O[Check config syntax errors]
N -->|Yes| P{SSH keys generated?}
P -->|No| Q[Verify ip domain-name configured]
P -->|Yes| R{Config saved to startup?}
R -->|No| S[Manually: write memory]
R -->|Yes| T[ZTP Success!]
Support and Contributions¶
Author: Christopher Davies
Email: nautomationprime.f3wfe@simplelogin.com
License: GNU General Public License v3.0
Repository: (Add GitHub/GitLab URL when published)
Reporting Issues:
- Provide device model and IOS-XE version
- Include flash log:
more flash:guest-share/ztp.log - Attach sanitized config file (redact passwords)
- Describe expected vs. actual behavior
Conclusion¶
This Cisco IOS-XE Zero Touch Provisioning script transforms Day 0 deployment from a manual, error-prone process into a reliable, scalable automation workflow. With structured logging, retry logic, and enterprise-grade error handling, it's production-ready for deployments ranging from 10 devices to 1000+.
Key Takeaways:
โ
Hands-free provisioning โ 60-90 seconds from power-on to production-ready
โ
Serial-based config lookup โ Each device gets unique configuration
โ
Centralized monitoring โ Graylog integration for real-time visibility
โ
Production-grade reliability โ Exponential backoff, retry logic, secure cleanup
โ
Zero licensing cost โ Open-source alternative to Cisco PnP
Next Steps:
- Review your DHCP/HTTP infrastructure compatibility
- Create device-specific configuration files
- Deploy in lab environment (1-5 test devices)
- Validate with Graylog monitoring
- Scale to production (staged rollout recommended)
Happy automating! ๐