Deep Dive: Cisco IOS-XE Compliance Audit¶
"Policy-Driven Compliance, Engineered for Real Networks."¶
Version Alignment
This deep dive reflects Cisco IOS-XE Compliance Auditor v4.0 (March 2026) and includes the remediation lifecycle workflow (--remediation-list, approve/reject, apply, bulk apply) and ROI reporting features.
The Cisco IOS-XE Compliance Audit tool is a role-aware, policy-driven audit framework for Cisco switching and routing estates. It connects to devices (directly or through a jump host), collects operational and configuration state, classifies every interface by intent, runs 90+ toggleable compliance checks, and generates actionable reports with remediation commands.
This is one of the most comprehensive projects in the Nautomation Prime ecosystem, and this guide is intentionally thorough so your team can move from "we ran a script" to "we can defend every check and every result."
✨ Why This Tool Matters¶
Most compliance scripts fail in production because they are:
- Hardcoded and brittle
- Blind to topology and role context
- Too noisy for operations teams
- Weak on remediation guidance
This auditor solves that with:
- Policy-as-data in YAML: Every check can be enabled or disabled
- Role-aware logic: Access vs core vs SD-WAN vs industrial behavior
- Port-intent classification: ACCESS, TRUNK_UPLINK, TRUNK_DOWNLINK, TRUNK_ENDPOINT, UNUSED, ROUTED, and more
- Operational output: Rich console summaries, HTML dashboards, JSON, CSV, and per-device remediation scripts
- Remediation lifecycle workflow: Review packs, approvals, change-ticket linkage, expiry control, and guarded apply operations
- Bulk operations: `--remediation-approve-all` and `--remediation-apply-all` for scalable change windows
- ROI reporting: Optional estimated time/value saved in console, JSON, and HTML outputs
- Offline mode: Validate policies against saved command outputs without live SSH
🎯 PRIME Philosophy in Practice¶
1. Transparency Over Magic¶
Checks are explicit and traceable. Every finding maps to a check key in YAML and a specific evaluation path in the engine.
2. Hardened for Production¶
The auditor uses concurrent workers, optional jump-host access, fallback parsing strategies, and safe failure behavior so one bad device does not invalidate an entire run.
3. Policy Before Code¶
Audit standards live in compliance_config.yaml, not hidden in Python conditionals. Teams can evolve policy without rewriting tooling.
4. Actionable Outcomes¶
A failed finding includes remediation intent, and the tool can compile per-device remediation snippets to accelerate fix cycles.
🧱 Project Architecture¶
Runtime Flow¶
```mermaid
graph TD
    A[python -m compliance_audit] --> B[auditor.py]
    B --> C[Load YAML policy + inventory]
    B --> D[Credentials + optional jump host]
    B --> E[ThreadPool device workers]
    E --> F[collector.py gather show outputs]
    F --> G[Genie parse + structured data]
    G --> H[port_classifier.py classify interfaces]
    H --> I[compliance_engine.py run enabled checks]
    I --> J[report.py outputs]
    J --> K[Console summary]
    J --> L[HTML/JSON/CSV]
    J --> M[Remediation + delta]
```
📦 Prerequisites and Platform Notes¶
- Python 3.10+
- SSH reachability to targets (direct or via jump host)
- Privileged access for command collection
- Dependencies from `requirements.txt`
Windows + PyATS/Genie
Native Windows is not the recommended runtime for full Genie parsing support. Use WSL on Windows for production runs.
Install pattern:
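The project does not prescribe an exact sequence here, so the following is a generic pattern (virtual environment plus the project's `requirements.txt`); run it inside WSL, Linux, or macOS for full Genie parsing support:

```bash
# Generic install pattern (illustrative; use WSL/Linux/macOS for Genie support)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```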
🚀 Quick Start¶
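A minimal first run might look like this (flags as documented in the CLI reference below; inventory and policy paths are assumed to be the project defaults):

```bash
# Scoped first audit with verbose diagnostics
python -m compliance_audit --categories management_plane -v

# Offline validation against saved command outputs
python -m compliance_audit --dry-run ./saved_outputs
```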
🧭 CLI Reference (Operationally Important Flags)¶
Most useful real-world options:
- `--categories management_plane control_plane` to run scoped audits
- `--dry-run ./saved_outputs` for validation in change windows and CI
- `--fail-threshold 80` for pipeline quality gates
- `--csv` / `--no-csv` for explicit report behavior
- `--remediation-list pending` to view queued review packs
- `--remediation-approve PACK_ID --approver NAME --ticket-id CHG_ID` for approval control
- `--remediation-apply PACK_ID --apply-dry-run` before any production push
- `--remediation-apply-all` for approved bulk operations
- `-v` or `-vv` for run-time diagnostics
🆕 What's New in v4.0¶
Key enhancements reflected in this deep dive update:
- Enterprise remediation lifecycle: Review packs are generated, tracked, and governed through approval and apply states.
- Ticket-aware approvals: Optional enforcement of change ticket IDs during approvals.
- Risk controls: High-risk command blocks are denied by default unless explicitly allowed.
- Preflight drift and identity checks: Apply paths can verify findings still fail and target hostname matches expected identity.
- Bulk lifecycle operations: Approve-all and apply-all workflows for large estates.
- ROI instrumentation: Optional effort/value estimation embedded in reports.
⚙️ Configuration Model¶
The tool is centered around a primary YAML policy file and a separate inventory file.
1. Audit Settings¶
Examples include:
- `max_workers`
- `collect_timeout`
- `output_dir`
- `html_report`, `json_report`, `csv_report`
- `parking_vlan`, `native_vlan`
These govern runtime behavior and report generation, not compliance logic itself.
2. Connection Settings¶
Controls transport behavior:
- Jump-host usage
- Device type
- Credential backend mode
- Retry and timeout tuning
3. Inventory¶
Device list is maintained separately from policy, which is essential at scale.
4. Classification Settings¶
- Hostname role-code mapping
- Endpoint-neighbor signature patterns
These values drive trunk direction inference and role-specific checks.
5. Compliance Policy¶
Every check follows the same pattern:
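As an illustrative sketch of that pattern (field names beyond `enabled` and the remediation string are assumptions, not the project's exact schema):

```yaml
management_plane:
  ssh_version_2:
    enabled: true                      # toggle the check without touching Python
    severity: high                     # assumed field for prioritization
    remediation: "ip ssh version 2"    # fix command surfaced on FAIL
```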
This allows governance teams to tailor standards without code edits.
🧠 Core Engine Concepts¶
1) Structured Collection First¶
collector.py gathers key show commands and parses them into structured models (Genie preferred, with fallback behavior when unavailable).
This provides stable inputs for compliance checks and avoids fragile single-line CLI scraping.
2) Parse Running Config Into Queryable Sections¶
The running config is transformed into:
- Global lines
- Per-interface blocks
- Per-line-config blocks (e.g., VTY/console)
This gives the engine consistent helpers for checks like "present globally" vs "present on interface".
3) Classify Every Interface by Intent¶
port_classifier.py combines signals from:
- STP root-port state
- CDP/LLDP neighbor identity
- Hostname role parsing
- EtherChannel mapping
- Interface config and operational metadata
Result: checks are applied to the right interfaces for the right reasons.
4) Execute Enabled Checks by Category¶
compliance_engine.py runs check families only when enabled:
- Management plane
- Control plane
- Data plane
- Role-specific checks
This avoids policy drift between intended standards and actual enforcement.
🧬 Code Walkthrough: Why the Implementation Looks Like This¶
This section is the "under the hood" explanation many engineers ask for: not just what the tool does, but why the code is structured this way.
How to read this section
Snippets below are intentionally simplified to focus on the design pattern. They represent the production structure and decision logic used by the project.
1) CLI Entry Point and Exit Behavior¶
The entrypoint keeps the interface thin and delegates implementation detail to the orchestrator.
Why this design¶
- The CLI only parses intent and routes to `run_audit(...)`.
- Quality-gate semantics are explicit via exit codes.
- This makes the tool CI-friendly: policy violations can block merges or releases.
Trade-off¶
- The process-level pass/fail is simple and strict.
- If teams need nuanced gating (for example, allow WARN but not FAIL in certain categories), that policy should be added intentionally rather than hidden in ambiguous CLI behavior.
2) Orchestrator Pattern and Concurrency Safety¶
The orchestrator builds per-device jobs and executes them with a thread pool.
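A minimal sketch of that pattern (device dicts and the `audit_device` job are illustrative stand-ins for the project's worker payloads):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def audit_device(device: dict) -> dict:
    # One device = one isolated job; exceptions stay contained per-device.
    if device.get("fail"):
        raise ConnectionError(f"{device['host']}: unreachable")
    return {"host": device["host"], "score": 100.0}

def run_audit(devices: list[dict], max_workers: int = 10):
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(audit_device, d): d for d in devices}
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except Exception as exc:
                # A failed device is recorded, not fatal to the whole run.
                errors.append(str(exc))
    return results, errors
```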
Why this design¶
- Each device is isolated as an independent job.
- One failing device does not collapse the whole run.
- Throughput scales predictably with `max_workers`.
Operational effect¶
- Large inventories complete quickly.
- Run outcomes stay deterministic enough for operations reporting.
Trade-off¶
- More concurrency increases pressure on jump hosts and AAA backends.
- The tool caps workers and keeps job payloads explicit to reduce accidental overload.
3) ParsedConfig Model: Avoid Regex Chaos¶
Instead of scanning full running config text repeatedly, the parser creates queryable sections.
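A simplified sketch of that model (class and helper names are illustrative; the real parser handles more section types, such as line-config blocks):

```python
class ParsedConfig:
    """Running config split into queryable sections instead of raw regex scans."""

    def __init__(self, raw: str):
        self.global_lines: list[str] = []
        self.interfaces: dict[str, list[str]] = {}
        current = None
        for line in raw.splitlines():
            if line.startswith("interface "):
                current = line.split(None, 1)[1]
                self.interfaces[current] = []
            elif line.startswith(" ") and current:
                self.interfaces[current].append(line.strip())
            else:
                # Non-interface lines (including nested blocks) stay globally searchable.
                current = None
                if line.strip():
                    self.global_lines.append(line.strip())

    def present_globally(self, prefix: str) -> bool:
        return any(l.startswith(prefix) for l in self.global_lines)

    def present_on_interface(self, intf: str, prefix: str) -> bool:
        return any(l.startswith(prefix) for l in self.interfaces.get(intf, []))

raw = """hostname SW1
interface GigabitEthernet1/0/1
 switchport mode access
 spanning-tree portfast
line vty 0 4
 transport input ssh"""
cfg = ParsedConfig(raw)
```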
Why this design¶
- Check code remains readable and testable.
- Management-plane, interface-plane, and line-level checks use a shared abstraction.
- Fewer parsing edge cases leak into compliance methods.
Design decision¶
- The parser treats many indented non-interface lines as globally searchable lines to preserve practical matching for router, crypto, and nested blocks.
4) Signal Fusion for Port Classification¶
The classifier does not trust a single signal. It combines STP, CDP/LLDP, EtherChannel, and interface metadata.
Why this design¶
- STP root-port signal is strong but not always complete.
- CDP/LLDP hostname signal adds role context.
- EtherChannel awareness avoids evaluating member links independently when policy should apply to the logical bundle.
Failure mode prevented¶
- Without this fusion, trunk direction can be misclassified, which leads directly to incorrect root-guard decisions.
5) Policy-Driven Check Execution¶
Checks are method-based, but all enablement is policy-driven.
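The dispatch pattern can be sketched as follows (the `_check_<key>` naming convention and `DemoEngine` are illustrative):

```python
def run_enabled_checks(policy: dict, engine) -> list:
    """Dispatch only the checks that YAML policy enables."""
    findings = []
    for category, checks in policy.items():
        for key, node in checks.items():
            if not node.get("enabled", False):
                continue  # disabled in policy -> never executed, no code edit needed
            method = getattr(engine, f"_check_{key}", None)
            if method is not None:
                findings.extend(method(node))
    return findings

class DemoEngine:
    def _check_ssh_version_2(self, node):
        return [("ssh_version_2", "PASS")]

    def _check_http_server(self, node):
        return [("http_server", "FAIL")]

policy = {
    "management_plane": {
        "ssh_version_2": {"enabled": True},
        "http_server": {"enabled": False},  # toggled off in YAML, skipped at runtime
    }
}
```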
Why this design¶
- New checks can be added without rewriting framework flow.
- Category filtering from CLI naturally maps to engine behavior.
- Teams can disable checks in YAML without code edits.
Trade-off¶
- There is intentional verbosity in check methods.
- That verbosity is a feature: explicit checks are easier to audit and safer to modify.
6) Finding Model: Standardized Audit Currency¶
Every check emits a normalized finding object.
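A minimal version of that schema (field names are illustrative; the real model may carry additional metadata):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Finding:
    check: str                  # YAML check key, e.g. "ssh_version_2"
    status: str                 # PASS | FAIL | WARN
    severity: str
    interface: Optional[str]    # None for global-scope findings
    detail: str
    remediation: Optional[str]  # turns detection into direct action

f = Finding("ssh_version_2", "FAIL", "high", None,
            "SSH version 2 not enforced", "ip ssh version 2")
record = asdict(f)  # one schema feeds console, HTML, CSV, JSON, and remediation
```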
Why this design¶
- A single schema powers console, HTML, CSV, JSON, and remediation generation.
- Reporting layers stay thin because they consume one consistent model.
- The remediation field converts detection into immediate action guidance.
7) Direction-Aware Guard Logic (Critical Example)¶
This is a signature implementation detail and a strong example of policy with topology context.
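The decision table reads roughly like this (a hedged sketch of the logic, not the shipped method):

```python
def check_root_guard(direction: str, guard_configured: bool) -> tuple[str, str]:
    """Direction-aware root guard: the same config is PASS or FAIL by context."""
    if direction == "TRUNK_DOWNLINK":
        if guard_configured:
            return "PASS", "Root guard enforced on downlink"
        return "FAIL", "Missing root guard on downlink; add 'spanning-tree guard root'"
    if direction == "TRUNK_UPLINK":
        if guard_configured:
            # Root guard on an uplink can block the legitimate root port.
            return "FAIL", "Root guard on uplink is dangerous; remove it"
        return "PASS", "No root guard on uplink (correct)"
    # Unknown direction: downgrade certainty instead of guessing.
    return "WARN", "Trunk direction unknown; verify topology before enforcing guard"
```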
Why this design¶
- Security controls are not binary; they are context-dependent.
- The WARN path for unknown direction avoids false certainty.
Operational value¶
- Prevents dangerous guidance that would break spanning-tree stability.
8) Native VLAN Validation with Structured-Then-Fallback Logic¶
The trunk native VLAN check attempts structured data first, then falls back to interface config parsing.
Why this design¶
- Structured parsing gives better fidelity when available.
- Fallback logic keeps the check useful in imperfect collection conditions.
Trade-off¶
- Fallback parsing is less authoritative, so uncertain states become WARN rather than hard FAIL.
9) Remediation Script Generation Strategy¶
The remediation builder only includes FAIL findings with remediation commands and then organizes commands by scope.
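That strategy can be sketched like this (finding dicts mirror the normalized model; field names are illustrative):

```python
def build_remediation(findings: list[dict]) -> str:
    """Compile FAIL findings with known fixes into a ready-to-review snippet."""
    global_cmds: list[str] = []
    intf_cmds: dict[str, list[str]] = {}
    for f in findings:
        if f["status"] != "FAIL" or not f.get("remediation"):
            continue  # only actionable failures make it into the script
        if f.get("interface"):
            target = intf_cmds.setdefault(f["interface"], [])
        else:
            target = global_cmds
        for cmd in f["remediation"].splitlines():
            if cmd not in target:  # de-duplicate repeated fixes
                target.append(cmd)
    lines = list(global_cmds)
    for intf, cmds in intf_cmds.items():
        lines.append(f"interface {intf}")        # interface-scoped block
        lines.extend(f" {c}" for c in cmds)
    return "\n".join(lines)
```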
Why this design¶
- Keeps output practical for engineers during maintenance windows.
- Prevents duplicate command spam.
- Preserves interface-level context where needed.
Important caution¶
- Generated snippets should still pass change control and peer review before deployment in production.
10) Delta Reporting for Continuous Compliance¶
The report layer compares the current JSON with the previous baseline.
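Conceptually, the comparison reduces to set arithmetic over check statuses (a simplified sketch; the real artifact compares full finding records):

```python
def delta(previous: dict, current: dict) -> dict:
    """Compare two {check_key: status} maps from baseline and current JSON."""
    prev_fail = {k for k, v in previous.items() if v == "FAIL"}
    curr_fail = {k for k, v in current.items() if v == "FAIL"}
    return {
        "new_failures": sorted(curr_fail - prev_fail),  # regressions to investigate
        "resolved": sorted(prev_fail - curr_fail),      # proof of remediation work
    }
```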
Why this design¶
- Compliance is a trend, not a one-time score.
- Delta outputs make improvements and regressions visible and measurable.
Governance benefit¶
- Teams can prove that remediation work actually reduced risk over time.
11) Credential Chain and Operator Experience¶
Credential handling follows a strict lookup order: keyring, environment variables, then prompt.
Why this design¶
- Supports both fully automated and interactive operations.
- Avoids hardcoded secrets in config files.
- Can become hands-free after first secure run when keyring mode is enabled.
12) Offline Mode as a First-Class Engineering Path¶
Dry-run mode reads command outputs from files and runs the same engine path.
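A sketch of the loader side (the filename-to-command mapping shown here is an assumption about the layout, not the project's exact convention):

```python
from pathlib import Path

def load_saved_outputs(device_dir: str) -> dict:
    """Read saved 'show' outputs so the same engine path runs without SSH."""
    outputs = {}
    for path in Path(device_dir).glob("*.txt"):
        # Assumed convention: the filename encodes the command,
        # e.g. show_version.txt -> "show version".
        command = path.stem.replace("_", " ")
        outputs[command] = path.read_text()
    return outputs
```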
Why this design¶
- Reduces risk in policy development.
- Enables repeatable testing and CI validation.
- Decouples standards engineering from live network access constraints.
13) Design Principles You Can Reuse in Other Automation Projects¶
If you are building your own automation framework, these patterns are worth copying:
- Policy-as-data rather than hardcoded checks
- Normalized finding model consumed by all report channels
- Signal fusion for topology-aware decisions
- Structured-first, fallback-second parsing for robustness
- Delta tracking to measure posture changes over time
- Separation of orchestration, collection, classification, evaluation, reporting
These are the reasons this implementation scales beyond a lab script into a platform pattern.
🔎 Line-by-Line Spotlights: 5 Critical Checks¶
This is the practical "show me exactly how it thinks" section.
Each spotlight below breaks down:
- The logic path used by the check
- Why the design decision exists
- What operational outcome it creates
Spotlight 1: Root Guard (Direction-Aware STP Safety)¶
How to read this:
- Interface role is decided first; the check never assumes all trunks are equal.
- Downlinks are expected to enforce root guard.
- Uplinks must not enforce root guard, because that can block valid root behavior.
- Unknown direction downgrades certainty to WARN.
Why this design:
- STP controls are topology-dependent.
- A strict but context-aware model avoids both false PASS and dangerous FAIL guidance.
Operational outcome:
- Prevents outages caused by accidental root guard on uplinks.
- Surfaces real risk on downlinks without over-asserting where context is incomplete.
Operator Checklist
- Pre-check: Confirm the interface role classification (`TRUNK_UPLINK` vs `TRUNK_DOWNLINK`) from the report before changing STP guard settings.
- Change: Apply `spanning-tree guard root` only on validated downlinks, and remove it from validated uplinks.
- Post-check: Re-run the audit and verify downlinks show PASS for root guard while uplinks show PASS for no root guard.
- Safety check: If role remains unknown, do not enforce guard changes until topology intent is confirmed.
Spotlight 2: Native VLAN Validation (Structured Data With Safe Fallback)¶
How to read this:
- Try structured parser output first.
- If unavailable, parse interface config lines.
- If still unknown, emit WARN rather than hard FAIL.
Why this design:
- Structured parser data is preferred for accuracy.
- Fallback keeps checks useful in partial-data scenarios.
- WARN-on-unknown prevents false confidence.
Operational outcome:
- Better resilience during incomplete collection or parser variance.
- Fewer noisy false negatives when evidence quality is mixed.
Operator Checklist
- Pre-check: Verify expected native VLAN policy in YAML for uplinks, downlinks, and endpoint trunks.
- Change: Correct native VLAN on mismatched trunks using the defined policy value, not ad-hoc values.
- Post-check: Re-run the audit and ensure `trunk_native_vlan` findings move from FAIL or WARN to PASS.
- Hygiene check: Investigate recurring WARN states to improve parser fidelity or command coverage.
Spotlight 3: DHCP Snooping Trust (Role-Based Interface Intent)¶
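A hedged sketch of the decision this spotlight describes (policy keys `on_uplinks`/`on_downlinks` follow the checklist below; the function shape is illustrative):

```python
def check_snooping_trust(direction: str, trusted: bool, policy: dict):
    """Compare policy-desired trust state with observed interface state."""
    want = ((direction == "TRUNK_UPLINK" and policy.get("on_uplinks", False))
            or (direction == "TRUNK_DOWNLINK" and policy.get("on_downlinks", False)))
    if want and not trusted:
        return ("FAIL", "ip dhcp snooping trust")      # exact fix included
    if trusted and not want:
        return ("FAIL", "no ip dhcp snooping trust")   # mis-scoped trust removed
    return ("PASS", None)
```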
How to read this:
- Policy decides where trust should exist, not hardcoded assumptions.
- Check compares desired state against observed state.
- Failure includes exact remediation command.
Why this design:
- Some environments trust uplinks only; others trust specific downlinks too.
- The model supports both without rewriting engine logic.
Operational outcome:
- Reduces mis-scoped trust that can weaken DHCP protections.
- Keeps policy portable across sites with different designs.
Operator Checklist
- Pre-check: Validate trust intent in policy (`on_uplinks`, `on_downlinks`) against your DHCP relay and gateway topology.
- Change: Apply `ip dhcp snooping trust` only where policy indicates trust is required.
- Post-check: Confirm audit findings align with intended trust boundaries and no extra trusted ports remain.
- Risk check: Review unexpected trusted interfaces manually before closing the change.
Spotlight 4: Unused Port Hardening (Defense-in-Depth by Default)¶
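The layered stack can be sketched as independently toggleable controls (the layer keys and expected command strings below are illustrative):

```python
def check_unused_port(intf_lines: list[str], policy: dict, parking_vlan: int = 999):
    """Layered hardening stack: each control is independently toggleable."""
    layers = {
        "shutdown": "shutdown",
        "parking_vlan": f"switchport access vlan {parking_vlan}",
        "bpduguard": "spanning-tree bpduguard enable",
        "no_cdp": "no cdp enable",
    }
    findings = []
    for key, expected in layers.items():
        if not policy.get(key, True):
            continue  # governance can phase controls in via policy toggles
        if expected in intf_lines:
            findings.append((key, "PASS", None))
        else:
            findings.append((key, "FAIL", expected))  # each miss maps to a direct fix
    return findings
```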
How to read this:
- This is a layered control stack, not a single condition.
- Each control is independently toggleable in policy.
- Each miss creates a specific failure with direct corrective action.
Why this design:
- Unused ports are frequent ingress points for misconfiguration and abuse.
- Independent toggles let governance teams phase controls without code forks.
Operational outcome:
- Hardens dormant edge interfaces consistently.
- Improves auditability because every missed layer is explicit.
Operator Checklist
- Pre-check: Confirm parking VLAN and unused-port standards for the site to avoid breaking reserved operational ports.
- Change: Apply shutdown, parking VLAN, BPDU guard, and CDP/LLDP restrictions in one controlled template pass.
- Post-check: Re-run the audit and verify all unused-port controls pass as a bundle.
- Exception check: Document approved exceptions in policy rather than leaving ports partially hardened.
Spotlight 5: Remediation Script Builder (Actionable Output, Not Just Findings)¶
How to read this:
- Only FAIL findings with remediation are considered.
- Commands are split into global and interface-scoped blocks.
- Duplicates are removed before rendering.
- Output is ready for controlled operational use.
Why this design:
- Engineers need fix-ready artifacts, not just pass/fail data.
- Scope separation avoids command ordering confusion.
Operational outcome:
- Speeds up remediation during change windows.
- Reduces human transcription errors from manual report reading.
Operator Checklist
- Pre-check: Review generated remediation commands line-by-line and remove anything outside approved change scope.
- Change: Apply remediation in a maintenance window, preferably in staged blocks (global first, then interface blocks).
- Post-check: Re-run the audit immediately to verify FAIL findings were resolved and no regressions were introduced.
- Governance check: Attach the generated script and post-audit results to the change record for traceability.
What These 5 Spotlights Demonstrate¶
Across all five examples, the core pattern is the same:
- Infer context first (role, direction, evidence quality)
- Apply policy second (enabled rules and expected states)
- Produce an actionable finding third (clear detail + remediation)
That sequence is exactly why this auditor works well in real enterprise operations.
🔬 How Port Classification Avoids False Positives¶
A major challenge in network compliance is avoiding wrong conclusions on trunk links. This project handles that with a layered signal model.
Trunk Direction Logic¶
Primary and secondary signals are combined:
- STP root-port election (strong signal)
- Neighbor role from CDP/LLDP hostname parsing (context signal)
This helps correctly label ports as:
- `TRUNK_UPLINK`
- `TRUNK_DOWNLINK`
- `TRUNK_UNKNOWN`
- `TRUNK_ENDPOINT`
Why This Matters¶
Security checks are direction-sensitive. Example:
- Root guard on downlinks: expected
- Root guard on uplinks: dangerous and should fail
Without direction awareness, many "compliance" tools generate misleading guidance.
🌩️ Storm Control and STP Guard Enforcement¶
Two notable strengths of this auditor are speed-aware and direction-aware validations.
Storm Control¶
Checks can enforce threshold behavior based on interface speed tiers (10G/1G/100M), reducing one-size-fits-none policy mistakes.
BPDU Guard and Root Guard Matrix¶
Operational intent is encoded clearly:
- BPDU guard expected on access ports
- Root guard expected on downlink trunks
- Root guard on uplinks flagged as failure
- Unknown-direction trunks may produce warn-level findings for review
This is exactly the kind of nuanced behavior needed for enterprise-safe automation.
🛡️ Compliance Coverage by Domain¶
The check library spans governance domains rather than isolated commands.
Management Plane (Examples)¶
- Service hardening
- SSH hardening
- AAA/TACACS/RADIUS posture
- SNMP restrictions (including public/private community handling)
- Logging/NTP standards
- Banner and local account hygiene
- VTY and console control standards
Control Plane (Examples)¶
- STP global posture and priority behavior
- VTP mode requirements
- DHCP snooping controls
- Dynamic ARP inspection controls
- UDLD / errdisable / CoPP-related controls
Data Plane (Examples)¶
- Access-port hardening (portfast/BPDU guard/nonegotiate/port-security)
- Trunk policy (allowed VLAN pruning, native VLAN expectations)
- DHCP snooping trust and DAI trust by direction
- Unused-port lockdown patterns
- Routed-interface security checks
Role-Specific (Examples)¶
- Core should be STP root where expected
- Access should not be root
- Access uplink redundancy via port-channel
- Additional role-bound checks for specialized topologies
🧾 Reporting and Artifacts¶
This project is built to support both operators and auditors.
Console Summary¶
A compact score-driven table keeps terminal output readable while still surfacing pass/fail posture by device.
JSON (Per Device)¶
Machine-readable artifact for downstream pipelines and baselining.
CSV (Consolidated)¶
Cross-device tabular export suitable for governance dashboards, spreadsheets, and data ingest.
HTML Reports¶
- Per-device interactive pages
- Consolidated dashboard for multi-device audits
- Filtering, searching, and collapsible sections
Remediation Script Generation¶
For fail findings with known fixes, the tool can produce ready-to-apply IOS-XE snippets.
Important implementation detail:
- Commands are grouped globally and per-interface
- Duplicates are removed
- Port-channel members are remediated at the logical Port-channel where appropriate
📈 Baseline and Delta Tracking¶
The reporting pipeline can compare current results against the most recent baseline JSON for the same device.
This gives change visibility such as:
- New failures introduced
- Failures resolved
- Score movement over time
For teams running continuous compliance, this is essential for proving improvement rather than just generating snapshots.
🔐 Credential Strategy¶
Credential lookup order is intentionally practical:
- OS keyring (when enabled)
- Environment variables
- Interactive prompt fallback
When keyring mode is active, prompted/env credentials can be written back for future non-interactive runs.
Enable-secret support is environment-driven when required for privileged workflows.
🧪 Dry-Run Mode for Safer Testing¶
Use `--dry-run` to validate policies against saved command outputs.
This is useful for:
- CI pipelines
- Pre-production policy tuning
- Reproducible bug reports
- Secure environments where live device access is restricted
Expected structure:
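The layout sketched below is illustrative (device folder and file names are assumptions; the engine maps saved files back to the show commands it would normally run live):

```text
saved_outputs/
├── SW-ACC-01/
│   ├── show_version.txt
│   ├── show_running-config.txt
│   └── show_cdp_neighbors_detail.txt
└── SW-CORE-01/
    └── ...
```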
🧩 Extending the Auditor Safely¶
Add a New Global Check¶
- Add a new key in the relevant YAML section with `enabled: true`
- Implement logic in the corresponding engine method
- Use existing helper patterns for consistency (`_present`, `_absent`, finding model)
Add a New Per-Interface Check¶
- Define the policy node under `data_plane`
- Implement it in `_check_access_port`, `_check_trunk_port`, `_check_unused_port`, or `_check_routed_port`
- Reuse interface helper matching patterns for deterministic behavior
Add a New Device Role¶
- Extend `hostname_roles` in YAML
- Add role-specific policy nodes
- Add/extend logic only if simple policy toggles are insufficient
🧯 Troubleshooting Patterns¶
Common issues and high-confidence fixes:
- Genie parser unavailable: install PyATS/Genie in Linux/macOS/WSL runtime
- No devices audited: verify inventory path or use an explicit `--device`
- Role not parsed: use `hostname:ip` format for stronger role inference
- `TRUNK_UNKNOWN` proliferation: verify CDP/LLDP visibility and hostname standards
- Connection failures: confirm SSH enablement, jump-host path, and credentials
🏁 Production Rollout Playbook¶
Recommended phased adoption:
- Start in dry-run mode with saved outputs
- Run management-plane only to validate baseline policy assumptions
- Enable full categories and review false positives with operations
- Adopt fail thresholds in CI/CD or pre-change validation
- Track deltas weekly and measure risk reduction trendlines
📋 Runbook Summary (Change Window Ready)¶
Use this as a single operational workflow that combines all five spotlight controls.
Phase 1: Pre-Change Validation¶
- Run the auditor and export current HTML, JSON, and CSV outputs.
- Confirm interface direction classification for all trunk controls before applying STP guard changes.
- Verify policy values for native VLAN, parking VLAN, and trust intent in YAML.
- Identify unknown-direction or unknown-native-VLAN findings and mark them for manual review.
- Prepare remediation script output, then review and prune to approved scope.
Phase 2: Controlled Implementation¶
- Apply root guard changes only where direction is validated.
- Correct trunk native VLAN mismatches using policy-defined values.
- Apply DHCP snooping trust only on policy-approved interfaces.
- Enforce unused-port hardening as a complete bundle (shutdown, parking VLAN, BPDU guard, CDP/LLDP restrictions).
- Execute remediation commands in staged order: global configuration first, interface blocks second.
Phase 3: Post-Change Verification¶
- Re-run the auditor immediately after changes.
- Confirm target findings moved from FAIL or WARN to PASS.
- Confirm no new failures were introduced in adjacent controls.
- Review consolidated HTML dashboard for cross-device regressions.
- Validate score movement and delta summary for each changed device.
Phase 4: Evidence and Governance Closure¶
- Attach before-and-after reports to the change record.
- Include generated remediation script and final executed command set.
- Document approved exceptions directly in policy, not as undocumented operational drift.
- Schedule follow-up run to confirm controls remain stable after normal operations resume.
- Capture lessons learned and update site-specific policy defaults.
Fast Pass Criteria
- No critical FAIL findings in changed scope.
- No unexpected score regressions on unaffected devices.
- Delta report shows resolved findings greater than or equal to new failures.
- Change record contains complete evidence package.
v4.0 Operational Shift
In addition to command-level remediation scripts, v4.0 introduces a governed remediation lifecycle (review packs -> approval -> apply). For day-to-day operations, use the runbook summary above as your change-window checklist.
Final Takeaway¶
This project is not just a compliance checker. It is a full compliance platform pattern:
- Policy-driven
- Context-aware
- Report-rich
- Extensible
- Safe to operate at scale
If your goal is to move from ad-hoc standards checks to engineering-grade compliance automation, this is one of the strongest reference implementations currently in the Nautomation Prime portfolio.
Mission Alignment: This deep dive reflects the PRIME Framework focus on measurable outcomes, operational safety, and transparent engineering decisions that teams can sustain long-term.