DevOps and Observability for Network Automation: CI/CD, GitOps, and Monitoring

Published: March 1, 2026
Author: Nautomation Prime Team

Why This Tutorial Exists

Enterprise automation is more than a collection of scripts. It requires production-grade pipelines, version control, safe rollouts, and comprehensive observability. This tutorial covers CI/CD, GitOps, observability architecture, structured logging, metrics, and alerting, aligned with the PRIME Framework.

Prerequisites

Advanced Python and networking knowledge
Familiarity with Git, Docker, and container concepts
Understanding of CI/CD tools (GitHub Actions, GitLab CI, Jenkins)
Basic knowledge of monitoring tools (Prometheus, Grafana)

DevOps Architecture: Multi-Stage Pipeline

Source Control (Git)
    ↓
CI: Lint, Test, Build
    ↓
CD: Stage → Approve → Production
    ↓
Observability: Logs, Metrics, Alerts

Part 1: GitHub Actions Multi-Stage CI/CD

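name: Network Automation CI/CD
on:
  pull_request:          # lint and test every proposed change
  push:
    branches: [ main ]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements-dev.txt
      - run: flake8 src/
      - run: black src/ --check
      - run: mypy src/ --strict

  test:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      # Pin the same interpreter as the lint job instead of the runner default
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install -r requirements-dev.txt
      - run: pytest tests/unit/ --cov=src

  deploy:
    runs-on: ubuntu-latest
    needs: test
    # Pull request runs stop after the test job; only pushes to main deploy
    if: github.ref == 'refs/heads/main'
    # Required reviewers on this environment provide the approval gate
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: python scripts/deploy.py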
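
The workflow does not show what scripts/deploy.py contains. One way such a script could keep the rollout safe is to change a small canary batch first and verify it before touching the remaining devices. The following is a minimal sketch under that assumption; `staged_rollout`, `RolloutError`, `apply_change`, and `verify` are hypothetical names, not part of any library or of the original pipeline.

```python
from typing import Callable, List, Sequence


class RolloutError(RuntimeError):
    """Raised when a batch fails verification; carries the failing devices."""

    def __init__(self, failed: List[str]) -> None:
        super().__init__(f"verification failed on: {', '.join(failed)}")
        self.failed = failed


def staged_rollout(
    devices: Sequence[str],
    apply_change: Callable[[str], None],
    verify: Callable[[str], bool],
    canary_count: int = 1,
) -> List[str]:
    """Change a small canary batch first, then the remaining devices.

    Each batch is verified before the next one starts. Returns the devices
    that were changed; raises RolloutError as soon as a batch fails.
    """
    ordered = list(devices)
    batches = [ordered[:canary_count], ordered[canary_count:]]
    changed: List[str] = []
    for batch in batches:
        for device in batch:
            apply_change(device)
            changed.append(device)
        failed = [device for device in batch if not verify(device)]
        if failed:
            # An uncaught exception makes the script exit non-zero,
            # which fails the deploy job and stops the pipeline.
            raise RolloutError(failed)
    return changed
```

Passing `apply_change` and `verify` in as callables keeps the rollout logic free of device I/O, so it can be covered by the unit tests that run in the test job.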

Part 2: Structured Logging

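import logging
import sys

import structlog

# structlog hands the rendered JSON to the standard library, so stdlib logging
# needs a handler and a level; without this, INFO events are silently dropped.
logging.basicConfig(format="%(message)s", stream=sys.stdout, level=logging.INFO)

structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
)

logger = structlog.get_logger()
logger.info("automation_started", change_id="CHG0001", devices=5)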
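
A structlog processor is a plain callable that receives `(logger, method_name, event_dict)` and returns the event dictionary, so extra context can be added without touching each log call. The sketch below tags every event with the CI run and commit that produced it, which helps tie a log line back to a change in Git. `GITHUB_RUN_ID` and `GITHUB_SHA` are default GitHub Actions environment variables; the processor name `add_ci_context` and the field names `ci_run_id` and `git_commit` are assumptions made for illustration.

```python
import os
from typing import Any, MutableMapping

# Default environment variables set by GitHub Actions on every run.
# The output field names on the left are hypothetical.
_CI_FIELDS = {"ci_run_id": "GITHUB_RUN_ID", "git_commit": "GITHUB_SHA"}


def add_ci_context(
    logger: Any, method_name: str, event_dict: MutableMapping[str, Any]
) -> MutableMapping[str, Any]:
    """structlog processor: tag each event with the CI run and commit."""
    for field, env_var in _CI_FIELDS.items():
        value = os.environ.get(env_var)
        if value:
            # setdefault keeps a value the caller passed explicitly
            event_dict.setdefault(field, value)
    return event_dict
```

To use it, the function would go in the `processors` list ahead of `JSONRenderer()`, so the added fields are present when the event is rendered. Outside CI the variables are unset and the processor leaves events unchanged.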

Part 3: Prometheus Metrics

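from prometheus_client import Counter, Histogram, start_http_server

# Runs counted by outcome; the alerting rules query the "failed" and
# "timeout" values of the status label
automation_runs = Counter(
    'network_automation_runs_total',
    'Total automation runs',
    ['status']
)

# Run duration in seconds; bucket upper bounds range from 1 s to 5 min
automation_duration = Histogram(
    'network_automation_duration_seconds',
    'Automation execution time',
    buckets=(1, 5, 10, 30, 60, 300)
)

# Expose /metrics on port 8000. The server runs in a daemon thread, so the
# process must stay alive for Prometheus to scrape it.
start_http_server(8000)
automation_runs.labels(status='success').inc()
automation_duration.observe(15.5)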
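
Recording the outcome and duration by hand at every call site is easy to forget. A small context manager can do both in one place. This is a sketch, not part of prometheus_client: `track_run` is a hypothetical helper that expects a counter with a `status` label and a histogram, such as `automation_runs` and `automation_duration` above, and it assumes that a `TimeoutError` should be counted as `timeout` and any other exception as `failed`.

```python
import time
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def track_run(runs_counter: Any, duration_histogram: Any) -> Iterator[None]:
    """Record the status and duration of the wrapped block.

    runs_counter must support .labels(status=...).inc() and
    duration_histogram must support .observe(seconds).
    """
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except TimeoutError:
        status = "timeout"
        raise
    except Exception:
        status = "failed"
        raise
    finally:
        # Runs on every path, so failed runs are timed as well
        duration_histogram.observe(time.perf_counter() - start)
        runs_counter.labels(status=status).inc()
```

A run would then be wrapped as `with track_run(automation_runs, automation_duration): ...`. Exceptions are re-raised after being counted, so the caller's error handling is unaffected.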

Part 4: Alerting Rules

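groups:
  - name: network_automation
    rules:
      - alert: HighErrorRate
        # Fires when more than 10% of runs in the last 5 minutes failed.
        # A bare rate() is failures per second, not a share of runs.
        expr: |
          sum(rate(network_automation_runs_total{status="failed"}[5m]))
            / sum(rate(network_automation_runs_total[5m])) > 0.1
        annotations:
          summary: "High automation error rate"

      - alert: JobTimeout
        # Fires when more than 5 runs timed out within the last hour
        expr: increase(network_automation_runs_total{status="timeout"}[1h]) > 5
        annotations:
          summary: "Multiple timeouts detected"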
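
A threshold on `rate()` is a per-second figure, which is easy to misread as a percentage. The arithmetic below spells out the difference between a per-second failure rate and the share of runs that failed. It is a simplified sketch: the function names and sample counter values are made up for illustration, and the calculation ignores counter resets and the extrapolation Prometheus applies at the edges of the window.

```python
def per_second_rate(
    first_value: float, last_value: float, window_seconds: float
) -> float:
    """Average per-second increase of a counter across a window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return (last_value - first_value) / window_seconds


def failure_ratio(failed_rate: float, total_rate: float) -> float:
    """Share of runs that failed; 0.0 when nothing ran in the window."""
    return failed_rate / total_rate if total_rate > 0 else 0.0


# Hypothetical samples: over 5 minutes (300 s) the failed counter grew
# by 30 and the sum across all statuses grew by 120.
failed = per_second_rate(10, 40, 300)    # failures per second
total = per_second_rate(200, 320, 300)   # runs per second
ratio = failure_ratio(failed, total)     # share of runs that failed
```

With these samples the failure rate works out to 30 failures over 300 seconds, while the failure share is 30 of 120 runs. The same incident therefore gives two very different numbers depending on which quantity the alert compares against its threshold.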

Key Takeaways

✅ Multi-stage CI/CD catches errors early - Lint and test gate every deploy
✅ Structured logging enables investigation - JSON events are searchable by field
✅ Metrics provide visibility - Run counts, outcomes, and durations
✅ Alerts enable proactive response - Failures and timeouts surface early
✅ Audit trails ensure compliance - Git history records every change

PRIME in Action

✅ Safety: Multi-stage gates prevent production incidents
✅ Measuring: Metrics track automation performance
✅ Empowerment: Teams manage deployments via GitOps
✅ Re-engineer: Data drives continuous improvement
