DevOps and Observability for Network Automation: CI/CD, GitOps, and Monitoring
Published: March 1, 2026
Author: Nautomation Prime Team
Why This Tutorial Exists
Enterprise automation is more than scripts: it requires production-grade pipelines, version control, safe rollouts, and comprehensive observability. This tutorial covers CI/CD, GitOps, observability architecture, structured logging, metrics, and alerting, aligned with the PRIME Framework.
Prerequisites
Advanced Python and networking knowledge
Familiarity with Git, Docker, and container concepts
Understanding of CI/CD tools (GitHub Actions, GitLab CI, Jenkins)
Basic knowledge of monitoring tools (Prometheus, Grafana)
DevOps Architecture: Multi-Stage Pipeline
Source Control (Git)
↓
CI: Lint, Test, Build
↓
CD: Stage → Approve → Production
↓
Observability: Logs, Metrics, Alerts
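The stage ordering above can be read as a chain of gates, where each stage runs only if the previous one passed. The following is a minimal sketch of that idea, not a real CI engine; the `Stage` type, the `run_pipeline` function, and the stage names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    check: Callable[[], bool]  # returns True when the stage passes


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed: List[str] = []
    for stage in stages:
        if not stage.check():
            break  # later stages never run
        passed.append(stage.name)
    return passed


# Hypothetical stages; real ones would invoke linters, tests, and deploy tooling.
stages = [
    Stage("lint", lambda: True),
    Stage("test", lambda: True),
    Stage("approve", lambda: False),  # e.g. a reviewer has not approved yet
    Stage("production", lambda: True),
]

print(run_pipeline(stages))  # ['lint', 'test']
```

Because the approval gate fails in this example, the production stage is never reached, which is the behavior a multi-stage pipeline is meant to guarantee.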
Part 1: GitHub Actions Multi-Stage CI/CD
name: Network Automation CI/CD

on:
  push:
    branches: [main]

jobs:
  # Stage 1: static checks
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - run: pip install -r requirements-dev.txt
      - run: flake8 src/
      - run: black src/ --check
      - run: mypy src/ --strict

  # Stage 2: unit tests, run only after lint passes
  test:
    runs-on: ubuntu-latest
    needs: lint
    steps:
      - uses: actions/checkout@v4
      # Pin the same Python version as the lint job
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - run: pip install -r requirements-dev.txt
      - run: pytest tests/unit/ --cov=src

  # Stage 3: deploy, run only after tests pass
  deploy:
    runs-on: ubuntu-latest
    needs: test
    if: github.ref == 'refs/heads/main'
    # Pauses for approval if the production environment has required reviewers
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: python scripts/deploy.py
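The workflow relies on each command's exit code: a step that exits non-zero fails its job. The tutorial does not show `scripts/deploy.py`, so the following is only a hypothetical sketch of how such a script could report failures; `deploy_all`, `main`, `fake_push`, and the device names are all invented for illustration.

```python
import sys
from typing import Callable, Iterable, List


def deploy_all(devices: Iterable[str], push: Callable[[str], None]) -> List[str]:
    """Apply `push` to each device and return the devices that failed."""
    failed: List[str] = []
    for device in devices:
        try:
            push(device)
        except Exception as exc:  # keep going so every failure is reported
            print(f"deploy failed device={device} error={exc}", file=sys.stderr)
            failed.append(device)
    return failed


def main(devices: Iterable[str], push: Callable[[str], None]) -> int:
    """Return a process exit code: 0 on success, 1 if any device failed."""
    return 1 if deploy_all(devices, push) else 0


def fake_push(device: str) -> None:
    """Stand-in for real device configuration (hypothetical)."""
    if device == "rtr-bad":
        raise RuntimeError("unreachable")


exit_code = main(["rtr-1", "rtr-bad"], fake_push)
print(exit_code)  # 1
```

A real script would pass the returned code to `sys.exit()` so that a failed device marks the deploy job as failed.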
Part 2: Structured Logging
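import logging

import structlog

# Output goes through the standard library logger, so INFO must be enabled
# there; otherwise filter_by_level drops the event logged below.
logging.basicConfig(format="%(message)s", level=logging.INFO)

structlog.configure(
    processors=[
        structlog.stdlib.filter_by_level,
        structlog.stdlib.add_logger_name,
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.JSONRenderer(),
    ],
    wrapper_class=structlog.stdlib.BoundLogger,
    logger_factory=structlog.stdlib.LoggerFactory(),
)

logger = structlog.get_logger()
logger.info("automation_started", change_id="CHG0001", devices=5)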
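JSON output pays off during an investigation because each line can be parsed and filtered by field. The sketch below uses only the standard library and assumes one JSON object per log line; the sample lines are hand-written for illustration (not captured output), and `filter_logs`, the `CHG0002` record, and the `automation_finished` event name are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator

# Hand-written sample lines (not captured output), one JSON object per line.
sample_lines = [
    json.dumps({"event": "automation_started", "change_id": "CHG0001", "devices": 5}),
    json.dumps({"event": "automation_started", "change_id": "CHG0002", "devices": 2}),
    "not json at all",  # log files often contain stray non-JSON lines
    json.dumps({"event": "automation_finished", "change_id": "CHG0001"}),
]


def filter_logs(lines: Iterable[str], **fields: object) -> Iterator[Dict[str, object]]:
    """Yield parsed log records whose fields match all given key/value pairs."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        if all(record.get(key) == value for key, value in fields.items()):
            yield record


events = [r["event"] for r in filter_logs(sample_lines, change_id="CHG0001")]
print(events)  # ['automation_started', 'automation_finished']
```

Filtering on `change_id` reconstructs the history of a single change, which is hard to do reliably with free-form text logs.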
Part 3: Prometheus Metrics
from prometheus_client import Counter, Histogram, start_http_server

# Count runs by outcome, e.g. status="success"
automation_runs = Counter(
    'network_automation_runs_total',
    'Total automation runs',
    ['status'],
)

# Track run duration in seconds
automation_duration = Histogram(
    'network_automation_duration_seconds',
    'Automation execution time',
    buckets=(1, 5, 10, 30, 60, 300),
)

# Expose metrics on port 8000 for Prometheus to scrape. The server runs in a
# background thread, so the process must stay alive to be scraped.
start_http_server(8000)

automation_runs.labels(status='success').inc()
automation_duration.observe(15.5)
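Histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound, and a final `+Inf` bucket counts everything. The sketch below reproduces that counting in plain Python to show where the 15.5-second observation lands. It is an illustration, not the `prometheus_client` implementation; `cumulative_bucket_counts` and the simplified bucket keys are assumptions.

```python
from bisect import bisect_left
from typing import Dict, Sequence

BUCKETS = (1, 5, 10, 30, 60, 300)  # same upper bounds as the histogram above


def cumulative_bucket_counts(observations: Sequence[float],
                             bounds: Sequence[float] = BUCKETS) -> Dict[str, int]:
    """Count observations per bucket the way a cumulative histogram does."""
    counts = {str(b): 0 for b in bounds}
    counts["+Inf"] = 0
    for value in observations:
        first = bisect_left(bounds, value)  # index of first bound >= value
        for bound in bounds[first:]:
            counts[str(bound)] += 1
        counts["+Inf"] += 1
    return counts


print(cumulative_bucket_counts([15.5]))
# {'1': 0, '5': 0, '10': 0, '30': 1, '60': 1, '300': 1, '+Inf': 1}
```

A run of 15.5 seconds is counted in the 30, 60, and 300 buckets but not in 1, 5, or 10, so bucket bounds should bracket the durations you care about distinguishing.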
Part 4: Alerting Rules
groups:
  - name: network_automation
    rules:
      - alert: HighErrorRate
        # More than 0.1 failed runs per second, averaged over 5 minutes
        expr: rate(network_automation_runs_total{status="failed"}[5m]) > 0.1
        annotations:
          summary: "High automation error rate"
      - alert: JobTimeout
        # More than 5 timed-out runs within the last hour
        expr: increase(network_automation_runs_total{status="timeout"}[1h]) > 5
        annotations:
          summary: "Multiple timeouts detected"
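To make the HighErrorRate threshold concrete, the sketch below computes a per-second rate from counter samples and compares it with 0.1. It is a deliberate simplification: unlike Prometheus's `rate()`, it ignores counter resets and does no extrapolation. The `simple_rate` and `high_error_rate` functions and the sample values are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase between the first and last sample in a window.

    Simplification: no counter-reset handling and no extrapolation.
    """
    if len(samples) < 2:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        return 0.0
    return (v1 - v0) / (t1 - t0)


def high_error_rate(samples: List[Sample], threshold: float = 0.1) -> bool:
    """True when the failure rate in the window exceeds the threshold."""
    return simple_rate(samples) > threshold


# Hypothetical window: 45 failed runs over 300 seconds is 0.15 per second.
window = [(0.0, 100.0), (150.0, 120.0), (300.0, 145.0)]
print(simple_rate(window))      # 0.15
print(high_error_rate(window))  # True
```

Note that the threshold is an absolute rate, not a percentage: 0.1 per second corresponds to 30 failed runs in a 5-minute window.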
Key Takeaways
Multi-stage CI/CD prevents errors: lint, test, then deploy
Structured logging enables investigation: JSON format for searching
Metrics provide visibility: performance and error tracking
Alerts enable proactive response: early problem detection
Audit trails ensure compliance: complete change history
PRIME in Action
Safety: Multi-stage gates prevent production incidents
Measuring: Metrics track automation performance
Empowerment: Teams manage deployments via GitOps
Re-engineer: Data drives continuous improvement