DevOps & CI/CD

Understanding DevOps culture, continuous integration, delivery and modern development practices

Last updated: 8/15/2025

Master the practices, tools and culture that bridge development and operations, enabling teams to deliver software faster and more reliably.

What is DevOps?

The Core Philosophy

Breaking down silos between Dev and Ops

DevOps isn't just tools; it's a cultural shift that brings development and operations teams together to deliver value faster and more reliably.

Real-world analogy: Traditional IT is like a relay race where developers hand off code to operations. DevOps is like a football team where everyone works together towards the same goal, with shared responsibility for success.

The Three Ways of DevOps

1. Flow (Left to Right)
   Dev → Test → Deploy → Monitor
   Fast flow of work from development to production

2. Feedback (Right to Left)
   Monitor → Alert → Fix → Improve
   Fast feedback from production to development

3. Continuous Learning
   Experiment → Learn → Share → Improve
   Culture of experimentation and learning

CI/CD Pipeline

Continuous Integration (CI)

Merge code frequently, test automatically

CI is about integrating code changes frequently and detecting problems early.

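# Example CI Pipeline (GitHub Actions)
# npm script names (lint, test:unit, coverage, etc.) are assumed to exist in package.json
name: Continuous Integration
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4          # Checkout code
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci                        # Install dependencies
      - run: npm run lint                  # Run linting
      - run: npm run test:unit             # Run unit tests
      - run: npm run test:integration      # Run integration tests
      - run: npm run coverage              # Generate coverage report
      - run: npm run build                 # Build application
      - uses: actions/upload-artifact@v4   # Upload artifacts
        with:
          name: dist
          path: dist/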
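Whatever the CI service, a job like this runs its steps in order and stops at the first failure, so a commit that breaks linting or tests never reaches the build and upload steps. A minimal Python sketch of that fail-fast behaviour, assuming each step is simply a command to execute (`run_pipeline` and the placeholder commands are illustrative, not part of any CI product):

```python
import subprocess
import sys
from typing import List, Optional, Tuple

def run_pipeline(steps: List[Tuple[str, List[str]]]) -> Optional[str]:
    """Run steps in order; return the name of the first failing step, or None."""
    for name, command in steps:
        print(f"==> {name}")
        result = subprocess.run(command)
        if result.returncode != 0:
            # Fail fast: later steps (build, upload) never run on a broken commit
            return name
    return None

if __name__ == "__main__":
    # Placeholder commands; a real pipeline would call npm, pytest, etc.
    steps = [
        ("Run linting", [sys.executable, "-c", "print('lint ok')"]),
        ("Run unit tests", [sys.executable, "-c", "raise SystemExit(1)"]),
        ("Build application", [sys.executable, "-c", "print('built')"]),
    ]
    failed = run_pipeline(steps)
    print("pipeline passed" if failed is None else f"pipeline failed at: {failed}")
```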

Continuous Delivery (CD)

Always ready to deploy

Code is always in a deployable state, but deployment is a manual decision.

Code → Build → Test → Stage → [Manual Approval] → Production

Continuous Deployment

Automatic deployment to production

Every change that passes tests goes to production automatically.

Code → Build → Test → Stage → Production (automatic)
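The two practices differ only in whether a manual approval gate sits in front of production. A small Python sketch of that decision; the `Build` fields, `next_stage` and the stage names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Build:
    version: str
    tests_passed: bool
    staging_ok: bool

def next_stage(build: Build, continuous_deployment: bool, approved: bool = False) -> str:
    """Decide where a build goes after staging.

    Continuous delivery: every green build is deployable, but production
    waits for a manual approval. Continuous deployment: green builds ship
    automatically.
    """
    if not (build.tests_passed and build.staging_ok):
        return "rejected"
    if continuous_deployment or approved:
        return "production"
    return "awaiting-approval"

green = Build("2.0.0", tests_passed=True, staging_ok=True)
print(next_stage(green, continuous_deployment=False))                 # awaiting-approval
print(next_stage(green, continuous_deployment=False, approved=True))  # production
print(next_stage(green, continuous_deployment=True))                  # production
```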

Version Control

Git Workflows

Git Flow

Structured branching model

master (production)
    ↓
develop (integration)
    ├── feature/user-auth
    ├── feature/payment
    └── feature/search
    
hotfix/critical-bug → master & develop
release/v2.0 → master & develop

GitHub Flow

Simplified workflow

main
  ├── feature-branch-1
  ├── feature-branch-2
  └── feature-branch-3
  
1. Create branch from main
2. Make changes
3. Open pull request
4. Review and test
5. Merge to main
6. Deploy

GitLab Flow

Environment branches

main
  ↓
pre-production
  ↓
production

Features merge to main
Main deploys to pre-production
Pre-production promotes to production

Branching Strategies

Feature Branches:

git checkout -b feature/new-feature
# Work on feature
git push origin feature/new-feature
# Create PR/MR

Release Branches:

git checkout -b release/2.0.0 develop
# Final testing and bug fixes
git checkout master
git merge --no-ff release/2.0.0
git tag -a v2.0.0 -m "Version 2.0.0"

Hotfix Branches:

git checkout -b hotfix/critical-bug production
# Fix bug and commit
git checkout production
git merge --no-ff hotfix/critical-bug

Build Automation

Build Tools

Make

Classic build automation

# Makefile
# Recipe lines must be indented with a tab character, not spaces
.PHONY: build test deploy clean

build:
	npm install
	npm run build

test: build
	npm test

deploy: test
	./scripts/deploy.sh

clean:
	rm -rf dist node_modules
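Make derives the run order from each target's prerequisites: `deploy` needs `test`, which needs `build`. The same resolution can be sketched in Python with the standard library's `graphlib` (Python 3.9+); this mirrors the dependency graph above rather than how Make itself is implemented, and `run_order` is a hypothetical helper:

```python
from graphlib import TopologicalSorter

# target -> prerequisites, copied from the Makefile above
makefile = {
    "build": [],
    "test": ["build"],
    "deploy": ["test"],
    "clean": [],
}

def run_order(target: str) -> list:
    """Return the targets `make <target>` would run, prerequisites first."""
    # Collect only the targets reachable from the requested one
    needed, stack = {}, [target]
    while stack:
        t = stack.pop()
        if t not in needed:
            needed[t] = makefile[t]
            stack.extend(makefile[t])
    return list(TopologicalSorter(needed).static_order())

print(run_order("deploy"))  # ['build', 'test', 'deploy']
```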

Gradle

Modern build automation

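// build.gradle (Groovy DSL, no plugins applied, so these task names are free)
tasks.register('compile') {
    doLast {
        println 'Compiling sources...'
    }
}

tasks.register('test') {
    dependsOn 'compile'
    doLast {
        println 'Running tests...'
    }
}

tasks.register('build') {
    dependsOn 'compile', 'test'
}

tasks.register('deploy') {
    dependsOn 'build'
    doLast {
        println 'Deploying application...'
    }
}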

Package Managers

Language-specific:

# JavaScript/Node.js
npm install
yarn add package

# Python
pip install package
poetry add package

# Ruby
gem install package
bundle install

# Java
mvn install
gradle build

# Go
go get package

Testing Strategies

Testing Pyramid

         /\
        /  \  E2E Tests (Few)
       /    \  - Full user flows
      /──────\
     /        \  Integration Tests (Some)
    /          \  - Component interactions
   /────────────\
  /              \  Unit Tests (Many)
 /                \  - Individual functions
/──────────────────\

Types of Testing

Unit Testing

Test individual components

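// Jest example (assumes add and multiply are exported from ./calculator)
const { add, multiply } = require('./calculator');

describe('Calculator', () => {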
  test('adds 1 + 2 to equal 3', () => {
    expect(add(1, 2)).toBe(3);
  });
  
  test('multiplies 3 * 4 to equal 12', () => {
    expect(multiply(3, 4)).toBe(12);
  });
});

Integration Testing

Test component interactions

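# pytest example (assumes a FastAPI app object named `app` in main.py)
from fastapi.testclient import TestClient
from main import app

client = TestClient(app)

def test_api_endpoint():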
    response = client.post('/api/users', json={
        'name': 'John Doe',
        'email': 'john@example.com'
    })
    assert response.status_code == 201
    assert response.json()['id'] is not None

End-to-End Testing

Test complete user journeys

// Cypress example
describe('User Registration', () => {
  it('allows user to sign up', () => {
    cy.visit('/signup');
    cy.get('#email').type('user@example.com');
    cy.get('#password').type('SecurePass123!');
    cy.get('#submit').click();
    cy.url().should('include', '/dashboard');
  });
});

Test Automation

# GitHub Actions test automation
name: Test Suite
on: [push, pull_request]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:unit

  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:13
        env:
          POSTGRES_PASSWORD: postgres   # the image will not start without a password
        ports:
          - 5432:5432                   # expose the database to the job on the runner
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:integration

  e2e-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:e2e

Infrastructure as Code (IaC)

Configuration Management

Ansible

Agentless automation

---
- name: Configure web servers
  hosts: webservers
  become: yes
  
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present
    
    - name: Start nginx service
      service:
        name: nginx
        state: started
        enabled: yes
    
    - name: Deploy website
      copy:
        src: ./dist/
        dest: /var/www/html/

Puppet

Declarative configuration

class webserver {
  package { 'nginx':
    ensure => installed,
  }
  
  service { 'nginx':
    ensure  => running,
    enable  => true,
    require => Package['nginx'],
  }
  
  file { '/var/www/html/index.html':
    content => 'Hello World',
    require => Package['nginx'],
  }
}
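Ansible, Puppet and Chef share one idea: you declare the desired state, and the tool acts only when the current state differs, so a second run changes nothing. A toy Python sketch of that idempotent "ensure" step, where the system is just an in-memory set and no real package manager is called (`ensure_package` is a hypothetical name):

```python
def ensure_package(installed: set, name: str, state: str) -> bool:
    """Converge one package to the desired state; return True if anything changed."""
    if state == "present" and name not in installed:
        installed.add(name)        # stand-in for e.g. `apt-get install`
        return True
    if state == "absent" and name in installed:
        installed.discard(name)    # stand-in for e.g. `apt-get remove`
        return True
    return False                   # already in the desired state: do nothing

system = {"curl"}
print(ensure_package(system, "nginx", "present"))  # True  (first run installs)
print(ensure_package(system, "nginx", "present"))  # False (second run is a no-op)
```

The boolean return plays the same role as the "changed" status these tools report, which is what makes repeated runs safe to schedule.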

Chef

Ruby-based configuration

# Recipe
package 'nginx' do
  action :install
end

service 'nginx' do
  action [:enable, :start]
end

template '/etc/nginx/nginx.conf' do
  source 'nginx.conf.erb'
  notifies :restart, 'service[nginx]'
end

Infrastructure Provisioning

Terraform

Multi-cloud provisioning

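# main.tf
provider "aws" {
  region = "us-west-2"
}

# AMI IDs are region-specific, so look one up instead of hard-coding it
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t2.micro"

  tags = {
    Name        = "WebServer"
    Environment = "Production"
  }
}

# Bucket names are globally unique. ACLs are disabled on new buckets by default,
# so public access has to be granted separately (e.g. with a bucket policy)
resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets"
}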

Pulumi

IaC with real programming languages

import * as aws from "@pulumi/aws";

const bucket = new aws.s3.Bucket("my-bucket", {
    website: {
        indexDocument: "index.html",
    },
});

const instance = new aws.ec2.Instance("web-server", {
    ami: "ami-0c55b159cbfafe1f0", // AMI IDs are region-specific; use one valid in your configured region
    instanceType: "t2.micro",
});

export const bucketName = bucket.id;
export const instanceIp = instance.publicIp;

Containerisation & Orchestration

Docker in DevOps

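# Multi-stage build for production
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages are copied forward
RUN npm prune --omit=dev

# Same base as the builder so any native modules stay compatible
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/package*.json ./
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER node
EXPOSE 3000
CMD ["node", "dist/index.js"]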

Kubernetes Deployments

# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
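        image: myapp:1.0.0   # pin a version; re-applying the same :latest tag does not trigger a rollout
        ports:
        - containerPort: 3000
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:      # new pods only receive traffic once ready
          httpGet:
            path: /health
            port: 3000
          periodSeconds: 5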
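With `replicas: 3`, `maxSurge: 1` and `maxUnavailable: 0`, a rollout may run at most four pods and never drops below three available, so old pods are swapped out one at a time. A simplified Python sketch of those bounds for integer settings; it assumes new pods become ready immediately and ignores the percentage form Kubernetes also accepts (`rollout_bounds` and `simulate` are hypothetical helpers):

```python
def rollout_bounds(replicas: int, max_surge: int, max_unavailable: int) -> tuple:
    """Return (max total pods, min available pods) during a RollingUpdate."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return replicas + max_surge, replicas - max_unavailable

def simulate(replicas: int, max_surge: int, max_unavailable: int) -> list:
    """Step through a simplified rollout, returning (old, new) ready pods per step."""
    max_total, min_available = rollout_bounds(replicas, max_surge, max_unavailable)
    old, new, steps = replicas, 0, [(replicas, 0)]
    while new < replicas:
        # Surge: start as many new pods as the ceiling allows (assumed ready at once)
        new += min(max_total - (old + new), replicas - new)
        # Scale down old pods while keeping at least `min_available` ready
        old -= min(old, old + new - min_available)
        steps.append((old, new))
    return steps

print(rollout_bounds(3, 1, 0))  # (4, 3)
print(simulate(3, 1, 0))        # [(3, 0), (2, 1), (1, 2), (0, 3)]
```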

Helm Charts

Kubernetes package manager

# values.yaml
replicaCount: 3
image:
  repository: myapp
  tag: "1.0.0"
  pullPolicy: IfNotPresent

service:
  type: LoadBalancer
  port: 80

ingress:
  enabled: true
  hosts:
    - host: myapp.example.com
      paths:
        - path: /
          pathType: Prefix

Monitoring & Observability

The Three Pillars

Metrics

Numerical measurements over time

# Prometheus metrics
http_requests_total{method="GET", endpoint="/api/users"} 1234
http_request_duration_seconds{quantile="0.99"} 0.123
memory_usage_bytes 536870912
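Each line in this text format is a metric name, an optional set of `key="value"` labels and a sample value. A simplified Python parser for lines like the three above; it does not handle comments, timestamps or escaped quotes inside label values, which the full Prometheus exposition format allows, and `parse_sample` is a hypothetical name:

```python
import re

LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

def parse_sample(line: str):
    """Parse one sample line into (name, labels, value)."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL.findall(m.group("labels") or ""))
    return m.group("name"), labels, float(m.group("value"))

print(parse_sample('http_requests_total{method="GET", endpoint="/api/users"} 1234'))
# ('http_requests_total', {'method': 'GET', 'endpoint': '/api/users'}, 1234.0)
```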

Logs

Discrete events

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "ERROR",
  "service": "payment-service",
  "message": "Payment processing failed",
  "user_id": "12345",
  "error": "Insufficient funds",
  "trace_id": "abc-123-def"
}
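Emitting logs as one JSON object per line, like the record above, keeps fields such as `trace_id` queryable. A minimal sketch with Python's standard `logging` module; the `JsonFormatter` class and the `context` convention for extra fields are illustrative choices, not a standard API:

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": self.service,
            "message": record.getMessage(),
        }
        # Structured fields passed as logger.error(..., extra={"context": {...}})
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter(service="payment-service"))
logger = logging.getLogger("payments")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.error("Payment processing failed",
             extra={"context": {"user_id": "12345",
                                "error": "Insufficient funds",
                                "trace_id": "abc-123-def"}})
```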

Traces

Request flow through services

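User Request
  └─> API Gateway (2ms own time)
      └─> Auth Service (5ms)
      └─> User Service (25ms, including the database call)
          └─> Database (15ms)
      └─> Payment Service (50ms, including the external call)
          └─> External Payment API (45ms)
Total: 2 + 5 + 25 + 50 = 82ms (downstream calls made one after another)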
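Because a parent span's duration includes the time spent in its children, the number that points at a bottleneck is each span's self time. A Python sketch over a small span tree; the `Span` class is hypothetical, the durations are illustrative and chosen so no child outlasts its parent, and it assumes child calls run sequentially:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Span:
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)

    def self_time(self) -> float:
        """Time spent in this span itself, excluding (sequential) child calls."""
        return self.duration_ms - sum(c.duration_ms for c in self.children)

def walk(span: Span, depth: int = 0):
    """Yield (depth, span) pairs in call order."""
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)

trace = Span("API Gateway", 82, [
    Span("Auth Service", 5),
    Span("User Service", 25, [Span("Database", 15)]),
    Span("Payment Service", 50, [Span("External Payment API", 45)]),
])

for depth, span in walk(trace):
    print(f"{'  ' * depth}{span.name}: total {span.duration_ms}ms, self {span.self_time()}ms")
```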

Monitoring Stack

Prometheus + Grafana

Metrics and visualisation

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
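  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']   # Prometheus scraping itself

  - job_name: 'app'
    static_configs:
      - targets: ['localhost:3000']   # the app's /metrics endpoint (port assumed)

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']   # node_exporter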

ELK Stack

Elasticsearch, Logstash, Kibana

# logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
  
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
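The grok and date filters turn an Apache access-log line into structured fields plus a real timestamp. The same parse, sketched in Python with a hand-written regex for the Common Log Format; it approximates `%{COMMONAPACHELOG}` rather than reproducing it exactly, and `parse_line` is a hypothetical name:

```python
import re
from datetime import datetime

# host ident user [timestamp] "verb request HTTP/version" status bytes
COMMON_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)

def parse_line(line: str) -> dict:
    m = COMMON_LOG.match(line)
    if m is None:
        raise ValueError("line is not in Common Log Format")
    event = m.groupdict()
    # Equivalent of the date filter's "dd/MMM/yyyy:HH:mm:ss Z" pattern
    event["@timestamp"] = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return event

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
event = parse_line(line)
print(event["response"], event["@timestamp"].isoformat())  # 200 2000-10-10T13:55:36-07:00
```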

APM (Application Performance Monitoring)

Tools:

  • New Relic
  • Datadog
  • AppDynamics
  • Dynatrace
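
// Datadog APM example: dd-trace must be initialised before other modules load
const tracer = require('dd-trace').init();
const express = require('express');
const app = express();

// getUserFromDatabase() stands for your own data-access function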
app.get('/api/users/:id', (req, res) => {
  const span = tracer.startSpan('get_user');
  span.setTag('user.id', req.params.id);
  
  getUserFromDatabase(req.params.id)
    .then(user => {
      span.setTag('user.found', true);
      res.json(user);
    })
    .catch(error => {
      span.setTag('error', true);
      span.setTag('error.message', error.message);
      res.status(404).json({ error: 'User not found' });
    })
    .finally(() => {
      span.finish();
    });
});

Security in DevOps (DevSecOps)

Shift Left Security

Security early in the pipeline

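# Security scanning in CI
name: Security Scan
on: [push]

jobs:
  dependency-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Scan dependencies for known vulnerabilities
        run: |
          npm ci
          npm audit --audit-level=high
          npx snyk test
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}   # Snyk needs an API token

  container-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Scan Docker image
        # Assumes the trivy CLI is available on the runner
        run: trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:${{ github.sha }}

  secrets-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, so secrets in old commits are found too
      - name: Scan for secrets
        # Assumes the gitleaks CLI is available on the runner
        run: gitleaks detect --source .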
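Secret scanners such as gitleaks work largely by matching known credential formats across files and git history, often combined with entropy checks. A deliberately tiny Python sketch of the pattern-matching half with two example rules; the rule set and function names are hypothetical, and real tools ship far more rules and also scan history:

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Example rules only: the AWS access key ID shape (AKIA + 16 chars) and a rough
# heuristic for hard-coded passwords
RULES = {
    "aws-access-key-id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic-password": re.compile(r"(?i)password\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
}

def scan_text(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, rule name) for every line that matches a rule."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                yield lineno, rule

def scan_tree(root: str) -> list:
    """Scan every readable UTF-8 text file under `root`."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            findings += [(str(path), n, rule) for n, rule in scan_text(text)]
    return findings

sample = 'db_host = "localhost"\naws_key = "AKIAABCDEFGHIJKLMNOP"\n'
print(list(scan_text(sample)))  # [(2, 'aws-access-key-id')]
```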

Security Tools

SAST (Static Application Security Testing):

  • SonarQube
  • Checkmarx
  • Fortify

DAST (Dynamic Application Security Testing):

  • OWASP ZAP
  • Burp Suite
  • Acunetix

Dependency Scanning:

# Various tools (each must be installed first)
npm audit                                   # Node.js
safety check                                # Python
bundle audit                                # Ruby (bundler-audit gem)
mvn org.owasp:dependency-check-maven:check  # Java (OWASP Dependency-Check)

Secrets Management

HashiCorp Vault

# Store secret
vault kv put secret/myapp/db password=secretpass

# Retrieve secret
vault kv get secret/myapp/db

# Dynamic secrets (generates short-lived database credentials)
vault read database/creds/my-role

AWS Secrets Manager

import boto3

client = boto3.client('secretsmanager')

# Get secret
response = client.get_secret_value(SecretId='prod/myapp/db')
secret = response['SecretString']

Deployment Strategies

Blue-Green Deployment

Two identical environments

Current State:
  Blue (v1.0) ← Live Traffic
  Green (v2.0) ← Deploy & Test

Switch:
  Blue (v1.0) ← Standby
  Green (v2.0) ← Live Traffic

Rollback if needed:
  Blue (v1.0) ← Live Traffic
  Green (v2.0) ← Fix issues
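The essential mechanism is a single pointer recording which environment receives live traffic; switching and rolling back both just move that pointer. A Python sketch, assuming a health check is run against the idle environment before the switch (`BlueGreenRouter` and its methods are hypothetical):

```python
from typing import Callable, Dict

class BlueGreenRouter:
    def __init__(self, versions: Dict[str, str], live: str = "blue"):
        self.versions = versions   # environment -> deployed version
        self.live = live

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_idle(self, version: str) -> None:
        """Deploy to the environment that is NOT serving traffic."""
        self.versions[self.idle] = version

    def switch(self, healthy: Callable[[str], bool]) -> bool:
        """Point live traffic at the idle environment if it passes its checks."""
        if not healthy(self.idle):
            return False
        self.live = self.idle
        return True

    def rollback(self) -> None:
        """The previous version is still running, so rollback is another switch."""
        self.live = self.idle

router = BlueGreenRouter({"blue": "v1.0", "green": "v1.0"})
router.deploy_idle("v2.0")
router.switch(healthy=lambda env: True)
print(router.live, router.versions[router.live])   # green v2.0
router.rollback()
print(router.live, router.versions[router.live])   # blue v1.0
```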

Canary Deployment

Gradual rollout

Stage 1: 5% traffic to v2.0
Stage 2: 25% traffic to v2.0
Stage 3: 50% traffic to v2.0
Stage 4: 100% traffic to v2.0
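One common way to implement these percentages is to hash a stable key such as the user ID into 100 buckets, so each user keeps seeing the same version as the rollout widens; you would normally watch error rates before moving to the next stage. A Python sketch of that routing rule (SHA-256 bucketing is one reasonable choice among several; `bucket` and `route` are hypothetical names):

```python
import hashlib

STAGES = [5, 25, 50, 100]  # percentage of traffic sent to v2.0 at each stage

def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100

def route(user_id: str, canary_percent: int) -> str:
    """Users in buckets below the current percentage get the canary."""
    return "v2.0" if bucket(user_id) < canary_percent else "v1.0"

users = [f"user-{i}" for i in range(10_000)]
for percent in STAGES:
    share = sum(route(u, percent) == "v2.0" for u in users) / len(users)
    print(f"stage {percent:>3}%: {share:.1%} of users on v2.0")
```

Because the bucket never changes, a user moved to v2.0 at 5% stays on v2.0 at every later stage.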

Feature Flags

Deploy code without releasing features

// React example; featureFlag stands for your feature-flag client
function Checkout() {
  if (featureFlag.isEnabled('new-checkout')) {
    return <NewCheckoutFlow />;
  }
  return <OldCheckoutFlow />;
}
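Behind a call like `featureFlag.isEnabled(...)` there is usually a flag store that can change at runtime without a deploy. A minimal in-memory Python sketch; `Flag`, `FlagStore` and the allowlist rule are hypothetical stand-ins for a real flag service:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

@dataclass
class Flag:
    enabled: bool = False
    allow_users: Set[str] = field(default_factory=set)  # early-access users

class FlagStore:
    def __init__(self):
        self.flags: Dict[str, Flag] = {}

    def update_flag(self, name: str, flag: Flag) -> None:
        self.flags[name] = flag   # changed at runtime; no redeploy needed

    def is_enabled(self, name: str, user_id: Optional[str] = None) -> bool:
        flag = self.flags.get(name)
        if flag is None:
            return False          # unknown flags default to off (old behaviour)
        return flag.enabled or (user_id is not None and user_id in flag.allow_users)

flags = FlagStore()
flags.update_flag("new-checkout", Flag(enabled=False, allow_users={"qa-team"}))
print(flags.is_enabled("new-checkout", "qa-team"))   # True
print(flags.is_enabled("new-checkout", "customer"))  # False
```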

Automation Tools

Jenkins

The automation server

// Jenkinsfile
pipeline {
    agent any
    
    stages {
        stage('Build') {
            steps {
                sh 'npm install'
                sh 'npm run build'
            }
        }
        
        stage('Test') {
            parallel {
                stage('Unit Tests') {
                    steps {
                        sh 'npm run test:unit'
                    }
                }
                stage('Integration Tests') {
                    steps {
                        sh 'npm run test:integration'
                    }
                }
            }
        }
        
        stage('Deploy') {
            when {
                branch 'main'
            }
            steps {
                sh './deploy.sh'
            }
        }
    }
    
    post {
        always {
            cleanWs()
        }
        success {
            slackSend(message: "Build successful!")
        }
        failure {
            slackSend(message: "Build failed!")
        }
    }
}

GitLab CI/CD

Integrated CI/CD

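# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy

variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE

test:
  stage: test
  image: node:20
  script:
    - npm ci
    - npm test
  coverage: '/Coverage: \d+\.\d+%/'

deploy:
  stage: deploy
  image:
    name: bitnami/kubectl:latest   # any image with kubectl; cluster credentials must be configured
    entrypoint: [""]
  script:
    - kubectl set image deployment/app app=$DOCKER_IMAGE
  environment:
    name: production
    url: https://yourdomain.com
  rules:
    - if: $CI_COMMIT_BRANCH == "main"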

GitHub Actions

Native GitHub automation

name: CI/CD Pipeline
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    
    services:
      postgres:
        image: postgres:13
        env:
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: test
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      
      - name: Install dependencies
        run: npm ci
      
      - name: Run tests
        run: npm test
        env:
          DATABASE_URL: postgresql://postgres:postgres@localhost/test
      
      - name: Build application
        run: npm run build
      
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
  
  deploy:
    needs: build-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    
    steps:
      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          name: dist
      
      - name: Deploy to production
        run: |
          echo "Deploying to production..."
          # Deployment commands here

Site Reliability Engineering (SRE)

SLIs, SLOs and SLAs

Measuring reliability

SLI (Service Level Indicator):

Availability = (Successful Requests / Total Requests) × 100
Latency P99 = 99% of requests complete within X ms

SLO (Service Level Objective):

- 99.9% availability (about 43.2 minutes of downtime per 30-day month)
- P99 latency < 200ms
- Error rate < 0.1%
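Both SLIs are straightforward to compute from request counts and latency samples that you would normally pull from your metrics system. A Python sketch using the nearest-rank definition of a percentile (one of several definitions in use); the function names and sample data are hypothetical:

```python
import math
from typing import List

def availability(successful: int, total: int) -> float:
    """Availability SLI: successful requests as a percentage of all requests."""
    return 100.0 * successful / total if total else 100.0

def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile, e.g. p=99 gives the P99 latency."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def meets_slo(successful: int, total: int, latencies_ms: List[float]) -> dict:
    """Compare measured SLIs with the example SLOs above."""
    return {
        "availability_ok": availability(successful, total) >= 99.9,
        "latency_ok": percentile(latencies_ms, 99) < 200,
    }

latencies = [float(ms) for ms in range(1, 201)]  # 1..200 ms
print(availability(999_412, 1_000_000))          # 99.9412
print(percentile(latencies, 99))                  # 198.0
print(meets_slo(999_412, 1_000_000, latencies))
```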

SLA (Service Level Agreement):

If availability < 99.9%:
  - 10% credit for 99.0-99.9%
  - 25% credit for 95.0-99.0%
  - 50% credit for < 95.0%
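The credit tiers form a step function of measured availability. A Python sketch of that lookup; the tiers are the example ones above, and treating each lower bound as inclusive is an assumption (`sla_credit` is a hypothetical name):

```python
def sla_credit(availability_percent: float) -> int:
    """Service credit (% of the monthly bill) for the example tiers above."""
    if availability_percent >= 99.9:
        return 0
    if availability_percent >= 99.0:
        return 10
    if availability_percent >= 95.0:
        return 25
    return 50

for a in (99.95, 99.5, 97.0, 90.0):
    print(a, sla_credit(a))
```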

Error Budgets

Balancing reliability and velocity

Monthly Error Budget = (1 - SLO) × Total Time
For a 99.9% SLO over a 30-day month: 0.1% × 43,200 minutes = 43.2 minutes

If error budget consumed:
  - Freeze feature releases
  - Focus on reliability improvements
  - Post-mortem for incidents
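Tracking how much of the budget is left makes the freeze rule mechanical. A Python sketch, assuming downtime minutes are already measured and using a 30-day month; freezing only once the budget is fully spent follows the policy above, though many teams alert earlier (`budget_status` is a hypothetical name):

```python
def error_budget_minutes(slo_percent: float, period_minutes: int = 43_200) -> float:
    """(1 - SLO) x total time; 43,200 minutes is a 30-day month."""
    return (1 - slo_percent / 100) * period_minutes

def budget_status(slo_percent: float, downtime_minutes: float) -> dict:
    budget = error_budget_minutes(slo_percent)
    remaining = budget - downtime_minutes
    return {
        "budget_min": round(budget, 1),
        "remaining_min": round(remaining, 1),
        "consumed_pct": round(100 * downtime_minutes / budget, 1),
        "freeze_releases": remaining <= 0,
    }

print(budget_status(99.9, downtime_minutes=30))
# {'budget_min': 43.2, 'remaining_min': 13.2, 'consumed_pct': 69.4, 'freeze_releases': False}
print(budget_status(99.9, downtime_minutes=50)["freeze_releases"])  # True
```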

Chaos Engineering

Breaking things on purpose

# Illustrative chaos experiment config (tool-agnostic; not Chaos Monkey's actual schema)
chaos:
  enabled: true
  schedule: "0 9-17 * * 1-5"  # On the hour, 09:00-17:00, Monday-Friday
  
  experiments:
    - name: "Random Pod Failure"
      type: "pod-failure"
      probability: 0.1
    
    - name: "Network Latency"
      type: "network-delay"
      delay: "100ms"
      probability: 0.2
    
    - name: "CPU Stress"
      type: "resource-stress"
      cpu: 80
      duration: "5m"
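At its core, each experiment above says "with probability p, inject this fault", restricted to a schedule window. A Python sketch of an application-level fault-injecting wrapper plus a check against the hour and weekday ranges of the cron expression; the `chaos` decorator and its parameters are hypothetical, and real chaos tools usually inject faults at the infrastructure level instead:

```python
import functools
import random
import time
from datetime import datetime

def in_schedule(now: datetime) -> bool:
    """Match the hour and weekday fields of "0 9-17 * * 1-5": hours 9-17, Mon-Fri."""
    return now.weekday() < 5 and 9 <= now.hour <= 17   # weekday(): Monday == 0

def chaos(probability: float, delay_s: float = 0.0, fail: bool = False, rng=random.random):
    """Wrap a function so that, with the given probability, it is slowed down or fails."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if rng() < probability:
                if delay_s:
                    time.sleep(delay_s)                            # network-delay style fault
                if fail:
                    raise RuntimeError("chaos: injected failure")  # pod-failure style fault
            return func(*args, **kwargs)
        return wrapper
    return decorator

@chaos(probability=0.1, fail=True)
def get_user(user_id: str) -> dict:
    return {"id": user_id}

print(in_schedule(datetime(2025, 8, 15, 10, 0)))  # True  (a Friday)
print(in_schedule(datetime(2025, 8, 16, 10, 0)))  # False (a Saturday)
```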

Documentation as Code

API Documentation

# OpenAPI/Swagger
openapi: 3.0.0
info:
  title: User API
  version: 1.0.0

paths:
  /users/{id}:
    get:
      summary: Get user by ID
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: integer
      responses:
        '200':
          description: User found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'

components:
  schemas:
    User:
      type: object
      properties:
        id:
          type: integer
        name:
          type: string
        email:
          type: string

Architecture Decision Records (ADRs)

# ADR-001: Use PostgreSQL for primary database

## Status
Accepted

## Context
We need a reliable, scalable database for our application.

## Decision
We will use PostgreSQL as our primary database.

## Consequences
- Strong consistency guarantees
- Excellent JSON support
- Requires DBA knowledge for optimisation
- Need to plan for scaling (read replicas, partitioning)

DevOps Metrics

DORA Metrics

Key performance indicators

  1. Deployment Frequency
     • How often code is deployed to production
  2. Lead Time for Changes
     • Time from commit to running in production
  3. Mean Time to Recovery (MTTR)
     • Time to restore service after a failure
  4. Change Failure Rate
     • Percentage of deployments that cause a failure in production
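All four metrics can be derived from a list of deployment records. A Python sketch, assuming each record carries the commit time, deploy time, whether it caused a failure and when service was restored; the `Deployment` fields are hypothetical, recovery is measured from the deploy as an approximation of when the failure began, and using a median for lead time is a common but not mandatory choice:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional

@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    caused_failure: bool = False
    restored_at: Optional[datetime] = None   # set once a failure was resolved

def dora_metrics(deploys: List[Deployment], period_days: int) -> dict:
    if not deploys:
        raise ValueError("need at least one deployment")
    failures = [d for d in deploys if d.caused_failure]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recovery": (sum(recoveries, timedelta()) / len(recoveries)
                                  if recoveries else None),
    }

t = datetime(2025, 8, 1, 9, 0)
history = [
    Deployment(t, t + timedelta(hours=2)),
    Deployment(t + timedelta(days=1), t + timedelta(days=1, hours=4),
               caused_failure=True, restored_at=t + timedelta(days=1, hours=5)),
    Deployment(t + timedelta(days=2), t + timedelta(days=2, hours=3)),
    Deployment(t + timedelta(days=3), t + timedelta(days=3, hours=1)),
]
m = dora_metrics(history, period_days=7)
print(m["median_lead_time"], m["change_failure_rate"], m["mean_time_to_recovery"])
# 2:30:00 0.25 1:00:00
```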

Measuring Success

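// Track deployment metrics
// `metrics` (your metrics client) and `deploymentTime` (the measured duration)
// are placeholders for whatever your tooling provides
const deployment = {
  timestamp: new Date(),
  version: process.env.VERSION,
  duration: deploymentTime,
  status: 'success',
  rollback: false
};

metrics.record('deployment.frequency', 1); // one event per deploy, aggregated into a rate
metrics.record('deployment.duration', deployment.duration);
metrics.record('deployment.success_rate', deployment.status === 'success' ? 1 : 0);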

Getting Started with DevOps

Personal DevOps Project

Step 1: Set up version control

git init
git remote add origin https://github.com/username/project

Step 2: Create CI pipeline

# Simple CI starter
name: CI
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm test

Step 3: Add monitoring

// Basic health check (Express)
const express = require('express');
const app = express();

app.get('/health', (req, res) => {
  res.json({
    status: 'healthy',
    timestamp: new Date(),
    uptime: process.uptime()
  });
});

app.listen(3000);

Step 4: Automate deployment

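#!/bin/bash
# Simple deployment script
set -euo pipefail   # stop at the first failing command
# REGISTRY is a placeholder; a unique tag per commit makes Kubernetes roll out the new image
REGISTRY="registry.example.com/myteam"
IMAGE="$REGISTRY/myapp:$(git rev-parse --short HEAD)"
npm test
docker build -t "$IMAGE" .
docker push "$IMAGE"
kubectl set image deployment/app app="$IMAGE"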

Summary

DevOps transforms how teams build, deploy and operate software. It's not just about tools; it's about culture, collaboration and continuous improvement.

Key takeaways:

  • Automate everything that can be automated
  • Measure everything that matters
  • Fail fast, recover faster
  • Security is everyone's responsibility
  • Documentation is part of the code