
SQL Databases

Deep dive into SQL database systems, relational theory and practical implementation strategies

Last updated: 8/15/2025

Master the world of SQL databases - from relational theory and ACID properties to advanced querying, optimisation and enterprise database management.

What are SQL Databases?

The Relational Database Foundation

Structured data storage with mathematical precision

SQL databases are built on the foundation of relational algebra and set theory. They store data in tables of rows and columns, and relationships between records are expressed through shared key values rather than physical pointers. This mathematical foundation provides predictable behaviour, strong data integrity and powerful querying capabilities.

Real-world analogy: Think of SQL databases like a well-organised filing cabinet system. Each drawer (table) contains related documents (records), and you can create connections between documents using reference numbers (foreign keys). Constraints keep those references honest: the system can refuse to remove a document that others still point to, or cascade the change to every reference, so the filing system stays consistent.

Relational Database Theory

The Mathematical Foundation

Set theory and relational algebra

Relational model principles:

  • Tables (Relations): Collections of tuples (rows)
  • Attributes (Columns): Properties of entities
  • Tuples (Rows): Individual records
  • Domains: Sets of valid values for attributes
  • Keys: Unique identifiers for records

Relational algebra operations:

-- Selection (WHERE clause)
σ(condition)(table)

-- Projection (SELECT columns)
π(columns)(table)

-- Join (combining tables)
table1 ⋈ table2

-- Union (combining result sets)
table1 ∪ table2

-- Difference (records in table1 but not table2)
table1 - table2
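These algebraic operations map directly onto SQL clauses: selection becomes WHERE, projection becomes the SELECT column list, and set union becomes UNION. A minimal runnable sketch using SQLite via Python's built-in sqlite3 module (the employees table and its data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    INSERT INTO employees VALUES (1, 'Ada', 'eng'), (2, 'Grace', 'eng'), (3, 'Alan', 'ops');
""")

# Selection σ(dept = 'eng') → the WHERE clause
eng_rows = conn.execute("SELECT * FROM employees WHERE dept = 'eng'").fetchall()
print(len(eng_rows))  # 2

# Projection π(name) → the column list in SELECT
names = [r[0] for r in conn.execute("SELECT name FROM employees ORDER BY name")]
print(names)  # ['Ada', 'Alan', 'Grace']

# Union → UNION (duplicates removed, exactly as in set theory)
union = conn.execute("SELECT dept FROM employees UNION SELECT 'hr'").fetchall()
print(sorted(d for (d,) in union))  # ['eng', 'hr', 'ops']
```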

Normalisation Theory

Eliminating data redundancy

First Normal Form (1NF):

  • Each attribute contains atomic values
  • No repeating groups or arrays
  • Primary key identified

Example of 1NF violation:

-- VIOLATION: Repeating groups
CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    phone_numbers VARCHAR(500) -- "555-1234, 555-5678"
);

-- CORRECT: Atomic values
CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE user_phones (
    id INT PRIMARY KEY,
    user_id INT REFERENCES users(id),
    phone_number VARCHAR(20)
);

Second Normal Form (2NF):

  • In 1NF
  • All non-key attributes depend on the entire primary key
  • No partial dependencies

Example of 2NF violation:

-- VIOLATION: Partial dependency
CREATE TABLE orders (
    order_id INT,
    product_id INT,
    product_name VARCHAR(100), -- Depends only on product_id
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

-- CORRECT: Separate tables
CREATE TABLE orders (
    order_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);

CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

Third Normal Form (3NF):

  • In 2NF
  • No transitive dependencies
  • Non-key attributes don't depend on other non-key attributes

Example of 3NF violation:

-- VIOLATION: Transitive dependency
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    department_name VARCHAR(100), -- Depends on department_id
    department_location VARCHAR(100) -- Depends on department_id
);

-- CORRECT: Eliminate transitive dependency
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT REFERENCES departments(id)
);

CREATE TABLE departments (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    location VARCHAR(100)
);
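Once in 3NF, department facts are stored exactly once and retrieved through a join, so renaming or relocating a department is a single-row update. A runnable sketch of the corrected schema using SQLite via Python's sqlite3 module (sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        department_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Engineering', 'Berlin');
    INSERT INTO employees VALUES (10, 'Ada', 1), (11, 'Grace', 1);
""")

# Department name and location live in one place; employees just hold the id
rows = conn.execute("""
    SELECT e.name, d.name, d.location
    FROM employees e
    JOIN departments d ON e.department_id = d.id
    ORDER BY e.name
""").fetchall()
print(rows)  # [('Ada', 'Engineering', 'Berlin'), ('Grace', 'Engineering', 'Berlin')]

# Moving the department touches one row, not one row per employee
conn.execute("UPDATE departments SET location = 'Munich' WHERE id = 1")
```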

SQL Database Types

Traditional RDBMS

Mature, battle-tested systems

PostgreSQL:

  • Strengths: ACID compliance, advanced features, extensibility
  • Use cases: Complex applications, data warehousing, GIS
  • Advanced features: JSON support, full-text search, custom functions
  • Extensions: PostGIS (spatial), pgvector (vector search)

MySQL:

  • Strengths: Speed, reliability, wide adoption
  • Use cases: Web applications, online transaction processing
  • Storage engines: InnoDB (ACID, the default), MyISAM (legacy, non-transactional), Memory
  • Clustering: MySQL Cluster, Group Replication

SQL Server:

  • Strengths: Microsoft ecosystem integration, enterprise features
  • Use cases: Windows applications, business intelligence
  • Advanced features: Always On, In-Memory OLTP, PolyBase
  • Cloud: Azure SQL Database, Managed Instance

Oracle Database:

  • Strengths: Enterprise-grade, high performance, advanced security
  • Use cases: Large enterprises, mission-critical applications
  • Advanced features: Real Application Clusters, Data Guard
  • Cloud: Oracle Cloud Infrastructure

Modern SQL Databases

Next-generation relational systems

CockroachDB:

  • Features: Distributed SQL, global consistency
  • Use cases: Multi-region applications, cloud-native
  • Compatibility: PostgreSQL wire protocol
  • Scaling: Horizontal scaling with ACID transactions

TiDB:

  • Features: MySQL compatible, distributed architecture
  • Use cases: Large-scale applications, HTAP workloads
  • Architecture: Hybrid transactional/analytical processing
  • Scaling: Automatic sharding and rebalancing

YugabyteDB:

  • Features: PostgreSQL compatible, distributed SQL
  • Use cases: Cloud-native applications, microservices
  • Consistency: Strong consistency with global distribution
  • Performance: High throughput with low latency

Advanced SQL Features

Window Functions

Analytical queries with context

Basic window functions:

-- Row numbering within partitions
SELECT 
    name,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank_in_dept
FROM employees;

-- Running totals
SELECT 
    date,
    sales,
    SUM(sales) OVER (ORDER BY date) as running_total
FROM daily_sales;

-- Moving averages
SELECT 
    date,
    sales,
    AVG(sales) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as moving_avg
FROM daily_sales;

Advanced window functions:

-- Lead and lag for time series analysis
SELECT 
    date,
    sales,
    LAG(sales, 1) OVER (ORDER BY date) as previous_day_sales,
    LEAD(sales, 1) OVER (ORDER BY date) as next_day_sales,
    (sales - LAG(sales, 1) OVER (ORDER BY date)) / LAG(sales, 1) OVER (ORDER BY date) * 100 as growth_percent
FROM daily_sales;
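The same window functions run in SQLite, so the pattern can be tried without standing up a server. A minimal sketch via Python's sqlite3 module, combining a running total with LAG (the table and figures are invented):

```python
import sqlite3  # window functions need SQLite 3.25+, bundled with all recent Pythons

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales (date TEXT, sales INTEGER);
    INSERT INTO daily_sales VALUES
        ('2024-01-01', 100), ('2024-01-02', 150), ('2024-01-03', 120);
""")

# Running total and previous-day value computed in a single pass over the table
rows = conn.execute("""
    SELECT date,
           sales,
           SUM(sales) OVER (ORDER BY date) AS running_total,
           LAG(sales) OVER (ORDER BY date) AS previous_day
    FROM daily_sales
    ORDER BY date
""").fetchall()
for r in rows:
    print(r)
# ('2024-01-01', 100, 100, None)
# ('2024-01-02', 150, 250, 100)
# ('2024-01-03', 120, 370, 150)
```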

Common Table Expressions (CTEs)

Recursive and reusable queries

Basic CTEs:

WITH monthly_totals AS (
    SELECT 
        DATE_TRUNC('month', order_date) as month,
        SUM(total_amount) as monthly_revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
),
monthly_growth AS (
    SELECT 
        month,
        monthly_revenue,
        LAG(monthly_revenue) OVER (ORDER BY month) as prev_month_revenue,
        (monthly_revenue - LAG(monthly_revenue) OVER (ORDER BY month)) / LAG(monthly_revenue) OVER (ORDER BY month) * 100 as growth_percent
    FROM monthly_totals
)
SELECT * FROM monthly_growth WHERE growth_percent > 10;

Recursive CTEs:

-- Employee hierarchy
WITH RECURSIVE employee_hierarchy AS (
    -- Base case: top-level employees
    SELECT id, name, manager_id, 1 as level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case: subordinates
    SELECT e.id, e.name, e.manager_id, eh.level + 1
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT 
    REPEAT('  ', level - 1) || name as hierarchy_display,
    level
FROM employee_hierarchy
ORDER BY level, name;
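Recursive CTEs are standard SQL and work the same way in SQLite, which makes the hierarchy query above easy to experiment with locally. A runnable sketch via Python's sqlite3 module (the three-person org chart is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Alice', NULL),  -- top of the hierarchy
        (2, 'Bob', 1),
        (3, 'Carol', 2);
""")

# Base case seeds the CTE; the recursive case walks down one level per iteration
rows = conn.execute("""
    WITH RECURSIVE employee_hierarchy AS (
        SELECT id, name, manager_id, 1 AS level
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, e.manager_id, eh.level + 1
        FROM employees e
        JOIN employee_hierarchy eh ON e.manager_id = eh.id
    )
    SELECT name, level FROM employee_hierarchy ORDER BY level
""").fetchall()
print(rows)  # [('Alice', 1), ('Bob', 2), ('Carol', 3)]
```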

JSON and XML Support

Modern data types in SQL

PostgreSQL JSON operations:

-- JSON column with operators
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    attributes JSONB
);

-- Insert JSON data
INSERT INTO products (name, attributes) VALUES (
    'Laptop',
    '{"brand": "Dell", "cpu": "Intel i7", "ram": "16GB", "storage": "512GB SSD"}'
);

-- Query JSON data
SELECT name, attributes->>'brand' as brand
FROM products
WHERE attributes->>'cpu' = 'Intel i7';

-- JSON path queries
SELECT name, attributes->'storage' as storage
FROM products
WHERE attributes @> '{"ram": "16GB"}';

-- JSON aggregation
SELECT 
    attributes->>'brand' as brand,
    COUNT(*) as product_count,
    AVG((attributes->>'price')::numeric) as avg_price
FROM products
GROUP BY attributes->>'brand';
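The ->> and @> operators are PostgreSQL-specific, but the equivalent lookups can be tried locally with the json1 functions compiled into most SQLite builds (here attributes is plain TEXT holding JSON). A hedged sketch via Python's sqlite3 module:

```python
import sqlite3  # json1 functions ship with most SQLite builds

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, attributes TEXT);
    INSERT INTO products (name, attributes) VALUES
        ('Laptop',  '{"brand": "Dell", "cpu": "Intel i7", "ram": "16GB"}'),
        ('Desktop', '{"brand": "Dell", "cpu": "Intel i5", "ram": "32GB"}');
""")

# json_extract plays the role of PostgreSQL's ->> operator
rows = conn.execute("""
    SELECT name, json_extract(attributes, '$.brand')
    FROM products
    WHERE json_extract(attributes, '$.cpu') = 'Intel i7'
""").fetchall()
print(rows)  # [('Laptop', 'Dell')]
```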

Performance Optimisation

Indexing Strategies

Speed up your queries

B-tree indexes:

-- Single column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters)
CREATE INDEX idx_users_name_email ON users(last_name, first_name, email);

-- Partial index (only index relevant rows)
CREATE INDEX idx_active_users ON users(id) WHERE status = 'active';

-- Covering index (all needed columns in the index; PostgreSQL 11+ also offers INCLUDE)
CREATE INDEX idx_user_profile ON users(id, first_name, last_name, email, status);

-- Unique index with constraint
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);
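Whether a query actually uses an index is cheap to verify. A small sketch with SQLite's EXPLAIN QUERY PLAN (a much terser cousin of the PostgreSQL/MySQL EXPLAIN output) via Python's sqlite3 module; table and index names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, status TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users(email)")

# Each plan row's last column is a human-readable description of the access path
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE email = 'a@example.com'"
).fetchall()
print(plan[0][3])  # the detail mentions idx_users_email: an index search, not a full scan
```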

Specialised indexes:

-- Full-text search index
CREATE INDEX idx_products_description_fts ON products USING gin(to_tsvector('english', description));

-- Spatial index (PostGIS)
CREATE INDEX idx_locations_geom ON locations USING gist(geom);

-- Hash index for equality comparisons
CREATE INDEX idx_users_id_hash ON users USING hash(id);

-- BRIN index for large tables with natural ordering
CREATE INDEX idx_orders_date_brin ON orders USING brin(order_date);

Query Optimisation

Making your SQL faster

EXPLAIN analysis:

-- PostgreSQL detailed analysis
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC;

-- MySQL query analysis
EXPLAIN FORMAT=JSON
SELECT u.name, COUNT(o.id) as order_count
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC;

Query optimisation techniques:

-- Use EXISTS instead of IN for large datasets
SELECT u.name
FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.user_id = u.id AND o.total_amount > 1000
);

-- Avoid SELECT * - specify needed columns
SELECT u.id, u.name, u.email
FROM users u
WHERE u.status = 'active';

-- Use LIMIT with ORDER BY for pagination
SELECT id, name, created_at
FROM users
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;

-- Use appropriate JOIN types
SELECT u.name, o.total_amount
FROM users u
LEFT JOIN orders o ON u.id = o.user_id  -- Include users without orders
WHERE u.status = 'active';
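Two of the techniques above, sketched end to end with SQLite via Python's sqlite3 module (schema and data invented): EXISTS short-circuits on the first matching order, and LEFT JOIN keeps users with no orders at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total_amount REAL);
    INSERT INTO users VALUES (1, 'Ada', 'active'), (2, 'Bob', 'active');
    INSERT INTO orders VALUES (100, 1, 1500.0);
""")

# EXISTS needs only one matching order per user to return TRUE
big_spenders = conn.execute("""
    SELECT u.name FROM users u
    WHERE EXISTS (SELECT 1 FROM orders o
                  WHERE o.user_id = u.id AND o.total_amount > 1000)
""").fetchall()
print(big_spenders)  # [('Ada',)]

# LEFT JOIN keeps users without orders; total_amount comes back NULL for them
rows = conn.execute("""
    SELECT u.name, o.total_amount
    FROM users u LEFT JOIN orders o ON u.id = o.user_id
    ORDER BY u.name
""").fetchall()
print(rows)  # [('Ada', 1500.0), ('Bob', None)]
```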

Database Administration

User Management

Controlling access and permissions

Creating and managing users:

-- PostgreSQL user management
CREATE USER app_user WITH PASSWORD 'secure_password';
CREATE USER readonly_user WITH PASSWORD 'readonly_pass';

-- Grant permissions
GRANT CONNECT ON DATABASE myapp TO app_user;
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;

-- Create role for specific permissions
CREATE ROLE data_analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO data_analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO data_analyst;

Row-level security:

-- Enable RLS on table
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Create policy for user orders
-- (current_user_id() is an application-defined helper, not a PostgreSQL built-in)
CREATE POLICY user_orders ON orders
    FOR ALL
    USING (user_id = current_user_id());

-- Create policy for managers
CREATE POLICY manager_orders ON orders
    FOR ALL
    USING (
        EXISTS (
            SELECT 1 FROM employees e
            WHERE e.id = current_user_id()
            AND e.role = 'manager'
        )
    );

Backup and Recovery

Protecting your data

PostgreSQL backup strategies:

# Logical backup (pg_dump)
pg_dump -h localhost -U username -d database_name > backup.sql

# Physical backup (pg_basebackup)
pg_basebackup -h localhost -U username -D /backup/path -Ft -z -P

# Continuous archiving setup
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/%f'

MySQL backup strategies:

# Logical backup (mysqldump)
mysqldump -h localhost -u username -p database_name > backup.sql

# Physical backup (Percona XtraBackup)
xtrabackup --backup --target-dir=/backup/path

# Binary log backup
mysqlbinlog --start-datetime="2024-01-01 00:00:00" \
    --stop-datetime="2024-01-02 00:00:00" \
    /var/lib/mysql/mysql-bin.* > binlog_backup.sql

Enterprise Features

High Availability

Keeping your database running

PostgreSQL replication:

-- Primary server configuration
# postgresql.conf
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3

# pg_hba.conf
host replication replicator 192.168.1.0/24 md5

-- Standby server configuration
# PostgreSQL 12+ removed recovery.conf: create an empty standby.signal
# file in the data directory, then set these in postgresql.conf
primary_conninfo = 'host=192.168.1.10 port=5432 user=replicator password=repl_pass'
restore_command = 'cp /archive/%f %p'

MySQL Group Replication:

-- Enable Group Replication plugin
INSTALL PLUGIN group_replication SONAME 'group_replication.so';

-- Configure group replication
SET GLOBAL group_replication_group_name = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa";
SET GLOBAL group_replication_start_on_boot = OFF;
SET GLOBAL group_replication_local_address = "192.168.1.10:33061";
SET GLOBAL group_replication_group_seeds = "192.168.1.10:33061,192.168.1.11:33061,192.168.1.12:33061";

-- Start group replication
START GROUP_REPLICATION;

Partitioning

Managing large tables efficiently

Range partitioning:

-- Partition by date range
CREATE TABLE orders (
    id SERIAL,
    order_date DATE,
    customer_id INT,
    total_amount DECIMAL(10,2)
) PARTITION BY RANGE (order_date);

-- Create partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_2024_02 PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Automatic partition creation
CREATE OR REPLACE FUNCTION create_monthly_partition(table_name text, start_date date)
RETURNS void AS $$
DECLARE
    partition_name text;
    end_date date;
BEGIN
    partition_name := table_name || '_' || to_char(start_date, 'YYYY_MM');
    end_date := start_date + interval '1 month';
    
    EXECUTE format('CREATE TABLE %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
                   partition_name, table_name, start_date, end_date);
END;
$$ LANGUAGE plpgsql;

Hash partitioning:

-- Partition by user ID hash
CREATE TABLE user_data (
    id SERIAL,
    user_id INT,
    data_type VARCHAR(50),
    data_value TEXT
) PARTITION BY HASH (user_id);

-- Create hash partitions
CREATE TABLE user_data_0 PARTITION OF user_data FOR VALUES WITH (modulus 4, remainder 0);
CREATE TABLE user_data_1 PARTITION OF user_data FOR VALUES WITH (modulus 4, remainder 1);
CREATE TABLE user_data_2 PARTITION OF user_data FOR VALUES WITH (modulus 4, remainder 2);
CREATE TABLE user_data_3 PARTITION OF user_data FOR VALUES WITH (modulus 4, remainder 3);
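The routing rule behind hash partitioning fits in a few lines of Python. This is a conceptual sketch only: PostgreSQL applies its own internal hash function to the key before taking the remainder, so a bare modulus stands in for it here.

```python
# Conceptual illustration of hash-partition routing (NOT PostgreSQL's actual hash):
# hash(user_id) mod modulus selects which partition receives the row.
def route_to_partition(user_id: int, modulus: int = 4) -> str:
    remainder = user_id % modulus  # stand-in for the real hash function
    return f"user_data_{remainder}"

print(route_to_partition(7))   # user_data_3
print(route_to_partition(12))  # user_data_0
```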

Getting Started

Setting Up Your First SQL Database

Step 1: Choose your database system

  • PostgreSQL: Advanced features, extensibility
  • MySQL: Speed, wide adoption
  • SQLite: Embedded, lightweight
  • SQL Server: Windows integration

Step 2: Install and configure

  • Install database server
  • Create initial database
  • Set up users and permissions
  • Configure basic settings

Step 3: Design your schema

  • Identify entities and relationships
  • Normalise your data
  • Plan indexes and constraints
  • Consider performance implications

Step 4: Implement and test

  • Create tables and relationships
  • Insert sample data
  • Test queries and performance
  • Optimise based on results
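Steps 3 and 4 can be walked through end to end with SQLite via Python's sqlite3 module; the author/book schema below is invented purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Step 3: entities, a relationship, a constraint and an index
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    CREATE INDEX idx_books_author ON books(author_id);
""")

# Step 4: insert sample data and test a representative query
conn.execute("INSERT INTO authors VALUES (1, 'Octavia Butler')")
conn.execute("INSERT INTO books (title, author_id) VALUES ('Kindred', 1)")
rows = conn.execute("""
    SELECT a.name, COUNT(b.id)
    FROM authors a LEFT JOIN books b ON b.author_id = a.id
    GROUP BY a.id
""").fetchall()
print(rows)  # [('Octavia Butler', 1)]
```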

Learning Path

Beginner:

  1. Learn basic SQL syntax
  2. Understand relational concepts
  3. Practice with simple queries
  4. Design basic schemas

Intermediate:

  1. Master advanced SQL features
  2. Learn indexing strategies
  3. Understand query optimisation
  4. Implement security measures

Advanced:

  1. Design high-performance schemas
  2. Implement replication and clustering
  3. Optimise for scale
  4. Manage enterprise deployments

Learning Resources

Documentation

  • PostgreSQL Documentation
  • MySQL Documentation
  • SQL Server Documentation

Online Courses

  • SQL for Data Science courses
  • Database Design courses

Books

  • SQL Performance Explained
  • Database Design for Mere Mortals
  • PostgreSQL: Up and Running
  • High Performance MySQL

Summary

SQL databases remain the foundation of modern data management, providing reliability, consistency and powerful querying capabilities.

Key takeaways:

  • Understand relational theory and normalisation
  • Choose the right SQL database for your needs
  • Implement proper indexing and optimisation strategies
  • Use advanced SQL features for complex queries
  • Plan for high availability and scalability
  • Maintain security and compliance standards

Remember: SQL databases are mature, reliable and powerful. Master the fundamentals first, then explore advanced features to build robust, scalable applications!
