API Rate Limiting
Understanding API rate limiting - how to control request frequency, prevent abuse, and maintain system performance
Last updated: 8/15/2025
Imagine your favourite coffee shop. If everyone rushed the counter at once shouting orders, chaos would break out. Instead, the shop serves customers one at a time, keeping things smooth and fair.
That's exactly why APIs use rate limiting.
This article explores how rate limiting works, why it's essential for API performance and security, and how to implement and work with rate-limited APIs effectively.
What Is Rate Limiting?
Rate limiting means setting a maximum number of requests a user (or system) can make in a certain amount of time. It's like having a bouncer at a nightclub who controls how many people can enter at once.
Example Rules You Might See:
- 100 requests per minute per user
- 10,000 requests per day per application
- 5 requests per second per IP address
- 1,000 requests per hour per API key
The Bouncer Analogy:
- You can get in, but not all at once
- If you try to rush past, you'll get stopped
- There's a queue system to keep things orderly
- VIP customers (authenticated users) might get priority access
Why Do APIs Limit You?
Fairness
Rate limiting prevents one user or application from hogging all the available resources. Without it, a single client could make thousands of requests while others wait in line.
Example Scenario:
User A: Makes 1,000 requests in 1 minute
User B: Makes 10 requests in 1 minute
User C: Makes 5 requests in 1 minute
Without rate limiting: User A dominates the system
With rate limiting: All users get fair access
Performance
APIs need to maintain consistent response times and prevent servers from melting under heavy load. Rate limiting ensures the system can handle requests efficiently.
Performance Benefits:
- Consistent response times
- Predictable server load
- Better user experience
- Reduced server crashes
Security
Rate limiting blunts brute-force attacks, in which attackers churn through huge numbers of automated password guesses or try to overwhelm systems with automated requests.
Security Threats Prevented:
- Password brute forcing
- DDoS attacks
- API abuse and scraping
- Account enumeration attacks
Cost Control
Cloud services charge for bandwidth and compute resources. Rate limiting keeps cloud bills predictable by preventing excessive usage.
Cost Considerations:
- Bandwidth usage limits
- Compute resource consumption
- Database query costs
- Third-party service charges
How Rate Limiting Works
Request Tracking
When you hit an API, the server tracks how many requests you've made within the specified time window. This tracking happens in real-time using various storage mechanisms.
Tracking Methods:
// In-memory tracking (simple but not scalable)
const requestCounts = new Map();
// Redis tracking (scalable and persistent)
const redis = require('redis');
const client = redis.createClient();
// Database tracking (persistent across server restarts)
const db = require('./database');
Rate Limit Headers
When you exceed the limit, the API responds with an HTTP error code and helpful headers:
HTTP Status Code:
- 429 Too Many Requests - You've exceeded the limit
Rate Limit Headers:
X-RateLimit-Limit: 100 # Your maximum requests per minute
X-RateLimit-Remaining: 0 # You've used them all up
X-RateLimit-Reset: 1699999999 # Unix timestamp when you can try again
X-RateLimit-Reset-Time: 2024-01-15T10:30:00Z # Human-readable reset time
Retry-After: 60 # Seconds to wait before retrying
Analogy: The barista saying, "Sorry, you've already had 5 free refills - come back tomorrow."
Rate Limiting Algorithms
Different APIs use various algorithms to implement rate limiting:
Fixed Window
// Simple but can allow bursts at window boundaries
const windowStart = Math.floor(Date.now() / 60000) * 60000; // 1-minute windows
const key = `rate_limit:${userId}:${windowStart}`;
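The fragment above only sketches the window key. A complete in-memory version looks something like this; the `createFixedWindowLimiter` name and the 3-requests-per-minute figures are illustrative, not taken from any particular library:

```javascript
// Minimal in-memory fixed-window limiter: allows `limit` requests per
// `windowMs` window, keyed by user. A counter from an old window is
// simply replaced when the first request of a new window arrives.
function createFixedWindowLimiter(limit, windowMs) {
  const counters = new Map(); // userId -> { windowStart, count }
  return function allow(userId, now = Date.now()) {
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const entry = counters.get(userId);
    if (!entry || entry.windowStart !== windowStart) {
      counters.set(userId, { windowStart, count: 1 });
      return true; // first request in a fresh window
    }
    if (entry.count < limit) {
      entry.count += 1;
      return true;
    }
    return false; // limit reached for this window
  };
}

// Example: 3 requests per 1-minute window
const allow = createFixedWindowLimiter(3, 60000);
```

Note the weakness mentioned in the comment above: a client can make `limit` requests at the end of one window and `limit` more at the start of the next, so bursts of up to twice the limit are possible at window boundaries.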
Sliding Window
// More accurate but more complex: track request timestamps in a sorted set
const now = Date.now();
const windowSize = 60000; // 1 minute
const key = `rate_limit:${userId}`;
// Drop entries older than the window, record this request, then count
await redis.zremrangebyscore(key, 0, now - windowSize);
await redis.zadd(key, now, `${now}`);
const count = await redis.zcard(key);
Token Bucket
// Allows bursts up to bucket capacity (a separate refill job tops the bucket up)
const tokens = parseInt(await redis.get(`tokens:${userId}`), 10) || 0;
if (tokens > 0) {
  await redis.decr(`tokens:${userId}`);
  // Allow request
} else {
  // Rate limit exceeded
}
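The Redis fragment above assumes the refill happens elsewhere. As a self-contained sketch, the refill can instead happen lazily on each request, based on how much time has elapsed; the `TokenBucket` class below is illustrative:

```javascript
// Minimal in-memory token bucket: refills at `refillRate` tokens per
// second up to `capacity`, so short bursts are allowed but the
// sustained rate is capped.
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.refillRate = refillRate; // tokens per second
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }
  take(now = Date.now()) {
    // Refill based on elapsed time, capped at capacity
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // request allowed
    }
    return false; // rate limit exceeded
  }
}
```

The lazy-refill trick avoids running a timer per user: the bucket's state is only updated when someone actually asks for a token.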
Leaky Bucket
// Smooths out request bursts by draining a queue at a fixed rate
const job = await redis.lpop(`queue:${userId}`);
if (job) {
  // Process request
} else {
  // Queue empty - nothing to drain this tick
}
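In the same spirit, here is a self-contained sketch of the whole leaky bucket: requests join a bounded queue and "leak" out at a fixed rate, and a request is rejected only when the queue (the bucket) is full. The `createLeakyBucket` helper is illustrative:

```javascript
// Minimal in-memory leaky bucket: incoming jobs join a bounded queue
// and drain at one job per `leakIntervalMs`, smoothing bursts into a
// steady processing rate.
function createLeakyBucket(capacity, leakIntervalMs, processFn) {
  const queue = [];
  const timer = setInterval(() => {
    const job = queue.shift(); // leak one job per interval
    if (job !== undefined) processFn(job);
  }, leakIntervalMs);
  return {
    offer(job) {
      if (queue.length >= capacity) return false; // bucket full: reject
      queue.push(job);
      return true;
    },
    stop() { clearInterval(timer); }
  };
}
```

Unlike the token bucket, this never lets a burst through faster than the leak rate, which is why it is described as "smoothing" rather than "allowing bursts".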
Rate Limiting Implementation
Basic Rate Limiting Middleware
Here's how to implement rate limiting in an Express.js application:
const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');
const redis = require('redis');
// Create Redis client
const redisClient = redis.createClient({
  host: process.env.REDIS_HOST,
  port: process.env.REDIS_PORT,
  password: process.env.REDIS_PASSWORD
});
// Global rate limiter
const globalLimiter = rateLimit({
  store: new RedisStore({
    client: redisClient,
    prefix: 'global_rate_limit:'
  }),
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // Limit each IP to 100 requests per windowMs
  message: {
    error: 'Too many requests from this IP, please try again later.'
  },
  standardHeaders: true, // Return rate limit info in the `RateLimit-*` headers
  legacyHeaders: false, // Disable the `X-RateLimit-*` headers
  handler: (req, res) => {
    res.status(429).json({
      error: 'Too many requests',
      // resetTime is a Date; report seconds until the window resets
      retryAfter: Math.ceil((req.rateLimit.resetTime - Date.now()) / 1000)
    });
  }
});
// Apply to all routes
app.use(globalLimiter);
User-Specific Rate Limiting
For authenticated users, you can implement more sophisticated rate limiting:
// User-specific rate limiter
const userLimiter = rateLimit({
  store: new RedisStore({
    client: redisClient,
    prefix: 'user_rate_limit:'
  }),
  windowMs: 60 * 1000, // 1 minute
  max: (req) => {
    // Different limits based on user tier
    if (!req.user) return 10; // Unauthenticated requests get the free-tier limit
    if (req.user.tier === 'premium') return 1000;
    if (req.user.tier === 'standard') return 100;
    return 10; // Free tier
  },
  keyGenerator: (req) => {
    // Use user ID instead of IP
    return req.user ? req.user.id : req.ip;
  },
  skip: (req) => {
    // Skip rate limiting for certain routes
    return req.path.startsWith('/health') || req.path.startsWith('/metrics');
  }
});
// Apply to protected routes
app.use('/api/protected', userLimiter);
Custom Rate Limiting Logic
For more complex scenarios, you can implement custom rate limiting:
// Custom rate limiting middleware
const customRateLimit = (options) => {
  return async (req, res, next) => {
    const key = `rate_limit:${options.prefix}:${req.ip}`;
    const limit = options.limit;
    const window = options.window;
    try {
      const current = await redis.incr(key);
      if (current === 1) {
        // First request in this window
        await redis.expire(key, window);
      }
      if (current > limit) {
        // Rate limit exceeded
        const ttl = await redis.ttl(key);
        res.set({
          'X-RateLimit-Limit': limit,
          'X-RateLimit-Remaining': 0,
          'X-RateLimit-Reset': Math.floor(Date.now() / 1000) + ttl,
          'Retry-After': ttl
        });
        return res.status(429).json({
          error: 'Rate limit exceeded',
          retryAfter: ttl
        });
      }
      // Set rate limit headers
      res.set({
        'X-RateLimit-Limit': limit,
        'X-RateLimit-Remaining': limit - current,
        'X-RateLimit-Reset': Math.floor(Date.now() / 1000) + (await redis.ttl(key))
      });
      next();
    } catch (error) {
      console.error('Rate limiting error:', error);
      next(); // Fail open: continue if the limiter itself errors
    }
  };
};
// Usage
app.use('/api/sensitive', customRateLimit({
  prefix: 'sensitive',
  limit: 5,
  window: 300 // 5 minutes
}));
Working with Rate-Limited APIs
Respecting Rate Limits
When building against APIs with rate limiting, follow these best practices:
Don't Spam Requests
// Bad: Making requests as fast as possible
for (let i = 0; i < 1000; i++) {
  fetch('https://api.example.com/data');
}
// Good: Spacing out requests
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));
for (let i = 0; i < 1000; i++) {
  await fetch('https://api.example.com/data');
  await delay(100); // Wait 100ms between requests
}
Use Caching
// Cache API responses to avoid repeated requests
const cache = new Map();
async function fetchWithCache(url) {
  if (cache.has(url)) {
    const { data, timestamp } = cache.get(url);
    // Serve from cache for 5 minutes
    if (Date.now() - timestamp < 5 * 60 * 1000) {
      return data;
    }
  }
  const response = await fetch(url);
  const data = await response.json();
  cache.set(url, { data, timestamp: Date.now() });
  return data;
}
Implement Exponential Backoff
async function fetchWithRetry(url, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch(url);
      if (response.status === 429) {
        // Rate limited - wait and retry (Retry-After arrives as a string)
        const retryAfter = parseInt(response.headers.get('Retry-After'), 10) || Math.pow(2, attempt);
        console.log(`Rate limited, waiting ${retryAfter} seconds...`);
        await delay(retryAfter * 1000);
        continue;
      }
      return response;
    } catch (error) {
      if (attempt === maxRetries) throw error;
      // Exponential backoff
      const waitTime = Math.pow(2, attempt) * 1000;
      console.log(`Request failed, waiting ${waitTime}ms before retry...`);
      await delay(waitTime);
    }
  }
  throw new Error(`Still rate limited after ${maxRetries} attempts`);
}
Batch Requests
// Instead of multiple individual requests
const userIds = [1, 2, 3, 4, 5];
const users = [];
for (const id of userIds) {
  const response = await fetch(`/api/users/${id}`);
  users.push(await response.json());
}
// Use a batch endpoint if available
const batchResponse = await fetch('/api/users/batch', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ ids: userIds })
});
const batchUsers = await batchResponse.json();
Handling Rate Limit Responses
When you receive a 429 response, handle it gracefully:
async function handleRateLimit(response) {
  if (response.status === 429) {
    const retryAfter = response.headers.get('Retry-After');
    const rateLimitInfo = {
      limit: response.headers.get('X-RateLimit-Limit'),
      remaining: response.headers.get('X-RateLimit-Remaining'),
      reset: response.headers.get('X-RateLimit-Reset')
    };
    console.log('Rate limited:', rateLimitInfo);
    if (retryAfter) {
      console.log(`Waiting ${retryAfter} seconds before retry...`);
      await delay(parseInt(retryAfter, 10) * 1000);
      return true; // Retry the request
    }
    // Calculate wait time from reset timestamp
    const resetTime = parseInt(rateLimitInfo.reset, 10) * 1000;
    const waitTime = Math.max(0, resetTime - Date.now());
    if (waitTime > 0) {
      console.log(`Waiting ${Math.ceil(waitTime / 1000)} seconds until rate limit resets...`);
      await delay(waitTime);
      return true; // Retry the request
    }
  }
  return false; // Don't retry
}
Real-World Examples
GitHub API
GitHub implements sophisticated rate limiting based on authentication:
Anonymous Users:
- 60 requests per hour
- Strict enforcement with clear error messages
Authenticated Users:
- 5,000 requests per hour
- Higher limits for GitHub Apps
- Different limits for different API endpoints
Rate Limit Headers:
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
X-RateLimit-Reset: 1642233600
X-RateLimit-Used: 1
Twitter API (X)
Twitter's API has different rate limiting tiers:
Free Tier:
- 300 requests per 15-minute window
- 1,500 requests per day
Paid Tiers:
- Higher limits based on subscription level
- Different limits for different endpoints
- Real-time rate limit monitoring
Stripe API
Stripe enforces rate limits to keep the platform stable under bursts of traffic (accidental double-charging is guarded separately, by idempotency keys):
Standard Limits:
- 100 requests per second
- Different limits for different operations
- Automatic retry logic for failed requests
Rate Limit Response:
{
  "error": {
    "type": "rate_limit_error",
    "message": "Too many requests made to the API too quickly.",
    "retry_after": 60
  }
}
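A client can branch on that error shape before retrying. The helper below is a hypothetical sketch, not part of Stripe's SDK; it assumes the response body has already been parsed as JSON:

```javascript
// Given a parsed error body shaped like the example above, return the
// number of seconds to wait before retrying, or null if this is not a
// rate-limit error. Falls back to `defaultSeconds` when retry_after is
// missing or malformed.
function retryDelayFromError(body, defaultSeconds = 1) {
  const err = body && body.error;
  if (!err || err.type !== 'rate_limit_error') {
    return null; // a different error type: don't retry on this basis
  }
  const seconds = Number(err.retry_after);
  return Number.isFinite(seconds) && seconds > 0 ? seconds : defaultSeconds;
}
```

Returning `null` for other error types matters: retrying a `card_error` or validation error would just repeat the failure.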
Advanced Rate Limiting Strategies
Dynamic Rate Limiting
Some APIs adjust limits based on user behaviour:
// Dynamic rate limiting based on user reputation
const dynamicLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: (req) => {
    const user = req.user;
    if (!user) return 10; // Anonymous users
    // Adjust limits based on user reputation
    if (user.reputation > 1000) return 1000;
    if (user.reputation > 100) return 100;
    if (user.reputation > 10) return 50;
    return 10;
  },
  keyGenerator: (req) => req.user?.id || req.ip
});
Geographic Rate Limiting
Limit requests based on geographic location:
// Rate limiting by country (using Cloudflare's CF-IPCountry header)
const geoLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: (req) => {
    const country = req.headers['cf-ipcountry'] || 'unknown';
    // Different limits for different countries
    const limits = {
      US: 1000,
      GB: 800,
      DE: 600,
      default: 100
    };
    return limits[country] || limits.default;
  },
  keyGenerator: (req) => req.headers['cf-ipcountry'] || req.ip
});
Endpoint-Specific Limits
Different endpoints can have different rate limits:
// Strict limits for sensitive operations
app.post('/api/login', rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 5, // Only 5 login attempts per 15 minutes
  message: 'Too many login attempts, please try again later.'
}));
// Higher limits for read operations
app.get('/api/public-data', rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 1000 // 1000 requests per minute
}));
// Moderate limits for write operations
app.post('/api/data', rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100 // 100 requests per minute
}));
Monitoring and Analytics
Rate Limit Metrics
Track rate limiting effectiveness:
// Rate limit monitoring middleware
const rateLimitMonitor = (req, res, next) => {
  res.on('finish', () => {
    if (res.statusCode === 429) {
      // Log rate limit violations
      console.log('Rate limit exceeded:', {
        ip: req.ip,
        userAgent: req.get('User-Agent'),
        path: req.path,
        method: req.method,
        timestamp: new Date().toISOString()
      });
      // Send metrics to your monitoring service (a `metrics` client is assumed)
      metrics.increment('rate_limit.violations', {
        ip: req.ip,
        path: req.path
      });
    }
  });
  next();
};
app.use(rateLimitMonitor);
Rate Limit Analytics
Analyse rate limiting patterns:
// Track rate limit usage
const trackRateLimitUsage = (req, res, next) => {
  res.on('finish', async () => {
    try {
      // Header values arrive as strings (or undefined if not set)
      const remaining = Number(res.get('X-RateLimit-Remaining'));
      const limit = Number(res.get('X-RateLimit-Limit'));
      if (!Number.isNaN(remaining) && !Number.isNaN(limit) && limit > 0) {
        const usage = ((limit - remaining) / limit) * 100;
        // Track usage patterns per client
        await redis.hincrby('rate_limit_usage', req.ip, 1);
        await redis.hset('rate_limit_percentages', req.ip, usage);
      }
    } catch (error) {
      console.error('Error tracking rate limit usage:', error);
    }
  });
  next();
};
Best Practices
For API Providers
Set Appropriate Limits
- Base limits on your system's capacity
- Consider different user tiers
- Monitor and adjust based on usage patterns
Provide Clear Feedback
- Include helpful rate limit headers
- Give specific retry-after times
- Explain why limits exist
Implement Graceful Degradation
- Allow some requests to exceed limits during emergencies
- Provide alternative endpoints for high-volume users
- Consider implementing request queuing
For API Consumers
Plan Your Requests
- Understand the rate limits before implementation
- Implement proper error handling
- Use caching to minimise API calls
Monitor Your Usage
- Track rate limit responses
- Implement alerting for approaching limits
- Plan for rate limit increases
Be a Good Citizen
- Don't try to circumvent rate limits
- Contact API providers for higher limits if needed
- Report bugs or issues responsibly
Conclusion
API rate limiting is the traffic control system of the internet. Without it, APIs would crash under overload, just like a coffee shop with 100 customers yelling at once.
The key to working effectively with rate-limited APIs is:
- Understand the Limits: Know what limits exist and why they're in place
- Implement Proper Handling: Use exponential backoff, caching, and batching
- Monitor and Adapt: Track your usage and adjust your approach
- Be Respectful: Don't try to game the system or exceed reasonable limits
So if you ever see a 429 Too Many Requests, don't panic - it just means the bouncer is doing their job. Wait a little, try again later, or find smarter ways to use the API.
Rate limiting exists to keep the internet fair, fast, and secure for everyone. By understanding and working with these limits, you can build robust applications that respect API boundaries while maximising their value.
Further Reading
- Explore different rate limiting algorithms and their trade-offs
- Study API design patterns for handling high-traffic scenarios
- Learn about implementing rate limiting in different programming languages
- Practice building applications that work effectively with rate-limited APIs