Circuit Breakers: Preventing Cascading Failures

Circuit Breakers prevent cascading failures in distributed systems by controlling the flow of requests to a service that may be failing. They let callers fail fast and handle errors gracefully, which gives the failing service time to recover while the rest of the system stays stable.

Introduction

In distributed systems and microservices architectures, services communicate over a network, which makes them inherently susceptible to failure. When one service fails, retry storms and long waits can cause its callers to fail too, and the problem can spread until the whole system is down. The Circuit Breaker is a design pattern for containing this kind of cascading failure: it acts as a switch that stops the flow of requests to a service once that service is recognized to be failing.

The Circuit Breaker Pattern in Clojure

Key Components

  1. Closed State: Requests are allowed through, and the Circuit Breaker monitors failures. If failures reach a certain threshold, it trips to the Open state.
  2. Open State: Requests are rejected immediately, without calling the failing service, to prevent further failures. After a timeout period, the Circuit Breaker transitions to the Half-Open state and allows a limited number of test requests.
  3. Half-Open State: A limited number of requests are sent through to test the service health. If they succeed, the breaker goes back to Closed; otherwise, it returns to Open.
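
The three states and the events that move the breaker between them can be written down as a small transition table. The following is a minimal sketch for illustration only: the event names (:threshold-reached, :timeout-elapsed, :probe-succeeded, :probe-failed) and the next-state function are hypothetical and are not part of the implementation in this article.

```clojure
;; Hypothetical transition table: [current-state event] -> next state.
(def transitions
  {[:closed    :threshold-reached] :open
   [:open      :timeout-elapsed]   :half-open
   [:half-open :probe-succeeded]   :closed
   [:half-open :probe-failed]      :open})

(defn next-state
  "Returns the state that follows `state` after `event`.
  Any pair that is not in the table leaves the state unchanged."
  [state event]
  (get transitions [state event] state))

(next-state :closed :threshold-reached) ; => :open
(next-state :open :probe-succeeded)     ; => :open (no such transition)
```

Keeping the transitions in plain data makes it easy to check that every arrow in a state diagram has a matching entry.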

Clojure Implementation

    (ns circuit-breaker.core)

    ;; state:         :closed, :open, or :half-open
    ;; failure-count: failures recorded since the breaker last closed
    ;; threshold:     number of failures that trips the breaker to :open
    ;; max-timeout:   how long (ms) the breaker stays :open
    ;; timeout:       timestamp (ms) at which the :open state expires
    (defrecord CircuitBreaker [state failure-count threshold max-timeout timeout])

    (defn create-circuit-breaker [threshold max-timeout]
      (->CircuitBreaker :closed 0 threshold max-timeout 0))

    (defn current-time [] (System/currentTimeMillis))

    (defn trip-open [breaker]
      (assoc breaker :state :open :timeout (+ (current-time) (:max-timeout breaker))))

    (defn trip-closed [breaker]
      (assoc breaker :state :closed :failure-count 0))

    (defn trip-half-open [breaker]
      (assoc breaker :state :half-open :failure-count 0))

    (defn allow-request? [breaker]
      (case (:state breaker)
        :closed    true
        ;; An open breaker admits a request only once its timeout has elapsed.
        :open      (>= (current-time) (:timeout breaker))
        :half-open true
        false))

    (defn handle-failure [breaker]
      (let [failures (inc (:failure-count breaker))
            breaker  (assoc breaker :failure-count failures)]
        ;; A failed test request in :half-open re-opens the breaker at once.
        (if (or (= (:state breaker) :half-open)
                (>= failures (:threshold breaker)))
          (trip-open breaker)
          breaker)))

    (defn handle-success [breaker]
      (if (= (:state breaker) :half-open)
        (trip-closed breaker)
        breaker))

    (defn make-request
      "Runs request-fn if the breaker allows it and returns the updated breaker.
      The breaker is an immutable value, so callers must keep the returned one."
      [breaker request-fn]
      (if (allow-request? breaker)
        ;; An :open breaker whose timeout has elapsed becomes :half-open first.
        (let [breaker (if (= (:state breaker) :open)
                        (trip-half-open breaker)
                        breaker)]
          (try
            (request-fn)
            (handle-success breaker)
            (catch Exception _
              (handle-failure breaker))))
        breaker))

Explanation

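  • The CircuitBreaker record holds the state, the failure count, the failure threshold, the maximum timeout (how long the breaker stays Open, in milliseconds), and the timestamp at which the current Open period expires.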
  • create-circuit-breaker initializes a new Circuit Breaker.
  • trip-open, trip-closed, and trip-half-open manage state transitions.
  • allow-request? determines if a request should be allowed based on the current state.
  • handle-failure increments the failure count and potentially trips the breaker to Open.
  • handle-success moves the breaker back to Closed on successful requests during the Half-Open state.
  • make-request encapsulates the logic for making requests with Circuit Breaker considerations.
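
The functions above return a new breaker value rather than mutating one, so a breaker that is shared between threads has to live in a reference type. The following is a minimal sketch under that assumption, using an atom and a simplified breaker kept as a plain map. The names new-breaker-atom and call-with-breaker!, and the :breaker/rejected and :breaker/failed results, are hypothetical and not part of the code above.

```clojure
;; Hypothetical helpers: a simplified breaker kept as a plain map in an atom.
(defn new-breaker-atom [threshold open-ms]
  (atom {:state :closed :failures 0 :threshold threshold
         :open-ms open-ms :open-until 0}))

(defn- record-failure [b now]
  (let [failures (inc (:failures b))]
    (if (or (= :half-open (:state b)) (>= failures (:threshold b)))
      (assoc b :state :open :failures failures :open-until (+ now (:open-ms b)))
      (assoc b :failures failures))))

(defn- record-success [b]
  (if (= :half-open (:state b))
    (assoc b :state :closed :failures 0)
    b))

(defn- admit [b now]
  ;; An expired :open breaker moves to :half-open so a test request can run.
  (if (and (= :open (:state b)) (>= now (:open-until b)))
    (assoc b :state :half-open)
    b))

(defn call-with-breaker!
  "Calls request-fn unless the breaker is open. Returns request-fn's result,
  :breaker/rejected if the call was not attempted, or :breaker/failed if it
  threw. Only pure functions run inside swap!, because swap! may retry them."
  [breaker-atom request-fn]
  (let [b (swap! breaker-atom admit (System/currentTimeMillis))]
    (if (= :open (:state b))
      :breaker/rejected
      (try
        (let [result (request-fn)]
          (swap! breaker-atom record-success)
          result)
        (catch Exception _
          (swap! breaker-atom record-failure (System/currentTimeMillis))
          :breaker/failed)))))
```

The request itself runs outside swap! on purpose: swap! can call its update function more than once under contention, which would repeat the network call. This sketch does not limit how many test requests run concurrently in the Half-Open state.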

Circuit Breaker in Action

Here’s a simple demonstration of using our Circuit Breaker:

    (defn unreliable-service []
      (if (< (rand) 0.8) ; simulate an 80% failure rate
        (throw (Exception. "Service Error"))
        (println "Service Call Success")))

    ;; make-request returns a new breaker value, so carry it into the next
    ;; iteration instead of reusing the initial breaker every time.
    (loop [breaker (create-circuit-breaker 3 5000)
           n       10]
      (when (pos? n)
        (println "Request:")
        (let [result (make-request breaker unreliable-service)]
          (println "Breaker State:" (:state result))
          (println "Failures:" (:failure-count result))
          (recur result (dec n)))))

Mermaid Diagram

    stateDiagram-v2
        [*] --> Closed
        Closed --> Open : Failures >= Threshold
        Open --> HalfOpen : Timeout
        HalfOpen --> Closed : Success
        HalfOpen --> Open : Failure
        Closed --> Closed : Success
        Open --> Open : Requests Blocked

Diagram Explanation

  • Closed State: Represents normal operation where the service is in a stable condition.
  • Open State: Triggered when failures surpass a set threshold, blocking requests for a defined timeout.
  • Half-Open State: Allows a few requests to test if the service has recovered. If they succeed, the breaker transitions to Closed; if they fail, it returns to Open.

Related Patterns

  • Retry Pattern: Complements the Circuit Breaker by handling transient failures through controlled retries.
  • Bulkhead Pattern: Partitions resources to prevent a failure in one part of the system from cascading to others.
  • Timeout Pattern: Establishes limits on how long an operation can run before it’s stopped, often working alongside Circuit Breakers.
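
As an illustration of how the Timeout Pattern can feed a Circuit Breaker, the following minimal sketch turns a slow call into an exception that a breaker could count as a failure. The function name call-with-timeout is hypothetical; it relies only on the core functions future, deref (with a timeout), future-cancel, and ex-info.

```clojure
(defn call-with-timeout
  "Runs f on another thread and waits at most timeout-ms for its result.
  Throws an ex-info on timeout. If f itself throws, deref rethrows the
  error wrapped in a java.util.concurrent.ExecutionException."
  [timeout-ms f]
  (let [sentinel (Object.)            ; unique marker for 'no result in time'
        task     (future (f))
        result   (deref task timeout-ms sentinel)]
    (if (identical? result sentinel)
      (do (future-cancel task)        ; interrupt the still-running call
          (throw (ex-info "Request timed out" {:timeout-ms timeout-ms})))
      result)))

(call-with-timeout 1000 #(+ 1 2)) ; => 3
```

Because both the timeout and a failure inside f surface as exceptions, wrapping a request function in call-with-timeout before handing it to a breaker lets slow responses count toward the failure threshold.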

Summary

The Circuit Breaker pattern gives Clojure services a robust mechanism for preventing cascading failures in a distributed system. By failing fast while a dependency is unhealthy and testing for recovery before resuming normal traffic, it keeps errors in network-bound operations contained. This makes it an essential tool for designing scalable, fault-tolerant, cloud-native systems.