Probabilistic proof of Queueing Theory
Queueing theory is a branch of probability theory that studies the behavior of systems where entities (customers, jobs, etc.) arrive and wait in a queue to be served. It provides mathematical tools to analyze and optimize the performance of various real-world systems such as call centers, hospitals, computer networks, and manufacturing processes.
A queueing system consists of the following fundamental components:
- Arrival process: The stochastic process describing how entities arrive at the system, characterized by inter-arrival time distributions.
- Service process: The stochastic process describing how long it takes to serve each entity, characterized by service time distributions.
- Queue: The buffer where entities wait before being served.
- Servers: Resources that process entities from the queue.
- Queue capacity (K): The maximum number of entities that can wait in the queue (default: ∞).
- Population size (N): The size of the source population that generates arrivals (default: ∞).
- Queue discipline (D): The rule determining which entity to serve next (default: FIFO - First In, First Out).
To describe queueing systems systematically, Kendall’s notation (proposed by D. G. Kendall in 1953) is widely used. The standard form is A/S/c/K/N/D, where:
- A denotes the arrival process distribution
- S denotes the service time distribution
- c is the number of parallel servers
- K, N, D are often omitted when they take default values (∞, ∞, FIFO)
Common symbols for distributions include:
- M (Markovian/Memoryless): Exponential distribution, implying Poisson arrivals
- D (Deterministic): Fixed, constant values
- G (General): Arbitrary distribution
- E_k (Erlang): Erlang distribution with shape parameter k
In this post, we analyze the M/M/c system, where both the arrival process follows a Poisson process with rate $\lambda$ and the service times follow an exponential distribution with rate $\mu$. We derive the stationary distribution and key performance metrics of this system using probabilistic methods.
M/M/c System
In the M/M/c queueing system, arrivals follow a Poisson process and service times follow an exponential distribution. As discussed in the previous post, when service times are exponentially distributed, the number of service completions follows a Poisson distribution.
The key performance metrics we aim to compute for this system are:
- Average utilization: $\rho$ - the fraction of time each server is busy
- Average number of entities in the system: $L$ - expected queue length plus entities being served
- Average time in the system: $W$ - expected total time an entity spends from arrival to departure
Let $\lambda$ denote the arrival rate and $\mu$ denote the service rate per server. In the stationary (steady-state) distribution, these metrics are given by: \begin{align} \rho &= \frac{\lambda}{c\mu} \newline \pi_0 &= \qty[\frac{(\rho c)^c}{c!(1-\rho)} + \sum_{n=0}^{c-1}\frac{(\rho c)^n}{n!}]^{-1} \newline L &= \frac{\lambda}{\mu} + \frac{\rho(\rho c)^c \pi_0}{c!(1-\rho)^2} \newline W &= \frac{1}{\mu} + \frac{(\rho c)^c\pi_0}{c!c\mu(1-\rho)^2} \end{align}
Derivation of the Stationary Distribution
The system state at time $t$ is characterized by the number of entities $n$ in the system. This system can be modeled as a birth-death process, where arrivals act as “births” and service completions act as “deaths”. The infinitesimal transition rate matrix $Q$ has the following entries: \begin{align} Q_{n,n+1} &= \lambda, \newline Q_{n, n-1} &= \mu \min(n, c), \newline Q_{n, n} &= -\lambda - \mu \min(n, c) \end{align}
The time-dependent state probability distribution $p_n(t)$ evolves according to: \begin{equation} p_n(t) = \qty(\mathbb{P}(0) \exp\qty[Q t])_n \end{equation}
While closed-form solutions exist for special cases ($c = 1$ or $c = \infty$), no general closed-form expression is available for arbitrary $c$. Therefore, we focus on computing the stationary (steady-state) distribution $\pi_n = \lim_{t \to \infty} p_n(t)$, which exists when $\rho < 1$.
The stationary distribution satisfies the balance equation: \begin{align} \pi_n &= (1 - \lambda - \mu \min(n, c))\pi_n + \lambda \pi_{n-1} + \mu \min(n + 1, c) \pi_{n+1} \newline \pi_{n+1} &= \frac{1}{\mu \min(n + 1, c)} \qty[(\lambda + \mu \min(n, c))\pi_n - \lambda \pi_{n-1}] \end{align}
Solving this recursion starting from $n = 0$: \begin{align} \pi_1 &= \frac{\lambda}{\mu}p_0, \newline \pi_2 &= \frac{1}{2}\qty(\frac{\lambda}{\mu})^2 \pi_0, \newline \pi_k &= \frac{1}{k!}\qty(\frac{\lambda}{\mu})^k \pi_0 \quad \text{for } k \le c, \newline \pi_c &= \frac{1}{c!}\qty(\frac{\lambda}{\mu})^c \pi_0, \newline \pi_{c + 1} &= \frac{c^c}{c!} \qty(\frac{\lambda}{\mu c})^{c + 1} \pi_0, \newline \pi_{c + k} &= \frac{c^c}{c!} \qty(\frac{\lambda}{\mu c})^{c + k} \pi_0 \quad \text{for } k \ge 1. \end{align}
Using the normalization condition $\sum_{n=0}^{\infty} \pi_n = 1$, we obtain: \begin{align} \pi_0 &= \qty[\sum_{n=0}^{c-1} \frac{1}{n!} \qty(\frac{\lambda}{\mu})^n + \sum_{k=0}^{\infty} \frac{c^c}{c!} \qty(\frac{\lambda}{\mu c})^{c + k}]^{-1} \newline &= \qty[\sum_{n=0}^{c-1} \frac{1}{n!} \qty(\frac{\lambda}{\mu})^n + \frac{c^c (\lambda/ \mu c)^{c}}{c! (1 - \lambda/\mu c)}]^{-1} \newline &\equiv \qty[\sum_{n=0}^{c-1} \frac{(\rho c)^n}{n!} + \frac{(\rho c)^{c}}{c! (1 - \rho)}]^{-1} \end{align}
Here we define $\rho = \lambda/(c\mu)$, which will turn out to be the average server utilization. With the complete stationary distribution determined, we can now compute the performance metrics.
Average Utilization
Average utilization represents the fraction of time each server is busy serving entities. Under the stationary assumption, the long-run time fraction equals the probability that a server is busy at any given moment.
Let $L_s = \min(n, c)$ denote the number of busy servers when there are $n$ entities in the system. The average utilization is then: \begin{equation} \frac{\mathbb{E}[L_s]}{c} = \sum_{n=0}^{\infty} \frac{\min(n, c)}{c} \pi_n = \sum_{n=0}^{c-1} \frac{n}{c} \pi_n + \sum_{n=c}^{\infty} \pi_n \end{equation}
By carefully rearranging these summations and using the expression for $\pi_0^{-1}$, we can show that the terms cancel to yield a simple result: \begin{align} \sum_{n=0}^{c-1} \frac{n}{c} \pi_n &= \sum_{n=0}^{c-1} \frac{n}{c} \frac{(\rho c)^n}{n!} \pi_0 = \rho \pi_0 \sum_{n=1}^{c-1} \frac{(\rho c)^{n-1}}{(n - 1)!} = \rho \pi_0 \qty[\sum_{n=0}^{c-1} \frac{(\rho c)^n}{n!} - \frac{(\rho c)^{c-1}}{(c-1)!} ] \newline \sum_{n=c}^{\infty} \pi_n &= \rho \pi_0 \sum_{n=c}^{\infty} \frac{c^c}{c!} \rho^{n-1} = \rho \pi_0 \qty[\sum_{n=c}^{\infty} \frac{c^c}{c!} \rho^n + \frac{\rho^{c-1}c^c}{c!} ] \newline \frac{\mathbb{E}[L_s]}{c} &= \rho \pi_0 \qty[\sum_{n=0}^{c-1} \frac{(\rho c)^n}{n!} + \sum_{n=c}^{\infty} \frac{c^c}{c!} \rho^n] = \rho \end{align}
This confirms that $\rho = \lambda/(c\mu)$ is indeed the average server utilization.
Average Number of Entities in the System
Next, we compute the expected total number of entities in the system, including both those being served and those waiting in the queue.
From the previous section, we know that $\rho c$ equals the average number of busy servers, which is also the average number of entities currently being served. Therefore, to find the total average number of entities $L$ in the system, we only need to add the average queue length $L_q$: \begin{equation} \mathbb{E}[L] = \mathbb{E}[L_s] + \mathbb{E}[L_q] = \rho c + \sum_{k=0}^\infty k \, p(c + k) \end{equation}
The second term is a standard application of geometric series summation: \begin{align} \mathbb{E}[L] &= \rho c + \frac{(\rho c)^c}{c!} \pi_0 \sum_{k=0}^\infty k \rho^k = \rho c + \frac{(\rho c)^c}{c!} \pi_0 \frac{\rho}{(1-\rho)^2} \newline &= \rho c + \frac{\rho (\rho c)^c}{c!(1-\rho)^2}\pi_0 \end{align}
Average Time in the System
Similar to decomposing the number of entities, the total time an entity spends in the system can be decomposed into waiting time in the queue and service time: \begin{equation} W = W_s + W_q \end{equation}
The service time distribution is given by the exponential distribution $p_s(t) = \mu e^{-\mu t}$. The waiting time $W_q$ in the queue depends on the queue length at the time of arrival. If there are $q$ entities in the queue (including the arriving entity), then $q + 1$ entities must be served before this entity begins service.
The distribution of time required to serve $k$ entities follows from the convolution of $k$ exponential random variables. As derived in the previous post, the CDF is $P(S_k \leq t)$. Differentiating with respect to $t$ yields the PDF: \begin{align} P(S_k \leq t) &= 1 - e^{-c\mu t} \sum_{i=0}^{k-1} \frac{(c\mu t)^i}{i!} \newline p_{q,k}(t) &= c\mu e^{-c\mu t} \frac{(c\mu t)^{k - 1}}{(k - 1)!} \end{align}
The probability distribution of the total time $T$ in the system can be expressed using conditional probabilities based on the queue length: \begin{align} p(T) &= \sum_{n=0}^{c - 1} p_s(T) \pi_n + \sum_{k=0}^\infty \pi_{c+k} \int_0^T p_{q, k + 1}(t) p_s(T - t) \, dt \newline \end{align}
Computing the expected sojourn time $\mathbb{E}[W] = \int_0^\infty t \, p(t) \, dt$ yields: \begin{align} \mathbb{E}[W] &= \int_0^\infty \dd T p(T) \newline &= \qty(\sum_{n=0}^{c - 1} \pi_n) \int_0^\infty \dd T T p_s(T) \newline &+ \sum_{k=0}^\infty \pi_{c+k} \int_0^\infty T \dd T \int_0^T \dd t p_{q, k + 1}(t) p_s(T - t) \end{align}
The double integral in the second term can be transformed into a sum of two independent integrals through variable separation: \begin{align} &\int_0^\infty T \dd T \int_0^T \dd t \; p_{q,k+1}(t) \; p_s(T -t) = \int_0^\infty \dd t_q \int_0^\infty \dd t_s (t_q + t_s) p_{q,k+1}(t_q) p_s(t_s) \newline &= \int_0^\infty \dd t_q t_q p_{q, k + 1}(t_q) \int_0^\infty \dd t_s p_s(t_s) + \int_0^\infty \dd t_q p_{q, k + 1}(t_q) \int_0^\infty \dd t_s t_s p_s(t_s) \newline &= \int_0^\infty \dd t \; t \; p_{q, k + 1}(t) + \int_0^\infty \dd t \; t \; p_s(t) \newline &= \frac{k + 1}{c\mu} + \frac{1}{\mu} \end{align}
Therefore, the average sojourn time $\mathbb{E}[W]$ is: \begin{align} \mathbb{E}[W] &= \frac{1}{\mu} \sum_{n=0}^{c -1} \pi_n + \sum_{k=0}^\infty \pi_{c+k} \qty[ \frac{k + 1}{c\mu} + \frac{1}{\mu} ] \newline &= \frac{1}{\mu} \qty[\sum_{n=0}^\infty \pi_n] + \frac{1}{c\mu} \sum_{k=0}^\infty (k + 1) \pi_{c+k} \newline &= \frac{1}{\mu} + \frac{1}{c\mu} \frac{(\rho c)^c}{c!} \pi_0 \sum_{k=0}^\infty (k + 1) \rho^k \newline &= \frac{1}{\mu} + \frac{(\rho c)^c}{c!c\mu(1-\rho)^2} \pi_0 \end{align}
Notations
| Symbol | Description | Units |
|---|---|---|
| $\lambda$ | Arrival rate | entities/time |
| $\mu$ | Service rate (per server) | entities/time |
| $c$ | Number of servers | - |
| $\rho$ | Server utilization | dimensionless |
| $L$ | Average entities in system | entities |
| $W$ | Average time in system | time |