Lesson 2.3: Synchronizing physical clocks

Clock Synchronization in Distributed Systems

1. The Core Problem

Every computer in a distributed system has its own physical clock that drifts over time due to:

Hardware limitations: Quartz oscillators have inherent inaccuracies
Environmental factors: Temperature changes affect clock rates
Manufacturing variances: No two clocks are identical

The drift of a correct hardware clock is bounded as follows, for any real times $t' > t$: $(1-ρ)(t'-t) ≤ H_i(t')-H_i(t) ≤ (1+ρ)(t'-t)$, where:

$H_i$ is the hardware clock
$ρ$ is the maximum drift rate (e.g., $10^{-6}$ for quartz)
$t$ and $t'$ are real times
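To make the bound concrete, the sketch below computes the worst-case divergence between two clocks that both satisfy it. One clock may run fast by up to $ρ$ while another runs slow by up to $ρ$, so their skew can grow by as much as $2ρ$ per second of real time. Keeping skew below some bound $D$ therefore requires resynchronizing at least every $D/(2ρ)$ seconds. The function names are illustrative, and the sketch ignores the error introduced by the synchronization itself.

```python
def max_skew(rho: float, elapsed: float) -> float:
    """Worst-case skew (seconds) between two clocks with drift bound rho,
    `elapsed` seconds after they last agreed: one may gain rho*elapsed
    while the other loses rho*elapsed."""
    return 2 * rho * elapsed


def resync_interval(rho: float, max_allowed_skew: float) -> float:
    """Longest interval between resynchronizations that keeps the skew
    below `max_allowed_skew` (ignoring the error of the sync itself)."""
    return max_allowed_skew / (2 * rho)


if __name__ == "__main__":
    rho = 1e-6  # typical quartz drift bound from the text
    # After one day (86,400 s) two such clocks may differ by up to ~0.17 s.
    print(f"skew after 1 day: {max_skew(rho, 86_400):.4f} s")
    # To stay within 1 ms of each other, resynchronize at least every ~500 s.
    print(f"resync every: {resync_interval(rho, 1e-3):.1f} s")
```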

2. Synchronization Types

Clock synchronization in distributed systems ensures that different computers (nodes) maintain consistent time measurements despite hardware and network variability. There are two primary synchronization types: external synchronization and internal synchronization.

External Synchronization
Ensures that every local clock $C_i$ stays within a bound $D > 0$ of an authoritative external time source $S$ (e.g., UTC):

$|S(t) - C_i(t)| < D \quad \text{for all } i \in \{1, 2, ..., N\}$

Key Characteristics:

Requires connection to external time sources like:
- GPS (μs accuracy)
- NTP servers (ms accuracy)
- Atomic clocks (ns accuracy)
Critical for applications needing absolute time:
- Financial transaction logging
- Regulatory compliance systems
- Scientific experiment coordination

Implementation Methods:

Network Time Protocol (NTP):
- Hierarchical stratum model (Stratum 0-15)
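- Compensates for network latency by estimating the clock offset $θ$ from four timestamps ($T_1$: client sends request, $T_2$: server receives it, $T_3$: server sends reply, $T_4$: client receives reply): $θ = \frac{(T_2-T_1)-(T_4-T_3)}{2}$, then $T_{corrected} = T_{client} + θ$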
- Typical accuracy: 1-50ms
Precision Time Protocol (PTP):
- Hardware timestamping for μs accuracy
- Master-slave architecture
- Requires specialized network hardware
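NTP's latency compensation relies on four timestamps per request/response exchange: $T_1$ (client sends, client clock), $T_2$ (server receives, server clock), $T_3$ (server replies, server clock), and $T_4$ (client receives, client clock). The sketch below computes the offset $θ = \frac{(T_2-T_1)-(T_4-T_3)}{2}$ and the round-trip delay $δ = (T_4 - T_1) - (T_3 - T_2)$ from them. The function and type names are illustrative, not part of any NTP library, and the timestamps in the usage example are invented. The formula assumes the request and the reply take equal time in transit; any asymmetry appears directly as offset error, up to $δ/2$.

```python
from typing import NamedTuple


class NtpSample(NamedTuple):
    offset: float  # theta: amount to add to the client's clock (seconds)
    delay: float   # delta: round-trip network delay, excluding server processing


def ntp_sample(t1: float, t2: float, t3: float, t4: float) -> NtpSample:
    """Offset and delay from one NTP-style exchange.

    t1: client sends request   (client clock)
    t2: server receives it     (server clock)
    t3: server sends reply     (server clock)
    t4: client receives reply  (client clock)
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return NtpSample(offset, delay)


if __name__ == "__main__":
    # Hypothetical exchange: the client is 0.5 s behind the server, each
    # direction takes 0.02 s, and the server needs 0.01 s to reply.
    s = ntp_sample(t1=10.00, t2=10.52, t3=10.53, t4=10.05)
    print(f"offset={s.offset:.3f} s, delay={s.delay:.3f} s")  # offset=0.500 s, delay=0.040 s
    print(f"corrected client time at t4: {10.05 + s.offset:.2f}")  # 10.55
```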

Internal Synchronization
Maintains consistency between system clocks without UTC reference:

$|C_i(t) - C_j(t)| < D' \quad \text{for all } i,j \in \{1, 2, ..., N\}$

Key Characteristics:

Focuses on relative time consistency
Essential for:
- Distributed transaction ordering
- Event causality tracking
- Consistent snapshots

Implementation Methods:

Berkeley Algorithm
Gossip-based Protocols
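The external and internal conditions can be checked directly against a snapshot of clock readings taken at the same instant, as in the sketch below (the readings are invented for illustration). It also demonstrates a consequence of the definitions: if every clock is within $D$ of $S(t)$, then any two clocks are within $2D$ of each other, since $|C_i - C_j| ≤ |C_i - S| + |S - C_j|$. So external synchronization with bound $D$ implies internal synchronization with bound $2D$. The converse does not hold: internally synchronized clocks can drift away from UTC together.

```python
from itertools import combinations


def externally_synced(reference: float, clocks: list[float], d: float) -> bool:
    """|S(t) - C_i(t)| < D for every clock i."""
    return all(abs(reference - c) < d for c in clocks)


def internally_synced(clocks: list[float], d_prime: float) -> bool:
    """|C_i(t) - C_j(t)| < D' for every pair of clocks i, j."""
    return all(abs(a - b) < d_prime for a, b in combinations(clocks, 2))


if __name__ == "__main__":
    utc = 100.0
    clocks = [100.004, 99.996, 100.001]  # readings taken at the same instant
    print(externally_synced(utc, clocks, d=0.005))       # True
    print(internally_synced(clocks, d_prime=0.005))      # False: spread is 0.008
    print(internally_synced(clocks, d_prime=2 * 0.005))  # True
    # Still internally synced, but every clock is 2 s ahead of UTC.
    ahead = [c + 2.0 for c in clocks]
    print(internally_synced(ahead, 0.01), externally_synced(utc, ahead, 0.005))  # True False
```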

3. Clock Correctness

Essential Properties:

Bounded Drift: $\left|\frac{dC}{dt} - 1\right| < ρ$. Ensures clocks don't run too fast or too slow.
Monotonicity: $t' > t ⇒ C(t') > C(t)$. Critical for operations that compare file timestamps (e.g., build tools such as make deciding which files to recompile).
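Monotonicity matters most when a clock is found to be running ahead: setting it back directly would make it report times it has already reported. A common remedy, used by many NTP implementations for small corrections, is to slew the clock: run the software clock slightly slower (or faster) than the hardware clock until the correction has been absorbed. The sketch below assumes a linear slew at a hypothetical maximum rate `max_slew_rate`; the function name and the rate are illustrative, not any particular operating system's interface.

```python
def slewed_readings(hw_times: list[float], correction: float,
                    max_slew_rate: float) -> list[float]:
    """Software clock values C(H) for increasing hardware readings H, applying
    `correction` seconds gradually from hw_times[0] onward.

    While the correction is being absorbed, C advances at (1 +/- max_slew_rate)
    times the hardware rate; afterwards it tracks H + correction exactly.
    With 0 < max_slew_rate < 1, C keeps increasing whenever H does."""
    if not 0 < max_slew_rate < 1:
        raise ValueError("max_slew_rate must be in (0, 1)")
    start = hw_times[0]
    sign = 1.0 if correction >= 0 else -1.0
    readings = []
    for h in hw_times:
        applied = min(abs(correction), max_slew_rate * (h - start))
        readings.append(h + sign * applied)
    return readings


if __name__ == "__main__":
    # Clock is 0.5 s fast; absorb the correction at 10% of the hardware rate.
    hw = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    soft = slewed_readings(hw, correction=-0.5, max_slew_rate=0.1)
    print([round(c, 2) for c in soft])  # [0.0, 0.9, 1.8, 2.7, 3.6, 4.5, 5.5]
    # Stepping instead: if the 0.5 s step were applied right after the reading
    # at H=0.0, the next reading at H=0.2 would be -0.3, earlier than 0.0.
```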

Failure Modes:

Crash Failure: Clock stops (e.g., power loss)
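Arbitrary Failure: Any other incorrect behavior, such as breaking monotonicity (e.g., the Y2K bug, where some clocks rolled over from 31 December 1999 to 1 January 1900 instead of 2000)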

4. Synchronization Algorithms

Cristian's Algorithm (Basic Time Service):

Work Flow

Client sends request at $T_1$
Server replies with its current time $S(T_s)$, which the client receives at $T_2$
Client calculates adjusted time: $T_{adjusted} = S(T_s) + \frac{T_2-T_1}{2}$

Algorithm

The client process sends a request for the server's clock time to the Clock Server at time $T_1$ (measured on the client's clock).
The Clock Server receives the request and replies with its current clock time $S(T_s)$.
The client process receives the response at time $T_2$ and sets its clock using the formula below, which assumes the request and the reply take roughly equal time in transit: $T_{client} = S(T_s) + \frac{T_2 - T_1}{2}$
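A minimal sketch of the client side, assuming a hypothetical `request_server_time` callable that performs the round trip and returns $S(T_s)$. To keep the example deterministic, a simulated `FakeWorld` (also hypothetical) stands in for the network: its server clock is 1.2 s ahead and each message takes 50 ms. The sketch also reports the usual accuracy bound for Cristian's method: if $t_{min}$ is the smallest possible one-way delay, the server read its clock somewhere in an interval of width $T_2 - T_1 - 2t_{min}$, so the estimate is within $\pm\left(\frac{T_2 - T_1}{2} - t_{min}\right)$ of the true server time.

```python
import time
from typing import Callable


def cristian_sync(request_server_time: Callable[[], float],
                  local_clock: Callable[[], float] = time.monotonic,
                  min_one_way_delay: float = 0.0) -> tuple[float, float]:
    """Return (estimated server time at T2, accuracy bound in seconds).

    local_clock only measures the round trip T2 - T1, so a monotonic clock
    is appropriate; request_server_time() returns S(T_s)."""
    t1 = local_clock()
    server_time = request_server_time()
    t2 = local_clock()
    round_trip = t2 - t1
    estimate = server_time + round_trip / 2
    accuracy = round_trip / 2 - min_one_way_delay
    return estimate, accuracy


class FakeWorld:
    """Simulated client/server pair: the server clock is `server_ahead`
    seconds ahead of the client's; each message takes `one_way` seconds."""

    def __init__(self, server_ahead: float, one_way: float) -> None:
        self.now = 0.0  # client clock reading
        self.server_ahead = server_ahead
        self.one_way = one_way

    def client_clock(self) -> float:
        return self.now

    def request_server_time(self) -> float:
        self.now += self.one_way  # request travels to the server
        reading = self.now + self.server_ahead
        self.now += self.one_way  # reply travels back
        return reading


if __name__ == "__main__":
    # Assumed: a minimum one-way delay of 10 ms on this path.
    world = FakeWorld(server_ahead=1.2, one_way=0.05)
    estimate, accuracy = cristian_sync(world.request_server_time,
                                       world.client_clock,
                                       min_one_way_delay=0.01)
    true_server_time = world.client_clock() + world.server_ahead
    print(f"estimate {estimate:.3f}, true {true_server_time:.3f}, ±{accuracy:.3f}")
    # estimate 1.300, true 1.300, ±0.040
```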

Berkeley Algorithm (Internal Sync):

Work Flow

Master collects all clock values $C_1(t),...,C_N(t)$
Computes average: $\bar{C}(t) = \frac{1}{N+1}\left(C_{master} + \sum_{i=1}^N C_i(t)\right)$
Sends individual corrections: $\Delta_i = \bar{C}(t) - C_i(t)$

Algorithm

One node in the network is chosen as the master; the remaining nodes act as slaves. The master is chosen using an election process (a leader election algorithm).
The master node periodically polls the slave nodes and reads their clock times, using Cristian's algorithm to compensate for message delays.
The master node averages the clock times it received together with its own clock time. Instead of broadcasting the new time, which would itself be delayed in transit, it sends each slave the amount (positive or negative) by which that slave must adjust its clock, and adjusts its own clock in the same way.
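The master's averaging step from the workflow formula can be sketched as follows. It assumes the slaves' readings have already been collected and compensated for transmission delay, and that all readings refer to the same instant; the function name is illustrative. As in the formula, the master includes its own clock in the average, and it returns one correction $\Delta_i$ per node rather than an absolute time.

```python
def berkeley_adjustments(master: float, slaves: dict[str, float]) -> dict[str, float]:
    """Compute the Berkeley average and each node's correction.

    `master` and the values in `slaves` are clock readings in any consistent
    unit, assumed to refer to the same instant with delay already removed.
    Returns Delta_i = average - C_i for every node; the master's own
    correction is under the key "master"."""
    readings = {"master": master, **slaves}
    # 1/(N+1) * (C_master + sum of the N slave readings)
    average = sum(readings.values()) / len(readings)
    return {name: average - value for name, value in readings.items()}


if __name__ == "__main__":
    # Master reads 3:00, slaves read 3:25 and 2:50 (minutes past midnight).
    adjustments = berkeley_adjustments(180.0, {"a": 205.0, "b": 170.0})
    print(adjustments)  # {'master': 5.0, 'a': -20.0, 'b': 15.0}
```

Every node ends up at the average (3:05 in this example), even though no node was told that time explicitly.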

Network Time Protocol (NTP)

Cristian’s method and the Berkeley algorithm are intended primarily for use within intranets. The Network Time Protocol (NTP) [Mills 1995] defines an architecture for a time service and a protocol to distribute time information over the Internet.

Features of NTP:

Its time ultimately comes from highly precise reference clocks such as atomic clocks and GPS receivers.
It synchronizes computer clocks to Coordinated Universal Time (UTC).
It can authenticate time messages, protecting the time service against accidental or malicious interference.
It provides consistent timekeeping for file servers.

Working of NTP: NTP is an application-layer protocol (carried over UDP) that uses a hierarchy of time sources. At the top are highly accurate reference clocks, such as atomic or GPS clocks, known as stratum 0. Servers directly attached to them are stratum 1, servers that synchronize from stratum 1 are stratum 2, and so on down the hierarchy. Each level passes accurate date and time to the next, so that communicating hosts stay synchronized with one another.

Advantages of NTP:
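It synchronizes devices across the Internet.
It improves security, since many security mechanisms depend on accurate time.
Authentication systems such as Kerberos depend on it, because they reject requests whose timestamps differ too much from the server's clock.
Consistent timestamps make troubleshooting easier, because logs from different machines can be correlated.
It is used by distributed file systems, which are difficult to keep consistent without synchronized clocks.

Disadvantages of NTP: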
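If the time servers are unreachable, clients cannot synchronize and their clocks drift until service is restored.
Hosts with misconfigured time zones can still show conflicting local times, even though NTP itself distributes UTC.
Variable network delays limit accuracy, typically to milliseconds or tens of milliseconds over the Internet.
Heavy NTP traffic or network congestion can degrade synchronization.
Without authentication, an attacker can manipulate synchronization, for example by spoofing responses from a time server.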
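The stratum hierarchy described in the working of NTP can be sketched as a small graph computation. A reference clock is stratum 0, and each server's stratum is one more than that of the best source it synchronizes from. This is a simplification under stated assumptions: the topology and host names are hypothetical, the sketch picks the lowest-stratum source, and it uses a global map. Real NTP servers instead learn a source's stratum from the packets they receive, and their selection algorithms also weigh delay and dispersion. A stratum of 16 denotes an unsynchronized host.

```python
from collections import deque

MAX_STRATUM = 16  # in NTP, stratum 16 means "unsynchronized"


def assign_strata(reference_clocks: set[str],
                  upstreams: dict[str, list[str]]) -> dict[str, int]:
    """Breadth-first search outward from the reference clocks (stratum 0).

    `upstreams` maps each server to the sources it can synchronize from.
    Each server gets 1 + the lowest stratum among its sources; servers with
    no path to a reference clock get MAX_STRATUM."""
    # Invert the map: for each source, which servers synchronize from it?
    downstream: dict[str, list[str]] = {}
    for server, sources in upstreams.items():
        for src in sources:
            downstream.setdefault(src, []).append(server)

    strata = {ref: 0 for ref in reference_clocks}
    queue = deque(reference_clocks)
    while queue:
        src = queue.popleft()
        for server in downstream.get(src, []):
            if server not in strata:  # BFS: the first visit has the lowest stratum
                strata[server] = min(strata[src] + 1, MAX_STRATUM)
                queue.append(server)
    for server in upstreams:
        strata.setdefault(server, MAX_STRATUM)
    return strata


if __name__ == "__main__":
    strata = assign_strata(
        reference_clocks={"gps0"},
        upstreams={
            "ntp1": ["gps0"],            # directly attached to the GPS receiver
            "ntp2": ["ntp1"],
            "campus": ["ntp2", "ntp1"],  # ntp1 is the better source
            "laptop": ["campus"],
            "isolated": [],
        },
    )
    print(strata)  # campus -> 2, laptop -> 3, isolated -> 16
```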

5. Practical Considerations

Accuracy vs. Cost Tradeoff:

Method	Accuracy	Cost	Use Case
GPS	1μs	High	Financial systems
NTP	1-50ms	Low	Enterprise networks
PTP (IEEE 1588)	1μs	Moderate	Industrial systems

Implementation Challenges:

Network latency variability affects sync accuracy
Security risks from fake time servers
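Temperature-induced drift in data centers: the drift rate grows as the temperature moves away from the oscillator's reference point, approximated linearly as $ρ_{effective} ≈ ρ_{base} + k\,|ΔT|$, where $ΔT$ is the temperature deviation and $k$ is an oscillator-specific temperature coefficient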
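One common way to reduce the effect of latency variability is to take several samples and trust the one with the smallest round-trip delay, since a short round trip leaves little room for asymmetric queuing. NTP's clock filter similarly prefers the lowest-delay sample among its most recent ones. The sketch below applies that rule to (offset, delay) pairs such as an NTP exchange produces; the function name and sample values are invented for illustration.

```python
def best_offset(samples: list[tuple[float, float]]) -> float:
    """Given (offset, delay) pairs in seconds, return the offset of the
    sample with the smallest round-trip delay. For a sample with delay d,
    the offset error caused by path asymmetry is at most d / 2."""
    if not samples:
        raise ValueError("need at least one sample")
    offset, _delay = min(samples, key=lambda s: s[1])
    return offset


if __name__ == "__main__":
    samples = [
        (0.512, 0.080),  # congested round trip: offset less trustworthy
        (0.498, 0.012),  # quick round trip: offset error at most 6 ms
        (0.530, 0.150),
    ]
    print(best_offset(samples))  # 0.498
```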

6. Real-World Impact

Consequences of Poor Sync:

Database replication conflicts
Incorrect transaction ordering
Unreliable system logs

Best Practices:

Use hierarchical time sources (stratum servers)
Implement multiple sync protocols as fallback
Monitor clock drift continuously: $\text{Drift Alert Threshold} = ρ × \text{Time Since Last Sync}$, since drift accumulates from the most recent synchronization
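A monitoring check based on the drift bound $ρ$ might look like the sketch below. It assumes the host can measure its current offset against a trusted reference (for example, from an NTP exchange) and knows when it last synchronized, because drift starts accumulating again at that point. The function name and the optional `sync_error` allowance for the error left by the last synchronization are illustrative assumptions.

```python
def drift_alert(measured_offset: float, rho: float,
                seconds_since_sync: float, sync_error: float = 0.0) -> bool:
    """True if the clock is further off than drift alone can explain.

    A clock that satisfies the drift bound rho can gain or lose at most
    rho * seconds_since_sync since it was last synchronized, plus whatever
    error that synchronization left behind (sync_error)."""
    threshold = rho * seconds_since_sync + sync_error
    return abs(measured_offset) > threshold


if __name__ == "__main__":
    rho = 1e-6
    # One hour after a sync, drift alone allows up to 3.6 ms of offset.
    print(drift_alert(0.002, rho, 3600))  # False: within expected drift
    print(drift_alert(0.050, rho, 3600))  # True: likely a faulty clock or a bad sync
```

An alert therefore points to something drift alone cannot explain, such as a failing oscillator or a bad time source.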

💡 Critical Insight: The choice between external and internal synchronization depends on whether you need absolute time (for compliance) or just consistent ordering (for coordination).