etcd: A database for key-value pairs
Nowadays, many developers work with distributed systems such as cloud platforms. As a result, scalable clusters are replacing individual databases, and IT managers are grappling with new challenges: network failures, latency, limited data throughput, the failure of individual system components, and transport security.
One possible solution is to set up a central location for changeable data that is fail-safe, fault-tolerant, and consistent. That’s where etcd comes in.
What is etcd?
etcd is a distributed key-value store originally developed by the CoreOS team. Like many other tools that run in a Docker environment, etcd is written in Go, the programming language developed by Google. The developers’ goal was to create a secure store for critical data in distributed applications, with straightforward management functions.
The name comes from the convention used to name configuration files in GNU/Linux operating systems: “/etc”. The extra letter “d” stands for “distributed”. etcd is now open source and is managed by the Cloud Native Computing Foundation.
How does etcd work?
To understand etcd, you need to be familiar with three key concepts relating to storage management and clusters:
- Leader
- Elections
- Terms
In Raft-based systems, the cluster elects a leader for a given term. The leader processes all storage requests that require the consensus of the cluster. Requests that do not require cluster consensus (serializable reads, for example) can be answered by any member of the cluster. If the leader accepts a change, etcd replicates the information to the follower nodes; once a majority of followers have confirmed receipt, the leader commits the change.
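This split between consensus writes and locally served reads is visible in the client API. Here is a minimal sketch using the official Go client (go.etcd.io/etcd/client/v3), assuming a cluster reachable at localhost:2379; the key name is purely illustrative:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // assumed local endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// A write goes through the leader and is committed only after
	// a quorum of followers has acknowledged it.
	if _, err := cli.Put(ctx, "/config/feature", "on"); err != nil {
		log.Fatal(err)
	}

	// Reads are linearizable by default; WithSerializable() lets any
	// member answer from its local state, trading strict freshness
	// for lower latency.
	resp, err := cli.Get(ctx, "/config/feature", clientv3.WithSerializable())
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}
```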
This kind of system, where a leader uses an etcd database to coordinate changes with the nodes in the cluster, is very valuable in distributed applications. If changes would affect how a node operates, the node can block them. This ensures that the application remains stable and minimizes related problems.
If a leader dies or fails to respond within a certain time, the remaining nodes in the cluster elect a new leader. Each node waits for a randomized timeout before calling a new election and declaring itself a candidate. Because these timers are randomized, it is unlikely that several nodes become candidates at once, so the cluster can usually agree on a new leader quickly.
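The randomized-timer idea can be illustrated without etcd at all. The following toy Go sketch (not etcd code) lets three simulated followers race their timeouts; the first timer to fire is the node that would declare itself a candidate:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	winner := make(chan int, 3)
	for node := 1; node <= 3; node++ {
		go func(id int) {
			// Raft-style randomized election timeout, e.g. 150-300 ms.
			timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
			time.Sleep(timeout)
			winner <- id
		}(node)
	}
	// The node whose timer fires first becomes the candidate.
	fmt.Printf("node %d times out first and calls an election\n", <-winner)
}
```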
To ensure that a majority (quorum) can always be reached in an election, the cluster should have an odd number of nodes; a five-node cluster, for example, tolerates the failure of two nodes. For performance reasons, clusters should not have more than seven nodes.
You can run etcd on a laptop or in a simple cloud system to try it out (see the sketch below). An SSD is highly recommended, because etcd persists every write to disk. For production setups, refer to the hardware guidelines in the official documentation.
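For a quick local experiment, a single-node etcd can even be started in-process. This sketch follows the embed package shipped with the etcd server module (go.etcd.io/etcd/server/v3/embed); the data directory name is illustrative:

```go
package main

import (
	"log"
	"time"

	"go.etcd.io/etcd/server/v3/embed"
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "default.etcd" // data is persisted here, hence the SSD advice

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	select {
	case <-e.Server.ReadyNotify():
		log.Println("single-node etcd is ready on localhost:2379")
	case <-time.After(10 * time.Second):
		log.Fatal("etcd took too long to start")
	}
	// Block until the server stops (e.g. on Ctrl-C).
	log.Fatal(<-e.Err())
}
```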
Advantages of etcd
As well as ensuring that applications are stable, etcd offers numerous other advantages:
- Full replication: The entire store is available on each node in the cluster.
- High availability: etcd databases are designed to avoid single points of failure if hardware or network issues arise.
- Consistency: Each read returns the latest write, regardless of which host answers.
- Ease of use: etcd has a well-defined, user-friendly gRPC API; a built-in gRPC gateway also accepts RESTful JSON requests.
- Security: etcd supports secure transfers via SSL/TLS and offers optional client certificate authentication (see the sketch after this list).
- Speed: etcd is benchmarked at 10,000 writes per second.
- Reliability: The Raft algorithm ensures that the store is always distributed correctly.
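To illustrate the security point above: the Go client accepts a standard TLS configuration, including a client certificate for authentication. A minimal sketch, assuming go.etcd.io/etcd/client/v3 and hypothetical certificate file names (client.crt, client.key, ca.crt):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Hypothetical paths; adjust to your deployment.
	cert, err := tls.LoadX509KeyPair("client.crt", "client.key")
	if err != nil {
		log.Fatal(err)
	}
	caCert, err := os.ReadFile("ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caCert)

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://localhost:2379"},
		DialTimeout: 5 * time.Second,
		TLS: &tls.Config{
			Certificates: []tls.Certificate{cert}, // client certificate auth
			RootCAs:      pool,                    // verify the server cert
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
}
```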
An etcd example: The key-value store in practice
Kubernetes adopted etcd in 2014, leading to a rapid expansion of the etcd community. Cloud providers like AWS, Google Cloud Platform, and Azure followed suit and successfully integrated etcd into their production environments.
But let’s go back to the first etcd example, Kubernetes. Kubernetes itself is a distributed system that runs on a cluster of several machines, so it has a lot to gain from a distributed data store like etcd that keeps critical data safe. Within Kubernetes, the etcd database acts as the primary data store for configuration data, status, and metadata. When changes are requested, etcd makes sure that all the nodes in the Kubernetes cluster can read and write the data. At the same time, a “watch” function monitors the actual state and the desired state of the system; if the two diverge, Kubernetes makes the changes needed to reconcile them.
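The watch mechanism is exposed directly in the client API. A minimal sketch, again assuming a local cluster and the Go client go.etcd.io/etcd/client/v3; the /registry/ prefix (where Kubernetes stores its objects by default) is used here only as an illustration:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Watch every key under the prefix; etcd streams each change
	// as soon as it has been committed by the cluster.
	rch := cli.Watch(context.Background(), "/registry/", clientv3.WithPrefix())
	for wresp := range rch {
		for _, ev := range wresp.Events {
			// ev.Type prints as PUT or DELETE.
			fmt.Printf("%s %q -> %q\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```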
Reads issued with “kubectl” retrieve values from the etcd database, and changes made with “kubectl apply” create or update entries in the etcd store. Failure events such as node crashes are also reflected in etcd automatically.