etcd Study Notes

Kubernetes stores its underlying data in etcd; these are my study notes.

Overview

  • etcd is a distributed, consistent key-value store released by CoreOS, used for shared configuration and service discovery
  • etcd supports cluster deployment, which gives it high availability
  • This article uses CentOS 7.6 and etcd v3.3.10 as the example environment

Installing etcd

Binary installation

Download

# download and unpack
wget -q -O - https://github.com/etcd-io/etcd/releases/download/v3.3.10/etcd-v3.3.10-linux-amd64.tar.gz | tar xz
# list the extracted files
ls etcd-v3.3.10-linux-amd64
Documentation etcd etcdctl README-etcdctl.md README.md READMEv2-etcdctl.md
# move the binaries into /usr/local/bin/
mv etcd-v3.3.10-linux-amd64/etcd etcd-v3.3.10-linux-amd64/etcdctl /usr/local/bin/

Configuration

Create a user

groupadd -r etcd
useradd -r -g etcd -s /bin/false etcd

Create directories

mkdir -p /var/lib/etcd /etc/etcd/

Configuration file

Create the configuration file etcd.config.yaml with the following content

# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: 'default'
# Path to the data directory.
data-dir: /var/lib/etcd/default.etcd
# Path to the dedicated wal directory.
wal-dir: /var/lib/etcd/wal
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: http://localhost:2380
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: http://localhost:2379
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: http://localhost:2380
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: http://localhost:2379
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster:
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new'
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Accept etcd V2 client requests
enable-v2: true
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
# Path to the client server TLS cert file.
cert-file:
# Path to the client server TLS key file.
key-file:
# Enable client cert authentication.
client-cert-auth: false
# Path to the client server TLS trusted CA cert file.
trusted-ca-file:
# Client TLS using generated certificates
auto-tls: false
peer-transport-security:
# Path to the peer server TLS cert file.
cert-file:
# Path to the peer server TLS key file.
key-file:
# Enable peer client cert authentication.
client-cert-auth: false
# Path to the peer server TLS trusted CA cert file.
trusted-ca-file:
# Peer TLS using generated certificates.
auto-tls: false
# Enable debug-level logging for etcd.
debug: false
logger: zap
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-outputs: [stderr]
# Force to create a new one member cluster.
force-new-cluster: false
auto-compaction-mode: periodic
auto-compaction-retention: "1"
# Set level of detail for exported metrics, specify 'extensive' to include histogram metrics.
# default is 'basic'
metrics: 'basic'

Create the service file

Use systemd to manage the etcd service

cat > /usr/lib/systemd/system/etcd.service <<EOF
[Unit]
Description=etcd key-value store
Documentation=https://github.com/etcd-io/etcd
After=network.target

[Service]
User=etcd
Type=notify
ExecStart=/usr/local/bin/etcd --config-file /etc/etcd/etcd.config.yaml
Restart=always
RestartSec=10s
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF

Run etcd

chown -R etcd:etcd /var/lib/etcd /etc/etcd
systemctl daemon-reload
systemctl start etcd.service

Verify the etcd service

etcdctl cluster-health
member 8e9e05c52164694d is healthy: got healthy result from http://localhost:2379
cluster is healthy
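
The same check with the v3 API (a minimal sketch; assumes the default local endpoint and uses a throwaway key):

export ETCDCTL_API=3
etcdctl --endpoints=http://127.0.0.1:2379 endpoint health
# simple read/write smoke test
etcdctl put foo bar
etcdctl get foo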

Deploying an etcd cluster

Ways to bootstrap a cluster

Static discovery

The cluster membership is known in advance, and the addresses of all etcd nodes are specified directly at startup

Dynamic discovery

An existing etcd cluster serves as the data exchange point; new clusters bootstrap by discovering members through the existing cluster

DNS discovery

Node addresses are obtained through DNS SRV queries; a record sketch follows
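
For reference, a minimal sketch of the SRV records that --discovery-srv looks up (hypothetical example.com zone; TLS deployments use _etcd-server-ssl._tcp instead):

; peer discovery records for --discovery-srv=example.com
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 etcd1.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 etcd2.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 etcd3.example.com.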

Node information

Only the static discovery procedure is covered here.

IP address      Hostname  CPU  Memory
172.16.80.201   etcd1     4    8G
172.16.80.202   etcd2     4    8G
172.16.80.203   etcd3     4    8G

Deploying an etcd cluster with static discovery

Create the configuration files

etcd1

# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: 'etcd1'
# Path to the data directory.
data-dir: /var/lib/etcd/etcd1.etcd
# Path to the dedicated wal directory.
wal-dir: /var/lib/etcd/wal
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: 'http://172.16.80.201:2380'
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: 'http://172.16.80.201:2379,http://127.0.0.1:2379'
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: 'http://172.16.80.201:2380'
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: 'http://172.16.80.201:2379'
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster: 'etcd1=http://172.16.80.201:2380,etcd2=http://172.16.80.202:2380,etcd3=http://172.16.80.203:2380'
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new'
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Accept etcd V2 client requests
enable-v2: true
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
# Path to the client server TLS cert file.
cert-file:
# Path to the client server TLS key file.
key-file:
# Enable client cert authentication.
client-cert-auth: false
# Path to the client server TLS trusted CA cert file.
trusted-ca-file:
# Client TLS using generated certificates
auto-tls: false
peer-transport-security:
# Path to the peer server TLS cert file.
cert-file:
# Path to the peer server TLS key file.
key-file:
# Enable peer client cert authentication.
client-cert-auth: false
# Path to the peer server TLS trusted CA cert file.
trusted-ca-file:
# Peer TLS using generated certificates.
auto-tls: false
# Enable debug-level logging for etcd.
debug: false
logger: zap
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-outputs: ['stderr']
# Force to create a new one member cluster.
force-new-cluster: false
auto-compaction-mode: periodic
auto-compaction-retention: "1"
# Set level of detail for exported metrics, specify 'extensive' to include histogram metrics.
# default is 'basic'
metrics: 'basic'

etcd2

# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: 'etcd2'
# Path to the data directory.
data-dir: /var/lib/etcd/etcd2.etcd
# Path to the dedicated wal directory.
wal-dir: /var/lib/etcd/wal
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: 'http://172.16.80.202:2380'
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: 'http://172.16.80.202:2379,http://127.0.0.1:2379'
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: 'http://172.16.80.202:2380'
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: 'http://172.16.80.202:2379'
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster: 'etcd1=http://172.16.80.201:2380,etcd2=http://172.16.80.202:2380,etcd3=http://172.16.80.203:2380'
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new'
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Accept etcd V2 client requests
enable-v2: true
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
# Path to the client server TLS cert file.
cert-file:
# Path to the client server TLS key file.
key-file:
# Enable client cert authentication.
client-cert-auth: false
# Path to the client server TLS trusted CA cert file.
trusted-ca-file:
# Client TLS using generated certificates
auto-tls: false
peer-transport-security:
# Path to the peer server TLS cert file.
cert-file:
# Path to the peer server TLS key file.
key-file:
# Enable peer client cert authentication.
client-cert-auth: false
# Path to the peer server TLS trusted CA cert file.
trusted-ca-file:
# Peer TLS using generated certificates.
auto-tls: false
# Enable debug-level logging for etcd.
debug: false
logger: zap
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-outputs: [stderr]
# Force to create a new one member cluster.
force-new-cluster: false
auto-compaction-mode: periodic
auto-compaction-retention: "1"
# Set level of detail for exported metrics, specify 'extensive' to include histogram metrics.
# default is 'basic'
metrics: 'basic'

etcd3

# This is the configuration file for the etcd server.
# Human-readable name for this member.
name: 'etcd3'
# Path to the data directory.
data-dir: /var/lib/etcd/etcd3.etcd
# Path to the dedicated wal directory.
wal-dir: /var/lib/etcd/wal
# Number of committed transactions to trigger a snapshot to disk.
snapshot-count: 10000
# Time (in milliseconds) of a heartbeat interval.
heartbeat-interval: 100
# Time (in milliseconds) for an election to timeout.
election-timeout: 1000
# Raise alarms when backend size exceeds the given quota. 0 means use the
# default quota.
quota-backend-bytes: 0
# List of comma separated URLs to listen on for peer traffic.
listen-peer-urls: 'http://172.16.80.203:2380'
# List of comma separated URLs to listen on for client traffic.
listen-client-urls: 'http://172.16.80.203:2379,http://127.0.0.1:2379'
# Maximum number of snapshot files to retain (0 is unlimited).
max-snapshots: 5
# Maximum number of wal files to retain (0 is unlimited).
max-wals: 5
# Comma-separated white list of origins for CORS (cross-origin resource sharing).
cors:
# List of this member's peer URLs to advertise to the rest of the cluster.
# The URLs needed to be a comma-separated list.
initial-advertise-peer-urls: 'http://172.16.80.203:2380'
# List of this member's client URLs to advertise to the public.
# The URLs needed to be a comma-separated list.
advertise-client-urls: 'http://172.16.80.203:2379'
# Discovery URL used to bootstrap the cluster.
discovery:
# Valid values include 'exit', 'proxy'
discovery-fallback: 'proxy'
# HTTP proxy to use for traffic to discovery service.
discovery-proxy:
# DNS domain used to bootstrap initial cluster.
discovery-srv:
# Initial cluster configuration for bootstrapping.
initial-cluster: 'etcd1=http://172.16.80.201:2380,etcd2=http://172.16.80.202:2380,etcd3=http://172.16.80.203:2380'
# Initial cluster token for the etcd cluster during bootstrap.
initial-cluster-token: 'etcd-cluster'
# Initial cluster state ('new' or 'existing').
initial-cluster-state: 'new'
# Reject reconfiguration requests that would cause quorum loss.
strict-reconfig-check: false
# Accept etcd V2 client requests
enable-v2: true
# Enable runtime profiling data via HTTP server
enable-pprof: true
# Valid values include 'on', 'readonly', 'off'
proxy: 'off'
# Time (in milliseconds) an endpoint will be held in a failed state.
proxy-failure-wait: 5000
# Time (in milliseconds) of the endpoints refresh interval.
proxy-refresh-interval: 30000
# Time (in milliseconds) for a dial to timeout.
proxy-dial-timeout: 1000
# Time (in milliseconds) for a write to timeout.
proxy-write-timeout: 5000
# Time (in milliseconds) for a read to timeout.
proxy-read-timeout: 0
client-transport-security:
# Path to the client server TLS cert file.
cert-file:
# Path to the client server TLS key file.
key-file:
# Enable client cert authentication.
client-cert-auth: false
# Path to the client server TLS trusted CA cert file.
trusted-ca-file:
# Client TLS using generated certificates
auto-tls: false
peer-transport-security:
# Path to the peer server TLS cert file.
cert-file:
# Path to the peer server TLS key file.
key-file:
# Enable peer client cert authentication.
client-cert-auth: false
# Path to the peer server TLS trusted CA cert file.
trusted-ca-file:
# Peer TLS using generated certificates.
auto-tls: false
# Enable debug-level logging for etcd.
debug: false
logger: zap
# Specify 'stdout' or 'stderr' to skip journald logging even when running under systemd.
log-outputs: [stderr]
# Force to create a new one member cluster.
force-new-cluster: false
auto-compaction-mode: periodic
auto-compaction-retention: "1"
# Set level of detail for exported metrics, specify 'extensive' to include histogram metrics.
# default is 'basic'
metrics: 'basic'

Start the etcd cluster

for NODE in 172.16.80.201 172.16.80.202 172.16.80.203; do
  ssh $NODE systemctl enable etcd
  ssh $NODE systemctl start etcd &
done

Check the etcd cluster

export ETCDCTL_API=2
etcdctl --endpoints 'http://172.16.80.201:2379,http://172.16.80.202:2379,http://172.16.80.203:2379' cluster-health
member 222fd3b0bb4a5931 is healthy: got healthy result from http://172.16.80.203:2379
member 8349ef180b115a83 is healthy: got healthy result from http://172.16.80.201:2379
member f525d2d797a7c465 is healthy: got healthy result from http://172.16.80.202:2379
cluster is healthy

export ETCDCTL_API=3
etcdctl --endpoints='http://172.16.80.201:2379,http://172.16.80.202:2379,http://172.16.80.203:2379' endpoint health
http://172.16.80.201:2379 is healthy: successfully committed proposal: took = 2.879402ms
http://172.16.80.203:2379 is healthy: successfully committed proposal: took = 6.708566ms
http://172.16.80.202:2379 is healthy: successfully committed proposal: took = 7.187607ms

SSL/TLS encryption

This section is translated from the official documentation.

etcd supports automatic TLS, client certificate authentication, and encrypted communication both between clients and servers and between cluster peers.

Generating certificates

For convenience, the CFSSL toolkit is used here to generate certificates.

Download CFSSL

mkdir ~/bin
curl -s -L -o ~/bin/cfssl https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
curl -s -L -o ~/bin/cfssljson https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
chmod +x ~/bin/{cfssl,cfssljson}
export PATH=$PATH:~/bin

Create a working directory

mkdir ~/cfssl
cd ~/cfssl

Create the default configuration files

cfssl print-defaults config > ca-config.json
cfssl print-defaults csr > ca-csr.json

Certificate types

  • Client certificates are used by the server to authenticate clients
  • Server certificates are used by clients to authenticate the server
  • Peer certificates are used by etcd cluster members, for both client and server authentication

Configure the CA

Edit ca-config.json

Notes

  • expiry defines the validity period; 43800h here is 5 years
  • the usages field defines what a certificate may be used for
    • signing means it can be used to sign other certificates
    • key encipherment allows the key to be used for encrypting (session) keys
    • server auth allows the certificate to authenticate a TLS server
    • client auth allows the certificate to authenticate a TLS client
{
  "signing": {
    "default": {
      "expiry": "43800h"
    },
    "profiles": {
      "server": {
        "expiry": "43800h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth"
        ]
      },
      "client": {
        "expiry": "43800h",
        "usages": [
          "signing",
          "key encipherment",
          "client auth"
        ]
      },
      "peer": {
        "expiry": "43800h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}

Configure the certificate signing request

Edit ca-csr.json; adjust the fields to your own needs

{
  "CN": "My own CA",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "US",
      "L": "CA",
      "O": "My Company Name",
      "ST": "San Francisco",
      "OU": "Org Unit 1",
      "OU": "Org Unit 2"
    }
  ]
}

Generate the CA certificate

Run the following command to generate the CA certificate

cfssl gencert -initca ca-csr.json | cfssljson -bare ca -

The following files are generated

ca-key.pem
ca.csr
ca.pem
  • ca-key.pem is the CA private key; keep it safe
  • the .csr file is the certificate signing request and can be deleted

Generate the server certificate

cfssl print-defaults csr > server.json

Edit the CN and hosts fields in server.json; adjust the names field as needed

Notes

  • hosts is a list; because the server also needs to access the cluster as a client, the hosts entries may be hostnames or IP addresses
{
  "CN": "example.net",
  "hosts": [
    "127.0.0.1",
    "192.168.1.1",
    "ext.example.com",
    "coreos1.local",
    "coreos1"
  ],
  "key": {
    "algo": "ecdsa",
    "size": 256
  },
  "names": [
    {
      "C": "US",
      "L": "CA",
      "ST": "San Francisco"
    }
  ]
}

Create the server certificate and private key

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server server.json | cfssljson -bare server

The following files are generated

server-key.pem
server.csr
server.pem

Generate the client certificate

cfssl print-defaults csr > client.json

Edit client.json; a client certificate does not need the hosts field, only the CN field set to client

...
"CN": "client",
"hosts": [""],
...

Create the client certificate and private key

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client

The following files are generated

client-key.pem
client.csr
client.pem

Generate peer certificates

cfssl print-defaults csr > member1.json

Edit the CN and hosts fields in member1.json

...
"CN": "member1",
"hosts": [
"192.168.122.101",
"ext.example.com",
"member1.local",
"member1"
],
...

Create the certificate and key for member1

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer member1.json | cfssljson -bare member1

The following files are generated

member1-key.pem
member1.csr
member1.pem

Repeat this for every additional member to generate its corresponding peer certificate.

Verify the certificates

openssl x509 -in ca.pem -text -noout
openssl x509 -in server.pem -text -noout
openssl x509 -in client.pem -text -noout
openssl x509 -in member1.pem -text -noout

Example 1: clients send data to the server over HTTPS

Prepare the CA certificate ca.pem and the key pair server.pem / server-key.pem.

Start the server

Start with the following parameters

etcd --name infra0 \
--data-dir /var/lib/etcd/infra0 \
--cert-file=/path/to/server.pem \
--key-file=/path/to/server-key.pem \
--advertise-client-urls=https://127.0.0.1:2379 \
--listen-client-urls=https://127.0.0.1:2379

Access the server over HTTPS from a client

Use curl with the CA certificate to test the HTTPS connection

curl --cacert /path/to/ca.pem \
https://127.0.0.1:2379/v2/keys/foo \
-X PUT \
-d value=bar \
-v

Example 2: clients authenticate to the server with client certificates

Building on example 1, this also requires the client certificate client.pem and key client-key.pem.

Start the server

Start with the following parameters; compared with example 1, this adds client-cert-auth and trusted-ca-file

etcd --name infra0 \
--data-dir /var/lib/etcd/infra0 \
--cert-file=/path/to/server.pem \
--key-file=/path/to/server-key.pem \
--advertise-client-urls=https://127.0.0.1:2379 \
--listen-client-urls=https://127.0.0.1:2379 \
--client-cert-auth \
--trusted-ca-file=/path/to/ca.pem

Repeat the request from example 1

curl --cacert /path/to/ca.pem https://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -v

The request is rejected by the server

...
routines:SSL3_READ_BYTES:sslv3 alert bad certificate
...

Access the server with the client certificate

curl --cacert /path/to/ca.pem \
--cert /path/to/client.pem \
--key /path/to/client-key.pem \
-L https://127.0.0.1:2379/v2/keys/foo \
-X PUT \
-d value=bar \
-v

The output includes the following

  • authentication succeeded
...
SSLv3, TLS handshake, CERT verify (15):
...
TLS handshake, Finished (20)

Example 3: transport security and client certificates in a cluster

Each member needs its own peer certificate; see the certificate generation section above for the steps.

Assume there are 2 members and both have their certificates generated (member1.pem, member1-key.pem, member2.pem, member2-key.pem).

The etcd members will form a cluster, and all communication between them will be encrypted and authenticated using the client certificates.

The etcd output will show that the peer addresses it connects to use HTTPS.

Start the servers

Fetch a discovery URL from https://discovery.etcd.io/new to bootstrap the cluster.

The discovery service can also be self-hosted on an internal network; see the project on GitHub.

DISCOVERY_URL=$(curl https://discovery.etcd.io/new)

member1

etcd --name infra1 \
--data-dir /var/lib/etcd/infra1 \
--peer-client-cert-auth \
--peer-trusted-ca-file=/path/to/ca.pem \
--peer-cert-file=/path/to/member1.pem \
--peer-key-file=/path/to/member1-key.pem \
--initial-advertise-peer-urls=https://10.0.1.11:2380 \
--listen-peer-urls=https://10.0.1.11:2380 \
--discovery ${DISCOVERY_URL}

member2

etcd --name infra2 \
--data-dir /var/lib/etcd/infra2 \
--peer-client-cert-auth \
--peer-trusted-ca-file=/path/to/ca.pem \
--peer-cert-file=/path/to/member2.pem \
--peer-key-file=/path/to/member2-key.pem \
--initial-advertise-peer-urls=https://10.0.1.12:2380 \
--listen-peer-urls=https://10.0.1.12:2380 \
--discovery ${DISCOVERY_URL}

Example 4: automatic self-signed certificates

For scenarios that only need encrypted transport without authentication, etcd can encrypt traffic using automatically generated self-signed certificates.

Start the servers

DISCOVERY_URL=$(curl https://discovery.etcd.io/new)

member1

etcd --name infra1 \
--data-dir /var/lib/etcd/infra1 \
--auto-tls \
--peer-auto-tls \
--initial-advertise-peer-urls=https://10.0.1.11:2380 \
--listen-peer-urls=https://10.0.1.11:2380 \
--discovery ${DISCOVERY_URL}

member2

etcd --name infra2 \
--data-dir /var/lib/etcd/infra2 \
--auto-tls \
--peer-auto-tls \
--initial-advertise-peer-urls=https://10.0.1.12:2380 \
--listen-peer-urls=https://10.0.1.12:2380 \
--discovery ${DISCOVERY_URL}

Note

Because self-signed certificates do not authenticate identity, curl returns an error; add the -k flag to disable certificate chain verification.
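
A sketch, repeating the request from example 1 against an auto-TLS server:

curl -k https://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -v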

etcd maintenance

List all keys

export ETCDCTL_API=3
etcdctl get / --prefix --keys-only

Maximum request size

See the official documentation.

max-request-bytes limits the size of a request; the default is 1572864 bytes (1.5 MiB). In scenarios where requests are too large to be written, it can be raised, for example to 10485760 (10 MiB).
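
A sketch of raising the limit, either as a startup flag or in the etcd.config.yaml used in this article:

# as a startup flag
etcd --max-request-bytes=10485760
# or in etcd.config.yaml
max-request-bytes: 10485760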

Tuning the snapshot count

--snapshot-count specifies how many committed transactions trigger a snapshot to disk. Since v3.2, the default of --snapshot-count has changed from 10000 to 100000.

Note

Tune this parameter to the actual workload.

Too low a value causes frequent I/O pressure, hurting cluster availability and write throughput.

Too high a value increases memory usage and slows down etcd's garbage collection.

Compacting history (v3 API)

etcd keeps the history of every key, so multiple versions can be read through its MVCC machinery; the history must be compacted periodically to avoid performance degradation and space exhaustion.

When the space limit is reached, the cluster becomes read-only and delete-only; no writes are possible.

Compaction only removes historical versions; afterwards, only revisions newer than the compaction point can be read.

Manual compaction

Compact the history before revision 3

export ETCDCTL_API=3
etcdctl compact 3

After compaction, reading data at a revision earlier than 3 reports that it no longer exists

export ETCDCTL_API=3
etcdctl get KEY_NAME --rev=2
Error: etcdserver: mvcc: required revision has been compacted

Automatic compaction

Adding --auto-compaction-retention=1 to the startup parameters compacts once per hour.
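
Equivalently in the configuration file used in this article (a sketch; these keys already appear in the sample config above):

auto-compaction-mode: periodic
auto-compaction-retention: "1"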

Defragmentation (v3 API)

After compaction removes old revisions, internal fragmentation remains; that space can be reused by etcd but still occupies disk space.

Defragmentation releases this space back to the filesystem.

export ETCDCTL_API=3
etcdctl defrag

Space quota

etcd limits the size of its database with the --quota-backend-bytes parameter, in bytes.

The default is 2147483648 (2 GB) and the maximum is 8589934592 (8 GB).

See the official documentation for the storage size limit.
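
A sketch of raising the quota and clearing the NOSPACE alarm once compaction and defragmentation have freed space:

# raise the quota to 8 GB (takes effect on restart)
etcd --quota-backend-bytes=8589934592
# after freeing space, check and clear the alarm
export ETCDCTL_API=3
etcdctl alarm list
etcdctl alarm disarm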

Data backup (v3 API)

Snapshot backup

Snapshotting the etcd cluster serves as a data backup.

The snapshot can later restore the cluster to the point in time at which it was taken.

export ETCDCTL_API=3
etcdctl snapshot save /path/to/snapshot.db

Check the snapshot status

etcdctl --write-out=table snapshot status /path/to/snapshot.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| dd97719a |    24276 |       1113 |     3.0 MB |
+----------+----------+------------+------------+

A periodic backup script based on snapshots

#!/bin/sh
TIME=$(date +%Y%m%d)
HOUR=$(date +%H)
BACKUP_DIR="/data/etcd_backup/${TIME}"
mkdir -p $BACKUP_DIR
export ETCDCTL_API=3
/usr/local/bin/etcdctl --cacert=/etc/etcd/ssl/etcd-ca.pem \
--cert=/etc/etcd/ssl/etcd-client.pem \
--key=/etc/etcd/ssl/etcd-client-key.pem \
--endpoints=https://member1:2379,https://member2:2379,https://member3:2379 \
snapshot save $BACKUP_DIR/snapshot-${HOUR}.db
# remove etcd backups older than 2 days
find /data/etcd_backup -type d -mtime +2 -exec rm -rf {} \;

etcd mirror cluster (v3 API)

make-mirror continuously mirrors data to a second cluster; if the primary datacenter goes down, traffic can be switched to the disaster-recovery datacenter by changing DNS, and the data stays consistent throughout.

With two etcd clusters deployed in advance, run the following on the primary cluster

export ETCDCTL_API=3
etcdctl make-mirror --no-dest-prefix=true http://mirror1:2379,http://mirror2:2379,http://mirror3:2379
# sample output
488
546
604
662
720
778
836
894
950
1009
1067
1125
1183
1241

make-mirror prints a progress count every 30s and runs in the foreground; it can be sent to the background with nohup >/path/to/log 2>&1 &, as sketched below.
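
A sketch, keeping the placeholder log path:

export ETCDCTL_API=3
nohup etcdctl make-mirror --no-dest-prefix=true \
http://mirror1:2379,http://mirror2:2379,http://mirror3:2379 \
>/path/to/log 2>&1 &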

Monitoring etcd

debug endpoint

  • Adding --debug to the startup parameters enables debug mode; etcd then serves debug information under the http://x.x.x.x:2379/debug path.

  • Debug output is verbose and degrades performance.

  • /debug/pprof is the Go runtime profiling endpoint, useful for analyzing CPU, heap, mutex, and goroutine usage.

The following example uses the go tool to find etcd's most time-consuming operations

$ go tool pprof http://127.0.0.1:2379/debug/pprof/profile
Fetching profile over HTTP from http://127.0.0.1:2379/debug/pprof/profile
Saved profile in /root/pprof/pprof.etcd-3.2.24.samples.cpu.001.pb.gz
File: etcd-3.2.24
Type: cpu
Time: Feb 10, 2019 at 9:57pm (CST)
Duration: 30s, Total samples = 60ms ( 0.2%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
(pprof)
(pprof) top10
Showing nodes accounting for 60ms, 100% of 60ms total
Showing top 10 nodes out of 25
flat flat% sum% cum cum%
60ms 100% 100% 60ms 100% runtime.futex
0 0% 100% 10ms 16.67% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*raftNode).start.func1
0 0% 100% 10ms 16.67% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/etcdserver.(*raftNode).tick
0 0% 100% 10ms 16.67% github.com/coreos/etcd/cmd/vendor/github.com/coreos/etcd/raft.(*node).Tick
0 0% 100% 20ms 33.33% runtime.chansend
0 0% 100% 30ms 50.00% runtime.exitsyscall
0 0% 100% 30ms 50.00% runtime.exitsyscallfast
0 0% 100% 30ms 50.00% runtime.exitsyscallfast.func1
0 0% 100% 30ms 50.00% runtime.exitsyscallfast_pidle
0 0% 100% 60ms 100% runtime.futexwakeup
(pprof) exit

metrics endpoint

Every etcd node serves monitoring information under /metrics, and monitoring software can scrape metrics from this path.

See the official documentation for the specific metrics.

  • --listen-metrics-urls defines where metrics are served.
  • --metrics can be set to basic or extensive.

Here, curl is used to fetch the metrics

curl http://127.0.0.1:2379/metrics

health check

Here, curl is used to fetch the health information; the response is JSON

curl http://127.0.0.1:2379/health

The response looks like this

{
  "health": "true"
}

Integrating with Prometheus

Configuration

HTTP

global:
  scrape_interval: 10s
scrape_configs:
  - job_name: etcd-cluster-monitoring
    static_configs:
      - targets: ['10.240.0.32:2379','10.240.0.33:2379','10.240.0.34:2379']

HTTPS

global:
  scrape_interval: 10s
scrape_configs:
  - job_name: etcd-cluster-monitoring
    static_configs:
      - targets: ['10.240.0.32:2379','10.240.0.33:2379','10.240.0.34:2379']
    scheme: https
    tls_config:
      # CA certificate to validate API server certificate with.
      ca_file: /path/to/etcd-ca.pem
      cert_file: /path/to/etcd-cert.pem
      key_file: /path/to/etcd-key.pem
      # insecure_skip_verify: true | false

Alerting

Use Alertmanager for alerting.

Prometheus 1.x example

Prometheus 2.x example

Dashboards

Use Grafana to visualize the monitoring data collected in Prometheus; a dashboard template is available.

etcd failure handling

Leader failure

  • When the leader fails, the etcd cluster automatically elects a new leader.
  • Failure detection is timeout-based, so the new election does not start the instant the old leader fails.
  • While a leader is being elected, the cluster cannot process writes; write requests are queued until the new leader is elected.
  • Data already sent to the failed leader but not yet committed may be lost, because the new leader is entitled to overwrite the old leader's uncommitted entries.
  • Clients may observe some writes timing out; uncommitted data is lost.

Follower failure

  • As long as fewer than half of the members have failed, the etcd cluster keeps working normally.
    • for example, 1 failure out of 3 nodes, or 2 out of 5
  • When a follower fails, the client's etcd library should automatically connect to another cluster member.

Majority failure

  • By the nature of the Raft algorithm, losing more than half of the members leaves the etcd cluster unable to accept writes.
  • As soon as more than half of the members are working again, the cluster automatically elects a leader and recovers to a healthy state.
  • If the majority cannot be repaired, follow the disaster recovery procedure.

Network partition

  • A network failure can split the etcd cluster into two or more parts.
  • The side holding the majority of members remains an available cluster; the minority side cannot accept writes.
  • If the cluster is split evenly, neither side is available.
  • This is because the Raft consensus algorithm guarantees that etcd has no split-brain.
  • Once the partition heals, the minority side automatically learns the leader from the majority side and recovers.

Cluster bootstrap failure

Bootstrap only succeeds once more than half of the members have started.

The Raft algorithm guarantees data consistency and stability across members, so recovering a node is mostly about restoring the etcd service first and then the data.

New cluster

Delete the data directories on all members and go through the cluster creation steps again.

Existing cluster

This depends on whether the node fails to start because of corrupted data files or some other reason.

Corrupted data files are used as the example here; a command sketch follows the list.

  • On a healthy node, save a snapshot with etcdctl snapshot save
  • Empty the failed node's data directory and restore the data into it with etcdctl snapshot restore
  • Confirm the failed member's information with etcdctl member list
  • Remove the failed member with etcdctl member remove
  • Re-add the member with etcdctl member add MEMBER_NAME --peer-urls=http://member:2380
  • Start the failed node's etcd service with the startup parameter --initial-cluster-state=existing
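
A sketch of the member replacement steps (hypothetical member ID and name; assumes the v3 API):

export ETCDCTL_API=3
etcdctl member list
# remove the failed member by the ID shown in the list output
etcdctl member remove 8e9e05c52164694d
# re-add it with its peer URL
etcdctl member add etcd2 --peer-urls=http://172.16.80.202:2380
# then start etcd on that node with --initial-cluster-state=existing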

etcd disaster recovery

Disaster recovery here can restore either v2 or v3 data, but not both at once.

The two API versions are isolated from each other.

For the v3 API

The etcd v3 API provides snapshot and restore features, which can rebuild a cluster without losing any data up to the snapshot point.

Back up data with a snapshot

ETCDCTL_API=3 etcdctl --endpoints http://member1:2379,http://member2:2379,http://member3:2379 snapshot save /path/to/snapshot.db

Restore the cluster

  • Restoring an etcd cluster needs only the snapshot db file.
  • etcdctl snapshot restore automatically creates a new etcd data directory while restoring.
  • The restore overwrites some metadata inside the snapshot (notably the member ID and cluster ID), so the member loses its previous ID; overwriting the metadata prevents the new member from accidentally joining an existing cluster.
  • A cluster restored from a snapshot must be started as a new cluster.
  • The snapshot integrity hash can optionally be verified during restore.
    • a snapshot created with etcdctl snapshot save carries an integrity hash
    • a snapshot copied directly from the data directory has no integrity hash; skip the check with --skip-hash-check, as sketched below
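
A sketch for restoring a snapshot copied straight from the data directory (hypothetical path; the remaining restore flags are the same as in the commands below):

ETCDCTL_API=3 etcdctl snapshot restore /path/to/copied.db --skip-hash-check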

Restore node data

  • Assume the original cluster members are member1, member2, and member3

Restore the snapshot data on member1, member2, and member3 respectively

$ ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
--name member1 \
--initial-cluster member1=http://member1:2380,member2=http://member2:2380,member3=http://member3:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://member1:2380
$ ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
--name member2 \
--initial-cluster member1=http://member1:2380,member2=http://member2:2380,member3=http://member3:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://member2:2380
$ ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
--name member3 \
--initial-cluster member1=http://member1:2380,member2=http://member2:2380,member3=http://member3:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-advertise-peer-urls http://member3:2380

Start the etcd cluster

Start the cluster on member1, member2, and member3 respectively

$ etcd \
--name member1 \
--listen-client-urls http://member1:2379 \
--advertise-client-urls http://member1:2379 \
--listen-peer-urls http://member1:2380 &
$ etcd \
--name member2 \
--listen-client-urls http://member2:2379 \
--advertise-client-urls http://member2:2379 \
--listen-peer-urls http://member2:2380 &
$ etcd \
--name member3 \
--listen-client-urls http://member3:2379 \
--advertise-client-urls http://member3:2379 \
--listen-peer-urls http://member3:2380 &

For the v2 API

Back up the data

export ETCDCTL_API=2
etcdctl backup \
--data-dir /path/to/data-dir \
--wal-dir /path/to/wal_dir \
--backup-dir /path/to/backup_data_dir \
--backup-wal-dir /path/to/backup_wal_dir

Clean out the data directories

rm -rf /path/to/data-dir
rm -rf /path/to/wal-dir

Restore the data

mv /path/to/backup_data_dir /path/to/data-dir
mv /path/to/backup_wal_dir /path/to/wal_dir

Start the etcd cluster

Add --force-new-cluster to the startup parameters

etcd --data-dir /path/to/data-dir \
--wal-dir /path/to/wal_dir \
--force-new-cluster

Upgrading etcd

See the etcd upgrade documentation.

etcd FAQ

Excerpted from the official Frequently Asked Questions (FAQ).

Do clients have to send requests to the etcd cluster's leader?

  • The leader handles all requests that need cluster consensus (e.g. writes).
  • Clients do not need to know which node is the leader; followers forward all consensus-requiring requests to it.
  • Any node can handle requests that do not need consensus (e.g. serializable reads).

What is the difference between listen-client-urls, listen-peer-urls, advertise-client-urls, and initial-advertise-peer-urls?

  • listen-client-urls and listen-peer-urls specify the local addresses on which the etcd server accepts incoming connections; to listen on all interfaces, specify 0.0.0.0 as the listen address.
  • advertise-client-urls and initial-advertise-peer-urls specify the addresses through which clients and other cluster members reach the etcd service; these must be reachable from outside, so addresses such as 127.0.0.1 or 0.0.0.0 must not be used. A sketch follows.
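
A minimal sketch, reusing the etcd1 addresses from this article:

# listen on every interface, but advertise a routable address
listen-client-urls: 'http://0.0.0.0:2379'
advertise-client-urls: 'http://172.16.80.201:2379'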

Why doesn't changing listen-peer-urls or initial-advertise-peer-urls update the advertised peer URLs shown by etcdctl member list?

  • Each member's advertised peer URLs come from the initial-advertise-peer-urls parameter used when the cluster was initialized.
  • Changing listen-peer-urls or initial-advertise-peer-urls after a member has started does not affect the existing advertised peer URLs, because changing them requires cluster quorum to avoid split-brain.
  • To change the advertised peer URLs, use the etcdctl member update command.

System requirements

  • etcd writes its data to disk, so a high-performance disk helps; SSDs are recommended
  • The default storage quota is 2 GB and the maximum is 8 GB
  • To avoid swapping or running out of memory, the server should have at least as much RAM as the storage quota

Why does etcd need an odd number of cluster members?

  • An etcd cluster needs a majority of nodes to agree before it can update the cluster state.
  • The quorum is (n/2)+1.
  • An even number of members is no more fault-tolerant than the next-lower odd number.

Cluster fault tolerance table

Cluster Size  Majority  Failure Tolerance
1             1         0
2             2         0
3             2         1
4             3         1
5             3         2
6             4         2
7             4         3
8             5         3
9             5         4

Maximum cluster size

  • There is no hard theoretical limit, but clusters generally do not exceed 7 nodes
  • 5 nodes are recommended; a 5-node cluster tolerates 2 failed nodes, which is enough in most cases
  • More nodes give better availability, but write performance suffers

Is deploying etcd across data centers appropriate?

  • A cross-data-center etcd cluster improves availability
  • Network latency between data centers may affect leader elections
  • The default etcd configuration may suffer frequent elections or heartbeat timeouts under such latency, so the relevant parameters need tuning

Why does etcd re-elect a leader because of disk I/O latency?

  • This is by design.
  • Disk I/O latency is part of the leader's liveness indicator.
  • High disk I/O latency leads to election timeouts: even if the leader can handle network traffic within the election interval (e.g. send heartbeats), it is effectively unavailable because it cannot commit new proposals in time.
  • If re-elections caused by disk I/O latency happen frequently, look at the disks or adjust etcd's time parameters.

etcd benchmarking

Based on the official documentation.

Performance metrics

  • Latency
    • the time needed to complete an operation
  • Throughput
    • the total number of operations completed in a period of time

Average latency normally increases as throughput increases.

etcd uses the Raft consensus algorithm to replicate data between members and reach cluster consensus.

Consensus performance, and commit latency in particular, is limited mainly by two factors:

  • network I/O latency
  • disk I/O latency

What makes up commit latency

  • The network round-trip time (RTT) between members
    • RTT within a single data center is on the order of milliseconds
    • cross-data-center RTT is subject to physical limits and network quality
  • The fdatasync time to persist data to disk
    • fdatasync latency on spinning disks is typically around 10ms
    • on SSDs it is below 1ms

Other sources of latency

  • Serialized etcd requests go through the MVCC machinery of etcd's boltdb backend, which usually completes within 10ms.
  • etcd periodically snapshots recently committed requests and merges them with the on-disk snapshot, which causes latency spikes.
  • Ongoing compaction also affects latency, so schedule it away from business peaks.

Benchmark runs

etcd's bundled benchmark command-line tool can be used to test etcd performance.

Write requests

# assume HOST_1 is the leader; send write requests to the leader
benchmark --endpoints=${HOST_1} \
--conns=1 \
--clients=1 \
put --key-size=8 \
--sequential-keys \
--total=10000 \
--val-size=256
benchmark --endpoints=${HOST_1} \
--conns=100 \
--clients=1000 \
put --key-size=8 \
--sequential-keys \
--total=100000 \
--val-size=256

# send writes to all members
benchmark --endpoints=${HOST_1},${HOST_2},${HOST_3} \
--conns=100 \
--clients=1000 \
put --key-size=8 \
--sequential-keys \
--total=100000 \
--val-size=256

Read requests

# Single connection read requests
benchmark --endpoints=${HOST_1},${HOST_2},${HOST_3} \
--conns=1 \
--clients=1 \
range YOUR_KEY \
--consistency=l \
--total=10000
benchmark --endpoints=${HOST_1},${HOST_2},${HOST_3} \
--conns=1 \
--clients=1 \
range YOUR_KEY \
--consistency=s \
--total=10000

# Many concurrent read requests
benchmark --endpoints=${HOST_1},${HOST_2},${HOST_3} \
--conns=100 \
--clients=1000 \
range YOUR_KEY \
--consistency=l \
--total=100000
benchmark --endpoints=${HOST_1},${HOST_2},${HOST_3} \
--conns=100 \
--clients=1000 \
range YOUR_KEY \
--consistency=s \
--total=100000

etcd performance tuning

Based on the official documentation.

etcd's default configuration assumes a single data center with low network latency.

For higher network latency, tune the heartbeat interval and the election timeout.

Time parameters

Latency comes not only from the network but also from each node's disk I/O.

Each timeout setting should cover the time from sending a request to receiving a successful response.

Heartbeat interval

How often the leader notifies followers that it is still alive.

Best practice is to measure the maximum RTT between members with ping and set the interval to roughly 0.5x-1.5x that RTT.

The default is 100ms.

Election timeout

How long a follower waits without hearing a heartbeat from the leader before starting a new leader election.

The default is 1000ms. The election timeout should be at least 10x the RTT so that network jitter does not trigger spurious re-elections. A sketch follows.
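
A sketch for a deployment with a measured maximum RTT of about 200ms (values are illustrative, keeping the usual 10x ratio between the two settings):

etcd --heartbeat-interval=300 --election-timeout=3000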

Snapshots

etcd appends every key change to the WAL log files.

Each change is one record, so the log keeps growing.

To keep the log from growing too large, etcd takes snapshots periodically.

A snapshot saves the current system state and removes the old log entries.

The snapshot-count parameter controls the snapshot frequency; the default is 10000, i.e. every 10000 changes trigger a snapshot.

If both memory usage and disk usage are high, try lowering this parameter.

Disk

etcd clusters are very sensitive to disk I/O latency.

etcd has to persist change logs, snapshots, and so on, which can produce very high fsync latency on the disk.

High disk I/O latency leads to leader heartbeat timeouts, request timeouts, re-elections, and so on.

  • Keep the disk used by etcd separate from the system disk
  • Mount the data directory and the wal directory on different disks
  • Use SSDs if at all possible
  • Use ionice to raise the I/O priority of the etcd process (for when the etcd data directory sits on the system disk)

    ionice -c2 -n0 -p `pgrep etcd`

Network

If the leader receives a large volume of client requests and cannot service follower requests in time, the followers' requests are delayed as well.

This shows up as followers logging sending buffer is full.

Responsiveness to followers can be improved by raising the priority of the leader's peer traffic, e.g. with traffic control, as sketched below.
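
A sketch using tc, adapted from the official tuning guide, that gives peer traffic on port 2380 higher priority than client traffic on port 2379 (assumes interface eth0):

tc qdisc add dev eth0 root handle 1: prio bands 3
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip sport 2380 0xffff flowid 1:1
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dport 2380 0xffff flowid 1:1
tc filter add dev eth0 parent 1: protocol ip prio 2 u32 match ip sport 2379 0xffff flowid 1:1
tc filter add dev eth0 parent 1: protocol ip prio 2 u32 match ip dport 2379 0xffff flowid 1:1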