Deploying a TiDB Database Cluster on Kubernetes

Notes

  • This post only records the operation and deployment process
  • OS: CentOS-7.6.1810 x86_64
  • VM specs: 4 CPUs, 8 GB RAM, 30 GB system disk, 20 GB data disk A, 5 GB data disk B
  • Kubernetes cluster version: v1.14.4
  • Local PVs are used as data storage
  • TiDB Operator version: v1.0.0
  • TiDB component version: v3.0.1

Server Topology

Server IP        Deployed instances
172.16.80.201    TiKV*1 TiDB*1 PD*1
172.16.80.202    TiKV*1 TiDB*1 PD*1
172.16.80.203    TiKV*1 TiDB*1 PD*1

Preparation

Install Helm

Binary installation

wget -O - https://get.helm.sh/helm-v2.14.1-linux-amd64.tar.gz | tar xz linux-amd64/helm
mv linux-amd64/helm /usr/local/bin/helm
rm -rf linux-amd64

Create RBAC resources

cat << EOF | kubectl apply -f -
# Create a ServiceAccount named tiller
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
# Bind the cluster-admin ClusterRole to the tiller ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller-cluster-rule
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
EOF

Install the Helm server (Tiller)

helm init --tiller-image gcr.azk8s.cn/google_containers/tiller:v2.14.1 \
--service-account tiller \
--stable-repo-url http://mirror.azure.cn/kubernetes/charts/

Check the deployment

Check Pod status

kubectl -n kube-system get pod -l app=helm,name=tiller

Example output

NAME                             READY     STATUS    RESTARTS   AGE
tiller-deploy-84fc6cd5f9-nz4m7   1/1       Running   0          1m

Check the Helm version

helm version

Example output

Client: &version.Version{SemVer:"v2.14.1", GitCommit:"d325d2a9c179b33af1a024cdb5a4472b6288016a", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.14.1", GitCommit:"d325d2a9c179b33af1a024cdb5a4472b6288016a", GitTreeState:"clean"}

Add the Helm repo

helm repo add pingcap https://charts.pingcap.org/

Update the Helm repo cache

helm repo update

List available TiDB Operator chart versions

helm search pingcap -l

Example output

NAME                    CHART VERSION   APP VERSION   DESCRIPTION
pingcap/tidb-backup     v1.0.0                        A Helm chart for TiDB Backup or Restore
pingcap/tidb-backup     v1.0.0-rc.1                   A Helm chart for TiDB Backup or Restore
pingcap/tidb-backup     v1.0.0-beta.3                 A Helm chart for TiDB Backup or Restore
pingcap/tidb-backup     v1.0.0-beta.2                 A Helm chart for TiDB Backup or Restore
pingcap/tidb-cluster    v1.0.0                        A Helm chart for TiDB Cluster
pingcap/tidb-cluster    v1.0.0-rc.1                   A Helm chart for TiDB Cluster
pingcap/tidb-cluster    v1.0.0-beta.3                 A Helm chart for TiDB Cluster
pingcap/tidb-cluster    v1.0.0-beta.2                 A Helm chart for TiDB Cluster
pingcap/tidb-operator   v1.0.0                        tidb-operator Helm chart for Kubernetes
pingcap/tidb-operator   v1.0.0-rc.1                   tidb-operator Helm chart for Kubernetes
pingcap/tidb-operator   v1.0.0-beta.3                 tidb-operator Helm chart for Kubernetes
pingcap/tidb-operator   v1.0.0-beta.2                 tidb-operator Helm chart for Kubernetes

Configure local PVs

See the separate post "Creating Local PVs on Kubernetes" for how to set them up.
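For reference, a minimal sketch of the local-storage StorageClass assumed throughout this post; the full local-volume-provisioner setup is covered in the referenced post:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer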

Change the ext4 mount options to defaults,nodelalloc,noatime

An example /etc/fstab entry:

UUID=f8727d20-3ef9-4f83-b865-25943bc342a6 /mnt/disks/f8727d20-3ef9-4f83-b865-25943bc342a6 ext4 defaults,nodelalloc,noatime 0 2
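A small sketch of applying the new options after editing /etc/fstab, assuming the disk is not yet in use (the UUID-based path is the example above):

# unmount, then remount everything from /etc/fstab with the new options
umount /mnt/disks/f8727d20-3ef9-4f83-b865-25943bc342a6
mount -a
# verify that nodelalloc and noatime are now in effect
mount | grep f8727d20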

Create the CRD

kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/master/manifests/crd.yaml
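Optionally confirm that the TidbCluster CRD is registered before continuing; the name below is the CRD defined by the tidb-operator v1.0.0 manifest:

kubectl get crd tidbclusters.pingcap.com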

Deploy TiDB Operator

Create a working directory

mkdir -p /home/tidb

Download the TiDB Operator chart

cd /home/tidb
helm fetch pingcap/tidb-operator --version=v1.0.0

Extract the chart

tar xzf tidb-operator-v1.0.0.tgz

Edit values.yaml

vim tidb-operator/values.yaml

After modification:

# Default values for tidb-operator

# clusterScoped is whether tidb-operator should manage kubernetes cluster wide tidb clusters
# Also see rbac.create and controllerManager.serviceAccount
clusterScoped: true

# Also see clusterScoped and controllerManager.serviceAccount
rbac:
  create: true

# operatorImage is TiDB Operator image
operatorImage: pingcap/tidb-operator:v1.0.0
imagePullPolicy: IfNotPresent

defaultStorageClassName: local-storage

controllerManager:
  # With rbac.create=false, the user is responsible for creating this account
  # With rbac.create=true, this service account will be created
  # Also see rbac.create and clusterScoped
  serviceAccount: tidb-controller-manager
  logLevel: 2
  replicas: 1
  resources:
    limits:
      cpu: 250m
      memory: 150Mi
    requests:
      cpu: 80m
      memory: 50Mi
  # autoFailover is whether tidb-operator should auto failover when failure occurs
  autoFailover: true
  # pd failover period default(5m)
  pdFailoverPeriod: 5m
  # tikv failover period default(5m)
  tikvFailoverPeriod: 5m
  # tidb failover period default(5m)
  tidbFailoverPeriod: 5m

scheduler:
  # With rbac.create=false, the user is responsible for creating this account
  # With rbac.create=true, this service account will be created
  # Also see rbac.create and clusterScoped
  serviceAccount: tidb-scheduler
  logLevel: 2
  replicas: 1
  schedulerName: tidb-scheduler
  # features:
  # - StableScheduling=true
  resources:
    limits:
      cpu: 250m
      memory: 150Mi
    requests:
      cpu: 80m
      memory: 50Mi
  kubeSchedulerImageName: gcr.azk8s.cn/google_containers/kube-scheduler
  # This will default to matching your kubernetes version
  # kubeSchedulerImageTag:

Deploy the Operator

helm install pingcap/tidb-operator \
--name=tidb-operator \
--namespace=tidb-admin \
--version=v1.0.0 \
-f /home/tidb/tidb-operator/values.yaml

Check the Operator deployment

kubectl -n tidb-admin get pod -l app.kubernetes.io/name=tidb-operator
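To check both Operator components (tidb-controller-manager and tidb-scheduler) at once, listing everything in the namespace works as well; all Pods should be Running:

kubectl -n tidb-admin get pods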

Deploy the TiDB Cluster

Create a working directory

mkdir -p /home/tidb

Download the chart

cd /home/tidb
helm fetch pingcap/tidb-cluster --version=v1.0.0

Extract the chart

tar xzf tidb-cluster-v1.0.0.tgz

Edit values.yaml

For the meaning of each configuration item, see "TiDB Cluster Configuration on Kubernetes" in the PingCAP documentation.

vim tidb-cluster/values.yaml

After modification:

# Default values for tidb-cluster.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# Also see monitor.serviceAccount
# If you set rbac.create to false, you need to provide a value for monitor.serviceAccount
rbac:
  create: true

# clusterName is the TiDB cluster name, if not specified, the chart release name will be used
# clusterName: demo

# Add additional TidbCluster labels
# ref: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
extraLabels: {}

# schedulerName must be same with charts/tidb-operator/values#scheduler.schedulerName
schedulerName: tidb-scheduler

# timezone is the default system timzone for TiDB
timezone: Asia/Shanghai

# default reclaim policy of a PV
pvReclaimPolicy: Retain

# services is the service list to expose, default is ClusterIP
# can be ClusterIP | NodePort | LoadBalancer
services:
- name: pd
  type: ClusterIP

discovery:
  image: pingcap/tidb-operator:v1.0.0
  imagePullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 250m
      memory: 150Mi
    requests:
      cpu: 80m
      memory: 50Mi

# Whether enable ConfigMap Rollout management.
# When enabling, change of ConfigMap will trigger a graceful rolling-update of the component.
# This feature is only available in tidb-operator v1.0 or higher.
# Note: Switch this variable against an existing cluster will cause an rolling-update of each component even
# if the ConfigMap was not changed.
enableConfigMapRollout: true

pd:
  # Please refer to https://github.com/pingcap/pd/blob/master/conf/config.toml for the default
  # pd configurations (change to the tags of your pd version),
  # just follow the format in the file and configure in the 'config' section
  # as below if you want to customize any configuration.
  # Please refer to https://pingcap.com/docs-cn/v3.0/reference/configuration/pd-server/configuration-file/
  # (choose the version matching your pd) for detailed explanation of each parameter.
  config: |
    [log]
    level = "info"
    [replication]
    location-labels = ["region", "zone", "rack", "host"]

  replicas: 3
  image: pingcap/pd:v3.0.1
  # storageClassName is a StorageClass provides a way for administrators to describe the "classes" of storage they offer.
  # different classes might map to quality-of-service levels, or to backup policies,
  # or to arbitrary policies determined by the cluster administrators.
  # refer to https://kubernetes.io/docs/concepts/storage/storage-classes
  storageClassName: local-storage

  # Image pull policy.
  imagePullPolicy: IfNotPresent

  resources:
    limits: {}
    # cpu: 8000m
    # memory: 8Gi
    requests:
      # cpu: 4000m
      # memory: 4Gi
      storage: 1Gi

  ## affinity defines pd scheduling rules,it's default settings is empty.
  ## please read the affinity document before set your scheduling rule:
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}
  ## The following is typical example of affinity settings:
  ## The PodAntiAffinity setting of the example keeps PD pods does not co-locate on a topology node as far as possible to improve the disaster tolerance of PD on Kubernetes.
  ## The NodeAffinity setting of the example ensure that the PD pods can only be scheduled to nodes with label:[type="pd"],
  # affinity:
  #   podAntiAffinity:
  #     preferredDuringSchedulingIgnoredDuringExecution:
  #     # this term work when the nodes have the label named region
  #     - weight: 10
  #       podAffinityTerm:
  #         labelSelector:
  #           matchLabels:
  #             app.kubernetes.io/instance: <release name>
  #             app.kubernetes.io/component: "pd"
  #         topologyKey: "region"
  #         namespaces:
  #         - <helm namespace>
  #     # this term work when the nodes have the label named zone
  #     - weight: 20
  #       podAffinityTerm:
  #         labelSelector:
  #           matchLabels:
  #             app.kubernetes.io/instance: <release name>
  #             app.kubernetes.io/component: "pd"
  #         topologyKey: "zone"
  #         namespaces:
  #         - <helm namespace>
  #     # this term work when the nodes have the label named rack
  #     - weight: 40
  #       podAffinityTerm:
  #         labelSelector:
  #           matchLabels:
  #             app.kubernetes.io/instance: <release name>
  #             app.kubernetes.io/component: "pd"
  #         topologyKey: "rack"
  #         namespaces:
  #         - <helm namespace>
  #     # this term work when the nodes have the label named kubernetes.io/hostname
  #     - weight: 80
  #       podAffinityTerm:
  #         labelSelector:
  #           matchLabels:
  #             app.kubernetes.io/instance: <release name>
  #             app.kubernetes.io/component: "pd"
  #         topologyKey: "kubernetes.io/hostname"
  #         namespaces:
  #         - <helm namespace>
  #   nodeAffinity:
  #     requiredDuringSchedulingIgnoredDuringExecution:
  #       nodeSelectorTerms:
  #       - matchExpressions:
  #         - key: "kind"
  #           operator: In
  #           values:
  #           - "pd"

  ## nodeSelector ensure pods only assigning to nodes which have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector:
    local-pv: present

  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb
  #   effect: "NoSchedule"
  annotations: {}

tikv:
  # Please refer to https://github.com/tikv/tikv/blob/master/etc/config-template.toml for the default
  # tikv configurations (change to the tags of your tikv version),
  # just follow the format in the file and configure in the 'config' section
  # as below if you want to customize any configuration.
  # Please refer to https://pingcap.com/docs-cn/v3.0/reference/configuration/tikv-server/configuration-file/
  # (choose the version matching your tikv) for detailed explanation of each parameter.
  config: |
    log-level = "info"
    [server]
    status-addr = "0.0.0.0:20180"

  # Here are some parameters you may want to customize (Please configure in the above 'config' section):
  # [readpool.storage]
  #   ## Size of the thread pool for high-priority operations.
  #   # high-concurrency = 4
  #   ## Size of the thread pool for normal-priority operations.
  #   # normal-concurrency = 4
  #   ## Size of the thread pool for low-priority operations.
  #   # low-concurrency = 4
  # [readpool.coprocessor]
  #   ## Most read requests from TiDB are sent to the coprocessor of TiKV. high/normal/low-concurrency is
  #   ## used to set the number of threads of the coprocessor.
  #   ## If there are many read requests, you can increase these config values (but keep it within the
  #   ## number of system CPU cores). For example, for a 32-core machine deployed with TiKV, you can even
  #   ## set these config to 30 in heavy read scenarios.
  #   ## If CPU_NUM > 8, the default thread pool size for coprocessors is set to CPU_NUM * 0.8.
  #   # high-concurrency = 8
  #   # normal-concurrency = 8
  #   # low-concurrency = 8
  # [server]
  #   ## Size of the thread pool for the gRPC server.
  #   # grpc-concurrency = 4
  # [storage]
  #   ## Scheduler's worker pool size, i.e. the number of write threads.
  #   ## It should be less than total CPU cores. When there are frequent write operations, set it to a
  #   ## higher value. More specifically, you can run `top -H -p tikv-pid` to check whether the threads
  #   ## named `sched-worker-pool` are busy.
  #   # scheduler-worker-pool-size = 4
  #### Below parameters available in TiKV 2.x only
  # [rocksdb.defaultcf]
  #   ## block-cache used to cache uncompressed blocks, big block-cache can speed up read.
  #   ## in normal cases should tune to 30%-50% tikv.resources.limits.memory
  #   # block-cache-size = "1GB"
  # [rocksdb.writecf]
  #   ## in normal cases should tune to 10%-30% tikv.resources.limits.memory
  #   # block-cache-size = "256MB"
  #### Below parameters available in TiKV 3.x and above only
  # [storage.block-cache]
  #   ## Size of the shared block cache. Normally it should be tuned to 30%-50% of container's total memory.
  #   # capacity = "1GB"
  # [raftstore]
  #   ## true (default value) for high reliability, this can prevent data loss when power failure.
  #   # sync-log = true
  #   # apply-pool-size = 2
  #   # store-pool-size = 2

  replicas: 3
  image: pingcap/tikv:v3.0.1
  # storageClassName is a StorageClass provides a way for administrators to describe the "classes" of storage they offer.
  # different classes might map to quality-of-service levels, or to backup policies,
  # or to arbitrary policies determined by the cluster administrators.
  # refer to https://kubernetes.io/docs/concepts/storage/storage-classes
  storageClassName: local-storage

  # Image pull policy.
  imagePullPolicy: IfNotPresent

  resources:
    limits: {}
    # cpu: 16000m
    # memory: 32Gi
    # storage: 300Gi
    requests:
      # cpu: 12000m
      # memory: 24Gi
      storage: 10Gi

  ## affinity defines tikv scheduling rules,affinity default settings is empty.
  ## please read the affinity document before set your scheduling rule:
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}

  ## nodeSelector ensure pods only assigning to nodes which have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector:
    local-pv: present

  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb
  #   effect: "NoSchedule"
  annotations: {}

tidb:
  # Please refer to https://github.com/pingcap/tidb/blob/master/config/config.toml.example for the default
  # tidb configurations(change to the tags of your tidb version),
  # just follow the format in the file and configure in the 'config' section
  # as below if you want to customize any configuration.
  # Please refer to https://pingcap.com/docs-cn/v3.0/reference/configuration/tidb-server/configuration-file/
  # (choose the version matching your tidb) for detailed explanation of each parameter.
  config: |
    [log]
    level = "info"

  replicas: 3
  # The secret name of root password, you can create secret with following command:
  # kubectl create secret generic tidb-secret --from-literal=root=<root-password> --namespace=<namespace>
  # If unset, the root password will be empty and you can set it after connecting
  # passwordSecretName: tidb-secret
  # initSql is the SQL statements executed after the TiDB cluster is bootstrapped.
  # initSql: |-
  #   create database app;
  image: pingcap/tidb:v3.0.1
  # Image pull policy.
  imagePullPolicy: IfNotPresent

  resources:
    limits: {}
    # cpu: 16000m
    # memory: 16Gi
    requests: {}
    # cpu: 12000m
    # memory: 12Gi

  ## affinity defines tikv scheduling rules,affinity default settings is empty.
  ## please read the affinity document before set your scheduling rule:
  ## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}

  ## nodeSelector ensure pods only assigning to nodes which have each of the indicated key-value pairs as labels
  ## ref:https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  nodeSelector: {}

  ## Tolerations are applied to pods, and allow pods to schedule onto nodes with matching taints.
  ## refer to https://kubernetes.io/docs/concepts/configuration/taint-and-toleration
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb
  #   effect: "NoSchedule"
  annotations: {}
  maxFailoverCount: 3
  service:
    type: NodePort
    exposeStatus: true
    # annotations:
    #   cloud.google.com/load-balancer-type: Internal
  separateSlowLog: true
  slowLogTailer:
    image: busybox:1.26.2
    resources:
      limits:
        cpu: 100m
        memory: 50Mi
      requests:
        cpu: 20m
        memory: 5Mi

  # tidb plugin configuration
  plugin:
    # enable plugin or not
    enable: false
    # the start argument to specify the folder containing
    directory: /plugins
    # the start argument to specify the plugin id (name "-" version) that needs to be loaded, e.g. 'conn_limit-1'.
    list: ["whitelist-1"]

# mysqlClient is used to set password for TiDB
# it must has Python MySQL client installed
mysqlClient:
  image: tnir/mysqlclient
  imagePullPolicy: IfNotPresent

monitor:
  create: true
  # Also see rbac.create
  # If you set rbac.create to false, you need to provide a value here.
  # If you set rbac.create to true, you should leave this empty.
  # serviceAccount:
  persistent: false
  storageClassName: local-storage
  storage: 10Gi
  initializer:
    image: pingcap/tidb-monitor-initializer:v3.0.1
    imagePullPolicy: IfNotPresent
  reloader:
    create: true
    image: pingcap/tidb-monitor-reloader:v1.0.0
    imagePullPolicy: IfNotPresent
    service:
      type: NodePort
  grafana:
    create: true
    image: grafana/grafana:6.0.1
    imagePullPolicy: IfNotPresent
    logLevel: info
    resources:
      limits: {}
      # cpu: 8000m
      # memory: 8Gi
      requests: {}
      # cpu: 4000m
      # memory: 4Gi
    username: admin
    password: admin
    config:
      # Configure Grafana using environment variables except GF_PATHS_DATA, GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD
      # Ref https://grafana.com/docs/installation/configuration/#using-environment-variables
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_NAME: "Main Org."
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Viewer"
      # if grafana is running behind a reverse proxy with subpath http://foo.bar/grafana
      # GF_SERVER_DOMAIN: foo.bar
      # GF_SERVER_ROOT_URL: "%(protocol)s://%(domain)s/grafana/"
    service:
      type: NodePort
  prometheus:
    image: prom/prometheus:v2.11.1
    imagePullPolicy: IfNotPresent
    logLevel: info
    resources:
      limits: {}
      # cpu: 8000m
      # memory: 8Gi
      requests: {}
      # cpu: 4000m
      # memory: 4Gi
    service:
      type: NodePort
    reserveDays: 12
    # alertmanagerURL: ""
  nodeSelector: {}
  # kind: monitor
  # zone: cn-bj1-01,cn-bj1-02
  # region: cn-bj1
  tolerations: []
  # - key: node-role
  #   operator: Equal
  #   value: tidb
  #   effect: "NoSchedule"

binlog:
  pump:
    create: false
    replicas: 1
    image: pingcap/tidb-binlog:v3.0.1
    imagePullPolicy: IfNotPresent
    logLevel: info
    # storageClassName is a StorageClass provides a way for administrators to describe the "classes" of storage they offer.
    # different classes might map to quality-of-service levels, or to backup policies,
    # or to arbitrary policies determined by the cluster administrators.
    # refer to https://kubernetes.io/docs/concepts/storage/storage-classes
    storageClassName: local-storage
    storage: 20Gi
    syncLog: true
    # a integer value to control expiry date of the binlog data, indicates for how long (in days) the binlog data would be stored.
    # must bigger than 0
    gc: 7
    # number of seconds between heartbeat ticks (in 2 seconds)
    heartbeatInterval: 2

  drainer:
    create: false
    image: pingcap/tidb-binlog:v3.0.1
    imagePullPolicy: IfNotPresent
    logLevel: info
    # storageClassName is a StorageClass provides a way for administrators to describe the "classes" of storage they offer.
    # different classes might map to quality-of-service levels, or to backup policies,
    # or to arbitrary policies determined by the cluster administrators.
    # refer to https://kubernetes.io/docs/concepts/storage/storage-classes
    storageClassName: local-storage
    storage: 10Gi
    # the number of the concurrency of the downstream for synchronization. The bigger the value,
    # the better throughput performance of the concurrency (16 by default)
    workerCount: 16
    # the interval time (in seconds) of detect pumps' status (default 10)
    detectInterval: 10
    # disbale detect causality
    disableDetect: false
    # disable dispatching sqls that in one same binlog; if set true, work-count and txn-batch would be useless
    disableDispatch: false
    # # disable sync these schema
    ignoreSchemas: "INFORMATION_SCHEMA,PERFORMANCE_SCHEMA,mysql,test"
    # if drainer donesn't have checkpoint, use initial commitTS to initial checkpoint
    initialCommitTs: 0
    # enable safe mode to make syncer reentrant
    safeMode: false
    # the number of SQL statements of a transaction that are output to the downstream database (20 by default)
    txnBatch: 20
    # downstream storage, equal to --dest-db-type
    # valid values are "mysql", "pb", "kafka"
    destDBType: pb
    mysql: {}
    # host: "127.0.0.1"
    # user: "root"
    # password: ""
    # port: 3306
    # # Time and size limits for flash batch write
    # timeLimit: "30s"
    # sizeLimit: "100000"
    kafka: {}
    # only need config one of zookeeper-addrs and kafka-addrs, will get kafka address if zookeeper-addrs is configed.
    # zookeeperAddrs: "127.0.0.1:2181"
    # kafkaAddrs: "127.0.0.1:9092"
    # kafkaVersion: "0.8.2.0"

scheduledBackup:
  create: false
  # https://github.com/pingcap/tidb-cloud-backup
  mydumperImage: pingcap/tidb-cloud-backup:20190610
  mydumperImagePullPolicy: IfNotPresent
  # storageClassName is a StorageClass provides a way for administrators to describe the "classes" of storage they offer.
  # different classes might map to quality-of-service levels, or to backup policies,
  # or to arbitrary policies determined by the cluster administrators.
  # refer to https://kubernetes.io/docs/concepts/storage/storage-classes
  storageClassName: local-storage
  storage: 100Gi
  # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#schedule
  schedule: "0 0 * * *"
  # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#suspend
  suspend: false
  # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#jobs-history-limits
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline
  startingDeadlineSeconds: 3600
  # https://github.com/maxbube/mydumper/blob/master/docs/mydumper_usage.rst#options
  options: "--verbose=3"
  # secretName is the name of the secret which stores user and password used for backup
  # Note: you must give the user enough privilege to do the backup
  # you can create the secret by:
  # kubectl create secret generic backup-secret --from-literal=user=root --from-literal=password=<password>
  secretName: backup-secret
  # backup to gcp
  gcp: {}
  # bucket: ""
  # secretName is the name of the secret which stores the gcp service account credentials json file
  # The service account must have read/write permission to the above bucket.
  # Read the following document to create the service account and download the credentials file as credentials.json:
  # https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually
  # And then create the secret by: kubectl create secret generic gcp-backup-secret --from-file=./credentials.json
  # secretName: gcp-backup-secret

  # backup to ceph object storage
  ceph: {}
  # endpoint: ""
  # bucket: ""
  # secretName is the name of the secret which stores ceph object store access key and secret key
  # You can create the secret by:
  # kubectl create secret generic ceph-backup-secret --from-literal=access_key=<access-key> --from-literal=secret_key=<secret-key>
  # secretName: ceph-backup-secret

  # backup to s3
  s3: {}
  # region: ""
  # bucket: ""
  # secretName is the name of the secret which stores s3 object store access key and secret key
  # You can create the secret by:
  # kubectl create secret generic s3-backup-secret --from-literal=access_key=<access-key> --from-literal=secret_key=<secret-key>
  # secretName: s3-backup-secret

metaInstance: "{{ $labels.instance }}"
metaType: "{{ $labels.type }}"
metaValue: "{{ $value }}"

Deploy the cluster

helm install pingcap/tidb-cluster \
--name=tidb-cluster \
--namespace=tidb \
--version=v1.0.0 \
-f /home/tidb/tidb-cluster/values.yaml

Check the cluster deployment

kubectl -n tidb get pods -l app.kubernetes.io/instance=tidb-cluster

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
tidb-cluster-discovery-84d6cf454c-6c2cl   1/1     Running   0          77m
tidb-cluster-monitor-77cd9d7965-49v8t     3/3     Running   0          77m
tidb-cluster-pd-0                         1/1     Running   0          23m
tidb-cluster-pd-1                         1/1     Running   0          72m
tidb-cluster-pd-2                         1/1     Running   0          72m
tidb-cluster-tidb-0                       2/2     Running   0          2m11s
tidb-cluster-tidb-1                       2/2     Running   0          2m2s
tidb-cluster-tidb-2                       2/2     Running   0          100s
tidb-cluster-tikv-0                       1/1     Running   0          8m56s
tidb-cluster-tikv-1                       1/1     Running   0          8m54s
tidb-cluster-tikv-2                       1/1     Running   0          8m52s

Access TiDB

TiDB is MySQL-compatible, so you can connect to the TiDB cluster directly with a MySQL client.

Here, mysql-community-client-5.7.27-1.el7 is used as the client program.
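A sketch of installing that client on CentOS 7, assuming the MySQL community yum repository has already been configured on the machine:

yum install -y mysql-community-client-5.7.27-1.el7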

Get the TiDB Service information

kubectl -n tidb get svc -l app.kubernetes.io/component=tidb

Example output

NAME                     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                          AGE
tidb-cluster-tidb        NodePort    10.96.129.51   <none>        4000:30944/TCP,10080:32052/TCP   85m
tidb-cluster-tidb-peer   ClusterIP   None           <none>        10080/TCP                        19m

From the example output, the tidb-cluster-tidb Service is of type NodePort, and the SQL port 4000 is mapped to node port 30944, so TiDB can be accessed directly through that port.

Log in to TiDB

By default, after the cluster is deployed, the root user has no password.

mysql -u root -P 30944 -h k8s-master
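Because root starts without a password, it is worth setting one right away. A minimal sketch (the password value is only a placeholder):

mysql -u root -P 30944 -h k8s-master -e "SET PASSWORD FOR 'root'@'%' = 'YourStrongPassword'; FLUSH PRIVILEGES;"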

Simple queries

Check the system table mysql.tidb

mysql> select VARIABLE_NAME,VARIABLE_VALUE from mysql.tidb;

Example output

+--------------------------+--------------------------------------------------------------------------------------------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+--------------------------+--------------------------------------------------------------------------------------------------+
| bootstrapped | True |
| tidb_server_version | 33 |
| system_tz | Asia/Shanghai |
| tikv_gc_leader_uuid | 5b16245c9840001 |
| tikv_gc_leader_desc | host:tidb-cluster-tidb-0, pid:1, start at 2019-08-04 01:40:22.757972817 +0800 CST m=+0.859764216 |
| tikv_gc_leader_lease | 20190804-01:58:22 +0800 |
| tikv_gc_enable | true |
| tikv_gc_run_interval | 10m0s |
| tikv_gc_life_time | 10m0s |
| tikv_gc_last_run_time | 20190804-01:48:22 +0800 |
| tikv_gc_safe_point | 20190804-01:38:22 +0800 |
| tikv_gc_auto_concurrency | true |
| tikv_gc_mode | distributed |
+--------------------------+--------------------------------------------------------------------------------------------------+
13 rows in set (0.00 sec)

Check system variables

mysql> show global variables like '%tidb%';

Example output

+-------------------------------------+-----------------------+
| Variable_name | Value |
+-------------------------------------+-----------------------+
| tidb_optimizer_selectivity_level | 0 |
| tidb_slow_log_threshold | 300 |
| tidb_distsql_scan_concurrency | 15 |
| tidb_check_mb4_value_in_utf8 | 1 |
| tidb_checksum_table_concurrency | 4 |
| tidb_query_log_max_len | 2048 |
| tidb_mem_quota_sort | 34359738368 |
| tidb_low_resolution_tso | 0 |
| tidb_skip_utf8_check | 0 |
| tidb_constraint_check_in_place | 0 |
| tidb_snapshot | |
| tidb_current_ts | 0 |
| tidb_opt_write_row_id | 0 |
| tidb_opt_join_reorder_threshold | 0 |
| tidb_build_stats_concurrency | 4 |
| tidb_mem_quota_topn | 34359738368 |
| tidb_batch_insert | 0 |
| tidb_config | |
| tidb_batch_delete | 0 |
| tidb_opt_correlation_exp_factor | 1 |
| tidb_auto_analyze_ratio | 0.5 |
| tidb_index_serial_scan_concurrency | 1 |
| tidb_ddl_error_count_limit | 512 |
| tidb_batch_commit | 0 |
| tidb_wait_split_region_timeout | 300 |
| tidb_mem_quota_query | 34359738368 |
| tidb_dml_batch_size | 20000 |
| tidb_mem_quota_mergejoin | 34359738368 |
| tidb_projection_concurrency | 4 |
| tidb_index_join_batch_size | 25000 |
| tidb_wait_split_region_finish | 1 |
| tidb_back_off_weight | 2 |
| tidb_enable_fast_analyze | 0 |
| tidb_skip_isolation_level_check | 0 |
| tidb_mem_quota_hashjoin | 34359738368 |
| tidb_hash_join_concurrency | 5 |
| tidb_scatter_region | 0 |
| tidb_enable_window_function | 1 |
| tidb_max_chunk_size | 1024 |
| tidb_enable_cascades_planner | 0 |
| tidb_ddl_reorg_batch_size | 1024 |
| tidb_txn_mode | |
| tidb_opt_correlation_threshold | 0.9 |
| tidb_hashagg_final_concurrency | 4 |
| tidb_opt_agg_push_down | 0 |
| tidb_index_lookup_concurrency | 4 |
| tidb_enable_table_partition | auto |
| tidb_auto_analyze_end_time | 23:59 +0000 |
| tidb_index_lookup_size | 20000 |
| tidb_hashagg_partial_concurrency | 4 |
| tidb_opt_insubq_to_join_and_agg | 1 |
| tidb_ddl_reorg_worker_cnt | 16 |
| tidb_mem_quota_indexlookupreader | 34359738368 |
| tidb_mem_quota_indexlookupjoin | 34359738368 |
| tidb_mem_quota_nestedloopapply | 34359738368 |
| tidb_general_log | 0 |
| tidb_force_priority | NO_PRIORITY |
| tidb_enable_streaming | 0 |
| tidb_retry_limit | 10 |
| tidb_enable_radix_join | 0 |
| tidb_ddl_reorg_priority | PRIORITY_LOW |
| tidb_backoff_lock_fast | 100 |
| tidb_auto_analyze_start_time | 00:00 +0000 |
| tidb_init_chunk_size | 32 |
| tidb_expensive_query_time_threshold | 60 |
| tidb_disable_txn_auto_retry | 1 |
| tidb_slow_query_file | /var/log/tidb/slowlog |
| tidb_index_lookup_join_concurrency | 4 |
+-------------------------------------+-----------------------+
68 rows in set (0.01 sec)

View Monitoring

  • TiDB uses Prometheus and Grafana to monitor the TiDB cluster.
  • When a new TiDB cluster is created through TiDB Operator, a separate monitoring system consisting of Prometheus and Grafana is created and configured for each cluster, running in the same Namespace as the TiDB cluster.
  • Monitoring data is not persisted by default, so if the monitoring container restarts for any reason, the existing data is lost. You can set monitor.persistent to true in values.yaml to persist the data, as sketched below.
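A minimal values.yaml sketch for enabling persistence, reusing the storage class already configured for this deployment (adjust the size as needed):

monitor:
  create: true
  persistent: true
  storageClassName: local-storage
  storage: 10Gi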

Get the monitoring ports

kubectl -n tidb get svc -l app.kubernetes.io/component=monitor

Example output

NAME                            TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
tidb-cluster-grafana            NodePort   10.96.136.52    <none>        3000:30009/TCP   93m
tidb-cluster-monitor-reloader   NodePort   10.96.211.160   <none>        9089:31302/TCP   93m
tidb-cluster-prometheus         NodePort   10.96.75.4      <none>        9090:31666/TCP   93m

Access the monitoring services

All the monitoring services are of type NodePort, so they can be reached directly through the node ports.
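If you prefer not to rely on NodePorts, port-forwarding also works; for example, for Grafana (service name taken from the output above):

kubectl -n tidb port-forward svc/tidb-cluster-grafana 3000:3000

Grafana is then reachable at http://localhost:3000 with the admin/admin credentials configured in values.yaml.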

Updates and Upgrades

TiDB Operator

When a new version of tidb-operator is released, you only need to update operatorImage in values.yaml and run the upgrade command below. To be safe, though, it is better to take the values.yaml from the new tidb-operator chart, merge it with your existing values.yaml to produce a new one, and then upgrade with that.

TiDB Operator only manages the TiDB cluster; in other words, once the TiDB cluster is up and running normally, you can even stop TiDB Operator and the cluster will keep working until you need to maintain it, for example to scale or upgrade.

Upgrade TiDB Operator

helm upgrade tidb-operator pingcap/tidb-operator --version=v1.0.0 -f /home/tidb/tidb-operator/values.yaml

Kubernetes cluster version upgrades

When your Kubernetes cluster is upgraded, make sure kubeSchedulerImageTag matches the new version. By default this value is generated by Helm during installation or upgrade, and changing it requires running helm upgrade.

TiDB Cluster

During a rolling update of a TiDB cluster, Pods are deleted and recreated with the new version serially, in the order PD, TiKV, TiDB; only after the new Pod is running normally does the process move on to the next one.

The rolling upgrade automatically handles PD and TiKV leader transfer as well as TiDB DDL owner transfer. Therefore, in a multi-node deployment topology (minimum: PD * 3, TiKV * 3, TiDB * 2), rolling updates of TiKV and PD do not affect running workloads.

For clients that can retry connections, rolling updates of TiDB do not affect workloads either.

For clients that cannot retry, a rolling update of TiDB invalidates the database connections to the node being shut down and causes some requests to fail. For such workloads, it is recommended to add retry logic on the client side or to perform the TiDB rolling upgrade during off-peak hours.

Rolling updates can be used both to upgrade the TiDB version and to update the cluster configuration.

Update the TiDB Cluster configuration

By default, changes to the configuration file are not applied to the TiDB cluster automatically; the new configuration is only reloaded when an instance restarts.

The steps are as follows

  1. In the cluster's values.yaml, set enableConfigMapRollout to true

  2. In the cluster's values.yaml, change the configuration items you need, for example pd.replicas, tidb.replicas, and tikv.replicas for horizontal scale-out and scale-in

  3. Run helm upgrade to apply the changes

    helm upgrade tidb-cluster pingcap/tidb-cluster -f /home/tidb/tidb-cluster/values.yaml --version=v1.0.0
  4. Watch the upgrade progress

    watch -n 1 'kubectl -n tidb get pod -o wide'

Upgrade the TiDB Cluster version

  1. In the cluster's values.yaml, set tidb.image, tikv.image, and pd.image to the new version images (see the sketch after this list);

  2. Run helm upgrade to perform the upgrade:

    helm upgrade tidb-cluster pingcap/tidb-cluster -f /home/tidb/tidb-cluster/values.yaml --version=v1.0.0
  3. Watch the upgrade progress

    watch -n 1 'kubectl -n tidb get pod -o wide'
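For step 1, the relevant values.yaml fields look like the sketch below; v3.0.2 is only a hypothetical target version:

pd:
  image: pingcap/pd:v3.0.2
tikv:
  image: pingcap/tikv:v3.0.2
tidb:
  image: pingcap/tidb:v3.0.2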

Note:

  • When the enableConfigMapRollout feature is switched from off to on, a rolling update of PD, TiKV, and TiDB is triggered even if no configuration has changed.
  • Currently, PD's scheduler and replication settings (the maxStoreDownTime and maxReplicas fields in values.yaml) cannot be updated automatically after the cluster has been installed; they must be updated manually with pd-ctl, as sketched below.
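A sketch of such a manual update with pd-ctl, assuming the PD client port is made reachable locally (for example via kubectl port-forward) and that a pd-ctl binary matching the cluster version is available on the workstation:

# forward the PD client port of this cluster to the local machine
kubectl -n tidb port-forward svc/tidb-cluster-pd 2379:2379 &
# update replication / scheduler settings directly in PD
pd-ctl -u http://127.0.0.1:2379 -d config set max-replicas 3
pd-ctl -u http://127.0.0.1:2379 -d config set max-store-down-time 30m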