Kubernetes 部署 MySQL 双主集群
MySQL 双主架构是指两台 MySQL 服务器互相作为主从,彼此之间相互复制数据。这种架构提供了读写分离能力的同时保证了高可用性,任一节点故障不影响服务继续运行。本文详细介绍在 Kubernetes 环境中使用 StatefulSet 部署 MySQL 双主集群的完整方案。
架构介绍
双主架构概述
MySQL 双主架构(Master-Master Replication)是数据库高可用的一种经典方案。在双主模式下,两台 MySQL 服务器互为主从,任何一台服务器上的写入操作都会同步到另一台服务器。这种架构的优势在于:两台服务器都可以接受写入请求,实现了真正的读写分离;任一节点故障时,应用程序可以自动切换到另一个节点继续服务;进行维护时可以通过切换保证服务不中断。
在 Kubernetes 环境中,我们通过以下方式实现双主部署:使用 StatefulSet 管理两个 MySQL Pod,每个 Pod 拥有独立的持久化存储;通过 Service 分别暴露主节点和从节点;配置 MySQL 主从复制参数实现数据同步;使用健康检查探针确保服务可用性。
架构拓扑
┌─────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ mysql-master-0 │◄────►│ mysql-master-1 │ │
│ │ (Pod/StatefulSet)│ │ (Pod/StatefulSet) │ │
│ └────────┬─────────┘ └────────┬─────────┘ │
│ │ │ │
│ ┌──────┴──────┐ ┌──────┴──────┐ │
│ │ PVC (数据) │ │ PVC (数据) │ │
│ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────┘
│ │
┌────────┴────────┐ ┌────────┴────────┐
│ mysql-master │ │ mysql-slave │
│ Service │ │ Service │
│ (读写) │ │ (只读) │
└────────┬────────┘ └────────┬────────┘
│ │
┌────┴────┐ ┌────┴────┐
│ App1 │ │ App2 │
└─────────┘ └─────────┘技术要点
双主复制的核心原理是利用 MySQL 的 binlog 日志进行数据同步。每个 MySQL 实例既作为主服务器向对方发送 binlog,又作为从服务器接收并执行对方的 binlog。这种复制模式是异步的,存在轻微的数据延迟,但对于大多数应用场景来说是可以接受的。
关键配置包括:server-id 必须唯一,每个实例拥有不同的 ID;auto-increment-increment 和 auto-increment-offset 设置避免主键冲突;log-bin 开启二进制日志用于复制;relay-log 配置中继日志。
完整部署资源清单
命名空间
apiVersion: v1
kind: Namespace
metadata:
name: mysql-cluster
labels:
name: mysql-cluster
environment: productionConfigMap 配置
apiVersion: v1
kind: ConfigMap
metadata:
name: mysql-config
namespace: mysql-cluster
data:
my.cnf: |
[mysqld]
# 基础配置
default-storage-engine=InnoDB
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
max_connections=1000
max_connect_errors=1000
explicit_defaults_for_timestamp=true
# 绑定地址
bind-address=0.0.0.0
# 二进制日志配置(用于复制)
log-bin=mysql-bin
binlog_format=ROW
binlog_row_image=FULL
expire_logs_days=7
max_binlog_size=256M
sync_binlog=1
# 中继日志配置
relay_log=relay-bin
relay_log_recovery=1
# 只读配置(从节点)
read_only=1
super_read_only=1
# 复制配置
server-id=1
log_replica_updates=1
replica_skip_errors=1062
# 性能优化
innodb_buffer_pool_size=512M
innodb_log_file_size=128M
innodb_flush_log_at_trx_commit=1
innodb_flush_method=O_DIRECT
table_open_cache=2000
table_definition_cache=2000
query_cache_type=0
query_cache_size=0
# 日志配置
log_error=/var/log/mysql/error.log
slow_query_log=1
slow_query_log_file=/var/log/mysql/slow.log
long_query_time=2
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mysql-init scripts
namespace: mysql-cluster
data:
init-master.sql: |
-- 设置主从复制账户和权限
CREATE USER IF NOT EXISTS 'repl'@'%' IDENTIFIED BY 'Repl@2024';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
GRANT SELECT ON *.* TO 'repl'@'%';
FLUSH PRIVILEGES;
-- 创建健康检查账户
CREATE USER IF NOT EXISTS 'healthcheck'@'%' IDENTIFIED BY 'Health@2024';
GRANT PROCESS, REPLICATION CLIENT ON *.* TO 'healthcheck'@'%';
FLUSH PRIVILEGES;
init-replica.sql: |
-- 停止复制线程
STOP SLAVE;
RESET SLAVE ALL;
-- 配置复制源
CHANGE REPLICATION SOURCE TO
SOURCE_HOST='mysql-master-0.mysql-cluster-headless',
SOURCE_PORT=3306,
SOURCE_USER='repl',
SOURCE_PASSWORD='Repl@2024',
SOURCE_AUTO_POSITION=1,
GET_SOURCE_PUBLIC_KEY=1;
-- 启动复制线程
START REPLICA;
-- 验证复制状态
SHOW REPLICA STATUS\GHeadless Service
apiVersion: v1
kind: Service
metadata:
name: mysql-cluster-headless
namespace: mysql-cluster
labels:
app: mysql
spec:
clusterIP: None
ports:
- name: mysql
port: 3306
selector:
app: mysqlMySQL StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql-master
namespace: mysql-cluster
spec:
serviceName: mysql-cluster-headless
replicas: 2
selector:
matchLabels:
app: mysql
updateStrategy:
type: RollingUpdate
podManagementPolicy: Parallel
template:
metadata:
labels:
app: mysql
spec:
terminationGracePeriodSeconds: 30
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: mysql
topologyKey: kubernetes.io/hostname
containers:
- name: mysql
image: mysql:8.0.36
imagePullPolicy: IfNotPresent
ports:
- name: mysql
containerPort: 3306
- name: mysql-replica
containerPort: 33060
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secrets
key: root-password
- name: MYSQL_DATABASE
value: "appdb"
- name: MYSQL_REPLICATION_USER
value: "repl"
- name: MYSQL_REPLICATION_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secrets
key: repl-password
- name: SERVER_ID
valueFrom:
fieldRef:
fieldPath: metadata.name
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2000m"
memory: "4Gi"
volumeMounts:
- name: mysql-data
mountPath: /var/lib/mysql
- name: mysql-config
mountPath: /etc/mysql/conf.d
- name: mysql-logs
mountPath: /var/log/mysql
livenessProbe:
exec:
command:
- mysqladmin
- ping
- -h
- localhost
- -u
- root
- -pHealth@2024
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
exec:
command:
- mysql
- -h
- localhost
- -u
- root
- -pHealth@2024
- -e
- SELECT 1
initialDelaySeconds: 30
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
startupProbe:
exec:
command:
- mysql
- -h
- localhost
- -u
- root
- -pHealth@2024
- -e
- SELECT 1
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 30
volumes:
- name: mysql-data
persistentVolumeClaim:
claimName: mysql-data
- name: mysql-config
configMap:
name: mysql-config
- name: mysql-logs
emptyDir: {}
volumeClaimTemplates:
- metadata:
name: mysql-data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: fast
resources:
requests:
storage: 50GiMySQL Services
apiVersion: v1
kind: Service
metadata:
name: mysql-master
namespace: mysql-cluster
labels:
app: mysql
spec:
type: ClusterIP
ports:
- name: mysql
port: 3306
targetPort: 3306
selector:
app: mysql
statefulset.kubernetes.io/pod-name: mysql-master-0
---
apiVersion: v1
kind: Service
metadata:
name: mysql-replica
namespace: mysql-cluster
labels:
app: mysql
spec:
type: ClusterIP
ports:
- name: mysql
port: 3306
targetPort: 3306
selector:
app: mysql
statefulset.kubernetes.io/pod-name: mysql-master-1Secret
apiVersion: v1
kind: Secret
metadata:
name: mysql-secrets
namespace: mysql-cluster
type: Opaque
stringData:
root-password: Root@2024
repl-password: Repl@2024
healthcheck-password: Health@2024ServiceAccount 和 RBAC
apiVersion: v1
kind: ServiceAccount
metadata:
name: mysql
namespace: mysql-cluster
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: mysql
namespace: mysql-cluster
rules:
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: mysql
namespace: mysql-cluster
subjects:
- kind: ServiceAccount
name: mysql
namespace: mysql-cluster
roleRef:
kind: Role
name: mysql
apiGroup: rbac.authorization.k8s.ioPDB(Pod 中断预算)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: mysql-pdb
namespace: mysql-cluster
spec:
minAvailable: 1
selector:
matchLabels:
app: mysqlStorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
replication-type: regional-pd
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer初始化脚本
由于 StatefulSet 启动时需要配置主从复制,我们需要添加初始化容器或使用生命周期钩子。以下是完整的初始化配置:
# 在 StatefulSet 的 initContainers 中添加
initContainers:
- name: init-mysql
image: mysql:8.0.36
command:
- bash
- -c
- |
#!/bin/bash
set -ex
# 获取 Pod 序号
ordinal=$(echo ${HOSTNAME} | sed 's/.*-//')
# 第一个节点:初始化数据库
if [[ ordinal -eq 0 ]]; then
echo "Initializing first MySQL node..."
# 等待 MySQL 启动
until mysqladmin ping -h localhost -u root -p${MYSQL_ROOT_PASSWORD} --silent; do
echo "Waiting for MySQL to be ready..."
sleep 2
done
# 执行初始化 SQL
mysql -u root -p${MYSQL_ROOT_PASSWORD} < /docker-entrypoint-initdb.d/init-master.sql
echo "First MySQL node initialized successfully"
else
# 等待主节点就绪
echo "Waiting for master node to be ready..."
until mysql -h mysql-master-0.mysql-cluster-headless -u root -p${MYSQL_ROOT_PASSWORD} -e "SELECT 1" --silent; do
echo "Waiting for master node..."
sleep 2
done
# 等待复制账户创建完成
sleep 10
# 配置复制
echo "Configuring replication from master node..."
mysql -u root -p${MYSQL_ROOT_PASSWORD} -h mysql-master-0.mysql-cluster-headless <<EOF
STOP REPLICA;
RESET REPLICA ALL;
CHANGE REPLICATION SOURCE TO
SOURCE_HOST='mysql-master-0.mysql-cluster-headless',
SOURCE_PORT=3306,
SOURCE_USER='${MYSQL_REPLICATION_USER}',
SOURCE_PASSWORD='${MYSQL_REPLICATION_PASSWORD}',
SOURCE_AUTO_POSITION=1,
GET_SOURCE_PUBLIC_KEY=1;
START REPLICA;
EOF
echo "Replication configured successfully"
fi
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secrets
key: root-password
- name: MYSQL_REPLICATION_USER
value: "repl"
- name: MYSQL_REPLICATION_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secrets
key: repl-password
volumeMounts:
- name: mysql-data
mountPath: /var/lib/mysql
- name: mysql-init
mountPath: /docker-entrypoint-initdb.d部署步骤
1. 创建命名空间
kubectl apply -f 00-namespace.yaml2. 创建 ConfigMap 和 Secret
kubectl apply -f 01-configmap.yaml
kubectl apply -f 02-secrets.yaml3. 创建 StorageClass
kubectl apply -f 03-storageclass.yaml4. 创建 Service
kubectl apply -f 04-services.yaml5. 创建 StatefulSet
kubectl apply -f 05-statefulset.yaml6. 验证部署
# 查看 Pod 状态
kubectl get pods -n mysql-cluster -l app=mysql
# 查看服务
kubectl get svc -n mysql-cluster
# 查看 PVC
kubectl get pvc -n mysql-cluster
# 查看日志
kubectl logs mysql-master-0 -n mysql-cluster -c mysql
kubectl logs mysql-master-1 -n mysql-cluster -c mysql7. 验证复制状态
# 进入 MySQL 容器
kubectl exec -it mysql-master-0 -n mysql-cluster -- mysql -u root -pRoot@2024
# 执行验证SQL
SHOW REPLICA STATUS\G应用配置示例
Java 应用配置
# Spring Boot 配置示例
spring:
datasource:
url: jdbc:mysql:replication://source=mysql-master.mysql-cluster.svc.cluster.local:3306,target=mysql-replica.mysql-cluster.svc.cluster.local:3306/appdb?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai
username: appuser
password: App@2024
driver-class-name: com.mysql.cj.jdbc.Driver
hikari:
maximum-pool-size: 20
minimum-idle: 5
connection-timeout: 30000
idle-timeout: 600000
max-lifetime: 1800000Python 应用配置
# Python SQLAlchemy 配置示例
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool
DATABASE_URL = (
"mysql+pymysql://appuser:App@2024@"
"mysql-master.mysql-cluster.svc.cluster.local:3306/appdb"
"?retry_on_failure=3"
)
engine = create_engine(
DATABASE_URL,
poolclass=QueuePool,
pool_size=10,
max_overflow=20,
pool_pre_ping=True,
)
# 读写分离配置可以使用 ReadReplica 连接
READ_REPLICA_URL = (
"mysql+pymysql://appuser:App@2024@"
"mysql-replica.mysql-cluster.svc.cluster.local:3306/appdb"
)监控和告警
Prometheus 监控配置
apiVersion: v1
kind: ConfigMap
metadata:
name: mysql-monitoring
namespace: mysql-cluster
data:
mysqld-exporter-job.yaml: |
- job_name: mysql_exporter
static_configs:
- targets:
- mysql-master-0.mysql-cluster:9104
- mysql-master-1.mysql-cluster:9104告警规则
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: mysql-alerts
namespace: mysql-cluster
spec:
groups:
- name: mysql.rules
rules:
- alert: MySQLDown
expr: up{job="mysql_exporter"} == 0
for: 2m
labels:
severity: critical
annotations:
summary: "MySQL {{ $labels.instance }} is down"
- alert: MySQLReplicationLag
expr: mysql_replica_seconds_behind_master > 30
for: 5m
labels:
severity: warning
annotations:
summary: "MySQL replication lag is {{ $value }} seconds"运维操作
备份和恢复
# 备份
kubectl exec mysql-master-0 -n mysql-cluster -- \
mysqldump -u root -pRoot@2024 --all-databases > backup.sql
# 恢复
kubectl exec -i mysql-master-0 -n mysql-cluster -- \
mysql -u root -pRoot@2024 < backup.sql扩容
# 扩容(需要手动配置新节点的复制)
kubectl scale statefulset mysql-master -n mysql-cluster --replicas=3主节点切换
# 手动切换主节点(停止应用后执行)
kubectl exec mysql-master-1 -n mysql-cluster -- \
mysql -u root -pRoot@2024 -e "STOP REPLICA; START REPLICA;"常见问题排查
复制延迟过高
可能原因包括网络延迟、事务过大、硬件性能不足。处理方法:优化慢查询、减少大事务、增加硬件资源。
数据不一致
可能原因包括复制中断后未及时发现。处理方法:重新建立复制、使用 pt-table-checksum 检查。
连接超时
可能原因包括健康检查配置不当。处理方法:调整探针配置、增加超时时间。
总结
本文提供了在 Kubernetes 环境中部署 MySQL 双主集群的完整方案。生产环境中建议配合使用监控告警系统,定期检查复制状态,并设置合理的备份策略。双主架构虽然提供了高可用性,但真正的零数据丢失还需要配合延迟副本和定期备份来实现。
