MySQL 连接数打满业务雪崩,运维第一时间该做什么?
服务器日志疯狂刷屏:Too many connections,MySQL 连接数瞬间打满,业务接口全面超时。
作为运维,你第一时间该做什么?不是慌着重启,也不是盲目调大连接数,而是一套先止血、再定位、后根治的标准应急流程。今天就把生产级 MySQL 连接数爆满处理步骤,一次性讲透。
一、问题现象
典型表现:
- 应用报错:
ERROR 1040 (HY000): Too many connections - 部分接口超时,但数据库直连正常
- 普通账号无法连接,
root@localhost还能登录
二、快速止血
2.1 杀掉空闲睡眠连接
sql
-- 查看睡眠连接分布
SELECT USER, HOST, COUNT(*) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND = 'Sleep' GROUP BY USER, HOST;
-- 生成并执行KILL语句
SELECT CONCAT('KILL ', ID, ';') FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND = 'Sleep' AND TIME > 30 INTO OUTFILE '/tmp/kill.sql';
SOURCE /tmp/kill.sql;2.2 临时提高连接数上限
sql
-- 动态调整,无需重启
SET GLOBAL max_connections = 500;2.3 杀掉慢查询
sql
-- 找出执行时间最长的查询
SELECT ID, USER, TIME, LEFT(INFO, 100) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND != 'Sleep' ORDER BY TIME DESC LIMIT 10;
-- 终止指定查询
KILL QUERY <ID>;三、根因分析
3.1 慢查询导致
sql
-- 查看未提交的长事务
SELECT trx_id, trx_started, TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS running_sec, trx_query FROM INFORMATION_SCHEMA.INNODB_TRX ORDER BY trx_started;3.2 连接泄漏
sql
-- 同一IP连接数过多且sleep时间很长 → 疑似泄漏
SELECT USER, HOST, COUNT(*) AS cnt, MAX(TIME) AS max_sleep FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND = 'Sleep' GROUP BY USER, HOST HAVING cnt > 10;3.3 短连接频繁创建
bash
# 查看连接建立速率
mysql -e "SHOW STATUS LIKE 'Connections';"
mysql -e "SHOW STATUS LIKE 'Aborted_connects';"3.4 max_connections 设置过小
sql
-- 查看历史最大使用量
SHOW STATUS LIKE 'Max_used_connections';四、排障命令速查表
| 目的 | 命令 |
|---|---|
| 查看连接概览 | SHOW STATUS LIKE 'Threads_%'; |
| 查看所有连接详情 | SELECT ID,USER,HOST,COMMAND,TIME,STATE FROM INFORMATION_SCHEMA.PROCESSLIST; |
| 按用户统计连接 | SELECT USER,COUNT(*) FROM PROCESSLIST GROUP BY USER; |
| 查看锁等待 | SELECT * FROM INFORMATION_SCHEMA.INNODB_TRX WHERE trx_state='LOCK WAIT'; |
五、长期解决方案
5.1 合理设置 max_connections
| 服务器规格 | 推荐值 |
|---|---|
| 4核8GB | 300-500 |
| 8核16GB | 500-800 |
| 16核32GB | 800-1500 |
| 32核64GB | 1500-3000 |
sql
-- 永久生效
SET PERSIST max_connections = 800;5.2 配置空闲连接超时
sql
SET PERSIST wait_timeout = 600; -- 10分钟
SET PERSIST interactive_timeout = 600;5.3 应用层连接池(以Java HikariCP为例)
yaml
maximum-pool-size: 20 # 最大连接数
minimum-idle: 5 # 最小空闲
connection-timeout: 30000 # 获取连接超时(ms)
idle-timeout: 600000 # 空闲超时(ms)5.4 代码规范(以Python为例)
python
# 务必在finally中关闭连接
conn = pool.get_connection()
try:
cursor = conn.cursor()
cursor.execute(sql)
return cursor.fetchall()
finally:
cursor.close()
conn.close() # 归还连接池附:MySQL连接数爆满快速诊断脚本
bash
#!/bin/bash
SOCKET="/var/lib/mysql/mysql.sock"
echo "=== 连接数概览 ==="
mysql -S "$SOCKET" -e "SHOW STATUS LIKE 'Threads_connected'; SHOW VARIABLES LIKE 'max_connections';"
echo -e "\n=== 连接数TOP用户 ==="
mysql -S "$SOCKET" -e "SELECT USER, COUNT(*) FROM PROCESSLIST GROUP BY USER ORDER BY COUNT(*) DESC LIMIT 5;"
echo -e "\n=== 最长运行查询 ==="
mysql -S "$SOCKET" -e "SELECT ID, TIME, LEFT(INFO,80) FROM PROCESSLIST WHERE COMMAND='Query' ORDER BY TIME DESC LIMIT 5;"
echo -e "\n=== 未提交事务 ==="
mysql -S "$SOCKET" -e "SELECT trx_id, TIMESTAMPDIFF(SECOND,trx_started,NOW()) as sec FROM INFORMATION_SCHEMA.INNODB_TRX;"总的来说,MySQL 出现 Too many connections 并不可怕,可怕的是一上来就乱操作、只治标不治本。遇到连接数打满,先稳住心态,按先止血、再定位、后根治的思路一步步来:先用清理空闲连接、临时调大连接数快速恢复业务,再通过排查慢查询、长事务、连接泄漏找到真正诱因,最后从数据库配置、应用连接池、代码规范三方面做好长期防护。
线上故障拼的不是速度,而是章法!
