Skip to content

MySQL 连接数打满业务雪崩,运维第一时间该做什么?

服务器日志疯狂刷屏:Too many connections,MySQL 连接数瞬间打满,业务接口全面超时。
作为运维,你第一时间该做什么?不是慌着重启,也不是盲目调大连接数,而是一套先止血、再定位、后根治的标准应急流程。今天就把生产级 MySQL 连接数爆满处理步骤,一次性讲透。

一、问题现象

典型表现:

  • 应用报错:ERROR 1040 (HY000): Too many connections
  • 部分接口超时,但数据库直连正常
  • 普通账号无法连接,root@localhost 还能登录

二、快速止血

2.1 杀掉空闲睡眠连接

sql
-- 查看睡眠连接分布
SELECT USER, HOST, COUNT(*) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND = 'Sleep' GROUP BY USER, HOST;

-- 生成并执行KILL语句
SELECT CONCAT('KILL ', ID, ';') FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND = 'Sleep' AND TIME > 30 INTO OUTFILE '/tmp/kill.sql';
SOURCE /tmp/kill.sql;

2.2 临时提高连接数上限

sql
-- 动态调整,无需重启
SET GLOBAL max_connections = 500;

2.3 杀掉慢查询

sql
-- 找出执行时间最长的查询
SELECT ID, USER, TIME, LEFT(INFO, 100) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND != 'Sleep' ORDER BY TIME DESC LIMIT 10;

-- 终止指定查询
KILL QUERY <ID>;

三、根因分析

3.1 慢查询导致

sql
-- 查看未提交的长事务
SELECT trx_id, trx_started, TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS running_sec, trx_query FROM INFORMATION_SCHEMA.INNODB_TRX ORDER BY trx_started;

3.2 连接泄漏

sql
-- 同一IP连接数过多且sleep时间很长 → 疑似泄漏
SELECT USER, HOST, COUNT(*) AS cnt, MAX(TIME) AS max_sleep FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND = 'Sleep' GROUP BY USER, HOST HAVING cnt > 10;

3.3 短连接频繁创建

bash
# 查看连接建立速率
mysql -e "SHOW STATUS LIKE 'Connections';"
mysql -e "SHOW STATUS LIKE 'Aborted_connects';"

3.4 max_connections 设置过小

sql
-- 查看历史最大使用量
SHOW STATUS LIKE 'Max_used_connections';

四、排障命令速查表

目的命令
查看连接概览SHOW STATUS LIKE 'Threads_%';
查看所有连接详情SELECT ID,USER,HOST,COMMAND,TIME,STATE FROM INFORMATION_SCHEMA.PROCESSLIST;
按用户统计连接SELECT USER,COUNT(*) FROM PROCESSLIST GROUP BY USER;
查看锁等待SELECT * FROM INFORMATION_SCHEMA.INNODB_TRX WHERE trx_state='LOCK WAIT';

五、长期解决方案

5.1 合理设置 max_connections

服务器规格推荐值
4核8GB300-500
8核16GB500-800
16核32GB800-1500
32核64GB1500-3000
sql
-- 永久生效
SET PERSIST max_connections = 800;

5.2 配置空闲连接超时

sql
SET PERSIST wait_timeout = 600;    -- 10分钟
SET PERSIST interactive_timeout = 600;

5.3 应用层连接池(以Java HikariCP为例)

yaml
maximum-pool-size: 20     # 最大连接数
minimum-idle: 5           # 最小空闲
connection-timeout: 30000 # 获取连接超时(ms)
idle-timeout: 600000      # 空闲超时(ms)

5.4 代码规范(以Python为例)

python
# 务必在finally中关闭连接
conn = pool.get_connection()
try:
    cursor = conn.cursor()
    cursor.execute(sql)
    return cursor.fetchall()
finally:
    cursor.close()
    conn.close()   # 归还连接池

附:MySQL连接数爆满快速诊断脚本

bash
#!/bin/bash
SOCKET="/var/lib/mysql/mysql.sock"

echo "=== 连接数概览 ==="
mysql -S "$SOCKET" -e "SHOW STATUS LIKE 'Threads_connected'; SHOW VARIABLES LIKE 'max_connections';"

echo -e "\n=== 连接数TOP用户 ==="
mysql -S "$SOCKET" -e "SELECT USER, COUNT(*) FROM PROCESSLIST GROUP BY USER ORDER BY COUNT(*) DESC LIMIT 5;"

echo -e "\n=== 最长运行查询 ==="
mysql -S "$SOCKET" -e "SELECT ID, TIME, LEFT(INFO,80) FROM PROCESSLIST WHERE COMMAND='Query' ORDER BY TIME DESC LIMIT 5;"

echo -e "\n=== 未提交事务 ==="
mysql -S "$SOCKET" -e "SELECT trx_id, TIMESTAMPDIFF(SECOND,trx_started,NOW()) as sec FROM INFORMATION_SCHEMA.INNODB_TRX;"

总的来说,MySQL 出现 Too many connections 并不可怕,可怕的是一上来就乱操作、只治标不治本。遇到连接数打满,先稳住心态,按先止血、再定位、后根治的思路一步步来:先用清理空闲连接、临时调大连接数快速恢复业务,再通过排查慢查询、长事务、连接泄漏找到真正诱因,最后从数据库配置、应用连接池、代码规范三方面做好长期防护。

线上故障拼的不是速度,而是章法!