数据库索引设计的几个误区

数据库索引设计的几个误区

索引是数据库性能优化的利器,但我在项目中见过太多因为索引设计不当导致的性能问题。从过度索引导致写入性能下降,到索引失效引起慢查询,这些坑都踩过。今天总结几个常见的索引设计误区。

误区一:索引越多越好

常见错误做法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- 为每个字段都建索引
CREATE TABLE user_orders (
id BIGINT PRIMARY KEY,
user_id BIGINT,
order_no VARCHAR(32),
product_id BIGINT,
quantity INT,
price DECIMAL(10,2),
status TINYINT,
created_at TIMESTAMP,
updated_at TIMESTAMP,

-- 错误:为每个可能查询的字段都建索引
KEY idx_user_id (user_id),
KEY idx_order_no (order_no),
KEY idx_product_id (product_id),
KEY idx_quantity (quantity),
KEY idx_price (price),
KEY idx_status (status),
KEY idx_created_at (created_at),
KEY idx_updated_at (updated_at)
);

问题分析

1
2
3
4
5
6
7
8
9
10
11
12
13
# 查看表的索引情况
SHOW INDEX FROM user_orders;

# 查看表的存储开销
SELECT
table_name,
ROUND((data_length + index_length) / 1024 / 1024, 2) AS table_size_mb,
ROUND(index_length / 1024 / 1024, 2) AS index_size_mb,
ROUND(index_length / (data_length + index_length) * 100, 2) AS index_ratio
FROM information_schema.tables
WHERE table_name = 'user_orders';

-- 结果可能显示:索引大小占表总大小的60%以上

正确的做法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- 根据实际查询模式设计复合索引
CREATE TABLE user_orders (
id BIGINT PRIMARY KEY,
user_id BIGINT,
order_no VARCHAR(32),
product_id BIGINT,
quantity INT,
price DECIMAL(10,2),
status TINYINT,
created_at TIMESTAMP,
updated_at TIMESTAMP,

-- 根据查询模式设计复合索引
UNIQUE KEY uk_order_no (order_no), -- 订单号唯一查询
KEY idx_user_status_created (user_id, status, created_at DESC), -- 用户订单查询
KEY idx_product_created (product_id, created_at DESC), -- 商品销售查询
KEY idx_status_created (status, created_at DESC) -- 订单状态查询
);

误区二:复合索引字段顺序随意

错误的索引设计

1
2
3
4
5
6
7
-- 查询模式分析
-- Q1: SELECT * FROM orders WHERE user_id = ? AND status = ? ORDER BY created_at DESC
-- Q2: SELECT * FROM orders WHERE status = ? AND created_at > ?
-- Q3: SELECT * FROM orders WHERE user_id = ? ORDER BY created_at DESC

-- 错误的索引顺序
KEY idx_wrong (created_at, status, user_id) -- 无法有效支持上述查询

字段顺序原则

1
2
3
4
5
6
7
-- 正确的索引设计原则:
-- 1. 等值查询字段放在前面
-- 2. 范围查询字段放在中间
-- 3. 排序字段放在最后

KEY idx_user_status_created (user_id, status, created_at DESC), -- 支持Q1和Q3
KEY idx_status_created (status, created_at DESC) -- 支持Q2

实战案例:订单查询优化

问题 SQL

1
2
3
4
5
6
7
8
-- 慢查询:耗时2.3秒
SELECT * FROM orders
WHERE user_id = 12345 AND status IN (1, 2, 3)
ORDER BY created_at DESC
LIMIT 20;

-- 原索引设计
KEY idx_old (created_at, user_id, status) -- 错误的字段顺序

分析执行计划

1
2
3
4
5
6
7
8
9
EXPLAIN SELECT * FROM orders
WHERE user_id = 12345 AND status IN (1, 2, 3)
ORDER BY created_at DESC
LIMIT 20\G

-- 结果显示:
-- key: idx_old
-- rows: 500000 (需要扫描50万行)
-- Extra: Using where; Using filesort (需要额外排序)

优化后

1
2
3
4
5
6
7
-- 正确的索引顺序
KEY idx_new (user_id, status, created_at DESC)

-- 执行计划改善:
-- rows: 20
-- Extra: Using where (利用索引,无需排序)
-- 查询时间:15ms

误区三:忽略索引覆盖

低效的查询方式

1
2
3
4
5
6
7
-- 原始查询:需要回表
SELECT id, order_no, status, created_at
FROM orders
WHERE user_id = 12345 AND status = 1;

-- 当前索引
KEY idx_user_status (user_id, status)

覆盖索引优化

1
2
3
4
5
6
7
8
9
10
-- 优化后的索引:包含查询所需的所有字段
KEY idx_user_status_cover (user_id, status, order_no, created_at)

-- 执行计划对比
EXPLAIN SELECT id, order_no, status, created_at
FROM orders
WHERE user_id = 12345 AND status = 1\G

-- 优化前:Extra: NULL (需要回表)
-- 优化后:Extra: Using index (覆盖索引,无需回表)

实际性能对比

1
2
3
4
5
6
7
8
9
-- 性能测试脚本
SET @start_time = NOW(6);
SELECT id, order_no, status, created_at
FROM orders
WHERE user_id IN (1000, 2000, 3000, 4000, 5000) AND status = 1;
SELECT TIMESTAMPDIFF(MICROSECOND, @start_time, NOW(6)) / 1000 AS execution_time_ms;

-- 优化前:250ms
-- 优化后:45ms (性能提升5倍)

误区四:前缀索引使用不当

错误的前缀索引长度

1
2
3
4
5
6
7
8
9
10
11
12
13
-- 对于较长的字符串字段
CREATE TABLE user_profiles (
id BIGINT PRIMARY KEY,
email VARCHAR(128),
description TEXT,
avatar_url VARCHAR(512)
);

-- 错误:前缀太短,区分度不够
KEY idx_email_short (email(5)) -- 区分度可能很低

-- 错误:前缀太长,浪费空间
KEY idx_email_long (email(100)) -- 大部分email不会这么长

计算最优前缀长度

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- 分析字符串分布,确定最优前缀长度
SELECT
LENGTH(email) as email_length,
COUNT(*) as count
FROM user_profiles
GROUP BY LENGTH(email)
ORDER BY email_length;

-- 计算不同前缀长度的区分度
SELECT
COUNT(DISTINCT LEFT(email, 5)) / COUNT(*) AS selectivity_5,
COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS selectivity_10,
COUNT(DISTINCT LEFT(email, 15)) / COUNT(*) AS selectivity_15,
COUNT(DISTINCT LEFT(email, 20)) / COUNT(*) AS selectivity_20
FROM user_profiles;

-- 选择区分度达到0.95以上的最短长度
-- 结果显示15个字符就能达到0.98的区分度
KEY idx_email_optimal (email(15))

误区五:忽略索引维护

未及时删除无用索引

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- 分析索引使用情况
SELECT
s.table_schema,
s.table_name,
s.index_name,
s.seq_in_index,
s.column_name,
t.table_rows,
IFNULL(st.rows_read, 0) as rows_read
FROM information_schema.statistics s
LEFT JOIN information_schema.tables t
ON s.table_schema = t.table_schema AND s.table_name = t.table_name
LEFT JOIN sys.schema_index_statistics st
ON s.table_schema = st.table_schema
AND s.table_name = st.table_name
AND s.index_name = st.index_name
WHERE s.table_schema = 'your_database'
ORDER BY st.rows_read DESC;

-- 找出从未使用的索引
SELECT DISTINCT
object_schema,
object_name,
index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
AND count_star = 0
AND object_schema = 'your_database';

定期索引维护

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- 重建索引以消除碎片
ALTER TABLE large_table ENGINE=InnoDB;

-- 或者针对特定索引
DROP INDEX idx_name ON table_name;
CREATE INDEX idx_name ON table_name (column_name);

-- 分析索引碎片情况
SELECT
table_name,
index_name,
stat_name,
stat_value,
stat_description
FROM mysql.innodb_index_stats
WHERE table_name = 'your_table'
AND stat_name = 'size';

误区六:在小表上建过多索引

小表索引优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- 对于小表(< 1000行),简单的全表扫描可能比索引更快
-- 错误做法
CREATE TABLE config_settings (
id INT PRIMARY KEY,
setting_key VARCHAR(50),
setting_value TEXT,
created_at TIMESTAMP,

-- 没必要的索引
KEY idx_key (setting_key), -- 表只有几十行数据
KEY idx_created (created_at) -- 全表扫描更快
);

-- 正确做法:只保留必要的唯一索引
CREATE TABLE config_settings (
id INT PRIMARY KEY,
setting_key VARCHAR(50) UNIQUE, -- 业务唯一性要求
setting_value TEXT,
created_at TIMESTAMP
);

索引设计最佳实践

设计检查清单

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- 1. 分析查询模式
EXPLAIN SELECT ... ; -- 检查每个重要查询的执行计划

-- 2. 监控慢查询日志
SET slow_query_log = 'ON';
SET long_query_time = 1; -- 记录超过1秒的查询

-- 3. 定期分析索引使用情况
SELECT * FROM sys.schema_unused_indexes WHERE object_schema = 'your_db';

-- 4. 监控索引大小和维护成本
SELECT
table_name,
ROUND((data_length + index_length) / 1024 / 1024, 2) AS total_size_mb,
ROUND(index_length / 1024 / 1024, 2) AS index_size_mb,
ROUND(index_length / (data_length + index_length) * 100, 2) AS index_ratio
FROM information_schema.tables
WHERE table_schema = 'your_database'
ORDER BY index_size_mb DESC;

自动化监控脚本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
#!/bin/bash
# index_health_check.sh

DB_NAME="your_database"
MYSQL_CMD="mysql -u$USER -p$PASS"

echo "=== 索引健康度检查 ==="

# 1. 检查未使用的索引
echo "未使用的索引:"
$MYSQL_CMD -e "
SELECT DISTINCT object_schema, object_name, index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL AND count_star = 0
AND object_schema = '$DB_NAME';"

# 2. 检查重复索引
echo "可能重复的索引:"
$MYSQL_CMD -e "
SELECT
s1.table_name,
s1.index_name AS index1,
s2.index_name AS index2,
s1.column_name
FROM information_schema.statistics s1
JOIN information_schema.statistics s2
ON s1.table_schema = s2.table_schema
AND s1.table_name = s2.table_name
AND s1.column_name = s2.column_name
AND s1.index_name < s2.index_name
WHERE s1.table_schema = '$DB_NAME';"

# 3. 检查大表缺失索引的情况
echo "大表可能缺失的索引:"
$MYSQL_CMD -e "
SELECT
table_name,
table_rows,
ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb
FROM information_schema.tables
WHERE table_schema = '$DB_NAME'
AND table_rows > 100000
AND table_name NOT IN (
SELECT DISTINCT table_name
FROM information_schema.statistics
WHERE table_schema = '$DB_NAME'
);"

索引设计需要在查询性能和维护成本之间找到平衡。避免这些常见误区,建立合适的监控和维护机制,是保证数据库长期稳定运行的关键。记住:好的索引设计来自于对业务查询模式的深度理解。