MySQL-SQL-DQL 数据查询语言

MySQL 查询语句

13.2.9 SELECT Statement https://dev.mysql.com/doc/refman/5.7/en/select.html

13.2.10 Subqueries https://dev.mysql.com/doc/refman/5.7/en/subqueries.html

select [ all | distinct ]  <目标列表达式 | 聚集函数>[别名]...
from <表名或视图名>[别名]...
[ where <条件表达式> ]
[ group by <列名> [having <条件表达式>]  ]
[ order by <列名> [ASC | DESC] ]

SQL优化

count(*)慢优化

对于没有过滤条件的 count(*) MyISAM 引擎把一个表的总行数存在了磁盘上，因此执行 count() 的时候会直接返回这个数，效率很高；而 InnoDB 引擎就麻烦了，它执行 count() 的时候，需要把数据一行一行地从引擎里面读出来，然后累积计数。

id分段count后求和

sql1: select count(*) from user where score>90; 优化为按主键id分段count后相加 sql2:

select sum(cnt) from (
  select count(*) cnt from table_a where id<1000000 and age is not null
  union all
  select count(*) cnt from table_a where id>=1000000 and id<2000000 and age is not null
) t;

2亿8千万数据，符合条件的约3000万 sql1 执行了6个小时还没结束

非精确count-information_schema.tables

从 information_schema.tables 查询 TABLE_ROWS 对于 InnoDB 存储引擎来说是个近似值。 select TABLE_ROWS from information_schema.tables where table_schema='db_name' and table_name='table_a';

非精确-show table status like

本质和从 information_schema.tables 查询 TABLE_ROWS 一样，对于 InnoDB 存储引擎来说是个近似值。

mysql> show table status like 'my_app_table_11111';
+--------------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
| Name               | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation       | Checksum | Create_options | Comment |
+--------------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
| my_app_table_11111 | InnoDB |      10 | Dynamic    | 145394 |           4536 |   659570688 |               0 |     28540928 |   7340032 |         340429 | 2021-06-27 21:49:33 | 2021-07-14 17:02:52 | NULL       | utf8_general_ci |     NULL |                |         |
+--------------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+

非精确count-解析explain扫描行数

解析 explain 输出的 rows 字段

mysql> explain select * from vehicle_four_wheel;
+----+-------------+--------------------+------------+------+---------------+------+---------+------+--------+----------+-------+
| id | select_type | table              | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra |
+----+-------------+--------------------+------------+------+---------------+------+---------+------+--------+----------+-------+
|  1 | SIMPLE      | table_my_app       | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 140755 |   100.00 | NULL  |
+----+-------------+--------------------+------------+------+---------------+------+---------+------+--------+----------+-------+

count(distinct)优化

sql1: select count(distinct person_no) from user where score>90; 优化为先查出distinct编号，再count sql2: select count(*) from (select distinct(person_no) from user where score>90) t;

2亿8千万数据，符合条件的有大约1千万数据 sql1 执行了6个小时还没结束 sql2半小时执行完了。

int和varchar类型索引

当MySQL中字段为 int 类型时，搜索条件 where num='123' 与 where num=123 都可以使用该字段的索引。 当MySQL中字段为 varchar 类型时，搜索条件 where num='123' 可以使用索引，where num=123 不可以使用索引。

所以：字段类型不同会有隐式类型转换，可能导致索引失效。

这一点也是MySQL int和varchar类型选择的一个依据： 1、除了 int 效率较高之外。 2、int类型字段建索引后可以匹配int和varchar条件，但varchar类型字段建索引后只能匹配varchar条件如果某个字段存储的是int，那就在MySQL中定义为int，而不要使用varchar。

去掉表字段上的unix_timestamp时间函数

根据更新时间查询 user 报慢查询，DBA 建议不要在表字段上使用函数，会导致无法使用索引。不要在表字段上用函数，会用不到索引，在参数上用或者直接转换下参数

MariaDB [db]> explain select id from user where unix_timestamp(update_time) >= 1571673600 and unix_timestamp(update_time) < 1571760000 ;
+------+-------------+-------+-------+---------------+-------------+---------+------+--------+--------------------------+
| id   | select_type | table | type  | possible_keys | key         | key_len | ref  | rows   | Extra                    |
+------+-------------+-------+-------+---------------+-------------+---------+------+--------+--------------------------+
|    1 | SIMPLE      | user  | index | NULL          | update_time | 4       | NULL | 208320 | Using where; Using index |
+------+-------------+-------+-------+---------------+-------------+---------+------+--------+--------------------------+
1 row in set (0.012 sec)

MariaDB [db]> explain SELECT id FROM user WHERE update_time >= '2019-10-22 00:00:00.0(Timestamp)' AND update_time < '2019-10-23 00:00:00.0(Timestamp)';
+------+-------------+-------+-------+---------------------------+-------------+---------+------+--------+--------------------------+
| id   | select_type | table | type  | possible_keys             | key         | key_len | ref  | rows   | Extra                    |
+------+-------------+-------+-------+---------------------------+-------------+---------+------+--------+--------------------------+
|    1 | SIMPLE      | user  | index | update_time   | update_time | 4       | NULL | 208320 | Using where; Using index |
+------+-------------+-------+-------+---------------------------+-------------+---------+------+--------+--------------------------+
1 row in set (0.010 sec)

原因 SQL语句where中如果有functionName(colname)或者某些运算，则MYSQL无法使用基于colName的索引。使用索引需要直接查询某个字段。索引失效的原因是索引是针对原值建的二叉树，将列值计算后，原来的二叉树就用不上了；为了解决索引列上计算引起的索引失效问题，将计算放到索引列外的表达式上。

create_time排序改为id排序避免回表

假如 user 表使用自增 id 作为主键，且创建后 create_time 不会变，则 id 排序和 create_time 排序的结果是一样的，可使用 id 排序代替 create_time 排序。因为 create_time 上即使有索引也是二级索引，需要回表找到主键 id 才行。

offset分页改写为where id条件

如果像下面这样循环分页查，随着 offset 的增加，查询会变的越来越慢，因为 mysql 是查出匹配条件的全部数据后再切分的。

select *
from user
where xx=xx
order by xx
limit $offset, 100;

正确的循环分页查询方法应该是把 offset 转换为 where id 条件，每次不断改变条件，使用 where id 条件和 limit 来分页，不要使用偏移量 offset。典型的比如使用 id，每次记录上一页最后一个id当做条件

select *
from user
where id > $last_id
order by id
limit 100;

比如对外提供一个按 create_time 时间范围分页查询的接口 queryByCreateTimePageable(String t1, String t2, long offset, int count); 本来sql如下

SELECT *
FROM table t
WHERE create_time >= $t1
  AND create_time <= $t2
ORDER BY id ASC
LIMIT $offset, $count;

可以内部对其进行优化，将offset分页改写为 where id 条件

SELECT *
FROM table t
WHERE create_time >= t1
  AND create_time <= t2
  AND id >= (SELECT ID
             FROM table st force index (primary)
             WHERE create_time >= t1
               AND create_time <= t2
             ORDER BY id ASC
             LIMIT $offset,1)
ORDER BY id ASC
LIMIT $count;

两表关联的连接表如何建索引

user 人员表（id, name, birthday） user_tag 人员标签表（id, user_id, tag_id）一个人可以有多个标签 tag 标签表 (id,name) 根据标签id列表查询人员并按生日倒序排序特别慢

select distinct u.*
from user u join user_tag ut on u.id=ut.user_id
where ut.tag_id in (1,2,3)
order by u.birthday desc limit 10;

数据量：200万 user，300万 user_tag，这种索引该怎么建？最终建立的索引

user_tag 表 1、在 user_id 列上创建索引，为了加速”查某人的标签” 2、在 (tag_id, user_id) 列上创建联合索引，为了查某标签下有多少人 alter table user_tag add index idx_user_id(user_id), add index idx_tag_user(tag_id, user_id);

网上有人做了实验，结果是：关联表分别创建 user_id 与 tag_id 的单列索引 idx_user, idx_tag 最优。

MySQL两表关联的连接表该如何创建索引？ https://www.jianshu.com/p/0ec5b4dedc1a

Prepared Statements 预编译语句

13.5 Prepared Statements https://dev.mysql.com/doc/refman/5.7/en/sql-prepared-statements.html

预编译语句（PreparedStatement）是一种在数据库操作中预先定义 SQL 模板的技术。其核心思想是将 SQL 语句的结构与参数分离：SQL 语句中的动态值通过占位符（如 ?）表示，执行时再将具体参数绑定到占位符。

MySQL 支持 server 端预编译语句

MySQL 支持 server 端的预编译语句：

减少解析开销：同一模板多次执行时，直接复用预编译的执行计划，降低CPU消耗。
防御SQL注入：参数值通过类型绑定而非字符串拼接，避免恶意代码注入（如 OR 1=1 注入）

MySQL 通过 PREPARE, EXECUTE, DEALLOCATE 三个指令实现预处理语句：

-- 1. 定义预处理语句
PREPARE stmt_name FROM 'SELECT * FROM users WHERE age > ?';

-- 2. 设置参数并执行
SET @age = 18;
EXECUTE stmt_name USING @age;

-- 3. 释放资源
DEALLOCATE PREPARE stmt_name;

Java 中使用 PreparedStatement

// 连接数据库
Class.forName("com.mysql.cj.jdbc.Driver");
Connection conn = DriverManager.getConnection(
    "jdbc:mysql://localhost:3306/db_name", "user", "password"
);

// 构建预编译语句
String sql = "INSERT INTO users (name, age) VALUES (?, ?)";
PreparedStatement pstmt = conn.prepareStatement(sql);

// 绑定参数
pstmt.setString(1, "Alice");  // 第一个占位符设为字符串"Alice"
pstmt.setInt(2, 30);          // 第二个占位符设为整数30

// 执行增删改
int rowsAffected = pstmt.executeUpdate();

// 执行查询
ResultSet rs = queryPstmt.executeQuery();
while (rs.next()) {
    System.out.println(rs.getString("name"));
}

// 关闭资源
rs.close();
pstmt.close();
conn.close();

干预索引

use index(a,b) 限制索引范围

有时我们在数据表里建立了很多索引，当MySQL对索引进行选择时，这些索引都在考虑的范围内。但有时我们希望MySQL只考虑几个索引，而不是全部的索引，这就需要用到USE INDEX对查询语句进行设置。 SELECT * FROM TABLE1 USE INDEX (FIELD1, FIELD2) … 无论在TABLE1中已经建立了多少个索引，MySQL在选择索引时，只考虑在FIELD1和FIELD2上建立的索引。

ignore index(a,b) 忽略索引

如果我们要考虑的索引很多，而不被使用的索引又很少时，可以使用IGNORE INDEX进行反向选取。在上面的例子中是选择被考虑的索引，而使用IGNORE INDEX是选择不被考虑的索引。 SELECT * FROM TABLE1 IGNORE INDEX (FIELD1, FIELD2) … 在上面的SQL语句中，TABLE1表中只有FIELD1和FIELD2上的索引不被使用。

force index(a) 强制使用索引

有时我们希望MySQL必须要使用某一个索引(由于MySQL在查询时只能使用一个索引，因此只能强迫MySQL使用一个索引)。这就需要使用FORCE INDEX来完成这个功能。 SELECT * FROM TABLE1 FORCE INDEX (FIELD1) … 以上的SQL语句只使用建立在FIELD1上的索引，而不使用其它字段上的索引。

like

like binary 区分大小写

MySQL 的 like 查询是不区分大小写的有时候，我们需要区分大小写的是，该怎么办呢？

一、一种方法是在查询时指定大小写敏感，在 like 的后面加个 binary 就可以了

select *
from user
where name like binary '%aaa%'

二、另一种方法是建表时可以设置表或行的 collation, 使其为 binary 或 case sensitive. 在 MySQL 中，对于 Column Collate 其约定的命名方法如下： *_bin: 表示的是binary case sensitive collation，也就是说是区分大小写的 *_cs: case sensitive collation，区分大小写 *_ci: case insensitive collation，不区分大小写

like escape 转义通配符

like 子句中 % 匹配多个字符， _ 匹配单个字符。如果要匹配的内容本身包含 % 和 _ 怎么办呢？

可以使用 escape 指定转义字符，转义字符后面的 % 或 _ 就不作为通配符了，注意前面没有转义字符的%和_仍然起通配符作用

select * from user where name like '%li/_%' escape '/'; select * from user where name like '%li/%%' escape '/';

regexp 正则匹配

^ 匹配输入字符串的开始位置。如果设置了 RegExp 对象的 Multiline 属性，^ 也匹配 '\n' 或 '\r' 之后的位置。 $ 匹配输入字符串的结束位置。如果设置了 RegExp 对象的 Multiline 属性，$ 也匹配 '\n' 或 '\r' 之前的位置。 . 匹配除 "\n" 之外的任何单个字符。要匹配包括 '\n' 在内的任何字符，请使用像 '[.\n]' 的模式。 [...] 字符集合。匹配所包含的任意一个字符。例如， '[abc]' 可以匹配 "plain" 中的 'a'。 [^...] 负值字符集合。匹配未包含的任意字符。例如， '[^abc]' 可以匹配 "plain" 中的'p'。 p1|p2|p3 匹配 p1 或 p2 或 p3。例如，'z|food' 能匹配 "z" 或 "food"。'(z|f)ood' 则匹配 "zood" 或 "food"。 * 匹配前面的子表达式零次或多次。例如，zo* 能匹配 "z" 以及 "zoo"。* 等价于 {0,} + 匹配前面的子表达式一次或多次。例如，'zo+' 能匹配 "zo" 以及 "zoo"，但不能匹配 "z"。+ 等价于 {1,} {n} n 是一个非负整数。匹配确定的 n 次。例如，'o{2}' 不能匹配 "Bob" 中的 'o'，但是能匹配 "food" 中的两个 o。 {n,m} m 和 n 均为非负整数，其中n <= m。最少匹配 n 次且最多匹配 m 次。

例：

1、查找 long 型字符串 create_time 以 000 结尾的记录

select * from user where create_time regexp '000$';

2、查找 name 以 '张' 开头的记录

select * from user where name regexp '^张';

3、查找 name 包含 '小' 的记录

select * from user where name regexp '小';

12.8.2 Regular Expressions https://dev.mysql.com/doc/refman/5.7/en/regexp.html

order by field 按指定顺序排序

order by field 可以按指定的顺序排序，最好搭配in一起使用

SELECT * FROM MyTable
where id in(5, 3, 7, 1)
ORDER BY FIELD(`id`, 5, 3, 7, 1)

好像如果是数字排序的话，不加in也可以。

按指定的姓名顺序排序

SELECT * FROM MyTable
WHERE name IN ('张三', '李四', '王五', '孙六')
ORDER BY FIELD(name, '李四', '孙六', '张三', '王五');

How does ORDER BY FIELD() in MySQL work internally https://dba.stackexchange.com/questions/109120/how-does-order-by-field-in-mysql-work-internally

in/not in

not in () 内有 NULL 值导致查询无结果

查 id 未在 t_student 表 class_id 列出现的 t_class 记录：

SELECT *
FROM t_class
WHERE id NOT IN (SELECT class_id FROM t_student);

结果无数据，实际上是有很多无 student 关联的 class 的。

原因： NULL 值问题，t_student 表中 class_id 列包含 NULL 值（有的学生还没有班级） NOT IN () 集合内有 NULL 值会导致条件返回未知结果（UNKNOWN），导致条件无法满足 当使用 NOT IN (子查询) 时，如果子查询结果包含至少一个 NULL 值，整个条件会返回未知结果（UNKNOWN），导致主查询不返回任何记录 因为 SQL 中任何值与 NULL 比较都会返回 UNKNOWN，而不是 TRUE 或 FALSE，数据库无法确定 id 是否等于 NULL（任何与 NULL 的比较都会返回未知）。

解决： 1、排除子查询中的 NULL 值

SELECT *
FROM t_class
WHERE id NOT IN (SELECT class_id FROM t_student WHERE class_id is not null);

2、使用 not exists

exists

查关联表中不存在的

查询用户地址表 user_address 中的 user_id 不在 user 表中的脏数据，可以用 not exists

select *
from user_address ua 
where not exists (
  select 1
  from user u
  where u.id = ua.user_id
);

重复数据取最后更新的

表结构定义：

-- 用户身份历史表
DROP TABLE IF EXISTS user_role_transaction;
CREATE TABLE `user_role_transaction` (
  `id`                   BIGINT       NOT NULL    AUTO_INCREMENT,
  `user_id`              BIGINT       NOT NULL,
  `account_id`           BIGINT       NOT NULL    DEFAULT 0,
  `user_identity`        TINYINT      NOT NULL    DEFAULT 0 COMMENT '用户身份',
  `create_time`          DATETIME     NOT NULL    DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
  PRIMARY KEY (`id`),
  KEY (`user_id`),
  KEY (`account_id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8;

存储的是 user_id 的 role 值历史。

要求对于 user_id 相同的记录，选出 create_time 最新的一条，单个查的话很简单，排下序就行，现在要求根据 user_id 批量查。 sql 如下，对于外层的每条记录，去内层查有没有 create_time 更大的，没有的话，选出这条记录：

-- 批量查询每个user_id的最新身份
select *
from user_role_transaction urt
where not exists (
  select 1
  from user_role_transaction urt2
  where urt.user_id = urt2.user_id and urt2.create_time > urt.create_time
)
and user_id in (120253,147896)

这个 sql 的问题是，如果有 user_id 和 create_time 都相同的记录且这个 create_time 是这个 user_id 的最大的，还是会选出多条，所以就依赖于数据必须创建时间不同。如何解决？其实这个需求应该根据 id 筛选，直接选 id 最大的一条，肯定是唯一的，就不应该根据 create_time 查。

SQL中遇到多条相同内容只取一条的最简单实现 https://blog.csdn.net/lizeyang/article/details/18420007

重复数据取最后插入的

表结构同上，根据 user_id 批量查询，对于每个 user_id，如果存在多条数据，选择 id 最大的那条记录，即最后插入的记录，id 为自增主键。

select *
from user_role_transaction urt
where not exists (
  select 1
  from user_role_transaction urt2
  where urt2.user_id=urt.user_id and urt2.id>urt.id
)
and user_id in (153048,153037)

筛选 register_record 表中有比 third_party_user 时间小的记录的，查出其 create_time 最小的一个

select u.id as user_id, case when tpu.name='懂车帝客户' then '6Cvz6a3pPW' else 'eUFUWmA6L6' end as 'code', u.create_time, rr.campaign_code as '首次code'
from third_party_user tpu join user u on tpu.phone=u.mobile
join leads_distribution ld on u.id = ld.user_id
left join register_record rr on u.id=rr.user_id
where tpu.clue_create_time > '2020-04-03 19:28:00' and tpu.clue_create_time < '2020-04-09 16:07:00'
and exists(select * from register_record rr2 where rr2.user_id = u.id and rr2.create_time < tpu.create_time)
and not exists(select * from register_record rr3 where rr3.user_id=u.id and rr3.create_time < rr.create_time)
order by tpu.id desc;

重复数据取最大/最小

ID相同取最小TYPE

-- 方法1:(推荐在数据量较大时使用)
select *
from Temp A
where TYPE=(
    select min(TYPE)
    from Temp
    where ID=A.ID
  )

-- 方法2:
Select *
from Temp A
where not exists(
    select 1
    from Temp
    where ID=A.ID and TYPE<A.TYPE
  )

-- 方法3:
select A.*
from Temp A
  join (
      select min(TYPE)TYPE,ID
      from Temp
      group by ID) B
  on A.ID=B.ID and A.TYPE=B.TYPE

Sql 处理重复记录(相同ID取另一字段最大值) https://www.norbread.com/2018/01/14/sql-duplicated/

case when 条件表达式

groupby sum(case when) 批量分组统计

学生表 StudentDO(id, status, school_id) 查询：输入 school_id 列表，查询每个 school 中的：学生总数、一年级状态学生个数、二年级状态学生个数、已毕业状态学生个数

实现：使用一个 groupby sum(case when) sql 就可以把全部学校的学生状态个数统计查询出来

SELECT 
    student_do.school_id AS schoolId,
    COUNT(*) AS totalCount,
    SUM(CASE WHEN student_do.status = 'FIRST_GRADE' THEN 1 ELSE 0 END) AS firstGradeCount,
    SUM(CASE WHEN student_do.status = 'SECOND_GRADE' THEN 1 ELSE 0 END) AS secondGradeCount,
    SUM(CASE WHEN student_do.status = 'GRADUATED' THEN 1 ELSE 0 END) AS graduatedCount
FROM student_do
WHERE  student_do.school_id IN (?, ?, ?) 
GROUP BY student_do.school_id

case when 查及格和不及格人数

用一个SQL语句完成不同条件的分组例如：stu_cls表有字段name class score，用一条sql查询每个class里及格和不及格的学生人数，结果形式class num(>=60)，num(<60)

select class,
sum(case when score>=60 then 1 else 0 end) '及格人数' ,
sum(case when score<60 then 1 else 0 end) '不及格人数',
count(*) '总人数'
from stu_cls
group by class;

case根据列in分组

name 是 masikkk, devgou, madaimeng 的，nickname 列是 myself name 是其他的， nickname 是 others

select mobile, name, create_time, case when name in ('masikkk','madaimeng','devgou') then 'myself' else 'others' end as 'nickname'
from user;

user_role_transaction 保存的是用户身份变更历史，每变动一次插入一条新数据 user 用户表，user 表中的数据可能在 user_role_transaction 中不存在查询所有用户的最新实时身份 identity：

select u.id, case when urt.id is not null then urt.user_identity else 1 end
from  user u left join user_role_transaction urt on urt.user_id = u.id
WHERE urt.id is null or (urt.user_id = u.id AND NOT exists(
    SELECT *
    FROM user_role_transaction urt2
    WHERE urt2.user_id = urt.user_id AND urt2.id > urt.id)
)
order by u.id;

使用了 case when，对于在 user_role_transaction 表中的 user（即 urt.id is not null）结果为 urt.user_identity，否则（即不在 user_role_transaction 表中的 user）结果为固定值 1

update中使用case when赋值

需求： user 表的 identity 字段是当前用户身份，新加的字段 user_role_transaction 保存的是用户身份变更历史，每变动一次插入一条新数据之前只有 user_role_transaction 历史表，user 表中没有 identity 字段，现在刚加上，要用 user_role_transaction 的 user_identity 字段初始化 user 表的 identity 字段，规则是：对于 user_id 在 user_role_transaction 表中的 user，把 user 表的 identity 字段设置为此 user_id 在 user_role_transaction 表中的最新一个值（即id最大的）对于 user_id 不在 user_role_transaction 表中的 user，把 user 表的 identity 字段设置为固定值 1

sql 如下：

UPDATE user u left join user_role_transaction urt on urt.user_id = u.id
set u.identity = case when urt.id is not null then  urt.user_identity else 1 end
WHERE urt.id is null or (urt.user_id = u.id AND NOT exists(
    SELECT *
    FROM user_role_transaction urt2
    WHERE urt2.user_id = urt.user_id AND urt2.id > urt.id)
);

解释：使用了 update join 根据查询条件更新，由于 user 表中的数据有可能不在 user_role_transaction 表中，使用了 left join 左外连接来保留 user 中非关联数据筛选条件是 urt.id 为空（即左外关联后 user_role_transaction 表对应列是空的，即不在 user_role_transaction 表中的 user），或者关联后 user_role_transaction 中 id 最大的（这里用了个 not exists 筛选没有比他 id 更大的） set 赋值语句使用了 case when，对于在 user_role_transaction 表中的 user（即 urt.id is not null）设置为选出的 urt.user_identity，否则（即不在 user_role_transaction 表中的 user）设置为 1

引用列别名

MySQL 中，where 中不能使用别名，group by/having 和 order by 中可以使用别名 Oracle/Hive/标准SQL/达梦数据库中，where/group by/having 中不能使用列别名，order by 中可以使用别名 解决方法是将别名放到子查询中，例如 select .... from (select col1 ccc from table) where ccc > 1

这是由 sql 语句的执行顺序决定的：

where 子句先执行，此时肯定不知道 select 中的别名从哪里来的，所以肯定无法使用别名
标准SQL中是先执行 group by 和 having 再执行 select 中的列筛选、函数、别名，所以 group by/having 中还不知道别名的含义
MySQL 中对查询做了加强处理，可以先知道别名，所以 group by/having 中可以使用别名
order by 最后执行，使用别名肯定是没有问题的，order by 中甚至可以直接使用栏位的下标来进行排序，如：order by 1 desc,2 asc

例1、where 中使用列别名，报错 [42S22][1054] Unknown column 'times' in 'where clause'

select FROM_UNIXTIME(SUBSTRING(timestamp,1,10),'%Y-%m-%d %H:%i') as times
from user
where times='2022-01-24 02:30'
limit 10;

例2、order by 中可使用列别名

select FROM_UNIXTIME(SUBSTRING(timestamp,1,10),'%Y-%m-%d %H:%i') as times
from user
order by times desc
limit 10;

例3、MySQL中 group by/having 中可使用列别名

select FROM_UNIXTIME(SUBSTRING(timestamp,1,10),'%Y-%m-%d %H:%i') as times, count(*)
from user
group by times
having times > '2022-01-26 02:17';

不等号

mysql中用 <> 与 != 都是可以的，但 sqlserver 中不识别 !=，所以建议用 <>

OR条件没加括号直接和其他条件AND

比如想查询指定user_id的没有订单号的数据，如果写成下面这样就大错特错了：

SELECT * FROM order
WHERE user_id = 2812 AND order_no IS NULL OR order_no = '' ;

正确的写法是用括号把或条件括起来：

SELECT * FROM order
WHERE user_id = 2812 AND
  ( order_no IS NULL OR order_no = '' );

join

left join 或 left outer join : 左连接，返回左表中所有的记录以及右表中连接字段相等的记录。 right join 或 right outer join: 右连接，返回右表中所有的记录以及左表中连接字段相等的记录。 inner join 或 join : 内连接，又叫等值连接，只返回两个表中连接字段相等的行。 full join 或 full outer join: 外连接，返回两个表中的行：left join + right join。 cross join : 结果是笛卡尔积，就是第一个表的行数乘以第二个表的行数。

`join`和`inner join`是完全相同的

mysql 和标准sql 中，join 和 inner join 都是完全相同的 Difference between JOIN and INNER JOIN https://stackoverflow.com/questions/565620/difference-between-join-and-inner-join

连接中的on和where

数据库在通过连接两张或多张表来返回记录时，都会生成一张中间的临时表，然后再将这张临时表返回给用户。在使用 left jion 时，on 和 where 条件的区别如下： 1、on 条件是在生成临时表时使用的条件，它不管 on 中的条件是否为真，都会返回左边表中的记录。 2、where 条件是在临时表生成好后，再对临时表进行过滤的条件。这时已经没有 left join 的含义（必须返回左边表的记录）了，条件不为真的就全部过滤掉。

SQL JOIN 中 on 与 where 的区别 https://www.runoob.com/w3cnote/sql-join-the-different-of-on-and-where.html

`where`逗号分隔等价于`join`或`inner join`

1、如下两个 sql 是相等的，都是带有 on 条件的 inner join

select * from A a join B b on a.id=b.a_id;
select * from A a, B b where a.id=b.a_id;

2、如下两个 sql 也是相等的，都是不带 on 条件的 cross join，结果是两表的笛卡尔积，行数等于 A表行数乘以 B表行数

select * from A a join B b;
select * from A a, B b;

但更推荐使用join语法

INNER JOIN ON vs WHERE clause https://stackoverflow.com/questions/1018822/inner-join-on-vs-where-clause

用left join代替not in

比如我们想查询用户地址表 user_address 中的 user_id 不在 user 表中的脏数据，这些脏数据可能是由于 user_address 表的 user_id 字段没加外键约束导致的。可以用下面的 not in 子句进行条件筛选：

SELECT *
FROM user_address
where user_id not in (select id from user );

或者，使用 left join 左外连接把 user_address 表中独有的数据保留下来，结果中右边user表中没数据的都是null，直接用user的某个字段是否null判断即可，可以用关联后的 user 表的任意字段是否 null 进行判断，一般习惯用id

select *
from user_address as ua left join user as u ON ua.user_id=u.id
where u.id is null;

或者，可以使用 not exists

select *
from user_address ua 
where not exists (
  select 1
  from user u
  where u.id = ua.user_id
);

mysql中`cross join`和`join`以及`inner join`完全相同

mysql 中，cross join和join以及inner join完全相同，无任何区别。见 5.6 版本官网文档 13.2.9.2 JOIN Syntax https://dev.mysql.com/doc/refman/5.6/en/join.html

In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise.

当没有on连接条件时，cross join和join以及inner join 都是笛卡尔积。例如

select * from A cross join B;

可以替换为 inner join 或 join ，和标准sql一样，是做笛卡尔积，结果个数是表A行数乘以表B行数

由于mysql将这三者视为等同，所以cross join也可以加on条件，而标准sql中cross join是不能加条件的。例如

select *
from A as a cross join B as b on a.id=b.a_id;

和使用 inner join 或 join 完全相同。

mysql中没有`full join`全连接

mysql 中没有 full join 语法，下面的sql会报语法错误：

select *
from A as a full join B as b on a.id=b.a_id;

可使用 union 并集代替全连接。

How to do a FULL OUTER JOIN in MySQL? https://stackoverflow.com/questions/4796872/how-to-do-a-full-outer-join-in-mysql

union

MySQL UNION 用于把来自多个 SELECT 语句的结果组合到一个结果集合中。语法为：

SELECT column,... FROM table1
UNION [ALL]
SELECT column,... FROM table2
...

在多个 SELECT 语句中，对应的列应该具有相同的字段属性，且第一个 SELECT 语句中被使用的字段名称也被用于结果的字段名称。

UNION 常用于数据类似的两张或多张表查询，如分库分表中的数据分表，或者是按时间划分的数据历史表等。

注意： 1、union 会去除结果中的重复记录，这里的重复指的是所有字段完全相同，有任意字段不同也算作不同记录。 2、第一个 SELECT 语句中被使用的字段名称也被用于结果的字段名称 3、各 SELECT 语句字段名称可以不同，但字段属性必须一致。

union与union all区别

当使用 UNION 时，MySQL 会把结果集中重复的记录删掉，而使用 UNION ALL ，MySQL 会把所有的记录返回，且效率高于 UNION。

使用 UNION ALL 的时候，只是单纯的把各个查询组合到一起而不会去判断数据是否重复。因此，当确定查询结果中不会有重复数据或者不需要去掉重复数据的时候，应当使用 UNION ALL 以提高查询效率。

为什么用 union all 效率更高？因为他不需要去重

MySQL UNION 与 UNION ALL 语法与用法 https://www.cnblogs.com/zhangminghui/p/4408546.html

union all 多表count求和

select count(*) from t_pic_record_0 where groupid='collection_test_3'
union all
select count(*) from t_pic_record_1 where groupid='collection_test_3'
union all
select count(*) from t_pic_record_2 where groupid='collection_test_3';

结果为

+----------+
| count(*) |
+----------+
|        6 |
|        5 |
|        5 |
+----------+

要求和的话在外层再加一次 sum() 或 count() 即可

select sum(cnt)
from (
    select count(*) cnt from t_pic_record_0 where groupid='collection_test_3'
    union all
    select count(*) cnt from t_pic_record_1 where groupid='collection_test_3'
    union all
    select count(*) cnt from t_pic_record_2 where groupid='collection_test_3'
) t_cnt;

结果

+----------+
| sum(cnt) |
+----------+
|       16 |
+----------+

SQL算术运算

SQL 支持加减乘除求余算术运算

select (select 5)  + (select 3);
select (select 5)  - (select 3);
select (select 5)  * (select 3);
select (select 5)  / (select 3);
select (select 5)  % (select 3);

distinct 多列

select distinct a,b,c from tableA where c > const; 等价于 select a,b,c from tableA where c > const group by a,b,c;

8.2.1.16 DISTINCT Optimization https://dev.mysql.com/doc/refman/5.7/en/distinct-optimization.html

注意此时是将a,b,c三列所有不同的组合全部列出来，而不仅仅只是distinct a

如果想 distinct 只作用于a列，同时又能查出对应的b和c列，可以使用group by a来代替，此时如果唯一的a列对应的b，c列值不同，查出的结果可能具有不确定性。

mysql实现distinct限制一列而查多列的方法 https://blog.csdn.net/liuxiao723846/article/details/79181857

group by

group by 将查询结果按某一列或多列的值分组，值相等的为一组

where 子句中不能使用聚集函数作为条件表达式

join 后 group by 聚合为拼接串

user_label_mapping 是 user id 和标签 id 映射表 user_label 是标签表一个 user 可以有多个标签，查出同一 user 的多个标签并串接为逗号分隔字符串

select user_id, group_concat(ul.name) 'hobbies'
from user_label_mapping ulm join user_label ul on ulm.user_label_id = ul.id
where ulm.type='user_hobby'
group by user_id

join 后先 group by 按 user_id 分组，然后把标签名 group_concat 串接起来。

select字段必须是分组字段或聚集函数

select 中指定的字段要么就要包含在 group by 语句的后面，作为分组的依据；要么就要被包含在聚合函数中。 含有 group by 子句的 select 中指定的字段必须是“分组依据字段”，其他字段若想出现在 select 中则必须包含在聚合函数中。

order字段必须是分组字段或聚集函数

group by后的order by子句中的字段必须是“分组依据字段”，其他字段若想出现在select中则必须包含在聚合函数中。例如

SELECT errortype, isschchg, count(*)
FROM "ais"."re_ais_dcm_changeresult"
WHERE "airlinecode" = 'HU'
group by errortype, isschchg
ORDER BY savedate

执行时报错：ERROR: column "re_ais_dcm_changeresult.savedate" must appear in the GROUP BY clause or be used in an aggregate function 因为savedate字段不是group by分组字段，也不在聚集函数中，若将order by子句改为：ORDER BY count(savedate)，即将savedate放入聚集函数则没问题。

ERROR 1055 (42000) sql_mode=only_full_group_by

select *
from user
group by name;

报错 ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'x.x' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

原因是 SQL 标准中不允许 SELECT 列表，HAVING 条件语句，或 ORDER BY 语句中出现分组字段和聚集函数之外的字段。而 MySQL 中有一个状态 ONLY_FULL_GROUP_BY 来标识是否遵从这一标准，默认为开启状态。

解决 1、关闭 ONLY_FULL_GROUP_BY SET SESSION sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY,','')); 要想永久生效，需要修改 my.cnf 配置并重启mysql，

2、通过 ANY_VALUE() 来改造查询语句以避免报错使用 ANY_VALUE() 包裹的值不会被检查，跳过该错误。 SELECT gender, ANY_VALUE(last_name) FROM employees GROUP BY gender

having子句中不能使用列别名

group by 或 having 子句中不能使用列的别名例：

SELECT a as c, sum(b)
FROM test
GROUP BY c

会提示错误，group by c 改成 group by a 就行。

我在开发中遇到的错误：

[ERROR] 2016-07-05 11:01:04,694 method:com.masikkk.myservice.integratedAnalyse(AgentBookWs.java:83)
integerated analyse query failed!
org.springframework.jdbc.BadSqlGrammarException:
### Error querying database.  Cause: org.postgresql.util.PSQLException: ERROR: column "ras_agent_psgbook.bookingoffice" must appear in the GROUP BY clause or be used in an aggregate function
  Position: 8
### The error may exist in mybatis/PsgBookMapper.xml
### The error may involve com.masikkk.dao.PsgBookMapper.selectIntegratedAnalyseByOffice-Inline
### The error occurred while setting parameters
### SQL: select bookingoffice as operatingOffice, oc as oc,count(1) as addNum    from ras_agent_psgbook    where    oc=? and depdate between ? and ? and bookingdate between ? and ?                  and (depdate > ? or (depdate=? and deptime > ?))                  group by operatingoffice,oc      order by addNum desc
### Cause: org.postgresql.util.PSQLException: ERROR: column "ras_agent_psgbook.bookingoffice" must appear in the GROUP BY clause or be used in an aggregate function
  Position: 8
; bad SQL grammar []; nested exception is org.postgresql.util.PSQLException: ERROR: column "ras_agent_psgbook.bookingoffice" must appear in the GROUP BY clause or be used in an aggregate function

select中给bookingoffice设定别名为operatingOffice，group by中使用别名operatingoffice导致SQL报错。

group by 可以分页

group by 的结果可正常分页：

select age, sum(score)
from student
where class = 'xxx'
group by age
order by sum(score) desc
limit 3,3;

`\G` 列变行

\G 的作用是将查到的结构旋转 90 度变成纵向