Chapter 3 Introduction to SQL¶

1 Data Definition¶

数据定义语言（DDL, data-definition language）

定义关系的模式、属性域类型、完整性约束，还可指定索引、安全授权、物理存储结构等信息。

类型	说明
`char(n)`	定长字符串，长度 \(n\) 由用户指定
`varchar(n)`	变长字符串，最大长度 \(n\) 由用户指定
`int/smallint`	整数/短整数，取值范围依赖机器
`numeric(p,d)`	定点数，精度由用户指定为 \(p\) 位数字，其中小数点后保留 \(d\) 位数字
`real/double`	单精度/双精度浮点数，精度依赖机器
`float(n)`	浮点数，精度由用户指定，至少包含 \(n\) 位数字

完整性约束

\(not\ null\)：属性值不可为空
\(primary\ key\ (A_1, ..., A_n)\)：指定主键，主键属性自动满足 \(not\ null\)
\(foreign\ key\ (A_m, ..., A_n)\ references\ r\)：指定外键，关联另一张表的主键

操作	语法	说明
创建表	`create table 表名 (属性1 类型1, 属性2 类型2, ..., 完整性约束1, ...)`	-
删除表	`drop table 表名`	删除表结构及所有数据
清空表数据	`delete from 表名`	仅删除数据，保留表结构（非DDL操作）
增加属性	`alter table 表名 add 属性类型`	新属性值默认赋null
删除属性	`alter table 表名 drop 属性`	多数数据库不支持此操作

示例

教师表（instructor）定义

create table instructor (
    ID char(5),
    name varchar(20) not null,
    dept_name varchar(20),
    salary numeric(8,2),
    primary key (ID),
    foreign key (dept_name) references department
);

学生表（student）定义

create table student (
    ID          varchar(5),
    name        varchar(20) not null,
    dept_name   varchar(20),
    tot_cred    numeric(3,0),
    primary key (ID),
    foreign key (dept_name) references department
);

选课表（takes）定义

create table takes (
    ID          varchar(5),
    course_id   varchar(8),
    sec_id      varchar(8),
    semester    varchar(6),
    year        numeric(4,0),
    grade       varchar(2),

    primary key (ID, course_id, sec_id, semester, year),
    // sec_id（班号）不能从上述主键中移除
    // 以此保证同一个学生在同一学期的同一门课程中，不会被注册到两个不同的班级

    foreign key (ID) references student,
    foreign key (course_id, sec_id, semester, year) references section
);

课程表（course）定义

create table course (
    // 主键可直接与属性定义结合
    course_id   varchar(8) primary key,
    title       varchar(50),
    dept_name   varchar(20),
    credits     numeric(2,0),
    foreign key (dept_name) references department
);

2 Basic Query Structure¶

基本语法：

select 属性1, 属性2, ...
from 表1, 表2, ...
where 条件

注意

等价于多集关系代数 \(\prod_{A1,..An}(\sigma_P(r1 \times r2 \times .. \times rm))\)
SQL 默认保留重复元组
SQL 名称大小写不敏感
SQL 查询的返回结果本身也是一个关系（即一张新的二维表）

拓展语法

去重：select distinct 属性，保留重复用 select all 属性（默认）
全属性查询：select *
算术表达式：select 子句可使用 +、-、*、/，如 salary/12
常量查询：无 from 子句，如 select '437'

WHERE 子句：支持比较运算（=、>、<、>=、<=、<>）和逻辑连接符（and/or/not），条件可基于算术表达式结果。

查找计算机科学系（Comp. Sci.）中薪资高于 80000 的所有教师

select name
from instructor
where dept_name = 'Comp. Sci.' and salary > 80000;

FROM 子句：对应关系代数中的笛卡尔积运算

生成 instructor（教师表）和 teaches（授课表）的笛卡尔积

select *
from instructor, teaches;

连接（Joins）：笛卡尔积结合 WHERE 子句

查找教师姓名及其所教授课程的课程编号

select name, course_id
from instructor, teaches
where instructor.ID = teaches.ID;

查找计算机科学系（Comp. Sci.）开设的每门课程的课程编号、学期、年份及课程名称

select section.course_id, semester, year, title
from section, course
where section.course_id = course.course_id and dept_name = 'Comp. Sci.';

自然连接（Nature Join）：自动匹配两张表所有同名属性，并仅保留一份同名属性列，需避免无关属性同名导致的错误匹配

列出每位教师及其所教授课程的名称

-- 错误版本：会将 course.dept_name 与 instructor.dept_name 进行等值匹配
select name, title
from instructor natural join teaches natural join course;

-- 正确版本
select name, title
from instructor natural join teaches, course
where teaches.course_id = course.course_id;

重命名：用于给表/属性重命名，as 可省略，Oracle 数据库中必须省略

找出所有薪资高于计算机科学系（Comp. Sci.）中任意一位教师的教师姓名

select distinct T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Comp. Sci.';

字符串匹配（like）：\(\%\) 匹配任意子串（包括空串），\(\text{_}\) 匹配单个字符

查找所有姓名中包含子字符串 dar 的教师姓名

select name
from instructor
where name like '%dar%';

匹配字符串 100 %

like '100 \%' escape '\'

注：escape '\' 定义反斜杠 \ 为转义符，使 \% 被当作普通百分号处理

模式匹配：区分大小写，支持字符串连接（||）、大小写转换、求长度、提取子串等

模式匹配示例

Intro%：匹配所有以 "Intro" 开头的字符串
%Comp%：匹配所有包含子字符串 "Comp" 的字符串
___：匹配恰好 3 个字符的字符串
___ %：匹配至少 3 个字符的字符串

排序：order by 属性1 [asc/desc], 属性2 [asc/desc]，默认为升序

按字母顺序列出所有教师的姓名

select distinct name
from instructor
order by name

特定的 WHERE 子句谓词：使用 between 比较运算符进行范围查询

查找所有薪资在 90,000 美元到 100,000 美元之间的教师姓名

select name
from instructor
where salary between 90000 and 100000

元组比较：直接对元组整体比较

查询生物系（Biology）的所有教师姓名及该教师教授的课程编号

select name, course_id
from instructor, teaches
where (instructor.ID, dept_name) = (teaches.ID, 'Biology');

重复元组（Duplicates）

多重集合（Multiset）版本的运算符不会自动去重，会完整保留元组的副本数量，SQL 的 SELECT 语句默认遵循这种多重集合语义。

运算符	符号	描述
选择	\(\sigma_\theta(r_1)\)	若元组 \(t_1\) 在 \(r_1\) 中有 \(c_1\) 个副本，且 \(t_1\) 满足选择条件 \(\sigma_\theta\)，则在结果中保留 \(c_1\) 个 \(t_1\) 副本
投影	\(\Pi_A(r_1)\)	对 \(r_1\) 中每一个元组 \(t_1\) 的副本，在结果中生成一个对应的 \(\Pi_A(t_1)\) 副本（\(\Pi_A(t_1)\) 为 \(t_1\) 在属性集 \(A\) 上的投影）
笛卡尔积	\(r_1 \times r_2\)	若元组 \(t_1\) 在 \(r_1\) 中有 \(c_1\) 个副本，元组 \(t_2\) 在 \(r_2\) 中有 \(c_2\) 个副本，则组合元组 \(t_1.t_2\) 在结果中会出现 \(c_1 \times c_2\) 个副本

假设多重集合关系 \(r_1(A,B) = \{(1,a), (2,a)\}\)，\(r_2(C) = \{(2), (3), (3)\}\)

则 \(\Pi_B(r_1) = \{(a), (a)\}\)，\(\Pi_B(r_1) \times r_2 = \{(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)\}\)

3 Set Operations¶

SQL 支持关系代数的集合操作，默认自动去重，保留重复需用多集版本（all）。

集合运算	语法
并集	`query1 union query2`
交集	`query1 intersect query2`
差集	`query1 except query2`

查找在 2009 年秋季或 2010 年春季开设的课程

(select course_id from section where sem = 'Fall' and year = 2009)
union
(select course_id from section where sem = 'Spring' and year = 2010)

查找在 2009 年秋季和 2010 年春季都开设的课程

(select course_id from section where sem = 'Fall' and year = 2009)
intersect
(select course_id from section where sem = 'Spring' and year = 2010)

查找在 2009 年秋季开设但 2010 年春季未开设的课程

(select course_id from section where sem = 'Fall' and year = 2009)
except
(select course_id from section where sem = 'Spring' and year = 2010)

4 Aggregate Functions¶

聚合函数	对 `null` 的处理	全为 `null` 时的结果
`sum()` / `avg()` / `min()` / `max()`	忽略 `null` 值	返回 `null`
`count(列名)`	忽略 `null` 值	返回 `0`
`count(*)`	不忽略任何行	返回总行数

查找计算机科学系教师的平均薪资

select avg(salary)
from instructor
where dept_name = 'Comp. Sci.';

查找在 2010 年春季学期授课的教师总人数

-- 使用 distinct 确保每位教师只被统计一次，避免重复计数
select count(distinct ID)
from teaches
where semester = 'Spring' and year = 2010;

查找 course 关系中的元组总数

select count(*)
from course;

分组聚合：group by 属性1, 属性2, ...，select 子句中的非聚合属性必须出现在 group by 中

查询各院系教师的平均薪资

select dept_name, avg(salary)
from instructor
group by dept_name;
-- 注：没有教师的系不会出现在结果中

错误示例

select dept_name, id, avg(salary)
from instructor
group by dept_name;
-- 按 dept_name 分组时，select 中出现了未聚合、也未出现在 group by 中的 id 列

select id, max(salary) from instructor;
-- 未使用 group by 子句，却混合了普通列 id 和聚合函数 max(salary)

分组过滤：对分组后的结果进行过滤（having），区别于分组前过滤元组（WHERE），紧跟GROUP BY，可使用聚合函数

针对每个院系，找出薪资超过 30000 的教师的平均薪资，然后输出院系名称以及平均值超过 42000 的结果

select dept_name, avg(salary)
from instructor
where salary > 30000
group by dept_name
having avg(salary) > 42000;

5 Nested Subqueries¶

子查询是嵌套在其他查询中的 select-from-where 表达式，可拆分复杂查询，支持在 where/from 子句中使用。

in / not in：判断属性值是否在子查询的结果集中

查找担任学生导师的教师姓名

select name
from instructor
where ID in (select i_id from advisor)

select name
from instructor, advisor
where ID = i_id

查找不担任学生导师的教师姓名

select name
from instructor
where ID not in (select i_id from advisor)

查找在 2009 年秋季和 2010 年春季都开设的课程

select distinct course_id
from section
where semester = 'Fall' and year = 2009
    and course_id in (
        select course_id
        from section
        where semester = 'Spring' and year = 2010
    );

查找在 2009 年秋季和 2010 年春季都开设的课程数量

select count(distinct course_id) as course_count
from section
where semester = 'Fall' and year = 2009
    and course_id in (
        select course_id
        from section
        where semester = 'Spring' and year = 2010
    );

some：存在性比较，F <comp> some 子查询 表示 F 与子查询中至少一个值满足比较条件

查找薪资高于生物系至少一位教师的教师姓名

select T.name
from instructor as T, instructor as S
where T.salary > S.salary and S.dept_name = 'Biology';

select name
from instructor
where salary > some (
    select salary
    from instructor
    where dept_name = 'Biology'
);

示例

\((5 < \textbf{some} \ \begin{bmatrix} 0 \\ 5 \\ 6 \end{bmatrix}) = \textbf{true}\)
\((5 < \textbf{some} \ \begin{bmatrix} 0 \\ 5 \end{bmatrix}) = \textbf{false}\)
\((5 = \textbf{some} \ \begin{bmatrix} 0 \\ 5 \end{bmatrix}) = \textbf{true}\)
\((5 \neq \textbf{some} \ \begin{bmatrix} 0 \\ 5 \end{bmatrix}) = \textbf{true}\)

注意

\((= \ some) \equiv in\)
\((\neq \ some) \not\equiv not\ in\)

all：全量比较，F <comp> all 子查询 表示 F 与子查询中所有值满足比较条件

查找薪资高于生物系所有教师的教师姓名

select name
from instructor
where salary > all (
    select salary
    from instructor
    where dept_name = 'Biology'
);

示例

\((5 < \text{all}\ \begin{bmatrix}0\\5\\6\end{bmatrix}) = \text{false}\)
\((5 > \text{all}\ \begin{bmatrix}0\\5\\6\end{bmatrix}) = \text{false}\)
\((5 \neq \text{all}\ \begin{bmatrix}0\\5\\6\end{bmatrix}) = \text{false}\)
\((5 = \text{all}\ \begin{bmatrix}0\\5\\6\end{bmatrix}) = \text{false}\)
\((5 < \text{all}\ \begin{bmatrix}6\\10\end{bmatrix}) = \text{true}\)
\((5 = \text{all}\ \begin{bmatrix}4\\5\end{bmatrix}) = \text{false}\)
\((5 \neq \text{all}\ \begin{bmatrix}4\\6\end{bmatrix}) = \text{true}\)

注意

\((\neq \text{ all}) \equiv \text{not in}\)
\((= \text{ all}) \not\equiv \text{in}\)

exists / not exists：判断子查询结果是否非空

找出担任学生导师的教师姓名

select name
from instructor
where exists (
    select *
    from advisor
    where i_id = instructor.ID
);

找出未担任学生导师的教师姓名

select name
from instructor
where not exists (
    select *
    from advisor
    where i_id = instructor.ID
);

找出在 2009 年秋季与 2010 年春季均开设的课程

select distinct course_id
from section as S
where semester = 'Fall' and year = 2009
    and exists (
        select *
        from section as T
        where semester = 'Spring' and year = 2010 and T.course_id = S.course_id
);

instructor.ID 和 S.course_id 这类不属于子查询所属关系的属性，被称为相关变量（correlation variables）。

找出修完了生物系全部课程的所有学生

select S.ID, S.name
from student as S
-- 不存在该学生没修的生物系课程
where not exists (
    -- 查询生物系开设的所有课程
    (select course_id
    from course
    where dept_name = 'Biology')
    except
    -- 查询当前学生已经修过的所有课程
    (select T.course_id
    from takes as T
    -- 关联外层学生，只查当前学生的选课记录
    where T.ID = S.ID
    )
);

unique：检测子查询结果是否无重复元组，空集返回 true

找出在 2009 年最多只开设过一次的所有课程

select T.course_id
from course as T
where unique (
    select R.course_id
    from section as R
    where R.course_id = T.course_id and R.year = 2009
);

from 子句中的子查询：子查询结果作为临时关系参与查询，lateral 子句允许 from 子句中 lateral 关键字之后的部分访问前面部分的相关变量（部分数据库不支持）

查询平均薪资超过 42,000 美元的院系，及其教师平均薪资

通过子查询先聚合，再在外层过滤

select dept_name, avg_salary
from (
    -- 子查询：先按院系分组，计算每个院系的平均薪资
    select dept_name, avg(salary) as avg_salary
    from instructor
    group by dept_name
)
-- 外层查询：筛选平均薪资 > 42000 的院系
where avg_salary > 42000;

显式命名派生表

select dept_name, avg_salary
from (
    select dept_name, avg(salary)
    from instructor
    group by dept_name
)
-- 为派生表和列指定别名
as dept_avg (dept_name, avg_salary)
where avg_salary > 42000;

使用 lateral 子句（课件提供的代码，但似乎不符合要求）

select name, salary, avg_salary
from instructor I1,
    lateral (
        select avg(salary) as avg_salary
        from instructor I2
        where I2.dept_name = I1.dept_name
    );

with 子句：定义临时视图，其定义仅对当前查询有效，简化复杂的嵌套子查询，类似关系代数的赋值操作

找出预算最高的所有院系

with max_budget (value) as
    (select max(budget) from department)
select dept_name
from department, max_budget
where department.budget = max_budget.value;

找出总薪资高于所有院系总薪资平均值的所有院系

with
-- 计算每个院系的总薪资，命名为 dept_total
dept_total (dept_name, value) as
    (select dept_name, sum(salary)
    from instructor
    group by dept_name),
-- 基于 dept_total，计算所有院系总薪资的平均值
dept_total_avg(value) as
    (select avg(value) from dept_total)
-- 筛选出总薪资 ≥ 平均值的院系
select dept_name
from dept_total, dept_total_avg
where dept_total.value >= dept_total_avg.value;

标量子查询：返回单个值的子查询，可用于需要单值的场景（如 where 子句的比较），若返回多值则报运行时错误。

查询薪资的十倍超过所在院系预算的教师

select name from instructor
where salary *10 > (
    select budget
    from department
    where department.dept_name = instructor.dept_name
);

找出预算最高的所有院系（不使用 with 子句）

select dept_name
from department
where budget = (select max(budget) from department);

6 Modification of the Database¶

删除：delete from 表名 where 条件，无 where 则删除表中所有数据，条件可包含子查询

删除金融系（Finance）的所有教师

delete from instructor
where dept_name = 'Finance';

删除所有位于 Watson 大楼院系的教师

delete from instructor
where dept_name in (
    select dept_name
    from department
    where building = 'Watson'
);

删除所有薪资低于教师平均薪资的教师

delete from instructor
where salary < (select avg(salary) from instructor);

插入：支持单条插入和从其他表查询插入，未指定的属性赋 null

向 course 表添加一条新记录

省略列名（需按表定义的列顺序提供值）

insert into course
values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);

显式指定列名（更清晰、可维护，推荐使用）

insert into course (course_id, title, dept_name, credits)
values ('CS-437', 'Database Systems', 'Comp. Sci.', 4);

将所有教师信息插入 student 表，并将 tot_creds 设为 0

insert into student
select ID, name, dept_name, 0
from instructor;

更新：update 表名 set 属性=新值 where 条件，无 where 则更新所有元组，支持 case 语句实现多条件更新（避免多次更新的顺序问题），新值可通过标量子查询获取

为所有教师薪资上调 5%

update instructor
set salary = salary * 1.05;

分条件调整薪资：薪资超过 100,000 美元的教师上调 3%，其余教师上调 5%

分两条语句执行（需注意执行顺序）

update instructor
set salary = salary * 1.03
where salary > 100000;

update instructor
set salary = salary * 1.05
where salary <= 100000;

使用 case 语句实现条件更新

update instructor
set salary = case
    when salary <= 100000 then salary * 1.05
    else salary * 1.03
end;

重新计算所有学生的总学分

update student S
set tot_creds = (
    select sum(credits)
    from takes natural join course
    where S.ID = takes.ID
        and takes.grade <> 'F'
        and takes.grade is not null
);

复制表结构：create table 新表名 like 原表名，创建与现有表 schema 完全相同的空表