SQL<统计每个学校各难度的用户平均刷题数>的两种解法
题目
运营想要计算一些参加了答题的不同学校、不同难度的用户平均答题量,请你写SQL取出相应数据。
牛客:SQL23 统计每个学校各难度的用户平均刷题数
用户信息表:user_profile
id | device_id | gender | age | university | gpa | 30days | question_cnt | answer_cnt |
---|---|---|---|---|---|---|---|---|
1 | 2138 | male | 21 | 北京大学 | 3.4 | 7 | 2 | 12 |
2 | 3214 | male | 复旦大学 | 4 | 15 | 5 | 25 | |
3 | 6543 | female | 20 | 北京大学 | 3.2 | 12 | 3 | 30 |
4 | 2315 | female | 23 | 浙江大学 | 3.6 | 5 | 1 | 2 |
5 | 5432 | male | 25 | 山东大学 | 3.8 | 20 | 15 | 70 |
6 | 2131 | male | 28 | 山东大学 | 3.3 | 15 | 7 | 13 |
7 | 4321 | male | 28 | 复旦大学 | 3.6 | 9 | 6 | 52 |
题库练习明细表:question_practice_detail
id | device_id | question_id | result |
---|---|---|---|
1 | 2138 | 111 | wrong |
2 | 3214 | 112 | wrong |
3 | 3214 | 113 | wrong |
4 | 6543 | 111 | right |
5 | 2315 | 115 | right |
6 | 2315 | 116 | right |
7 | 2315 | 117 | wrong |
8 | 5432 | 117 | wrong |
9 | 5432 | 112 | wrong |
10 | 2131 | 113 | right |
11 | 5432 | 113 | wrong |
12 | 2315 | 115 | right |
13 | 2315 | 116 | right |
14 | 2315 | 117 | wrong |
15 | 5432 | 117 | wrong |
16 | 5432 | 112 | wrong |
17 | 2131 | 113 | right |
18 | 5432 | 113 | wrong |
题目难度表:question_detail
id | question_id | diffcult_level |
---|---|---|
1 | 111 | hard |
2 | 112 | medium |
3 | 113 | easy |
4 | 115 | easy |
5 | 116 | medium |
6 | 117 | easy |
输出示例
university | difficult_level | avg_answer_cnt |
---|---|---|
北京大学 | hard | 1.0000 |
复旦大学 | easy | 1.0000 |
复旦大学 | medium | 1.0000 |
山东大学 | easy | 4.5000 |
山东大学 | medium | 3.0000 |
浙江大学 | easy | 5.0000 |
浙江大学 | medium | 2.0000 |
题解
方法1:对各学校、难度分组,获取刷题总数,去重获取用户数,直接计算。
select u.university, difficult_level, count(d.question_id)/count(distinct u.device_id)avg_answer_cnt from
user_profile u
join question_practice_detail p on u.device_id=p.device_id
join question_detail d on p.question_id=d.question_id
group by u.university,
d.difficult_level
⚠️ 下述方法低效且多余,但当初自己绕进去了,便分享一下,若不想被折磨建议别看。
方法2:子查询先获取各学校、各难度的用户数量;主查询获取各学校、各难度刷题总数,再除以用户数量。
涉及:多字段表连接、count distinct与group by的关系、group by的select结果
select
up.university,
qd.difficult_level,
count(qpd.question_id) / sub.people_num
FROM
user_profile up
JOIN question_practice_detail qpd ON up.device_id = qpd.device_id
JOIN question_detail qd ON qpd.question_id = qd.question_id
JOIN (
select
university,
difficult_level,
count(distinct u.device_id) people_num #3.将deviceid distinct,获得各难度用户数量
#4.此时difficult_level为distinct,而university为indistinct,但difficult_level和university的组合是distinct的;获各难度下用户数量。
from
user_profile u
join question_practice_detail q on u.device_id = q.device_id
join question_detail d on q.question_id = d.question_id
#1.连接各表后,deviceid为indistinct,因为practice中同一deviceid对应多个questionid
group by
university,
difficult_level
#2.获各难度下deviceid(indistinct)、questionid
) AS sub ON up.university = sub.university -- 按学校关联
AND qd.difficult_level = sub.difficult_level -- 按难度关联
#5.为确保将各学校、各难度用户数量传递正确,将用户数量传递给所有对应组合,故同时关联学校、难度;不会由于子查询的join新增行数,只是为各行加了people_num字段。
#6.所有表连接后和子查询中的表连接结果一样,只是新加了people_num
group by
up.university,
qd.difficult_level,
sub.people_num #因为select中用到所以写上,不会影响groupby,因为people_num对应的就是独特的university和difficult_level组合。
count(distinct u.device_id)
不使用distinct的方法:
再套一层子查询,用group by device_id去重,再在外层查询中count(device_id)
方法3:子查询先获取各学校、各难度刷题总数;主查询获取各学校、各难度的用户数量,再计算。
select
up.university,
qd.difficult_level,
sub.total_questions / count(distinct up.device_id) as avg_answer_cnt -- 用子查询的刷题总数计算每个用户的平均刷题数
FROM
user_profile up
JOIN question_practice_detail qpd ON up.device_id = qpd.device_id
JOIN question_detail qd ON qpd.question_id = qd.question_id
JOIN (
-- 子查询:按学校和难度获取每个学校、难度下的刷题总数
select
university,
difficult_level,
count(qpd.question_id) as total_questions -- 获取刷题总数
from
user_profile up
join question_practice_detail qpd on up.device_id = qpd.device_id
join question_detail qd on qpd.question_id = qd.question_id
group by
university,
difficult_level
) AS sub ON up.university = sub.university
AND qd.difficult_level = sub.difficult_level
group by
up.university,
qd.difficult_level,
sub.total_questions
[...]and sub1.datenow=sub2.datelag+1 用多字段表连接将同用户的datelag与datenow匹配[...]