题目

运营想要计算一些参加了答题的不同学校、不同难度的用户平均答题量,请你写SQL取出相应数据。
牛客:SQL23 统计每个学校各难度的用户平均刷题数
用户信息表:user_profile

iddevice_idgenderageuniversitygpa30daysquestion_cntanswer_cnt
12138male21北京大学3.47212
23214male 复旦大学415525
36543female20北京大学3.212330
42315female23浙江大学3.6512
55432male25山东大学3.8201570
62131male28山东大学3.315713
74321male28复旦大学3.69652

题库练习明细表:question_practice_detail

iddevice_idquestion_idresult
12138111wrong
23214112wrong
33214113wrong
46543111right
52315115right
62315116right
72315117wrong
85432117wrong
95432112wrong
102131113right
115432113wrong
122315115right
132315116right
142315117wrong
155432117wrong
165432112wrong
172131113right
185432113wrong

题目难度表:question_detail

idquestion_iddiffcult_level
1111hard
2112medium
3113easy
4115easy
5116medium
6117easy

输出示例

universitydifficult_levelavg_answer_cnt
北京大学hard1.0000
复旦大学easy1.0000
复旦大学medium1.0000
山东大学easy4.5000
山东大学medium3.0000
浙江大学easy5.0000
浙江大学medium2.0000

题解

方法1:对各学校、难度分组,获取刷题总数,去重获取用户数,直接计算。

select u.university, difficult_level, count(d.question_id)/count(distinct u.device_id)avg_answer_cnt from 
user_profile u
join question_practice_detail p on u.device_id=p.device_id
join question_detail d on p.question_id=d.question_id
group by u.university,
d.difficult_level

⚠️ 下述方法低效且多余,但当初自己绕进去了,便分享一下,若不想被折磨建议别看。

方法2:子查询先获取各学校、各难度的用户数量;主查询获取各学校、各难度刷题总数,再除以用户数量。

涉及:多字段表连接、count distinct与group by的关系、group by的select结果

select
    up.university,
    qd.difficult_level,
    count(qpd.question_id) / sub.people_num
FROM
    user_profile up
    JOIN question_practice_detail qpd ON up.device_id = qpd.device_id
    JOIN question_detail qd ON qpd.question_id = qd.question_id
    JOIN (
        select
            university,
            difficult_level,
            count(distinct u.device_id) people_num #3.将deviceid distinct,获得各难度用户数量
            #4.此时difficult_level为distinct,而university为indistinct,但difficult_level和university的组合是distinct的;获各难度下用户数量。
        from
            user_profile u
            join question_practice_detail q on u.device_id = q.device_id
            join question_detail d on q.question_id = d.question_id
            #1.连接各表后,deviceid为indistinct,因为practice中同一deviceid对应多个questionid            
        group by
            university,
            difficult_level
            #2.获各难度下deviceid(indistinct)、questionid
    ) AS sub ON up.university = sub.university -- 按学校关联
    AND qd.difficult_level = sub.difficult_level -- 按难度关联
    #5.为确保将各学校、各难度用户数量传递正确,将用户数量传递给所有对应组合,故同时关联学校、难度;不会由于子查询的join新增行数,只是为各行加了people_num字段。
    #6.所有表连接后和子查询中的表连接结果一样,只是新加了people_num
group by
    up.university,
    qd.difficult_level,
    sub.people_num #因为select中用到所以写上,不会影响groupby,因为people_num对应的就是独特的university和difficult_level组合。
count(distinct u.device_id)不使用distinct的方法:

再套一层子查询,用group by device_id去重,再在外层查询中count(device_id)

方法3:子查询先获取各学校、各难度刷题总数;主查询获取各学校、各难度的用户数量,再计算。

select
    up.university,
    qd.difficult_level,
    sub.total_questions / count(distinct up.device_id) as avg_answer_cnt  -- 用子查询的刷题总数计算每个用户的平均刷题数
FROM
    user_profile up
    JOIN question_practice_detail qpd ON up.device_id = qpd.device_id
    JOIN question_detail qd ON qpd.question_id = qd.question_id
    JOIN (
        -- 子查询:按学校和难度获取每个学校、难度下的刷题总数
        select
            university,
            difficult_level,
            count(qpd.question_id) as total_questions  -- 获取刷题总数
        from
            user_profile up
            join question_practice_detail qpd on up.device_id = qpd.device_id
            join question_detail qd on qpd.question_id = qd.question_id
        group by
            university,
            difficult_level
    ) AS sub ON up.university = sub.university
    AND qd.difficult_level = sub.difficult_level
group by
    up.university,
    qd.difficult_level,
    sub.total_questions
SQL<统计每个学校各难度的用户平均刷题数>的两种解法

https://bluish.net/archives/2194/

作者

bluish

发布时间

2025-02-16

许可协议

CC BY-SA 4.0

标签: sql

仅有一条评论

  1. [...]and sub1.datenow=sub2.datelag+1 用多字段表连接将同用户的datelag与datenow匹配[...]

添加新评论