To get the same effect as a pandas groupby-then-compute, Hive can use ROW_NUMBER():

ROW_NUMBER() OVER (PARTITION BY COLUMN_A ORDER BY COLUMN_B ASC/DESC) rn

rn is the alias of the generated row number; numbering restarts at 1 within each partition.
PARTITION BY: analogous to partitioning when creating a Hive table, i.e. grouping; COLUMN_A is the grouping column.
ORDER BY: the sort order, ascending by default; append DESC for descending; COLUMN_B is the sort column.
Note: GROUP BY can achieve the same grouped aggregation, but the SELECT list must not contain columns unrelated to the grouping/aggregation, or the query fails. The main difference between GROUP BY and OVER (PARTITION BY ...) is therefore: a Hive SQL statement with GROUP BY can only return the columns involved in the grouping and aggregation, while one with OVER (PARTITION BY ...) can return all columns.
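A minimal sketch of that contrast, assuming a hypothetical table emp(dept, name, salary):

```sql
-- GROUP BY: SELECT may only reference grouping/aggregate columns;
-- adding name here would fail with a semantic error.
SELECT dept, SUM(salary) AS total_salary
FROM emp
GROUP BY dept;

-- OVER (PARTITION BY ...): all columns stay visible, and the
-- per-dept total is repeated on every row of that department.
SELECT dept, name, salary,
       SUM(salary) OVER (PARTITION BY dept) AS total_salary
FROM emp;
```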
CASE [ expression ]
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    ...
    WHEN conditionN THEN resultN
    ELSE result
END
SELECT
  -- extract the search keyword from the path column fo, depending on its prefix
  CASE WHEN fo LIKE '搜索结果/%' THEN split(fo,'/')[2]
       WHEN fo LIKE '音乐电台/公共电台/综合搜索/%' THEN split(fo,'/')[3]
  END kw,
  -- conditional aggregation: sum ages of male rows only
  SUM(CASE WHEN sex_age.sex='Male' THEN sex_age.age ELSE 0 END)
where trim(fs) <> '播放错误'
where split(fo_2,'/')[2] like '薛之谦%'  -- pass the column fo_2, not the literal 'fo_2'
cast(st as bigint)
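The three fragments above could fit into one query; a hedged sketch, with the table play_log and its column types assumed for illustration:

```sql
-- hypothetical table: play_log(fs string, fo_2 string, st string)
SELECT cast(st AS bigint) AS st_num           -- cast the string column to bigint
FROM play_log
WHERE trim(fs) <> '播放错误'                   -- drop rows whose status is "playback error"
  AND split(fo_2, '/')[2] LIKE '薛之谦%';      -- third path segment starts with 薛之谦
```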
-- for each sh, keep the sn with the largest cnt (top-1 per group)
select sh, sn
from (
  select sh, sn,
         row_number() over (partition by sh order by cnt desc) rn
  from (
    select sh, sn, count(1) cnt
    from pc_part
    where spt_cnt >= 10
    group by sh, sn
  ) c0
) c1
where rn = 1;
select data_field,
regexp_extract(data_field,'.*?bgStart\\=([^&]+)',1) as aaa,
regexp_extract(data_field,'.*?contentLoaded_headStart\\=([^&]+)',1) as bbb,
regexp_extract(data_field,'.*?AppLoad2Req\\=([^&]+)',1) as ccc
from pt_nginx_loginlog_st
where pt = '2012-03-26' limit 2;
These two queries illustrate the note above: the windowed sum returns one output row per input row, keeping mac_color and day_num, while the GROUP BY version collapses each (day_id, mac_id) group to a single row.

select day_id, mac_id, mac_color, day_num,
       sum(day_num) over (partition by day_id, mac_id order by day_id) sum_num
from test_temp_mac_id;

select day_id, mac_id, sum(day_num) sum_num
from test_temp_mac_id
group by day_id, mac_id
order by day_id;
concat(string A, string B…) concatenates multiple strings into one; it accepts any number of input strings.
concat_ws(string SEP, string A, string B…) concatenates multiple strings, separating them with the given delimiter SEP.
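A quick illustration of the two functions, using made-up literals:

```sql
SELECT concat('2018', '-', '10', '-', '01');     -- returns '2018-10-01'
SELECT concat_ws('@@', 'songname', '10', '20');  -- returns 'songname@@10@@20'
```

concat_ws inserts the separator only between arguments, which is convenient when the field will later be split back apart on that separator.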
Export the query result to a file with hive -e (header printing enabled, so the first line of the output holds the column names):

hive -e "set hive.cli.print.header=true;
set mapreduce.job.queuename=root.baseDepSarchQueue;
select CONCAT_WS('@@', songname, max(ownercount), max(playcount))
from (
  select songname, ownercount, playcount
  from common.st_k_mixsong_part
  where dt='2018-10-01' and ownercount>0 and playcount>0
) a
group by songname;" > song_info.csv