Spark Function
groupBy
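// Assumes `import spark.implicits._` (for the $"..." syntax) and
// `import org.apache.spark.sql.functions.count` are in scope.
// Compute the average for all numeric columns grouped by department.
df.groupBy($"department").avg()
// Count the non-null reorder values within each reorder group.
df.groupBy($"reorder").agg(count("reorder").alias("cnt"))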
// Compute the max age and average salary, grouped by department and gender.
df.groupBy($"department", $"gender").agg(Map(
"salary" -> "avg",
"age" -> "max"
))
orderBy and sort
Sorting
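A minimal sketch of sorting, assuming a local SparkSession and a small made-up DataFrame. The object name `SortSketch`, the sample rows, and the column names are hypothetical. In the Dataset API, `orderBy` and `sort` are aliases for the same operation, and ascending order is the default.

```scala
import org.apache.spark.sql.SparkSession

object SortSketch {
  // Local session for illustration only (assumption, not from the notes).
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]").appName("sort-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data.
  val df = Seq(("sales", "Ann", 3000), ("sales", "Bob", 4000), ("hr", "Cid", 3500))
    .toDF("department", "name", "salary")

  // Sort by department ascending, then by salary descending within a department.
  val sorted = df.orderBy($"department".asc, $"salary".desc)

  // Pull the ordered names back to the driver to inspect the result.
  val names: Seq[String] = sorted.select("name").as[String].collect().toSeq
}
```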
Sampling
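A minimal sketch of sampling with `Dataset.sample`, assuming a local SparkSession. The object name `SampleSketch`, the row count, and the seed are hypothetical. Note that `fraction` is a per-row inclusion probability, so the number of rows returned is only approximately `fraction` times the input size.

```scala
import org.apache.spark.sql.SparkSession

object SampleSketch {
  // Local session for illustration only (assumption, not from the notes).
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]").appName("sample-sketch").getOrCreate()

  // Hypothetical input: 1000 rows with a single id column.
  val df = spark.range(0, 1000).toDF("id")

  // Sample roughly 10% of the rows without replacement; the seed is arbitrary.
  val sampled = df.sample(withReplacement = false, fraction = 0.1, seed = 42L)

  // The exact count varies with the seed; it is not guaranteed to be 100.
  val sampledCount: Long = sampled.count()
}
```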
Select
Returning a subset
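A minimal sketch of using `select` to return a subset of columns, optionally combined with `where` to keep a subset of rows. The object name `SelectSubsetSketch`, the sample rows, and the column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object SelectSubsetSketch {
  // Local session for illustration only (assumption, not from the notes).
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]").appName("select-subset-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data.
  val df = Seq(("sales", "Ann", 3000), ("sales", "Bob", 4000), ("hr", "Cid", 3500))
    .toDF("department", "name", "salary")

  // Keep only two of the three columns.
  val subset = df.select($"name", $"salary")

  // Keep two columns and only the rows with salary above 3000.
  val highPaid = df.select("name", "salary").where($"salary" > 3000)
  val highPaidCount: Long = highPaid.count()
}
```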
Returning summary statistics
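A minimal sketch of `select` returning statistics: when every selected expression is an aggregate function, the result is a single row of summary values. The object name `SelectStatsSketch`, the sample rows, and the aliases are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, max}

object SelectStatsSketch {
  // Local session for illustration only (assumption, not from the notes).
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]").appName("select-stats-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data.
  val df = Seq(("Ann", 3000), ("Bob", 4000), ("Cid", 3500)).toDF("name", "salary")

  // Selecting only aggregate expressions collapses the DataFrame to one row.
  val stats = df.select(avg($"salary").alias("avg_salary"), max($"salary").alias("max_salary"))

  private val row = stats.first()
  val avgSalary: Double = row.getDouble(0) // avg yields a double column
  val maxSalary: Int = row.getInt(1)       // max keeps the input integer type
}
```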
agg aggregate functions
Returning summary statistics
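A minimal sketch of `agg` on a whole DataFrame without grouping, which is shorthand for `df.groupBy().agg(...)`. The object name `AggSketch`, the sample rows, and the aliases are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, sum}

object AggSketch {
  // Local session for illustration only (assumption, not from the notes).
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]").appName("agg-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data.
  val df = Seq(("Ann", 3000), ("Bob", 4000), ("Cid", 3500)).toDF("name", "salary")

  // Aggregate over all rows: number of names and total salary.
  val stats = df.agg(count($"name").alias("n"), sum($"salary").alias("total"))

  private val row = stats.first()
  val n: Long = row.getLong(0)     // count yields a long column
  val total: Long = row.getLong(1) // sum of an integer column yields a long column
}
```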
distinct
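A minimal sketch of removing duplicates, assuming a small made-up DataFrame. `distinct()` compares entire rows, while `dropDuplicates` can be restricted to chosen columns. The object name `DistinctSketch` and the columns `k` and `v` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DistinctSketch {
  // Local session for illustration only (assumption, not from the notes).
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]").appName("distinct-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data with one fully duplicated row.
  val df = Seq(("a", 1), ("a", 1), ("a", 2), ("b", 2)).toDF("k", "v")

  // Remove rows that are identical in every column.
  val uniqueRows: Long = df.distinct().count()

  // Keep one (arbitrary) row per distinct value of k.
  val uniqueKeys: Long = df.dropDuplicates("k").count()
}
```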
collect (combined with toMap to convert a DataFrame to a Map)
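A minimal sketch of the `collect` plus `toMap` pattern, assuming a small two-column DataFrame whose first column holds unique keys. The object name `CollectSketch`, the sample rows, and the column names are hypothetical. Because `collect` brings every row to the driver, this only suits results that fit comfortably in driver memory.

```scala
import org.apache.spark.sql.SparkSession

object CollectSketch {
  // Local session for illustration only (assumption, not from the notes).
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]").appName("collect-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data: one row per department.
  val df = Seq(("sales", 2), ("hr", 1)).toDF("department", "cnt")

  // Collect the rows to the driver, turn each Row into a key/value pair,
  // then build an immutable Scala Map from the pairs.
  val asMap: Map[String, Int] = df.collect()
    .map(r => r.getAs[String]("department") -> r.getAs[Int]("cnt"))
    .toMap
}
```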
struct (combine two columns into a tuple, then convert to a Map; split a struct into multiple columns)
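A minimal sketch of both uses of `struct` named in the heading: packing two columns into one struct column that can be collected into a Map of tuples, and expanding a struct back into separate columns with `.*`. The object name `StructSketch`, the sample rows, and the column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

object StructSketch {
  // Local session for illustration only (assumption, not from the notes).
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]").appName("struct-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data: one row per department.
  val df = Seq(("sales", "Ann", 3000), ("hr", "Cid", 3500))
    .toDF("department", "name", "salary")

  // Pack name and salary into a single struct column called pair.
  val packed = df.select($"department", struct($"name", $"salary").alias("pair"))

  // Collect to the driver and build department -> (name, salary).
  val deptToPair: Map[String, (String, Int)] = packed.collect().map { r =>
    val p = r.getStruct(1) // the struct column arrives as a nested Row
    r.getString(0) -> (p.getString(0), p.getInt(1))
  }.toMap

  // Expand the struct back into its individual columns.
  val flattened = packed.select($"department", $"pair.*")
}
```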
join
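A minimal sketch of joining two DataFrames on a shared column name, assuming made-up employee and department tables. The object name `JoinSketch`, the sample rows, and the column names are hypothetical. Passing the key as a `Seq` of column names keeps a single copy of the join column in the output.

```scala
import org.apache.spark.sql.SparkSession

object JoinSketch {
  // Local session for illustration only (assumption, not from the notes).
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]").appName("join-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical sample data; employee 3 has no matching department.
  val employees = Seq((1, "Ann", 10), (2, "Bob", 20), (3, "Cid", 30))
    .toDF("id", "name", "dept_id")
  val departments = Seq((10, "sales"), (20, "hr")).toDF("dept_id", "dept_name")

  // Inner join keeps only employees with a matching department.
  val innerCount: Long = employees.join(departments, Seq("dept_id"), "inner").count()

  // Left join keeps every employee; dept_name is null where there is no match.
  val leftCount: Long = employees.join(departments, Seq("dept_id"), "left").count()
}
```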