🔍

Grouping/aggregation

Grouping fields is done in the third argument. It should be a dict with keys representing resulting field names and values representing source fields. A field dict in the fourth argument plays a different role when grouping dict is present. It defines grouping expressions to apply.

E.g.:

o)t:((+:)`a`b`c`d!(1 2 2;3 4 4;6 7 7;8 9 10));
o)gf: `a`b`c!`a`b`c; // grouping fields
o)ge: (,`d)!(,(sum;`d)); //grouping expressions
o)0N#.(?[t;(); gf; ge])
a b c d
--------
1 3 6 8
2 4 7 19
o)

Expressions for group dict values are supported. Conventional functional syntax is used:

o)t:(+:)`a`b`c`d!(1 2 2;3 4 4;6 7 7;8 9 10);
o)gf: `a`b!`a`b; // grouping fields
o)ge: (,`e)!(,(sum;(+;`c;`d))); //grouping expressions
o)0N#.(?[t;(); gf; ge])
a b e
------
1 3 14
2 4 33
o)

As for grouping expressions, the following list of monads is currently supported and optimized for performance:

Exp Meaning Description
#: count number of non-null values in group
*: first first non-null value in group
last last last non-null value in group
min minimum minimum non-null value in group
max maximum maximum non-null value in group
sum sum sum of non-null values in group
avg arithmetic average average of non-null values in group

User-defined monads are accepted too but they may lead to slower performance as all intermediate group vectors must be preserved during query processing and a more general approach is used:

o)t:((+:)`a`b`c`d!(1 2 3;3 4 5;6 7 8;8 9 10));
o)gf:{+/x};
o)0N#.(?[t;();`a`b`c!`a`b`c;(,`d)!(,(gf;`d))])
a b c d
--------
1 3 6 8
2 4 7 9
3 5 8 10
o)

See "Grouping using incremental lambdas" to speed-up custom user-defined aggregations.

Generic aggregation only (without explicit grouping fields) is done like this:

o)t1:((+:)`a`b`c!(1 2 3 2;3 4 5 6;6 7 8 8));
o)0N#.(?[(`t1;t1);();();`d`e!(({+/x};~`t1`c);({+/x};~`t1`b))])
d  e
-----
29 18
o)

Pay attention to an empty list in the third argument that enables aggregation only.

Aggregation only (without explicit grouping fields) using special monads is done like:

o)t:((+:)`a`b`c`d!(1 2 3;3 4 5;6 7 8;8 9 10));
o)0N#.(?[(`t;t);();0b;`a`b`c`d!((sum;~`t`a);(avg;~`t`b);(max;~`t`c);(min;~`t`d))])
a b c d
-------
6 4 8 8
o)

Grouping using incremental lambdas

As it has been noted before, grouping using custom/user monads is supported, but it can result in excessive memory usage. It happens due to necessity ofaccumulating all intermediate data in grouping buckets until all data is collected. Another way of grouping is supported specifically to make it workincrementally. Incremental grouping does its job by splitting entire data in chunks and processing each data chunk as comes.

Essentially, incremental grouping requires three user-defined lambdas. First lambda - to initialize grouping state, preparing data structures. Second - doing incremental aggregation itself. And third - to finalize grouping on completion.

Key Meaning Input parameters Expected result
`init Initialize state [cg; data], where`cg` - number of groups found so far, `data` - current data chunk Any shaped initialized aggregation state
`aggr Incremental aggregation [s; cg; ix; data], where`s` - current aggregation state, `cg` - number of groups found so far, `ix` - vector of group indices for each element in `data`,`data` - current data chunk Aggregation state
`fin Finalize aggregation state [s], where`s` - current aggregation state Finalized aggregation state
Make sure `aggr lambda resizes state data for coming data & handle nulls as needed.

And the simple example of aggregation via summing:

o) t:+`a`b!(1 2 2;3 4 0N);
o) d:`init`aggr`fin!({[cg;data] cg#0}; {[s;cg;ix;data] .[`s;();,;(cg-#s)#0]; .[`data;();~[^];0]; @[`s;ix;+;data]; s};{[s] s});
o) 0N#(.?[t;();`a!`a;`b!(d;`b)])
a b
---
1 3
2 4

<<< prev ORDER
next >>> DISTINCT