Grouping/aggregation
Grouping fields is done in the third argument. It should be a dict with keys representing resulting field names and values representing source fields. A field dict in the fourth argument plays a different role when grouping dict is present. It defines grouping expressions to apply.
E.g.:
o)t:((+:)`a`b`c`d!(1 2 2;3 4 4;6 7 7;8 9 10));
o)gf: `a`b`c!`a`b`c; // grouping fields
o)ge: (,`d)!(,(sum;`d)); //grouping expressions
o)0N#.(?[t;(); gf; ge])
a b c d
--------
1 3 6 8
2 4 7 19
o)
Expressions for group dict values are supported. Conventional functional syntax is used:
o)t:(+:)`a`b`c`d!(1 2 2;3 4 4;6 7 7;8 9 10);
o)gf: `a`b!`a`b; // grouping fields
o)ge: (,`e)!(,(sum;(+;`c;`d))); //grouping expressions
o)0N#.(?[t;(); gf; ge])
a b e
------
1 3 14
2 4 33
o)
As for grouping expressions, the following list of monads is currently supported and optimized for performance:
Exp | Meaning | Description |
---|---|---|
#: | count | number of non-null values in group |
*: | first | first non-null value in group |
last | last | last non-null value in group |
min | minimum | minimum non-null value in group |
max | maximum | maximum non-null value in group |
sum | sum | sum of non-null values in group |
avg | arithmetic average | average of non-null values in group |
User-defined monads are accepted too but they may lead to slower performance as all intermediate group vectors must be preserved during query processing and a more general approach is used:
o)t:((+:)`a`b`c`d!(1 2 3;3 4 5;6 7 8;8 9 10));
o)gf:{+/x};
o)0N#.(?[t;();`a`b`c!`a`b`c;(,`d)!(,(gf;`d))])
a b c d
--------
1 3 6 8
2 4 7 9
3 5 8 10
o)
Generic aggregation only (without explicit grouping fields) is done like this:
o)t1:((+:)`a`b`c!(1 2 3 2;3 4 5 6;6 7 8 8));
o)0N#.(?[(`t1;t1);();();`d`e!(({+/x};~`t1`c);({+/x};~`t1`b))])
d e
-----
29 18
o)
Pay attention to an empty list in the third argument that enables aggregation only.
Aggregation only (without explicit grouping fields) using special monads is done like:
o)t:((+:)`a`b`c`d!(1 2 3;3 4 5;6 7 8;8 9 10));
o)0N#.(?[(`t;t);();0b;`a`b`c`d!((sum;~`t`a);(avg;~`t`b);(max;~`t`c);(min;~`t`d))])
a b c d
-------
6 4 8 8
o)
Grouping using incremental lambdas
As it has been noted before, grouping using custom/user monads is supported, but it can result in excessive memory usage. It happens due to necessity ofaccumulating all intermediate data in grouping buckets until all data is collected. Another way of grouping is supported specifically to make it workincrementally. Incremental grouping does its job by splitting entire data in chunks and processing each data chunk as comes.
Essentially, incremental grouping requires three user-defined lambdas. First lambda - to initialize grouping state, preparing data structures. Second - doing incremental aggregation itself. And third - to finalize grouping on completion.
Key | Meaning | Input parameters | Expected result |
---|---|---|---|
`init | Initialize state | [cg; data], whеre`cg` - number of groups found so far, `data` - current data chunk | Any shaped initialized aggregation state |
`aggr | Incremental aggregation | [s; cg; ix; data], whеre`s` - current aggregation state, `cg` - number of groups found so far, `ix` - vector of group indices for each element in `data`,`data` - current data chunk | Aggregation state |
`fin | Finalize aggregation state | [s], whеre`s` - current aggregation state | Finalized aggregation state |
And the simple example of aggregation via summing:
o) t:+`a`b!(1 2 2;3 4 0N);
o) d:`init`aggr`fin!({[cg;data] cg#0}; {[s;cg;ix;data] .[`s;();,;(cg-#s)#0]; .[`data;();~[^];0]; @[`s;ix;+;data]; s};{[s] s});
o) 0N#(.?[t;();`a!`a;`b!(d;`b)])
a b
---
1 3
2 4