🔍

Grouping/aggregation

Grouping fields is done in the third argument. It should be a dict with keys representing resulting field names and values representing source fields. A field dict in the fourth argument plays a different role when grouping dict is present. It defines grouping expressions to apply.

E.g.:

o)t:((+:)`a`b`c`d!(1 2 2;3 4 4;6 7 7;8 9 10));
o)gf: `a`b`c!`a`b`c; // grouping fields
o)ge: (,`d)!(,(sum;`d)); //grouping expressions
o)0N#.(?[t;(); gf; ge])
a b c d
--------
1 3 6 8
2 4 7 19
o)

Expressions for group dict values are supported. Conventional functional syntax is used:

o)t:(+:)`a`b`c`d!(1 2 2;3 4 4;6 7 7;8 9 10);
o)gf: `a`b!`a`b; // grouping fields
o)ge: (,`e)!(,(sum;(+;`c;`d))); //grouping expressions
o)0N#.(?[t;(); gf; ge])
a b e
------
1 3 14
2 4 33
o)

As for grouping expressions, the following list of monads is currently supported and optimized for performance:

Exp	Meaning	Description
#:	count	number of non-null values in group
*:	first	first non-null value in group
last	last	last non-null value in group
min	minimum	minimum non-null value in group
max	maximum	maximum non-null value in group
sum	sum	sum of non-null values in group
avg	arithmetic average	average of non-null values in group

User-defined monads are accepted too but they may lead to slower performance as all intermediate group vectors must be preserved during query processing and a more general approach is used:

o)t:((+:)`a`b`c`d!(1 2 3;3 4 5;6 7 8;8 9 10));
o)gf:{+/x};
o)0N#.(?[t;();`a`b`c!`a`b`c;(,`d)!(,(gf;`d))])
a b c d
--------
1 3 6 8
2 4 7 9
3 5 8 10
o)

See "Grouping using incremental lambdas" to speed-up custom user-defined aggregations.

Generic aggregation only (without explicit grouping fields) is done like this:

o)t1:((+:)`a`b`c!(1 2 3 2;3 4 5 6;6 7 8 8));
o)0N#.(?[(`t1;t1);();();`d`e!(({+/x};~`t1`c);({+/x};~`t1`b))])
d  e
-----
29 18
o)

Pay attention to an empty list in the third argument that enables aggregation only.

Aggregation only (without explicit grouping fields) using special monads is done like:

o)t:((+:)`a`b`c`d!(1 2 3;3 4 5;6 7 8;8 9 10));
o)0N#.(?[(`t;t);();0b;`a`b`c`d!((sum;~`t`a);(avg;~`t`b);(max;~`t`c);(min;~`t`d))])
a b c d
-------
6 4 8 8
o)

Grouping using incremental lambdas

As it has been noted before, grouping using custom/user monads is supported, but it can result in excessive memory usage. It happens due to necessity ofaccumulating all intermediate data in grouping buckets until all data is collected. Another way of grouping is supported specifically to make it workincrementally. Incremental grouping does its job by splitting entire data in chunks and processing each data chunk as comes.

Essentially, incremental grouping requires three user-defined lambdas. First lambda - to initialize grouping state, preparing data structures. Second - doing incremental aggregation itself. And third - to finalize grouping on completion.

Key	Meaning	Input parameters	Expected result
`init	Initialize state	[cg; data], whеre`cg` - number of groups found so far, `data` - current data chunk	Any shaped initialized aggregation state
`aggr	Incremental aggregation	[s; cg; ix; data], whеre`s` - current aggregation state, `cg` - number of groups found so far, `ix` - vector of group indices for each element in `data`,`data` - current data chunk	Aggregation state
`fin	Finalize aggregation state	[s], whеre`s` - current aggregation state	Finalized aggregation state

Make sure `aggr lambda resizes state data for coming data & handle nulls as needed.

And the simple example of aggregation via summing:

o) t:+`a`b!(1 2 2;3 4 0N);
o) d:`init`aggr`fin!({[cg;data] cg#0}; {[s;cg;ix;data] .[`s;();,;(cg-#s)#0]; .[`data;();~[^];0]; @[`s;ix;+;data]; s};{[s] s});
o) 0N#(.?[t;();`a!`a;`b!(d;`b)])
a b
---
1 3
2 4

<<< prev ORDER

next >>> DISTINCT