🔍

Relational queries are nice tool to have in a vector language. They form a special sub-language operating on tables somewhat similar to SQL. The term "record" referenced below means a row of fields taken from a table.

Queries support is currently limited to "select" polyad & "upsert" triad.

Select

In order to execute a query, one must first define a query expression, compile it into internal representation and apply "fetch" verb to get results. All three stages are done explicitly.

Overall syntax

Overall syntax for query stages looks like:

q: ?[ t; w; g; f; ot; os ]; // define query
cq: opt . q; // compile query
res:  fe#cq; // fetch query results

whеre

t is a table-like data source. It might be a table/query/join/union expression.

w is a nested list of conditions. It might be an empty list to define no conditions.

g is a grouping dict or 0b expression for none.

f is a field dict.

ot is an optional "take" argument. It might be either a scalar or a two-element integer vector. It defines a range of record indices to limit results.

os is an optional "sorting" argument which must be a nested list of fields to sort with direction.

opt is an optional "compilation options" argument which must be a dict with specific fields. It overrides default query compile options (e.g. optimization level) locally. It is implemented since 0.6.0 version.

Compilation options

"Compilation options" dict field list consists of several possible values:

  • `qO (optimization level) and must have 0..3 integer values only.
  • `qP (enable parallel execution) and must have 0b/1b values.
Opt level Meaning Usage recommendations
0 All optimizations are disabled For "debug" cases or for "perfect" hand-written queries which lack any possible improvement.
1 Quick & basic optimizations For generated queries & large hand-written which might have some room for improvement.
2 General optimizations For large generated queries over large datasets.
3 General & relatively slow optimizations For large generated queries over large datasets. Query should perform as fast as possible.

Compilation option for parallel execution is ignored when the platform is started using single thread only (without -c option).

Parallel execution is guaranteed to provide the same results as "single" thread execution provided no side-effects (namespace mutation, chunk order/size expectations, IO - reagents,files, etc) are used in user query lambdas.

For dummy example see below.

o) t:(+:)`a`b`c`d!(1 2 3;3 4 5;6 7 8;("111";"222";"333"));
o) 0N#(`qO`qP!(3;1b)) . ?[t;();0b;`a`b`c`d!`a`b`c`d]
a b c d
-----------
1 3 6 "111"
2 4 7 "222"
3 5 8 "333"

It is generally recommended to avoid overriding global optimization level besides extra-ordinary cases. It's better to use global / command-line option at start time.

"Compilation options" argument and qO,qP options are implemented since 0.6.0 version.

Compilation & fetching results

Let's have a look at a simple example first. It just creates a single table, a "q" query, compiles it into "cq" and fetches all table data into "res".

o)t:(+:)`a`b`c`d!(1 2 3;3 4 5;6 7 8;("111";"222";"333"));
o)q:?[ t; (); 0b; `a`b`c`d!`a`b`c`d ];
o)cq:. q;
o)res:0N#cq;
o)res
a b c d
-----------
1 3 6 "111"
2 4 7 "222"
3 5 8 "333"
o)

Note a space between "dot" monad and "q" variable: cq:. q

As you can see, the verb "fetch" is a dyad expecting either null to fetch all query results in one go, or an amount of records if positive number.

Fields/renamіng

Fields defined in fourth argument are a dictionary whеre keys are expected as a symbol vector. Keys define query field names. Dictionary values define fields as symbols or expressions from data source (first argument). To rename query field, just change dictionary keys as appropriate.

o)t:(+:)`a`b`c`d!(1 2 3;3 4 5;6 7 8;("111";"222";"333"));
o)q:?[ t; (); 0b; `ra`rb`rc`rd!`a`b`c`d ];
o)cq:. q;
o)res:0N#cq;
o)res
ra rb rc rd
--------------
1  3  6  "111"
2  4  7  "222"
3  5  8  "333"
o)

Table aliases

While field references are unique symbols (e.g. from a single source table), underlying query engine is fine with scalar symbols as fields. However, queries might become too complex if different tables have the same field names. To disambiguate between tables, table aliases are used.

Defining table alias is done via providing a two-element list as a data source instead of a single table. The first element is a symbol defining alias symbol, the second one is a table, join or another data source to bind.

To reference a field via alias, use monadic ~ with a two-element symbol vector:

o)t:(+:)`a`b`c`d!(1 2 3;3 4 5;6 7 8;("111";"222";"333"));
o)q:?[ (`alias;t); (); 0b; `ra`rb`rc`rd!((~`alias`a);(~`alias`b);(~`alias`c);(~`alias`d)) ];
o)cq:. q;
o)res:0N#cq;
o)res
ra rb rc rd
--------------
1  3  6  "111"
2  4  7  "222"
3  5  8  "333"
o)

Field expressions

Field definition in queries can also contain various expressions calculated in-flight.

Expressions are defined using functional/"lispy" syntax:

o)t:(+:)`a`b`c!(1 2 3;3 4 5;6 7 8);
o)0N#.(?[t;();0b;`d`e!((-:;`a);(+;`b;`c))])
d  e
-----
-1 9
-2 11
-3 13
o)

Virtual field "i"

To identify table rows, use virtual field `i. It's merely an automatically generated zero-based index.

o)t:(+:)`a`b`c!(1 2 3;3 4 5;6 7 8);
o)0N#.(?[t;();0b;`a`b`c`i!`a`b`c`i])
a b c i
-------
1 3 6 0
2 4 7 1
3 5 8 2
o)

Constants

Constants can be used in place of field, conditions, etc.

o)t:(+:)`a`b`c!(1 2 3;3 4 5;6 7 8);
o)0N#.(?[t;();0b;`a`b`c`const!(`a;`b;`c;10)])
a b c const
-----------
1 3 6 10
2 4 7 10
3 5 8 10
o)

To distinguish between symbol constants and fields, symbol constants must be enlisted.

o)t:(+:)`a`b`c!(1 2 3;3 4 5;6 7 8);
o)0N#.(?[t;();0b;`a`b`c`const!(`a;`b;`c;,`d)])
a b c const
-----------
1 3 6 d
2 4 7 d
3 5 8 d
o)

next >>> WHERE