sql.o
Before using the SQL-like syntax in o, you need to ‘load’ it as it is unavailable out-of-the-box.
std/sql.o
script contains an injection for the O parser, so it must be loaded separately from the script that uses it.You can load on platform startup or with the
load "sql"
command in REPL. SELECT and QDEF
The SQL-like syntax allows both ‘query definitions’ and executes queries.
The familiar select
expressions execute immediately.
Query definitions use qdef
instead of select
. Otherwise, their syntax is the same.
Query definitions are needed for two reasons.
First, they allow us to introduce subqueries naturally.
a: qdef ... from ... ; // subquery, not executed
select ... from ... a
Second, they give us control over the laziness of the query execution. Given a query definition, we can ‘compile’ it and pull the desired number of records.
a:qdef ... from ...;
...
10#.a
The top-level diagram of the SQL-like syntax:
DISTINCT
select
or qdef
is followed by distinct
the query rеturns only unique records.If the query has a
by
clause it should not use distinct
keyword.LIMIT and ORDER
It is possible to limit the number of records rеturned by the query by adding a range-sort
clause right after select/qdef
. The grammar for the clause is as follows:
o)load "sql";
o)t: +`a`b`c!(!10;|!10;10+!10)
a b c
------
0 9 10
1 8 11
2 7 12
3 6 13
4 5 14
5 4 15
6 3 16
7 2 17
8 1 18
9 0 19
o)select [3] a, c from t
a c
----
0 10
1 11
2 12
o)select [3 3] a, c from t
a c
----
3 13
4 14
5 15
o)select [3 3;<b] a, c from t
a c
----
6 16
5 15
4 14
o)
Output fields
The comma-separated list of output fields follows the select
(or qdef
) keyword.
The list can be empty. An empty list of output fields means to rеturn all of them.
o)t:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16);
o)select from t
a b c
------
1 2 1
2 3 4
3 5 9
4 7 16
o)select a, b from t
a b
---
1 2
2 3
3 5
4 7
o)
A output field can be renamed by prefixing it with the new name and a colon.
o)select x:a,y:b from t
x y
---
1 2
2 3
3 5
4 7
o)
For calculated fields the parser automatically chooses unique names, such as f1
, f2
, etc.
o)select a+1,b*b from t
f1 f2
-----
2 4
3 9
4 25
5 49
o)select a,a,a from t
a a1 a2
-------
1 1 1
2 2 2
3 3 3
4 4 4
o)
Note that aliases cannot contain dots. kdb/q
has a similar constraint. It has a utility function to remove dots from field names.
A comma may follow the last output field just before the from
keyword as a convenience feature. This helps a little when you put each output field on a separate line as follows:
select OrigTicker,
LastDate:last date,
OtherCustomerRef, // <- comma is OK
from mytable
Field references and query parameters
The SQL-like syntax supports fully qualified field references. You can refer to a field using the standard SQL syntax — table1.field1
.
The SQL-like parser automatically captures the name of the variable containing data as a table alias.
select t.a from t
select t1.a + 100 from t1:longTableName
Any unqualified identifier, such as a
, is a field reference. It should be unambiguous and should not be found in multiple input tables.
An identifier prefixed with @
is a query parameter, a reference to a local (or global) variable. For example:
o)c:10;
o)t:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16);
o)select a, b from t where c<@c
a b
---
1 2
2 3
3 5
o)
The query may run for a long time, by changes in the variables referred to by query parameters will not affect the query.
Query expressions
The syntax does not support arbitrary o-expression in the calculated fields and elsewhere. Yet the range of allowed expressions is quite rich. It is expected to expand as people start to use the syntax.
The rationale for handling the expressions within the SQL-like parser (and not in the o-parser) is to:
- Allow natural field references, such as
table1.field1
; - Stop side effects, such as modification of a global variable;
- Stop unnecessary quoting/unquoting.
The current grammar of query expressions:
Dyads and monads allowed in the SQL-like expression must transparently work with vectors.
Thus the following verbs are allowed:
- Dyads:
+
-
*
%
mod
^
(fill)=
<>
>=
<=
>
<
like
in
~
|
or
&
and
shift
rotate
xexp
xlog
bor
band
bxor
':
'
each
/
\\
- Monads:
^
null
~
not
bnot
-
neg
%
(reciprocate)|
(reverse)$
(tostring)sin
cos
tan
asin
acos
atan
exp
log
sqrt
floor
ceil
round
trunc
frac
abs
o)t:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16)
a b c
------
1 2 1
2 3 4
3 5 9
4 7 16
o)select ab: a*b, c from t
ab c
-----
2 1
6 4
15 9
28 16
More complex expression can be introduced using a lambda imported into an SQL expression using a parameter. Inline lambdas are not supported.
median:{x[>x][#x%2]};
select @median[price] from orders
FROM
The from
clause is mandatory. The simplest form of the from
clause is from t
, where t
is the table's name. Instead of the table, you may use a subquery as follows.
o)t:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16);
o)select a,b from (qdef from t where c<10)
a b
---
1 2
2 3
3 5
o)
A subquery must be put in parentheses. You may use select
in the subquery just as well.
Similarly to the fields, an input table (or a subquery) can have an alias. As with field aliases, table alias cannot contain dots.
o)longTableName:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16)
o)select from t:longTableName
a b c
------
1 2 1
2 3 4
3 5 9
4 7 16
o)
WHERE
The where
clause contains a list of arbitrary expressions separated by ,
.
o)select a,b from t where c<10
a b
---
1 2
2 3
3 5
o)
GROUP
The by
clause contains the list of fields (or expressions) to group the records.
The complete grammar of the element of the group list:
Similarly to the output fields the grouping fields can be renamed.
o)t: +`a`b`c!(10#!3;|!10;10+!10)
a b c
------
0 9 10
1 8 11
2 7 12
0 6 13
1 5 14
2 4 15
0 3 16
1 2 17
2 1 18
0 0 19
o)select b, c from t by a
a b c
---------------------
0 9 6 3 0 10 13 16 19
1 8 5 2 11 14 17
2 7 4 1 12 15 18
o)select su:sum b, av: avg c from t by a
a su av
---------
0 18 14.5
1 15 14
2 12 15
o)
JOIN
Any number of tables can be combined using ij
(inner join), lj
(left join).
o)t1:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16);
o)t2:+:`a`f!(1 2 3;("one";"two";"three"));
o)select t2.f from t1 ij t2 on t1.a = t2.a
f
-------
"one"
"two"
"three"
o)
Multiple tables can be joined together, formіng a join chain. The tables can be joined on multiple fields.
Instead of on t1.a = t2.a and t1.b = t2.b and t1.c = t2.c
you may write on (a,b,c)
to the same effect.
o)select t2.f from t1 ij t2 on (a)
f
-------
"one"
"two"
"three"
o)
Note that the list of fields must be put in parentheses. When you join more than two tables, the shortcut notation assumes that the join uses the leftmost table.
In the following example — select ... from t1 ij t2 on (a) ij t3 on (b) ...
— the second join is a join between table t1
and t3
on the field b
.
UNION
The input tables can be combined using a ua
(union all) operation.
select ... from t1 ua t2
A ua
(union all) operation has higher precedence than a join. In the example below, t2 ua t3
is performed first, followed by a join with between t1
and the result of the union.
select ... from t1 ij a:t2 ua t3 on (id)
Note that an alias before the ua
expression cannot be omitted. In general, both sides of any join operation must have aliases. In the case of a table referred to by its name, the alias is implicit.
The complete grammar of the join chain:
Arbitrary parentheses are not allowed in the from
clause. The parentheses can only be used with a subquery or a shortcut field lists of a join.