🔍

sql.o

Before using the SQL-like syntax in o, you need to ‘load’ it as it is unavailable out-of-the-box.

The std/sql.o script contains an injection for the O parser, so it must be loaded separately from the script that uses it.
You can load on platform startup or with the load "sql" command in REPL.

SELECT and QDEF

The SQL-like syntax allows both ‘query definitions’ and executes queries.
The familiar select expressions execute immediately.
Query definitions use qdef instead of select. Otherwise, their syntax is the same.

Query definitions are needed for two reasons.
First, they allow us to introduce subqueries naturally.

    a: qdef ... from ... ;   //  subquery, not executed
    select ... from ... a

Second, they give us control over the laziness of the query execution. Given a query definition, we can ‘compile’ it and pull the desired number of records.

    a:qdef ... from ...;
    ...
    10#.a

The top-level diagram of the SQL-like syntax:

DISTINCT

If select or qdef is followed by distinct the query rеturns only unique records.

If the query has a by clause it should not use distinct keyword.

LIMIT and ORDER

It is possible to limit the number of records rеturned by the query by adding a range-sort clause right after select/qdef. The grammar for the clause is as follows:

The range-sort clause corresponds to the 5th and 6th arguments of the functional query.

o)load "sql";
o)t: +`a`b`c!(!10;|!10;10+!10)
a b c
------
0 9 10
1 8 11
2 7 12
3 6 13
4 5 14
5 4 15
6 3 16
7 2 17
8 1 18
9 0 19
o)select [3] a, c from t
a c
----
0 10
1 11
2 12
o)select [3 3] a, c from t
a c
----
3 13
4 14
5 15
o)select [3 3;<b] a, c from t
a c
----
6 16
5 15
4 14
o)

Output fields

The comma-separated list of output fields follows the select (or qdef) keyword.
The list can be empty. An empty list of output fields means to rеturn all of them.

o)t:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16);
o)select from t
a b c
------
1 2 1
2 3 4
3 5 9
4 7 16
o)select a, b from t
a b
---
1 2
2 3
3 5
4 7
o)

A output field can be renamed by prefixing it with the new name and a colon.

o)select x:a,y:b from t
x y
---
1 2
2 3
3 5
4 7
o)

For calculated fields the parser automatically chooses unique names, such as f1, f2, etc.

o)select a+1,b*b from t
f1 f2
-----
2  4 
3  9 
4  25
5  49
o)select a,a,a from t
a a1 a2
-------
1 1  1 
2 2  2 
3 3  3 
4 4  4
o)

Note that aliases cannot contain dots. kdb/q has a similar constraint. It has a utility function to remove dots from field names.

A comma may follow the last output field just before the from keyword as a convenience feature. This helps a little when you put each output field on a separate line as follows:

select OrigTicker,
       LastDate:last date,
       OtherCustomerRef, // <- comma is OK
from mytable

Field references and query parameters

The SQL-like syntax supports fully qualified field references. You can refer to a field using the standard SQL syntax — table1.field1.
The SQL-like parser automatically captures the name of the variable containing data as a table alias.

select t.a from t

select t1.a + 100 from t1:longTableName

Any unqualified identifier, such as a, is a field reference. It should be unambiguous and should not be found in multiple input tables.

An identifier prefixed with @ is a query parameter, a reference to a local (or global) variable. For example:

o)c:10;
o)t:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16);
o)select a, b from t where c<@c
a b
---
1 2
2 3
3 5
o)

The values of query parameters are ‘captured’ upon query instantiation.
The query may run for a long time, by changes in the variables referred to by query parameters will not affect the query.

Query expressions

The syntax does not support arbitrary o-expression in the calculated fields and elsewhere. Yet the range of allowed expressions is quite rich. It is expected to expand as people start to use the syntax.

The rationale for handling the expressions within the SQL-like parser (and not in the o-parser) is to:

  • Allow natural field references, such as table1.field1;
  • Stop side effects, such as modification of a global variable;
  • Stop unnecessary quoting/unquoting.

The current grammar of query expressions:

Dyads and monads allowed in the SQL-like expression must transparently work with vectors.
Thus the following verbs are allowed:

  • Dyads:
    • + - * % mod
    • ^ (fill)
    • = <> >= <= > < like in
    • ~
    • | or & and
    • shift rotate
    • xexp xlog
    • bor band bxor
    • ': ' each / \\
  • Monads:
    • ^ null
    • ~ not bnot
    • - neg
    • % (reciprocate)
    • | (reverse)
    • $ (tostring)
    • sin cos tan
    • asin acos atan
    • exp log sqrt
    • floor ceil round
    • trunc frac
    • abs

o)t:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16)
a b c
------
1 2 1
2 3 4
3 5 9
4 7 16
o)select ab: a*b, c from t
ab c
-----
2  1
6  4
15 9
28 16

More complex expression can be introduced using a lambda imported into an SQL expression using a parameter. Inline lambdas are not supported.

median:{x[>x][#x%2]};
select @median[price] from orders

FROM

The from clause is mandatory. The simplest form of the from clause is from t, where t is the table's name. Instead of the table, you may use a subquery as follows.

o)t:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16);
o)select a,b from (qdef from t where c<10)
a b
---
1 2
2 3
3 5
o)

A subquery must be put in parentheses. You may use select in the subquery just as well.

Similarly to the fields, an input table (or a subquery) can have an alias. As with field aliases, table alias cannot contain dots.

o)longTableName:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16)
o)select from t:longTableName
a b c 
------
1 2 1 
2 3 4 
3 5 9 
4 7 16
o)

WHERE

The where clause contains a list of arbitrary expressions separated by ,.

o)select a,b from t where c<10
a b
---
1 2
2 3
3 5
o)

GROUP

The by clause contains the list of fields (or expressions) to group the records.

The complete grammar of the element of the group list:

Similarly to the output fields the grouping fields can be renamed.

o)t: +`a`b`c!(10#!3;|!10;10+!10)
a b c
------
0 9 10
1 8 11
2 7 12
0 6 13
1 5 14
2 4 15
0 3 16
1 2 17
2 1 18
0 0 19
o)select b, c from t by a
a b       c
---------------------
0 9 6 3 0 10 13 16 19
1 8 5 2   11 14 17
2 7 4 1   12 15 18
o)select su:sum b, av: avg c from t by a
a su av
---------
0 18 14.5
1 15 14
2 12 15
o)

JOIN

Any number of tables can be combined using ij (inner join), lj (left join).

o)t1:+:`a`b`c!(1 2 3 4;2 3 5 7;1 4 9 16);
o)t2:+:`a`f!(1 2 3;("one";"two";"three"));
o)select t2.f from t1 ij t2 on t1.a = t2.a
f
-------
"one"  
"two"  
"three"
o)

Multiple tables can be joined together, formіng a join chain. The tables can be joined on multiple fields.

Instead of on t1.a = t2.a and t1.b = t2.b and t1.c = t2.c you may write on (a,b,c) to the same effect.

o)select t2.f from t1 ij t2 on (a)
f
-------
"one"  
"two"  
"three"
o)

Note that the list of fields must be put in parentheses. When you join more than two tables, the shortcut notation assumes that the join uses the leftmost table.
In the following example — select ... from t1 ij t2 on (a) ij t3 on (b) ... — the second join is a join between table t1 and t3 on the field b.

UNION

The input tables can be combined using a ua (union all) operation.

select ... from t1 ua t2

A ua (union all) operation has higher precedence than a join. In the example below, t2 ua t3 is performed first, followed by a join with between t1 and the result of the union.

select ... from t1 ij a:t2 ua t3 on (id)

Note that an alias before the ua expression cannot be omitted. In general, both sides of any join operation must have aliases. In the case of a table referred to by its name, the alias is implicit.

The complete grammar of the join chain:

Arbitrary parentheses are not allowed in the from clause. The parentheses can only be used with a subquery or a shortcut field lists of a join.

Resume: complete SQL-like syntax grammar