
Partitioned tables

A partitioned table is a special way of storing a table by splitting its records into groups. Records are grouped by repeated values of one or several fields; in other words, the table is grouped by those fields. This format is a natural fit for time-series data.

The fields used to group records into partitions are called partitioning fields. Typical examples are dates, months or years.

A partitioned table consists of a set of partitioned vectors. Like an ordinary table, it has keys (the field names, a symbol vector) and values (the field vectors).

Partitioned tables will be called ptables for brevity.
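The grouping idea can be illustrated outside O. The following is a minimal Python model, not O code: rows are plain dicts, and `partition` is a hypothetical helper that groups records by the value of one partitioning field while keeping the original order.

```python
from collections import defaultdict


def partition(records, key):
    """Group row dicts by the value of the partitioning field `key`.

    Returns {key_value: [rows without the key field]}, preserving input order.
    """
    groups = defaultdict(list)
    for row in records:
        rest = {name: value for name, value in row.items() if name != key}
        groups[row[key]].append(rest)
    return dict(groups)


rows = [
    {"date": "2022.10.10", "id": 1},
    {"date": "2022.10.10", "id": 2},
    {"date": "2022.10.11", "id": 10},
]
parts = partition(rows, "date")
print(parts)
# {'2022.10.10': [{'id': 1}, {'id': 2}], '2022.10.11': [{'id': 10}]}
```

Each distinct date becomes one partition, which is why time-series data maps onto this layout so directly.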

Ptable support

Almost all common verbs that work on tables should also operate transparently on partitioned tables.

There are some exceptions, though:

  • All mutating verbs (amends, dmends, upserts, etc.). Cast the ptable back to a dict first and mutate it as usual.
  • Neither scalar nor composite indices are supported. This has the same implications for verbs such as ? ("find"), for queries, etc.
  • The ~ ("match") verb is not implemented for ptables.
  • Queries have their own set of limitations for ptables. See the corresponding section.

Ptable creation

Creating a ptable consists of several steps:

  • Create a non-partitioned table holding all partitioning fields. In the example below it is called gf.
  • Create the partition information table, which keeps all persistent information about each partition (info in the example).
  • Create the mount-related table, which keeps the current partition data. It is stored in the mnt field.
  • Optionally, provide a domain symbol in the sym field.
  • Collect all of the above in a single dict (the meta-dict) and cast it to the ptable type.

Zero-sized partitioned tables are not supported: at least one partition must exist.
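The steps above can be modeled in Python to make the invariants explicit. This is a sketch under stated assumptions, not the O implementation: tables are column-oriented dicts of equal-length lists, `None` stands in for 0N0, and `nrows`/`make_meta` are hypothetical helpers. It checks that gf, info and mnt have one row per partition, that the required fields are present, and that at least one partition exists.

```python
def nrows(table):
    """Row count of a column-oriented table (dict of equal-length lists)."""
    if not table:
        raise ValueError("table has no columns")
    lengths = {len(col) for col in table.values()}
    if len(lengths) != 1:
        raise ValueError("table columns differ in length")
    return lengths.pop()


def make_meta(gf, info, mnt, sym=None):
    """Assemble a ptable-style meta-dict, checking the invariants described above."""
    n = nrows(gf)
    if n == 0:
        raise ValueError("zero-sized ptables are not supported")
    missing = {"size", "segId", "refPath"} - set(info)
    if missing or "mntval" not in mnt:
        raise ValueError("info needs size/segId/refPath and mnt needs mntval")
    if nrows(info) != n or nrows(mnt) != n:
        raise ValueError("gf, info and mnt need one row per partition")
    meta = {"gf": gf, "info": info, "mnt": mnt}
    if sym is not None:
        meta["sym"] = sym  # optional domain symbol
    return meta


p1 = {"c": [100, 100], "d": [200, 200]}
p2 = {"c": [1000, 1000], "d": [2000, 2000]}
meta = make_meta(
    gf={"a": [1, 2], "b": [10, 20]},
    info={"size": [2, 2], "segId": [0, 0], "refPath": [None, None]},  # None models 0N0
    mnt={"mntval": [p1, p2]},
)
print(sorted(meta))  # ['gf', 'info', 'mnt']
```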

A small example:

o) gf:+`a`b!(1 2;10 20);
o) info:+`size`segId`refPath!(2 2; 0 0; (0N0;0N0));
o) p1: +`c`d!(100 100;200 200);
o) p2: +`c`d!(1000 1000;2000 2000);
o) mnt:+`mntval!(p1;p2);
o) mt: `gf`info`mnt!(gf;info;mnt);
o) pt: `ptable$mt;
o) pt
a b  c    d   
--------------
1 10 100  200 
1 10 100  200 
2 20 1000 2000
2 20 1000 2000
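Logically, each row of gf is repeated size times next to the rows of its partition, which is why 1 10 appears twice in the output above. The following Python model (plain dicts standing in for O tables; `expand` is a hypothetical helper, not part of the O runtime) reproduces that view from the same meta-dict contents.

```python
def expand(meta):
    """Rows a ptable presents: each gf row repeated `size` times beside its partition rows."""
    gf, sizes, parts = meta["gf"], meta["info"]["size"], meta["mnt"]["mntval"]
    rows = []
    for i, size in enumerate(sizes):
        for r in range(size):
            row = {name: col[i] for name, col in gf.items()}  # partitioning values
            row.update({name: col[r] for name, col in parts[i].items()})  # partition data
            rows.append(row)
    return rows


meta = {
    "gf": {"a": [1, 2], "b": [10, 20]},
    "info": {"size": [2, 2], "segId": [0, 0], "refPath": [None, None]},
    "mnt": {"mntval": [{"c": [100, 100], "d": [200, 200]},
                       {"c": [1000, 1000], "d": [2000, 2000]}]},
}
for row in expand(meta):
    print(row["a"], row["b"], row["c"], row["d"])
# 1 10 100 200
# 1 10 100 200
# 2 20 1000 2000
# 2 20 1000 2000
```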

A ptable can be converted back to a dict at any time by casting:

o) gf:+`a`b!(1 2;10 20);
o) info:+`size`segId`refPath!(2 2; 0 0; (0N0;0N0));
o) p1: +`c`d!(100 100;200 200);
o) p2: +`c`d!(1000 1000;2000 2000);
o) mnt:+`mntval!(p1;p2);
o) mt: `gf`info`mnt!(gf;info;mnt);
o) pt: `ptable$mt;
o) d:`dict$pt;
o) d
gf  | +`a`b!(1 2;10 20)
info| +`size`segId`refPath!(2 2;0 0;(0N0;0N0))
mnt | +,`mntval!((+`c`d!(100 100;200 200);+`c`d!(1000 1000;2000 2000)))

The following tables describe the information required in the meta-dict in detail.

Top-level meta-dict fields and the fields of their tables:

  Field   Type     Contents
  `gf     table    All partitioning fields: `gf1 `gf2 `gf3 ...
  `info   table    All persistent partition info:
                     `size    - vector of longs, the size of each partition
                     `segId   - vector of longs, segment id (reserved)
                     `refPath - list/vector of reference paths, one per partition
  `mnt    table    Mounting info: `mntval - the mounted partition data
  `sym    symbol   Domain symbol (optional)

Possible combinations of partition info. In every row, gf holds the partitioning values (val1 val2 val3 ...), and `size and `segId are longs:

  `refPath                                          `mntval                           Meaning
  0N0                                               immediate partition table value   Partition table is in RAM. Saved on unmount.
  0N0                                               0N0                               Special case: only partitioning fields exist.
  file symbol (no slash at the end)                 immediate partition table value   Partition is loaded entirely on mount. Saved on unmount in a single file.
  file symbol (with a slash at the end)             projected table value             Partition is a splayed table on disk.
  file/namespace symbol or namespace path list      same value as in `refPath          Lazily mounted partition.
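To make the combinations table easier to apply, here is a Python sketch that classifies one partition by its refPath and mntval. It is a model under assumptions, not an O function: `None` stands in for 0N0, file symbols are modeled as strings starting with ":" (like `:/tmp/pt/` in the example below), and the returned labels and the name `mount_mode` are hypothetical. The lazy check comes first because a lazy partition's mntval repeats its refPath, which may itself end with a slash.

```python
def mount_mode(ref_path, mntval):
    """Classify a partition per the refPath/mntval combinations table (model only)."""
    if ref_path is None:  # refPath is 0N0
        return "fields-only" if mntval is None else "in-memory"
    if mntval == ref_path:  # same value as in refPath
        return "lazy"
    if isinstance(ref_path, str) and ref_path.startswith(":"):  # file symbol
        return "splayed" if ref_path.endswith("/") else "single-file"
    raise ValueError("unsupported refPath/mntval combination")


table = {"c": [100, 100], "d": [200, 200]}
print(mount_mode(None, table))              # in-memory
print(mount_mode(None, None))               # fields-only
print(mount_mode(":/tmp/pt/p1", table))     # single-file
print(mount_mode(":/tmp/pt/p1/", table))    # splayed
print(mount_mode(":pt221010/", ":pt221010/"))  # lazy
```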

When refPath has a value other than 0N0, it is recommended to keep a non-zero value in the corresponding size field. Otherwise, converting the meta-dict to a ptable triggers a possibly expensive operation: loading the partition to determine its size. In other words, even though lazy partitions are supported, their sizes should be known in advance.
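The cost described above can be sketched in Python. This models the behavior as stated, not the O internals: `total_rows` sums `info["size"]` and only calls a loader for partitions whose refPath is set but whose size is 0; `load_from_disk` is a hypothetical stand-in for reading a partition and records which partitions it was asked to load.

```python
def total_rows(info, load_partition):
    """Total ptable length computed from info["size"].

    A partition with a non-0N0 refPath but size 0 forces a call to
    load_partition(i), modeling the expensive load-to-measure step.
    """
    total = 0
    for i, size in enumerate(info["size"]):
        if size == 0 and info["refPath"][i] is not None:
            part = load_partition(i)  # expensive: reads the partition just to count rows
            size = len(next(iter(part.values())))
        total += size
    return total


loaded = []


def load_from_disk(i):
    """Hypothetical loader standing in for reading partition i from disk."""
    loaded.append(i)
    return {"created": ["t1", "t2", "t3"], "id": [100, 101, 102]}


info = {"size": [3, 0, 2], "refPath": [":pt221010/", ":pt221011/", ":pt221012/"]}
print(total_rows(info, load_from_disk), loaded)  # 8 [1]
```

Only the partition with an unknown size is read; keeping sizes filled in avoids that read entirely.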

Ptable related functions

All ptable-related functions reside in the standard library std/core.o.

Make sure it's loaded before using them.

Example

// write four daily partitions to disk as splayed tables (the path ends with a slash)
t:+`created`id!(2022.10.10D10:00:01.0 2022.10.10D10:10:01.0 2022.10.10D10:10:05.0;1 2 3);
f:`:/tmp/pt/pt221010/; f set t;
t:+`created`id!(2022.10.11D10:00:01.0 2022.10.11D10:10:05.0;10 20);
f:`:/tmp/pt/pt221011/; f set t;
t:+`created`id!(2022.10.12D10:00:07.0 2022.10.12D10:10:07.0 2022.10.12D10:10:10.0;100+!3);
f:`:/tmp/pt/pt221012/; f set t;
t:+`created`id!(2022.10.13D10:00:01.0 2022.10.13D10:10:05.0;1000 2000);
f:`:/tmp/pt/pt221013/; f set t;

load "core";

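ptdir:":/tmp/pt/";
dates: 2022.10.10 2022.10.11 2022.10.12;
// partition directory names: "pt221010" "pt221011" "pt221012"
names: { fmt["pt%%%";(`year`mm`dd$x) mod 100i] }'dates;
// reference paths with a slash at the end, i.e. splayed partitions
paths: { `$format[":%/";x] } 'names;
size: {#get[0b;`$format[ptdir,"%/";x]]}'names; // could be optimized to "get" only one field's length

gf: +`crDate!dates;
info: +`size`segId`refPath!(size;(# dates)#0; paths);
// mntval equals refPath, so the partitions are mounted lazily
mnt: +`mntval!paths;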
pd: `gf`info`mnt`root!(gf;info;mnt;`$ptdir);
pt: .o.pnew[0N0; pd];
.o.pset[`$ptdir; pt];
pt:.o.pget[`$ptdir];

pt:.o.pmnt[pt; 0N0];            // simulate mounting all partitions

idx: dates?2022.10.11;
pt:.o.pumnt[pt; idx];           // unmount the specific partition before changing it

// ptables cannot be mutated directly: cast back to a dict first
pd:`dict$mv pt;
0N!pd;

// append a new record to partition idx
.[`pd; (`mnt;`mntval;idx); { ptab: `$ptdir,`v`char$1_`int$$x; .[ptab; (); ,; (2022.10.11D10:10:30.000000000; 30)]; x }];
// increment the partition's "size" to match its new row count
.[`pd; (`info;`size;idx); +; 1];

pt:`ptable$pd;
show pt;
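The last two steps of the example (append a record, then increment size) keep mntval and info.size consistent. Below is a Python model of that invariant, assuming in-memory partitions stored as dicts of lists rather than the splayed files used above; `append_row` and `check_sizes` are hypothetical helpers, and the timestamps are shortened to strings for readability.

```python
def check_sizes(meta):
    """Raise if any info["size"] entry disagrees with its partition's row count."""
    for i, part in enumerate(meta["mnt"]["mntval"]):
        rows = len(next(iter(part.values())))
        if rows != meta["info"]["size"][i]:
            raise ValueError(f"partition {i}: size {meta['info']['size'][i]}, rows {rows}")


def append_row(meta, idx, row):
    """Append one record to partition idx, then bump its size (as in the O example)."""
    part = meta["mnt"]["mntval"][idx]
    for name, value in row.items():
        part[name].append(value)
    meta["info"]["size"][idx] += 1
    check_sizes(meta)


meta = {
    "gf": {"crDate": ["2022.10.10", "2022.10.11", "2022.10.12"]},
    "info": {"size": [3, 2, 3], "segId": [0, 0, 0], "refPath": [None, None, None]},
    "mnt": {"mntval": [
        {"created": ["10:00:01", "10:10:01", "10:10:05"], "id": [1, 2, 3]},
        {"created": ["10:00:01", "10:10:05"], "id": [10, 20]},
        {"created": ["10:00:07", "10:10:07", "10:10:10"], "id": [100, 101, 102]},
    ]},
}
idx = meta["gf"]["crDate"].index("2022.10.11")
append_row(meta, idx, {"created": "10:10:30", "id": 30})
print(meta["info"]["size"], meta["mnt"]["mntval"][idx]["id"])  # [3, 3, 3] [10, 20, 30]
```

Skipping the size update would leave the partition with one more row than its recorded size, which `check_sizes` reports as an error.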