Partitioned tables
A partitioned table is a special way of storing a table in which the records are split into groups. The splitting criterion is the repeated values of one or several fields; in other words, the table is grouped by those fields. This format naturally suits time-series data.
The fields used for grouping are called partitioning fields. A typical example is partitioning by date, month, or year.
A partitioned table consists of a set of partitioned vectors. Like an ordinary table, it has keys (field names, a symbol vector) and values (field vectors).
Partitioned tables will be called ptables for brevity.
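The grouping idea can be illustrated outside the language itself. The following is a minimal Python sketch, not the actual ptable implementation: the function name partition_by and the row layout are hypothetical and serve only to show how records sharing the same partitioning-field values end up in the same partition.

```python
from collections import defaultdict

def partition_by(rows, fields):
    """Group row dicts by the values of the given partitioning fields."""
    parts = defaultdict(list)
    for row in rows:
        # The partition key is the tuple of partitioning-field values.
        key = tuple(row[f] for f in fields)
        # Partition data keeps only the non-partitioning fields.
        rest = {k: v for k, v in row.items() if k not in fields}
        parts[key].append(rest)
    return dict(parts)

rows = [
    {"date": "2022.10.10", "id": 1},
    {"date": "2022.10.10", "id": 2},
    {"date": "2022.10.11", "id": 10},
]
parts = partition_by(rows, ["date"])
```

Each key of parts corresponds to one partition, and the partitioning-field values are stored once per partition rather than once per record.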
Ptable support
Almost all common verbs made for tables should operate on partitioned tables transparently.
Some exceptions exist though:
- All mutating verbs: amend, dmend, upsert, etc. Cast back to dict first and mutate as usual.
- Neither scalar nor composite indices are supported. This has the same implications for verbs like ? ("find"), queries, etc.
- The ~ ("match") verb is not implemented for ptables.
- Queries have their own set of limitations for ptables. See the corresponding section.
Ptable creation
Creating ptables consists of several stages:
- Creating a non-partitioned table collecting all partitioning fields. Let's call it gf.
- Creating the partition information table. This table keeps all persistent information on each partition. It's info in our example.
- Creating the mount-related table. This table keeps current partition data. The mnt field will keep it.
- Optional domain symbol in the sym field.
- Collecting all of the above data under a single dict (a dict with meta-information) and casting it to the ptable type.
A small example follows:
o) gf:+`a`b!(1 2;10 20);
o) info:+`size`segId`refPath!(2 2; 0 0; (0N0;0N0));
o) p1: +`c`d!(100 100;200 200);
o) p2: +`c`d!(1000 1000;2000 2000);
o) mnt:+`mntval!(p1;p2);
o) mt: `gf`info`mnt!(gf;info;mnt);
o) pt: `ptable$mt;
o) pt
a b c d
--------------
1 10 100 200
1 10 100 200
2 20 1000 2000
2 20 1000 2000
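To see why the output above has four rows, it may help to model the expansion by hand. The Python sketch below is only an illustration under stated assumptions: tables are modeled as dicts of columns, None stands in for 0N0, and the function name flatten is hypothetical. It repeats each row of gf once per record of the matching partition and joins it with that partition's columns.

```python
def flatten(meta):
    """Expand a meta-dict model into flat rows, one per partition record."""
    gf, info, mnt = meta["gf"], meta["info"], meta["mnt"]
    rows = []
    for i, size in enumerate(info["size"]):
        part = mnt["mntval"][i]  # column dict of the i-th partition
        for r in range(size):
            # Partitioning-field values are shared by the whole partition.
            row = {name: col[i] for name, col in gf.items()}
            # Remaining fields come from the partition's own columns.
            row.update({name: col[r] for name, col in part.items()})
            rows.append(row)
    return rows

meta = {
    "gf": {"a": [1, 2], "b": [10, 20]},
    "info": {"size": [2, 2], "segId": [0, 0], "refPath": [None, None]},
    "mnt": {"mntval": [
        {"c": [100, 100], "d": [200, 200]},
        {"c": [1000, 1000], "d": [2000, 2000]},
    ]},
}
rows = flatten(meta)
```

With the same values as in the example, rows holds two records for the partition (a=1, b=10) followed by two records for the partition (a=2, b=20).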
Reverse conversion back to dict is also supported via casting at any time.
o) gf:+`a`b!(1 2;10 20);
o) info:+`size`segId`refPath!(2 2; 0 0; (0N0;0N0));
o) p1: +`c`d!(100 100;200 200);
o) p2: +`c`d!(1000 1000;2000 2000);
o) mnt:+`mntval!(p1;p2);
o) mt: `gf`info`mnt!(gf;info;mnt);
o) pt: `ptable$mt;
o) d:`dict$pt;
o) d
gf | +`a`b!(1 2;10 20)
info| +`size`segId`refPath!(2 2;0 0;(0N0;0N0))
mnt | +,`mntval!((+`c`d!(100 100;200 200);+`c`d!(1000 1000;2000 2000)))
Now let's go over the information required in the meta-dict in detail.
Meta-dict fields | | | | Description / comments
---|---|---|---|---
`gf | `info | `mnt | `sym | Top-level meta-dict fields
All partitioning fields (table) | All persistent info (table) | Mounting info (table) | Domain symbol |
`gf1 `gf2 `gf3 ... | `size `segId `refPath | `mntval | | Meta-dict table fields
All partitioning fields table | `size - vector of longs keeping partition size; `segId - segment id, vector of longs (reserved); `refPath - list/vector of reference paths for each partition | Mounted partition data | |
Possible combinations for partition info | | | |
val1 val2 val3 | `size - (long), `segId - (long), `refPath - 0N0 | Immediate partition table value | | Partition table is in RAM. Saved on unmount.
val1 val2 val3 | `size - (long), `segId - (long), `refPath - 0N0 | 0N0 | | Special case for partition table. Only partitioning fields exist.
val1 val2 val3 | `size - (long), `segId - (long), `refPath - file symbol (without slash at the end) | Immediate partition table value | | Partition is loaded entirely on mount. Saved on unmount in a single file.
val1 val2 val3 | `size - (long), `segId - (long), `refPath - file symbol (with slash at the end) | Projected table value | | Partition is in splayed table on disk.
val1 val2 val3 | `size - (long), `segId - (long), `refPath - file or namespace symbol or namespace path list | Same value as in `refPath | | Lazily mounted partition.
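The combinations in the table can be read as a small decision procedure. The Python sketch below is a hypothetical model and not part of the library: None stands in for 0N0, paths are plain strings, immediate or projected table values are modeled as dicts, and the function name and returned labels are invented for illustration.

```python
def partition_kind(ref_path, mntval):
    """Classify one partition by its refPath and mntval entries."""
    if ref_path is None:
        # No reference path: either nothing but partitioning fields,
        # or a table that lives in RAM only.
        return "fields-only" if mntval is None else "in-RAM"
    if mntval == ref_path:
        # mntval repeats the reference: data is not loaded yet.
        return "lazy"
    if ref_path.endswith("/"):
        # Trailing slash marks a splayed table directory.
        return "splayed"
    # Plain file path: the partition is loaded entirely on mount.
    return "single-file"

table = {"c": [100, 100], "d": [200, 200]}
kinds = [
    partition_kind(None, table),
    partition_kind(None, None),
    partition_kind(":/tmp/pt/p1", table),
    partition_kind(":/tmp/pt/p1/", table),
    partition_kind(":/tmp/pt/p1/", ":/tmp/pt/p1/"),
]
```

The order of the checks matters in this model: the lazy case is tested before the trailing slash, because a lazily mounted partition may also reference a splayed directory.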
When refPath has a value other than 0N0, it's recommended to keep a non-zero value in the corresponding size field. Otherwise, converting the meta-dict to a ptable triggers a possibly expensive operation of loading the partition and determining its size. In other words, even though lazy partitions are supported, they require their size to be known in advance.
Ptable related functions
All ptable-related functions reside in the standard library std/core.o.
Make sure it's loaded before using them.
Example
t:+`created`id!(2022.10.10D10:00:01.0 2022.10.10D10:10:01.0 2022.10.10D10:10:05.0;1 2 3);
f:`:/tmp/pt/pt221010/; f set t;
t:+`created`id!(2022.10.11D10:00:01.0 2022.10.11D10:10:05.0;10 20);
f:`:/tmp/pt/pt221011/; f set t;
t:+`created`id!(2022.10.12D10:00:07.0 2022.10.12D10:10:07.0 2022.10.12D10:10:10.0;100+!3);
f:`:/tmp/pt/pt221012/; f set t;
t:+`created`id!(2022.10.13D10:00:01.0 2022.10.13D10:10:05.0;1000 2000);
f:`:/tmp/pt/pt221013/; f set t;
load "core";
ptdir:":/tmp/pt/";
dates: 2022.10.10 2022.10.11 2022.10.12;
names: { fmt["pt%%%";(`year`mm`dd$x) mod 100i] }'dates;
paths: { `$format[":%/";x] } 'names;
size: {#get[0b;`$format[ptdir,"%/";x]]}'names; // can be optimized further to "get" specific field length...
gf: +`crDate!dates;
info: +`size`segId`refPath!(size;(# dates)#0; paths);
mnt: +`mntval!paths;
pd: `gf`info`mnt`root!(gf;info;mnt;`$ptdir);
pt: .o.pnew[0N0; pd];
.o.pset[`$ptdir; pt];
pt:.o.pget[`$ptdir];
pt:.o.pmnt[pt; 0N0]; // simulate mounting all partitions
idx: dates?2022.10.11;
pt:.o.pumnt[pt; idx]; // umount specific partition before changing
pd:`dict$mv pt;
0N!pd;
// append new record in idx partition
.[`pd; (`mnt;`mntval;idx); { ptab: `$ptdir,`v`char$1_`int$$x; .[ptab; (); ,; (2022.10.11D10:10:30.000000000; 30)]; x }];
// correct "size" value
.[`pd; (`info;`size;idx); +; 1];
pt:`ptable$pd;
show pt;
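The last steps of the example (append a record to one partition, then correct its size) follow a simple bookkeeping rule: the size entry must always equal the record count of the partition. The Python sketch below models that rule only. It is not the library API, it keeps partition data in memory instead of on disk, and the names append_record and sizes_consistent are hypothetical.

```python
def append_record(meta, idx, record):
    """Append a record to partition idx and keep info.size in step."""
    part = meta["mnt"]["mntval"][idx]
    for name, value in record.items():
        part[name].append(value)
    # Mirrors the "correct size value" step of the example.
    meta["info"]["size"][idx] += 1
    return meta

def sizes_consistent(meta):
    """Check that every size entry matches the partition's record count."""
    for size, part in zip(meta["info"]["size"], meta["mnt"]["mntval"]):
        if any(len(col) != size for col in part.values()):
            return False
    return True

# Only the id column is modeled; values follow the example above.
meta = {
    "gf": {"crDate": ["2022.10.10", "2022.10.11", "2022.10.12"]},
    "info": {"size": [3, 2, 3]},
    "mnt": {"mntval": [
        {"id": [1, 2, 3]},
        {"id": [10, 20]},
        {"id": [100, 101, 102]},
    ]},
}
idx = meta["gf"]["crDate"].index("2022.10.11")
append_record(meta, idx, {"id": 30})
ok = sizes_consistent(meta)
```

Skipping the size update would leave the meta-dict describing two records for a partition that now holds three, which is the mismatch the size correction in the example avoids.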