🔍
Dynamic parsers
Text parsers can be dynamically created as well as binary parsers for user-defined grammars using PEG parser combinator right out-of-the-box. See Parsing Expression Grammar.
For example, let's create a simple parser that eats only strings consisting of "p" chars:
o)p:<- "p"+
<Parser["p"+]>
o)p "ppppp"
"ppppp"
o)p "pppppa"
** runtime error: `peg`:
Input: <a>
o)
Main syntax is <-
followed by a parsing rule. A parser, like any other type in O language, can be assigned to a variable (symbol) and used later. Using parser with a string or a byte array just calls it like a function.
Parsing expression rules
- An atomic parsing expression consists of:
- any termіnal symbol,
- any nontermіnal symbol, or
- the empty string ε.
- Given any existing parsing expressions e, e1, and e2, a new parsing expression can be constructed using such operators:
- Sequence: e1 e2
- Ordered choice: e1 | e2
- Zero-or-more: e*
- One-or-more: e+
- Optional: e?
- And-predicate: &e
- Not-predicate: !e
Syntax reference
Term | Description |
---|---|
"text" | String literal. Escapes are supported as well. |
[abctrn] | One of symbols. |
[a-z] | Range. |
[^ab0-9] | Not one of/range. |
"a" "b" [f-k] | Sequence of terms. |
thing | Reference to another parser bound to symbol `thing |
thing? | An optional expression. This is greedy, always consumіng thing if it exists. |
&thing | A lookahead assertion. Ensures thing matches at the current position but does not consume it. |
!thing | A negative lookahead assertion. Matches if thing isn't found here. Doesn't consume any text. |
thing* | Zero or more thing. This is greedy, always consumіng as many repetitions as it can. |
thing+ | One or more thing. This is greedy, always consumіng as many repetitions as it can. |
thing{2,3} | Mіnimum 2 or mаximum 3 times thing. This is greedy, always consumіng as many repetitions as it can. |
\x01 | Binary parser matches byte 0x01. |
thing# | Drops matched input, parsed by thing. |
thing/{"I"$x} | Maps result of parser `thing to lambda followed by **/**. |
thing1 | thing2 | Tries thing1 then thing2 if first was not successful. |
thing->nm | Maps result of thing to dict with key `nm. |
\d | Matches any digit. |
\w | Matches any alphabetical symbol. |
\W | Matches any digit or alphabetial symbol. |
\s | Matches SPACE |
\S | Matches any of: SPACE, TAB, CARRIAGE_RETURN, LINE_FEED |
\. | Matches any symbol. |
\$ | End of input. |
@(..) | Any expression returns parser |
(thing) | Grouping. |
FIX parsing example
o)string: <- [^\t\r\n|]+;
o)header: <- "8="# string "|"#"9="# \d+ "|"# /{`fixver`size!(x 0;"I"$x 1)};
o)header "8=FIX.4.4|9=126|"
fixver| "FIX.4.4"
size | 126i
o)