synthax

Synthax is a simple parser synthesizer for Crystal.

# JSON grammar

ws = some(' ' | '\r' | '\n' | '\t')
digit = '0'..'9'
digits = many(digit)
integer = maybe('-') & ('0' | ('1'..'9') & some(digit))
fraction = '.' & digits
exponent = ('E' | 'e') & maybe('+' | '-') & digits
number = keep(integer & maybe(fraction) & maybe(exponent), "number:value")
hex = digit | ('A'..'F') | ('a'..'f')
escape = '"' | '\\' | '/' | 'b' | 'f' | 'n' | 'r' | 't' | ('u' & hex & hex & hex & hex)
character = ((0x0020..0x10FFFF) - '"' - '\\') | ('\\' & escape)
string = '"' & keep(some(character), "string:value") & '"'
value = ahead
element = ws & value & ws
elements = sep(element, by: ',')
array = '[' & (elements | ws) & ']'
member = capture(ws & string & ':' & element, "pair")
members = sep(member, by: ',')
object = '{' & (members | ws) & '}'
value.put \
  capture(object) |
  capture(array) |
  capture(string) |
  capture(number) |
  lit("true") |
  lit("false") |
  lit("null")

json = element

Installation

  1. Add the dependency to your shard.yml:

    dependencies:
      synthax:
        github: homonoidian/synthax
  2. Run shards install

Usage

capture and keep

A Tree has zero or more children and a set of attributes (string-to-string pairs).

capture(other, id) wraps the trees produced by other in a new parent tree with the given id. Every keep that sits directly in other adds its attribute to this new parent tree. There is always an implicit root tree, which is the parent of all top-level captures and keeps.

keep(other, id) takes the fragment of source code matched by other and adds an attribute called id to the current tree, with the matched fragment as its value. The tree produced by other is discarded. It's a bit like a named capture group in a regex.
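To make the tree shape concrete, here is a minimal sketch of what the description above implies for a tiny key=value grammar. SketchTree is a hypothetical stand-in written for this example, not Synthax's actual Tree class, and the "root" id is made up. The grammar in the comment uses only the DSL calls that appear in the JSON grammar.

```crystal
# Hypothetical stand-in for illustration only; NOT Synthax's actual Tree class.
class SketchTree
  getter id : String
  getter children : Array(SketchTree)
  getter attributes : Hash(String, String)

  def initialize(@id : String)
    @children = [] of SketchTree
    @attributes = {} of String => String
  end
end

# Grammar being modeled (same DSL calls as in the JSON grammar):
#
#   key  = keep(many('a'..'z'), "key")
#   val  = keep(many('0'..'9'), "value")
#   pair = capture(key & '=' & val, "pair")
#
# Input being modeled: x=42

root = SketchTree.new("root") # the implicit root tree (this id is an assumption)

pair = SketchTree.new("pair")   # capture(..., "pair") creates a new parent tree
pair.attributes["key"] = "x"    # keep(..., "key") stores the matched fragment
pair.attributes["value"] = "42" # keep(..., "value") does the same
root.children << pair           # a top-level capture is a child of the root

puts root.children.map(&.id).inspect
puts pair.attributes.inspect
```

Both keeps sit directly inside the capture, so their attributes land on the "pair" tree rather than on the root. Had they been written outside any capture, the description above suggests they would attach to the implicit root instead.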

Performance

It's pretty horrible, but okay for the phase where you don't yet have thousands upon thousands of lines of code or need to reparse them frequently. Fast parsing is the least of your concerns when you're prototyping a language or something similar.

If you routinely need to go through millions of characters, this is probably the worst shard to pick. I think recursive descent with a state-machine-ish lexer is better for that purpose.

For a 10 MB JSON example (including anify):

        JSON.parse  10.86  ( 92.08ms) (± 5.62%)  33.9MB/op        fastest
Synthax JSON parse   1.68  (596.44ms) (± 2.88%)   225MB/op   6.48× slower

To run the benchmark yourself use: crystal run examples/json.cr -Dbenchmark --release

Development

Just do it.

Contributing

  1. Fork it (https://github.com/homonoidian/synthax/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors