module Sthx

Overview

Synthax is a simple parser synthesizer DSL for Crystal.

Defined in:

synthax.cr
synthax/apply.cr
synthax/cursor.cr
synthax/dsl.cr
synthax/error.cr
synthax/kit.cr
synthax/rule.cr
synthax/tree.cr

Constant Summary

VERSION = "0.3.1"

Macro Detail

macro capture(var)

Creates a capture with the same name as the given var (which should be the name of a variable that stores the captured rule).

boolean = keep("true" | "false", "value")
null = "null"
value = capture(boolean) | capture(null)

"true".apply?(value)
# => root ⸢0-4⸥
#    └─ boolean ⸢0-4⸥ value="true"

"false".apply?(value)
# => root ⸢0-5⸥
#    └─ boolean ⸢0-5⸥ value="false"

"null".apply?(value)
# => root ⸢0-4⸥
#    └─ null ⸢0-4⸥

macro lit(string)

Creates a literal string capture rule (capture whose name is the same as the captured string).

boolean = lit("true") | lit("false")

"true".apply?(boolean)
# => root ⸢0-4⸥
#    └─ true ⸢0-4⸥

"false".apply?(boolean)
# => root ⸢0-5⸥
#    └─ false ⸢0-5⸥


Instance Method Detail

def ahead(*, __file = __FILE__, __line = __LINE__) : Rule

Forward declares a rule.

expr = ahead
expr.put seq('(', maybe(expr), ')')

expr.apply?("(((())))") # => Tree
expr.apply?("((())")    # => nil
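The rule above refers to itself, which is why it has to be declared before it is defined. The same grammar can be sketched as a hand-written recursive function. This is a stand-alone model in plain Crystal, not Synthax's implementation; the name `nested` is hypothetical.

```crystal
# Models  expr = '(' maybe(expr) ')'  by hand.
# Returns the index just past the match starting at pos, or nil on failure.
def nested(source : String, pos : Int32 = 0) : Int32?
  return nil unless source[pos]? == '('
  pos += 1
  # maybe(expr): try the inner expression, stay in place if it fails.
  if inner = nested(source, pos)
    pos = inner
  end
  return nil unless source[pos]? == ')'
  pos + 1
end

nested("(((())))") # => 8
nested("((())")    # => nil
```

The function calls itself before its body is complete, just as expr is used inside the rule that is later put into it.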

def capture(object, id : String)

Groups children and attributes produced by object into a tree with the given id.

See #rule for a list of supported objects.


def firstof(*objects)

Attempts to match each of objects in order. The first one to succeed wins, and the rest are not checked.

Rule#| is the infix operator for #firstof if you include DSL.

See #rule for a list of supported objects.

word = lit("ox") | lit("oxygen") | lit("brick")

word.apply?("ox")
# => root ⸢0-2⸥
#    └─ ox ⸢0-2⸥

word.apply?("brick")
# => root ⸢0-5⸥
#    └─ brick ⸢0-5⸥

word.apply?("oxygen")
# => root ⸢0-2⸥
#    └─ ox ⸢0-2⸥

# Fails because "ox" succeeded and "ygen" is considered a trailing string,
# which is invalid in exact mode.
word.apply?("oxygen", exact: true) # => nil

word.apply?("broom") # => nil
word.apply?("stick") # => nil

This method is faster than #longestof. However, if branches share a common prefix, the only alternative to #longestof is to disambiguate them by hand by placing the longer ones first, which is tedious, even more so during experimentation.
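The effect of branch order can be seen in a stand-alone model of ordered choice over plain string literals. This is a sketch in plain Crystal, not Synthax code; `first_prefix` is a hypothetical helper.

```crystal
# Returns the first alternative that is a prefix of source, or nil.
# Later alternatives are never looked at once one succeeds.
def first_prefix(source : String, alternatives : Array(String)) : String?
  alternatives.find { |alt| source.starts_with?(alt) }
end

first_prefix("oxygen", ["ox", "oxygen", "brick"]) # => "ox"
first_prefix("oxygen", ["oxygen", "ox", "brick"]) # => "oxygen"
first_prefix("ox", ["oxygen", "ox", "brick"])     # => "ox"
first_prefix("broom", ["oxygen", "ox", "brick"])  # => nil
```

With the longer literal placed first, both "ox" and "oxygen" are recognized in full.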


def keep(object, name : String)

Creates (if necessary) an attribute with the given name and binds it to the full text matched by object.

See #rule for a list of supported objects.


def kept(attrname : String)

Matches the value of an attribute with the given attrname. Most relevant for cases like XML. The attribute must be defined on the current tree when this rule is reached. Otherwise it will result in a parse error.

Make sure to use captures to "scope" each #keep and its corresponding #kept!

tag = "<" & keep(many('a'..'z'), "tagname") & "></" & kept("tagname") & ">"
tags = sep(capture(tag), by: " THEN ")

"<foo></bar>".apply?(tags) # => nil
"<foo></foo>".apply?(tags)
# => root ⸢0-11⸥
#    └─ tag ⸢0-11⸥ tagname="foo"

"<foo></foo> THEN <bar></bar> THEN <baz></baz>".apply?(tags)
# => root ⸢0-45⸥
#    ├─ tag ⸢0-11⸥ tagname="foo"
#    ├─ tag ⸢17-28⸥ tagname="bar"
#    └─ tag ⸢34-45⸥ tagname="baz"

Some dynamism is possible, for example with alternatives, provided you keep everything under the same capture/root:

wsep = many(' ' | '\n')
alphas = many('a'..'z')

head = (keep("class", "kind") & wsep & keep(alphas, "name") & wsep & "class body") |
       (keep("module", "kind") & wsep & keep(alphas, "name") & wsep & "module body") |
       (keep("function", "kind") & wsep & keep(alphas, "name") & wsep & "function body")

defn = capture(head & wsep & "end" & wsep & kept("kind") & wsep & kept("name"), "defn")
defns = sep(defn, by: wsep)

example = <<-END
class foo
  class body
end class foo

function bar
  function body
end function bar

module baz
  module body
end module baz
END

example.apply?(defns)
# => root ⸢0-124⸥
#    ├─ defn ⸢0-36⸥ kind="class" name="foo"
#    ├─ defn ⸢38-83⸥ kind="function" name="bar"
#    └─ defn ⸢85-124⸥ kind="module" name="baz"
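For readers who know regular expressions, the keep/kept pair resembles a named group and a named backreference. The following is an analogy only, written with Crystal's standard Regex rather than Synthax; `TAG` and `tagname` are hypothetical names.

```crystal
# (?<tagname>...) plays the role of keep(..., "tagname"),
# \k<tagname> plays the role of kept("tagname").
TAG = /\A<(?<tagname>[a-z]+)><\/\k<tagname>>\z/

# Returns the remembered tag name, or nil if the closing tag differs.
def tagname(source : String) : String?
  TAG.match(source).try &.["tagname"]
end

tagname("<foo></foo>") # => "foo"
tagname("<foo></bar>") # => nil
```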

def longestof(*objects)

Tries all of objects and selects the one that progresses furthest into the source string.

Rule#^ is the infix operator for #longestof if you include DSL.

See #rule for a list of supported objects.

word = lit("ox") ^ lit("oxygen") ^ lit("brick")

word.apply?("ox")
# => root ⸢0-2⸥
#    └─ ox ⸢0-2⸥

word.apply?("brick")
# => root ⸢0-5⸥
#    └─ brick ⸢0-5⸥

word.apply?("oxygen")
# => root ⸢0-6⸥
#    └─ oxygen ⸢0-6⸥

word.apply?("oxygen", exact: true)
# => root ⸢0-6⸥
#    └─ oxygen ⸢0-6⸥

word.apply?("broom") # => nil
word.apply?("stick") # => nil
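The selection step can be modeled over plain string literals. This is a stand-alone sketch in plain Crystal, not Synthax code; `longest_prefix` is a hypothetical helper.

```crystal
# Tries every alternative and keeps the longest one that is a prefix of source.
# Order of the alternatives does not matter here.
def longest_prefix(source : String, alternatives : Array(String)) : String?
  alternatives.select { |alt| source.starts_with?(alt) }.max_by?(&.size)
end

longest_prefix("oxygen", ["ox", "oxygen", "brick"]) # => "oxygen"
longest_prefix("ox", ["ox", "oxygen", "brick"])     # => "ox"
longest_prefix("broom", ["ox", "oxygen", "brick"])  # => nil
```

Every alternative is tested, so the model does more work per call than an ordered choice that stops at the first success.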

def many(object)

Matches object one or more times.

See #rule for a list of supported objects.

ws = many(' ')

ws.apply?("")    # => nil
ws.apply?(" ")   # => Tree
ws.apply?("   ") # => Tree

def maybe(object)

Matches object zero or one times.

See #rule for a list of supported objects.

integer = maybe('-') & many('0'..'9') & maybe('.')

integer.apply?("123")   # => Tree
integer.apply?("123.")  # => Tree
integer.apply?("-123")  # => Tree
integer.apply?("-123.") # => Tree

def refuse(object, error)

Refuses to match object if error matches. In other words, this rule reads as: unless error matches, match object.

Rule#- is the infix operator for #refuse if you include DSL.

See #rule for a list of supported objects.

name = many(('a'..'z') | ('A'..'Z')) - "John" - "Susy"

name.apply?("Marco") # => Tree
name.apply?("David") # => Tree
name.apply?("John")  # => nil
name.apply?("Susy")  # => nil
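The order of the two checks can be modeled in plain Crystal. This is a stand-alone sketch that assumes the refused words are tested at the start of the input before the body is tried; it is not Synthax code, and `name_prefix` is a hypothetical helper.

```crystal
# Returns the leading run of ASCII letters, unless a refused word matches first.
def name_prefix(source : String, refused : Array(String)) : String?
  # The "error" side is checked first; any hit rejects the whole match.
  return nil if refused.any? { |word| source.starts_with?(word) }
  letters = source.chars.take_while(&.ascii_letter?).join
  letters.empty? ? nil : letters
end

name_prefix("Marco", ["John", "Susy"]) # => "Marco"
name_prefix("John", ["John", "Susy"])  # => nil
name_prefix("Susy", ["John", "Susy"])  # => nil
```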

def repeat(object, times : Int)

Matches a repetition of object an exact number of times.

Rule#* is the infix operator for #repeat if you include DSL.

See #rule for a list of supported objects.

pin = repeat('0'..'9', 4) # or rule('0'..'9') * 4

pin.apply?("")      # => nil
pin.apply?("123")   # => nil
pin.apply?("1234")  # => Tree
pin.apply?("12345") # => nil

def repeat(object, *, min : Int, stop : Int | Nil)

Matches object repeatedly: at least min times, stopping at stop times. If stop is nil, matching continues until object no longer matches.

Rule#* is the infix operator for #repeat if you include DSL.

See #rule for a list of supported objects.

digits = repeat('0'..'9', min: 2, stop: 4)

digits.apply?("")      # => nil
digits.apply?("1")     # => nil
digits.apply?("12")    # => Tree
digits.apply?("123")   # => Tree
digits.apply?("1234")  # => nil
digits.apply?("12345") # => nil

def rule(object : Char) : Rule

Creates a Rule from the given character object.

rule('x').apply?("x") # => Tree
rule('x').apply?("y") # => nil

def rule(object : Range(Int, Int)) : Rule

Creates a Rule from the given Unicode codepoint range object.

# Range used by e.g. JSON to match characters of a string.
rule(0x0020..0x10FFFF).apply?("x") # => Tree

def rule(object : Range(Char, Char)) : Rule

Creates a Rule from the given character range object.

rule('0'..'9').apply?("0")  # => Tree
rule('0'..'9').apply?("2")  # => Tree
rule('0'..'9').apply?("9")  # => Tree
rule('0'..'9').apply?("a")  # => nil
rule('0'...'9').apply?("9") # => nil
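The last line uses an exclusive range (three dots), which in Crystal leaves out the upper bound. The snippet below shows the underlying Range behavior with the standard library only. That Synthax tests membership the same way is an assumption drawn from the examples above.

```crystal
# Inclusive vs. exclusive character ranges.
('0'..'9').includes?('9')  # => true
('0'...'9').includes?('9') # => false

# A codepoint range is checked against the character's codepoint.
(0x0020..0x10FFFF).includes?('x'.ord)  # => true
(0x0020..0x10FFFF).includes?('\n'.ord) # => false
```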

def rule(object : String) : Rule

Creates a Rule from the given string object.

A string is a sequence (Seq) of characters. So writing #rule("foo") is the same as writing seq(rule('f'), rule('o'), rule('o')), which can also be written as 'f' & 'o' & 'o'.

rule("hello").apply?("hello") # => Tree
rule("hello").apply?("hell")  # => nil
rule("hello").apply?("foo")   # => nil

Empty string objects are allowed and can sometimes be useful, signifying an empty but successful match. For instance, if you have a hash and want to match both x: 100 and x (as shorthand for x: x), you may use the following set of rules:

_ws = some(' ')
_id = many('a'..'z')
_colon = _ws & ":" & _ws
_digits = some('0'..'9')

number = capture(_digits, "number")
pair_key = capture(_id, "pair/key")
pair_value = capture(_colon & number, "pair/value") | capture("", "pair/no-value")

pair = _ws & capture(pair_key & pair_value, "pair") & _ws

pair.apply?("x: 100")
# => root ⸢0-6⸥
#    └─ pair ⸢0-6⸥
#       ├─ pair/key ⸢0-1⸥
#       └─ pair/value ⸢1-6⸥
#          └─ number ⸢3-6⸥

pair.apply?("x")
# => root ⸢0-1⸥
#    └─ pair ⸢0-1⸥
#       ├─ pair/key ⸢0-1⸥
#       └─ pair/no-value ⸢1-1⸥

pair.apply?("123") # => nil

def rule(object : Rule) : Rule

Returns object unchanged. This lets you safely pass anything through #rule: other objects are converted to rules, and an object that is already a rule is returned as is.


def sep(object, by sep)

Matches one or more occurrences of object separated by sep.

See #rule for a list of supported objects.

digits = sep('0'..'9', ',')

digits.apply?("")           # => nil
digits.apply?("1")          # => Tree
digits.apply?("1,2")        # => Tree
digits.apply?("1,2,3,4,5")  # => Tree
digits.apply?("1,2,3,4,5,") # => nil (trailing separator not supported!)
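Why a trailing separator is a problem can be seen in a stand-alone model that consumes one item followed by any number of "separator, item" pairs. This is a sketch in plain Crystal, not Synthax's implementation; `sep_digits` is a hypothetical helper that returns how many characters were consumed.

```crystal
# Consumes  digit (sep digit)*  from the start of source.
# Returns the number of characters consumed, or nil if no digit leads.
def sep_digits(source : String, sep : Char) : Int32?
  chars = source.chars
  return nil unless chars[0]?.try(&.ascii_number?)
  pos = 1
  # A separator is only consumed together with the digit that follows it.
  while chars[pos]? == sep && chars[pos + 1]?.try(&.ascii_number?)
    pos += 2
  end
  pos
end

sep_digits("", ',')           # => nil
sep_digits("1", ',')          # => 1
sep_digits("1,2,3,4,5", ',')  # => 9
sep_digits("1,2,3,4,5,", ',') # => 9
```

In the last call the model stops after 9 of the 10 characters, so the trailing separator is left over and a match of the whole input is not possible.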

def seq(*objects)

Matches only when all objects appear in a sequence, ordered exactly as in the arguments.

Rule#& is the infix operator for #seq if you include DSL.

See #rule for a list of supported objects.

xy = 'x' & 'y'
xy.apply?("")   # => nil
xy.apply?("x")  # => nil
xy.apply?("xa") # => nil
xy.apply?("xy") # => Tree
xy.apply?("yx") # => nil

def some(object)

Matches object zero or more times.

See #rule for a list of supported objects.

ws = some(' ')

ws.apply?("")    # => Tree
ws.apply?(" ")   # => Tree
ws.apply?("   ") # => Tree
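The difference between #some, #many and #maybe comes down to the allowed number of repetitions. A stand-alone model in plain Crystal makes the bounds explicit. It is not Synthax code; `consumed` and its `min`/`max` parameters are hypothetical names.

```crystal
# Counts how often char repeats at the start of source, capped at max,
# and fails (nil) when fewer than min repetitions are found.
def consumed(source : String, char : Char, min : Int32, max : Int32? = nil) : Int32?
  count = source.chars.take_while { |c| c == char }.size
  count = Math.min(count, max) if max
  count >= min ? count : nil
end

consumed("", ' ', min: 0)            # like some  => 0
consumed("   ", ' ', min: 0)         # like some  => 3
consumed("", ' ', min: 1)            # like many  => nil
consumed("   ", ' ', min: 1)         # like many  => 3
consumed("-1", '-', min: 0, max: 1)  # like maybe => 1
consumed("1", '-', min: 0, max: 1)   # like maybe => 0
```

A lower bound of zero is what lets a rule succeed on empty input without consuming anything.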
