module Sthx
Overview
Synthax is a simple parser synthesizer DSL for Crystal.
Extended Modules
Defined in:
synthax.cr
synthax/apply.cr
synthax/cursor.cr
synthax/dsl.cr
synthax/error.cr
synthax/kit.cr
synthax/rule.cr
synthax/tree.cr
Constant Summary
-
VERSION = "0.3.1"
Macro Summary
-
capture(var)
Creates a capture with the same name as the given var (which should be the name of a variable that stores the captured rule).
-
lit(string)
Creates a literal string capture rule (capture whose name is the same as the captured string).
Instance Method Summary
-
#ahead(*, __file = __FILE__, __line = __LINE__) : Rule
Forward declares a rule.
-
#capture(object, id : String)
Groups children and attributes produced by object into a tree with the given id.
-
#firstof(*objects)
Attempts to match each object of objects in order; the first one to succeed wins and the rest are not checked.
-
#keep(object, name : String)
Creates (if necessary) an attribute with the given name and binds it to the full text matched by object.
-
#kept(attrname : String)
Matches the value of an attribute with the given attrname.
-
#longestof(*objects)
Matches all of objects and selects the one that progresses furthest into the source string.
-
#many(object)
Matches object one or more times.
-
#maybe(object)
Matches object zero or one times.
-
#refuse(object, error)
Refuses to match object if error object matches.
-
#repeat(object, times : Int)
Matches a repetition of object an exact number of times.
-
#repeat(object, *, min : Int, stop : Int | Nil)
Matches a repetition of object at least min times, stopping at stop times (stop can be nil, which means it will keep matching object until it no longer matches).
-
#rule(object : Char) : Rule
Creates a Rule from the given character object.
-
#rule(object : Range(Int, Int)) : Rule
Creates a Rule from the given Unicode codepoint range object.
-
#rule(object : Range(Char, Char)) : Rule
Creates a Rule from the given character range object.
-
#rule(object : String) : Rule
Creates a Rule from the given string object.
-
#rule(object : Rule) : Rule
Returns object, allowing you to safely convert anything to a rule using #rule; if it is already a rule, no conversion is done.
-
#sep(object, by sep)
Matches one or more objects separated by sep.
-
#seq(*objects)
Matches only when all objects appear in a sequence, ordered exactly as in the arguments.
-
#some(object)
Matches object zero or more times.
Macro Detail
Creates a capture with the same name as the given var (which should be the name of a variable that stores the captured rule).
boolean = keep("true" | "false", "value")
null = "null"
value = capture(boolean) | capture(null)
"true".apply?(value)
# => root ⸢0-4⸥
# └─ boolean ⸢0-4⸥ value="true"
"false".apply?(value)
# => root ⸢0-5⸥
# └─ boolean ⸢0-5⸥ value="false"
"null".apply?(value)
# => root ⸢0-4⸥
# └─ null ⸢0-4⸥
Creates a literal string capture rule (capture whose name is the same as the captured string).
boolean = lit("true") | lit("false")
"true".apply?(boolean)
# => root ⸢0-4⸥
# └─ true ⸢0-4⸥
"false".apply?(boolean)
# => root ⸢0-5⸥
# └─ false ⸢0-5⸥
Instance Method Detail
Forward declares a rule.
expr = ahead
expr.put seq('(', maybe(expr), ')')
expr.apply?("(((())))") # => Tree
expr.apply?("((())") # => nil
Groups children and attributes produced by object into a tree with the given id.
See #rule for a list of supported objects.
Attempts to match each object of objects in order; the first one to succeed wins and the rest are not checked.
Rule#| is the infix operator for #firstof if you include DSL.
See #rule for a list of supported objects.
word = lit("ox") | lit("oxygen") | lit("brick")
word.apply?("ox")
# => root ⸢0-2⸥
# └─ ox ⸢0-2⸥
word.apply?("brick")
# => root ⸢0-5⸥
# └─ brick ⸢0-5⸥
word.apply?("oxygen")
# => root ⸢0-2⸥
# └─ ox ⸢0-2⸥
# Fails because "ox" succeeded and "ygen" is considered a trailing string,
# which is invalid in exact mode.
word.apply?("oxygen", exact: true) # => nil
word.apply?("broom") # => nil
word.apply?("stick") # => nil
This method is faster than #longestof. However, if branches begin ambiguously, the only alternative to #longestof is to disambiguate them by hand, placing the longer ones first. That is tedious, especially during experimentation.
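The difference between the two strategies can be modeled without the library. The following is a minimal sketch in plain Crystal, not Synthax's implementation: `match_literal`, `first_of`, and `longest_of` are hypothetical helpers that only handle literal strings matched at the start of the source, and they return the end offset of the match or nil.

```crystal
# Hypothetical model, not part of Synthax: returns the end offset of
# `literal` when `source` starts with it, or nil otherwise.
def match_literal(source : String, literal : String) : Int32?
  source.starts_with?(literal) ? literal.size : nil
end

# Ordered choice: alternatives are tried in order and the first match wins.
def first_of(source : String, alternatives : Array(String)) : Int32?
  alternatives.each do |alt|
    if stop = match_literal(source, alt)
      return stop
    end
  end
  nil
end

# Longest match: every alternative is tried and the furthest end offset wins.
def longest_of(source : String, alternatives : Array(String)) : Int32?
  alternatives.compact_map { |alt| match_literal(source, alt) }.max?
end

words = ["ox", "oxygen", "brick"]

p first_of("oxygen", words)                     # 2 ("ox" wins, "ygen" is left over)
p longest_of("oxygen", words)                   # 6
p first_of("oxygen", ["oxygen", "ox", "brick"]) # 6 (disambiguated by hand)
```

The ordered version stops at the first success, which is why it does less work, while the longest-match version always pays for every alternative.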
Creates (if necessary) an attribute with the given name and binds it to the full text matched by object.
See #rule for a list of supported objects.
Matches the value of an attribute with the given attrname. Most relevant for cases like XML. The attribute must be defined on the current tree when this rule is reached; otherwise, a parse error results.
Make sure to use captures to "scope" each #keep and its corresponding #kept!
tag = "<" & keep(many('a'..'z'), "tagname") & "></" & kept("tagname") & ">"
tags = sep(capture(tag), by: " THEN ")
"<foo></bar>".apply?(tags) # => nil
"<foo></foo>".apply?(tags)
# => root ⸢0-11⸥
# └─ tag ⸢0-11⸥ tagname="foo"
"<foo></foo> THEN <bar></bar> THEN <baz></baz>".apply?(tags)
# => root ⸢0-45⸥
# ├─ tag ⸢0-11⸥ tagname="foo"
# ├─ tag ⸢17-28⸥ tagname="bar"
# └─ tag ⸢34-45⸥ tagname="baz"
Some dynamism is possible, for example with alternatives, provided you keep everything under the same capture/root:
wsep = many(' ' | '\n')
alphas = many('a'..'z')
head = (keep("class", "kind") & wsep & keep(alphas, "name") & wsep & "class body") |
(keep("module", "kind") & wsep & keep(alphas, "name") & wsep & "module body") |
(keep("function", "kind") & wsep & keep(alphas, "name") & wsep & "function body")
defn = capture(head & wsep & "end" & wsep & kept("kind") & wsep & kept("name"), "defn")
defns = sep(defn, by: wsep)
example = <<-END
class foo
  class body
end class foo

function bar
  function body
end function bar

module baz
  module body
end module baz
END
example.apply?(defns)
# => root ⸢0-124⸥
# ├─ defn ⸢0-36⸥ kind="class" name="foo"
# ├─ defn ⸢38-83⸥ kind="function" name="bar"
# └─ defn ⸢85-124⸥ kind="module" name="baz"
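The #keep / #kept pairing in the tag example behaves much like a backreference: the text bound by #keep must reappear verbatim where #kept is reached. Below is a minimal hand-written sketch of that idea in plain Crystal. It is not how Synthax implements it; `match_tag` is a hypothetical helper that models only the single `tag` rule and returns the end offset of the match or nil.

```crystal
# Hypothetical model of:
#   "<" & keep(many('a'..'z'), "tagname") & "></" & kept("tagname") & ">"
def match_tag(source : String) : Int32?
  return nil unless source.starts_with?('<')
  pos = 1

  # keep: consume one or more lowercase letters and remember them.
  start = pos
  while pos < source.size && source[pos].ascii_lowercase?
    pos += 1
  end
  return nil if pos == start
  tagname = source[start...pos]

  # kept: the closing tag must repeat exactly the remembered text.
  rest = "></#{tagname}>"
  return nil unless source[pos..].starts_with?(rest)
  pos + rest.size
end

p match_tag("<foo></foo>") # 11
p match_tag("<foo></bar>") # nil
```

The remembered `tagname` lives only inside one call here, which mirrors the advice to scope each #keep and its #kept under the same capture.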
Matches all of objects and selects the one that progresses furthest into the source string.
Rule#^ is the infix operator for #longestof if you include DSL.
See #rule for a list of supported objects.
word = lit("ox") ^ lit("oxygen") ^ lit("brick")
word.apply?("ox")
# => root ⸢0-2⸥
# └─ ox ⸢0-2⸥
word.apply?("brick")
# => root ⸢0-5⸥
# └─ brick ⸢0-5⸥
word.apply?("oxygen")
# => root ⸢0-6⸥
# └─ oxygen ⸢0-6⸥
word.apply?("oxygen", exact: true)
# => root ⸢0-6⸥
# └─ oxygen ⸢0-6⸥
word.apply?("broom") # => nil
word.apply?("stick") # => nil
Matches object one or more times.
See #rule for a list of supported objects.
ws = many(' ')
ws.apply?("") # => nil
ws.apply?(" ") # => Tree
ws.apply?("   ") # => Tree
Matches object zero or one times.
See #rule for a list of supported objects.
integer = maybe('-') & many('0'..'9') & maybe('.')
integer.apply?("123") # => Tree
integer.apply?("123.") # => Tree
integer.apply?("-123") # => Tree
integer.apply?("-123.") # => Tree
Refuses to match object if error object matches. In effect, this rule reads as "unless error matches, match object".
Rule#- is the infix operator for #refuse if you include DSL.
See #rule for a list of supported objects.
name = many(('a'..'z') | ('A'..'Z')) - "John" - "Susy"
name.apply?("Marco") # => Tree
name.apply?("David") # => Tree
name.apply?("John") # => nil
name.apply?("Susy") # => nil
Matches a repetition of object an exact number of times.
Rule#* is the infix operator for #repeat if you include DSL.
See #rule for a list of supported objects.
pin = repeat('0'..'9', 4) # or rule('0'..'9') * 4
pin.apply?("") # => nil
pin.apply?("123") # => nil
pin.apply?("1234") # => Tree
pin.apply?("12345") # => nil
Matches a repetition of object at least min times, stopping at stop times (stop can be nil, which means it will keep matching object until it no longer matches).
Rule#* is the infix operator for #repeat if you include DSL.
See #rule for a list of supported objects.
digits = repeat('0'..'9', min: 2, stop: 4)
digits.apply?("") # => nil
digits.apply?("1") # => nil
digits.apply?("12") # => Tree
digits.apply?("123") # => Tree
digits.apply?("1234") # => nil
digits.apply?("12345") # => nil
Creates a Rule from the given character object.
rule('x').apply?("x") # => Tree
rule('x').apply?("y") # => nil
Creates a Rule from the given Unicode codepoint range object.
# Range used by e.g. JSON to match characters of a string.
rule(0x0020..0x10FFFF).apply?("x") # => Tree
Creates a Rule from the given character range object.
rule('0'..'9').apply?("0") # => Tree
rule('0'..'9').apply?("2") # => Tree
rule('0'..'9').apply?("9") # => Tree
rule('0'..'9').apply?("a") # => nil
rule('0'...'9').apply?("9") # => nil
Creates a Rule from the given string object.
A string is a sequence (Seq) of characters. So writing rule("foo") is the same as writing seq(rule("f"), rule("o"), rule("o")), which can also be written as 'f' & 'o' & 'o'.
rule("hello").apply?("hello") # => Tree
rule("hello").apply?("hell") # => nil
rule("hello").apply?("foo") # => nil
Empty string objects are allowed and can sometimes be useful, signifying an empty but successful match. For instance, if you have a hash and want to match both x: 100 and x (as a shorthand for x: x), you may use the following set of rules:
_ws = some(' ')
_id = many('a'..'z')
_colon = _ws & ":" & _ws
_digits = some('0'..'9')
number = capture(_digits, "number")
pair_key = capture(_id, "pair/key")
pair_value = capture(_colon & number, "pair/value") | capture("", "pair/no-value")
pair = _ws & capture(pair_key & pair_value, "pair") & _ws
pair.apply?("x: 100")
# => root ⸢0-6⸥
# └─ pair ⸢0-6⸥
# ├─ pair/key ⸢0-1⸥
# └─ pair/value ⸢1-6⸥
# └─ number ⸢3-6⸥
pair.apply?("x")
# => root ⸢0-1⸥
# └─ pair ⸢0-1⸥
# ├─ pair/key ⸢0-1⸥
# └─ pair/no-value ⸢1-1⸥
pair.apply?("123") # => nil
Returns object, allowing you to safely convert anything to a rule using #rule; if it is already a rule, no conversion is done.
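This "convert unless already converted" behavior falls out naturally from method overloading. The sketch below illustrates the pattern in plain Crystal under stated assumptions: `MiniRule` and `to_rule` are hypothetical stand-ins, not Synthax's Rule or #rule, and only two of the convertible types are modeled.

```crystal
# Hypothetical stand-in for the library's Rule type.
class MiniRule
  getter text : String

  def initialize(@text : String)
  end
end

# One overload per convertible type.
def to_rule(object : Char) : MiniRule
  MiniRule.new(object.to_s)
end

def to_rule(object : String) : MiniRule
  MiniRule.new(object)
end

# The identity overload: a rule passes through untouched.
def to_rule(object : MiniRule) : MiniRule
  object
end

existing = to_rule("hello")
p to_rule(existing).same?(existing) # true
p to_rule('x').text                 # "x"
```

Because of the identity overload, combinators can call the conversion on every argument without first checking whether it is already a rule.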
Matches one or more objects separated by sep.
See #rule for a list of supported objects.
ws = sep('0'..'9', ',')
ws.apply?("") # => nil
ws.apply?("1") # => Tree
ws.apply?("1,2") # => Tree
ws.apply?("1,2,3,4,5") # => Tree
ws.apply?("1,2,3,4,5,") # => nil (trailing separator not supported!)
Matches only when all objects appear in a sequence, ordered exactly as in the arguments.
Rule#& is the infix operator for #seq if you include DSL.
See #rule for a list of supported objects.
xy = 'x' & 'y'
xy.apply?("") # => nil
xy.apply?("x") # => nil
xy.apply?("xa") # => nil
xy.apply?("xy") # => Tree
xy.apply?("yx") # => nil