Module: stdgo.regexp.syntax
Overview
Package syntax parses regular expressions into parse trees and compiles parse trees into programs. Most clients of regular expressions will use the facilities of package regexp (such as Compile and Match) instead of this package.
Syntax
The regular expression syntax understood by this package when parsing with the Perl flag is as follows. Parts of the syntax can be disabled by passing alternate flags to Parse.
Single characters:
. any character, possibly including newline (flag s=true)
[xyz] character class
[^xyz] negated character class
\d Perl character class
\D negated Perl character class
[[:alpha:]] ASCII character class
[[:^alpha:]] negated ASCII character class
\pN Unicode character class (one-letter name)
\p{Greek} Unicode character class
\PN negated Unicode character class (one-letter name)
\P{Greek} negated Unicode character class
Composites:
xy x followed by y
x|y x or y (prefer x)
Repetitions:
x* zero or more x, prefer more
x+ one or more x, prefer more
x? zero or one x, prefer one
x{n,m} n or n+1 or ... or m x, prefer more
x{n,} n or more x, prefer more
x{n} exactly n x
x*? zero or more x, prefer fewer
x+? one or more x, prefer fewer
x?? zero or one x, prefer zero
x{n,m}? n or n+1 or ... or m x, prefer fewer
x{n,}? n or more x, prefer fewer
x{n}? exactly n x
Implementation restriction: The counting forms x{n,m}, x{n,}, and x{n} reject forms that create a minimum or maximum repetition count above 1000. Unlimited repetitions are not subject to this restriction.
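The count limit can be observed directly. The sketch below uses Go's regexp/syntax package, which this module mirrors; that the Haxe port behaves identically is an assumption, and the helper name parses is illustrative only.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// parses reports whether pattern is accepted under the Perl flag set.
// (Hypothetical helper, not part of the package.)
func parses(pattern string) bool {
	_, err := syntax.Parse(pattern, syntax.Perl)
	return err == nil
}

func main() {
	fmt.Println(parses(`a{1000}`)) // count exactly at the limit
	fmt.Println(parses(`a{1001}`)) // count above the limit
	fmt.Println(parses(`a{2,}`))   // no upper bound given
}
```

Only the explicit counts are checked against 1000; an omitted upper bound is treated as unlimited.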
Grouping:
(re) numbered capturing group (submatch)
(?P<name>re) named & numbered capturing group (submatch)
(?:re) non-capturing group
(?flags) set flags within current group; non-capturing
(?flags:re) set flags during re; non-capturing
Flag syntax is xyz (set) or -xyz (clear) or xy-z (set xy, clear z). The flags are:
i case-insensitive (default false)
m multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
s let . match \n (default false)
U ungreedy: swap meaning of x* and x*?, x+ and x+?, etc (default false)
Empty strings:
^ at beginning of text or line (flag m=true)
$ at end of text (like \z not \Z) or line (flag m=true)
\A at beginning of text
\b at ASCII word boundary (\w on one side and \W, \A, or \z on the other)
\B not at ASCII word boundary
\z at end of text
Escape sequences:
\a bell (== \007)
\f form feed (== \014)
\t horizontal tab (== \011)
\n newline (== \012)
\r carriage return (== \015)
\v vertical tab character (== \013)
\* literal *, for any punctuation character *
\123 octal character code (up to three digits)
\x7F hex character code (exactly two digits)
\x{10FFFF} hex character code
\Q...\E literal text ... even if ... has punctuation
Character class elements:
x single character
A-Z character range (inclusive)
\d Perl character class
[:foo:] ASCII character class foo
\p{Foo} Unicode character class Foo
\pF Unicode character class F (one-letter name)
Named character classes as character class elements:
[\d] digits (== \d)
[^\d] not digits (== \D)
[\D] not digits (== \D)
[^\D] not not digits (== \d)
[[:name:]] named ASCII class inside character class (== [:name:])
[^[:name:]] named ASCII class inside negated character class (== [:^name:])
[\p{Name}] named Unicode property inside character class (== \p{Name})
[^\p{Name}] named Unicode property inside negated character class (== \P{Name})
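The equivalences listed above can be checked by comparing parse trees. This is a minimal sketch against Go's regexp/syntax (assumed to match this port); sameTree is a hypothetical helper built on Parse and Regexp.Equal.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// sameTree reports whether two patterns parse to structurally equal trees.
// (Hypothetical helper, not part of the package.)
func sameTree(a, b string) bool {
	ra, err := syntax.Parse(a, syntax.Perl)
	if err != nil {
		panic(err)
	}
	rb, err := syntax.Parse(b, syntax.Perl)
	if err != nil {
		panic(err)
	}
	return ra.Equal(rb)
}

func main() {
	fmt.Println(sameTree(`[\d]`, `\d`))  // class element form vs. bare form
	fmt.Println(sameTree(`[^\D]`, `\d`)) // double negation
	fmt.Println(sameTree(`[^\d]`, `\D`)) // negated class vs. negated escape
	fmt.Println(sameTree(`\d`, `\D`))    // a class and its negation differ
}
```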
Perl character classes (all ASCII-only):
\d digits (== [0-9])
\D not digits (== [^0-9])
\s whitespace (== [\t\n\f\r ])
\S not whitespace (== [^\t\n\f\r ])
\w word characters (== [0-9A-Za-z_])
\W not word characters (== [^0-9A-Za-z_])
ASCII character classes:
[[:alnum:]] alphanumeric (== [0-9A-Za-z])
[[:alpha:]] alphabetic (== [A-Za-z])
[[:ascii:]] ASCII (== [\x00-\x7F])
[[:blank:]] blank (== [\t ])
[[:cntrl:]] control (== [\x00-\x1F\x7F])
[[:digit:]] digits (== [0-9])
[[:graph:]] graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]] lower case (== [a-z])
[[:print:]] printable (== [ -~] == [ [:graph:]])
[[:punct:]] punctuation (== [!-/:-@[-`{-~])
[[:space:]] whitespace (== [\t\n\v\f\r ])
[[:upper:]] upper case (== [A-Z])
[[:word:]] word characters (== [0-9A-Za-z_])
[[:xdigit:]] hex digit (== [0-9A-Fa-f])
Unicode character classes are those in unicode.Categories and unicode.Scripts.
Index
- function _bw(_b:stdgo.Ref<stdgo.strings.Builder>, _args:haxe.Rest<stdgo.GoString>):Void
- function _cleanAlt(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>):Void
- function _cleanClass(_rp:stdgo.Ref<stdgo.Slice<stdgo.GoRune>>):stdgo.Slice<stdgo.GoRune>
- function _dump(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>):stdgo.GoString
- function _dumpInst(_b:stdgo.Ref<stdgo.strings.Builder>, _i:stdgo.Ref<stdgo.regexp.syntax.Inst>):Void
- function _dumpProg(_b:stdgo.Ref<stdgo.strings.Builder>, _p:stdgo.Ref<stdgo.regexp.syntax.Prog>):Void
- function _escape(_b:stdgo.Ref<stdgo.strings.Builder>, _r:stdgo.GoRune, _force:Bool):Void
- function _isCharClass(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>):Bool
- function _makePatchList(_n:stdgo.GoUInt32):stdgo.regexp.syntax.T_patchList
- function _matchRune(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>, _r:stdgo.GoRune):Bool
- function _negateClass(_r:stdgo.Slice<stdgo.GoRune>):stdgo.Slice<stdgo.GoRune>
- function _nextRune(_s:stdgo.GoString):{ _2:stdgo.Error; _1:stdgo.GoString; _0:stdgo.GoRune; }
- function _repeatIsValid(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>, _n:stdgo.GoInt):Bool
- function benchmarkEmptyOpContext(_b:stdgo.Ref<stdgo.testing.B>):Void
- function benchmarkIsWordChar(_b:stdgo.Ref<stdgo.testing.B>):Void
- function emptyOpContext(_r1:stdgo.GoRune, _r2:stdgo.GoRune):stdgo.regexp.syntax.EmptyOp
- function testAppendRangeCollapse(_t:stdgo.Ref<stdgo.testing.T_>):Void
- function testFoldConstants(_t:stdgo.Ref<stdgo.testing.T_>):Void
- function testParseFoldCase(_t:stdgo.Ref<stdgo.testing.T_>):Void
- function testParseInvalidRegexps(_t:stdgo.Ref<stdgo.testing.T_>):Void
- function testParseLiteral(_t:stdgo.Ref<stdgo.testing.T_>):Void
- function testParseMatchNL(_t:stdgo.Ref<stdgo.testing.T_>):Void
- function testParseNoMatchNL(_t:stdgo.Ref<stdgo.testing.T_>):Void
- function testParseSimple(_t:stdgo.Ref<stdgo.testing.T_>):Void
- function testToStringEquivalentParse(_t:stdgo.Ref<stdgo.testing.T_>):Void
- function matchEmptyWidth( _before:stdgo.GoRune, _after:stdgo.GoRune):Bool
- function _skipNop( _pc:stdgo.GoUInt32):stdgo.Ref<stdgo.regexp.syntax.Inst>
- function _capNames( _names:stdgo.Slice<stdgo.GoString>):Void
- function equal( _y:stdgo.Ref<stdgo.regexp.syntax.Regexp>):Bool
- function new(?code:Null<stdgo.regexp.syntax.ErrorCode>, ?expr:stdgo.GoString):Void
Constants
import stdgo.regexp.syntax.Syntax
final __Op_name_0:stdgo.GoString = (("NoMatchEmptyMatchLiteralCharClassAnyCharNotNLAnyCharBeginLineEndLineBeginTextEndTextWordBoundaryNoWordBoundaryCaptureStarPlusQuestRepeatConcatAlternate" : stdgo.GoString))
final __Op_name_1:stdgo.GoString = (("opPseudo" : stdgo.GoString))
final _instSize:stdgo.GoUInt64 = ((40i64 : stdgo.GoUInt64))
byte, 2 uint32, slice is 5 64-bit words
final _maxFold:stdgo.GoUInt64 = ((125251i64 : stdgo.GoUInt64))
final _maxHeight:stdgo.GoUInt64 = ((1000i64 : stdgo.GoUInt64))
maxHeight is the maximum height of a regexp parse tree. It is somewhat arbitrarily chosen, but the idea is to be large enough that no one will actually hit it in real use but at the same time small enough that recursion on the Regexp tree will not hit the 1GB Go stack limit. The maximum amount of stack for a single recursive frame is probably closer to 1kB, so this could potentially be raised, but it seems unlikely that people have regexps nested even this deeply. We ran a test on Google's C++ code base and turned up only a single use case with depth > 100; it had depth 128. Using depth 1000 should be plenty of margin. As an optimization, we don't even bother calculating heights until we've allocated at least maxHeight Regexp structures.
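A quick way to observe the height limit is to parse deliberately deep nesting. The sketch below targets Go's regexp/syntax on a release that enforces this limit (assumed to match this port); nested and parseErr are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp/syntax"
	"strings"
)

// nested wraps the literal a in depth capturing groups.
// (Hypothetical helper, not part of the package.)
func nested(depth int) string {
	return strings.Repeat("(", depth) + "a" + strings.Repeat(")", depth)
}

// parseErr returns the error, if any, from parsing pattern with Perl flags.
func parseErr(pattern string) error {
	_, err := syntax.Parse(pattern, syntax.Perl)
	return err
}

func main() {
	// Well below the limit: expected to parse.
	fmt.Println(parseErr(nested(100)))
	// Well above the limit of 1000: expected to be rejected.
	fmt.Println(parseErr(nested(2000)))
}
```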
final _maxRunes:stdgo.GoUInt64 = ((33554432i64 : stdgo.GoUInt64))
maxRunes is the maximum number of runes allowed in a regexp tree counting the runes in all the nodes. Ignoring character classes p.numRunes is always less than the length of the regexp. Character classes can make it much larger: each \pL adds 1292 runes. 128 MB is enough for 32M runes, which is over 26k \pL instances. Note that repetitions do not make copies of the rune slices, so \pL{1000} is only one rune slice, not 1000. We could keep a cache of character classes we've seen, so that all the \pL we see use the same rune list, but that doesn't remove the problem entirely: consider something like [\pL01234][\pL01235][\pL01236]...[\pL^&*()]. And because the Rune slice is exposed directly in the Regexp, there is not an opportunity to change the representation to allow partial sharing between different character classes. So the limit is the best we can do.
final _maxSize:stdgo.GoUInt64 = ((3355443i64 : stdgo.GoUInt64))
maxSize is the maximum size of a compiled regexp in Insts. It too is somewhat arbitrarily chosen, but the idea is to be large enough to allow significant regexps while at the same time small enough that the compiled form will not take up too much memory. 128 MB is enough for 3.3 million Inst structures, which roughly corresponds to a 3.3 MB regexp.
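The _maxSize and _maxRunes values follow from dividing the 128 MB budget by the per-item sizes given for _instSize and _runeSize. A small worked check, in Go, with constant and function names chosen for illustration:

```go
package main

import "fmt"

const (
	budget   = 128 << 20 // 128 MB budget quoted in the comments
	instSize = 5 * 8     // an Inst is five 64-bit words, i.e. 40 bytes
	runeSize = 4         // a rune is an int32
)

// maxInsts is how many Inst structures fit in the budget.
func maxInsts() int { return budget / instSize }

// maxRunes is how many runes fit in the budget.
func maxRunes() int { return budget / runeSize }

func main() {
	fmt.Println(maxInsts()) // 3355443
	fmt.Println(maxRunes()) // 33554432
}
```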
final _meta:stdgo.GoString = (("\\.+*?()|[]{}^$" : stdgo.GoString))
final _minFold:stdgo.GoUInt64 = ((65i64 : stdgo.GoUInt64))
minimum and maximum runes involved in folding. checked during test.
final _noMatch:stdgo.GoUInt64 = ((0i64 : stdgo.GoUInt64))
final _opLeftParen:stdgo.regexp.syntax.Op = ((128 : stdgo.regexp.syntax.Syntax.Op))
Pseudo-ops for parsing stack.
final _opPseudo:stdgo.regexp.syntax.Op = ((128 : stdgo.regexp.syntax.Syntax.Op))
where pseudo-ops start
final _opVerticalBar:stdgo.regexp.syntax.Op = ((129 : stdgo.regexp.syntax.Syntax.Op))
Pseudo-ops for parsing stack.
final _runeSize:stdgo.GoUInt64 = ((4i64 : stdgo.GoUInt64))
rune is int32
final _testFlags:stdgo.regexp.syntax.Flags = ((204 : stdgo.regexp.syntax.Syntax.Flags))
final classNL:stdgo.regexp.syntax.Flags = ((4 : stdgo.regexp.syntax.Syntax.Flags))
allow character classes like [^a-z] and [[:space:]] to match newline
final dotNL:stdgo.regexp.syntax.Flags = ((8 : stdgo.regexp.syntax.Syntax.Flags))
allow . to match newline
final emptyBeginLine:stdgo.regexp.syntax.EmptyOp = ((1 : stdgo.regexp.syntax.Syntax.EmptyOp))
final emptyBeginText:stdgo.regexp.syntax.EmptyOp = ((4 : stdgo.regexp.syntax.Syntax.EmptyOp))
final emptyEndLine:stdgo.regexp.syntax.EmptyOp = ((2 : stdgo.regexp.syntax.Syntax.EmptyOp))
final emptyEndText:stdgo.regexp.syntax.EmptyOp = ((8 : stdgo.regexp.syntax.Syntax.EmptyOp))
final emptyNoWordBoundary:stdgo.regexp.syntax.EmptyOp = ((32 : stdgo.regexp.syntax.Syntax.EmptyOp))
final emptyWordBoundary:stdgo.regexp.syntax.EmptyOp = ((16 : stdgo.regexp.syntax.Syntax.EmptyOp))
final errInternalError:stdgo.regexp.syntax.ErrorCode = (((("regexp/syntax: internal error" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
Unexpected error
final errInvalidCharClass:stdgo.regexp.syntax.ErrorCode = (((("invalid character class" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
Parse errors
final errInvalidCharRange:stdgo.regexp.syntax.ErrorCode = (((("invalid character class range" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errInvalidEscape:stdgo.regexp.syntax.ErrorCode = (((("invalid escape sequence" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errInvalidNamedCapture:stdgo.regexp.syntax.ErrorCode = (((("invalid named capture" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errInvalidPerlOp:stdgo.regexp.syntax.ErrorCode = (((("invalid or unsupported Perl syntax" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errInvalidRepeatOp:stdgo.regexp.syntax.ErrorCode = (((("invalid nested repetition operator" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errInvalidRepeatSize:stdgo.regexp.syntax.ErrorCode = (((("invalid repeat count" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errInvalidUTF8:stdgo.regexp.syntax.ErrorCode = (((("invalid UTF-8" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errLarge:stdgo.regexp.syntax.ErrorCode = (((("expression too large" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errMissingBracket:stdgo.regexp.syntax.ErrorCode = (((("missing closing ]" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errMissingParen:stdgo.regexp.syntax.ErrorCode = (((("missing closing )" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errMissingRepeatArgument:stdgo.regexp.syntax.ErrorCode = (((("missing argument to repetition operator" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errNestingDepth:stdgo.regexp.syntax.ErrorCode = (((("expression nests too deeply" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errTrailingBackslash:stdgo.regexp.syntax.ErrorCode = (((("trailing backslash at end of expression" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final errUnexpectedParen:stdgo.regexp.syntax.ErrorCode = (((("unexpected )" : stdgo.GoString)) : stdgo.regexp.syntax.Syntax.ErrorCode))
final foldCase:stdgo.regexp.syntax.Flags = ((1 : stdgo.regexp.syntax.Syntax.Flags))
case-insensitive match
final instAlt:stdgo.regexp.syntax.InstOp = ((0 : stdgo.regexp.syntax.Syntax.InstOp))
final instAltMatch:stdgo.regexp.syntax.InstOp = ((1 : stdgo.regexp.syntax.Syntax.InstOp))
final instCapture:stdgo.regexp.syntax.InstOp = ((2 : stdgo.regexp.syntax.Syntax.InstOp))
final instEmptyWidth:stdgo.regexp.syntax.InstOp = ((3 : stdgo.regexp.syntax.Syntax.InstOp))
final instFail:stdgo.regexp.syntax.InstOp = ((5 : stdgo.regexp.syntax.Syntax.InstOp))
final instMatch:stdgo.regexp.syntax.InstOp = ((4 : stdgo.regexp.syntax.Syntax.InstOp))
final instNop:stdgo.regexp.syntax.InstOp = ((6 : stdgo.regexp.syntax.Syntax.InstOp))
final instRune:stdgo.regexp.syntax.InstOp = ((7 : stdgo.regexp.syntax.Syntax.InstOp))
final instRune1:stdgo.regexp.syntax.InstOp = ((8 : stdgo.regexp.syntax.Syntax.InstOp))
final instRuneAny:stdgo.regexp.syntax.InstOp = ((9 : stdgo.regexp.syntax.Syntax.InstOp))
final instRuneAnyNotNL:stdgo.regexp.syntax.InstOp = ((10 : stdgo.regexp.syntax.Syntax.InstOp))
final literal:stdgo.regexp.syntax.Flags = ((2 : stdgo.regexp.syntax.Syntax.Flags))
treat pattern as literal string
final matchNL:stdgo.regexp.syntax.Flags = ((12 : stdgo.regexp.syntax.Syntax.Flags))
final nonGreedy:stdgo.regexp.syntax.Flags = ((32 : stdgo.regexp.syntax.Syntax.Flags))
make repetition operators default to non-greedy
final oneLine:stdgo.regexp.syntax.Flags = ((16 : stdgo.regexp.syntax.Syntax.Flags))
treat ^ and $ as only matching at beginning and end of text
final opAlternate:stdgo.regexp.syntax.Op = ((19 : stdgo.regexp.syntax.Syntax.Op))
matches alternation of Subs
final opAnyChar:stdgo.regexp.syntax.Op = ((6 : stdgo.regexp.syntax.Syntax.Op))
matches any character
final opAnyCharNotNL:stdgo.regexp.syntax.Op = ((5 : stdgo.regexp.syntax.Syntax.Op))
matches any character except newline
final opBeginLine:stdgo.regexp.syntax.Op = ((7 : stdgo.regexp.syntax.Syntax.Op))
matches empty string at beginning of line
final opBeginText:stdgo.regexp.syntax.Op = ((9 : stdgo.regexp.syntax.Syntax.Op))
matches empty string at beginning of text
final opCapture:stdgo.regexp.syntax.Op = ((13 : stdgo.regexp.syntax.Syntax.Op))
capturing subexpression with index Cap, optional name Name
final opCharClass:stdgo.regexp.syntax.Op = ((4 : stdgo.regexp.syntax.Syntax.Op))
matches Runes interpreted as range pair list
final opConcat:stdgo.regexp.syntax.Op = ((18 : stdgo.regexp.syntax.Syntax.Op))
matches concatenation of Subs
final opEmptyMatch:stdgo.regexp.syntax.Op = ((2 : stdgo.regexp.syntax.Syntax.Op))
matches empty string
final opEndLine:stdgo.regexp.syntax.Op = ((8 : stdgo.regexp.syntax.Syntax.Op))
matches empty string at end of line
final opEndText:stdgo.regexp.syntax.Op = ((10 : stdgo.regexp.syntax.Syntax.Op))
matches empty string at end of text
final opLiteral:stdgo.regexp.syntax.Op = ((3 : stdgo.regexp.syntax.Syntax.Op))
matches Runes sequence
final opNoMatch:stdgo.regexp.syntax.Op = ((1 : stdgo.regexp.syntax.Syntax.Op))
matches no strings
final opNoWordBoundary:stdgo.regexp.syntax.Op = ((12 : stdgo.regexp.syntax.Syntax.Op))
matches word non-boundary \B
final opPlus:stdgo.regexp.syntax.Op = ((15 : stdgo.regexp.syntax.Syntax.Op))
matches Sub[0] one or more times
final opQuest:stdgo.regexp.syntax.Op = ((16 : stdgo.regexp.syntax.Syntax.Op))
matches Sub[0] zero or one times
final opRepeat:stdgo.regexp.syntax.Op = ((17 : stdgo.regexp.syntax.Syntax.Op))
matches Sub[0] at least Min times, at most Max (Max == -1 is no limit)
final opStar:stdgo.regexp.syntax.Op = ((14 : stdgo.regexp.syntax.Syntax.Op))
matches Sub[0] zero or more times
final opWordBoundary:stdgo.regexp.syntax.Op = ((11 : stdgo.regexp.syntax.Syntax.Op))
matches word boundary \b
final perl:stdgo.regexp.syntax.Flags = ((212 : stdgo.regexp.syntax.Syntax.Flags))
as close to Perl as possible
final perlX:stdgo.regexp.syntax.Flags = ((64 : stdgo.regexp.syntax.Syntax.Flags))
allow Perl extensions
final posix:stdgo.regexp.syntax.Flags = ((0 : stdgo.regexp.syntax.Syntax.Flags))
POSIX syntax
final simple:stdgo.regexp.syntax.Flags = ((512 : stdgo.regexp.syntax.Syntax.Flags))
regexp contains no counted repetition
final unicodeGroups:stdgo.regexp.syntax.Flags = ((128 : stdgo.regexp.syntax.Syntax.Flags))
allow \p{Han}, \P{Han} for Unicode group and negation
final wasDollar:stdgo.regexp.syntax.Flags = ((256 : stdgo.regexp.syntax.Syntax.Flags))
regexp OpEndText was $, not \z
Variables
import stdgo.regexp.syntax.Syntax
var __Op_index_0:stdgo.GoArray<stdgo.GoUInt8>
var _anyRune:stdgo.Slice<stdgo.GoInt32>
var _anyRuneNotNL:stdgo.Slice<stdgo.GoInt32>
var _anyTable:stdgo.Ref<stdgo.unicode.RangeTable>
var _code1:stdgo.Slice<stdgo.GoInt32>
var _code10:stdgo.Slice<stdgo.GoInt32>
var _code11:stdgo.Slice<stdgo.GoInt32>
var _code12:stdgo.Slice<stdgo.GoInt32>
var _code13:stdgo.Slice<stdgo.GoInt32>
var _code14:stdgo.Slice<stdgo.GoInt32>
var _code15:stdgo.Slice<stdgo.GoInt32>
var _code16:stdgo.Slice<stdgo.GoInt32>
var _code17:stdgo.Slice<stdgo.GoInt32>
var _code2:stdgo.Slice<stdgo.GoInt32>
var _code3:stdgo.Slice<stdgo.GoInt32>
var _code4:stdgo.Slice<stdgo.GoInt32>
var _code5:stdgo.Slice<stdgo.GoInt32>
var _code6:stdgo.Slice<stdgo.GoInt32>
var _code7:stdgo.Slice<stdgo.GoInt32>
var _code8:stdgo.Slice<stdgo.GoInt32>
var _code9:stdgo.Slice<stdgo.GoInt32>
var _compileTests:stdgo.Slice<stdgo.regexp.syntax.T__struct_1>
var _foldcaseTests:stdgo.Slice<stdgo.regexp.syntax.T_parseTest>
var _instOpNames:stdgo.Slice<stdgo.GoString>
var _invalidRegexps:stdgo.Slice<stdgo.GoString>
var _literalTests:stdgo.Slice<stdgo.regexp.syntax.T_parseTest>
var _matchnlTests:stdgo.Slice<stdgo.regexp.syntax.T_parseTest>
var _nomatchnlTests:stdgo.Slice<stdgo.regexp.syntax.T_parseTest>
var _onlyPOSIX:stdgo.Slice<stdgo.GoString>
var _onlyPerl:stdgo.Slice<stdgo.GoString>
var _opNames:stdgo.Slice<stdgo.GoString>
var _parseTests:stdgo.Slice<stdgo.regexp.syntax.T_parseTest>
var _perlGroup:stdgo.GoMap<stdgo.GoString, stdgo.regexp.syntax.T_charGroup>
var _posixGroup:stdgo.GoMap<stdgo.GoString, stdgo.regexp.syntax.T_charGroup>
var _simplifyTests:stdgo.Slice<stdgo.regexp.syntax.T__struct_2>
var _sink:stdgo.AnyInterface
Functions
import stdgo.regexp.syntax.Syntax
function _appendClass
function _appendClass(_r:stdgo.Slice<stdgo.GoRune>, _x:stdgo.Slice<stdgo.GoRune>):stdgo.Slice<stdgo.GoRune>
appendClass returns the result of appending the class x to the class r. It assumes x is clean.
function _appendFoldedClass
function _appendFoldedClass(_r:stdgo.Slice<stdgo.GoRune>, _x:stdgo.Slice<stdgo.GoRune>):stdgo.Slice<stdgo.GoRune>
appendFoldedClass returns the result of appending the case folding of the class x to the class r.
function _appendFoldedRange
function _appendFoldedRange(_r:stdgo.Slice<stdgo.GoRune>, _lo:stdgo.GoRune, _hi:stdgo.GoRune):stdgo.Slice<stdgo.GoRune>
appendFoldedRange returns the result of appending the range lo-hi and its case folding-equivalent runes to the class r.
function _appendLiteral
function _appendLiteral(_r:stdgo.Slice<stdgo.GoRune>, _x:stdgo.GoRune, _flags:stdgo.regexp.syntax.Flags):stdgo.Slice<stdgo.GoRune>
appendLiteral returns the result of appending the literal x to the class r.
function _appendNegatedClass
function _appendNegatedClass(_r:stdgo.Slice<stdgo.GoRune>, _x:stdgo.Slice<stdgo.GoRune>):stdgo.Slice<stdgo.GoRune>
appendNegatedClass returns the result of appending the negation of the class x to the class r. It assumes x is clean.
function _appendNegatedTable
function _appendNegatedTable(_r:stdgo.Slice<stdgo.GoRune>, _x:stdgo.Ref<stdgo.unicode.RangeTable>):stdgo.Slice<stdgo.GoRune>
appendNegatedTable returns the result of appending the negation of x to the class r.
function _appendRange
function _appendRange(_r:stdgo.Slice<stdgo.GoRune>, _lo:stdgo.GoRune, _hi:stdgo.GoRune):stdgo.Slice<stdgo.GoRune>
appendRange returns the result of appending the range lo-hi to the class r.
function _appendTable
function _appendTable(_r:stdgo.Slice<stdgo.GoRune>, _x:stdgo.Ref<stdgo.unicode.RangeTable>):stdgo.Slice<stdgo.GoRune>
appendTable returns the result of appending x to the class r.
function _bw
function _bw(_b:stdgo.Ref<stdgo.strings.Builder>, _args:haxe.Rest<stdgo.GoString>):Void
function _checkUTF8
function _checkUTF8(_s:stdgo.GoString):stdgo.Error
function _cleanAlt
function _cleanAlt(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>):Void
cleanAlt cleans re for eventual inclusion in an alternation.
function _cleanClass
function _cleanClass(_rp:stdgo.Ref<stdgo.Slice<stdgo.GoRune>>):stdgo.Slice<stdgo.GoRune>
cleanClass sorts the ranges (pairs of elements of r), merges them, and eliminates duplicates.
function _dump
function _dump(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>):stdgo.GoString
dump prints a string representation of the regexp showing the structure explicitly.
function _dumpInst
function _dumpInst(_b:stdgo.Ref<stdgo.strings.Builder>, _i:stdgo.Ref<stdgo.regexp.syntax.Inst>):Void
function _dumpProg
function _dumpProg(_b:stdgo.Ref<stdgo.strings.Builder>, _p:stdgo.Ref<stdgo.regexp.syntax.Prog>):Void
function _dumpRegexp
function _dumpRegexp(_b:stdgo.Ref<stdgo.strings.Builder>, _re:stdgo.Ref<stdgo.regexp.syntax.Regexp>):Void
dumpRegexp writes an encoding of the syntax tree for the regexp re to b. It is used during testing to distinguish between parses that might print the same using re's String method.
function _escape
function _escape(_b:stdgo.Ref<stdgo.strings.Builder>, _r:stdgo.GoRune, _force:Bool):Void
function _isCharClass
function _isCharClass(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>):Bool
isCharClass reports whether re can be represented as a character class: a single-rune literal string, a char class, ., or .|\n.
function _isUpperFold
function _isUpperFold(_r:stdgo.GoRune):Bool
function _isValidCaptureName
function _isValidCaptureName(_name:stdgo.GoString):Bool
isValidCaptureName reports whether name is a valid capture name: [A-Za-z0-9_]+. PCRE limits names to 32 bytes. Python rejects names starting with digits. We don't enforce either of those.
function _isalnum
function _isalnum(_c:stdgo.GoRune):Bool
function _literalRegexp
function _literalRegexp(_s:stdgo.GoString, _flags:stdgo.regexp.syntax.Flags):stdgo.Ref<stdgo.regexp.syntax.Regexp>
function _makePatchList
function _makePatchList(_n:stdgo.GoUInt32):stdgo.regexp.syntax.T_patchList
function _matchRune
function _matchRune(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>, _r:stdgo.GoRune):Bool
matchRune reports whether re matches r.
function _mergeCharClass
function _mergeCharClass(_dst:stdgo.Ref<stdgo.regexp.syntax.Regexp>, _src:stdgo.Ref<stdgo.regexp.syntax.Regexp>):Void
mergeCharClass makes dst = dst|src. The caller must ensure that dst.Op >= src.Op, to reduce the amount of copying.
function _minFoldRune
function _minFoldRune(_r:stdgo.GoRune):stdgo.GoRune
minFoldRune returns the minimum rune fold-equivalent to r.
function _mkCharClass
function _mkCharClass(_f:()):stdgo.GoString
function _negateClass
function _negateClass(_r:stdgo.Slice<stdgo.GoRune>):stdgo.Slice<stdgo.GoRune>
negateClass overwrites r and returns r's negation. It assumes the class r is already clean.
function _nextRune
function _nextRune(_s:stdgo.GoString):{
_2:stdgo.Error;
_1:stdgo.GoString;
_0:stdgo.GoRune;
}
function _parse
function _parse(_s:stdgo.GoString, _flags:stdgo.regexp.syntax.Flags):{
_1:stdgo.Error;
_0:stdgo.Ref<stdgo.regexp.syntax.Regexp>;
}
function _repeatIsValid
function _repeatIsValid(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>, _n:stdgo.GoInt):Bool
repeatIsValid reports whether the repetition re is valid. Valid means that the combination of the top-level repetition and any inner repetitions does not exceed n copies of the innermost thing. This function rewalks the regexp tree and is called for every repetition, so we have to worry about inducing quadratic behavior in the parser. We avoid this by only calling repeatIsValid when min or max >= 2. In that case the depth of any >= 2 nesting can only get to 9 without triggering a parse error, so each subtree can only be rewalked 9 times.
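The depth-9 bound follows from the 1000-copy limit: nine nested {2} repetitions give 2^9 = 512 copies, while ten give 1024. The sketch below exercises this through the public Parse function of Go's regexp/syntax (assumed equivalent to this port); nestedRepeat and accepted are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// nestedRepeat builds x{2} wrapped in depth-1 further ( ... ){2} layers,
// so the innermost x is repeated 2^depth times in total.
// (Hypothetical helper, not part of the package.)
func nestedRepeat(depth int) string {
	p := "x{2}"
	for i := 1; i < depth; i++ {
		p = "(" + p + "){2}"
	}
	return p
}

// accepted reports whether pattern parses under the Perl flag set.
func accepted(pattern string) bool {
	_, err := syntax.Parse(pattern, syntax.Perl)
	return err == nil
}

func main() {
	fmt.Println(accepted(nestedRepeat(9)))  // 512 copies
	fmt.Println(accepted(nestedRepeat(10))) // 1024 copies
}
```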
function _simplify1
function _simplify1(_op:stdgo.regexp.syntax.Op, _flags:stdgo.regexp.syntax.Flags, _sub:stdgo.Ref<stdgo.regexp.syntax.Regexp>, _re:stdgo.Ref<stdgo.regexp.syntax.Regexp>):stdgo.Ref<stdgo.regexp.syntax.Regexp>
simplify1 implements Simplify for the unary OpStar, OpPlus, and OpQuest operators. It returns the simple regexp equivalent to
Regexp{Op: op, Flags: flags, Sub: {sub}}
under the assumption that sub is already simple, and without first allocating that structure. If the regexp to be returned turns out to be equivalent to re, simplify1 returns re instead.
simplify1 is factored out of Simplify because the implementation for other operators generates these unary expressions. Letting them call simplify1 makes sure the expressions they generate are simple.
function _testParseDump
function _testParseDump(_t:stdgo.Ref<stdgo.testing.T_>, _tests:stdgo.Slice<stdgo.regexp.syntax.T_parseTest>, _flags:stdgo.regexp.syntax.Flags):Void
Test Parse -> Dump.
function _u32
function _u32(_i:stdgo.GoUInt32):stdgo.GoString
function _unhex
function _unhex(_c:stdgo.GoRune):stdgo.GoRune
function _unicodeTable
function _unicodeTable(_name:stdgo.GoString):{
_1:stdgo.Ref<stdgo.unicode.RangeTable>;
_0:stdgo.Ref<stdgo.unicode.RangeTable>;
}
unicodeTable returns the unicode.RangeTable identified by name and the table of additional fold-equivalent code points.
function _writeRegexp
function _writeRegexp(_b:stdgo.Ref<stdgo.strings.Builder>, _re:stdgo.Ref<stdgo.regexp.syntax.Regexp>):Void
writeRegexp writes the Perl syntax for the regular expression re to b.
function benchmarkEmptyOpContext
function benchmarkEmptyOpContext(_b:stdgo.Ref<stdgo.testing.B>):Void
function benchmarkIsWordChar
function benchmarkIsWordChar(_b:stdgo.Ref<stdgo.testing.B>):Void
function compile
function compile(_re:stdgo.Ref<stdgo.regexp.syntax.Regexp>):{
_1:stdgo.Error;
_0:stdgo.Ref<stdgo.regexp.syntax.Prog>;
}
Compile compiles the regexp into a program to be executed. The regexp should have been simplified already (returned from re.Simplify).
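A minimal sketch of the parse, simplify, compile pipeline, written against Go's regexp/syntax (assumed to behave like this port); compilePattern is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// compilePattern parses pattern, simplifies the tree, and compiles it.
// (Hypothetical helper, not part of the package.)
func compilePattern(pattern string) (*syntax.Prog, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, err
	}
	// Compile expects a simplified tree, so call Simplify first.
	return syntax.Compile(re.Simplify())
}

func main() {
	prog, err := compilePattern(`(a)(b)`)
	if err != nil {
		panic(err)
	}
	// NumCap counts capture slots: two per group, plus two for the whole match.
	fmt.Println(prog.NumCap)
	// Prog.String gives a human-readable listing of the instructions.
	fmt.Print(prog.String())
}
```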
function emptyOpContext
function emptyOpContext(_r1:stdgo.GoRune, _r2:stdgo.GoRune):stdgo.regexp.syntax.EmptyOp
EmptyOpContext returns the zero-width assertions satisfied at the position between the runes r1 and r2. Passing r1 == -1 indicates that the position is at the beginning of the text. Passing r2 == -1 indicates that the position is at the end of the text.
function isWordChar
function isWordChar(_r:stdgo.GoRune):Bool
IsWordChar reports whether r is considered a “word character” during the evaluation of the \b and \B zero-width assertions. These assertions are ASCII-only: the word characters are [A-Za-z0-9_].
function parse
function parse(_s:stdgo.GoString, _flags:stdgo.regexp.syntax.Flags):{
_1:stdgo.Error;
_0:stdgo.Ref<stdgo.regexp.syntax.Regexp>;
}
Parse parses a regular expression string s, controlled by the specified Flags, and returns a regular expression parse tree. The syntax is described in the top-level comment.
function testAppendRangeCollapse
function testAppendRangeCollapse(_t:stdgo.Ref<stdgo.testing.T_>):Void
function testCompile
function testCompile(_t:stdgo.Ref<stdgo.testing.T_>):Void
function testFoldConstants
function testFoldConstants(_t:stdgo.Ref<stdgo.testing.T_>):Void
function testParseFoldCase
function testParseFoldCase(_t:stdgo.Ref<stdgo.testing.T_>):Void
function testParseInvalidRegexps
function testParseInvalidRegexps(_t:stdgo.Ref<stdgo.testing.T_>):Void
function testParseLiteral
function testParseLiteral(_t:stdgo.Ref<stdgo.testing.T_>):Void
function testParseMatchNL
function testParseMatchNL(_t:stdgo.Ref<stdgo.testing.T_>):Void
function testParseNoMatchNL
function testParseNoMatchNL(_t:stdgo.Ref<stdgo.testing.T_>):Void
function testParseSimple
function testParseSimple(_t:stdgo.Ref<stdgo.testing.T_>):Void
function testSimplify
function testSimplify(_t:stdgo.Ref<stdgo.testing.T_>):Void
function testToStringEquivalentParse
function testToStringEquivalentParse(_t:stdgo.Ref<stdgo.testing.T_>):Void
Classes
import stdgo.regexp.syntax.*
class Inst
An Inst is a single instruction in a regular expression program.
var arg:stdgo.GoUInt32
var op:stdgo.regexp.syntax.InstOp
var out:stdgo.GoUInt32
var rune:stdgo.Slice<stdgo.GoInt32>
Inst function new
function new(?op:Null<stdgo.regexp.syntax.InstOp>, ?out:stdgo.GoUInt32, ?arg:stdgo.GoUInt32, ?rune:stdgo.Slice<stdgo.GoInt32>):Void
Inst function _op
function _op():stdgo.regexp.syntax.InstOp
op returns i.Op but merges all the Rune special cases into InstRune
Inst function matchEmptyWidth
function matchEmptyWidth( _before:stdgo.GoRune, _after:stdgo.GoRune):Bool
MatchEmptyWidth reports whether the instruction matches an empty string between the runes before and after. It should only be called when i.Op == InstEmptyWidth.
Inst function matchRune
function matchRune( _r:stdgo.GoRune):Bool
MatchRune reports whether the instruction matches (and consumes) r. It should only be called when i.Op == InstRune.
Inst function matchRunePos
function matchRunePos( _r:stdgo.GoRune):stdgo.GoInt
MatchRunePos checks whether the instruction matches (and consumes) r. If so, MatchRunePos returns the index of the matching rune pair (or, when len(i.Rune) == 1, rune singleton). If not, MatchRunePos returns -1. MatchRunePos should only be called when i.Op == InstRune.
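To make the returned index concrete, the sketch below builds an instruction by hand rather than compiling one. It uses Go's regexp/syntax (assumed to match this port); digitsAndLower is a hypothetical helper, and the Rune slice is assumed to hold sorted lo/hi pairs as the compiler would emit them.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// digitsAndLower returns a hand-built InstRune for the class [0-9a-z].
// (Hypothetical helper, not part of the package.)
func digitsAndLower() *syntax.Inst {
	return &syntax.Inst{
		Op:   syntax.InstRune,
		Rune: []rune{'0', '9', 'a', 'z'}, // pair 0 is 0-9, pair 1 is a-z
	}
}

func main() {
	inst := digitsAndLower()
	fmt.Println(inst.MatchRunePos('5')) // falls in pair 0
	fmt.Println(inst.MatchRunePos('m')) // falls in pair 1
	fmt.Println(inst.MatchRunePos('A')) // in neither pair
	fmt.Println(inst.MatchRune('m'))
}
```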
Inst function string
function string():stdgo.GoString
class Prog
A Prog is a compiled regular expression program.
var inst:stdgo.Slice<stdgo.regexp.syntax.Inst>
var numCap:stdgo.GoInt
var start:stdgo.GoInt
Prog function new
function new(?inst:stdgo.Slice<stdgo.regexp.syntax.Inst>, ?start:stdgo.GoInt, ?numCap:stdgo.GoInt):Void
Prog function _skipNop
function _skipNop( _pc:stdgo.GoUInt32):stdgo.Ref<stdgo.regexp.syntax.Inst>
skipNop follows any no-op or capturing instructions.
Prog function prefix
function prefix():{
_1:Bool;
_0:stdgo.GoString;
}
Prefix returns a literal string that all matches for the regexp must start with. The Bool result (named complete in the Go source) is true if the prefix is the entire match.
Prog function startCond
function startCond():stdgo.regexp.syntax.EmptyOp
StartCond returns the leading empty-width conditions that must be true in any match. It returns ^EmptyOp(0) if no matches are possible.
Prog function string
function string():stdgo.GoString
class Regexp
A Regexp is a node in a regular expression syntax tree.
var cap:stdgo.GoInt
var flags:stdgo.regexp.syntax.Flags
var max:stdgo.GoInt
var min:stdgo.GoInt
var name:stdgo.GoString
var op:stdgo.regexp.syntax.Op
var rune:stdgo.Slice<stdgo.GoInt32>
var rune0:stdgo.GoArray<stdgo.GoInt32>
var sub:stdgo.Slice<stdgo.Ref<stdgo.regexp.syntax.Regexp>>
var sub0:stdgo.GoArray<stdgo.Ref<stdgo.regexp.syntax.Regexp>>
Regexp function new
function new(?op:Null<stdgo.regexp.syntax.Op>, ?flags:Null<stdgo.regexp.syntax.Flags>, ?sub:stdgo.Slice<stdgo.Ref<stdgo.regexp.syntax.Regexp>>, ?sub0:stdgo.GoArray<stdgo.Ref<stdgo.regexp.syntax.Regexp>>, ?rune:stdgo.Slice<stdgo.GoInt32>, ?rune0:stdgo.GoArray<stdgo.GoInt32>, ?min:stdgo.GoInt, ?max:stdgo.GoInt, ?cap:stdgo.GoInt, ?name:stdgo.GoString):Void
Regexp function _capNames
function _capNames( _names:stdgo.Slice<stdgo.GoString>):Void
Regexp function capNames
function capNames():stdgo.Slice<stdgo.GoString>
CapNames walks the regexp to find the names of capturing groups.
Regexp function equal
function equal( _y:stdgo.Ref<stdgo.regexp.syntax.Regexp>):Bool
Equal reports whether x and y have identical structure.
Regexp function maxCap
function maxCap():stdgo.GoInt
MaxCap walks the regexp to find the maximum capture index.
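For a pattern mixing named and unnamed groups, capNames and maxCap can be read off the parse tree. This is a sketch against Go's regexp/syntax (assumed to match this port); captureInfo is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// captureInfo returns the capture names and the highest capture index.
// (Hypothetical helper, not part of the package.)
func captureInfo(pattern string) ([]string, int) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	return re.CapNames(), re.MaxCap()
}

func main() {
	names, max := captureInfo(`(?P<year>\d+)-(\d+)`)
	// Index 0 stands for the whole match; unnamed groups have empty names.
	fmt.Printf("%q %d\n", names, max)
}
```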
Regexp function simplify
function simplify():stdgo.Ref<stdgo.regexp.syntax.Regexp>
Simplify returns a regexp equivalent to re but without counted repetitions and with various other simplifications, such as rewriting /(?:a+)+/ to /a+/. The resulting regexp will execute correctly but its string representation will not produce the same parse tree, because capturing parentheses may have been duplicated or removed. For example, the simplified form for /(x){1,2}/ is /(x)(x)?/ but both parentheses capture as $1. The returned regexp may share structure with or be the original.
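One observable effect of simplification is that counted repetitions disappear from the tree. The sketch below checks this with Go's regexp/syntax (assumed equivalent to this port); hasRepeat and mustParse are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// hasRepeat reports whether any node in the tree is a counted repetition.
// (Hypothetical helper, not part of the package.)
func hasRepeat(re *syntax.Regexp) bool {
	if re.Op == syntax.OpRepeat {
		return true
	}
	for _, sub := range re.Sub {
		if hasRepeat(sub) {
			return true
		}
	}
	return false
}

// mustParse parses pattern with the Perl flags, panicking on error.
func mustParse(pattern string) *syntax.Regexp {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	return re
}

func main() {
	re := mustParse(`a{2,3}`)
	fmt.Println(hasRepeat(re))            // counted repetition present
	fmt.Println(hasRepeat(re.Simplify())) // expanded away
	fmt.Println(re.Simplify())            // textual form of the simplified tree
}
```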
Regexp function string
function string():stdgo.GoString
class T_error
An Error describes a failure to parse a regular expression and gives the offending expression.
var code:stdgo.regexp.syntax.ErrorCode
var expr:stdgo.GoString
T_error function new
function new(?code:Null<stdgo.regexp.syntax.ErrorCode>, ?expr:stdgo.GoString):Void
T_error function error
function error():stdgo.GoString
Typedefs
import stdgo.regexp.syntax.*
typedef EmptyOp
typedef EmptyOp = stdgo.GoUInt8;
An EmptyOp specifies a kind or mixture of zero-width assertions.
typedef ErrorCode
typedef ErrorCode = stdgo.GoString;
An ErrorCode describes a failure to parse a regular expression.
typedef Flags
typedef Flags = stdgo.GoUInt16;
Flags control the behavior of the parser and record information about regexp context.
typedef InstOp
typedef InstOp = stdgo.GoUInt8;
An InstOp is an instruction opcode.
typedef Op
typedef Op = stdgo.GoUInt8;
An Op is a single regular expression operator.
typedef T__struct_0
typedef T__struct_0 = {
};
typedef T__struct_1
typedef T__struct_1 = {
regexp:stdgo.GoString;
prog:stdgo.GoString;
};
typedef T__struct_2
typedef T__struct_2 = {
simple:stdgo.GoString;
regexp:stdgo.GoString;
};