SNOBOL/SPITBOL Patterns for Lua

libspipat Lua wrapper

Abstract

This document is the documentation and reference for lspipat, a Lua 5.1 module.


Thanks To...

lspipat would not be possible without:

  • Phil Budne, for spipat. lspipat is merely a spipat wrapper.
  • Robert Dewar, who created Macro SPITBOL and the GNAT.Spitbol package. spipat was derived from GNAT.Spitbol, which is based on Macro SPITBOL.

Table of Contents

Introduction
1. Resources
2. Comparison with SNOBOL
3. Installation
Dependencies
Configuration Options
4. Usage
5. Examples
6. Variable Deferring Techniques
Recursive Patterns
I. Module Reference
smatch — Perform pattern match on a subject string
ssub — Substitute substrings matching a pattern in a subject
siter — Return iterator of substrings matching a pattern in a subject
free — Finalize pattern
Conversion — Convert a value to a pattern — Render a pattern as a string
dump — Dump a pattern to stdout
Concatenation and Alternation — Concatenate patterns — Alternate patterns
Assignment Calls — Call Immediately — Deferred Call
Cursor Assignment Calls — Cursor Assignment
Predicates — Predicate Constructor
String Primitives — Match any character in a set — Match any character not in a set — Match characters up to a break character — Match characters up to a break character (extending) — Match nothing or characters from a set — Match characters from a set
Arbno — Matches a pattern any number of times
Fence — Abort match when alternations are sought
Integer Primitives — Match a number of characters — Match null string if number of characters have been matched — Match null string if number of characters remain to be matched — Match characters until number of characters have been matched — Match characters until number of characters remain to be matched
Miscellaneous Primitives — Matches any string — Matches parentheses balanced strings — Immediately abort pattern match — Null alternation — Match the entire remaining subject string — Match the null string in every alternative
POSIX Extended Regular Expressions — Matches a pattern equivalent to a regular expression

List of Tables

2.1. Comparison of SPITBOL and lspipat operators
2. Dynamic Function Return Values
3. String Primitives
4. Fence Primitive
5. Integer Primitives
6. Miscellaneous Primitives

List of Examples

6.1. Function Closures for Deferring Purposes
6.2. Custom Constructors for Deferring Purposes
6.3. Generic Retrievers for Deferring Purposes
6.4. Recursive Patterns
6.5. Recursive Pattern Trick
6. Replacements with spipat.ssub
7. Iterating through substrings with spipat.siter
8. Finalizing a pattern
9. Explicit pattern construction & implicit conversion to strings
10. Concatenations and Alternations
11. Regular Expressions

Introduction

lspipat is a wrapper for spipat that brings support for a first-class SNOBOL/SPITBOL-like pattern data type to Lua. Patterns can be constructed and subsequently combined with other patterns, strings, numbers and functions using binary and unary operators, allowing the construction of grammars describing any context-free language. Patterns can be matched against any Lua string. A major difference from other pattern matching techniques like regular expressions, besides the supported language class, is the possibility to construct patterns/grammars in a readable and intuitive way, somewhat reminiscent of BNF.

Patterns can include pattern elements that have side effects (i.e. Lua code executed during pattern matching) or that produce and influence pattern elements dynamically. For instance, functions can be specified that are executed during matching to produce the parameters necessary for the interpretation of a pattern element. Code can be embedded that generates entire patterns on the fly. Matching previously matched substrings and implementing recursive patterns are only two applications of the powerful dynamic pattern elements traditionally offered by SNOBOL pattern matching and thus by lspipat.

SNOBOL/SPITBOL pattern matching was traditionally used in compiler construction and prototyping, artificial intelligence research and the humanities.

Chapter 1. Resources

These internet resources are more or less directly related to lspipat and might be useful to you:

Chapter 2. Comparison with SNOBOL

Just as patterns in SNOBOL are combined and constructed dynamically with binary and unary operators, lspipat also uses operators available in Lua to construct patterns in a simple and intuitive way. The operators and pattern-construction functions were chosen so that the pattern construction syntax is as similar as possible to SNOBOL/SPITBOL. The following table shows a comparison of operators between SPITBOL and lspipat:

Table 2.1. Comparison of SPITBOL and lspipat operators

Operation                  SPITBOL       lspipat              Notes
Alternation                |             +                    (1)
Concatenation              (space)       *                    (1)
Immediate Assignment/Call  $             %                    (2)
Deferred Assignment/Call   .             /                    (2)
Cursor Assignment          @ (unary)     # (unary), Setcur    (3)
Defer Expression           * (unary)     - (unary) or Pred    (4)
Interrogation/Predicate    ? (unary)     - (unary) or Pred    (4)
Pattern Match              ?, (space)    smatch               (5)
Substring Replacement      =             ssub                 (6)

Notes:

(1) Refer to Concatenation and Alternation. Cannot be used to combine two strings.
(2) % and / have the same precedence as * in Lua. Also only call versions are supported (see Chapter 6, Variable Deferring Techniques).
(3) Refer to Cursor Assignment Calls. lspipat only supports a call version (see Chapter 6, Variable Deferring Techniques).
(4) Refer to Predicates. In general, expressions can be wrapped in (anonymous) functions to defer them.
(5) Refer to smatch. S ? P is roughly equivalent to S:smatch(P) in Lua.
(6) Refer to ssub. S P = R is roughly equivalent to S:ssub(P, R, 1) in Lua.


Chapter 3. Installation

lspipat uses an autotools build system. The standard INSTALL file contains instructions on how to use it from a package builder's perspective. Nevertheless, there are some quirks that should be mentioned.

Dependencies

  • spipat 0.9.3+: You are advised to apply the patch spipat-patches/0.9.3+_image.patch first before building spipat, even though it is not mandatory. It fixes a header file (so lspipat can make use of customized render-to-string functionality) and various bugs.

  • Lua 5.1: You probably have this already. The configure script should be able to cope with Ubuntu and Lua Binaries distributions. The standalone Lua compiler is only required if compilation of Lua scripts is enabled.

Configuration Options

The following special configure script options are supported:

--enable-lua-libdir=DIR

Change the installation directory of lspipat. It defaults to LIBDIR/lua/5.1. You probably want this to point to some directory in Lua's module search path, so the default should be ok.

--disable-lua-precompile

Disable precompilation of Lua source files. Naturally, a Lua compiler will not be required when this option is used.

--disable-lua-strip

Do not strip (i.e. remove debugging symbols from) compiled Lua sources.

--disable-html-doc

Do not generate HTML documentation. The documentation is usually derived from DocBook using xsltproc. Disabling this may be useful if you have a problem with the tool chain but are satisfied with the precompiled documentation in the distribution.

Furthermore, you should note that by default render-to-string results do not resemble the lspipat syntax used in this document. For lspipat to be able to customize these renderings, configure has to find some spipat headers which are not normally installed. Therefore it is highly recommended to add spipat's source directory to the C include search path using the CPPFLAGS variable before running configure.

Thus, supposing that spipat sources are located in your home directory, the most common way to install lspipat would be:

./configure CPPFLAGS="-I$HOME/spipat-0.9.3+"
make install

Chapter 4. Usage

After lspipat has been installed properly, you will be able to use it in your Lua program by simply requiring lspipat (i.e. require "lspipat").

The module table will be called spipat, but many functions (especially pattern constructors) will be registered as globals as well. Also, some operators will be overloaded. For details on all that (operators, globals, etc.) refer to Module Reference.
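
As a minimal sketch of the steps above, assuming lspipat is installed where require can find it, the following loads the module and runs one match. The subject string is made up for illustration, and the pcall guard is only there so the snippet does not fail when the module is missing.

```lua
-- Sketch only: assumes lspipat is installed on Lua's module search path.
local ok = pcall(require, "lspipat")

if ok then
    local subject = "key=value"   -- illustrative subject
    -- Span is one of the constructors registered as a global;
    -- the module table itself is available as spipat.
    local s, e = spipat.smatch(subject, Span("abcdefghijklmnopqrstuvwxyz"))
    if s then
        print(subject:sub(s, e))  -- the matched substring
    else
        print("no match")
    end
else
    print("lspipat is not available")
end
```

The same call can be written as a string method, subject:smatch(pattern), as listed in the Module Reference.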

Chapter 5. Examples

The samples directory in the lspipat source package contains some small examples that I hope give you some inspiration on how and where to use lspipat.

samples/exp2bf.lua

exp2bf.lua expression

Compiles simple arithmetic expressions to Brainfuck programs that when executed evaluate the expression and print the result (8-bit unsigned integer arithmetic). Prints these programs to stdout.

Use that for whatever you can imagine ;-)

samples/wave.lua

wave.lua wavefile

Validates/parses WAV files and prints some information about them.

This is an example of how to use lspipat to do pattern matching on "binary" data (formats, protocols). Some primitives were implemented in Lua for that reason - in the future there might be a separate C-module to do the encoding/decoding of integers in different byte-orders more efficiently.

samples/regexp.lua

Small regular expression example/test - uses a comprehensive regular expression describing IPs.

Chapter 6. Variable Deferring Techniques

In SNOBOL, arbitrary expressions could be deferred (i.e. their evaluation could be deferred) by using the unary asterisk operator. With lspipat however, you will have to pass functions (which can be constructed anonymously) to the appropriate constructors to achieve the same goal.

Deferring expressions which should be combined with other patterns is one application of the Pred constructor and - operator respectively.

Deferring variables is just a special case of deferring expressions. In this chapter, different ways of optimizing variable deferrings will be explained using a simple example.

For instance, if you would like to assign a matched quotation character to a local variable and use that to subsequently match a simple quote/string, you could use function closures to write something like this:

Example 6.1. Function Closures for Deferring Purposes

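local cquote
-- % calls the function immediately, so cquote is already set when the
-- Break and Pred functions run. A call deferred with / would only
-- happen after the whole match has succeeded.
-- The result is named quoted so that Lua's string library is not overwritten.
quoted = Any("\"'") % function(c) cquote = c end
       * Break(function() return cquote end)
       * -function() return cquote end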

You may find this solution a bit verbose, compared with SNOBOL's elegant syntax. To save some typing you could define your own constructors that take the name of a global variable (as a string) and construct patterns whose arguments are retrieved by a function closure accessing the globals table.

Example 6.2. Custom Constructors for Deferring Purposes

function _Break(name)
       return Break(function() return _G[name] end)
end
function _Pred(name)
       return -function() return _G[name] end
end

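-- % (call immediately) sets the global cquote before _Break and _Pred read it.
-- The result is named quoted so that Lua's string library is not overwritten.
quoted = Any("\"'") % function(c) cquote = c end
       * _Break "cquote"
       * _Pred  "cquote"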

Of course, if you do not want to pollute the global namespace your custom functions could just as well access a local table. Furthermore, you could optimize the code by defining one generic table access function which is suitable to be used for lspipat's pattern constructors - being able to pass so-called cookies to functions comes in handy.

Example 6.3. Generic Retrievers for Deferring Purposes

function getGlobal(name) return _G[name] end
function _Break(name) return Break(getGlobal, name) end
function _Pred(name) return Pred(getGlobal, name) end
-- ...

Fortunately, lspipat already defines such constructors (deferring global variables) for you. Wherever possible, there will be versions of constructors with leading underscores that work similarly to the ones in the example above. You can of course overwrite these constructors, e.g. with versions accessing a special local table.
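
With the predefined underscore constructors, the quotation example might be written roughly as follows. This is a sketch that assumes lspipat is installed; the subject string and the name quoted are illustrative only. It uses % so that cquote is assigned while matching is still in progress, and cquote has to be a global because the underscore constructors look names up in the globals table.

```lua
-- Sketch only: assumes lspipat is installed on Lua's module search path.
local ok = pcall(require, "lspipat")

if ok then
    -- cquote is deliberately global: _Break and -"cquote" read _G.cquote.
    local quoted = Any("\"'") % function(c) cquote = c end
                 * _Break("cquote")
                 * -"cquote"   -- unary minus on a non-numeric string acts like _Pred

    local subject = [[say "hi" now]]   -- illustrative subject
    local s, e = subject:smatch(quoted)
    if s then
        print(subject:sub(s, e))
    end
end
```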

Recursive Patterns

Recursive patterns can be implemented just as described above. Supposing you want to match the repetition of the predefined pattern P (greedy), you could write something like this:

Example 6.4. Recursive Patterns

foo = P * -"foo" + ""

Sometimes, however, when using global variables is inappropriate, you might want to use the following trick:

Example 6.5. Recursive Pattern Trick

local function foo() return foo end
foo = P * -foo + ""

It works because foo is still a function when the right-hand side of the assignment is evaluated, but holds a pattern afterwards. The function, to which no (direct) reference exists anymore, therefore returns the pattern foo once the assignment has been made.
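
The scoping behaviour behind this trick can be observed in plain Lua without lspipat. In the sketch below a string stands in for the pattern, and getter is a hypothetical name for the reference that the pattern would keep to the function.

```lua
-- Plain Lua, no lspipat required.
-- `local function` declares foo first, so the body refers to the local
-- variable foo itself (an upvalue), not to a copy of its value.
local function foo() return foo end

local getter = foo                 -- stands in for the reference the pattern keeps
foo = "stand-in for the pattern"   -- rebinds the same local, like `foo = P * -foo + ""`

-- The old function reads the variable at call time, so it sees the new value.
print(getter() == foo)             -- true
```

Because the function looks the variable up only when it is called, it returns whatever foo holds at match time rather than what it held when the function was created.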

Module Reference


A compilation of all functions in the lspipat module, global functions registered by the module, methods and overloaded operators follows.

Name

smatch — Perform pattern match on a subject string

Synopsis

spipat.smatch ( subject , pattern [, flags] )

subject:smatch ( pattern [, flags] )

Description

Tries to match pattern against subject using the given flags.

Parameters

  1. subject (string): A string against which the pattern match will be performed
  2. pattern (userdata): The pattern used for matching
  3. flags (number or nil): Optional spipat flags.

Spipat Flags

Flags are combined by addition (e.g. spipat.match_anchored + spipat.match_debug), because Lua 5.1 has no bitwise OR operator.

  • spipat.match_anchored: Match in anchored mode
  • spipat.match_debug: Match with progress being printed to stdout. Useful for pattern debugging as the name suggests.

Return Values

In case of an exception during matching, raises an error. In case no substring matches, returns a single nil value. Otherwise returns

  1. number: Start of matched substring
  2. number: End of matched substring
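
The three outcomes described above (error, no match, match) might be told apart as in the following sketch, which assumes lspipat is installed. try_match and the sample subject are hypothetical; only smatch, topattern and the match_anchored flag are taken from this reference.

```lua
-- Sketch only: assumes lspipat is installed on Lua's module search path.
local ok = pcall(require, "lspipat")

if ok then
    -- Hypothetical helper that turns smatch's outcomes into a description.
    local function try_match(subject, pattern, flags)
        -- smatch raises an error on a matching exception, so call it via pcall.
        local success, s, e = pcall(spipat.smatch, subject, pattern, flags)
        if not success then
            return "error: " .. tostring(s)
        end
        if s == nil then
            return "no match"
        end
        return ("matched %d..%d: %s"):format(s, e, subject:sub(s, e))
    end

    local pat = topattern("bar")
    print(try_match("foobar", pat))                         -- unanchored
    print(try_match("foobar", pat, spipat.match_anchored))  -- anchored
end
```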

Name

ssub — Substitute substrings matching a pattern in a subject

Synopsis

spipat.ssub ( subject , pattern , replacement [ , n [, flags]] )

subject:ssub ( pattern , replacement [ , n [, flags]] )

Description

Substitutes the regions in subject that match pattern. If replacement is a string, each region is replaced with that string; if replacement is a function, each region is replaced with the result of calling that function. The function form is useful for deferring the evaluation of replacement strings which depend on (are built from) results of the matching process (e.g. call-on-match or call-immediately function executions).

Parameters

  1. subject (string): The subject for the first pattern match
  2. pattern (userdata): The pattern used for matching
  3. replacement (string or function): Replacement string or a function that's executed after matching to produce the replacement string
  4. n (number or nil): Optional maximum number of match/replacement operations. The first match is performed on subject, subsequent matches on the result of the preceding replacements. If n is absent or nil, replacement only stops when pattern does not match anymore.
  5. flags (number or nil): Optional spipat flags, as in the section called “Spipat Flags”.

Return Values

In case of an exception during matching, raises an error. Otherwise returns

  1. string: The result of the last replacement performed or the original subject if no substring matched at all
  2. number: The number of match/replacement operations actually performed

Example

Example 6. Replacements with spipat.ssub

> print(spipat.ssub("abc ccC bab", Span("abc") / function(s) str = s end, function() return "["..str:upper().."]" end, 2))
[ABC] [CC]C BaB
>


Name

siter — Return iterator of substrings matching a pattern in a subject

Synopsis

spipat.siter ( subject , pattern [, flags] )

subject:siter ( pattern [, flags] )

Description

Returns an iterator function that performs a pattern match on subject and returns the start and end positions of the matched substring in subject. Each time it is called, it begins matching where the last substring ended, but using the same subject.

Parameters

  1. subject (string): The subject used for pattern matching
  2. pattern (userdata): The pattern used for matching. Naturally, anchoring the pattern using any of the possible methods is nonsense.
  3. flags (number or nil): Optional spipat flags, as in the section called “Spipat Flags”.

Return Values

In case of an exception during matching, raises an error. Otherwise returns

  1. function: The iterator function. Calling it returns
    1. number: Start of matched substring
    2. number: End of matched substring

Example

Example 7. Iterating through substrings with spipat.siter

> str = "abc"
> for s, e in str:siter(Len(1)) do print(str:sub(s, e)) end
a
b
c
>


Name

free — Finalize pattern

Synopsis

spipat.free ( pattern )

pattern:free ()

Description

Finalizes pattern, i.e. frees memory associated with it and unreferences any other Lua values (other patterns, functions, etc.) so they can get garbage collected.

Finalizing an already finalized pattern does nothing. Using a finalized pattern in any function or operator working with a pattern will raise an error.

Tip

free does early what would otherwise be done when the pattern is garbage collected, so in most cases you will not need it at all. It may be useful when you would like to free a large pattern you do not need anymore but removing all references to that pattern and enforcing a full garbage collection cycle is not feasible.

Parameters

  1. pattern (userdata): The pattern to be finalized

Return Values

Returns nothing.

Example

Example 8. Finalizing a pattern

> p = Arb()
> p:free()
> print(p * "foo")
stdin:1: Pattern already freed
>


Name

topattern — Convert a value to a pattern

tostring — Render a pattern as a string

Synopsis

spipat.topattern ( value )

topattern ( value )

value:topattern ()

tostring ( pattern )

Description

topattern creates a pattern for a string or number, matching that string or number. If value is already a pattern it returns that pattern without modification. In case of an unsupported value type or miscellaneous error, topattern always returns nil.

Tip

topattern is useful to explicitly create a pattern, e.g. when an operator requires at least one operand to be a pattern but both are strings, numbers or functions.

Lua's built-in tostring function called on a pattern renders that pattern as a string reminiscent of lspipat's pattern construction syntax.

Example

Example 9. Explicit pattern construction & implicit conversion to strings

> print("2" + 3)
5
> print(topattern("2") + 3)
("2" + "3")
>


Name

dump — Dump a pattern to stdout

Synopsis

spipat.dump ( pattern )

Description

dump prints information about a pattern to stdout. The kind of information displayed is similar to tostring's rendering.

It is useful for debugging purposes.

Parameters

  1. pattern (userdata): The pattern to be dumped

Return Values

Returns nothing.


Name

* — Concatenate patterns

+ — Alternate patterns

Synopsis

pattern * value

value * pattern

pattern * pattern

pattern + value

value + pattern

pattern + pattern

Description

The * operator constructs a concatenation of two values if at least one of them is a pattern and returns the result as a pattern. A concatenation matches the left operand immediately followed by the right operand.

The + operator constructs an alternation between two values if at least one of them is a pattern and returns the result as a pattern. An alternation matches the left operand and, if that is unsuccessful, the right operand.

The non-pattern values may be strings or numbers, which are matched just like a pattern built by topattern.

Note

Even though the patterns participating in the composition will be copied, references will be kept, so they will not be garbage collected until all patterns using them are garbage collected.

Return Values

  1. pattern (userdata): Result of the pattern composition

Example

Example 10. Concatenations and Alternations

> pat = (topattern("ABC") + "AB") * (topattern("DEF") + "CDE") * (topattern("GH") + "IJ")
> assert(spipat.smatch("ABCCDEGH", pat))
> assert(spipat.smatch("ABCDEFIJ", pat))
>


Name

% — Call Immediately

/ — Deferred Call

Synopsis

pattern % function

pattern / function

Description

The % operator constructs a pattern matching operand pattern and calling a Lua function whenever pattern matches during a pattern match (i.e. function may be called more than once while matching regardless of whether the match fails or succeeds).

On the other hand, the / operator constructs a pattern matching operand pattern and calling a Lua function at most once - only if the match succeeds.

In both cases, function receives the following arguments when called:

  1. string: The substring matched by pattern

Its return value is ignored.

Note

Unlike assignment operators in SNOBOL, the % and / operators in Lua have the same precedence as the concatenation operator *, so using parentheses is advised.
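
The grouping can be checked in plain Lua with a small stand-in type that records how its operands were combined. The node helper below is purely illustrative and is not part of lspipat.

```lua
-- Plain Lua, no lspipat required: record how * and % group their operands.
local mt = {}
local function node(s) return setmetatable({ s = s }, mt) end
local function str(v) return type(v) == "table" and v.s or tostring(v) end

mt.__mul = function(a, b) return node("(" .. str(a) .. " * " .. str(b) .. ")") end
mt.__mod = function(a, b) return node("(" .. str(a) .. " % " .. str(b) .. ")") end

local A, B = node("A"), node("B")

-- Same precedence, left-associative: f is attached to the whole concatenation.
print((A * B % "f").s)     -- ((A * B) % f)
-- Parentheses attach f to B only.
print((A * (B % "f")).s)   -- (A * (B % f))
```

With real patterns the first form would call the function with the substring matched by A followed by B, the second only with the substring matched by B.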

Tip

Deferred assignments (assign on match & assign immediately) are not directly possible but can be easily implemented using function closures as described in Chapter 6, Variable Deferring Techniques.

Note

Even though the pattern operands will be copied, references will be kept, so they will not be garbage collected until all patterns using them are garbage collected.

Furthermore, references to functions will be kept so they will not be garbage collected until the patterns constructed by the operators are garbage collected.

Return Values

  1. pattern (userdata): Pattern built by the operators

Name

Setcur — Cursor Assignment

Synopsis

spipat.Setcur ( function [, cookie] )

Setcur ( function [, cookie] )

# function

spipat._Setcur ( string )

_Setcur ( string )

Description

Setcur is a pattern constructor returning a pattern matching the null string "" (i.e. always succeeds when matched) and immediately calling a Lua function when matched. This function receives the following arguments when called:

  1. number: The cursor in the subject string. In other words, the number of characters matched so far from the beginning of the subject string.
  2. cookie: Any Lua value specified as a cookie in the pattern constructor or nil if no cookie was specified.

Its return value is ignored.
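
A sketch of how the cookie can distinguish several cursor callbacks that share one function, assuming lspipat is installed. The names positions and record as well as the subject are illustrative.

```lua
-- Sketch only: assumes lspipat is installed on Lua's module search path.
local ok = pcall(require, "lspipat")

if ok then
    local positions = {}
    -- Receives the cursor and the cookie given to Setcur.
    local function record(cursor, label)
        positions[label] = cursor
    end

    local pat = Setcur(record, "before") * "lua" * Setcur(record, "after")

    -- The callbacks run immediately during matching, so they may fire for
    -- attempts that fail later; the table holds the values written last.
    if ("hello lua"):smatch(pat) then
        print(positions.before, positions.after)
    end
end
```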

Tip

The unary # operator is equivalent to the Setcur constructor with no cookie specified.

_Setcur is similar to Setcur but actually assigns the cursor position to the global variable whose name is specified by a string value. This means that _Setcur(str) does not assign the cursor position to the global variable str but rather to the variable with the name str contains, e.g. foo if str == "foo". So generally _Setcur is equivalent to:

function _Setcur(val)
	return #function(cursor) _G[val] = cursor end
end

In a similar manner, other kinds of deferred assignments can be implemented using function closures as described in Chapter 6, Variable Deferring Techniques.

Note

References to function and cookie will be kept so they will not be garbage collected until the pattern constructed by Setcur is garbage collected.

Return Values

  1. pattern (userdata): Pattern built by the constructor

Name

Pred — Predicate Constructor

Synopsis

spipat.Pred ( function [, cookie] )

Pred ( function [, cookie] )

- function

spipat._Pred ( string )

_Pred ( string )

- string

Description

Pred constructs a pattern which allows you to transparently define its matching behaviour using a function called when this pattern is attempted to be matched. It receives the following arguments when invoked:

  1. cookie: Any Lua value specified as a cookie in the pattern constructor or nil if no cookie was specified.

The function's return value defines the behaviour dynamically, as shown in the following table:

Table 2. Dynamic Function Return Values

Value    Type       Behaviour
nil      nil        Match the "" string, i.e. succeed.
true     boolean    Match the "" string, i.e. succeed.
false    boolean    Pattern match fails, like when using the Fail primitive.
any      number     Try to match that number as a string, as if converted to a pattern.
any      string     Try to match that string, as if converted to a pattern.
any      pattern    Try to match that pattern. Returning a pattern assigned to a variable is the way to implement recursive patterns.
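
The following sketch shows a Pred function whose result changes between matches, assuming lspipat is installed. The table env, the function lookup and the subjects are illustrative; the function reads a local table instead of the globals table and receives the key as its cookie.

```lua
-- Sketch only: assumes lspipat is installed on Lua's module search path.
local ok = pcall(require, "lspipat")

if ok then
    local env = {}
    -- Called whenever the Pred element is attempted; the cookie is the key.
    local function lookup(key) return env[key] end

    local pat = Pos(0) * Pred(lookup, "prefix") * "-" * Span("0123456789")

    env.prefix = "ID"   -- a string result is matched literally
    print(("ID-42"):smatch(pat) ~= nil)

    env.prefix = nil    -- a nil result matches the null string
    print(("-42"):smatch(pat) ~= nil)
end
```

The pattern is built once; only the value returned by lookup at match time differs between the two calls.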


Tip

The unary - operator applied to a function is equivalent to the Pred constructor with no cookie specified.

_Pred is similar to Pred but actually gets the Lua value defining its behaviour from the global variable whose name is specified by a string value. This means that _Pred(str) does not get the value from the global variable str but rather from the variable with the name str contains, e.g. foo if str == "foo". So generally _Pred is equivalent to:

function _Pred(val)
	return -function() return _G[val] end
end

In a similar manner, other kinds of variable deferring as well as recursive patterns can be implemented using function closures as described in Chapter 6, Variable Deferring Techniques.

Tip

The unary - operator applied to a string which is not convertible to a number is equivalent to the _Pred constructor - naturally this should be true for all global variable names. This constraint comes from the way Lua handles operations by default (it checks whether it is an arithmetic operation before evaluating any metamethod - see metatables).
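
The mechanism can be sketched in plain Lua without lspipat. The handler installed below is a hypothetical stand-in that only returns a description; it is not lspipat's implementation.

```lua
-- Plain Lua, no lspipat required.
local string_mt = getmetatable("")   -- metatable shared by all strings
local saved = string_mt.__unm

-- Hypothetical stand-in for the handler a module could install.
string_mt.__unm = function(name)
    return "deferred lookup of " .. name
end

-- "foo" cannot be converted to a number, so the metamethod handles it.
local deferred = -"foo"
print(deferred)                      -- deferred lookup of foo

-- On Lua 5.1, -"10" is ordinary arithmetic and never reaches the metamethod.

string_mt.__unm = saved              -- restore the previous handler
```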

Note

References to function and cookie will be kept so they will not be garbage collected until the pattern constructed by Pred is garbage collected.

Return Values

  1. pattern (userdata): Pattern built by the constructor

Name

Any — Match any character in a set

NotAny — Match any character not in a set

Break — Match characters up to a break character

BreakX — Match characters up to a break character (extending)

NSpan — Match nothing or characters from a set

Span — Match characters from a set

Synopsis

[spipat.]Any ( set )

[spipat.]Any ( function [, cookie] )

[spipat.]_Any ( string )

[spipat.]NotAny ( set )

[spipat.]NotAny ( function [, cookie] )

[spipat.]_NotAny ( string )

[spipat.]Break ( set )

[spipat.]Break ( function [, cookie] )

[spipat.]_Break ( string )

[spipat.]BreakX ( set )

[spipat.]BreakX ( function [, cookie] )

[spipat.]_BreakX ( string )

[spipat.]NSpan ( set )

[spipat.]NSpan ( function [, cookie] )

[spipat.]_NSpan ( string )

[spipat.]Span ( set )

[spipat.]Span ( function [, cookie] )

[spipat.]_Span ( string )

Description

String primitives are pattern constructors that in their first form all take a string or number (which is converted to a string) as their sole argument (set).

In their second form they take a Lua function and an optional cookie as arguments. When the constructed pattern is about to be matched, the function is called and is supposed to return a string or number (which is converted to a string) to supply the primitive's argument dynamically. It receives the following arguments when invoked:

  1. cookie: Any Lua value specified as a cookie in the pattern constructor or nil if no cookie was specified.

The primitives with a leading underscore (e.g. _Any) are similar but actually get their argument from a global variable with the name a string argument contains. This means that for instance _Any(str) does not get its character set from the global variable str but rather from the variable with the name str contains, e.g. foo if str == "foo". So generally _Any is equivalent to:

function _Any(val)
	return Any(function() return _G[val] end)
end

In a similar manner, other kinds of variable deferring can be implemented using function closures as described in Chapter 6, Variable Deferring Techniques.

Note

References to function and cookie will be kept so they will not be garbage collected until the pattern constructed is garbage collected.

The following table describes what these primitives do:

Table 3. String Primitives

Primitive    Description
Any( S )

Where S is a string, matches a single character that is any one of the characters in S. Fails if the current character is not one of the given set of characters.

NotAny( S )

Where S is a string, matches a single character that is not one of the characters of S. Fails if the current character is one of the given set of characters.

Break( S )

Where S is a string, matches a string of zero or more characters up to but not including a break character that is one of the characters given in the string S. Can match the null string, but cannot match the last character in the string, since a break character is required to be present.

BreakX( S )

Where S is a string, behaves exactly like Break(S) when it first matches, but if a string is successfully matched, then a subsequent failure causes an attempt to extend the matched string.

NSpan( S )

Where S is a string, matches a string of zero or more characters that are among the characters given in the string. Always matches the longest possible such string. Always succeeds, since it can match the null string.

Span( S )

Where S is a string, matches a string of one or more characters that are among the characters given in the string. Always matches the longest possible such string. Fails if the current character is not one of the given set of characters.


Return Values

  1. pattern (userdata): Pattern built by the constructor

Name

Arbno — Matches a pattern any number of times

Synopsis

spipat.Arbno ( P )

Arbno ( P )

Description

Where P is any pattern, matches any number of instances of the pattern, starting with zero occurrences. It is thus equivalent to ("" + (P * ("" + (P * ("" + ...))))). The pattern P may contain any number of pattern elements including the use of alternation and concatenation.

Arbno is a pattern constructor taking exactly one argument which is either a pattern or string (which is treated like it is converted to a pattern first).
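
As a sketch of a typical use, assuming lspipat is installed, Arbno can describe the repeated tail of a comma-separated list. The pattern names and subjects are illustrative.

```lua
-- Sketch only: assumes lspipat is installed on Lua's module search path.
local ok = pcall(require, "lspipat")

if ok then
    local number = Span("0123456789")
    -- One number followed by any number of ",number" groups;
    -- Pos(0) and RPos(0) tie the match to the whole subject.
    local list = Pos(0) * number * Arbno("," * number) * RPos(0)

    for _, subject in ipairs({ "1,22,333", "1,,2" }) do
        -- Prints each subject and whether the whole string matched.
        print(subject, subject:smatch(list) ~= nil)
    end
end
```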

Note

A reference to P will be kept if it is a pattern so it will not be garbage collected until the pattern constructed is garbage collected.

Return Values

  1. pattern (userdata): Pattern built by Arbno

Name

Fence — Abort match when alternations are sought

Synopsis

spipat.Fence ( [P] )

Fence ( [P] )

Description

Fence is a pattern constructor taking either no argument or exactly one pattern as its argument.

Note

A reference to pattern P will be kept so it will not be garbage collected until the pattern constructed is garbage collected.

The following table describes what the two versions do:

Table 4. Fence Primitive

Primitive    Description
Fence()

Matches the null string at first, and then if a failure causes alternatives to be sought, aborts the match (like a Cancel). Note that using Fence at the start of a pattern has the same effect as matching in anchored mode.

Fence( P )

Where P is a pattern, attempts to match the pattern P including trying all possible alternatives of P. If none of these alternatives succeeds, then the Fence pattern fails. If one alternative succeeds, then the pattern match proceeds, but on a subsequent failure, no attempt is made to search for alternative matches of P. The pattern P may contain any number of pattern elements including the use of alternation and concatenation.


Return Values

  1. pattern (userdata): Pattern built by Fence

Name

Len — Match a number of characters

Pos — Match null string if number of characters have been matched

RPos — Match null string if number of characters remain to be matched

Tab — Match characters until number of characters have been matched

RTab — Match characters until number of characters remain to be matched

Synopsis

[spipat.]Len ( [n] )

[spipat.]Len ( function [, cookie] )

[spipat.]_Len ( string )

[spipat.]Pos ( [n] )

[spipat.]Pos ( function [, cookie] )

[spipat.]_Pos ( string )

[spipat.]RPos ( [n] )

[spipat.]RPos ( function [, cookie] )

[spipat.]_RPos ( string )

[spipat.]Tab ( [n] )

[spipat.]Tab ( function [, cookie] )

[spipat.]_Tab ( string )

[spipat.]RTab ( [n] )

[spipat.]RTab ( function [, cookie] )

[spipat.]_RTab ( string )

Description

Integer primitives are pattern constructors that, in their first form, all take a number or a string (which is converted to a number) as their sole argument (n). This number has to be an unsigned integer, or for some primitives a natural number.

Tip

If the argument is omitted, zero is assumed.

In their second form the primitives take a Lua function and an optional cookie as arguments. When the constructed pattern is about to be matched, the function is called and is supposed to return a number or string (which is converted to a number) to supply the primitive's argument dynamically. It receives the following arguments when invoked:

  1. cookie: Any Lua value specified as a cookie in the pattern constructor or nil if no cookie was specified.

The primitives with a leading underscore (e.g. _Len) are similar, but get their argument from the global variable whose name is given by the string argument. For instance, _Len(str) does not read the global variable str; it reads the variable named by the contents of str, e.g. foo if str == "foo". So _Len is generally equivalent to:

function _Len(val)
	return Len(function() return _G[val] end)
end

In a similar manner, other kinds of variable deferring can be implemented using function closures as described in Chapter 6, Variable Deferring Techniques.
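
The difference between an argument fixed at construction time and one looked up when the pattern is matched can be sketched without the module. In the following plain Lua model, Len and _Len are stand-ins written for this illustration only (they are not the lspipat constructors), a "pattern" is simply a function from a subject and a count of already matched characters to a new count, and the global variable width is a made-up example name.

```lua
-- Stand-in for Len, NOT the lspipat constructor: accepts a number, a numeric
-- string, or a function plus optional cookie, as in the synopsis above.
local function Len(arg, cookie)
  return function(subject, cursor)
    local n = arg
    if type(arg) == "function" then
      n = arg(cookie) -- evaluated at match time, not at construction time
    end
    n = tonumber(n or 0) -- an omitted argument is treated as zero
    if n and cursor + n <= #subject then
      return cursor + n
    end
    return nil -- not enough characters left
  end
end

-- Stand-in for _Len: defer to the global variable named by the string.
local function _Len(name)
  return Len(function() return _G[name] end)
end

_G.width = 2 -- hypothetical global used only for this example
local by_name = _Len("width")
local by_cookie = Len(function(c) return c.n end, { n = 3 })

print(by_name("abcdef", 0))   -- 2
_G.width = 4                  -- changing the global changes later matches
print(by_name("abcdef", 0))   -- 4
print(by_cookie("abcdef", 0)) -- 3
```

Because the function is only called when the pattern is applied, the same constructed pattern picks up the new value of width on its second use.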

Note

References to function and cookie will be kept so they will not be garbage collected until the pattern constructed is garbage collected.

The following table describes what these primitives do:

Table 5. Integer Primitives

Primitive    Description
Len( N )

Where N is a natural number, matches the given number of characters. For example, Len(10) matches any string that is exactly ten characters long.

Pos( N )

Where N is a natural number, matches the null string if exactly N characters have been matched so far, and otherwise fails.

RPos( N )

Where N is a natural number, matches the null string if exactly N characters remain to be matched, and otherwise fails.

Tab( N )

Where N is a natural number, matches characters from the current position until exactly N characters have been matched in all. Fails if more than N characters have already been matched.

RTab( N )

Where N is a natural number, matches characters from the current position until exactly N characters remain to be matched in the string. Fails if fewer than N unmatched characters remain in the string.
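
The conditions in the table can be restated as a small model in plain Lua. This is a sketch of the described semantics only and does not use lspipat: each function below is a stand-in that takes the subject and the number of characters matched so far (cursor) and returns the new cursor, or nil on failure. That Tab also fails when N exceeds the subject length is an assumption of this model.

```lua
-- Stand-ins that model Table 5; NOT the lspipat constructors.
-- cursor is the number of characters matched so far (0 at the start).

local function Len(n)
  return function(subject, cursor)
    if cursor + n <= #subject then return cursor + n end
  end
end

local function Pos(n)
  return function(subject, cursor)
    if cursor == n then return cursor end -- matches the null string
  end
end

local function RPos(n)
  return function(subject, cursor)
    if #subject - cursor == n then return cursor end -- matches the null string
  end
end

local function Tab(n)
  return function(subject, cursor)
    -- Assumption: n beyond the end of the subject is treated as a failure.
    if cursor <= n and n <= #subject then return n end
  end
end

local function RTab(n)
  return function(subject, cursor)
    local target = #subject - n
    if target >= cursor then return target end
  end
end

local s = "hello world" -- 11 characters
print(Len(5)(s, 0))  -- 5
print(Pos(0)(s, 3))  -- nil: three characters have already been matched
print(RPos(0)(s, 11)) -- 11: nothing remains, so the null string matches
print(Tab(6)(s, 2))  -- 6
print(RTab(5)(s, 0)) -- 6: stops with "world" still unmatched
```

Pos and RPos never move the cursor, whereas Len, Tab and RTab consume characters up to the position they compute.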


Return Values

  1. pattern (userdata): Pattern built by the constructor

Name

Arb — Matches any string

Bal — Matches parentheses balanced strings

Abort — Immediately abort pattern match

Fail — Null alternation

Rem — Match the entire remaining subject string

Succeed — Match the null string in every alternative

Synopsis

spipat.Arb ()

Arb ()

spipat.Bal ()

Bal ()

spipat.Abort ()

Abort ()

spipat.Fail ()

Fail ()

spipat.Rem ()

Rem ()

spipat.Succeed ()

Succeed ()

Description

These are simple pattern constructor functions.

The following table describes what these primitives do:

Table 6. Miscellaneous Primitives

Primitive    Description
Arb()

Matches any string. First it matches the null string, and then on a subsequent failure, matches one character, and then two characters, and so on. It fails only once the entire remaining string has been matched and yet another alternative is sought.

Bal()

Matches a non-empty string that is parentheses balanced with respect to ordinary () characters. Examples of balanced strings are "ABC", "A((B)C)", and "A(B)C(D)E". Bal matches the shortest possible balanced string on the first attempt, and if there is a subsequent failure, attempts to extend the string.

Abort()

Immediately aborts the entire pattern match, signalling failure. This is a specialized pattern element, which is useful in conjunction with some of the special pattern elements that have side effects.

Fail()

The null alternation. Matches no possible strings, so it always signals failure. This is a specialized pattern element, which is useful in conjunction with some of the special pattern elements that have side effects.

Rem()

Matches from the current point to the last character in the string. This is a specialized pattern element, which is useful in conjunction with some of the special pattern elements that have side effects.

Succeed()

Repeatedly matches the null string (it is equivalent to the alternation "" + "" + "" ...). This is a special pattern element, which is useful in conjunction with some of the special pattern elements that have side effects.
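
The way Bal first matches the shortest balanced string and then extends it on failure can be modelled in plain Lua. The function below is a hypothetical helper written for this illustration, not part of lspipat: it lists, in the order they would be tried, the end positions of every non-empty balanced string beginning at a given start position.

```lua
-- Model of the Bal description; NOT the lspipat implementation.
-- Returns the end positions (1-based, inclusive) of each balanced string
-- starting at `start`, shortest first.
local function bal_stops(subject, start)
  local stops, pos = {}, start
  while pos <= #subject do
    local c = subject:sub(pos, pos)
    if c == ")" then break end -- an unmatched ")" cannot be part of the string
    if c == "(" then
      -- Consume a whole parenthesized group as one unit.
      local depth = 1
      pos = pos + 1
      while depth > 0 do
        if pos > #subject then return stops end -- group is never closed
        local d = subject:sub(pos, pos)
        if d == "(" then
          depth = depth + 1
        elseif d == ")" then
          depth = depth - 1
        end
        pos = pos + 1
      end
    else
      pos = pos + 1 -- an ordinary character is balanced on its own
    end
    stops[#stops + 1] = pos - 1
  end
  return stops
end

print(table.concat(bal_stops("A(B)C", 1), ","))   -- 1,4,5
print(table.concat(bal_stops("A((B)C)", 1), ",")) -- 1,7
print(#bal_stops("(A", 1))                        -- 0
```

For "A(B)C" the model yields "A", then "A(B)", then "A(B)C", which mirrors the extension on subsequent failure described for Bal.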


Return Values

  1. pattern (userdata): Pattern built by the constructor

Name

RegExp — Matches a pattern equivalent to a regular expression

Synopsis

spipat.RegExp ( expression [, captures] )

RegExp ( expression [, captures] )

Description

RegExp takes a POSIX Extended Regular Expression (ERE) and constructs a pattern that is equivalent to that regular expression and can be combined freely with other patterns.

It can optionally construct the pattern to save the captures from a regular expression match in a Lua table.

Warning

Even though this implementation should support almost all elements of EREs, it is considered experimental. You are advised to use the usual pattern construction primitives.

Parameters

  1. expression (string): The POSIX ERE which is compiled to a pattern.
  2. captures (table): Optional table, or more precisely array, to hold subexpression captures. Naturally, it has to exist when RegExp is called. When a subexpression is captured (i.e. the pattern equivalent to what is enclosed in parentheses), the matching string is added to the end of the table. Thus, assuming that captures is initially empty, if RegExp("(a(b))", captures) matches, captures will be {"b", "ab"}.

Return Values

  1. pattern (userdata): Pattern built by RegExp

Example

Example 11. Regular Expressions

> print(RegExp "^[[:digit:]]*?(abc\\.|de?)")
Pos(0) * Arbno(Any(<CS>)) * ("abc." + "d" * ("" + "e"))
>