A proof of concept
10
Sep 2004
Contents
1. Objectives
2. Outside in
3. C functions
4. The scanner
5. Lua handlers
1. Objectives
To better focus the discussion on ABCp, this
so-called "proof of concept" illustrates its main features.
You can download it from the download page.
To keep things simple I created a stand-alone executable
named abcp that statically links a library (libabcp.lib). The library
contains the ABCp scanner (i.e. the function that splits an ABC file
into tokens). It should also contain a proper parser, but that will be
the subject of the next version of this proof of concept!
If you have already read the proposal
(and I suggest you do), please note that this proof of concept covers
the "event trigger parser" described in section
4.
2. Outside in
Let's start from the executable for this proof of concept
and then dive into the source code to understand how it works.
abcp.exe contains a scanner for ABC files;
let's execute it on the following ABC file (t1.abc):
X:1
d3 e d2|d2 c2 B2|A3 B A2|
with the command abcp t1.abc
and we will have:
t1.abc 001,001:T_FIELD X:
t1.abc 001,003:T_FIELDB 1
t1.abc 002,001:T_STARTLINE
t1.abc 002,001:T_NOTE d3
t1.abc 002,003:T_WSPACE
t1.abc 002,004:T_NOTE e
t1.abc 002,005:T_WSPACE
t1.abc 002,006:T_NOTE d2
t1.abc 002,008:T_BAR |
t1.abc 002,009:T_NOTE d2
t1.abc 002,011:T_WSPACE
t1.abc 002,012:T_NOTE c2
t1.abc 002,014:T_WSPACE
t1.abc 002,015:T_NOTE B2
t1.abc 002,017:T_BAR |
t1.abc 002,018:T_NOTE A3
t1.abc 002,020:T_WSPACE
t1.abc 002,021:T_NOTE B
t1.abc 002,022:T_WSPACE
t1.abc 002,023:T_NOTE A2
t1.abc 002,025:T_BAR |
t1.abc 002,026:T_ENDLINE
t1.abc 002,000:T_EOF
As you can see, it reports the tokens found in the file.
It does not attempt to make any sense of them!
It does not, for example, check that the field body of X
is a number; that is a task for a complete parser. Likewise,
it entirely missed the fact that no K: field was found.
3. C functions
How is this achieved? The main function simply
calls the function abcScanFile() with two parameters:
the name of the file and a pointer to a function to be called each
time a token is found.
Here is the code:
int abcScanFile(char *filename,abcHandler handler)
{
  abcScanner *as;

  as=abcOpen(filename,1024);
  if (as) {
    while (as && (as->state != S_EOF)) {
      abcNextToken(as);
      if (handler) {
        if (handler(as->token,as->tokstr,as->filename,as->line,m_column(as)) > 0)
          m_eof(as);
      }
    }
    abcClose(as);
    return 0;
  }
  return 1;
}
A new scanner (abcScanner) is created
by abcOpen() and the file is traversed using abcNextToken().
Every time a token is found, the function handler()
gets called with the appropriate parameters.
If the handler returns a value greater than zero, m_eof() is invoked,
which appears to end the scan early.
All the complexity now rests on the shoulders of the handler()
function: the programmer does not need to know or use anything other
than abcScanFile().
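To make the handler idea concrete, here is a minimal sketch of two handlers: one that counts notes and bar lines, and one that asks for the scan to stop at the first bar. It does not link against libabcp: the Tokens enum, the USHORT and abcHandler typedefs, the Event type and the replay() function are hypothetical stand-ins (the handler signature is copied from myhandler() in section 5), and replay() only imitates the way abcScanFile() stops when a handler returns a value greater than zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: the real Tokens enum and the USHORT and
   abcHandler typedefs come from the ABCp headers, not shown on this page. */
typedef enum { T_NOTE, T_WSPACE, T_BAR } Tokens;
typedef unsigned short USHORT;
typedef int (*abcHandler)(Tokens t, char *s, char *f, USHORT l, USHORT c);

typedef struct { int notes; int bars; } Counts;
static Counts counts;

/* Handler that tallies notes and bar lines and never stops the scan. */
static int countHandler(Tokens t, char *s, char *f, USHORT l, USHORT c)
{
    (void)s; (void)f; (void)l; (void)c;   /* unused in this sketch */
    if (t == T_NOTE) counts.notes++;
    else if (t == T_BAR) counts.bars++;
    return 0;
}

/* Handler that asks the scanner to stop at the first bar line. */
static int stopAtBar(Tokens t, char *s, char *f, USHORT l, USHORT c)
{
    (void)s; (void)f; (void)l; (void)c;
    return t == T_BAR ? 1 : 0;
}

/* One recorded token, as it would be passed to a handler (hypothetical). */
typedef struct { Tokens t; char *s; USHORT l, c; } Event;

/* Stand-in for abcScanFile(): replays recorded tokens and stops as soon as
   the handler returns a value > 0. Returns how many tokens were delivered. */
static size_t replay(const Event *ev, size_t n, char *file, abcHandler handler)
{
    size_t i;
    assert(handler != NULL);
    for (i = 0; i < n; i++) {
        if (handler(ev[i].t, ev[i].s, file, ev[i].l, ev[i].c) > 0)
            return i + 1;
    }
    return n;
}

/* The tokens of "d3 e d2|d2", with positions taken from the sample output. */
static Event sample[] = {
    { T_NOTE, "d3", 2, 1 }, { T_WSPACE, " ", 2, 3 }, { T_NOTE, "e", 2, 4 },
    { T_WSPACE, " ", 2, 5 }, { T_NOTE, "d2", 2, 6 }, { T_BAR, "|", 2, 8 },
    { T_NOTE, "d2", 2, 9 }
};
```

Calling replay(sample, 7, "t1.abc", countHandler) delivers all seven tokens and leaves four notes and one bar in counts, while the same call with stopAtBar returns after the sixth token. With the real library, the handlers would be passed to abcScanFile() instead.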
4. The scanner
Let's look at the scanner source code for a while. If
you open the file abcpscan.re you will see how the
abcNextToken() function is implemented using the re2c
tool. Here is a small piece of the code:
DEC                 { RETURN(s,T_DECOR); }
[(] D ([:] D?)*     { RETURN(s,T_NPLET); }
[(]                 { RETURN(s,T_OPENSLUR); }
On the left there is a regular expression (which may
be defined elsewhere, as with DEC); on the right, a piece of C code
that gets executed when the input matches the regular expression.
To better understand how re2c works, look at the following
image:
Initially the variable YYCURSOR points
at the first unmatched character. When a match is found, YYCURSOR
is advanced and the C code inside the block is executed.
YYLIMIT points to the end of the line.
When YYCURSOR == YYLIMIT (or when *YYLIMIT == '\0')
the line is empty.
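The cursor and limit mechanics can be imitated by hand. The following is only a sketch, not the code that re2c generates: the LineScanner struct, the TK_* token kinds and every function name are made up for illustration, and a "note" is crudely simplified to a letter from a to g (either case) followed by optional digits.

```c
#include <assert.h>
#include <string.h>

/* Token kinds for this sketch only (not the ABCp token set). */
typedef enum { TK_NOTE, TK_WSPACE, TK_BAR, TK_OTHER, TK_END } TokKind;

typedef struct {
    const char *line;   /* start of the buffered line */
    const char *start;  /* first char of the token just matched */
    const char *cursor; /* role of YYCURSOR: first unmatched char */
    const char *limit;  /* role of YYLIMIT: end of the line */
} LineScanner;

static void lineScannerInit(LineScanner *ls, const char *line)
{
    ls->line = ls->start = ls->cursor = line;
    ls->limit = line + strlen(line);
}

static int isNoteLetter(char ch)
{
    return (ch >= 'a' && ch <= 'g') || (ch >= 'A' && ch <= 'G');
}

/* Match one token at the cursor and advance the cursor past it. */
static TokKind nextTok(LineScanner *ls)
{
    char ch;
    assert(ls->cursor <= ls->limit);
    ls->start = ls->cursor;
    if (ls->cursor == ls->limit) return TK_END;  /* nothing left on the line */
    ch = *ls->cursor++;
    if (ch == '|') return TK_BAR;
    if (ch == ' ' || ch == '\t') {
        while (ls->cursor < ls->limit &&
               (*ls->cursor == ' ' || *ls->cursor == '\t'))
            ls->cursor++;
        return TK_WSPACE;
    }
    if (isNoteLetter(ch)) {
        while (ls->cursor < ls->limit &&
               *ls->cursor >= '0' && *ls->cursor <= '9')
            ls->cursor++;
        return TK_NOTE;
    }
    return TK_OTHER;
}

/* 1-based column of the token just matched. */
static int tokColumn(const LineScanner *ls)
{
    return (int)(ls->start - ls->line) + 1;
}
```

Run over the line "d3 e d2|", successive calls to nextTok() yield a note at column 1, white space at column 3, a note at column 4, white space at column 5, a note at column 6 and a bar at column 8, the same positions that abcp reports for the start of line 2 of t1.abc.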
I've defined some C macros to streamline the code, so
it should be easy to modify.
In the re2c fragment above, the second
rule says that whenever the input matches an open parenthesis followed by a
digit, possibly followed by further digits separated by colons,
an nplet as defined in the ABC standard 2.0 draft
has been found.
The RETURN() macro stops the scanner and
reports to the caller that a token of type T_NPLET
has been found (together with the actual string, the line and column
where it has been found).
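To see exactly which inputs the T_NPLET rule accepts, the pattern [(] D ([:] D?)* can be spelled out as a hand-written matcher. This is a sketch under one assumption: that D stands for a single decimal digit, as the description above suggests (its actual definition is in abcpscan.re). The names matchNplet() and npletLen() are hypothetical and do not exist in the library.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the nplet starting at cursor, or 0 if there is none.
   Mirrors the pattern  [(] D ([:] D?)*  with D assumed to be one digit. */
static size_t matchNplet(const char *cursor, const char *limit)
{
    const char *p = cursor;
    assert(cursor <= limit);
    if (p >= limit || *p != '(') return 0;                   /* [(]      */
    p++;
    if (p >= limit || !isdigit((unsigned char)*p)) return 0; /* D        */
    p++;
    while (p < limit && *p == ':') {                         /* ( [:]    */
        p++;
        if (p < limit && isdigit((unsigned char)*p)) p++;    /*   D? )*  */
    }
    return (size_t)(p - cursor);
}

/* Convenience wrapper for NUL-terminated strings. */
static size_t npletLen(const char *s)
{
    return matchNplet(s, s + strlen(s));
}
```

For instance, npletLen("(3abc") is 2 and npletLen("(3:2:3abc") is 6, while npletLen("(abc") is 0. In the scanner, that last input would fall through to the next rule and be reported as T_OPENSLUR.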
There is much more to say about the abcScanner
structure and how the lines are buffered but we can go into those
details some other time.
5. Lua handlers
We could now start writing C programs that take advantage
of our scanner. This only requires us to write appropriate actions
for tokens like T_NOTE, T_BAR, etc.
But this would mean recompiling
every time we want to change something. Another approach exists.
The abcp.exe executable is statically
linked with a Lua
interpreter and the handler function is as follows:
int myhandler(Tokens t, char *s,char *f,USHORT l, USHORT c)
{
  if (L) {
    lua_getglobal(L,"abcHandler");
    lua_pushstring(L,abcTokenName(t)); /* The token as string */
    lua_pushstring(L,s);               /* The token string */
    lua_pushstring(L,f);               /* The filename */
    lua_pushnumber(L,l);               /* The line number */
    lua_pushnumber(L,c);               /* The column number */
    return(lua_pcall(L,5,1,0));
  }
  else {
    printf("%s %03d,%03d:%s %s\n",f,l,c,abcTokenName(t),s);
    return 0;
  }
}
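This means that if Lua has been initialized (L
is not NULL), a Lua function named abcHandler
is called with the appropriate arguments.
Note that the value handed back to the scanner is the status of
lua_pcall() (0 on success), not the value returned by the Lua function.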
The file test1.lua contains the following function:
function abcHandler(tok, str,file,ln, col)
  print(string.format("%s %03d,%03d:%s %s",file,ln,col,tok,str))
  return 0;
end
that prints exactly the same output as the C function.
Execute it with abcp t1.abc test1.lua
and you will get the same output as before, this time produced by the Lua function.
Not very useful, but it should make the point!