Rules of the game:
Note: this wiki page is not up to date; the actual design is in the header: http:/people/ct/escaping.h
Set Up
Define an array of str_esc.
The array is a list of entries, terminated by an all-zero entry, in the form of:
{
{ "unescaped", "escaped", flags, special_parser },
...
{ 0,0,0,0 }
}
unescaped
- might be set to NOCHARS to indicate that escaped can't be trivially converted to unescaped
- A string length of at least one byte is assumed, so "\0" stands for a literal null byte
- These strings must not contain "\0" anywhere other than the first position; "" is illegal
escaped
- is the escaped representation of unescaped (or at least the start of it)
- escaped strings never ever contain "\0"
special_parser
- is either NULL or points to a parser_func which parses the remainder of an escape sequence
flags can be:
ESCAPE_GENERIC : use a generic escape instead of "escaped" when doing escaping (Example: accept "\ " but generate "\x20" for spaces)
Note that encodings which contain zero bytes anywhere other than the first position need a special_parser.
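As an illustration of the table layout described above, here is a minimal sketch in C. The struct layout, the parser_func signature and the value of ESCAPE_GENERIC are assumptions made for this example; the authoritative definitions are in escaping.h. str_esc_count is a hypothetical helper, not part of the design.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in escaping.h. */
typedef int (*parser_func)(void);   /* real signature unknown, assumed */
#define ESCAPE_GENERIC 1            /* assumed flag value */

typedef struct {
    const char *unescaped;          /* raw byte(s); "\0" means a literal null byte */
    const char *escaped;            /* escaped form, never contains a null byte */
    int flags;
    parser_func special_parser;     /* NULL when no extra parsing is needed */
} str_esc;

/* A small C-style table, terminated by the all-zero entry. */
static const str_esc c_style_escapes[] = {
    { "\n", "\\n",  0,              NULL },
    { "\t", "\\t",  0,              NULL },
    { "\\", "\\\\", 0,              NULL },
    { " ",  "\\ ",  ESCAPE_GENERIC, NULL },  /* accept "\ ", emit a generic escape */
    { "\0", "\\0",  0,              NULL },  /* literal null byte */
    { 0, 0, 0, 0 }
};

/* Count entries up to the terminator. The escaped field is tested because
   it is never empty for a real entry. */
size_t str_esc_count(const str_esc *table)
{
    size_t n = 0;
    assert(table != NULL);
    while (table[n].escaped != NULL)
        n++;
    return n;
}
```

Testing the escaped field for the terminator keeps the check independent of how NOCHARS is represented in the unescaped field.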
generate the lookup tables
str_esclookup * str_esclookup_alloc (alloc_limits limits,
str_esc * conv,
isprint_cnt_func,
escape_func)
conv
- is the str_esc table defined in Set Up.
isprint_cnt_func
- is a pointer to a function which returns the number of following bytes that form characters which do not need escaping
escape_func
- is a pointer to a function which can generate a generic escape sequence for the next character(s)
Note: the lookup structure contains arrays/trees of integer indices rather than pointers, so it is relocatable and static. That makes it possible to generate it when compiling the program instead of generating it dynamically at run time.
The user has to lim_free() the lookup table after use.
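The two callbacks could look roughly like the following sketch. Both signatures are assumptions made for this example (the real ones are in escaping.h), and the names ascii_isprint_cnt and hex_escape are hypothetical. The first treats printable ASCII other than space and backslash as safe and returns the length of the whole safe run; the second emits a "\xHH" escape for a single byte.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed signature: return how many leading bytes of str[0..size)
   can be copied verbatim. Zero means the next byte needs escaping. */
size_t ascii_isprint_cnt(const unsigned char *str, size_t size)
{
    size_t n = 0;
    while (n < size && isprint(str[n]) && str[n] != ' ' && str[n] != '\\')
        n++;
    return n;
}

/* Assumed signature: write a generic escape for the next byte into out
   (at least 5 bytes: 4 characters plus the terminator) and return the
   number of input bytes consumed. */
size_t hex_escape(const unsigned char *str, char *out)
{
    assert(str != NULL && out != NULL);
    snprintf(out, 5, "\\x%02x", (unsigned)str[0]);
    return 1;
}
```

With these definitions a space is reported as needing an escape, which matches the ESCAPE_GENERIC example of generating "\x20" for spaces.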
converting unescaped to escaped strings:
t_uchar * str_save_escaped (alloc_limits limits,
const t_uchar * str,
int size,
const str_conv_lookup * lookup)
limits
- allocation domain, see hackerlab doc
str
- is the unescaped string
size
- is either the size of str (in case it might contain zero bytes) or -1, in which case the size is determined automatically
lookup
- is a pointer to the data structure described before
The returned string is freshly allocated and needs to be freed by the user.
Semantic:
for every char in str
    isprint_cnt_func is invoked, found characters are copied verbatim
    if the return was zero
        look up the longest matching string from unescaped
        if found and the GENERIC flag is not set
            copy the escaped string
        else
            escape_func is invoked to generate a generic escape
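The escaping loop above can be sketched in self-contained C as follows. This is not the real str_save_escaped: it uses malloc instead of alloc_limits, a linear scan over a small table instead of the generated lookup structure, and a built-in "\xHH" generic escape. All names (esc_entry, demo_table, printable_run, sketch_save_escaped) and the flag value are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ESCAPE_GENERIC 1   /* assumed flag value */

typedef struct {           /* simplified stand-in for str_esc */
    const char *unescaped;
    const char *escaped;
    int flags;
} esc_entry;

static const esc_entry demo_table[] = {
    { "\n", "\\n",  0 },
    { "\\", "\\\\", 0 },
    { " ",  "\\ ",  ESCAPE_GENERIC },
    { 0, 0, 0 }
};

/* Plays the role of isprint_cnt_func: length of the verbatim run. */
static size_t printable_run(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n && isprint(s[i]) && s[i] != ' ' && s[i] != '\\')
        i++;
    return i;
}

/* Returns a malloc'ed escaped copy of str; size < 0 means "use strlen". */
char *sketch_save_escaped(const char *str, long size)
{
    assert(str != NULL);
    const unsigned char *in = (const unsigned char *)str;
    size_t n = size < 0 ? strlen(str) : (size_t)size;
    /* Worst case here: every input byte becomes 4 output bytes (\xHH). */
    char *out = malloc(4 * n + 1);
    size_t i = 0, o = 0;
    if (out == NULL)
        return NULL;
    while (i < n) {
        size_t run = printable_run(in + i, n - i);
        if (run > 0) {                       /* copy verbatim */
            memcpy(out + o, in + i, run);
            o += run;
            i += run;
            continue;
        }
        /* Longest match against the unescaped column. */
        const esc_entry *best = NULL;
        size_t best_len = 0;
        for (const esc_entry *e = demo_table; e->escaped != NULL; e++) {
            size_t len = e->unescaped[0] ? strlen(e->unescaped) : 1;
            if (len > best_len && len <= n - i
                && memcmp(in + i, e->unescaped, len) == 0) {
                best = e;
                best_len = len;
            }
        }
        if (best != NULL && !(best->flags & ESCAPE_GENERIC)) {
            size_t el = strlen(best->escaped);
            memcpy(out + o, best->escaped, el);
            o += el;
            i += best_len;
        } else {                             /* generic escape, one byte */
            o += (size_t)sprintf(out + o, "\\x%02x", (unsigned)in[i]);
            i++;
        }
    }
    out[o] = '\0';
    return out;
}
```

Under these assumptions, the input "a b" followed by a newline comes back as `a\x20b\n` (space via the generic escape, newline via the table), and a zero byte passed with an explicit size comes back as `\x00`.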
converting escaped to unescaped strings:
t_uchar * str_save_unescaped (alloc_limits limits,
const t_uchar * str,
size_t* size,
const str_esc_lookup * lookup)
limits
- allocation domain, see hackerlab doc
str
- pointer to an escaped C-String
size
- pointer where the function should store the final size of the unescaped string
lookup
- lookup datastructure, see above.
Semantic:
for every character in str
    isprint_cnt_func is invoked, found characters are copied verbatim
    if the return was zero
        look up the longest matching escaped string
        if found
            if unescaped != NOCHARS
                copy the unescaped string
            if special_parser != 0
                invoke the parser_func
        else
            free memory, abort, return zero (this is the only error case: an illegal escape sequence)
Note: the string is freshly allocated and needs to be freed by the user.
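The unescaping direction can be sketched the same way. Again this is only an illustration under stated assumptions: malloc replaces alloc_limits, a linear scan replaces the lookup structure, the parser signature (sketch_parser) and the NOCHARS representation are invented for the example, and a failing parser is treated as an illegal escape sequence. All names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumed parser signature: consume the rest of an escape at *in,
   append decoded bytes at *out, return 0 on success. */
typedef int (*sketch_parser)(const char **in, char **out);

static const char nochars_marker[1] = { 0 };
#define NOCHARS nochars_marker      /* assumed sentinel representation */

typedef struct {
    const char *unescaped;
    const char *escaped;
    sketch_parser special_parser;
} unesc_entry;

static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parses the two hex digits that follow "\x". */
static int parse_hex2(const char **in, char **out)
{
    int hi = hexval((unsigned char)(*in)[0]);
    int lo = hi < 0 ? -1 : hexval((unsigned char)(*in)[1]);
    if (lo < 0)
        return -1;
    *(*out)++ = (char)(hi * 16 + lo);
    *in += 2;
    return 0;
}

static const unesc_entry demo_unesc[] = {
    { "\n",    "\\n",  NULL },
    { "\\",    "\\\\", NULL },
    { NOCHARS, "\\x",  parse_hex2 },   /* start of a longer sequence */
    { 0, 0, 0 }
};

/* Returns a malloc'ed unescaped copy, or NULL on an illegal escape. */
char *sketch_save_unescaped(const char *str, size_t *size)
{
    assert(str != NULL && size != NULL);
    /* Every escape here is at least as long as what it decodes to. */
    char *buf = malloc(strlen(str) + 1);
    char *out = buf;
    if (buf == NULL)
        return NULL;
    while (*str != '\0') {
        if (*str != '\\') {            /* verbatim character */
            *out++ = *str++;
            continue;
        }
        const unesc_entry *best = NULL;
        size_t best_len = 0;
        for (const unesc_entry *e = demo_unesc; e->escaped != NULL; e++) {
            size_t len = strlen(e->escaped);
            if (len > best_len && strncmp(str, e->escaped, len) == 0) {
                best = e;
                best_len = len;
            }
        }
        if (best == NULL) {            /* illegal escape sequence */
            free(buf);
            return NULL;
        }
        str += best_len;
        if (best->unescaped != NOCHARS) {
            size_t ul = best->unescaped[0] ? strlen(best->unescaped) : 1;
            memcpy(out, best->unescaped, ul);
            out += ul;
        }
        if (best->special_parser != NULL
            && best->special_parser(&str, &out) != 0) {
            free(buf);                 /* assumption: parser failure is an error */
            return NULL;
        }
    }
    *size = (size_t)(out - buf);
    *out = '\0';
    return buf;
}
```

The size output matters because the result may contain zero bytes, for example when the input holds `\x00`; the trailing terminator written by the sketch is only a convenience.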
DISCLAIMER
This is only a very terse and not fully correct explanation; human imagination is required to get it right!