Re: [misc] Expression matching in calcurse: use regular expressions?

On Sat, Apr 21, 2012 at 10:12:14PM +0200, Baptiste Jonglez wrote:
> Hi,
> As a preliminary note, the way expression matching (for matching
> dates, times, durations, etc.) is currently implemented in calcurse is,
> imho, mostly satisfying.
> It might well be a case of "reinventing the wheel", but there is some
> nice code in there; for instance, the DFA used by "parse_duration()"
> in "src/utils.c" is implemented quite neatly.
> However, I'm wondering if such an approach is viable. Maybe using a
> proper regex library would be a good fit (I don't think it has been
> discussed before).
> Since POSIX requires some regex(7) primitives (see also regex(3)),
> using these should not hamper calcurse's portability.
> Here is the case for using regex(7) primitives where applicable,
> instead of the current by-hand if-based (or DFA-based) parsing:
> - allows using fewer LoC for the same functionality, thus:
>     - less time writing code in the first place (does not apply to
>       existing code, however)
>     - increased clarity: everything is expressed in terms of regexes
>     - less time understanding code (from a calcurse hacker perspective)
>     - less time modifying existing parsing code
>     - fewer parsing bugs: if we use the right regex, then chances are
>       that the parsing will be ok
> - would not damage calcurse's portability (provided there aren't too
>   many glibc-specific tricks in glibc's regex(3) primitives)
> Here is the case against:
> - it takes time to do
> - it would be a shame to throw away neat code like parse_duration()'s DFA
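
As a rough sketch of the regex(3) approach (an illustration only, not calcurse code): parse_duration_re() is a hypothetical name, and the duration format it accepts is also hypothetical. It takes optional days, hours and minutes, in that order (e.g. "1d2h30m", "45m"). That format is an assumption chosen for the example, not the syntax of calcurse's actual parse_duration(). Note that regcomp() builds the matcher at runtime, and in this sketch it does so on every call.

```c
#include <regex.h>
#include <stdlib.h>

/* HYPOTHETICAL format for illustration: optional days, hours and
 * minutes, in that order, e.g. "1d2h30m", "45m" or "3h".
 * Groups 2, 4 and 6 capture the digits of each component. */
static const char *DURATION_RE = "^(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?$";

/* Numeric value of capture group g, or 0 if that group did not take part. */
static long group_value(const char *s, const regmatch_t *m, int g)
{
	if (m[g].rm_so == -1)
		return 0;
	/* strtol() stops at the unit letter following the digits. */
	return strtol(s + m[g].rm_so, NULL, 10);
}

/* Parse s into minutes. Returns 0 on success, -1 on a syntax error.
 * Sketch only: the regex is compiled on each call (real code would
 * compile it once), and numeric overflow is not checked. */
static int parse_duration_re(const char *s, long *minutes)
{
	regex_t re;
	regmatch_t m[7]; /* whole match + 6 subexpressions */
	int matched;

	if (*s == '\0')
		return -1; /* the pattern alone would accept "" */
	if (regcomp(&re, DURATION_RE, REG_EXTENDED) != 0)
		return -1;
	matched = regexec(&re, s, 7, m, 0) == 0;
	regfree(&re);
	if (!matched)
		return -1;

	*minutes = group_value(s, m, 2) * 24 * 60
	         + group_value(s, m, 4) * 60
	         + group_value(s, m, 6);
	return 0;
}
```

With this pattern, "1d2h30m" yields 1590 minutes, while "2x", "1h2d" (wrong order) and "" are rejected. All of the grammar lives in one pattern string, which is the clarity argument above. The cost is that the automaton is constructed when the program runs.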

Actually, the right way to implement this is using a lexer generator,
such as lex(1) or flex(1). That has all the benefits you mentioned
above, with the additional bonus of DFAs being built at compilation time,
which means that:

* No time is wasted with building and minimizing DFAs at runtime.
* The lexer code can be optimized at compilation time.
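
To make "built at compilation time" concrete, here is a hand-written C stand-in for the kind of transition table a lexer generator emits. It is not actual lex or flex output, and parse_duration_dfa() is a hypothetical name. It assumes an illustrative duration format (optional days, hours and minutes, in that order, e.g. "1d2h30m"), not calcurse's real syntax. The table is a const array fixed by the compiler, so nothing is built or minimized at runtime. The arithmetic done on each transition plays the role of a lexer rule's action.

```c
/* Hand-written stand-in for generated lexer tables (NOT flex output).
 * HYPOTHETICAL format: [Nd][Nh][Nm], units in that order. */
enum cls { C_DIGIT, C_D, C_H, C_M, C_OTHER, NCLS };
enum state { S_START, S_NUM0, S_DAY, S_NUM1, S_HOUR, S_NUM2, S_MIN, S_ERR,
             NSTATES };

/* Transition table, fixed at compile time. */
static const unsigned char delta[NSTATES][NCLS] = {
	/*            DIGIT   'd'    'h'     'm'    other */
	[S_START] = { S_NUM0, S_ERR, S_ERR,  S_ERR, S_ERR },
	[S_NUM0]  = { S_NUM0, S_DAY, S_HOUR, S_MIN, S_ERR },
	[S_DAY]   = { S_NUM1, S_ERR, S_ERR,  S_ERR, S_ERR },
	[S_NUM1]  = { S_NUM1, S_ERR, S_HOUR, S_MIN, S_ERR },
	[S_HOUR]  = { S_NUM2, S_ERR, S_ERR,  S_ERR, S_ERR },
	[S_NUM2]  = { S_NUM2, S_ERR, S_ERR,  S_MIN, S_ERR },
	[S_MIN]   = { S_ERR,  S_ERR, S_ERR,  S_ERR, S_ERR },
	[S_ERR]   = { S_ERR,  S_ERR, S_ERR,  S_ERR, S_ERR },
};

/* Map an input character to its character class. */
static enum cls classify(char c)
{
	if (c >= '0' && c <= '9')
		return C_DIGIT;
	switch (c) {
	case 'd': return C_D;
	case 'h': return C_H;
	case 'm': return C_M;
	default:  return C_OTHER;
	}
}

/* Returns 0 and stores the total in *minutes on success, -1 otherwise.
 * Sketch only: numeric overflow is not checked. */
static int parse_duration_dfa(const char *s, long *minutes)
{
	enum state st = S_START;
	long num = 0, total = 0;

	for (; *s != '\0'; s++) {
		enum cls c = classify(*s);

		st = delta[st][c];
		if (st == S_ERR)
			return -1;
		/* Actions attached to transitions, as in a lexer rule. */
		if (c == C_DIGIT) {
			num = num * 10 + (*s - '0');
		} else {
			total += num * (c == C_D ? 24 * 60 : c == C_H ? 60 : 1);
			num = 0;
		}
	}
	/* Accept only if the input ended right after a unit letter. */
	if (st != S_DAY && st != S_HOUR && st != S_MIN)
		return -1;
	*minutes = total;
	return 0;
}
```

Because the table encodes the d, h, m order, an input such as "1h2d" falls into the error state. An empty string, or one ending in digits, is rejected because the final state is not accepting. The per-character work is a single table lookup.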

I had the very same idea a while ago (I even thought of using lexers for
parsing data and configuration files), but I'd like to postpone the
implementation until 3.0.0 is out.

> What do you think? I know I'm probably going to get some "patches
> welcome" answers; I'll try to put something together in the coming
> days, but I would first like to have some opinions about this.

While we're talking about patches... You should probably hold off on
coding, since Frederic and I are discussing switching to another coding
standard. Either try to get your patches in quickly or wait until we have
come to a conclusion :)

> Anyway, this is non-critical for the 3.0 release, so there's probably
> no need to hurry.
> Regards,
> Baptiste