Re: Syntax error if paragraph contains more than 1 printable character

2023-12-14 Thread James K. Lowden
On Wed, 13 Dec 2023 19:01:22 -0500
Steve Litt wrote:

> >.+/\n  { ... return LINE; }
> >(\n[[:blank:]]*){2,} { return SEP; } // two or more blank lines
> >\n   { /* ignore */ }
> 
> Thanks James, this looks great!

You're welcome.  It occurs to me that

.+/\n

is the same as

.+

so, simpler still.  :-) 


> I won't need to consider end of line spaces because I now have a sed 1
> liner preprocessor that gets rid of trailing space :-).

Flex is a regex engine, and can do anything sed can do.  Your system is
simpler if it can deal with all acceptable input, without
preprocessing.  

Rather than remove trailing blanks from the input, I would remove them
in flex.  The problem can be solved with regular expressions but,
since we're only matching one value, it's easily done in an action: 

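.+  {
      // walk backward from the last matched character
      for( auto p = yytext + yyleng - 1; p >= yytext; p-- ) {
        if( *p != 0x20 ) break;  // stop at the last non-space character
        *p = '\0';               // overwrite a trailing space
      }
    }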
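
The loop above compares only against 0x20, so a trailing tab would survive, and yyleng still reports the untrimmed length afterward. A minimal sketch of the same idea as a standalone C function follows; trim_trailing_blanks is a hypothetical helper name, not anything flex provides, and it assumes that "blank" means space or tab, as in [[:blank:]].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of flex): strip trailing spaces and tabs
 * from buf[0..len) in place and return the new length. */
static size_t trim_trailing_blanks(char *buf, size_t len)
{
    assert(buf != NULL);
    /* Step back over blanks, overwriting each one with a terminator. */
    while (len > 0 && (buf[len - 1] == ' ' || buf[len - 1] == '\t')) {
        buf[--len] = '\0';
    }
    return len;
}
```

In a flex action this could be called as trim_trailing_blanks(yytext, yyleng), with the caller using the returned length in place of yyleng when it copies the text.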


To solve it with regex, 

([[:blank:]]*[[:^space:]])+ { ... return LINE; }
[[:blank:]]+$   // ignore
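
To see what the first pattern matches without building a scanner, here is a rough sketch using POSIX regcomp/regexec. It assumes a POSIX system with <regex.h>; line_match_length is a hypothetical helper name. Two things differ from the flex rule: [[:^space:]] is flex syntax, so the POSIX form [^[:space:]] is used here, and a leading ^ stands in for the fact that a flex rule always matches at the current input position.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return how many leading characters of `line` are
 * matched by ^([[:blank:]]*[^[:space:]])+ , or 0 when nothing matches. */
static size_t line_match_length(const char *line)
{
    regex_t re;
    regmatch_t m[1];
    size_t n = 0;

    assert(line != NULL);
    if (regcomp(&re, "^([[:blank:]]*[^[:space:]])+", REG_EXTENDED) != 0)
        return 0;               /* pattern failed to compile */
    if (regexec(&re, line, 1, m, 0) == 0)
        n = (size_t)m[0].rm_eo; /* end offset of the whole match */
    regfree(&re);
    return n;
}
```

Each repetition must end in a non-space character, so the match stops at the last printable character and any trailing blanks are left for the second rule to consume.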

--jkl





Re: Syntax error if paragraph contains more than 1 printable character

2023-12-14 Thread Steve Litt
James K. Lowden said on Wed, 13 Dec 2023 12:42:20 -0500
>Rather than remove trailing blanks from the input, I would remove them
>in flex.  The problem can be solved with regular expressions but,
>since we're only matching one value, it's easily done in an action: 
>
>   .+  {
>     for( auto p = yytext + yyleng - 1; p >= yytext; p-- ) {
>       if( *p != 0x20 ) break;
>       *p = '\0';
>     }
>   }

Nice!

>
>
>To solve it with regex, 
>
>   ([[:blank:]]*[[:^space:]])+ { ... return LINE; }
>   [[:blank:]]+$   // ignore

Nice!

Thanks James. I'll be investigating your techniques in the near future.
As you can see, your suggestions contributed to my working text to HTML
baby Hello World.

SteveT

Steve Litt 
