On Sat, 15 Feb 2025 23:32:37 -0500
David Malcolm <dmalc...@redhat.com> wrote:

In defense of lack of free(3) ...

> > +const char *
> > +esc( size_t len, const char input[] ) {
> > +  static char spaces[] = "([,;]?[[:space:]])+";
> > +  static char spaceD[] = "(\n {6}D" "|" "[,;]?[[:space:]])+";
> > +  static char buffer[64 * 1024];
> > +  char *p = buffer;
> > +  const char *eoinput = input + len;
> > +
> > +  const char *spacex = is_reference_format()? spaceD : spaces;
> > +
> > +  for( const char *s=input; *s && s < eoinput; s++ ) {
> > +    *p = '\0';
> > +    gcc_assert( size_t(p - buffer) < sizeof(buffer) - 4 );

overflow guarded here

> > +    switch(*s) {
> > +    case '^': case '$':
> > +    case '(': case ')':
> > +    case '*': case '+': case '?':
> > +    case '[': case ']':
> > +    case '{': case '}':
> > +    case '|':
> > +    case '.':
> > +      *p++ = '\\';
> > +      *p++ = *s;
> > +      break;
> > +    case '\\':
> > +      *p++ = '[';
> > +      *p++ = *s;
> > +      *p++ = ']';
> > +      break;
> > +
> > +    case ';': case ',':
> > +      if( ! (s+1 < eoinput && s[1] == 0x20) ) {
> > +        *p++ = *s;
> > +        break;
> > +      }
> > +      __attribute__((fallthrough));
> > +    case 0x20: case '\n':
> > +      gcc_assert(p + sizeof(spacex) < buffer + sizeof(buffer));

and overflow guarded here, the only place where more than 4 characters can be 
inserted into the buffer.  

> > +      p = stpcpy( p, spacex );
> > +      while( s+1 < eoinput && is_separator_space(s+1)) {
> > +        s++;
> > +      }
> > +      break;
> > +    default:
> > +      *p++ = *s;
> > +      break;
> > +    }
> > +  }
> > +  *p = '\0';
...
> > +  return xstrdup(buffer);
> > +}
> 
> Has a fixed size 64k buffer; doesn't seem to have proper overflow
> handling. Could use a pretty_printer to accumulate chars.

Thank you for these comments.  Let me see if I can alleviate your concerns.  

This function is called from exactly one place, where the file-reader, lexio, 
parses a REPLACE directive.  The COBOL input says "REPLACE X BY Y" subject to 
some constraints.  Because we're using regex to find X, and because X might be 
any arbitrary string, the esc() function escapes regex metacharacters prior to 
executing the regex.  

IMHO it's unlikely the resulting regex input will exceed 64 KB.  It's unlikely 
to be even 64 bytes.  (Usually X is a COBOL identifier, limited to 64 bytes by 
ISO.)  Granted, a fixed maximum is a limitation.  But I put it to you: which is 
more likely?  For a regex to exceed 64 KB, or for heap allocation to fail to 
return adequate memory?  If there's some crazy input, I'd rather die at 64 KB 
than consume gigabytes of swap on the way to crashing.  

(As a matter of fact, the function returns a copy of the used portion of the 
static buffer, an unnecessary allocation.  The caller soon replaces it, and 
could have used a pointer into that static buffer.)  

If there is a built-in function in gcc to escape a regex, that would be 
preferable to this gnarly code.  Is that what you mean by "pretty printer"?  

> Returns a allocated buffer as "const char *", which should be just a
> "char  *".

The caller has no business writing to the allocated buffer, which is input to 
the regex call.  Caller and called agree on const.  Isn't that what "const" is 
for?  

--jkl

Reply via email to