On Sat, 15 Feb 2025 23:32:37 -0500 David Malcolm <dmalc...@redhat.com> wrote:
In defense of lack of free(3) ... > > +const char * > > +esc( size_t len, const char input[] ) { > > + static char spaces[] = "([,;]?[[:space:]])+"; > > + static char spaceD[] = "(\n {6}D" "|" "[,;]?[[:space:]])+"; > > + static char buffer[64 * 1024]; > > + char *p = buffer; > > + const char *eoinput = input + len; > > + > > + const char *spacex = is_reference_format()? spaceD : spaces; > > + > > + for( const char *s=input; *s && s < eoinput; s++ ) { > > + *p = '\0'; > > + gcc_assert( size_t(p - buffer) < sizeof(buffer) - 4 ); overflow guarded here > > + switch(*s) { > > + case '^': case '$': > > + case '(': case ')': > > + case '*': case '+': case '?': > > + case '[': case ']': > > + case '{': case '}': > > + case '|': > > + case '.': > > + *p++ = '\\'; > > + *p++ = *s; > > + break; > > + case '\\': > > + *p++ = '['; > > + *p++ = *s; > > + *p++ = ']'; > > + break; > > + > > + case ';': case ',': > > + if( ! (s+1 < eoinput && s[1] == 0x20) ) { > > + *p++ = *s; > > + break; > > + } > > + __attribute__((fallthrough)); > > + case 0x20: case '\n': > > + gcc_assert(p + sizeof(spacex) < buffer + sizeof(buffer)); and overflow guarded here, the only place where more than 4 characters can be inserted into the buffer. > > + p = stpcpy( p, spacex ); > > + while( s+1 < eoinput && is_separator_space(s+1)) { > > + s++; > > + } > > + break; > > + default: > > + *p++ = *s; > > + break; > > + } > > + } > > + *p = '\0'; ... > > + return xstrdup(buffer); > > +} > > Has a fixed size 64k buffer; doesn't seem to have proper overflow > handling. Could use a pretty_printer to accumulate chars. Thank you for these comments. Let me see if I can alleviate your concerns. This function is called from exactly one place, where the file-reader, lexio, parses a REPLACE directive. The COBOL input says "REPLACE X BY Y" subject to some constraints. Because we're using regex to find X, and because X might be any arbitrary string, the esc() function escapes regex metacharacters prior to executing the regex. IMHO it's unlikely the resulting regex input will exceed 64 KB. It's unlikely to be even 64 bytes. (Usually X is a COBOL identifier, limited to 64 bytes by ISO.) Granted, a fixed maximum is a limitation. But I put it to you: which is more likely? For a regex to exceed 64 KB, or for heap allocation to fail to return adequate memory? If there's some crazy input, I'd rather die at 64 KB than consume gigabytes of swap on the way to crashing. (As a matter of fact, the function returns a copy of the used portion of the static buffer, an unnecessary allocation. The caller soon replaces it, and could have used a pointer into that static buffer.) If there is a built-in function in gcc to escape a regex, that would be preferable to this gnarly code. Is that what you mean by "pretty printer"? > Returns a allocated buffer as "const char *", which should be just a > "char *". The caller has no business writing to the allocated buffer, which is input to the regex call. Caller and called agree on const. Isn't that what "const" is for? --jkl