On 10/01/16 00:16, David Malcolm wrote: > On Mon, 2016-09-26 at 14:30 +0000, Bernd Edlinger wrote: >> Hi, >> >> this started, because I saw get_next_line returns -1 on error, >> instead >> of false: PR 77699. >> >> But when I was there I also saw that read_line_num is using memmove >> on >> non-aliased src & dest, instead of memcpy. But then I see that >> also adding a NUL byte is superfluos, and all the copying as well.... >> >> >> So the result is a cleanup of input.c that avoids lots of copying >> altogether, because it is no longer necessary to do so. >> >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk? > > I'll attempt to review this with my "diagnostics maintainer" hat on - > but I'm not as familiar with input.c's implementation as with the restof > the diagnostics subsystem. >
Yes. I gave a few review comments when Dodji Seketeli wrote the line buffering code, but Jakub did most of the review. So I am already a bit familiar with the history of the code here. > Did you try running this under valgrind? > (e.g. "make selftest-valgrind" at least) > did not try. > FWIW I made the following notes to try to understand the existing > behavior: > > get_next_line: > sets its *line to values related to c->data, the buffered content > of all of the file so far... > c->data can be reallocated in maybe_grow, called by read_data, > called by maybe_read_line, called by get_next_line > invalidating the exposed ptr whenever the buffer growth needs a > re-allocation. > Additionally the file cache can be evicted, and the buffer can be re-used for another file. > read_next_line: > copies the exposed part of c->data to *line > Note that it's a different *line: > It calls get_next_line with &l, so *line in get_next_line is > writing to "l" in read_next_line, exposing part of c->data > read_next_line ensures that the passed-in-ptr's buffer is big > enough, then does a memcpy to it, effectively from c->data. > > goto_next_line: > also calls get_next_line, but doesn't bother copying the result > anywhere > > read_line_num: > calls goto_next_line repeatedly > then calls read_next_line, thus effectively read_next_line is > copying data to the passed-in-ptr > > location_get_source_line: > calls read_next_line, managing: > static char *buffer; > static ssize_t len; > as the place where the copies are written to > Hence only one such buffer is live per-process, and it can't leak. > Hence location_get_source_line can't leak, but the buffer exposed > to the caller is only valid until the next call to > location_get_source_line. > >> 2016-09-26 Bernd Edlinger <bernd.edlin...@hotmail.de> >> >> PR preprocessor/77699 >> * input.c (maybe_grow): Don't allocate one byte extra > headroom. >> (get_next_line): Return false on error. >> (read_next_line): Removed, use get_next_line instead. >> (read_line_num): Don't copy the line. >> (location_get_source_line): Don't use static data. >> >> >> Index: gcc/input.c >> =================================================================== >> --- gcc/input.c (revision 240471) >> +++ gcc/input.c (working copy) >> @@ -432,7 +432,7 @@ maybe_grow (fcache *c) >> return; >> >> size_t size = c->size == 0 ? fcache_buffer_size : c->size * 2; >> - c->data = XRESIZEVEC (char, c->data, size + 1); >> + c->data = XRESIZEVEC (char, c->data, size); >> c->size = size; >> } > > Seems reasonable; I can't see anywhere where an extra byte could get > used. > Yes. It was used only in the memmove which copied one extra byte, but that was set to zero afterwards. > >> @@ -534,7 +534,7 @@ get_next_line (fcache *c, char **line, ssize_t > *li >> } >> >> if (ferror (c->fp)) >> - return -1; >> + return false; > > OK > >> /* At this point, we've found the end of the of line. It either >> points to the '\n' or to one byte after the last byte of the >> @@ -597,36 +597,6 @@ get_next_line (fcache *c, char **line, ssize_t > *li >> return true; >> } >> >> -/* Reads the next line from FILE into *LINE. If *LINE is too small >> - (or NULL) it is allocated (or extended) to have enough space to >> - containe the line. *LINE_LENGTH must contain the size of the >> - initial*LINE buffer. It's then updated by this function to the >> - actual length of the returned line. Note that the returned line >> - can contain several zero bytes. Also note that the returned > string >> - is allocated in static storage that is going to be re-used by >> - subsequent invocations of read_line. */ >> - >> -static bool >> -read_next_line (fcache *cache, char ** line, ssize_t *line_len) >> -{ >> - char *l = NULL; >> - ssize_t len = 0; >> - >> - if (!get_next_line (cache, &l, &len)) >> - return false; >> - >> - if (*line == NULL) >> - *line = XNEWVEC (char, len); >> - else >> - if (*line_len < len) >> - *line = XRESIZEVEC (char, *line, len); >> - >> - memcpy (*line, l, len); >> - *line_len = len; >> - >> - return true; >> -} >> - >> /* Consume the next bytes coming from the cache (or from its >> underlying file if there are remaining unread bytes in the file) >> until we reach the next end-of-line (or end-of-file). There is > no >> @@ -651,7 +621,7 @@ goto_next_line (fcache *cache) >> >> static bool >> read_line_num (fcache *c, size_t line_num, >> - char ** line, ssize_t *line_len) >> + char **line, ssize_t *line_len) >> { >> gcc_assert (line_num > 0); >> >> @@ -705,12 +675,8 @@ read_line_num (fcache *c, size_t line_num, >> { >> /* We have the start/end of the line. Let's just copy >> it again and we are done. */ >> - ssize_t len = i->end_pos - i->start_pos + 1; >> - if (*line_len < len) >> - *line = XRESIZEVEC (char, *line, len); >> - memmove (*line, c->data + i->start_pos, len); >> - (*line)[len - 1] = '\0'; >> - *line_len = --len; >> + *line = c->data + i->start_pos; >> + *line_len = i->end_pos - i->start_pos; >> return true; >> } > > I notice that the old code adds a nul-terminator for this case (where > line_num < c->line_num), but not for the case where the line is > obtained via read_next_line. Ugh. I wonder if this code at one time > guaranteed NUL-termination, and then bit-rotted. > Yes, exactly that was the case initially. This code evolved from read_line which is in still in gcov.c, one day I will fix that one too, because it has a memory leak. > I believe that are callers of the input.h interface are set up > throughout to not expect nul-terminators on lines, so I think that > change is OK. > >> @@ -735,7 +701,7 @@ read_line_num (fcache *c, size_t line_num, >> >> /* The line we want is the next one. Let's read and copy it back > to >> the caller. */ >> - return read_next_line (c, line, line_len); >> + return get_next_line (c, line, line_len); >> } > > If I understand things, rather than returning a copy of the line data, > it will now return a ptr into c->data i.e. fcache's copy of the prefix > of the file that it's buffered so far, right? > Yes, the pointer is a const char* so will not modify the cache. All the copying should be unnecessary, and the line data is only valid until the next line is accessed. > >> /* Return the physical source line that corresponds to > FILE_PATH/LINE in a >> @@ -748,8 +714,8 @@ const char * >> location_get_source_line (const char *file_path, int line, >> int *line_len) >> { >> - static char *buffer; >> - static ssize_t len; >> + char *buffer; >> + ssize_t len; >> >> if (line == 0) >> return NULL; > > If I understand things, the old behavior was to have a single buffered > line copy, managed by the "static" vars in location_get_source_line, so > that the buffer is only valid until the next call to > location_get_source_line. > Yes. Same as read_line in gcov.c, initially it was a copy of that one. > If I understand the proposed change, the new behavior will be that the > result will be a ptr into the inside of fcache->data, and thus will > also be only guaranteed valid the next call to location_get_source_line > - since fcache->data could grow, and that growth could lead to a > reallocation. > Yes. Also the buffer can be reused, when another file is requested. > I like the patch: I believe it makes the implementation simpler, and > slightly more efficient. Before I'm comfortable approving this, please > could you update the patch to also update some of the comments in > input.c, to better reflect the lifetimes of the various entities. > > For example get_next_line's comment currently talks about "Space for > that line has been allocated in the cache thus *LINE has the same life > time as C.", which I think isn't currently true, and which the patch > may make less true :) > Yes. I will look over the comments. > I think we could also use some more selftests for > location_get_source_line... there's test_reading_source_line, but this > doesn't yet exercise the > if (line_num <= c->line_num) > case in read_line_num, when lines are accessed in reverse order. So > please also update that selftest to exercise that cache access pattern: > read the 3rd line, then the 2nd line. > Yes. I will add that test, maybe also a last line with missing EOL. Thanks Bernd.