As mentionned in my previous email, I'm now thinking of serializing the linemap from the header by pph'ed, here is how I think this could work:
From what I understand, the source_locations allocated for everything in a given set of headers (from the LC_ENTER for the header in the line_table up to, and including everything in between, the corresponding LC_LEAVE) is dependent on only one thing; the value of line_table->highest_location when the header was inserted (i.e. if in two different contexts we happen to have the same line_table->highest_location when inserting header A.h, all of the tokens for A.h in each context will have the same source_location). If the previous statement is true, then we calculate an offset between the line_table->highest_location as it was in the serialized version of the header and as it is in the C file in which we are about to de-serialize the line table. We then need to update some things based on that offset: for each line_map entry in the line_table: start_location for each source_location field on tokens coming in from the header Some other things need to be updated with simple logic: for each line_map entry in the line_table: included_from for the line_table itself: highest_location, highest_line, max_column_hint Some things just stay as is when de-serialized: for each line_map entry in the line_table: reason, sysp, to_file, to_line, column_bits After doing this we should have the same line_map we would have had when doing a regular compile (non-pph), and the tokens coming from the header file should have the correct source_location, simply by adding the calculated offset to the one they had previously (this also allows us to only serialize the source_location for each token as opposed to serializing it's expanded location, saving space in the pph). Let me know what you think, Gabriel On Tue, Jul 26, 2011 at 11:10 AM, Gabriel Charette <gch...@google.com> wrote: > On Tue, Jul 26, 2011 at 4:25 AM, Dodji Seketeli <do...@seketeli.org> wrote: >> Gabriel Charette <gch...@google.com> a écrit: >> >>> Alright, so after looking even more at the linemap code, here is what >>> I'm thinking now: >>> >>> So the main problem is: >>> with the linemap logic is that linemap_line_start adds a new table >>> entry if line_delta < 0 (I don't understand why this is needed, my >>> assumption is that it is because the rest of the logic depends on >>> set->highest_location, thus we want to make sure we are higher than >>> the highest so far? can someone clarify why we need this?) >> >> Here is how I understand this. Sorry if it's too obvious. >> >> The linemap_line_start model follows that of a parser that parses tokens >> from left to right, and from top to bottom. Let's imagine a main CU >> called M which has this layout: >> >> <---- start, locus #0 >> >> [... tokens ...] >> >> #include "A" <--- locus #1 >> >> [... tokens ...] >> >> There are going to be at least three line maps (instances of struct >> line_map) created and added to the line map set (struct line_maps) of M: >> one for loci from #0 to right before #1, at least one for loci coming >> from the included header A (there can be several line maps here, if A >> itself includes other headers) and another one from loci coming right >> after #1. >> >> A hard invariant here is that source_locations yielded by a given line >> map must be less than the struct line_map::start_location of the next >> line map. This is (partly) so that line map lookup (when we want to get >> the line map that corresponds to a given source_location) can be done >> using a binary search, which is faster than just doing a linear search. >> A side effect of this is that source_locations handed out by >> linemap_line_start increase monotonically. >> >> With these assumptions in mind, when the parser code calls >> linemap_line_start to get the source_location corresponding to a new >> line at column 0, the only case where the new line is less than the >> preceding line (so that line_delta < 0) is if the parser is entering a >> new file. E.g, it is entering the header A, coming from M. Or, it is >> getting back to M, leaving A. In that case, it seems required for >> linemap_line_start to create the new line map for A or for M. >> > > Thanks for this explanation, that's pretty much how I had understood > the linemap code, but you clarified some of the things I wasn't clear > about. > >>> My solution: >>> I will add a boolean flag to linemap_line_start's parameters, >>> allowEarlierStartLineStart, which when true, will not create a new >>> entry in the line_table even if the line_delta < 0. >> >> If I understand correctly, you are acting like if you were parsing from >> right to left and from bottom to top, for the content of A. >> > > Right. > >> In that context, you need to keep the hard invariant I talked about >> above, so that line map lookup keeps working, at least. >> >> In other words, when you generate a source_location for a token T0 that >> comes topologically /before/ the token T1 you generated source_location >> for at the previous step, you need to make sure that source_location of >> T0 is less than source_location of T1, and that the struct >> line_map::start_location of the current map is less than T0. If T0 >> comes from a different header than T1 (suppose both T1 and T0 comes from >> headers that are included by A) then a new line map (different from the >> line map for T1) needs to be created for T0. Now I am not sure if in >> your design, there is going to be only one pph for A, or if there is >> going to be one pph for each header included by A as well. I would have >> imagined the later case to be more appropriate. >> > > Correct we are using the multiple pph approach (one for each included > header in the header). > > This actually makes the problem harder than I had originally pictured > it yesterday. I'm now thinking using serialization is going to be a > better solution because of that. > >>> Instead of relying on highest_location to find the current line's >>> source location, it will use the start_location and add to it the >>> delta between to_line and map->to_line). >> >> If the map->start_location was allocated correctly, I guess this could >> work assuming the constraints I raised above are respected. >> >> Note however that there are corner cases like having so long lines in >> the source code being parsed that we fill the column-space of the line >> map so quickly that we need to allocate more line maps for a given file. >> These are corner cases taken care of by linemap_line_start today that >> you could miss if you are trying to walk in reverse order like this. >> Those would have been addressed by serializing the original line map set >> of A that we knew was correct, barring the value of their >> map->start_location and map->to_line. >> > > One more reason for serialization! > >>> I will also add a new linemap function, >>> linemap_position_for_line_and_column, which will be just like >>> linemap_position_for_column, except that it won't assume the >>> highest_line in the set is the one we are currently on (this function >>> will thus allow us to get a source_location without depending on the >>> current assumptions on the order these functions are called in). >> >> Heh. I have added a function with that exact name in my patch set to >> track locations across macro expansions. Look at the source code at >> http://tinyurl.com/3egxp47, search for >> linemap_position_for_line_and_column (obviously). The implicit >> assumption made by that function is that the {line,column} pair which >> you are requesting a source_location for belongs to the current line >> map, though. >> > > Good to know, I had made the same assumption in mine that we were > looking for {line, column} in the current line map. > ---- > So, I think serializing the linemap and tweaking it on the way in is > going to be easier, in particular because of nested pph (header > including headers). > I'll put down my strategy for serializing the linemap and send a > follow up email soon. > > Thanks, > Gabriel >