> let's try to talk about this, write it out a little
>
> there's a list of requested reads.
>
> the file is roughly broken into aligned pages, some of which are cached locally and some of which can be fetched
>
> the reads may be small and dense within a single page, or they may be large and contain many pages inside a single read, some cached, some not. they can overlap pages at the start or end.
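
a rough sketch of how i'm picturing that setup, just to pin down names -- `PAGE_SIZE`, `Read`, and the way pages are counted here are all made-up placeholders for illustration, not real code:

```python
# placeholder data model: a read is an (offset, length) request against a file
# that's split into PAGE_SIZE-aligned pages
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page alignment, purely illustrative

@dataclass
class Read:
    offset: int  # byte offset into the file
    length: int  # bytes requested

def pages_for(read: Read) -> range:
    """Pages a read touches, including partial overlap at the start or end."""
    first = read.offset // PAGE_SIZE
    last = (read.offset + read.length - 1) // PAGE_SIZE
    return range(first, last + 1)
```

e.g. `Read(offset=100, length=8000)` starts partway into page 0 and ends partway into page 1, so `pages_for` gives `range(0, 2)` -- that's the "overlap pages at the start or end" case.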
so one approach i started trending toward was having an index into the reads (starting at 0) and, in order:

1. looking for sequences of fully cached data and advancing the index, then
2. looking for sequences of fully uncached data and advancing the index, then
3. filling a partly cached item and advancing the index.

the theory is that the total number of pages is finite and reasonable, so it's fine to iterate over pages, but we need to batch the reads themselves. there was a suggestion of mapping all the holes in the cache once and applying further vector operations to calculate all the needed pages in one pass. this might make sense :/ but it's nice to settle on a stable pattern in the code, since i like being able to hold the reasoning in my head, so let's look at that pattern -- notably 3, filling a partly cached item (sketched below).

an item has holes and data, and we might only know about one hole at a time. the way it works, we can handle the read starting with either a hole or data. we advance over data, then fetch the data for a hole, then advance over the next stretch of data. not complicated! ... the bounds ... at the start we'd have next_hole < tails[idx]. once next_hole > tails[idx], we have to advance over data up to tails[idx] rather than over [last_data:next_hole], and that's the last part of that item.
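
here's a hedged sketch of that cursor walk, assuming the reads are `(offset, length)` tuples sorted by offset and the cache is just a set of page numbers; `plan_reads` and the `actions` list are made up so the control flow is visible, not an actual API:

```python
# sketch of steps 1-3 above: advance an index over the reads, batching runs of
# fully cached and fully uncached reads, and handling partly cached reads one at a time

def pages_for(offset: int, length: int, page_size: int) -> range:
    """Page numbers touched by a read, including partial pages at either end."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)

def plan_reads(reads, cached, page_size=4096):
    idx = 0
    actions = []  # (kind, payload) pairs, just to make the flow visible
    while idx < len(reads):
        start = idx
        # 1. advance over a run of fully cached reads: serve them locally
        while idx < len(reads) and all(
            p in cached for p in pages_for(*reads[idx], page_size)
        ):
            idx += 1
        if idx > start:
            actions.append(("serve_local", reads[start:idx]))
            continue
        # 2. advance over a run of fully uncached reads: batch all their pages
        while idx < len(reads) and not any(
            p in cached for p in pages_for(*reads[idx], page_size)
        ):
            idx += 1
        if idx > start:
            pages = sorted({p for r in reads[start:idx]
                            for p in pages_for(*r, page_size)})
            actions.append(("fetch_batch", pages))
            continue
        # 3. the read at idx is partly cached: fill it hole by hole, advance by one
        actions.append(("fill_partial", reads[idx]))
        idx += 1
    return actions
```

each pass through the loop advances the index by at least one read, so it terminates, and the fully-uncached run is what gives us the batching of fetches.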
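
and a sketch of step 3 itself, the partly cached fill, with the bound handling from above. `next_hole_after`, `copy_local`, and `fetch` are placeholder callables standing in for however we actually discover holes and move bytes; `head`/`tail` play the role of the item's bounds (i.e. tails[idx]):

```python
# fill one partly cached read [head, tail), knowing about one hole at a time
from typing import Callable, Optional

def fill_partial(
    head: int,
    tail: int,
    next_hole_after: Callable[[int], Optional[tuple[int, int]]],
    copy_local: Callable[[int, int], None],
    fetch: Callable[[int, int], None],
) -> None:
    """The read may start on either data or a hole; both fall out of the same loop."""
    pos = head
    while pos < tail:
        hole = next_hole_after(pos)  # (hole_start, hole_end) or None if no more holes
        if hole is None or hole[0] >= tail:
            # the next hole starts past our tail: advance over data to tail,
            # not to the hole -- this is the last part of the item
            copy_local(pos, tail)
            return
        hole_start, hole_end = hole
        if pos < hole_start:
            copy_local(pos, hole_start)         # advance over the cached data in front
        fetch(hole_start, min(hole_end, tail))  # fetch the hole, clipped to the read
        pos = min(hole_end, tail)
```

the `hole is None or hole[0] >= tail` branch is exactly the next_hole > tails[idx] case: we stop advancing to the hole, advance over data to tails[idx], and we're done with that item.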