> let's try to talk about this, write it out a little
> 
> there's a list of requested reads.
> 
> the file is roughly broken into aligned pages, some of which are cached 
> locally and some of which can be fetched
> 
> the reads may be small and dense within a single page, or they may be large
> and span many pages inside a single read, some cached, some not. they can
> overlap pages partially at the start or end.
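
(for concreteness, here's a minimal sketch of the shapes i have in mind --
the names and the 4k page size are assumptions, not anything that exists:)

    from dataclasses import dataclass

    PAGE_SIZE = 4096  # assumed alignment, for illustration only

    @dataclass
    class Read:
        offset: int  # byte offset into the file
        length: int  # bytes requested

    def pages_for(read: Read) -> range:
        """page indices a read touches, including partial first/last pages"""
        first = read.offset // PAGE_SIZE
        last = (read.offset + read.length - 1) // PAGE_SIZE
        return range(first, last + 1)

    # the cache is a dict mapping page index -> bytes for locally cached pages;
    # any page missing from the dict has to be fetched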

so one approach i started trending toward was keeping an index into the reads
(starting at 0) and, in order: 1) looking for sequences of fully cached reads
and advancing the index, then 2) looking for sequences of fully uncached reads
and advancing the index, then 3) filling a partly cached item and advancing
the index.
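
in code that loop would look roughly like this, building on the sketch above
(handle_run and fill_partial are placeholders, not real functions):

    def classify(read, cache):
        """'cached', 'uncached', or 'partial' for a single read"""
        hits = [p in cache for p in pages_for(read)]
        if all(hits):
            return "cached"
        if not any(hits):
            return "uncached"
        return "partial"

    def process_reads(reads, cache):
        idx = 0
        while idx < len(reads):
            kind = classify(reads[idx], cache)
            if kind in ("cached", "uncached"):
                # 1 and 2: gather a run of reads with the same classification
                # so they can be served locally or fetched as one batch
                start = idx
                while idx < len(reads) and classify(reads[idx], cache) == kind:
                    idx += 1
                handle_run(reads[start:idx], kind)  # placeholder batch handler
            else:
                # 3: a partly cached item gets filled individually
                fill_partial(reads[idx], cache)
                idx += 1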

the theory is that the total number of pages is finite and reasonable, so it's
fine to iterate over pages, but we need to batch the reads themselves. there
was a suggestion of mapping all the holes in the cache once and applying
further vector operations to calculate all the needed pages in one pass. this
might make sense :/ but it's nice to keep some stability in the code, since i
like being able to hold the reasoning in my head.
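
for comparison, the "map all the holes once" version might look something like
this (numpy is my assumption here, and it only computes which pages need
fetching, not how each read gets assembled):

    import numpy as np

    def missing_pages(reads, cached_pages, page_size=4096):
        """every uncached page index touched by any read, in one pass"""
        if not reads:
            return np.empty(0, dtype=np.int64)
        firsts = [r.offset // page_size for r in reads]
        lasts = [(r.offset + r.length - 1) // page_size for r in reads]
        # all pages touched by any read, deduplicated
        touched = np.unique(np.concatenate(
            [np.arange(f, l + 1) for f, l in zip(firsts, lasts)]))
        cached = np.array(sorted(cached_pages), dtype=np.int64)
        return np.setdiff1d(touched, cached)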

so let's look at that pattern -- namely 3, filling a partly cached item.

an item has holes and data, and we might only know about one of them at a time.

the way it works, the read could start with a hole or with data, either way.
we advance over data, then fetch the data for a hole, then advance over the
next run of data.
not complicated! ... the bounds ... at the start we'd have next_hole <
tails[idx]. once next_hole > tails[idx], we have to advance over data up to
tails[idx] rather than over [last_data:next_hole], and that's the last part of
that item.
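
a rough sketch of that fill, using the page model from earlier (fetch_page is
a hypothetical fetcher, and for simplicity it treats each missing page as its
own hole instead of batching a run of adjacent holes into one fetch):

    def fill_partial(read, cache, page_size=4096):
        """walk one partly cached read: copy cached data, fetch holes, stop at
        the read's tail (the tails[idx] bound above)."""
        tail = read.offset + read.length
        pos = read.offset  # cursor into the file
        out = bytearray()
        while pos < tail:
            page = pos // page_size
            page_start = page * page_size
            end = min(page_start + page_size, tail)  # never run past the tail
            if page not in cache:
                # hit a hole: fetch the page and remember it
                cache[page] = fetch_page(page)  # hypothetical fetcher
            # advance over data, freshly fetched or already cached
            out += cache[page][pos - page_start:end - page_start]
            pos = end
        return bytes(out)

the min(..., tail) is what covers the last-part case: once the next boundary
would pass tails[idx], we only advance over data up to the tail and stop.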
