often i try to implement an append-only random-access data structure.
it's a fun puzzle for my confused mind. i never finish it, but today i
thought about it a little bit and i think over the years i might be
able to put together a simplified form and maybe make it exist some
day.
here is where i wrote right now. i started spinning out when I renamed
Simple.Write to Simple.RangedData .
I was in the middle of implementing a leaf iterator, so as to make a
simple way to count the number of data leaves and associate them with
a depth. the approach, as written in the file, is to simply relink to
leaves with excessive depth each flush, so as to maintain O(log n)
random seek time.
i vaguely recall that approach has a major issue, but i might be
remembering wrongly, and it's still quite exciting to be able to spend
some minutes trying.
# block tree structure
# blocks written to are leaves
# depth is log2 leaves
# when a flush is made, all blocks are written, and also enough nodes such that every leaf can be accessed within depth lookups.
# consider we have an existing tree
# with say m flushes, containing n leaves (or m leaves). we'll likely call it n.
# each flush shows which leaves it has
# additionally, with the final flush, each leaf has an existing depth.
# when we reflush, we need to provide a new index for any leaves that become too deep.
# which leaves are too deep?
# we could basically walk them all to find out. this would be a consistent first approach.
class Simple:
class RangedData:
def __init__(self, offset, data):
self.start = offset
self.data = data
self.end = self.start + len(self.data)
class Flush:
# flush has a list of new leaves, and a list of indexes to old leaves with ranges
def __init__(self, *writes, prev_flush=None):
self.prev_flush = prev_flush
self.data = writes
# find extents
start = min((write.start for write in self.data))
end = max((write.end for write in self.data))
if prev_flush is None:
self.start = start
self.end = end
self.index = []
return
self.start = min(start, prev_flush.start)
self.end = max(end, prev_flush.end)
self.index = [(prev_flush.start, prev_flush.end, prev_flush)]
# find leaf count and leaf depths
offset = start
while offset < end:
#def lookup(self, offset
def leaves(self, start = None, end = None):
offset = self.start
data_iter = iter(self.data)
index_iter = iter(self.index)
next_write = next(data_iter, None)
next_index = next(index_iter, None)
while offset < self.end:
if next_write is not None:
assert offset <= next_write.start
if offset == next_write.start:
yield (0, next_write)
continue
def __init__(self, latest = None):
self.latest = latest
self.pending = []
def write(self, offset, data):
self.pending.append((offset, data))
def flush(self):
self.latest = self.Flush(*self.pending, prev_flush=self.latest)
self.pending = []