hi Antoine,

On Sun, Jun 24, 2018 at 1:06 PM, Antoine Pitrou <anto...@python.org> wrote:
>
> Hi Wes,
>
> Le 24/06/2018 à 08:24, Wes McKinney a écrit :
>>
>> If this sounds interesting to the community, I could help to kickstart
>> a design process which would likely take a significant amount of time.
>> The requirements could be complex (i.e. we might want to support
>> variable-size record fields while also providing random access
>> guarantees).
>
> What do you call "variable-sized" here? A scheme where the length of a
> record's field is determined by the value of another field in the same
> record?

As an example, here is a fixed-size record:

    record foo {
      a: int32;
      b: float64;
      c: uint8;
    }

With padding, suppose this is 16 bytes per record; if we have a column
of these, then random access to any value in any record is simple.

Here's a variable-length record:

    record bar {
      a: string;
      b: list<int32>;
    }

What I've seen done to represent this in memory is to have a fixed-size
record followed by a sidecar containing the variable-length data, so the
fixed-size portion might look something like:

    a_offset: int32;
    a_length: int32;
    b_offset: int32;
    b_length: int32;

From this, you can do random access into the record. If you wanted to
do random access on a _column_ of such records, it is similar to our
current variable-length Binary type. So it might be that the underlying
Arrow memory layout would be FixedSizeBinary for fixed-size records and
variable Binary for variable-size records.

- Wes

>
>
>
> Regards
>
> Antoine.
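
To make the two layouts above concrete, here is a minimal sketch in
Python using only the standard `struct` module. It is not an Arrow API
and not a proposed format. All names (`FOO`, `BAR_HEADER`, `pack_bar`,
`get_bar`, ...) are hypothetical. It assumes little-endian encoding, 3
trailing pad bytes to reach the supposed 16 bytes for `foo`, offsets
measured from the start of each record, `a_length` in bytes, and
`b_length` in elements.

```python
import struct

# Fixed-size record `foo`: int32 + float64 + uint8 + 3 pad bytes = 16 bytes.
# (The padding scheme is an assumption chosen to match the 16-byte figure.)
FOO = struct.Struct("<idB3x")

def pack_foos(rows):
    """Concatenate fixed-size records, like a FixedSizeBinary column."""
    return b"".join(FOO.pack(a, b, c) for a, b, c in rows)

def get_foo(buf, i):
    """Random access: record i starts at byte i * record size."""
    return FOO.unpack_from(buf, i * FOO.size)

# Variable-size record `bar`: fixed-size header followed by a sidecar.
# Header fields: a_offset, a_length, b_offset, b_length (all int32).
BAR_HEADER = struct.Struct("<4i")

def pack_bar(a, b):
    a_bytes = a.encode("utf-8")
    b_bytes = struct.pack(f"<{len(b)}i", *b)
    a_offset = BAR_HEADER.size             # sidecar begins right after the header
    b_offset = a_offset + len(a_bytes)
    header = BAR_HEADER.pack(a_offset, len(a_bytes), b_offset, len(b))
    return header + a_bytes + b_bytes

def unpack_bar(buf, start=0):
    """Decode one `bar` record whose first byte is at `start`."""
    a_off, a_len, b_off, b_len = BAR_HEADER.unpack_from(buf, start)
    a = buf[start + a_off : start + a_off + a_len].decode("utf-8")
    b = list(struct.unpack_from(f"<{b_len}i", buf, start + b_off))
    return a, b

def pack_bar_column(rows):
    """Column of `bar` records: an offsets list plus one data buffer,
    analogous to a variable-length Binary column."""
    offsets, chunks = [0], []
    for a, b in rows:
        rec = pack_bar(a, b)
        chunks.append(rec)
        offsets.append(offsets[-1] + len(rec))
    return offsets, b"".join(chunks)

def get_bar(offsets, data, i):
    """Random access into the column via the offsets list."""
    return unpack_bar(data, offsets[i])
```

Usage would be along the lines of
`offsets, data = pack_bar_column([("hi", [1, 2, 3])])` followed by
`get_bar(offsets, data, 0)`. Reading a single field only needs the
16-byte header plus the slice it points at, so the rest of the sidecar
is never touched.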