Thanks for the reply. I'll read the encoding page for the details, but here 
are the answers to your questions.

For the data I have in mind, each "row" is a logical object, but I'm 
interested in the efficiency gains of column-wise storage. The amount of data 
is typically in the megabyte range, but possibly up to low gigabytes. Data 
is typically read once, worked on for a long time, and then written out. 
Writing data in place isn't a concern.

Does any of that change the answer? Thanks again.
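To make the column-wise idea concrete, here is a minimal Python sketch of the SoA vs. AoS tradeoff. It only illustrates the general layout and is not Cap'n Proto's wire format. The row type (an int64 id, a float64 score and a boolean flag) and the helper names are hypothetical. The row-major version spends a whole byte on each boolean. The column-major version stores each field as its own contiguous column and packs the flags 8 per byte.

```python
import struct

# Hypothetical row type: (id: int64, score: float64, flag: bool).
rows = [(1, 0.5, True), (2, 1.25, False), (3, 2.0, True), (4, 0.0, True)]


def pack_bits(flags):
    """Pack booleans into bytes, 8 per byte, least significant bit first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bits(data, count):
    """Inverse of pack_bits: recover the first `count` booleans."""
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(count)]


def to_row_major(rows):
    # Array of structs: each record is contiguous; "?" uses a full byte per flag.
    return b"".join(struct.pack("<qd?", *row) for row in rows)


def to_column_major(rows):
    # Struct of arrays: one contiguous column per field, flags bit-packed.
    n = len(rows)
    ids = struct.pack(f"<{n}q", *(r[0] for r in rows))
    scores = struct.pack(f"<{n}d", *(r[1] for r in rows))
    flags = pack_bits([r[2] for r in rows])
    return ids, scores, flags


row_major = to_row_major(rows)
ids_col, scores_col, flags_col = to_column_major(rows)
print(len(row_major))                                   # 68
print(len(ids_col) + len(scores_col) + len(flags_col))  # 65
print(unpack_bits(flags_col, len(rows)))                # [True, False, True, True]
```

With 4 rows the saving is only 3 bytes. The flag column, however, shrinks from n bytes to roughly n/8 bytes, and a pass over one field touches only that field's column.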

On Friday, January 13, 2017 at 7:38:01 PM UTC-5, Ross Light wrote:
>
> You may want to look at https://capnproto.org/encoding.html for low-level 
> details, but if you represent each "row" as a struct, then most of the 
> efficiency gains you describe happen without much effort.  Booleans are 
> lumped together as bits, lists can be tightly packed, and you can mmap the 
> file for reading (but not writing).
>
> However, you haven't really mentioned how much data you're working with.  
> Is this kilobytes, megabytes, gigabytes, terabytes?  What is your 
> workload?  Read-only, or moderate writing?  The mmap method works well for 
> low-write scenarios.  For a writable workload, a common way of working with 
> a binary-struct-oriented protocol like Cap'n Proto or Protocol Buffers is 
> to use them as the value in another key-value store (of which there are 
> many).  This way, each "row" is a struct that you can modify independently.
>
> Hope that helps!
> -Ross
>
> On Wed, Jan 11, 2017 at 11:37 PM <[email protected]> wrote:
>
>> By "tabular" data I'm referring to the sort of data that might be stored 
>> in a relational database. The specific characteristics of tabular data I 
>> want a serialization apparatus to take advantage of are the usual SoA vs. 
>> AoS (struct of arrays vs. array of structs) tradeoffs. For example, if 
>> every item has a boolean field and there are many items, it may be 
>> advantageous to store chunks of booleans together as bits. The reason I'm 
>> interested in something other than, for example, an SQL database is that 
>> I don't need support for queries, I want to store data as a file or set of 
>> files instead of in a centralized per-machine database, I want richer 
>> typing, and my tabular data is bundled with data that is less amenable to 
>> the relational model, such as variable-length lists.
>>
>> Is Cap'n Proto appropriate for this sort of use? If not, is there an 
>> obvious alternative that is?
>>
>> Thoughts? Thanks.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Cap'n Proto" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> Visit this group at https://groups.google.com/group/capnproto.
>>
>
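Ross's point that the file can be mmapped for reading (but not writing) can be illustrated outside Cap'n Proto. The Python sketch below is a stand-in under stated assumptions: instead of a Cap'n Proto message it uses hypothetical fixed-size records packed with the standard `struct` module. It maps the file with `mmap.ACCESS_READ` and decodes a single record in place. The record layout and function names are illustrative only.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: little-endian int64 id + float64 score.
RECORD = struct.Struct("<qd")


def write_records(path, records):
    # Written once up front with ordinary buffered file I/O.
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def read_record(mm, index):
    # Decode one record directly from the mapped pages, without parsing
    # the whole file first.
    return RECORD.unpack_from(mm, index * RECORD.size)


path = os.path.join(tempfile.mkdtemp(), "records.bin")
write_records(path, [(1, 0.5), (2, 1.25), (3, 2.0)])

# ACCESS_READ maps the file read-only; assigning into `mm` raises TypeError.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    print(len(mm) // RECORD.size)  # 3
    print(read_record(mm, 1))      # (2, 1.25)
```

Because the mapping is read-only, modified data has to be written out to a new file. That fits a workload where data is read once, worked on, and then written out.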
