Thanks for the reply. I'll read the encoding page for the details, but here are the answers to your questions.
For the data I have in mind, each "row" is a logical object but I'm interested in efficiency gains from column-wise storage. The amount of data is typically in the megabytes range, but possibly up to low gigabytes. Data is typically read once, worked on for a long time, and then written out. Writing data in-place isn't a concern. Does any of that change the answer? Thanks again.

On Friday, January 13, 2017 at 7:38:01 PM UTC-5, Ross Light wrote:
>
> You may want to look in https://capnproto.org/encoding.html for low-level
> details, but if you represent each "row" as a struct, then most of the
> efficiency gains you describe happen without much effort. Booleans are
> lumped together as bits, lists can be tightly packed, and you can mmap the
> file for reading (but not writing).
>
> However, you haven't really mentioned how much data you're working with.
> Is this kilobytes, megabytes, gigabytes, terabytes? What is your
> workload? Read-only, or moderate writing? The mmap method works well for
> low-write scenarios. For a writable workload, a common way of working with
> a binary-struct-oriented protocol like Cap'n Proto or Protocol Buffers is
> to use them as the value in another key-value store (of which there are
> many). This way, each "row" is a struct that you can modify independently.
>
> Hope that helps!
> -Ross
>
> On Wed, Jan 11, 2017 at 11:37 PM <[email protected]> wrote:
>
>> By "tabular" data I'm referring to the sort of data that might be stored
>> in a relational database. The specific characteristics of tabular data I
>> want a serialization apparatus to take advantage of is the usual SoA vs AoS
>> tradeoffs. As examples, if every item has a boolean field and there are
>> many items, it may be advantageous to store chunks of booleans together as
>> bits. The reason I'm interested in something other than for example, an SQL
>> database, is that I don't need support for queries, I want to store data as
>> a file or set of files instead of in a centralized per-machine database, I
>> want richer typing, and my tabular data is bundled with data that is less
>> amenable to the relational model such as variable length lists.
>>
>> Is Cap'n Proto appropriate for this sort of use? If not, is there an
>> obvious alternative that is?
>>
>> Thoughts? Thanks.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Cap'n Proto" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> Visit this group at https://groups.google.com/group/capnproto.
>>
>
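
To make the boolean example from the original question concrete, here is a minimal sketch of the SoA-versus-AoS idea. It is plain Python, not Cap'n Proto code, and it says nothing about Cap'n Proto's actual wire layout; `pack_bools`, `unpack_bools`, and the `id`/`active` fields are hypothetical names chosen only for illustration.

```python
# Illustrative only: plain Python, not Cap'n Proto. All names are hypothetical.

def pack_bools(flags):
    """Pack booleans into bytes, 8 flags per byte, least significant bit first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bools(data, count):
    """Recover the first `count` booleans from bytes produced by pack_bools."""
    return [bool(data[i // 8] & (1 << (i % 8))) for i in range(count)]


# Array-of-structs: one record per "row".
rows = [{"id": i, "active": i % 3 == 0} for i in range(1000)]

# Struct-of-arrays: one column per field; the boolean column is stored as bits.
ids = [row["id"] for row in rows]
active_bits = pack_bools([row["active"] for row in rows])

print(len(active_bits))  # 1000 flags fit in 125 bytes
```

The row view stays available by zipping the columns back together (`unpack_bools(active_bits, len(ids))` restores the flags), so the column-wise layout only changes how the data is stored, not what a "row" means.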
