On Mon, Oct 17, 2011 at 2:39 AM, Matthias Pfau <p...@l3s.de> wrote:
> We would be very happy if cassandra would give us an option to maintain the
> sort order on our own (application logic). That is why it would be
> interesting to hear from any of the developers if it would be easily
> possible to add such a feature to cassandra.
What you are describing above is option (b): you would do this by building your sort-order, encryption, and decryption logic into Cassandra. Let me elaborate...

The database always has to know how to compute the sort order of items. Deferring that to your code can happen in only two ways: in-process or out-of-process. Deferring sort-order comparisons to out-of-process code would have disastrous effects on performance, because comparisons are performed many times for every single operation the database does. Therefore, short of an application where performance is irrelevant, the feasible way to let your code maintain the sort order is "option b": build your sort-order/encryption/decryption logic into the database. Cassandra would have to initialize it at startup in order to read your database.

Cassandra is open-source, so you can do this work on your own right now. Aaron's message provided some pointers. If you do go this route, you'll probably want to put your sort-order-and-encryption handler into a separate JAR, and add some code to Cassandra to load and register your classes when the database starts. You'd submit this "stable data-format plug-in API" patch to Cassandra, and hopefully find a way to get it accepted into the main codebase. This would make it easier for you to upgrade to new versions, as you would depend only on the public API rather than on a private fork of Cassandra.

> Otherwise, it seems like we have to implement sth. based on strategy (a)
> because (b) is not feasible for us and (c) is a rather young research topic
> which is slowly gaining more attention.

Certainly option (a) is the most straightforward method if you wish to keep your codebase completely separate from your database (whether Cassandra or not). Whether this is an acceptable security risk is up to you.

--------

Pulling back from implementation issues, I wonder if you might share a bit more about why your application needs this functionality.
Here are a few questions I'm curious about:

1) Is the data all encrypted with a single key, or do different records use different keys?
2) If a single key, would adding file/block/record-level encryption to Cassandra solve this problem? If not, why not? Is there something special about your encryption methods?
3) Is the compression of the data somehow special, such that block-level compression (zlib, snappy, or even a custom-implemented scheme) is not viable? If so, why?
4) Is there something special about the sorting that makes it hard to expose the sort order to a database (other than Cassandra's lack of general composite-key sorting)?
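To make option (b) more concrete, here is a minimal, self-contained Java sketch of an in-process plug-in. It owns both the encryption and the sort order and is loaded by class name at startup. Everything in it is hypothetical. `SortOrderPlugin`, `XorPlugin`, `loadPlugin` and `sortedPlaintexts` are illustrative names, not Cassandra's actual comparator or plug-in API. The XOR "cipher" is only a placeholder for real encryption. Note that `compare` decrypts both values on every call. Because the database compares items many times per operation, this is the cost that has to stay in-process.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PluginSortSketch {

    // Hypothetical plug-in contract (NOT Cassandra's API): one class owns
    // encryption, decryption, and the sort order of the stored values.
    public interface SortOrderPlugin extends Comparator<ByteBuffer> {
        ByteBuffer encrypt(ByteBuffer plaintext);
        ByteBuffer decrypt(ByteBuffer ciphertext);
    }

    // Toy plug-in: XOR with a fixed byte is a placeholder, NOT real encryption.
    public static class XorPlugin implements SortOrderPlugin {
        private static final byte KEY = 0x5A;

        // Returns a new buffer; duplicate() leaves the caller's position untouched.
        private static ByteBuffer xor(ByteBuffer in) {
            ByteBuffer src = in.duplicate();
            ByteBuffer out = ByteBuffer.allocate(src.remaining());
            while (src.hasRemaining()) {
                out.put((byte) (src.get() ^ KEY));
            }
            out.flip();
            return out;
        }

        @Override
        public ByteBuffer encrypt(ByteBuffer plaintext) { return xor(plaintext); }

        @Override
        public ByteBuffer decrypt(ByteBuffer ciphertext) { return xor(ciphertext); }

        // Orders stored (encrypted) values by their decrypted plaintext.
        // This runs on every comparison, so it must be cheap and in-process.
        @Override
        public int compare(ByteBuffer a, ByteBuffer b) {
            return decrypt(a).compareTo(decrypt(b));
        }
    }

    // Hypothetical startup hook: load the plug-in class named in configuration.
    public static SortOrderPlugin loadPlugin(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (SortOrderPlugin) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load plug-in " + className, e);
        }
    }

    // Encrypts the values, sorts the ciphertexts with the plug-in's comparator,
    // then decrypts them again so the resulting order can be inspected.
    public static List<String> sortedPlaintexts(SortOrderPlugin plugin, List<String> values) {
        List<ByteBuffer> stored = new ArrayList<>();
        for (String v : values) {
            stored.add(plugin.encrypt(ByteBuffer.wrap(v.getBytes(StandardCharsets.UTF_8))));
        }
        stored.sort(plugin);
        List<String> result = new ArrayList<>();
        for (ByteBuffer b : stored) {
            result.add(StandardCharsets.UTF_8.decode(plugin.decrypt(b)).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        SortOrderPlugin plugin = loadPlugin(XorPlugin.class.getName());
        // Prints [apple, banana, cherry]
        System.out.println(sortedPlaintexts(plugin, List.of("cherry", "apple", "banana")));
    }
}
```

Sorting the raw ciphertext bytes would give a different order. With this toy key the order would be banana, cherry, apple. That is why the database cannot simply sort the stored bytes and needs the plug-in's comparator. A real plug-in would use a proper cipher and would have to be registered through whatever comparator mechanism Cassandra actually exposes.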