Could you prefix the data with 3-4 bytes of a linear hash of the unencrypted data? It wouldn't be a perfect sort, but you'd have a smaller range to query to get the sorted values.
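A rough sketch of what I mean, in Python. Everything here is an assumption for illustration: I am reading "linear hash" as any order-preserving function of the plaintext (here simply its first 4 UTF-8 bytes), `fake_encrypt` is a placeholder for your real compress-and-encrypt step, and the helper names are made up. It also assumes the column comparator orders names bytewise.

```python
import bisect

PREFIX_LEN = 4  # assumption: the 3-4 bytes suggested above


def order_prefix(plaintext: str, length: int = PREFIX_LEN) -> bytes:
    """Order-preserving stand-in for the 'linear hash': the first `length`
    bytes of the UTF-8 plaintext, zero-padded on the right."""
    return plaintext.encode("utf-8")[:length].ljust(length, b"\x00")


def fake_encrypt(plaintext: str) -> bytes:
    """Placeholder for the real compress+encrypt step. NOT encryption."""
    return plaintext.encode("utf-8")[::-1]


def column_name(plaintext: str) -> bytes:
    """Sortable prefix followed by the opaque binary payload."""
    return order_prefix(plaintext) + fake_encrypt(plaintext)


def candidates(sorted_names, lo: str, hi: str):
    """Slice of columns whose prefix lies in [prefix(lo), prefix(hi)].
    The caller still decrypts and sorts this slice for exact order."""
    prefixes = [name[:PREFIX_LEN] for name in sorted_names]
    start = bisect.bisect_left(prefixes, order_prefix(lo))
    end = bisect.bisect_right(prefixes, order_prefix(hi))
    return sorted_names[start:end]


words = ["pear", "apple", "apricot", "banana", "fig"]
names = sorted(column_name(w) for w in words)  # bytewise order, as a store would keep them
hits = candidates(names, "ap", "aq")           # only the "ap..." neighbourhood is read
```

Items sharing a prefix come back in arbitrary order, so the application sorts that small slice after decrypting. Note that the prefix exposes the leading bytes of the plaintext, which may or may not be acceptable for your data.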
- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen

On 12 Oct 2011 17:57, "Matthias Pfau" <p...@l3s.de> wrote:

> Unfortunately, that is not an option as we have to store the data in a
> compressed and encrypted and therefore binary and non-sortable form.
>
> On 10/12/2011 06:39 PM, David McNelis wrote:
>
>> Is it an option to not convert the data to binary prior to inserting
>> into Cassandra? Also, how large are the strings you're sorting? If it's
>> viable to not convert to binary before writing to Cassandra, and you use
>> one of the string-based column ordering techniques (utf8, ascii, for
>> example), then the data would be sorted without you needing to
>> specifically worry about that. Of course, if the strings are lengthy
>> you could run into additional issues.
>>
>> On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de
>> <mailto:p...@l3s.de>> wrote:
>>
>>     Hi there,
>>     we are currently building a prototype based on Cassandra and ran
>>     into problems implementing sorted lists containing millions of
>>     items.
>>
>>     The special thing about the items of our lists is that Cassandra is
>>     not able to sort them, as the data is stored in a binary format which
>>     is not sortable. However, we are able to sort the data before the
>>     plain data gets encoded (our application is responsible for the order).
>>
>>     First Approach: Storing Lists in ColumnFamilies
>>     ***
>>     We first tried to map the list to a single row of a ColumnFamily in
>>     a way that the index of the list is mapped to the column names and
>>     the items of the list to the column values. The column names are
>>     increasing numbers which define the sort order.
>>     This has the major drawback that big parts of the list have to be
>>     rewritten on inserts (because the column names are numbered by their
>>     index), which are quite common.
>>
>>
>>     Second Approach: Storing the whole List as Binary Data:
>>     ***
>>     We tried to store the compressed list in a single column. However,
>>     this is only feasible for smaller lists. Our lists are far too big,
>>     leading to multi-megabyte reads and writes. As we need to read and
>>     update the lists quite often, this would put our Cassandra cluster
>>     under a lot of pressure.
>>
>>     Ideal Solution: Native support for storing lists
>>     ***
>>     We would be very happy with a way to store a list of sorted values
>>     without making improper use of column names for the list index. This
>>     implies that we would need a possibility to insert values at defined
>>     positions. We know that this could lead to problems with concurrent
>>     inserts in a distributed environment, but this is handled by our
>>     application logic.
>>
>>
>>     What are your ideas on that?
>>
>>     Thanks
>>     Matthias
>>
>>
>> --
>> *David McNelis*
>> Lead Software Engineer
>> Agentis Energy
>> www.agentisenergy.com <http://www.agentisenergy.com>
>> c: 219.384.5143
>>
>> /A Smart Grid technology company focused on helping consumers of energy
>> control an often under-managed resource./