Unfortunately, that is not an option, as we have to store the data in a compressed and encrypted form, which is therefore binary and not sortable.

On 10/12/2011 06:39 PM, David McNelis wrote:
Is it an option to not convert the data to binary prior to inserting
into Cassandra? Also, how large are the strings you're sorting? If it's
viable to not convert to binary before writing to Cassandra, and you use
one of the string-based column ordering techniques (utf8 or ascii, for
example), then the data would be sorted without you needing to
specifically worry about that. Of course, if the strings are lengthy
you could run into additional issues.

On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <p...@l3s.de
<mailto:p...@l3s.de>> wrote:

    Hi there,
    we are currently building a prototype based on Cassandra and ran
    into problems implementing sorted lists containing millions of items.

    What is special about the items in our lists is that Cassandra
    cannot sort them, because the data is stored in a binary format that
    is not sortable. However, we are able to sort the data before the
    plain data gets encoded (our application is responsible for the order).

    First Approach: Storing Lists in ColumnFamilies
    ***
    We first tried to map the list to a single row of a ColumnFamily in
    a way that the index of the list is mapped to the column names and
    the items of the list to the column values. The column names are
    increasing numbers which define the sort order.
    This has the major drawback that big parts of the list have to be
    rewritten on inserts (because the column names are numbered by their
    index), which are quite common.
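The rewrite cost of this layout can be seen in a small in-memory model. The sketch below is not Cassandra client code: a plain Python dict stands in for one row, the function name `insert_at` is hypothetical, and the column names are assumed to be consecutive integers starting at 0.

```python
def insert_at(row, position, value):
    """Insert value at position in a row keyed by list index.

    row is a dict {index: item} standing in for one row whose column
    names are the consecutive integers 0..n-1 (an assumption of this
    sketch). Returns the number of columns that had to be written.
    """
    n = len(row)
    if not 0 <= position <= n:
        raise IndexError("position out of range")
    writes = 0
    # Shift every column at or after the insert position up by one,
    # starting from the end so nothing is overwritten before it is moved.
    for index in range(n - 1, position - 1, -1):
        row[index + 1] = row[index]
        writes += 1
    row[position] = value
    return writes + 1


row = {0: b"a", 1: b"b", 2: b"d"}
written = insert_at(row, 2, b"c")
```

An insert near the front of a list with millions of items would rewrite nearly every column in this model, which is the drawback described above.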


    Second Approach: Storing the whole List as Binary Data
    ***
    We tried to store the compressed list in a single column. However,
    this is only feasible for smaller lists. Our lists are far too big,
    leading to multi-megabyte reads and writes. As we need to read and
    update the lists quite often, this would put our Cassandra cluster
    under a lot of pressure.
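A rough sketch of why every update touches the whole value in this approach. JSON and zlib are stand-ins chosen for illustration only (the thread does not say which serialization or compression is used), encryption is left out, and the function names are hypothetical.

```python
import json
import zlib


def pack(items):
    """Serialize and compress a whole list into one binary value."""
    return zlib.compress(json.dumps(items).encode("utf-8"))


def unpack(blob):
    """Decompress and deserialize the binary value back into a list."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def insert_item(blob, position, item):
    """Insert one item; return the new blob plus bytes read and written."""
    items = unpack(blob)        # the full value has to be read
    items.insert(position, item)
    new_blob = pack(items)      # and the full value has to be written back
    return new_blob, len(blob), len(new_blob)
```

Even a single-item change reads and writes the complete compressed value, so the I/O per update grows with the size of the list rather than with the size of the change.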

    Ideal Solution: Native support for storing lists
    ***
    We would be very happy with a way to store a list of sorted values
    without making improper use of column names for the list index. This
    implies that we would need a way to insert values at defined
    positions. We know that this could lead to problems with concurrent
    inserts in a distributed environment, but this is handled by our
    application logic.


    What are your ideas on that?

    Thanks
    Matthias




--
*David McNelis*
Lead Software Engineer
Agentis Energy
www.agentisenergy.com <http://www.agentisenergy.com>
c: 219.384.5143

/A Smart Grid technology company focused on helping consumers of energy
control an often under-managed resource./


