Well, it appears that this just isn't possible.  I created CASSANDRA-5959
as a result (backstory and performance-testing results are described in
the issue):

https://issues.apache.org/jira/browse/CASSANDRA-5959

--
Les Hazlewood | @lhazlewood
CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282

On Thu, Aug 29, 2013 at 12:04 PM, Les Hazlewood <lhazlew...@apache.org> wrote:

> Hi all,
>
> We're storing search results in a Cassandra table/column family that
> looks like this:
>
> +--------+---------+---------+---------+----
> |        | 0       | 1       | 2       | ...
> +--------+---------+---------+---------+----
> | row_id | text... | text... | text... | ...
>
> The column name is the index # (an integer) of the entry's position in the
> overall result set.  The value is the result at that particular index.
> This is great because pagination becomes a simple slice query on the
> column name.
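>
> For example, pulling entries 100-199 of a result set is one slice over a
> single partition - roughly (illustrative key values; table definition
> below):
>
> select list_index, result from query_results
>   where row_id = 'some-query-id' and shard_num = 0
>     and list_index >= 100 and list_index < 200;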
>
> Large result sets are split across multiple rows - we're limiting row
> size on disk to around 6 or 7 MB.  For our particular result
> entries, this means we can fit around 50,000 columns in a single row.
>
> When we create the rows, we have all of the data available in the
> application at the time the row is inserted.
>
> Using CQL3, an initial implementation had one INSERT statement per
> column.  This was killing performance (not to mention the # of
> tombstones it created).
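>
> Roughly, that meant one statement like this per result entry (column
> names as in the table definition below; values illustrative):
>
> insert into query_results (row_id, shard_num, list_index, result)
>   values ('some-query-id', 0, 42, '...result text...');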
>
> Here's the CQL3 table definition:
>
> create table query_results (
>     row_id text,
>     shard_num int,
>     list_index int,
>     result text,
>     primary key ((row_id, shard_num), list_index)
> ) with compact storage;
>
> (The row key is the composite of row_id + shard_num; the clustering column
> is list_index.)
>
> I don't want to execute 50,000 INSERT statements for a single row.  We
> have all of the data up front - I want to execute a single INSERT.
>
> Is this possible?
>
> We're using the Datastax Java Driver.
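>
> For comparison, grouping the per-column INSERTs into an unlogged batch
> would look roughly like the following (illustrative values) - but that's
> still one INSERT per column, just fewer round trips, not the single
> multi-column INSERT I'm hoping for:
>
> begin unlogged batch
>   insert into query_results (row_id, shard_num, list_index, result)
>     values ('some-query-id', 0, 0, 'first result...');
>   insert into query_results (row_id, shard_num, list_index, result)
>     values ('some-query-id', 0, 1, 'second result...');
>   -- ...one insert per entry in the shard
> apply batch;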
>
> Thanks for any help!
>
> Les
>
