As always, your queries should drive your data model. Unless you really
need 1000+ columns for most queries, you should consider separate tables
for the subsets of the columns that need to be returned for a given query.

The new 3.0 Materialized View feature can be used to easily create subsets
of a base table, although that begs the question of whether you ever really
need all 1000+ columns in the same table.

-- Jack Krupansky

On Tue, Nov 24, 2015 at 10:45 AM, Kai Wang <dep...@gmail.com> wrote:

> Hi all,
>
> If I have the following table:
> CREATE TABLE t (
>   pk int,
>   ck int,
>   c1 int,
>   c2 int,
>   ...
>   PRIMARY KEY (pk, ck)
> )
>
> There are lots of non-clustering columns (1000+). From time to time I need
> to do a query like this:
>
> SELECT c1 FROM t WHERE pk = abc AND ck > xyz;
>
> How efficient is this query compared to SELECT * ...? Apparently SELECT c1
> would save a lot of network bandwidth since only c1 needs to be transferred
> on the wire. But I am more interested in the impact on disk IO. If I
> understand C* storage engine correctly, one CQL row is clustered together
> on disk. That means c1 from different rows are stored apart. In the case of
> SELECT c1, does C* do multiple seeks to only lift c1 of each row from disk
> or lift the whole row into memory and return c1 from there?
>
> From comments on https://issues.apache.org/jira/browse/CASSANDRA-5762 it
> seems C* lifts the whole row as of 1.2.7. Is this still the case on 2.1.*?
>
> Thanks.
>

Reply via email to