Hi Mike,

To learn get subsecond performance on your queries using _any_ database you 
need to use proper indexing. Like Jeremy said, Solr will do this.




If you'd like to try to solve this using Cassandra you need to learn the 
difference between partition and clustering in your primary key and understand 
you need a clustering to do any kind of range query.




Also, COUNTs in Cassandra are generally fairly slow.




Cheers,

Jens
—
Sent from Mailbox

On Tue, Jun 24, 2014 at 10:09 AM, Mike Carter <jaloos...@gmail.com> wrote:

> Hello!
> I'm a beginner in C* and I'm quite struggling with it.
> I’d like to measure the performance of some Cassandra-Range-Queries. The
> idea is to execute multidimensional range-queries on Cassandra. E.g. there
> is a given table of 1million rows with 10 columns and I like to execute
> some queries like “select count(*) from testable where d=1 and v1<10 and v2
>>20 and v3 <45 and v4>70 … allow filtering”.  This kind of queries is very
> slow in C* and soon the tables are bigger, I get a read-timeout probably
> caused by long scan operations.
> In further tests I like to extend the dimensions to more than 200 hundreds
> and the rows to 100millions, but actually I can’t handle this small table.
> Should reorganize the data or is it impossible to perform such high
> multi-dimensional queries on Cassandra?
> The setup:
> Cassandra is installed on a single node with 2 TB disk space and 180GB Ram.
> Connected to Test Cluster at localhost:9160.
> [cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
> Keyspace:
> CREATE KEYSPACE test WITH replication = {
>   'class': 'SimpleStrategy',
>   'replication_factor': '1'
> };
> Table:
> CREATE TABLE testc21 (
>   key int,
>   d int,
>   v1 int,
>   v10 int,
>   v2 int,
>   v3 int,
>   v4 int,
>   v5 int,
>   v6 int,
>   v7 int,
>   v8 int,
>   v9 int,
>   PRIMARY KEY (key)
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='ROWS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> CREATE INDEX testc21_d_idx ON testc21 (d);
> select * from testc21 limit 10;
> key    | d | v1 | v10 | v2 | v3 | v4  | v5 | v6 | v7 | v8 | v9
> --------+---+----+-----+----+----+-----+----+----+----+----+-----
>  302602 | 1 | 56 |  55 | 26 | 45 |  67 | 75 | 25 | 50 | 26 |  54
>  531141 | 1 | 90 |  77 | 86 | 42 |  76 | 91 | 47 | 31 | 77 |  27
>  693077 | 1 | 67 |  71 | 14 | 59 | 100 | 90 | 11 | 15 |  6 |  19
>    4317 | 1 | 70 |  77 | 44 | 77 |  41 | 68 | 33 |  0 | 99 |  14
>  927961 | 1 | 15 |  97 | 95 | 80 |  35 | 36 | 45 |  8 | 11 | 100
>  313395 | 1 | 68 |  62 | 56 | 85 |  14 | 96 | 43 |  6 | 32 |   7
>  368168 | 1 |  3 |  63 | 55 | 32 |  18 | 95 | 67 | 78 | 83 |  52
>  671830 | 1 | 14 |  29 | 28 | 17 |  42 | 42 |  4 |  6 | 61 |  93
>   62693 | 1 | 26 |  48 | 15 | 22 |  73 | 94 | 86 |  4 | 66 |  63
>  488360 | 1 |  8 |  57 | 86 | 31 |  51 |  9 | 40 | 52 | 91 |  45
> Mike

Reply via email to