So it is also terribly slow.
It does not work with materialized views (a quick hack for that is below) or
with UDTs; the latter needs more time to fix.
So I used it to retrieve the only built-in-type column, the key. To make
the task more time-consuming I extended the dataset a bit, to ~2.5M
records.
Hi Alex,
How do you generate your subrange set for running the queries?
It may happen that some of your ranges intersect data-ownership range
borders (check by running 'nodetool describering [keyspace_name]').
Range queries that cross those borders will be highly inefficient, and that
could explain your results.
Brian Hess has perhaps the best open source code example of the right way
to do this:
https://github.com/brianmhess/cassandra-loader/blob/master/src/main/java/com/datastax/loader/CqlDelimUnload.java
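
For anyone reading along, here is a minimal sketch of that idea, assuming the
DataStax Java driver 3.x (contact point, keyspace and table names below are
made up; user_id is the partition key from this thread). It takes the token
ranges straight from the driver's metadata, so every query stays inside one
ownership range instead of crossing range borders:

    import com.datastax.driver.core.*;

    public class TokenRangeScan {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Driver token ranges are ]start, end], hence > start AND <= end.
                PreparedStatement ps = session.prepare(
                        "SELECT user_id FROM ks.users "
                      + "WHERE token(user_id) > ? AND token(user_id) <= ?");
                long count = 0;
                for (TokenRange range : cluster.getMetadata().getTokenRanges()) {
                    // unwrap() splits the range that wraps around the ring in two.
                    for (TokenRange r : range.unwrap()) {
                        for (Row row : session.execute(ps.bind()
                                .setToken(0, r.getStart())
                                .setToken(1, r.getEnd()))) {
                            count++;   // replace with real per-row work
                        }
                    }
                }
                System.out.println("rows scanned: " + count);
            }
        }
    }

(Sequential on purpose; a parallel scan is the same loop fanned out over a
thread pool.)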
On Thu, Aug 17, 2017 at 10:00 AM, Alex Kotelnikov <
alex.kotelni...@diginetica.com> wrote:
Yup, user_id is the primary key.
First of all, can you share how to "go to a node directly"?
Also, such an approach would retrieve all the data RF times; the coordinator
should have enough metadata to avoid that.
Shouldn't requesting through multiple coordinators provide a certain degree
of concurrency?
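
(For concreteness, one way to see what "going to a node directly" could look
like, again assuming the DataStax Java driver 3.x and a made-up keyspace name:
the cluster metadata exposes which hosts own each token range, so a scanner
can query every subrange exactly once via one of its replicas instead of
pulling the data RF times through arbitrary coordinators.)

    import com.datastax.driver.core.*;

    // Prints the replica set for every token range of a keyspace. A range
    // scanner can use this mapping to send each subrange query to (or near)
    // a node that actually owns that range.
    public class RangeOwnership {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
                Metadata meta = cluster.getMetadata();   // initializes the cluster
                for (TokenRange range : meta.getTokenRanges()) {
                    System.out.println(range + " -> " + meta.getReplicas("ks", range));
                }
            }
        }
    }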
On Thu, Aug 17, 2017 at 9:36 AM, Alex Kotelnikov <
alex.kotelni...@diginetica.com> wrote:
Dor,
I believe I tried it in many ways and the result is quite disappointing.
I've run my scans on 3 different clusters, one of which was running on VMs,
and I was able to scale it up and down (3-5-7 VMs, 8 to 24 cores) to see how
this affects the performance.
I also generated the flow from Spark
Hi Alex,
You probably didn't get the parallelism right. A serial scan has a
parallelism of one. If the parallelism isn't large enough, performance will
be slow.
If the parallelism is too large, Cassandra and the disks will thrash and
there will be too many context switches.
So you need to find your cluster's sweet spot.
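
To make that concrete, here is a rough sketch of the knob being described,
again assuming the DataStax Java driver 3.x and made-up keyspace/table names:
the per-token-range queries are fanned out over a fixed-size thread pool, and
PARALLELISM is the value to sweep while watching throughput to find that
sweet spot.

    import com.datastax.driver.core.*;
    import java.util.*;
    import java.util.concurrent.*;

    public class ParallelScan {
        static final int PARALLELISM = 16;   // the knob to tune per cluster

        public static void main(String[] args) throws Exception {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                PreparedStatement ps = session.prepare(
                        "SELECT user_id FROM ks.users "
                      + "WHERE token(user_id) > ? AND token(user_id) <= ?");
                ExecutorService pool = Executors.newFixedThreadPool(PARALLELISM);
                List<Future<Long>> results = new ArrayList<>();
                for (TokenRange range : cluster.getMetadata().getTokenRanges()) {
                    for (TokenRange r : range.unwrap()) {
                        // One task per subrange; the pool size caps concurrency.
                        Callable<Long> task = () -> {
                            long n = 0;
                            for (Row row : session.execute(ps.bind()
                                    .setToken(0, r.getStart())
                                    .setToken(1, r.getEnd()))) {
                                n++;
                            }
                            return n;
                        };
                        results.add(pool.submit(task));
                    }
                }
                long total = 0;
                for (Future<Long> f : results) total += f.get();
                pool.shutdown();
                System.out.println("rows: " + total);
            }
        }
    }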
Apache Cassandra is not great in terms of performance at the moment for
batch analytics workloads that require a full table scan. I would look at
FiloDB for all the benefits and familiarity of Cassandra with better
streaming and analytics performance: https://github.com/filodb/FiloDB