Re: limit vs sample for indexing a small amount of data quickly?

2014-12-31 Thread Kevin Burton
I thought so but doesn’t that read that into the driver? I need to keep piping it into other RDDs. I have a huge table as the input and I need to do multiple transformations on the data so I just want to read the first N rows from that as an RDD and then keep doing my transformations. On Wed, De

Re: Manual compaction can't finish because of GC

2014-12-31 Thread Robert Coli
On Wed, Dec 31, 2014 at 12:44 PM, Mikhail Strebkov wrote: > I see, well that's what I expected, but it still should improve a read > latency, since it will reduce the number of disk seeks per row request, is > my assumption correct? > Yep. Also other things per-sstable, like bloom filters and in

Re: Is compound index a planned feature in 3.0?

2014-12-31 Thread Jack Krupansky
For now, Cassandra application developers have three options for compound (multi-column) indexing with Cassandra: 1. DataStax Enterprise Search, which uses Solr/Lucene under the hood. http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-solr 2. Stratio which uses Luc

Re: Reload/resync system.peers table

2014-12-31 Thread Robert Coli
On Wed, Dec 17, 2014 at 8:41 AM, Paulo Ricardo Motta Gomes < paulo.mo...@chaordicsystems.com> wrote: > Is there any automatic way of reloading/resyncing the system.peers table? > Or the only way is by removing ghost nodes? > You could delete its contents, drain, and then restart the node with aut

Re: Manual compaction can't finish because of GC

2014-12-31 Thread Mikhail Strebkov
I see, well that's what I expected, but it still should improve a read latency, since it will reduce the number of disk seeks per row request, is my assumption correct? On Wed, Dec 31, 2014 at 11:51 AM, Robert Coli wrote: > On Wed, Dec 31, 2014 at 11:35 AM, Mikhail Strebkov > wrote: > >> > How

Re: Questions about bootrapping and compactions during bootstrapping

2014-12-31 Thread Robert Coli
On Tue, Dec 16, 2014 at 4:32 PM, Donald Smith < donald.sm...@audiencescience.com> wrote: > *Is it reasonable to do “nodetool disableautocompaction” on the > bootstrapping node? Should that be the default???* > There's various current/recent JIRA about compaction vs. bootstrapping, esp. wrt LCS c

Re: Manual compaction can't finish because of GC

2014-12-31 Thread Robert Coli
On Wed, Dec 31, 2014 at 11:35 AM, Mikhail Strebkov wrote: > > How effective are compactions in this CF? "grep % > /var/log/cassandra/system.log"? > I'm not sure what type of logs should I share, when I do "grep % > /var/log/cassandra/system.log" I don't see anything related to this > long-running

Re: Manual compaction can't finish because of GC

2014-12-31 Thread Mikhail Strebkov
> You can also set this online w/ nodetool, fyi. Oh, thanks! Will use this approach from now on > If you have SSD-esque devices, it is common to be CPU bound on compaction. We use spinning disks, unfortunately we can not afford SSDs in EC2 yet > How effective are compactions in this CF? "grep % /

Re: Stable cassandra build for production usage

2014-12-31 Thread Robert Coli
On Wed, Dec 31, 2014 at 8:38 AM, Ajay wrote: > For my research and learning I am using Cassandra 2.1.2. But I see couple > of mail threads going on issues in 2.1.2. So what is the stable or popular > build for production in Cassandra 2.x series. > https://engineering.eventbrite.com/what-version-o

Re: Manual compaction can't finish because of GC

2014-12-31 Thread Robert Coli
On Wed, Dec 31, 2014 at 12:01 AM, Mikhail Strebkov wrote: > I set compaction_throughput_mb_per_sec to 0 and restarted Cassandra. > You can also set this online w/ nodetool, fyi. > It looks to me that compaction thread is busy doing something that > produces quite a lot of garbage for GC to coll

Re: Internal pagination in secondary index queries

2014-12-31 Thread Sam Klock
Thanks. I've opened the following issue to track this: https://issues.apache.org/jira/browse/CASSANDRA-8550 SK On 2014-12-30 11:26, Tyler Hobbs wrote: > > On Mon, Dec 29, 2014 at 5:20 PM, Sam Klock > wrote: > > > Our investigation led us to logic in Cassandra u

Re: Stable cassandra build for production usage

2014-12-31 Thread Philip Thompson
2.0.11 is the current oldstable version, and is probably what you are looking for. On Wed, Dec 31, 2014 at 11:38 AM, Ajay wrote: > Hi All, > > For my research and learning I am using Cassandra 2.1.2. But I see couple > of mail threads going on issues in 2.1.2. So what is the stable or popular >

Stable cassandra build for production usage

2014-12-31 Thread Ajay
Hi All, For my research and learning I am using Cassandra 2.1.2, but I see a couple of mail threads about issues in 2.1.2. So what is the stable or popular build for production in the Cassandra 2.x series? Thanks Ajay

Re: User click count

2014-12-31 Thread Ajay
Thanks Eric. Happy New Year 2015 to all Cassandra developers and users :). This group seems to be the most active of the Apache big data projects. Will come back with more questions :) Thanks Ajay On Dec 31, 2014 8:02 PM, "Eric Stevens" wrote: > You can totally avoid the impact of tombstones by rotatin

Re: Is compound index a planned feature in 3.0?

2014-12-31 Thread Tyler Hobbs
I don't think compound indexes are going to happen for 3.0. Perhaps 3.1, but they haven't really been discussed in depth. On Fri, Dec 26, 2014 at 4:31 AM, ziju feng wrote: > The global index JIRA actually mentions compound index but it seems that > there is no JIRA created for this feature? Any

Re: Tombstones without DELETE

2014-12-31 Thread Tyler Hobbs
Overwriting an entire collection also results in a tombstone being inserted. On Wed, Dec 24, 2014 at 7:09 AM, Ryan Svihla wrote: > You should probably ask on the Cassandra user mailing list. > > However, TTL is the only other case I can think of. > > On Tue, Dec 23, 2014 at 1:36 PM, Davide D'Ag

Re: How many tombstones for deleted CQL row?

2014-12-31 Thread Tyler Hobbs
On Fri, Dec 26, 2014 at 5:50 AM, Jens Rantil wrote: > Great. Also, if I issue "DELETE FROM my_table WHERE partition_key=xxx AND > compound_key=yyy" I understand only a single tombstone will be created? That's correct, it will create one range tombstone. -- Tyler Hobbs DataStax

Re: Re: Downgrade from 2.1.2 to 2.1.1

2014-12-31 Thread Phil Burress
Why don't you use incremental repairs? Is there a known issue with incremental repairs in 2.1.x? On Tue, Dec 30, 2014 at 10:22 PM, 李建奇 wrote: > We also have some problems with 2.1.2, but I think we can deal with them. > > First, I don’t use incremental repair. > > Second, we restart node after repa

Re: User click count

2014-12-31 Thread Eric Stevens
You can totally avoid the impact of tombstones by rotating your partition key in the exact counts table, and only deleting whole partitions once you've counted them. Once you've counted them you never have cause to read that partition key again. You can totally store the final counts in Cassandra
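
The rotating partition key described in this message can be sketched roughly as follows. This is only an illustration under assumptions, not the poster's actual schema: the one-hour bucket size, the key format, and the helper names (`bucket_start`, `partition_key`, `closed_buckets`) are all hypothetical, and the code only computes keys; it does not talk to Cassandra.

```python
from datetime import datetime, timedelta, timezone

# Assumed rotation interval; the thread does not say how often to rotate.
BUCKET = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, bucket=BUCKET):
    """Round a timestamp down to the start of its time bucket."""
    elapsed_buckets = (ts - EPOCH) // bucket  # timedelta // timedelta -> int
    return EPOCH + elapsed_buckets * bucket


def partition_key(ts, bucket=BUCKET):
    """Hypothetical partition key for the exact-counts table: one per bucket."""
    return bucket_start(ts, bucket).strftime("%Y%m%d%H%M")


def closed_buckets(first_ts, now, bucket=BUCKET):
    """Keys of buckets that have fully elapsed.

    These partitions receive no more writes, so they can be counted once
    and then deleted as whole partitions, never to be read again.
    """
    keys = []
    start = bucket_start(first_ts, bucket)
    while start + bucket <= now:
        keys.append(start.strftime("%Y%m%d%H%M"))
        start += bucket
    return keys
```

Writes go to `partition_key(now)`, while a periodic job walks `closed_buckets(...)`, records the final count for each key, and deletes that partition as a whole.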

Re: Manual compaction can't finish because of GC

2014-12-31 Thread Mikhail Strebkov
Hi Rob, Thanks for your response! Unthrottle compaction, that's an insane number of SSTables. I set compaction_throughput_mb_per_sec to 0 and restarted Cassandra. Somehow I don't see GC anymore but compaction is still very slow and IO is still not a bottleneck: iotop reports ~400 K/s for disk r