Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread Ian Rose
| Linkedin: *linkedin.com/in/carlosjuzarterolo > <http://linkedin.com/in/carlosjuzarterolo>* > Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649 > www.pythian.com > > On Thu, Apr 2, 2015 at 3:06 PM, Ian Rose wrote: > >> Hi all - >> >> We currently have a single

Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread Ian Rose
Hi all - We currently have a single cassandra cluster that is dedicated to a relatively narrow purpose, with just 2 tables. Soon we will need cassandra for another, unrelated, system, and my debate is whether to just add the new tables to our existing cassandra cluster or whether to spin up an en

OutOfMemoryError in ReadStage

2015-03-22 Thread Ian Rose
Hi all - I had a nasty streak of OOMs earlier today (several on one node, and a single OOM on one other node). I've downloaded a few of the hprof files for local analysis. In each case, there is a single ReadStage thread with a huge (> 7.5GB) org.apache.cassandra.db.ArrayBackedSortedColumns inst

Re: best way to measure repair times?

2015-03-19 Thread Ian Rose
Thanks Jan, although I'm a bit unsure of the details. It looks like when you run a repair this actually occurs over several "sessions". e.g. in your example above there are 2 different "repair session [...] finished" lines. So does it make sense that I would want to measure between when I first

best way to measure repair times?

2015-03-19 Thread Ian Rose
Howdy - I'd like to (a) monitor how long my repairs are taking, and (b) know when a repair is finished so that I can take some kind of followup action. What's the best way to tackle either or both of these? Some potentially relevant details: - running community apache cassandra (not DSE) - vers

Re: get partition key from tombstone warnings?

2015-01-21 Thread Ian Rose
x.com> wrote: > There is an open ticket for this improvement at > https://issues.apache.org/jira/browse/CASSANDRA-8561 > > On Wed, Jan 21, 2015 at 4:55 PM, Ian Rose wrote: > >> When I see a warning like "Read 9 live and 5769 tombstoned cells in ... >> "

get partition key from tombstone warnings?

2015-01-21 Thread Ian Rose
When I see a warning like "Read 9 live and 5769 tombstoned cells in ... " is there a way for me to see the partition key that this query was operating on? The description in the original JIRA ticket ( https://issues.apache.org/jira/browse/CASSANDRA-6042) reads as though exposing this information w

Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
ferring to #2, without realizing the role they play in #1. - Ian On Tue, Dec 16, 2014 at 11:12 AM, Jack Krupansky wrote: > > When you say “no need for tombstones”, did you actually read that > somewhere or were you just speculating? If the former, where exactly? > > -- Jack K

Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
is usually no more than one). > > Hope this helps. > > Robert > > On Dec 16, 2014, at 8:22 AM, Ian Rose wrote: > > Howdy all, > > Our use of cassandra unfortunately makes use of lots of deletes. Yes, I > know that C* is not well suited to this kind of workloa

does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Ian Rose
Howdy all, Our use of cassandra unfortunately makes use of lots of deletes. Yes, I know that C* is not well suited to this kind of workload, but that's where we are, and before I go looking for an entirely new data layer I would rather explore whether C* could be tuned to work well for us. Howev

Re: Cassandra Files Taking up Much More Space than CF

2014-12-09 Thread Ian Rose
Try `nodetool clearsnapshot` which will delete any snapshots you have. I have never taken a snapshot with nodetool yet I found several snapshots on my disk recently (which can take a lot of space). So perhaps they are automatically generated by some operation? No idea. Regardless, nuking those

Re: opscenter: 0 of 0 agents connected, but /nodes/all gives 3 results

2014-12-03 Thread Ian Rose
rigoriev > wrote: > >> I have observed this kind of situation with 0 agents connected. >> Restarting the agents always helped so far. By the way, check the agent's >> logs and opscenterd logs, there may be some clues there. >> >> On Tue, Dec 2, 2014 at

opscenter: 0 of 0 agents connected, but /nodes/all gives 3 results

2014-12-02 Thread Ian Rose
Hi all - Just getting started setting up OpsCenter today. I have a 3 node cassandra cluster and (afaict) the agent installed and running happily on all 3 nodes. I also have OpsCenter up and running on a 4th node. I do not have SSL enabled between these nodes. In the OpsCenter interface, I see

Re: significant NICE cpu usage

2014-10-08 Thread Ian Rose
wrote: > Hello, > > AFAIK Compaction threads run with a lower affinity, I believe that will > show up as “niced”.. > > Regards, > Andras > > From: Ian Rose > Reply-To: user > Date: Wednesday 8 October 2014 17:29 > To: user > Subject: significant NICE cpu u

significant NICE cpu usage

2014-10-08 Thread Ian Rose
Hi - We are running a small 3-node cassandra cluster on Google Compute Engine. I notice that our machines are reporting (via a collectd agent, confirmed by `top`) a significant amount of cpu time in the NICE state. For example, one of our machines is a n1-highmem-4 (4 cores, 26 GB RAM). Here is

Re: Instagram's "Anticolumn"

2014-09-07 Thread Ian Rose
I assume it's a hash to detect read/write races. As an example: 1. actor 1 reads key = (1, 'whatever') and gets value = V0 2. actor 2 writes to key (1, 'whatever') with new value V1 3. actor 1 writes an anticolumn with key = (0, 'whatever') and value = md5(V0) 4. later, if someone wants to read t
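A minimal sketch of the hash idea in steps 1-3, assuming values are plain byte strings. The helper names `anticolumn_value` and `value_changed_since_read` are hypothetical, and because step 4 is cut off in the snippet above, the comparison shown is only one assumed way such a hash could be used, not the scheme Instagram actually implements.

```python
import hashlib


def anticolumn_value(value_read: bytes) -> str:
    """Hash of the value an actor read (V0), stored as the anticolumn's value."""
    return hashlib.md5(value_read).hexdigest()


def value_changed_since_read(current_value: bytes, anticolumn_hash: str) -> bool:
    """Assumed check: True if the current value no longer matches the hash
    recorded in the anticolumn, i.e. a write raced with the earlier read."""
    return hashlib.md5(current_value).hexdigest() != anticolumn_hash


# Steps 1-3 from the message: actor 1 reads V0, actor 2 writes V1,
# actor 1 then writes an anticolumn carrying md5(V0).
v0, v1 = b"V0", b"V1"
recorded = anticolumn_value(v0)
print(value_changed_since_read(v1, recorded))
```

With this reading, a mismatch between the stored hash and the hash of the current value is what signals that the anticolumn was written against stale data.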

Re: are dynamic columns supported at all in CQL 3?

2014-08-27 Thread Ian Rose
here sensor_id=1; > > sensor_id | collected_at | volts > ---+--+--- > 1 | 2014-05-01 00:00:00Pacific Daylight Time | 1.2 > 1 | 2014-05-02 00:00:00Pacific Daylight Time | 1.3 >

Re: are dynamic columns supported at all in CQL 3?

2014-08-26 Thread Ian Rose
rows > > > > On Tue, Aug 26, 2014 at 1:12 PM, Ian Rose wrote: > >> Is it possible in CQL to create a table that supports dynamic column >> names? I am using C* v2.0.9, which I assume implies CQL version 3. >> >> This page appears to show that this was su

are dynamic columns supported at all in CQL 3?

2014-08-26 Thread Ian Rose
Is it possible in CQL to create a table that supports dynamic column names? I am using C* v2.0.9, which I assume implies CQL version 3. This page appears to show that this was supported in CQL 2 with the 'with comparator' and 'with default_validation' options but that CQL 3 does not support this:

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread Ian Rose
k that this hasn't been fixed in a recent version, but if you are using > a recent release (say 2.0.9), then please do open a JIRA ticket with your > reproduction steps. > > > On Wed, Aug 13, 2014 at 4:25 AM, Ian Rose wrote: > >> Hi - >> >> I am currently r

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread Ian Rose
(<, ≤, >, ≥). Indeed >> inequalities force the server to scan the whole cluster to find the >> requested range, which is clearly not optimal. That's the reason why you >> need to add "ALLOW FILTERING" for the query to be accepted. >> >> "ALLOW FI

Re: range query times out (on 1 node, just 1 row in table)

2014-08-13 Thread Ian Rose
Confusingly, it appears to be the presence of an index on int_val that is causing this timeout. If I drop that index (leaving only the index on foo_name) the query works just fine. On Tue, Aug 12, 2014 at 10:25 PM, Ian Rose wrote: > Hi - > > I am currently running a single Cassandr

range query times out (on 1 node, just 1 row in table)

2014-08-12 Thread Ian Rose
Hi - I am currently running a single Cassandra node on my local dev machine. Here is my (test) schema (which is meaningless, I created it just to demonstrate the issue I am running into): CREATE TABLE foo ( foo_name ascii, foo_shard bigint, int_val bigint, PRIMARY KEY ((foo_name, foo_sha

Re: clarification on 100k tombstone limit in indexes

2014-08-12 Thread Ian Rose
ing this minute. > > The manual index approach suffers a lot from bottleneck issues under heavy > workloads; that's the main reason they implemented a distributed secondary > index. There is no free lunch though. What you gain in terms of control and > tuning with the manual index, you

Re: clarification on 100k tombstone limit in indexes

2014-08-11 Thread Ian Rose
0 is bucket 1, 00:30 to 01:00 is bucket 2 and so on. For a whole > day, you'd have 48 buckets. We need to put data into buckets to avoid ultra > wide rows since you mentioned that there are 10 items (so 10 updates) / > sec. Of course, 30 mins is just an example, you can tune it down to a
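The bucket arithmetic quoted above can be sketched as follows. This is only an illustration: the function names are hypothetical, and reading the truncated start of the snippet as "the first half hour of the day is bucket 1" is an assumption. The 30-minute width is the example value from the message and is meant to be tuned.

```python
from datetime import datetime

BUCKET_MINUTES = 30  # example width from the message; tune as needed


def bucket_of(ts: datetime, bucket_minutes: int = BUCKET_MINUTES) -> int:
    """Return the 1-based bucket number of ts within its day."""
    minutes_into_day = ts.hour * 60 + ts.minute
    return minutes_into_day // bucket_minutes + 1


def buckets_per_day(bucket_minutes: int = BUCKET_MINUTES) -> int:
    """Number of whole buckets in a 24-hour day."""
    return (24 * 60) // bucket_minutes


print(bucket_of(datetime(2014, 8, 11, 0, 45)))  # falls in the 00:30 to 01:00 bucket
print(buckets_per_day())
```

Including the bucket number (together with the day) in the partition key is what keeps each row bounded: at 10 updates per second, a 30-minute bucket holds roughly 18,000 entries instead of a full day's worth.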

Re: clarification on 100k tombstone limit in indexes

2014-08-10 Thread Ian Rose
ira/browse/CASSANDRA-6117 > [4] > https://github.com/jbellis/cassandra/blob/4ac18ae805d28d8f4cb44b42e2244bfa6d2875e1/conf/cassandra.yaml#L407-L417 > > > > On Sun, Aug 10, 2014 at 7:19 PM, Ian Rose wrote: > >> Hi - >> >> On this page ( >> http://www.data

clarification on 100k tombstone limit in indexes

2014-08-10 Thread Ian Rose
Hi - On this page ( http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html), the docs state: Do not use an index [...] On a frequently updated or deleted column and > *Problems using an index on a frequently updated or deleted column*¶ >