Re: Hadoop jobs and data locality

2013-05-07 Thread Shamim
I have created an issue in JIRA: https://issues.apache.org/jira/browse/CASSANDRA-5544 -- Best regards, Shamim A. 06.05.2013, 22:26, "Shamim" : > I think it will be better to open an issue in JIRA > Best regards > Shamim A. > >> Unfortunately I've just tried with a new cluster with RandomPart

Re: Hadoop jobs and data locality

2013-05-07 Thread cscetbon.ext
I was going to open one. Great! -- Cyril SCETBON On May 7, 2013, at 9:03 AM, Shamim <sre...@yandex.ru> wrote: I have created an issue in jira https://issues.apache.org/jira/browse/CASSANDRA-5544

Re: cost estimate about some Cassandra patchs

2013-05-07 Thread aaron morton
> Use case = rows with rowkey like (folder id, file id) > And operations read/write multiple rows with same folder id => so, it could > make sense to have a partitioner putting rows with same "folder id" on the > same replicas. The entire row key is the thing we use to make the token used to both lo
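To illustrate the point that the whole row key feeds the token, here is a minimal sketch. It assumes an MD5-based token in the spirit of RandomPartitioner; the `folder42:file1` key encoding and the `md5_token` helper are hypothetical stand-ins, not Cassandra's actual key serialization.

```python
import hashlib

def md5_token(row_key: bytes) -> int:
    """Hash the entire row key to an integer token (illustrative only)."""
    digest = hashlib.md5(row_key).digest()
    # Read the 16-byte digest as a signed big-endian integer, then take abs().
    return abs(int.from_bytes(digest, byteorder="big", signed=True))

# Two rows that share a folder id but differ in file id (hypothetical encoding).
token_a = md5_token(b"folder42:file1")
token_b = md5_token(b"folder42:file2")

# The shared folder id does not make the tokens equal, so nothing in the
# hash keeps the two rows on the same replicas.
same_token = token_a == token_b
```

Because the file id is part of the hashed bytes, rows in the same folder generally land on unrelated ring positions under this kind of partitioner.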

Re: Cassandra won't restart : 7365....6c73 is not defined as a collection

2013-05-07 Thread aaron morton
> I have also been changing types, e.g. lock_tokens__ from MAP to MAP. The error looks like the schema was changed and a log replayed from before the change. Which obviously is not something we would expect to happen. Do you change the map type using ALTER TABLE (not sure if that is possible

Re: hector or astyanax

2013-05-07 Thread aaron morton
> i want to know which cassandra client is better? Go with Astyanax or Native Binary, they are both under active development and supported by a vendor / large implementor. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com O

Re: Cassandra running High Load with no one using the cluster

2013-05-07 Thread aaron morton
> Why did you increase the stack-size to 5.5 times greater than recommended? > Since each thread now uses 1000KB minimum just for the stack, a large number > of threads will use a large amount of memory. I'd say that is the reason you are running out of memory. Cheers - Aaron
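The arithmetic behind the stack-size remark can be sketched as follows. The 1000KB stack size and the 5.5x factor come from the quoted message; the thread count of 2000 is a hypothetical figure chosen only for illustration, and the result is the stack reservation, not measured resident memory.

```python
def stack_memory_mb(thread_count: int, stack_kb: int) -> float:
    """Total memory reserved for thread stacks, in megabytes."""
    return thread_count * stack_kb / 1024

configured_kb = 1000                  # stack size from the quoted message
recommended_kb = configured_kb / 5.5  # roughly 182KB, implied by "5.5 times"

threads = 2000  # hypothetical thread count, not taken from the thread
configured_total = stack_memory_mb(threads, configured_kb)
recommended_total = stack_memory_mb(threads, int(recommended_kb))
```

With these assumed numbers the configured setting reserves close to 2GB for stacks alone, against a few hundred megabytes at the implied recommended size.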

Re: Hadoop jobs and data locality

2013-05-07 Thread cscetbon.ext
I tried to use your quick workaround, but the task is taking much longer than before even though it uses 2 mappers in parallel. The fact is that there are 1000 tasks. Are you using vnodes? I didn't try to disable them. Kind | % Complete | Num Tasks | Pending | Running | Complete | Killed | Fai

RE: cost estimate about some Cassandra patchs

2013-05-07 Thread DE VITO Dominique
> -----Original Message----- > From: aaron morton [mailto:aa...@thelastpickle.com] > Sent: Tuesday, 7 May 2013 10:22 > To: user@cassandra.apache.org > Subject: Re: cost estimate about some Cassandra patchs > > > Use case = rows with rowkey like (folder id, file id) > > And operations read/write mu

Re: SSTables not opened on new cluste

2013-05-07 Thread Philippe
Definitely knew that for major releases, didn't expect it for a minor release at all. On 6 May 2013 at 19:22, "Robert Coli" wrote: > On Sat, May 4, 2013 at 5:41 AM, Philippe wrote: > > After trying every possible combination of parameters, config and the > rest, > > I ended up downgrading the ne

how to get column family details dynamically in cassandra bulk load program

2013-05-07 Thread chandana.tummala
Dear All, I am using the Cassandra bulk-load program from www.datastax.com/dev/blog/bulk-loading. In it, for each CSV entry we specify the column name and validation class. Is there any way to get the column names and validation classes directly from the database by giving just the keyspace and column family name

Re: hector or astyanax

2013-05-07 Thread Blair Zajac
On 05/07/2013 01:37 AM, aaron morton wrote: i want to know which cassandra client is better? Go with Astyanax or Native Binary, they are both under active development and supported by a vendor / large implementor. Native Binary being which one specifically? Do you mean the new DataStax java-dri

Re: Cassandra won't restart : 7365....6c73 is not defined as a collection

2013-05-07 Thread Blair Zajac
On 05/07/2013 01:28 AM, aaron morton wrote: I have also been changing types, e.g. lock_tokens__ from MAP to MAP. The error looks like the schema was changed and a log replayed from before the change. Which obviously is not something we would expect to happen. Do you change the map type using A

mutation stalls and FileNotFoundException

2013-05-07 Thread Keith Wright
I am running 1.2.4 with Vnodes and have been writing at low volume. I have doubled the volume and suddenly 3 of my 6 nodes are showing much higher load than the others (30 vs 3) and tpstats show the mutation stage as completely full (see below). I did find a FileNotFoundException that I pasted

how to monitor nodetool cleanup?

2013-05-07 Thread Brian Tarbox
I'm recovering from a significant failure and so am doing lots of nodetool move, removetoken, repair and cleanup. For most of these I can do "nodetool netstats" to monitor progress but it doesn't show anything for cleanup...how can I monitor the progress of cleanup? On a related note: I'm able to

Cassandra 1.1.11 compression: how to tell if it works?

2013-05-07 Thread Oleg Dulin
I have a column family with really wide rows set to use Snappy like this: compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor'} My understanding is that if a file is compressed I should not be able to use the "strings" command to view its contents. B

HintedHandoff

2013-05-07 Thread Kanwar Sangha
Hi - I had a question about hinted handoff. We have 2 DCs configured with overall RF = 2 (DC1:1, DC2:1) and 4 nodes in each DC (total: 8 nodes across 2 DCs). Now we do a write with CL = ONE and hinted handoff enabled. *If node 'X' in DC1, which is a 'replica' node, is down and a write co

Re: SSTables not opened on new cluste

2013-05-07 Thread Robert Coli
On Tue, May 7, 2013 at 4:26 AM, Philippe wrote : > Definitely knew that for major releases, didn't expect it for a minor > release at all. This sort of incompatibility is definitely more common between major versions, but not unheard of within minor series. =Rob

CQL3 Data Model Question

2013-05-07 Thread Keith Wright
Hi all, I was hoping you could provide some assistance with a data modeling question (my apologies if a similar question has already been posed). I have time-based data that I need to store on a per-customer (aka app id) basis so that I can easily return it in sorted order by event time.

Re: CQL3 Data Model Question

2013-05-07 Thread Hiller, Dean
We use PlayOrm to do 60,000 different streams which are all time series and use the virtual column families of PlayOrm so they are all in one column family. We then partition by time as well. I don't believe that we really have any hotspots from what I can tell. Dean From: Keith Wright mailt

Re: CQL3 Data Model Question

2013-05-07 Thread Keith Wright
So in that case I would create a different column family for each app id and then a "time bucket" key as the row key with perhaps an hour resolution? Something like this: create 123_table organic_events ( hour timestamp, event_id UUID, app_id INT, event_time TIMESTAMP, user_id INT,
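The "time bucket" row key with hour resolution mentioned above can be sketched as below. This is only an illustration of truncating an event time to its hour; the `hour_bucket` helper and the sample timestamp are hypothetical and not part of the proposed schema.

```python
from datetime import datetime, timezone

def hour_bucket(event_time: datetime) -> datetime:
    """Truncate an event time to the start of its hour (the bucket key)."""
    return event_time.replace(minute=0, second=0, microsecond=0)

# Hypothetical event; every event in the same hour maps to the same bucket.
event_time = datetime(2013, 5, 7, 14, 37, 12, tzinfo=timezone.utc)
bucket = hour_bucket(event_time)
```

A coarser resolution (day or month) follows the same pattern by zeroing more fields; which one fits depends on the write rate per app id.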

Re: CQL3 Data Model Question

2013-05-07 Thread Hiller, Dean
PlayOrm is not yet on CQL3, and Cassandra doesn't work well with 10,000+ CFs; we went down that path and Cassandra can't cope, so we have one Cassandra CF with 60,000 virtual CFs thanks to PlayOrm, plus a few other CFs. But yes, we bucket into hour or month or whatever depending on your rates an

backup strategy

2013-05-07 Thread Kanwar Sangha
Hi - If we have RF=2 in a 4-node cluster, how do we ensure that the backup taken is only for 1 copy of the data? In other words, is it possible for us to take a backup from only 2 nodes, not all 4, and still have at least 1 copy of the data? Thanks, Kanwar
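One way to reason about this question is to check which node subsets hold at least one replica of every range. The sketch below assumes SimpleStrategy-style placement (each range on its owner plus the next nodes clockwise), one token per node, no vnodes, and a single DC; node numbering and both helper names are hypothetical. With vnodes or NetworkTopologyStrategy the answer may differ.

```python
def replicas_for_range(owner: int, node_count: int, rf: int) -> set:
    """Nodes holding the range owned by `owner`: the owner plus the next rf-1 nodes."""
    return {(owner + i) % node_count for i in range(rf)}

def backup_covers_all(backed_up: set, node_count: int, rf: int) -> bool:
    """True if every range has at least one replica on a backed-up node."""
    return all(
        replicas_for_range(owner, node_count, rf) & backed_up
        for owner in range(node_count)
    )

# Nodes are numbered 0..3 in ring order for a 4-node, RF=2 cluster.
alternate_nodes_ok = backup_covers_all({0, 2}, node_count=4, rf=2)
adjacent_nodes_ok = backup_covers_all({0, 1}, node_count=4, rf=2)
```

Under these assumptions, backing up every other node on the ring covers all ranges, while two adjacent nodes miss the range held only by the other two.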

Re: how to monitor nodetool cleanup?

2013-05-07 Thread Michael Morris
Not sure about making things go faster, but you should be able to monitor it with nodetool compactionstats. Thanks, Mike On Tue, May 7, 2013 at 12:43 PM, Brian Tarbox wrote: > I'm recovering from a significant failure and so am doing lots of nodetool > move, removetoken, repair and cleanup. >