Re: Timeseries data

2013-03-28 Thread sankalp kohli
I think if you use leveled compaction, the number of sstables you touch will be lower, because the sstables in each level are non-overlapping, except in L0. On Wed, Mar 27, 2013 at 8:20 PM, aaron morton wrote: > sstablekey can help you find which sstables your keys are in. > > But yes, a slice call will

CQL3 And Map Literals

2013-03-28 Thread Gareth Collins
Hello, I have been playing with map literals in CQL3 queries. I see that single quotes work: {'foo':'bar'} but double quotes do not: {"foo":"bar"}. I am curious: was there a specific reason why it was decided to use single quotes? I ask because double quotes would make this valid JSON. Thanks
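For anyone who already has the map as JSON, here is a minimal sketch of turning it into the single-quoted form that the message above reports as working. It assumes a flat JSON object whose values are all strings. The helper names `cql_quote` and `json_to_cql_map` are made up for this illustration and are not part of any Cassandra driver. CQL3 string literals are single-quoted (double quotes are used for quoted identifiers), and an embedded single quote is escaped by doubling it.

```python
import json


def cql_quote(text: str) -> str:
    """Render text as a CQL3 string literal (hypothetical helper).

    Single quotes delimit the literal; embedded single quotes are doubled.
    """
    return "'" + text.replace("'", "''") + "'"


def json_to_cql_map(json_text: str) -> str:
    """Convert a flat JSON object of strings into a CQL3 map literal.

    Assumption: only string values are supported in this sketch.
    """
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    pairs = []
    for key, value in data.items():
        if not isinstance(value, str):
            raise ValueError("this sketch only handles string values")
        pairs.append(f"{cql_quote(key)}:{cql_quote(value)}")
    return "{" + ", ".join(pairs) + "}"


if __name__ == "__main__":
    print(json_to_cql_map('{"foo":"bar"}'))
```

In real code a driver's bound parameters would normally be preferable to building literals by hand; this only shows the quoting difference.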

Re: Datastax AMI with multipart user-data

2013-03-28 Thread Adam Venturella
Ok, I got it figured out. It was because I was installing packages in the cloud-config file under the packages: directive. I just switched it to runcmd, and now I don't get any locking error when it installs packages, as runcmd does not execute till the end. On Thu, Mar 28, 2013 at 1:18 PM,

Re: Lots of Deleted Rows Came back after upgrade 1.1.6 to 1.1.10

2013-03-28 Thread Arya Goudarzi
I am not familiar with that part of the code yet. But what if the gc_grace was changed to a lower value as part of a schema migration after the hints have been marked with TTLs equal to the lower gc_grace before the migration? From what you've described, I think this is not an issue for us as we

Datastax AMI with multipart user-data

2013-03-28 Thread Adam Venturella
So, it looks like it supports multipart user-data: Line 86 here: https://github.com/riptano/ComboAMI/blob/2.4/ds2_configure.py I make my multipart user data, text/plaintext text/cloud-config I need to do some configuration and hook the cluster up to my puppet master. The cluster gets configur

Re: Incompatible Gossip 1.1.6 to 1.2.1 Upgrade?

2013-03-28 Thread Arya Goudarzi
There has been a little misunderstanding. When all nodes are 1.2.2, they are fine. But during the rolling upgrade, 1.2.2 nodes see 1.1.10 nodes as down in the nodetool command despite gossip reporting NORMAL. I will give your suggestion a try and will report back. On Sat, Mar 23, 2013 at 10:37 AM, aaro

Re: Infinit Loop in CompactionExecutor

2013-03-28 Thread Arya Goudarzi
I did a nodetool rebuild and it seemed to clear out the pending compactions, and the Exceptions no longer appeared in the log, so it fixed the issue intermittently. Now it is time to expedite the upgrade. On Wed, Mar 27, 2013 at 1:10 PM, aaron morton wrote: > Is there a workaround beside upgrad

Re: Clearing tombstones

2013-03-28 Thread Joel Samuelsson
Yeah, I didn't mean "normal" as in "what most people use". I meant that they are not "strange" like Tyler mentions. 2013/3/28 aaron morton > The cleanup operation took several minutes though. This doesn't seem > normal then > > It read all the data and made sure the node was a replica for it. S

Re: CQL vs. non-CQL data models

2013-03-28 Thread Marko Asplund
Hi, Thanks for the helpful reply, Aaron! I also found this Datastax blog post very helpful in this case: http://www.datastax.com/dev/blog/thrift-to-cql3 marko Aaron Morton wrote: >> Is this data model defined by Thrift? How closely does it reflect the >> Cassandra internal data model? > Yes. >

Re: lots of extra bytes on disk

2013-03-28 Thread Wei Zhu
Hi Ben, If affordable, just blow away the node and bootstrap a replacement, or restore from snapshot and repair. -Wei - Original Message - From: "Dean Hiller" To: user@cassandra.apache.org Sent: Thursday, March 28, 2013 11:40:21 AM Subject: Re: lots of extra bytes on disk Oh and si

Re: lots of extra bytes on disk

2013-03-28 Thread Hiller, Dean
Oh, and since our LCS was 10MB per file, it was easy to tell which files had not converted yet. Also, we ended up blowing away a CF on node 5 (of 6) and running a full repair on that CF, and afterwards it was at a normal size again as well. Dean On 3/28/13 12:35 PM, "Hiller, Dean" wrote: >We had a runawa

Re: lots of extra bytes on disk

2013-03-28 Thread Hiller, Dean
We had a runaway STCS like this due to our own mistakes but were not sure how to clean it up. We went to LCS instead of STCS and that seemed to bring it way back down since the STCS had repeats and such between SSTables which LCS avoids mostly. I can't help much more than that info though. Dean

Re: lots of extra bytes on disk

2013-03-28 Thread Ben Chobot
Sorry to make it confusing. I didn't have snapshots on some nodes; I just made a snapshot on a node with this problem. So to be clear, on this one example node: Cassandra reports ~250GB of space used. In a CF data directory (before snapshots existed), du -sh showed ~550GB. After the snapshot

Re: lots of extra bytes on disk

2013-03-28 Thread Hiller, Dean
I am confused. I thought you said you don't have a snapshot. df/du reports space used by existing data AND the snapshot. Cassandra only reports on space used by actual data... if you move the snapshots, does df/du match what cassandra says? Dean On 3/28/13 12:05 PM, "Ben Chobot" wrote: >

Re: lots of extra bytes on disk

2013-03-28 Thread Ben Chobot
...though interestingly, the snapshots of these CFs have the "right" amount of data in them (i.e. it agrees with the live SSTable size reported by cassandra). Is it total insanity to remove the files from the data directory not included in the snapshot, so long as they were created before the s
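Before deciding anything about removal, it may help to just list which files are in question. The sketch below only reports file names that exist in a CF data directory but not in a given snapshot directory. It deletes nothing and does not answer whether removal is safe. The function name `files_not_in_snapshot` and the example paths are hypothetical, and it assumes the snapshot directory holds files with the same names as the live files they were taken from.

```python
from pathlib import Path


def files_not_in_snapshot(data_dir, snapshot_dir):
    """Return sorted names of regular files in data_dir absent from snapshot_dir.

    Read-only comparison by file name; subdirectories (such as the
    snapshots directory itself) are ignored.
    """
    data_dir, snapshot_dir = Path(data_dir), Path(snapshot_dir)
    snapshot_names = {p.name for p in snapshot_dir.iterdir() if p.is_file()}
    return sorted(
        p.name
        for p in data_dir.iterdir()
        if p.is_file() and p.name not in snapshot_names
    )


if __name__ == "__main__":
    # Hypothetical paths; substitute the real CF directory and snapshot name.
    cf_dir = Path("/var/lib/cassandra/data/MyKeyspace/MyCF")
    snap_dir = cf_dir / "snapshots" / "my_snapshot"
    if snap_dir.is_dir():
        for name in files_not_in_snapshot(cf_dir, snap_dir):
            print(name)
```

Comparing the summed size of the listed files against the gap between du and what Cassandra reports would show whether these files account for the extra bytes.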

Re: lots of extra bytes on disk

2013-03-28 Thread Ben Chobot
Actually, due to a misconfiguration, we weren't snapshotting at all on some of the nodes that are experiencing this problem. So while we've fixed that, snapshots don't explain the problem. On Mar 28, 2013, at 10:54 AM, Hiller, Dean wrote: > Have you cleaned up your snapshots... those take extra spa

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-28 Thread Hiller, Dean
LCS does much more "constant" compaction than STCS, keeping load on the disks (read and write to move the data) higher. STCS does not do as many constant operations. Dean From: Alain RODRIGUEZ mailto:arodr...@gmail.com>> Reply-To: "user@cassandra.apache.org" ma

Re: lots of extra bytes on disk

2013-03-28 Thread Hiller, Dean
Have you cleaned up your snapshots... those take extra space and don't just go away unless you delete them. Dean On 3/28/13 11:46 AM, "Ben Chobot" wrote: >Are you also running 1.1.5? I'm wondering (ok hoping) that this might be >fixed if I upgrade. > >On Mar 28, 2013, at 8:53 AM, Lanny Ripple wrot

Re: lots of extra bytes on disk

2013-03-28 Thread Ben Chobot
Are you also running 1.1.5? I'm wondering (ok hoping) that this might be fixed if I upgrade. On Mar 28, 2013, at 8:53 AM, Lanny Ripple wrote: > We occasionally (twice now on a 40 node cluster over the last 6-8 months) see > this. My best guess is that Cassandra can fail to mark an SSTable for

Re: lots of extra bytes on disk

2013-03-28 Thread Lanny Ripple
We occasionally (twice now on a 40 node cluster over the last 6-8 months) see this. My best guess is that Cassandra can fail to mark an SSTable for cleanup somehow. Forced GCs or reboots don't clear them out. We disable thrift and gossip; drain; snapshot; shut down; clear data/Keyspace/Table/

lots of extra bytes on disk

2013-03-28 Thread Ben Chobot
Some of my cassandra nodes in my 1.1.5 cluster show a large discrepancy between what cassandra says the SSTables should sum up to, and what df and du claim exist. During repairs, this is almost always pretty bad, but post-repair compactions tend to bring those numbers to within a few percent of

Re: Vnodes - HUNDRED of MapReduce jobs

2013-03-28 Thread Edward Capriolo
Yes. The input format makes one split per vnode; it can likely be optimized. On Thu, Mar 28, 2013 at 9:30 AM, Alicia Leong wrote: > Hi All, > > I have 3 nodes of Cassandra 1.2.3 & edited the cassandra.yaml for vnodes. > > When I execute a M/R job .. the console showed HUNDRED of Map tasks. > > M

Re: Vnodes - HUNDRED of MapReduce jobs

2013-03-28 Thread Alicia Leong
Thanks Cem for your confirmation on this. I guess, Login > Create Issue at https://issues.apache.org/jira/browse/CASSANDRA On Thu, Mar 28, 2013 at 9:40 PM, cem wrote: > Hi Alicia , > > Cassandra input format creates mappers as many as vnodes. It is a known > issue. You need to lower the numb

Re: Vnodes - HUNDRED of MapReduce jobs

2013-03-28 Thread cem
Hi Alicia, The Cassandra input format creates as many mappers as there are vnodes. It is a known issue. You need to lower the number of vnodes :( I have a simple solution for that and am ready to write a patch. Should I create a ticket for it? I don't know the procedure for that. Regards, Cem On Thu, Ma

Vnodes - HUNDRED of MapReduce jobs

2013-03-28 Thread Alicia Leong
Hi All, I have 3 nodes of Cassandra 1.2.3 and edited cassandra.yaml to enable vnodes. When I execute an M/R job, the console shows hundreds of Map tasks. May I know, is this normal with vnodes? If so, it slows down the M/R job considerably. Thanks

Re: bloom filter fp ratio of 0.98 with fp_chance of 0.01

2013-03-28 Thread Alain RODRIGUEZ
"remember is used more IO than STS" Do you mean during compactions? Because I thought that LCS should decrease the number of disk reads (since 90% of the data isn't spread across multiple sstables and C* needs to read only one file to find the entire row) while not compacting, right? 2013/3

Re: weird behavior with RAID 0 on EC2

2013-03-28 Thread Alain RODRIGUEZ
Ok, if you're going to look into it, please keep me/us posted. It happened twice for me, the same day, within a few hours on the same node, and only happened to 1 node out of 12, making this node almost unreachable. 2013/3/28 aaron morton > I noticed this on an m1.xlarge (cassandra 1.1.10) instan

Problem with streaming data from Hadoop: DecoratedKey(-1, )

2013-03-28 Thread Michal Michalski
We're streaming data to Cassandra directly from a MapReduce job using BulkOutputFormat. It's been working for more than a year without any problems, but yesterday one of 600 mappers failed and we got a strange-looking exception on one of the C* nodes. IMPORTANT: It happens on one node and on one

Re: TimeUUID Order Partitioner

2013-03-28 Thread Carlos Pérez Miguel
Apparently MemTable.writeSortedContents has the same problem: I can see how it iterates over the stored keys in byte order, so my classes have something wrong. For the curious, these are my classes so far: https://gist.github.com/anonymous/5261611 Carlos Pérez Miguel 2013/3/28 aaron m

Reading data in bulk from cassandra for indexing in Elastic search

2013-03-28 Thread Utkarsh Sengar
Hello, I am trying to implement an indexer for a column family in cassandra (cluster of 4 nodes) using elastic search. I am writing a river plugin which retrieves data from cassandra and sends it to elastic search. It is triggered on