Re: SSTable format

2012-07-13 Thread prasenjit mukherjee
> > It depends on what partitioner you use. You should be using the > RandomPartitioner, and if so, the rows are sorted by the hash of the row > key. There are partitioners that sort based on the raw key value, but these > partitioners shouldn't be used as they have problems due to uneven > partitio
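To make the ordering difference concrete, here is a minimal Python sketch. It assumes the hash is MD5 and treats the digest as an unsigned integer token; `md5_token`, `order_by_hash`, and `order_by_raw_key` are hypothetical helper names, and Cassandra's actual token computation may differ in detail.

```python
import hashlib


def md5_token(row_key: bytes) -> int:
    """Illustrative token: the MD5 digest of the row key read as an unsigned integer.

    Assumption: this only approximates how a hash-based partitioner derives tokens.
    """
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")


def order_by_hash(row_keys):
    """Order rows the way a hash-based partitioner would: by token, not by key."""
    return sorted(row_keys, key=md5_token)


def order_by_raw_key(row_keys):
    """Order rows the way a key-ordered partitioner would: by the raw key bytes."""
    return sorted(row_keys)


keys = [b"user:0001", b"user:0002", b"user:0003", b"user:0004"]
hash_order = order_by_hash(keys)
raw_order = order_by_raw_key(keys)

print("by hash:   ", hash_order)
print("by raw key:", raw_order)
```

Adjacent raw keys such as `user:0001` and `user:0002` generally land far apart in hash order, which is why hashing tends to spread load evenly while raw-key ordering can concentrate similar keys on the same nodes.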

Re: SSTable format

2012-07-13 Thread Dave Brosius
While in memory Cassandra calls it a MemTable, but yes, SSTables are write-once, and later combined with others into new ones through compaction. On 07/13/2012 09:54 PM, Michael Theroux wrote: Thanks for the information. So is the SSTable essentially kept in memory, then sorted and written to di
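The lifecycle described here (mutable in memory, sorted and written once on flush, later merged by compaction) can be sketched roughly as follows. This is a toy model, not Cassandra's implementation: the `Memtable` class and `compact` function are hypothetical, rows are sorted by raw key for simplicity rather than by partitioner order, and an "SSTable" is just an immutable tuple.

```python
class Memtable:
    """Toy in-memory write buffer; mutable until it is flushed."""

    def __init__(self):
        self._rows = {}

    def put(self, key, value, timestamp):
        # Keep only the newest write per key.
        current = self._rows.get(key)
        if current is None or timestamp >= current[1]:
            self._rows[key] = (value, timestamp)

    def flush(self):
        """Return the contents as a sorted, immutable 'SSTable' and start empty again."""
        rows = sorted(((k, v, ts) for k, (v, ts) in self._rows.items()),
                      key=lambda row: row[0])
        self._rows = {}
        return tuple(rows)


def compact(*sstables):
    """Merge several SSTables into a new one, keeping the newest value per key.

    The inputs are never modified; compaction only produces a new table.
    """
    newest = {}
    for table in sstables:
        for key, value, ts in table:
            if key not in newest or ts >= newest[key][1]:
                newest[key] = (value, ts)
    return tuple(sorted(((k, v, ts) for k, (v, ts) in newest.items()),
                        key=lambda row: row[0]))


memtable = Memtable()
memtable.put("b", "1", 1)
memtable.put("a", "2", 2)
first = memtable.flush()

memtable.put("b", "3", 3)
second = memtable.flush()

merged = compact(first, second)
print(merged)
```

Note that `first` and `second` are left untouched after `compact` runs, mirroring the write-once property mentioned in the reply.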

Re: SSTable format

2012-07-13 Thread Michael Theroux
Thanks for the information. So is the SSTable essentially kept in memory, then sorted and written to disk on flush? After that point, an SSTable is not modified, but can be written to another SSTable through compaction? -Mike On Jul 13, 2012, at 8:22 PM, Rob Coli wrote: > On Fri, Jul 13, 201

Re: SSTable format

2012-07-13 Thread Rob Coli
On Fri, Jul 13, 2012 at 5:18 PM, Dave Brosius wrote: > It depends on what partitioner you use. You should be using the > RandomPartitioner, and if so, the rows are sorted by the hash of the row > key. There are partitioners that sort based on the raw key value, but these > partitioners shouldn't be

Re: SSTable format

2012-07-13 Thread Dave Brosius
On 07/13/2012 08:00 PM, Michael Theroux wrote: Hello, I've been trying to understand in greater detail how SSTables are stored, and how information is transferred between Cassandra nodes, especially when a new node is joining a cluster. Specifically, is information stored to SSTables ordered

SSTable format

2012-07-13 Thread Michael Theroux
Hello, I've been trying to understand in greater detail how SSTables are stored, and how information is transferred between Cassandra nodes, especially when a new node is joining a cluster. Specifically, is information stored to SSTables ordered by row keys? Some of the articles I've read sugg

Re: Increased replication factor not evident in CLI

2012-07-13 Thread Dustin Wenz
I was able to apply the patch in the cited bug report to the public source for version 1.1.2. It seemed pretty straightforward; six lines in MigrationManager.java were switched from System.currentTimeMillis() to FBUtilities.timestampMicros(). I then re-built the project by running 'ant artifact
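The snippet does not say why the unit of the timestamp matters. One plausible reading, assumed here rather than taken from the bug report, is that schema changes are resolved by comparing timestamps, so a millisecond timestamp always loses to an older microsecond one. The sketch below illustrates only that unit mismatch; all function names and values are hypothetical.

```python
def timestamp_millis(epoch_seconds: int) -> int:
    """Timestamp in milliseconds, analogous in unit to System.currentTimeMillis()."""
    return epoch_seconds * 1_000


def timestamp_micros(epoch_seconds: int) -> int:
    """Timestamp in microseconds, the unit assumed for the patched code path."""
    return epoch_seconds * 1_000_000


def last_write_wins(a, b):
    """Return the (value, timestamp) pair with the larger timestamp."""
    return a if a[1] >= b[1] else b


earlier = 1_342_137_600   # an arbitrary instant, in epoch seconds
later = earlier + 60      # one minute afterwards

old_schema = ("replication_factor=2", timestamp_micros(earlier))
update_in_millis = ("replication_factor=4", timestamp_millis(later))
update_in_micros = ("replication_factor=4", timestamp_micros(later))

# The newer update loses when it is stamped in the smaller unit...
print(last_write_wins(old_schema, update_in_millis)[0])
# ...and wins once both sides use microseconds.
print(last_write_wins(old_schema, update_in_micros)[0])
```

Under this assumption, an update stamped in milliseconds would be silently ignored even though it was issued later, which would be consistent with a replication factor change not showing up.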

2012 Cassandra MVP nominations

2012-07-13 Thread Jonathan Ellis
DataStax would like to recognize individuals who go above and beyond in their contributions to Apache Cassandra. To formalize this a little bit, we're creating an MVP program, the first of which will be announced at the Cassandra summit [1] in August. To make this program a success, we need your

Cassandra Summit 2012

2012-07-13 Thread Jonathan Ellis
Hi all, The 2012 Cassandra Summit will be in San Jose on August 8. The 2011 Summit sold out with almost 500 attendees; this year we found a bigger venue to accommodate 700+. It's fantastic to see the Cassandra community grow like this! The 2012 Summit will have *four* talk tracks, plus the popu

Re: How to speed up data loading

2012-07-13 Thread Tupshin Harper
Any chance your server has been running for the last two weeks with the leap second bug? http://www.datastax.com/dev/blog/linux-cassandra-and-saturdays-leap-second-problem -Tupshin On Jul 12, 2012 1:43 PM, "Leonid Ilyevsky" wrote: > I am loading a large set of data into a CF with composite key.

Re: Increased replication factor not evident in CLI

2012-07-13 Thread Dustin Wenz
It sounds plausible that this is what we are running into. All of our nodes report a replication factor of 2 (both using describe, and show schema), even though the cluster reported that all schemas agree after I issued the change to 4. If this is related to the bug that you filed, it might also expl

Re: Cassandra and Tableau

2012-07-13 Thread Robin Verlangen
Thank you Aaron and Brian. We're currently investigating several options. Hadoop + Hive combo also seems a good choice as our input files are flat. I'll keep you up-to-date about our final decision. - Robin 2012/7/6 aaron morton > Here are two links I've noticed in my travels, have not looked i

Never ending manual repair after adding second DC

2012-07-13 Thread Bart Swedrowski
Hello everyone, I'm facing quite a weird problem with Cassandra since we added a secondary DC to our cluster, and I have totally run out of ideas; this email is a call for help/advice! History looks like: - we used to have 4 nodes in a single DC - running Cassandra 0.8.7 - RF:3 - around 50GB of data

Re: Using a node in separate cluster without decommissioning.

2012-07-13 Thread rohit bhatia
Hi, just wanted to say that it worked. I also made sure to modify the Thrift rpc_port and the storage port so that the two clusters don't interfere. Thanks for the suggestion. Thanks, Rohit On Thu, Jul 12, 2012 at 10:01 AM, aaron morton wrote: > Since replication factor is 2 in first cluster, I > won't lo