OrderPreservingPartitioner limits and workarounds

2010-04-07 Thread Paul Prescod
I have one append-oriented workload and I would like to know if Cassandra is appropriate for it. Given: * 100 nodes * an OrderPreservingPartitioner * a replication factor of "3" * a write-pattern of "always append" * a strong requirement for range queries My understanding is that there

RE: Slow Responses from 2 of 3 nodes in RC1

2010-04-07 Thread Mark Jones
I've rerun the tests with a few changes. The memory I ordered came in, so all 3 machines now have 8GB and I've increased Xmx to 6GB on each node. (I rarely see memory usage greater than 2GB during the loading process, but during checking/reading, it skyrockets as expected.) Each client thread has ab

Sorting and ordering in Cassandra

2010-04-07 Thread Paul Prescod
I'm working on a blog post that combines all of the information and ideas I can find relative to managing sorted lists in Cassandra. http://jottit.com/s8c4a/# Not only do I greatly appreciate comments, I actually don't think I can publish it without some feedback because there are some embedded q

What is loadbalance supposed to do? 0.6.0RC1

2010-04-07 Thread Mark Jones
It shouldn't remove a node from the ring should it? (appears it did) It shouldn't remove data from db, should it? (data size appears to grow, but records are now missing) Loaded 38 million "rows" and the ring looked like this: m...@ec2:~/cassandra/apache-cassandra-0.6.0-rc1$ bin/nodetool --h

Reached an EOL or something bizzare occured.

2010-04-07 Thread 叶江
Hi: I set up a cluster with 2 nodes, and when I insert the data, something goes wrong. This is my main code: for(int i = 0;i < 500;i++) { String tmp = "age" + i; client.insert("Keyspace1", key_user_id, new ColumnPath("St

Re: Net::Cassandra::Easy deletion failed

2010-04-07 Thread Ted Zlatanov
On Tue, 06 Apr 2010 14:14:55 -0700 Mike Gallamore wrote: MG> Great it works. Or at least the Cassandra/thrift part seems to MG> work. My tests don't pass but I think it is actual logic errors in the MG> test now, the column does appear to be getting cleared okay with the MG> new version of the

Re: What is loadbalance supposed to do? 0.6.0RC1

2010-04-07 Thread Sylvain Lebresne
> It shouldn't remove a node from the ring should it?  (appears it did) It does. As explained here: http://wiki.apache.org/cassandra/Operations, loadbalance decommissions the node and then adds it back as a bootstrapping node (roughly). So the node disappearing is expected, and it is supposed to

Re: Reached an EOL or something bizzare occured.

2010-04-07 Thread Jonathan Ellis
Upgrade to 0.6 On Wed, Apr 7, 2010 at 8:52 AM, 叶江 wrote: > hi: >   i setup a cluster with 2 nodes,and when i insert the data ,something wrong > happened . This is my major code: >        for(int i = 0;i < 500;i++) >                { >                String tmp = "age" + i; >                c

Cassandra cluster does not tolerate single node failure

2010-04-07 Thread Oleg Anastasjev
Hello, I am doing some tests of cassandra cluster behavior on several failure scenarios. And I am stuck with the very 1st test - what happens, if 1 node of cluster becomes unavailable. I have 4 4gb nodes loaded with write mostly test. Normally it works at a rate of about 12000 ops/second. Replica

RE: What is loadbalance supposed to do? 0.6.0RC1

2010-04-07 Thread Mark Jones
The log said Bootstrapping @ 07:34 (since it was 08:35, I assumed it wasn't doing anything, also, CPU usage was < 10%) Turns out, when I restarted the node, it claimed the time was 7:35 rather than 8:35. Why would log4j be off by one hour? We are on CDT here, and have been for more than a w

Yet more strangeness RE: Slow Responses from 2 of 3 nodes in RC1

2010-04-07 Thread Mark Jones
I used nodetool to loadbalance the nodes, and now the high performance node is cassdb2, and cassdb1 is missing data. Cassdb1 and Cassdb3 now appear to be the laggards, with poor performance, so it doesn't seem to be the hardware that's having problems. -Original Message- From: Mark Jon

Re: Cassandra cluster does not tolerate single node failure

2010-04-07 Thread Jonathan Ellis
This is a known problem with 0.5 that was addressed in 0.6. On Wed, Apr 7, 2010 at 9:18 AM, Oleg Anastasjev wrote: > Hello, > > I am doing some tests of cassandra cluster behavior on several failure > scenarios. And I am stuck with the very 1st test - what happens, if 1 node of > cluster becomes

Inconsistency when unit testing

2010-04-07 Thread Philip Jackson
Hi, To summarise my app; * try to get item from UserUrl cf * if not found then check in the Url cf to see if we have fetched url before and add to UserUrl. * else, fetch the url and its details put in Url and UserUrl The unit tests covering this shouldn't hit the else as they put wha

Re: Inconsistency when unit testing

2010-04-07 Thread Sylvain Lebresne
Use ConsistencyLevel.QUORUM when you write *and* when you read. On Wed, Apr 7, 2010 at 5:26 PM, Philip Jackson wrote: > Hi, > > To summarise my app; > >  * try to get item from UserUrl cf >   * if not found then check in the Url cf to see if we have fetched >     url before and add to UserUrl. >
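
The advice above relies on quorum overlap: if both the write and the read touch a majority of replicas, at least one replica is in both sets. Below is a minimal sketch of that arithmetic only. The function names are hypothetical and it does not call Cassandra; the quorum size is assumed to be the usual majority, floor(RF / 2) + 1.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1 (assumed definition)."""
    return replication_factor // 2 + 1


def read_sees_write(replication_factor: int, write_replicas: int, read_replicas: int) -> bool:
    """True when any read set must share at least one replica with any write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)
# QUORUM writes plus QUORUM reads always overlap in at least one replica.
quorum_both = read_sees_write(rf, q, q)
# Writing to one replica and reading from one replica gives no such guarantee.
one_and_one = read_sees_write(rf, 1, 1)
```

With a single node and a replication factor of 1, every level reduces to the same one replica, so this overlap argument would not explain a stale read in that setup.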

Cassandra cluster does not tolerate single node failure good

2010-04-07 Thread Oleg Anastasjev
Hello, I am doing some tests of cassandra cluster behavior on several failure scenarios. And I am stuck with the very 1st test - what happens, if 1 node of cluster becomes unavailable. I have 4 4gb nodes loaded with write mostly test. Replication Factor is 2. Normally it works at a rate of about 12

Re: Inconsistency when unit testing

2010-04-07 Thread Philip Jackson
At Wed, 7 Apr 2010 17:29:49 +0200, Sylvain Lebresne wrote: > > Use ConsistencyLevel.QUORUM when you write *and* when you read. I already do (plus, I only test with one node). BTW, I'm on 0.5.0, if that makes any difference. Cheers, Phil

Re: Cassandra cluster does not tolerate single node failure good

2010-04-07 Thread Jonathan Ellis
Isn't this the same question I just answered? On Wed, Apr 7, 2010 at 10:35 AM, Oleg Anastasjev wrote: > Hello, > > I am doing some tests of cassandra cluster behavior on several failure > scenarios. > And I am stuck with the very 1st test - what happens, if 1 node of cluster > becomes unavailable

ConsistencyLevel.ZERO

2010-04-07 Thread Paul Prescod
On Tue, Apr 6, 2010 at 10:58 AM, Tatu Saloranta wrote: > On Tue, Apr 6, 2010 at 8:17 AM, Jonathan Ellis wrote: >> On Tue, Apr 6, 2010 at 2:13 AM, Ilya Maykov wrote: >>> That does sound similar. It's possible that the difference I'm seeing >>> between ConsistencyLevel.ZERO and ConsistencyLevel.AL

Re: ConsistencyLevel.ZERO

2010-04-07 Thread Jonathan Ellis
Great! On Wed, Apr 7, 2010 at 11:02 AM, Paul Prescod wrote: > On Tue, Apr 6, 2010 at 10:58 AM, Tatu Saloranta wrote: >> On Tue, Apr 6, 2010 at 8:17 AM, Jonathan Ellis wrote: >>> On Tue, Apr 6, 2010 at 2:13 AM, Ilya Maykov wrote: That does sound similar. It's possible that the difference I

Re: ConsistencyLevel.ZERO

2010-04-07 Thread Paul Prescod
Is it planned that Cassandra will eventually be able to handle a buffer overflow without crashing? Is this related to "Cassandra-685 - Add backpressure to StorageProxy" "Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutation

Re: Cassandra cluster does not tolerate single node failure good

2010-04-07 Thread Oleg Anastasjev
Jonathan Ellis writes: > > Isn't this the same question I just answered? > Umm, I am not sure. I looked over the last 3 days of your replies and did not find my case. Could you gimme some clue plz ?

Re: Cassandra cluster does not tolerate single node failure good

2010-04-07 Thread Jordan Pittier
>Could you gimme some clue plz ? "This is a known problem with 0.5 that was addressed in 0.6." It seems you posted twice for the same issue. On Wed, Apr 7, 2010 at 6:12 PM, Oleg Anastasjev wrote: > > Jonathan Ellis writes: > > > > > Isn't this the same question I just answered? >

Re: Bug in Cassandra that occurs when removing a supercolumn.

2010-04-07 Thread Matthew Grogan
I am seeing a similar problem running on 0.6 rc1. The data/logs have existed since 0.5. If I insert a new row then delete and re-insert then it works fine. If I delete a row that was created under 0.5 then delete and re-insert then the insert silently fails. I can delete the data/logs and start

Re: What is loadbalance supposed to do? 0.6.0RC1

2010-04-07 Thread Rob Coli
On 4/7/10 7:39 AM, Mark Jones wrote: Also, if the data is pushed out to the other nodes before the bootstrapping, why has data been lost? Does this mean that decommissioning a node results in data loss? As I understand it, in the following scenario : 1) Node A has Keys 0-10. 2) Add Node B

Re: OrderPreservingPartitioner limits and workarounds

2010-04-07 Thread Benjamin Black
I'd suggest you use RandomPartitioner, an index, and multiget. You'll be able to do range queries and won't have the load imbalance and performance problems of OPP and native range queries. b On Wed, Apr 7, 2010 at 3:51 AM, Paul Prescod wrote: > I have one append-oriented workload and I would

Re: OrderPreservingPartitioner limits and workarounds

2010-04-07 Thread Paul Prescod
Since I wrote that at 3:51AM (my time) I came to many of the same conclusions and decided to write them up to try and provide a high-level guide on sorting and ordering. * http://jottit.com/s8c4a/ But for completeness I was still hoping to document any workarounds that would help mitigate load b

Re: OrderPreservingPartitioner limits and workarounds

2010-04-07 Thread Jonathan Ellis
One thing you can do is manually "randomize" keys for any CFs that don't need the OP by pre-pending their md5 to the key you send Cassandra. (This is all RP is doing under the hood anyway.) On Wed, Apr 7, 2010 at 5:51 AM, Paul Prescod wrote: > I have one append-oriented workload and I would like
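
The key-randomizing trick described above can be sketched as follows. This is an illustration only: the `:` separator and the function names are assumptions, not anything from the thread, and it shows just the key transformation, not a Cassandra client call.

```python
import hashlib

SEPARATOR = ":"  # hypothetical separator; an md5 hex digest never contains ":"


def randomized_key(key: str) -> str:
    """Prefix the key with its md5 hex digest so sequential keys scatter across the ring."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return digest + SEPARATOR + key


def original_key(stored_key: str) -> str:
    """Recover the application key by dropping the digest prefix."""
    return stored_key.split(SEPARATOR, 1)[1]


# Sequential application keys get unrelated prefixes, so an order-preserving
# partitioner no longer places them next to each other.
stored = [randomized_key("event-%05d" % i) for i in range(3)]
recovered = [original_key(s) for s in stored]
```

The trade-off is that range queries over the original key order no longer work for column families stored this way, which is why the suggestion applies only to CFs that do not need ordering.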

Can these stats be right?

2010-04-07 Thread Mark Jones
From cfstats: SSTable count: 3 Space used (live): 4951669191 Space used (total): 5237040637 Memtable Columns Count: 190266 Memtable Data Size: 23459012 Memtable Switch Count: 89 Read Cou

Re: Bug in Cassandra that occurs when removing a supercolumn.

2010-04-07 Thread Jonathan Ellis
Your re-insert needs to have a higher timestamp than the delete, this is normal. On Wed, Apr 7, 2010 at 12:25 PM, Matthew Grogan wrote: > I am seeing a similar problem running on 0.6 rc1. > The data/logs have existed since 0.5. > If I insert a new row then delete and re-insert then it works fine.
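
The timestamp rule above can be illustrated with a toy model. This is not Cassandra's code: the `Row` class and its methods are hypothetical, and it assumes only what the reply states, namely that a write becomes visible after a delete only if its timestamp is strictly higher than the delete's.

```python
class Row:
    """Toy last-write-wins row with a deletion marker (illustrative only)."""

    def __init__(self):
        self.columns = {}      # column name -> (value, timestamp)
        self.deleted_at = -1   # timestamp of the most recent row delete

    def insert(self, name, value, timestamp):
        current = self.columns.get(name)
        # Keep the version with the higher client-supplied timestamp.
        if current is None or timestamp > current[1]:
            self.columns[name] = (value, timestamp)

    def delete(self, timestamp):
        self.deleted_at = max(self.deleted_at, timestamp)

    def get(self, name):
        entry = self.columns.get(name)
        # A column written at or before the delete stays hidden.
        if entry is None or entry[1] <= self.deleted_at:
            return None
        return entry[0]


row = Row()
row.insert("name", "first", timestamp=10)
row.delete(timestamp=20)
row.insert("name", "stale", timestamp=15)   # older than the delete: silently invisible
after_stale_insert = row.get("name")
row.insert("name", "fresh", timestamp=21)   # newer than the delete: visible
after_fresh_insert = row.get("name")
```

In this model the "silent failure" is simply a write that loses to the deletion marker, which is why clients that reuse or regress timestamps can see a re-insert disappear.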

Why can't you manage one node from another?

2010-04-07 Thread Mark Jones
I have 3 nodes in the cluster, and bin/nodetool --host this-host-name ring Works as expected, but bin/nodetool --host some-other-host ring always throws this exception: Error connecting to remote JMX agent! java.rmi.ConnectException: Connection refused to host: 127.0.1.1; nested exceptio

Re: Can these stats be right?

2010-04-07 Thread Rob Coli
On 4/7/10 12:16 PM, Mark Jones wrote: Read Latency: NaN ms. ./trunk/src/java/org/apache/cassandra/tools/NodeCmd.java " outs.println("\t\tRead Latency: " + String.format("%01.3f", cfstore.getRecentReadLatencyMicros() / 1000) + " " This call is telling you the (Read|Write|Rang

Re: Why can't you manage one node from another?

2010-04-07 Thread Jonathan Ellis
looks like you are running into http://wiki.apache.org/cassandra/JmxGotchas On Wed, Apr 7, 2010 at 2:21 PM, Mark Jones wrote: > I have 3 nodes in the cluster, and >    bin/nodetool --host this-host-name ring > Works as expected, but >    bin/nodetool --host some-other-host ring > > always throws

Re: Bug in Cassandra that occurs when removing a supercolumn.

2010-04-07 Thread Matthew Grogan
In both my cases the re-inserts have a higher timestamp. On 7 April 2010 20:13, Jonathan Ellis wrote: > Your re-insert needs to have a higher timestamp than the delete, this is > normal. > > On Wed, Apr 7, 2010 at 12:25 PM, Matthew Grogan > wrote: > > I am seeing a similar problem running on 0.

Re: Bug in Cassandra that occurs when removing a supercolumn.

2010-04-07 Thread Jonathan Ellis
If you can make a reproducible test case using the example CF definitions, that would be great. On Wed, Apr 7, 2010 at 2:48 PM, Matthew Grogan wrote: > In both my cases the re-inserts have a higher timestamp. > On 7 April 2010 20:13, Jonathan Ellis wrote: >> >> Your re-insert needs to have a hig

Handshake failed

2010-04-07 Thread Jason Alexander
Hey guys, Excuse my noobishness here, we're working through the initial PoC phases of implementing Cassandra here on one of our major systems we're building, and I'm having a few problems. I'm running Cassandra 0.5.1 on Fedora 12 in a VM on OS X, with the network interface running in bridged

Re: Handshake failed

2010-04-07 Thread Brandon Williams
On Wed, Apr 7, 2010 at 3:15 PM, Jason Alexander wrote: >TTransport transport = new TSocket("10.223.131.19", ); > This is not the default Thrift port (unless you explicitly set that way), you probably want port 9160. -Brandon

Re: Handshake failed

2010-04-07 Thread Jonathan Ellis
That means you're connecting to the debugger port, instead of the thrift one. (Thrift is 9160 by default.) On Wed, Apr 7, 2010 at 3:15 PM, Jason Alexander wrote: > Hey guys, > > > Excuse my noobishness here, we're working through the initial PoC phases of > implementing Cassandra here on one o

writes to Cassandra failing occasionally

2010-04-07 Thread Mike Gallamore
I have writes to cassandra that are failing, or at least a read shortly after a write is still getting an old value. I realize Cassandra is "eventually consistent" but this system is a single CPU single node with consistency level set to 1, so this seems odd to me. My setup: Cassandra 0.6rc1

RE: Handshake failed

2010-04-07 Thread Jason Alexander
Awesome, thanks guys - yeah, I was trying 9160, but not able to connect. Let me tweak my firewall settings and see if I can figure out why my VM is getting sandboxed so harshly. Thanks again, -Jason -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, Ap

Re: writes to Cassandra failing occasionally

2010-04-07 Thread Eric Evans
On Wed, 2010-04-07 at 13:19 -0700, Mike Gallamore wrote: > I have writes to cassandra that are failing, or at least a read > shortly after a write is still getting an old value. I realize > Cassandra is "eventually consistent" but this system is a single CPU > single node with consistency level set

Re: Heap sudden jump during import

2010-04-07 Thread Eric Evans
On Tue, 2010-04-06 at 10:55 -0700, Tatu Saloranta wrote: > On Tue, Apr 6, 2010 at 12:15 AM, JKnight JKnight > wrote: > > When import, all data in json file will load in memory. So that, you > can not > > import large data. > > You need to export large sstable file to many small json files, and > r

Iterate through entire data set

2010-04-07 Thread Sonny Heer
I need a way to process all of my data set. A way to process every keyspace, CF, row, column, and perform some operation based on that mapped combination. The map bucket would collect down to column name. Is there a map/reduce program which shows how to go about doing this?

Re: Iterate through entire data set

2010-04-07 Thread Jonathan Ellis
Look at the READMEs for contrib/word_count and contrib/pig. On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer wrote: > I need a way to process all of my data set. > > A way to process every keyspace, CF, row, column, and perform some > operation based on that mapped combination. > > The map bucket would

Re: Iterate through entire data set

2010-04-07 Thread Sonny Heer
These examples work on Cassandra 0.6 and Hadoop 0.20.2? On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis wrote: > Look at the READMEs for contrib/word_count and contrib/pig. > > On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer wrote: >> I need a way to process all of my data set. >> >> A way to process e

Re: Iterate through entire data set

2010-04-07 Thread Jonathan Ellis
Yes On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer wrote: > These examples work on Cassandra 0.6 and Hadoop 0.20.2? > > On Wed, Apr 7, 2010 at 2:49 PM, Jonathan Ellis wrote: >> Look at the READMEs for contrib/word_count and contrib/pig. >> >> On Wed, Apr 7, 2010 at 4:47 PM, Sonny Heer wrote: >>> I n

Is this sentence slightly inaccurate

2010-04-07 Thread Paul Prescod
"With OrderPreservingPartitioner the keys themselves are used to place on the ring. One of the potential drawbacks of this approach is that if rows are inserted with sequential keys, all the write load will go to the same node." http://wiki.apache.org/cassandra/StorageConfiguration Wouldn't the "

Integrity of batch_insert and also what about sharding?

2010-04-07 Thread banks
1. I have a farm of web servers; two of them write the same supercolumn full of various data (say 100 columns). Can the various columns end up mixed and matched within the super column? I know there is no real form of transaction, but if I'm using batch_insert do they all go in consistently? or is i

Re: Is this sentence slightly inaccurate

2010-04-07 Thread Eric Evans
On Wed, 2010-04-07 at 15:13 -0700, Paul Prescod wrote: > "With OrderPreservingPartitioner the keys themselves are used to place > on the ring. One of the potential drawbacks of this approach is that > if rows are inserted with sequential keys, all the write load will go > to the same node." Yeah,

Re: Is this sentence slightly inaccurate

2010-04-07 Thread Benjamin Black
On Wed, Apr 7, 2010 at 3:13 PM, Paul Prescod wrote: > > Also: Dominic Williams says that one of the advantages of the > OrderPreservingPartitioner is: "3. If you screw up, you can scan over > your data to recover/delete orphaned keys" > > Does anyone know off the top of their head what he might ha

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Benjamin Black
On Wed, Apr 7, 2010 at 3:41 PM, banks wrote: > > 2. each cassandra node essentially has the same datastore as all nodes, > correct? No. The ReplicationFactor you set determines how many copies of a piece of data you want. If your number of nodes is higher than your RF, as is common, you will no

Re: Is this sentence slightly inaccurate

2010-04-07 Thread David Strauss
On 2010-04-07 23:00, Benjamin Black wrote: > If you are using RP and your own, secondary indices, you have no way > to access rows except by get on the key. Thus, if you lose or corrupt > your indices, you may no longer know all your row keys. With OPP and > range queries, you can discover them.

Re: Iterate through entire data set

2010-04-07 Thread Sonny Heer
Jon, I've got the word_count.jar and a Hadoop cluster. How do you usually run this sample? On Wed, Apr 7, 2010 at 3:04 PM, Jonathan Ellis wrote: > Yes > > On Wed, Apr 7, 2010 at 5:01 PM, Sonny Heer wrote: >> These examples work on Cassandra 0.6 and Hadoop 0.20.2? >> >> On Wed, Apr 7, 2010 at 2:

Re: Is this sentence slightly inaccurate

2010-04-07 Thread Benjamin Black
It was not, but is now. On Wed, Apr 7, 2010 at 4:23 PM, David Strauss wrote: > On 2010-04-07 23:00, Benjamin Black wrote: >> If you are using RP and your own, secondary indices, you have no way >> to access rows except by get on the key.  Thus, if you lose or corrupt >> your indices, you may no l

Cassandra at Twitter's Chirp conference

2010-04-07 Thread Ryan King
I'll be giving a talk at our developer's conference next week about how and why we're using cassandra. If there's anything you'd like to hear about, post your question on http://www.google.com/moderator/#15/e=5c0f&t=5c0f.49&f=5c0f.23623. thanks, ryan PS - Yes, I think video will be available.

Re: cassandra data viewer?

2010-04-07 Thread banks
good question, any answers? 2010/4/6 AJ Chen > that looks good. is there a similar cassandra tool in java? > > > On Mon, Apr 5, 2010 at 5:59 PM, selam wrote: > >> look at chiton on github. >> >> On Tue, Apr 6, 2010 at 3:06 AM, AJ Chen wrote: >> > Is there a generic GUI tool for viewing cassan

does compaction of Super Column Family have same limit as compaction of Column Family

2010-04-07 Thread Jeremy Davis
Quick question: There is an open issue with ColumnFamilies growing too large to fit in memory when compacting.. Does this same limit also apply to SCF? As long as each sub CF is sufficiently small, etc. -JD

Re: does compaction of Super Column Family have same limit as compaction of Column Family

2010-04-07 Thread Benjamin Black
SCF rows are loaded in their entirety into memory, so the limit applies in the same way. On Wed, Apr 7, 2010 at 5:16 PM, Jeremy Davis wrote: > Quick question: > There is an open issue with ColumnFamilies growing too large to fit in > memory when compacting.. > Does this same limit also apply to S

Re: cassandra data viewer?

2010-04-07 Thread Jonathan Ellis
Why does the language the tool is written in matter? No, there are no Java clones of chiton. 2010/4/7 banks : > good question, any answers? > > 2010/4/6 AJ Chen >> >> that looks good. is there a similar cassandra tool in java? >> >> On Mon, Apr 5, 2010 at 5:59 PM, selam wrote: >>> >>>  look at

Odd/incorrect getSlice behavior with 0.6.0rc1

2010-04-07 Thread Paul Brown
predicate passed to get slice contains a range with an open left bound (zero-element byte array for start) and a closed right bound of "20100406" with true specified for "reversed" and a fetch size of 1, but it still returns the column for "20100407". Bug? Misconception on my part? -- Paul

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread banks
Then from an IT standpoint, if i'm using a RF of 3, it stands to reason that running on Raid 1 makes sense, since RAID and RF achieve the same ends... it makes sense to strip for speed and let cassandra deal with redundancy, eh? On Wed, Apr 7, 2010 at 4:07 PM, Benjamin Black wrote: > On Wed, Ap

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Paul Prescod
On Wed, Apr 7, 2010 at 6:02 PM, banks wrote: > Then from an IT standpoint, if i'm using a RF of 3, it stands to reason that > running on Raid 1 makes sense, since RAID and RF achieve the same ends... it > makes sense to strip for speed and let cassandra deal with redundancy, eh? Isn't it RAID-0 t

Re: Iterate through entire data set

2010-04-07 Thread Stu Hood
Please read the README in the contrib/word_count directory. -Original Message- From: "Sonny Heer" Sent: Wednesday, April 7, 2010 6:33pm To: user@cassandra.apache.org Subject: Re: Iterate through entire data set Jon, I've got the word_count.jar and a Hadoop cluster. How do you usually ru

RE: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Jason Alexander
I think banks meant to s/strip/stripe, in which case RAID 0 == striping. And, yes, striping is purely for perf, no redundancy. HTH, -Jason From: Paul Prescod [pres...@gmail.com] Sent: Wednesday, April 07, 2010 8:07 PM To: user@cassandra.apache.org Subject

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Benjamin Black
That depends on your goals for fault tolerance and recovery time. If you use RAID1 (or other redundant configuration) you can tolerate disk failure without Cassandra having to do repair. For large data sets, that can be a significant win. b On Wed, Apr 7, 2010 at 6:02 PM, banks wrote: > Then

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread banks
What I'm trying to wrap my head around is what is the break even point... If I'm going to store 30 terabytes in this thing... what's optimum to give me performance and scalability... is it best to be running 3 powerful nodes, 100 smaller nodes, nodes on each web blade with 300g behind each... ya k

RE: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Jason Alexander
FWIW, I'd love to see some guidance here too - From our standpoint, we'll be consolidating the various Match.com sites' (match.com, chemistry.com, etc...) data into a single data warehouse, running Cassandra. We're looking at roughly the same amounts of data (30TB's or more). We were assumi

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Benjamin Black
What benefit does a SAN give you? I've generally been confused by that approach, so I'm assuming I am missing something. On Wed, Apr 7, 2010 at 6:58 PM, Jason Alexander wrote: > FWIW, I'd love to see some guidance here too - > > From our standpoint, we'll be consolidating the various Match.com s

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Benjamin Black
Recovery times are shorter the less data per node, so lots of smaller nodes are better on that axis. More nodes also means more frequent node failure, so lots of smaller nodes are worse on that axis. The gossip chatter is minuscule, even with large clusters. Simply not a factor. On Wed, Apr 7,

RE: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Jason Alexander
Well, IANAITG (I Am Not An IT Guy), but outside of the normal benefits you get from a SAN (that you can, of course, get from other options) is that I believe our IT group likes it for the management aspects - they like to buy a BigAssSAN(tm) and provision storage to different clusters, environme

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread Cliff Moon
Putting cassandra's data directories on a SAN is like putting a bunch of F1's on one of those big car carrier trucks and entering a race with the truck. You know, since you have so much horsepower. On 4/7/10 7:28 PM, Jason Alexander wrote: Well, IANAITG (I Am Not An IT Guy), but outside of th

Basic question

2010-04-07 Thread Palaniappan Thiyagarajan
All, I am investigating how we can use Cassandra in our application. We have tokens and session information stored in a db now and I am thinking of moving to Cassandra. Currently it's write and read intensive and has performance issues. Is it a good idea to move a couple of tables and integrate

Re: Integrity of batch_insert and also what about sharding?

2010-04-07 Thread David Timothy Strauss
Based on empirical usage, Gossip chatter is quite manageable well beyond 100 nodes. One advantage of many small nodes is that the cost of node failure is small on rebuild. If you have 100 nodes with a hundred gigs each, the price you pay for a node's complete failure is pulling a hundred gig

Odd/incorrect getSlice behavior with 0.6.0rc1

2010-04-07 Thread Paul Brown
> [Question about slice behavior from me] Bug? Misconception on my part? The answer is misconception; I interpreted the behavior of get_slice to be columns between X and Y, in ascending or descending depending on the "reversed" flag, but it's a little different start = X, end = Y, reversed =
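
The preview above is cut off, so the following is not the author's explanation. It is a small in-memory simulation of one reading of the slice semantics, under the assumption that with `reversed` set, iteration begins at `start` (or at the highest column when `start` is empty) and walks downward until `finish`. The function name and signature are hypothetical and do not match the Thrift API.

```python
def get_slice(column_names, start, finish, reversed_, count):
    """Simulate a slice over sorted column names (assumed semantics, not the Thrift call)."""
    ordered = sorted(column_names, reverse=reversed_)
    result = []
    for name in ordered:
        # Skip columns that come before `start` in iteration order.
        if start and ((name > start) if reversed_ else (name < start)):
            continue
        # Stop once iteration has passed `finish`.
        if finish and ((name < finish) if reversed_ else (name > finish)):
            break
        result.append(name)
        if len(result) == count:
            break
    return result


columns = ["20100405", "20100406", "20100407"]

# Empty start, finish "20100406", reversed: iteration begins at the newest column.
observed = get_slice(columns, "", "20100406", True, 1)

# To get the newest column at or before "20100406", make it the start instead.
intended = get_slice(columns, "20100406", "", True, 1)
```

Under this assumption the first call returns the "20100407" column, matching the behavior reported in the original question, while swapping the bounds returns "20100406".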