RE: FW: Very slow batch insert using version 0.7.2

2011-03-11 Thread Desimpel, Ignace
That is the number of records I need to add for each document. We would like to test it with 100K or more documents, which is why we thought Cassandra could be a good database system. At first I did the inserts one by one. Of course, doing it in batches made the system a lot faster,

Re: problem with bootstrap

2011-03-11 Thread Patrik Modesto
Unfortunately I can't provide the info, I deleted it. It was in a very strange state. I started with a new cluster today, 2 nodes, each with auto_bootstrap:true. I can create a keyspace with RF=3, but I can't insert any data in it. It didn't happen with the old cluster, which made me think. How could I

Re: FW: Very slow batch insert using version 0.7.2

2011-03-11 Thread Zhu Han
On Fri, Mar 11, 2011 at 10:40 AM, Erik Forkalsrud wrote: > > I see the same behavior with smaller batch sizes. It appears to happen > when starting Cassandra with the defaults on relatively large systems. > Attached is a script I created to reproduce the problem. (usage: mutate.sh > /path/to/ap

Re: memory utilization

2011-03-11 Thread Chris Burroughs
On 03/10/2011 09:26 PM, Bill Hastings wrote: > Hi All > > Memory utilization reported by JConsole for Cassandra seems to be much > lower than that reported by top ("RES" memory). Can someone explain this? > Maybe off topic but would appreciate a response. > Is there a more or less constant amo

Poor performance on small data set

2011-03-11 Thread Vodnok
Hi, I'm facing a poor performance issue getting a simple row. Here is my dev env: - Windows 7 - PHPCassa [PHP 5.3.5] - Cassandra 0.7.3 CF: create column family docs with comparator = 'UTF8Type' and column_type = 'Standard' and rows_cached=10 and keys_cached=10; There is less than 1000 r

Re: Poor performance on small data set

2011-03-11 Thread Peter Schuller
> There is less than 1000 rows and i've got a 75-100ms to get one row by id > With memcached it's 2ms > > I don't know where is the problem. jvm ? cassandra ? phpcassa ? > > What can i do to detect where is the problem ? I'm not familiar with the PHP client, but this sounds suspiciously like a

Re: Poor performance on small data set

2011-03-11 Thread Edward Capriolo
On Fri, Mar 11, 2011 at 11:44 AM, Peter Schuller wrote: >> There is less than 1000 rows and i've got a 75-100ms to get one row by id >> With memcached it's 2ms >> >> I don't know where is the problem. jvm ? cassandra ? phpcassa ? >> >> What can i do to detect where is the problem ? > > I'm not

Re: Pig output to Cassandra

2011-03-11 Thread Jeremy Hanna
Yep - it's usable and separate so you should be able to download 0.7-branch and build the jar and use it against a 0.7.3 cluster. I've been using it against a 0.7.2 cluster actually. http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/ To use it, check out the readme in the contri

Re: Poor performance on small data set

2011-03-11 Thread Jonathan Ellis
Also: https://issues.apache.org/jira/browse/THRIFT-638 On Fri, Mar 11, 2011 at 10:44 AM, Peter Schuller wrote: >> There is less than 1000 rows and i've got a 75-100ms to get one row by id >> With memcached it's 2ms >> >> I don't know where is the problem. jvm ? cassandra ? phpcassa ? >> >> Wh

Re: Pig output to Cassandra

2011-03-11 Thread Mark
I'll give it a try. Thanks a lot! On 3/11/11 9:30 AM, Jeremy Hanna wrote: Yep - it's usable and separate so you should be able to download 0.7-branch and build the jar and use it against a 0.7.3 cluster. I've been using it against a 0.7.2 cluster actually. http://svn.apache.org/repos/asf/cass

Re: FW: Very slow batch insert using version 0.7.2

2011-03-11 Thread Erik Forkalsrud
On 03/11/2011 04:56 AM, Zhu Han wrote: When I run it on my laptop (Fedora 14, 64-bit, 4 cores, 8GB RAM) it flushes one Memtable with 5000 operations When I run it on a server (RHEL5, 64-bit, 16 cores, 96GB RAM) it flushes 100 Memtables with anywhere between 1 operation and 359

Re: FW: Very slow batch insert using version 0.7.2

2011-03-11 Thread Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-2158, fixed in 0.7.3 you could have saved a lot of time just by upgrading first. :) On Fri, Mar 11, 2011 at 2:02 PM, Erik Forkalsrud wrote: > On 03/11/2011 04:56 AM, Zhu Han wrote: >> >> When I run it on my laptop (Fedora 14, 64-bit, 4 cores, 8GB R

Re: FW: Very slow batch insert using version 0.7.2

2011-03-11 Thread Erik Forkalsrud
On 03/11/2011 12:13 PM, Jonathan Ellis wrote: https://issues.apache.org/jira/browse/CASSANDRA-2158, fixed in 0.7.3 you could have saved a lot of time just by upgrading first. :) Hmm, I'm testing with 0.7.3 ... but now I know at least which knob to turn. - Erik -

Re: How long will all nodes data sync.

2011-03-11 Thread aaron morton
The answer is eventually. There is no fixed point in time. In the simple case where your 3 nodes are up and not under pressure, your write will end up at every replica. Hope that helps. Aaron On 11 Mar 2011, at 16:38, Vincent Lu (ECL) wrote: > Hi all, > > I have a question about eventually consi

Re: How long will all nodes data sync.

2011-03-11 Thread mcasandra
Is there a way to monitor how far behind the sync is? In the case of hinted handoff, or when a node is down for an extended period of time, it would probably be helpful to know.

Re: How long will all nodes data sync.

2011-03-11 Thread Jonathan Ellis
If you're using ConsistencyLevel appropriately then it doesn't matter, because you're guaranteed to see data as current as you need. On Thu, Mar 10, 2011 at 9:38 PM, Vincent Lu (ECL) wrote: > Hi all, > > > > I have a question about eventually consistency. > > If there are 3 nodes and RF=3, Write-
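The guarantee mentioned in this reply is commonly stated as an overlap rule: if the number of replicas that acknowledge a write (W) plus the number consulted on a read (R) exceeds the replication factor (RF), every read set must include at least one replica that holds the latest write. The following is a minimal Python sketch of that rule, assuming the usual quorum definition of RF // 2 + 1. The function names are illustrative only and are not part of any Cassandra client API.

```python
def quorum(rf: int) -> int:
    """Number of replicas that make up a quorum for replication factor rf."""
    return rf // 2 + 1


def read_sees_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


if __name__ == "__main__":
    rf = 3  # the RF=3, three-node case asked about in this thread
    levels = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}
    for w_name, w in levels.items():
        for r_name, r in levels.items():
            ok = read_sees_latest_write(rf, w, r)
            print(f"write={w_name:<6} read={r_name:<6} overlap guaranteed: {ok}")
```

With RF=3, writing and reading at QUORUM satisfies the rule, while writing and reading at ONE does not, which is the case where the question of "how long until all nodes sync" actually matters.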

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

2011-03-11 Thread Jonathan Ellis
On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke wrote: >  My question is whether it's considered safer to upgrade via 0.6.12 >  to 0.7, or if a direct 0.6.6 -> 0.7 upgrade is safe enough? You don't need latest 0.6 before upgrading. >  Copying a cluster between AWS DC's: >  We have ~ 150-250GB p

Re: How long will all nodes data sync.

2011-03-11 Thread mcasandra
Yes, I understand that piece, but my thought is this: if a node was down and has come back up, we want to know how long the sync will take, in case another node in the replica set fails. It is also a good data point to see how long it takes to sync. It's always good to have this data h

Seed

2011-03-11 Thread mcasandra
I've read in some posts and on the wiki that all the nodes in the cluster should have the same seed list, and that 2 is the recommended number of seeds. My question: is it advisable to have a node seed itself? For example, say Node A, Node B and Node C in a cluster have a seed list of A and B. Now according to th

Re: problem with bootstrap

2011-03-11 Thread Aaron Morton
IMHO creating a keyspace with RF higher than the number of nodes sounds like a bug. It puts the cluster into a bad place. It may even be a regression, will take a look at the code. The assertion is interesting. Can you reproduce it with logging at debug and post the results? Could you try to re

Re: FW: Very slow batch insert using version 0.7.2

2011-03-11 Thread Erik Forkalsrud
On 03/11/2011 12:13 PM, Jonathan Ellis wrote: https://issues.apache.org/jira/browse/CASSANDRA-2158, fixed in 0.7.3 you could have saved a lot of time just by upgrading first. :) It looks like the fix isn't entirely correct. The bug is still in 0.7.3. In Memtable.java, the line: THRES

Write speed roughly 1/10 of expected.

2011-03-11 Thread Steven Liu
We are using the latest phpcassa (phpcassa-0.7.a.2.tar.gz) and Cassandra 0.7.3. We have inserted 12+ million documents into one column family with the following keyspace/column family settings: Keyspace: dffl: Replication Stra

Re: Write speed roughly 1/10 of expected.

2011-03-11 Thread Peter Schuller
> It took 219 minutes to insert 12+ million docs which translates to about 913 > docs/second using batch_insert in batches of 1250 documents per batch. How big are the documents and/or how big is the resulting data when loaded? What is your data model - is each document a single column? Or a row
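The throughput figure quoted above can be reproduced from the numbers in the message. This short Python sketch treats "12+ million" as exactly 12,000,000, so the result is a lower bound. The variable and function names are made up for the illustration.

```python
DOCS = 12_000_000   # "12+ million" in the message, so this is a lower bound
MINUTES = 219       # total insert time reported
BATCH_SIZE = 1250   # documents per batch_insert call


def docs_per_second(docs: int, minutes: float) -> float:
    """Average insert rate over the whole run."""
    return docs / (minutes * 60)


rate = docs_per_second(DOCS, MINUTES)
seconds_per_batch = BATCH_SIZE / rate

print(f"average rate: {rate:.0f} docs/second")
print(f"time per batch of {BATCH_SIZE}: {seconds_per_batch:.2f} seconds")
```

Seen per batch, each call of 1250 documents takes well over a second on average, which is why the document size and data model asked about in this reply matter for diagnosing the slowdown.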

Re: Write speed roughly 1/10 of expected.

2011-03-11 Thread Tyler Hobbs
> > (I have no idea how fast phpcassa is.) > The current master branch (which has the benefit of THRIFT-638, while 0.7.a.3 does not) can insert about 3k individual rows a second against a local Cassandra instance. -- Tyler Hobbs Software Engineer

Re: FW: Very slow batch insert using version 0.7.2

2011-03-11 Thread Jonathan Ellis
Absolutely right! Thanks, fixed for 0.7.4. On Fri, Mar 11, 2011 at 4:14 PM, Erik Forkalsrud wrote: > On 03/11/2011 12:13 PM, Jonathan Ellis wrote: >> >> https://issues.apache.org/jira/browse/CASSANDRA-2158, fixed in 0.7.3 >> >> you could have saved a lot of time just by upgrading first. :) > > >

Re: Seed

2011-03-11 Thread Tyler Hobbs
On Fri, Mar 11, 2011 at 2:55 PM, mcasandra wrote: > My question is it advisable to have node seed itself. > Yes, every node should have the same seed list, including the seeds themselves. -- Tyler Hobbs Software Engineer, DataStax Maintainer of the pycassa

Cassandra still won't start - in-use ports block it

2011-03-11 Thread Bob Futrelle
My frustration continues, especially exasperating because so many people just seem to download Cassandra and run it with no problems. All my efforts have been stymied by one port-in-use problem after another. People on this list have helped and their suggestions got me a little bit further, but no
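One way to narrow down a port-in-use failure is to check, before starting Cassandra, whether anything already holds the ports it wants. The Python sketch below does this by attempting to bind each port. The port numbers are assumptions based on the usual 0.7 defaults (7000 for storage, 9160 for Thrift, 8080 for JMX) and should be checked against your own cassandra.yaml and startup script. The helper name is hypothetical.

```python
import socket

# Assumed Cassandra 0.7 defaults; confirm against your own configuration.
DEFAULT_PORTS = {"storage": 7000, "thrift": 9160, "jmx": 8080}


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if binding host:port fails.

    A failed bind usually means another process holds the port, but it can
    also mean the bind was refused for another reason (for example, a
    privileged port without sufficient permissions).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True
        return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        state = "IN USE" if port_in_use(port) else "free"
        print(f"{name:<8} port {port}: {state}")
```

If a port is reported as in use, a tool such as lsof or netstat can identify the process that owns it. A common culprit is a previous Cassandra instance that never shut down.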

Re: Cassandra still won't start - in-use ports block it

2011-03-11 Thread Jeremy Hanna
I don't know if others have asked this but do you have a firewall running that would prevent access to those ports or something like that? On Mar 11, 2011, at 10:40 PM, Bob Futrelle wrote: > My frustration continues, especially exasperating because so many people just > seem to download Cassand