Re: Help with getting Key range with some column limitations

2010-08-18 Thread Thorvaldsson Justus
You should iterate through them: get 200, then get the next 200, and so on. Also, if you are checking one bounding box against another, perhaps try sorting them so you can start looking from both ends, and perhaps make the iteration smaller until you find a match somehow? Just my two cents; also, upgrading will probably be
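The "get 200, then the next 200" advice above can be sketched roughly as follows. This is only an illustration under stated assumptions: `get_range` is a hypothetical stand-in for a range query with an inclusive start key, backed here by an in-memory sorted list, and is not a Cassandra or Pelops call.

```python
from bisect import bisect_left


def get_range(sorted_keys, start_key, count):
    """Hypothetical range query: up to `count` keys >= start_key (start is inclusive)."""
    i = bisect_left(sorted_keys, start_key)
    return sorted_keys[i:i + count]


def iterate_keys(sorted_keys, page_size=200):
    """Yield every key exactly once, fetching one page at a time."""
    if page_size < 2:
        # With an inclusive start key, a page of 1 would never advance.
        raise ValueError("page_size must be at least 2")
    start = ""          # assumed to sort before every real key
    first_page = True
    while True:
        page = get_range(sorted_keys, start, page_size)
        # Every page after the first begins with the previous page's last key,
        # so that repeated key is dropped.
        fresh = page if first_page else page[1:]
        for key in fresh:
            yield key
        if len(page) < page_size:
            break       # a short page means the range is exhausted
        start = page[-1]
        first_page = False
```

Because the start key is inclusive, each page after the first yields one key fewer than `page_size`; a real client would pass the last key seen as the next start key in the same way.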

Re: File write errors but cassandra isn't crashing

2010-08-18 Thread Ran Tavory
I opened as an improvement suggestion: https://issues.apache.org/jira/browse/CASSANDRA-1409 On Mon, Aug 16, 2010 at 8:26 PM, Benjamin Black wrote: > Useful config option, perhaps? > > On Mon, Aug 16, 2010 at 8:51 AM, Jonathan Ellis wrote: > > That's a tough call -- you can also come up with sce

Re: Thrift + PHP: help!

2010-08-18 Thread Gabriel Sosa
I would like to help with this too! On Wed, Aug 18, 2010 at 5:15 PM, Bas Kok wrote: > I have some experience in this area and would be happy to help out as well. > -- > Bas > > > On Wed, Aug 18, 2010 at 8:26 PM, Dave Gardner wrote: > >> I'm happy to assist. Having a robust PHP implementation wou

Re: Thrift + PHP: help!

2010-08-18 Thread Bas Kok
I have some experience in this area and would be happy to help out as well. -- Bas On Wed, Aug 18, 2010 at 8:26 PM, Dave Gardner wrote: > I'm happy to assist. Having a robust PHP implementation would help us > greatly. > > Dave > > On Wednesday, August 18, 2010, Jeremy Hanna > wrote: > > As Jon

Re: Videos of the cassandra summit starting to be posted

2010-08-18 Thread thelastpickle.com
Thanks from down in New Zealand. Aaron On 18 Aug 2010, at 05:45, Jeremy Hanna wrote: > The videos of the cassandra summit are starting to be posted, just fyi for > those who were unable to make it out to SF. > > http://www.riptano.com/blog/slides-and-videos-cassandra-summit-2010

Help with getting Key range with some column limitations

2010-08-18 Thread Jone Lura
Hi, We are trying to implement Cassandra to replace one of our biggest SQL tables, and so far we got it working. However, for testing I'm using Cassandra 0.6.2, Java and Pelops. (Pelops not that important for my question) and need suggestions on how to solve a problem retrieving a key range ba

Re: Thrift + PHP: help!

2010-08-18 Thread Dave Gardner
I'm happy to assist. Having a robust PHP implementation would help us greatly. Dave On Wednesday, August 18, 2010, Jeremy Hanna wrote: > As Jonathan mentioned in his keynote at the Cassandra Summit, the thrift + > php has some bugs and is maintainerless right now. > > Is there anyone out there

Re: curious space usages after recovering a failed node

2010-08-18 Thread Scott Dworkis
to update, i seem to be having luck with some combination of "cleanup" followed by triggering a garbage collection on jmx (all on each node). (using jmxterm): echo -e 'open localhost:8080\nrun -b java.lang:type=Memory gc' | java -jar jmxterm-1.0-alpha-4-uber.jar -scott On Mon, 16 Aug 2010, Sc

Re: KeyRange.token in 0.7.0

2010-08-18 Thread Ran Tavory
On Wed, Aug 18, 2010 at 4:30 PM, Jonathan Ellis wrote: > (a) if you're using token queries and you're not hadoop, you're doing it > wrong > ah, didn't know that, so I guess I'll remove support for it from hector... > > (b) they are expected to be of the form generated by > TokenFactory.toString

Re: Cassandra and Pig

2010-08-18 Thread Stu Hood
Needing to manually copy the jars to all of the nodes would mean that you aren't applying the Pig 'register ;' command properly. -Original Message- From: "Christian Decker" Sent: Wednesday, August 18, 2010 7:08am To: user@cassandra.apache.org Subject: Re: Cassandra and Pig I got one ste

Thrift + PHP: help!

2010-08-18 Thread Jeremy Hanna
As Jonathan mentioned in his keynote at the Cassandra Summit, the thrift + php has some bugs and is maintainerless right now. Is there anyone out there in the Cassandra community that is adept at PHP that could help out with the thrift + php work? It would benefit all who use Cassandra with PH

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Peter Schuller
> I actually have the log files from all 8 nodes if it helps to diagnose what > activity was going on behind the scenes.  I really need to understand how this > happened. Without necessarily dumping all the information - approximately what do they contain? Do they contain anything about compaction

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Julie
Jonathan Ellis writes: > > If you read the stack traces you pasted, the node in question ran out > of diskspace. When you have < 25% space free this is not surprising. > > But fundamentally you are missing something important from your story > here. Disk space doesn't just increase

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Edward Capriolo
On Wed, Aug 18, 2010 at 10:51 AM, Jonathan Ellis wrote: > If you read the stack traces you pasted, the node in question ran out > of diskspace.  When you have < 25% space free this is not surprising. > > But fundamentally you are missing something important from your story > here.  Disk space does

Job Opportunity in Europe (Nosql, hadoop, crawling)

2010-08-18 Thread Thibaut Britz
Hi, We are searching for at least 3 more developers in the fields of search & automatic content/site extraction, crawling, duplicate content, news/spam detection. We do content fetching and aggregation (news, message boards, blogs, ...) for market research institutes, media analytics companies, etc...

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Jonathan Ellis
If you read the stack traces you pasted, the node in question ran out of diskspace. When you have < 25% space free this is not surprising. But fundamentally you are missing something important from your story here. Disk space doesn't just increase spontaneously with "absolutely no activity." On

Re: data deleted came back after 9 days.

2010-08-18 Thread Jonathan Ellis
HH would handle it if it were a FD false positive, but if a node actually does go down then it can miss writes before HH kicks in. On Wed, Aug 18, 2010 at 9:30 AM, Raj N wrote: > Guys, >     Correct me if I am wrong. The whole problem is because a node missed an > update when it was down. Shouldn

Re: Cassandra disk space utilization WAY higher than I would expect

2010-08-18 Thread Julie
Rob Coli writes: > As I understand Julie's case, she is : > a) initializing her cluster > b) inserting some number of unique keys with CL.ALL > c) noticing that more disk space (6x?) than is expected is used > d) but that she gets expected usage if she does a major compaction > In oth

RE: data deleted came back after 9 days.

2010-08-18 Thread Raj N
Guys, Correct me if I am wrong. The whole problem is because a node missed an update when it was down. Shouldn’t HintedHandoff take care of this case? Thanks -Raj -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, August 18, 2010 9:22 AM To: user@cassa

Re: Pig + Cassandra = Connection errors

2010-08-18 Thread Christian Decker
I have absolutely no idea what is causing the rejections, they appear to be totally random, on all 3 hosts of my cluster. I cleared all iptables states, and since they all sit on the same switch I don't think it has to do with the underlying network. Is there a connection limit on Cassandra nodes?

Re: data deleted came back after 9 days.

2010-08-18 Thread Jonathan Ellis
Actually, tombstones are read repaired too -- as long as they are not expired. But nodetool repair is much less error-prone than relying on RR and your memory of what deletes you issued. Either way, you'd need to increase GCGraceSeconds first to make the tombstones un-expired. On Wed, Aug

Re: KeyRange.token in 0.7.0

2010-08-18 Thread Jonathan Ellis
(a) if you're using token queries and you're not hadoop, you're doing it wrong (b) they are expected to be of the form generated by TokenFactory.toString and fromString. You should not be generating them yourself. On Wed, Aug 18, 2010 at 7:56 AM, Ran Tavory wrote: > I'm a bit confused WRT KeyRan

Re: Pig + Cassandra = Connection errors

2010-08-18 Thread Jonathan Ellis
why are you getting connection refused? do you have a firewall problem? On Wed, Aug 18, 2010 at 7:17 AM, Christian Decker wrote: > Hi all, > I'm trying to get Pig scripts to work on data in Cassandra and right now I > want to simply run the example-script.pig on a different Keyspace/CF > contain

Re: Pig + Cassandra = Connection errors

2010-08-18 Thread Christian Decker
You mean the ? Right now it's 1 milliseconds. So that should take care of the timeouts, but what about the refused connections? On Wed, Aug 18, 2010 at 3:08 PM, Drew Dahlke wrote: > What's your cassandra timeout configured to? It's not uncommon to > raise that to 30sec if you're getting time

Re: uncomplete RangeSlices

2010-08-18 Thread Jonathan Ellis
this is fixed in the 0.6 branch for 0.6.5 On Wed, Aug 18, 2010 at 5:04 AM, Stefan Kaufmann wrote: > Sorry - forgot the version: 0.6.4 > > On Wed, Aug 18, 2010 at 11:52 AM, Stefan Kaufmann wrote: >> >> During my tests, I found some strange bootstrap + get Range Slice >> behavior. >> >> In my test

Re: data deleted came back after 9 days.

2010-08-18 Thread Jonathan Ellis
Best practice is to schedule repair more often than GCGraceSeconds, say weekly, rather than doing it manually when you notice the FD mark someone dead. On Tue, Aug 17, 2010 at 3:11 PM, Ned Wolpert wrote: > (gurus, please check my logic here... I'm trying to validate my > understanding of this sit
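The rule of thumb above (repair more often than GCGraceSeconds) comes down to a simple comparison. The sketch below assumes a GCGraceSeconds of 864000 (10 days); that value and the helper name `repair_interval_is_safe` are assumptions for illustration, so check your own configuration.

```python
# Assumed GCGraceSeconds of 10 days; verify against your own configuration.
GC_GRACE_SECONDS = 864000

SECONDS_PER_DAY = 24 * 3600
WEEK = 7 * SECONDS_PER_DAY


def repair_interval_is_safe(repair_interval_seconds,
                            gc_grace_seconds=GC_GRACE_SECONDS):
    """True if every node gets repaired before tombstones can be collected."""
    return repair_interval_seconds < gc_grace_seconds


print(repair_interval_is_safe(WEEK))                  # weekly repair
print(repair_interval_is_safe(14 * SECONDS_PER_DAY))  # fortnightly repair
```

Under the assumed 10-day grace period, a weekly schedule passes the check and a fortnightly one does not.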

Re: data deleted came back after 9 days.

2010-08-18 Thread Jonathan Ellis
Corrected, thanks. (Better would be to edit the wiki yourself, of course. :) On Tue, Aug 17, 2010 at 2:58 PM, Jeremy Dunck wrote: > On Tue, Aug 17, 2010 at 2:49 PM, Jonathan Ellis wrote: >> It doesn't have to be disconnected more than GC grace seconds to cause >> what you are seeing, it just ha

Re: "MessageDeserializationTask.java (line 47) dropping message" errors

2010-08-18 Thread Jonathan Ellis
This is Cassandra trying to protect itself from the load spike. You probably need to balance your nodes, tune for performance, add capacity, or some combination. On Wed, Aug 18, 2010 at 6:16 AM, Jianing Hu wrote: > We have a 3-node cluster of Cassandra 0.6.4, on one of them there's a > ton of er

Re: Pig + Cassandra = Connection errors

2010-08-18 Thread Drew Dahlke
What's your cassandra timeout configured to? It's not uncommon to raise that to 30sec if you're getting timeouts. On Wed, Aug 18, 2010 at 8:17 AM, Christian Decker wrote: > Hi all, > I'm trying to get Pig scripts to work on data in Cassandra and right now I > want to simply run the example-script

Re: Map/Reduce over Cassandra

2010-08-18 Thread Drew Dahlke
Hey Bill, A few months ago we did an experiment with 5 hadoop nodes pulling from 4 cass nodes. It was pulling down 1 column family with 8 small columns & just dumping the raw data to hdfs. It was cycling through around 17K map tasks per sec. The machines weren't being taxed too hard, so I'm sure t

KeyRange.token in 0.7.0

2010-08-18 Thread Ran Tavory
I'm a bit confused WRT KeyRange's tokens in 0.7.0 When making a range query you can either use KeyRange.key or KeyRange.token. In 0.7.0 key was typed as byte[]. tokens remain strings. What does this string represent in case of a RP and in case of an OPP? Did this change in 0.7.0? AFAIK in 0.6.0 if

Re: [RELEASE] 0.7.0 beta1

2010-08-18 Thread Ran Tavory
[cross posting to u...@cass and hector-use...@googlegroups] Happy to announce hector's support in 0.7.0. Hector is a java client for cassandra which wraps the low level thrift interface with a nicer API, adds monitoring, connection pooling and more. I didn't do anything... The amazing 0.7.0 work w

Pig + Cassandra = Connection errors

2010-08-18 Thread Christian Decker
Hi all, I'm trying to get Pig scripts to work on data in Cassandra and right now I want to simply run the example-script.pig on a different Keyspace/CF containing ~6'000'000 entries. I got it running but then the job aborts after quite some time, and when I look at the logs I see hundreds of these:

Re: Cassandra and Pig

2010-08-18 Thread Christian Decker
I got one step further by cheating a bit, I just took all the Cassandra Jars and dropped them into the Hadoop lib folder, so at least now I can run some pig scripts over the data in Cassandra, but this is far from optimal since it means I'd have to distribute my UDFs also to the Hadoop cluster, or

"MessageDeserializationTask.java (line 47) dropping message" errors

2010-08-18 Thread Jianing Hu
We have a 3-node cluster of Cassandra 0.6.4, on one of them there's a ton of error messages like the following one: WARN [MESSAGE-DESERIALIZER-POOL:1] 2010-08-18 04:10:57,767 MessageDeserializationTask.java (line 47) dropping message (1ms past timeout) The node's load spikes and is super slow to rea

Re: uncomplete RangeSlices

2010-08-18 Thread Stefan Kaufmann
Sorry - forgot the version: 0.6.4 On Wed, Aug 18, 2010 at 11:52 AM, Stefan Kaufmann wrote: > During my tests, I found some strange bootstrap + get Range Slice behavior. > > In my testing environment I'm creating datasets which look like these: > sampleDPUIDforTestingIssues1/000 > -> valu

uncomplete RangeSlices

2010-08-18 Thread Stefan Kaufmann
During my tests, I found some strange bootstrap + get Range Slice behavior. In my testing environment I'm creating datasets which look like these: sampleDPUIDforTestingIssues1/000 -> value = 123456789 sampleDPUIDforTestingIssues1/099 -> value = 123456789 I create 1 million of tho

Re: Cassandra gem

2010-08-18 Thread Benjamin Black
great, thanks! On Tue, Aug 17, 2010 at 11:30 PM, Mark wrote: >  On 8/17/10 5:44 PM, Benjamin Black wrote: >> >> Updated code is now in my master branch, with the reversion to 10.0.0. >>  Please let me know of further trouble. >> >> >> b >> >> On Tue, Aug 17, 2010 at 8:31 AM, Mark  wrote: >>> >>>

Re: Videos of the cassandra summit starting to be posted

2010-08-18 Thread Fernando Racca
Thanks a lot, Riptano, this is very welcome information! Fernando Racca On 18 August 2010 06:05, samal gorai wrote: > thanks Riptano group for ur support in community education. > > > On Tue, Aug 17, 2010 at 11:15 PM, Jeremy Hanna > wrote: > >> The videos of the cassandra summit are starting