Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-22 Thread aaron morton
For plain old log analysis, the Cloudera Hadoop distribution may be a better match. Flume is designed to help with streaming data into HDFS, the LZO compression extensions would help with the data size, and Pig would make the analysis easier (IMHO). http://www.cloudera.com/hadoop/ http://www.clou

Re: Reading a keyrange when using RP

2010-10-22 Thread Oleg Anastasyev
> > The goal is actually getting the rows in the range of "start","end". The order is not important at all. But what I can see is, this does not seem to be possible at all using RP. Am I wrong? A simpler solution is to just compare the MD5 of both keys and set start to the one with the lesser MD5 and end to the key with

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-22 Thread Takayuki Tsunakawa
Hello, Aaron, Thank you for so much info (especially the pointers, which seem interesting). > So you would not have 1,000 tasks sent to each of the 1,000 cassandra nodes. Yes, I meant one map task would be sent to each task tracker, resulting in 1,000 concurrent map tasks in the cluster. ColumnFamilyInpu

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-22 Thread Aaron Morton
I may be wrong about which nodes the task is sent to. Others here know more about Hadoop integration. Aaron On 22 Oct 2010, at 21:30, Takayuki Tsunakawa wrote: > Hello, Aaron, > > Thank you for so much info (especially the pointers, which seem interesting). > > > So you would not have 1,000 t

NPE in cassandra0.7 (from trunk) while bootstrap

2010-10-22 Thread ruslan usifov
I am trying out Cassandra 0.7 (I built it from trunk) and it looks better than the 0.6 branch, but when I try to add a new node with auto_bootstrap: true I get an NPE (192.168.0.37 is the initial node with data on it, 192.168.0.220 is the bootstrapped node): DEBUG 14:00:58,931 Checking to see if compaction of Schema wou

KeyRange over Long keys

2010-10-22 Thread Christian Decker
Ever since I started implementing my second-level caches, I've been wondering how to deal with this, and thus far I've not found a good solution. I have a CF acting as a secondary index, and I want to make range queries against it. Since my keys are Long I simply went ahead and wrote them as the

Re: KeyRange over Long keys

2010-10-22 Thread Eric Czech
Prepend zeros to every number out to a fixed length determined by the maximum possible value. As an example, 0055 < 0100 in a lexical ordering where the maximum value is . On Fri, Oct 22, 2010 at 5:05 AM, Christian Decker < decker.christ...@gmail.com> wrote: > Ever since I started implementi
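To illustrate the zero-padding suggestion in the message above, here is a minimal Python sketch. The helper name padded_key and the maximum of 9999 are assumptions made for the example only; the maximum value in the original message did not survive in the archive.

```python
def padded_key(value: int, max_value: int) -> str:
    """Left-pad a non-negative integer with zeros to the width of max_value.

    Hypothetical helper for illustration; not part of any Cassandra client.
    """
    if not 0 <= value <= max_value:
        raise ValueError("value must be between 0 and max_value")
    width = len(str(max_value))
    return str(value).zfill(width)


# Assumed maximum of 9999, so every key is four characters wide.
keys = [padded_key(n, 9999) for n in (100, 55, 7, 9999)]
sorted_keys = sorted(keys)  # plain string (lexical) sort
```

Without padding, a lexical sort places "100" before "55"; with fixed-width keys the string order and the numeric order agree. This only covers non-negative values whose upper bound is known in advance.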

Re: Reading a keyrange when using RP

2010-10-22 Thread Jonathan Ellis
That gets you keys whose MD5s are between the MD5s of start and end, which is not the same as the keys between start and end. On Fri, Oct 22, 2010 at 2:07 AM, Oleg Anastasyev wrote: >> >> The goal is actually getting the rows in the range of "start","end". The order > is not important at all. But wh
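The distinction drawn above can be checked with a small Python sketch. The function md5_token is a simplified stand-in for a partitioner token (it just reads the MD5 digest as an unsigned integer), and the key names are made up for the example; neither is taken from the thread.

```python
import hashlib


def md5_token(key: str) -> int:
    """Simplified stand-in for a hash-partitioner token: MD5 digest as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


# Hypothetical keys, chosen only so that their natural order is obvious.
keys = [f"key{i:03d}" for i in range(20)]
start, end = "key005", "key010"

# Keys that lie between start and end in key order.
by_key = {k for k in keys if start <= k <= end}

# Keys whose MD5 lies between the MD5s of start and end.
low, high = sorted((md5_token(start), md5_token(end)))
by_token = {k for k in keys if low <= md5_token(k) <= high}
```

Comparing by_key with by_token shows the two selections answering different questions: the first is a slice of the key space, the second is a slice of the hash space that happens to have start and end as its boundaries.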

Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-22 Thread Jonathan Ellis
On Fri, Oct 22, 2010 at 3:30 AM, Takayuki Tsunakawa wrote: > Yes, I meant one map task would be sent to each task tracker, resulting in > 1,000 concurrent map tasks in the cluster. ColumnFamilyInputFormat cannot > identify the nodes that actually hold some data, so the job tracker will > send the

Re: KeyRange over Long keys

2010-10-22 Thread Stu Hood
> Specifically I'm wondering if I could create a byte representation of the Long > that would also be lexicographically ordered. This is probably what you want to do, combined with the ByteOrderedPartitioner in 0.7 -Original Message- From: "Eric Czech" Sent: Friday, October 22, 2010 7:05
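One possible byte representation of the kind asked about above is sketched here in Python. It assumes that keys are compared byte by byte as unsigned values, and the function names are hypothetical. The idea is big-endian encoding with the sign bit flipped, so that negative values sort before positive ones.

```python
import struct

_OFFSET = 2 ** 63  # shifts the signed 64-bit range onto the unsigned range


def long_to_ordered_bytes(value: int) -> bytes:
    """Encode a signed 64-bit integer so that byte order matches numeric order."""
    if not -_OFFSET <= value < _OFFSET:
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Adding 2**63 is equivalent to flipping the sign bit; ">Q" is big-endian unsigned.
    return struct.pack(">Q", value + _OFFSET)


def ordered_bytes_to_long(data: bytes) -> int:
    """Inverse of long_to_ordered_bytes."""
    return struct.unpack(">Q", data)[0] - _OFFSET


values = [-5, 3, 0, 2 ** 63 - 1, -(2 ** 63), 256, 255]
by_bytes = sorted(values, key=long_to_ordered_bytes)
```

A plain big-endian two's-complement encoding would already order non-negative values correctly; the offset is only needed if negative keys can occur.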

Benchmarking & Testing

2010-10-22 Thread David Replogle
I'm coming to the portion of the Cassandra installation where the customer is looking for benchmarking and testing for purposes of "keeping an eye" on the system to see if we need to add capacity or just to see how the system in general is doing. Basically, warm fuzzies that the system is still

Re: NPE in cassandra0.7 (from trunk) while bootstrap

2010-10-22 Thread Jonathan Ellis
This was a regression from the Thrift 0.5 upgrade. Should be fixed in r1026415. On Fri, Oct 22, 2010 at 5:11 AM, ruslan usifov wrote: > I am trying out Cassandra 0.7 (I built it from trunk) and it looks better > than the 0.6 branch, but when I try to add a new node with auto_bootstrap: true I > get an NP

Re: error: identifier ONE is unqualified!

2010-10-22 Thread J T
Thanks very much, that did the trick :) On Thu, Oct 21, 2010 at 9:28 PM, Aaron Morton wrote: > Look for lib/thrift-rX.jar in the source. X is the svn revision to > use. > > http://wiki.apache.org/cassandra/InstallThrift > > Not sure if all those steps still apply, but it's what I did last

Re: Cassandra crashed - possible JMX threads leak

2010-10-22 Thread Bill Au
Not with the nodeprobe or nodetool command because the JVM these two commands spawn has a very short life span. I am using a webapp to monitor my cassandra cluster. It pretty much uses the same code as the NodeCmd class. For each incoming request, it creates a NodeProbe object and uses it to get

How can build Bond graph?

2010-10-22 Thread ruslan usifov
Hello. Does anybody have a recipe for storing a Bond graph efficiently in Cassandra, for example the relations between users in social networks (friendship)? The simplest thing that comes to mind is the following keyspace. But this has a drawback: if one user has very many friends, and all relations for this o

Re: How can build Bond graph?

2010-10-22 Thread Tyler Hobbs
Unless one user has several hundred million friends, this shouldn't be a problem. - Tyler On Fri, Oct 22, 2010 at 3:00 PM, ruslan usifov wrote: > Hello > > Does anybody have a recipe for storing a Bond graph efficiently in > Cassandra, for example the relations between users in social > networks

Re: Cassandra crashed - possible JMX threads leak

2010-10-22 Thread Jonathan Ellis
Is the fix as simple as calling close() then? Can you submit a patch for that? On Fri, Oct 22, 2010 at 2:49 PM, Bill Au wrote: > Not with the nodeprobe or nodetool command because the JVM these two > commands spawn has a very short life span. > > I am using a webapp to monitor my cassandra clust

DC Cassandra training and Atlanta meetup

2010-10-22 Thread Jonathan Ellis
Riptano is bringing some Cassandra love to the East coast the first week of November. First, on the evening of Nov 3, we're sponsoring a meetup in Atlanta. This is held at the ApacheCon venue but you do _not_ have to be going to ApacheCon to come; it is free to attend! I will be there and several

Hung Repair

2010-10-22 Thread Dan Hendry
I am currently running a 4-node cluster on Cassandra beta 2. Yesterday, I ran into a number of problems and one of my nodes went down for a few hours. I tried to run a nodetool repair and, at least at a data level, everything seems to be consistent and all right. The problem is that the node is st

remove

2010-10-22 Thread Dave Wellman
remove

HintedHandoff and ReplicationFactor with a downed node

2010-10-22 Thread Craig Ching
Hi, I'm testing Cassandra to ensure it fits my needs. One of the tests I want to perform is writing while a node is down. Here's the scenario: Cassandra 0.6.6, 2 nodes, replication factor of 2, hinted handoff on. I load node A with 50,000 rows while B is shut down (BTW, I'm using CL.ONE during the

Re: HintedHandoff and ReplicationFactor with a downed node

2010-10-22 Thread Rob Coli
On 10/22/10 2:55 PM, Craig Ching wrote: > Even better, I'd love a way to not allow B to be available until replication is complete, can I detect that somehow? Proposed and rejected a while back: https://issues.apache.org/jira/browse/CASSANDRA-768 =Rob

Re: HintedHandoff and ReplicationFactor with a downed node

2010-10-22 Thread Dan Washusen
The last time this came up on the list Jonathan Ellis said (something along the lines of) if your application can't tolerate stale data then you should read with a consistency level of QUORUM. It would be nice if there was some sort of middle ground for an application that can tolerate slightly st
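The QUORUM advice above rests on simple arithmetic: a read is guaranteed to overlap the latest write when the number of replicas written plus the number of replicas read exceeds the replication factor. A minimal Python sketch of that rule follows; the function names are hypothetical, and the numbers use the replication factor of 2 from this thread.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a quorum: a strict majority."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int,
                        write_replicas: int,
                        read_replicas: int) -> bool:
    """True when any read set must share at least one replica with any write set."""
    return write_replicas + read_replicas > replication_factor


rf = 2  # replication factor from the scenario in this thread

one_one = read_overlaps_write(rf, 1, 1)                        # write ONE, read ONE
quorum_quorum = read_overlaps_write(rf, quorum(rf), quorum(rf))  # QUORUM both ways
```

With a replication factor of 2, a quorum is 2 replicas, so QUORUM operations need both nodes up. That is worth keeping in mind for a two-node test with one node deliberately shut down.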

Streaming got stuck for a long time

2010-10-22 Thread Henry Luo
When using the nodetool move command, the streaming between nodes got stuck for a long period like the following: Streaming from: /10.100.10.66 Profile: /opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Index.db 0/809960194 Profile: /opt/choicestream/data/cassandra/data/Profi

Re: Streaming got stuck for a long time

2010-10-22 Thread Jonathan Ellis
This is a known bug in early 0.6, fixed in 0.6.5 iirc. But at this point you should upgrade to 0.6.6. On Fri, Oct 22, 2010 at 8:52 PM, Henry Luo wrote: > When using the nodetool move command, the streaming between nodes got stuck for > a long period like the following: > > > > Streaming from: /10.10

Remove

2010-10-22 Thread Gmail
Remove