Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Patrik Modesto
On Wed, Jan 26, 2011 at 08:58, Mck wrote: >> You are correct that microseconds would be better but for the test it >> doesn't matter that much. > > Have you tried? I'm very new to cassandra as well, and always uncertain > as to what to expect... IMHO it's a matter of use-case. In my use-case there

Re: client threads locked up - JIRA ISSUE 1594

2011-01-26 Thread Arijit Mukherjee
I'm using the jars packed in Hector 0.6.0-19 (the one compatible with Cassandra 0.6.*). I wanted to use hector, but for some reason I haven't been able to do so yet. What I'm doing is a POC kind of thing, and only if it works out properly, we'll go on to build on it. The reason I asked this questi

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Mck
On Wed, 2011-01-26 at 12:13 +0100, Patrik Modesto wrote: > BTW how to get current time in microseconds in Java? I'm using HFactory.clock() (from hector). > > As far as moving the clone(..) into ColumnFamilyRecordWriter.write(..) > > won't this hurt performance? > > The size of the queue is comp
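On the question of getting the current time in microseconds in Java, here is a minimal sketch of two common approaches. It does not show what HFactory.clock() does internally; the class and method names below are made up for illustration. The first variant only scales milliseconds, so it has millisecond resolution. The second uses java.time, which requires Java 8 or later and so postdates this thread.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class MicrosClock {
    // Millisecond clock scaled to microseconds; the last three digits are always 000.
    static long microsFromMillis() {
        return System.currentTimeMillis() * 1000L;
    }

    // Java 8+: microseconds since the Unix epoch, as fine-grained as the platform clock allows.
    static long microsFromInstant() {
        return ChronoUnit.MICROS.between(Instant.EPOCH, Instant.now());
    }

    public static void main(String[] args) {
        System.out.println(microsFromMillis());
        System.out.println(microsFromInstant());
    }
}
```

Either value has the same scale as the column timestamps printed by the CLI elsewhere in this archive (for example 1296071732281000).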

Re: Files not deleted after compaction and GCed

2011-01-26 Thread Ching-Cheng Chen
It's a bug. In SSTableDeletingReference, it tries this operation components.remove(Component.DATA); before STable.delete(desc, components); However, components is a reference to the components object which was created inside SSTable by this.components = Collections.unmodifiableSet(dataCompon
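The failure mode described here can be reproduced outside Cassandra. The sketch below is not the Cassandra code: the class name and the string component names are stand-ins. It only shows that remove() on a Collections.unmodifiableSet view throws UnsupportedOperationException, and that working on a mutable copy avoids it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class UnmodifiableSetDemo {
    // Stand-in for a component set published as a read-only view.
    static Set<String> readOnlyComponents() {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList("DATA", "INDEX")));
    }

    // Returns true if remove() is rejected by the read-only view.
    static boolean removeFails(Set<String> view) {
        try {
            view.remove("DATA");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // One way around it: copy into a mutable set before removing.
    static Set<String> withoutData(Set<String> view) {
        Set<String> copy = new HashSet<>(view);
        copy.remove("DATA");
        return copy;
    }

    public static void main(String[] args) {
        Set<String> view = readOnlyComponents();
        System.out.println(removeFails(view));
        System.out.println(withoutData(view));
    }
}
```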

Re: the java client problem

2011-01-26 Thread Ashish
click on the loadSchema() button in the right panel :) 2011/1/26 Raoyixuan (Shandy) > I have found loadSchemaFromYAML in jconsole. How do I load the schema? > > > > *From:* Ashish [mailto:paliwalash...@gmail.com] > *Sent:* Friday, January 21, 2011 8:10 PM > *To:* user@cassandra.apache.org > *Subj

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Jonathan Ellis
On Tue, Jan 25, 2011 at 12:09 PM, Mick Semb Wever wrote: > Well your key is a mutable Text object, so i can see some possibility > depending on how hadoop uses these objects. Yes, that's it exactly. We recently fixed a bug in the demo word_count program for this. Now we do ByteBuffer.wrap(Arrays
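The hazard with a reused, mutable key can be sketched without Hadoop. The message above is cut off after "ByteBuffer.wrap(Arrays", so the use of Arrays.copyOf below is an assumption about the general technique (a defensive copy), not a quote of the actual fix. The class, the method names, and the byte array standing in for a reused key are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BufferReuseDemo {
    // Queues two keys that share one backing array, with or without a defensive copy.
    static List<ByteBuffer> enqueue(boolean copy) {
        byte[] reused = new byte[3]; // stands in for the backing array of a reused key object
        List<ByteBuffer> queue = new ArrayList<>();
        for (String word : new String[] {"foo", "bar"}) {
            byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
            System.arraycopy(bytes, 0, reused, 0, bytes.length); // the caller overwrites the key in place
            queue.add(copy
                    ? ByteBuffer.wrap(Arrays.copyOf(reused, bytes.length)) // private snapshot
                    : ByteBuffer.wrap(reused, 0, bytes.length));           // aliases the shared array
        }
        return queue;
    }

    // Decodes a buffer without disturbing its position.
    static String text(ByteBuffer buffer) {
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }

    public static void main(String[] args) {
        System.out.println(text(enqueue(false).get(0))); // prints "bar": the first entry was overwritten
        System.out.println(text(enqueue(true).get(0)));  // prints "foo"
    }
}
```

The copy costs one allocation per write, which is the performance concern raised elsewhere in this thread.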

Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem
I was moving a node and at some point it started streaming data to 2 other nodes. Later, that node keeled over and let's assume I can't fix it for the next 3 days and just want to move tokens on the remaining three to even out and see if I can live with it. But I can't do that! The node that was

Re: Files not deleted after compaction and GCed

2011-01-26 Thread Jonathan Ellis
Thanks for tracking that down! Created https://issues.apache.org/jira/browse/CASSANDRA-2059 to fix. On Wed, Jan 26, 2011 at 8:17 AM, Ching-Cheng Chen wrote: > It's a bug. > In SSTableDeletingReference, it tries this operation > components.remove(Component.DATA); > before > STable.delete(desc, comp

Re: Files not deleted after compaction and GCed

2011-01-26 Thread Jonathan Ellis
Patch submitted. One thing I still don't understand is why RetryingScheduledThreadPoolExecutor isn't firing the DefaultUncaughtExceptionHandler, which should have logged that exception. On Wed, Jan 26, 2011 at 9:41 AM, Jonathan Ellis wrote: > Thanks for tracking that down!  Created > https://iss

Re: Files not deleted after compaction and GCed

2011-01-26 Thread Ching-Cheng Chen
I think this might be what is happening. Since you are using ScheduledThreadPoolExecutor.schedule(), the exception was swallowed by the FutureTask. You will have to call get() on the ScheduledFuture, and you will get an ExecutionException if any exception occurred in run(). Regard
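A minimal sketch of the behaviour described above, using only java.util.concurrent. The class name and the failing task are made up for illustration, and this is not the RetryingScheduledThreadPoolExecutor code. The task's exception is captured by the future and only surfaces, wrapped in an ExecutionException, when get() is called.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SwallowedExceptionDemo {
    // Schedules a task that always fails and reports what get() reveals.
    static String runAndInspect() {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        try {
            Runnable failing = () -> { throw new IllegalStateException("boom"); };
            ScheduledFuture<?> future = executor.schedule(failing, 10, TimeUnit.MILLISECONDS);
            try {
                future.get(); // without this call the failure stays hidden inside the future
                return "no exception";
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                return cause.getClass().getSimpleName() + ": " + cause.getMessage();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAndInspect()); // prints "IllegalStateException: boom"
    }
}
```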

Re: Stress test inconsistencies

2011-01-26 Thread Oleg Proudnikov
Hi All, I was able to run contrib/stress at a very impressive throughput. Single threaded client was able to pump 2,000 inserts per second with 0.4 ms latency. Multithreaded client was able to pump 7,000 inserts per second with 7ms latency. Thank you very much for your help! Oleg

Re: Stress test inconsistencies

2011-01-26 Thread Jonathan Shook
Would you share with us the changes you made, or problems you found? On Wed, Jan 26, 2011 at 10:41 AM, Oleg Proudnikov wrote: > Hi All, > > I was able to run contrib/stress at a very impressive throughput. Single > threaded client was able to pump 2,000 inserts per second with 0.4 ms latency. > M

RE: Repair on single CF not working (0.7)

2011-01-26 Thread Dan Hendry
After some bad experiences in the past using non-release versions, I am a little hesitant. Which nodes would the new code have to be deployed to in order to test? If it is just one of the three, I might be willing if I need to repair again. Dan From: Brandon Williams [mailto:dri...@gmail.co

Re: Stress test inconsistencies

2011-01-26 Thread Oleg Proudnikov
I returned to periodic commit log fsync. Jonathan Shook writes: > > Would you share with us the changes you made, or problems you found? >

Probelms with Set on Byte type New Installation

2011-01-26 Thread David Quattlebaum
I have set up a new installation of Cassandra, and have it running with no problems (0.7.0) Using CLI I added a new keyspace, and column family. When I set a value for a column I get "Value Inserted" However, when I get the column value it is a number, even though the Column Family is o

My new nemesis: EOFException (0.7.0)

2011-01-26 Thread Dan Hendry
I am having yet another issue on one of my Cassandra nodes. Last night, one of my nodes ran out of memory and crashed after flooding the logs with the same type of errors I am seeing below. After restarting, they are popping up again. My solution has been to drop the consistency from ALL to ONE for

RE: Probelms with Set on Byte type New Installation

2011-01-26 Thread Bill Speirs
I'm very (2 days) new to Cassandra, but what does the output look like? Total shot in the dark, if the number is less than 256 would it not look the same as bytes or a number? Hope that in some way helps... Bill- From: David Quattlebaum [mailto:dquat...@medprocure.com] Sent: Wednesday, January

RE: Probelms with Set on Byte type New Installation

2011-01-26 Thread David Quattlebaum
Nope, I should be getting back the String values that were inserted: [default@TestKeyspace] get custparent['David']; => (column=4164647265737331, value=333038204279205061737320313233, timestamp=1296071732281000) => (column=43697479, value=53656e656361, timestamp=129607174731) => (column=4e616d

Schema Design

2011-01-26 Thread Bill Speirs
I'm looking to use Cassandra to store log messages from various systems. A log message only has a message (UTF8Type) and a date/time. My thought is to create a column family for each system. The row key will be a TimeUUIDType. Each row will have 7 columns: year, month, day, hour, minute, second, an

Re: Repair on single CF not working (0.7)

2011-01-26 Thread Brandon Williams
On Wed, Jan 26, 2011 at 12:22 PM, Dan Hendry wrote: > After some bad experiences in the past using non-release versions, I am a > little hesitant. > Then just apply the patch from https://issues.apache.org/jira/browse/CASSANDRA-1992 to 0.7.0 > Which nodes would the new code have to be deployed t

Re: Probelms with Set on Byte type New Installation

2011-01-26 Thread Bill Speirs
Why would you expect strings? You stated that your comparator is BytesType. If you set the default_validation_class then you can specify what types the values should be returned as: [default@Devel] create column family david with comparator=BytesType and default_validation_class=UTF8Type; 2dabf0fb
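The hexadecimal output shown earlier in this thread is the raw bytes of the inserted strings, which is what BytesType displays. As a small illustration, the sketch below decodes that hex as UTF-8 on the client side. The class and method names are hypothetical and this is not how the CLI does it.

```java
import java.nio.charset.StandardCharsets;

public class HexToText {
    // Converts a hex string such as "43697479" into the UTF-8 text it encodes.
    static String decode(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("43697479"));                       // prints "City"
        System.out.println(decode("53656e656361"));                   // prints "Seneca"
        System.out.println(decode("4164647265737331"));               // prints "Address1"
        System.out.println(decode("333038204279205061737320313233")); // prints "308 By Pass 123"
    }
}
```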

Re: Schema Design

2011-01-26 Thread David McNelis
I would say in that case you might want to try a single column family where the key to the column is the system name. Then, you could name your columns as the timestamp. Then when retrieving information from the data store you can, in your slice request, specify your start column as X and

Re: Schema Design

2011-01-26 Thread buddhasystem
Having separate columns for Year, Month etc seems redundant. It's tons more efficient to keep say UTC time in POSIX format (basically integer). It's easy to convert back and forth. If you want to get a range of dates, in that case you might use Order Preserving Partitioner, and sort out which sys

Re: Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem
Bump. I still don't know what the best thing to do is; please help. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Node-going-down-when-streaming-data-what-next-tp5962944p5964231.html Sent from the cassandra-u...@incubator.apache.org mailing list

Re: Schema Design

2011-01-26 Thread Bill Speirs
I like this approach, but I have 2 questions: 1) what are the implications of continually adding columns to a single row? I'm unsure how Cassandra is able to grow. I realize you can have a virtually infinite number of columns, but what are the implications of growing the number of columns over time

RE: Probelms with Set on Byte type New Installation

2011-01-26 Thread David Quattlebaum
Bill, You are absolutely correct, I must not have set the default_validation_class when I added the column family. That's what I get for continuing to work late into the night! Thanks, DQ < Less stupid next time -Original Message- From: Bill Speirs [mailto:bill.spe...@gmail.com]

Re: Schema Design

2011-01-26 Thread Bill Speirs
I have a basic understanding of OPP... if most of my messages come within a single hour then a few nodes could be storing all of my values, right? You totally lost me on, "whether to shard data as per system..." Is my schema (one column family per system, and row keys as TimeUUIDType) sharding by

Re: Probelms with Set on Byte type New Installation

2011-01-26 Thread Bill Speirs
No worries... it forced me to set up an env to test my understanding. I'm still trying to learn/understand. Bill- On Wed, Jan 26, 2011 at 4:23 PM, David Quattlebaum wrote: > Bill, > > You are absolutely correct, I must not have set the default_validation_class > when I added the column family. >

Re: Schema Design

2011-01-26 Thread David McNelis
My cli knowledge sucks so far, so I'll leave that to others. I'm doing most of my reading/writing through a thrift client (hector/java based). As for the implications, as of the latest version of Cassandra there is no theoretical limit to the number of columns that a particular row can hold. O

Re: Schema Design

2011-01-26 Thread Nick Santini
One thing you can do is create one CF, then as the row key use the application name + timestamp; with that you can do your range query using OPP. Then store whatever you want in the row. The problem would be if one app generates far more logs than the others. Nicolas Santini On Thu, Jan 27, 2011 at 1

Re: Schema Design

2011-01-26 Thread buddhasystem
I used the term "sharding" a bit frivolously. Sorry. It's just that splitting semantically homogeneous data among CFs doesn't scale too well, as each CF is allocated a piece of memory on the server. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Sche

RE: Schema Design

2011-01-26 Thread Shu Zhang
Each row can have a maximum of 2 billion columns, which a logging system will probably hit eventually. More importantly, you'll only have 1 row per set of system logs. Every row is stored on the same machine(s), which means you'll definitely not be able to distribute your load very well. __

Re: Node going down when streaming data, what next?

2011-01-26 Thread Dan Hendry
When this has happened to me, restarting the node you are trying to move works. I can't remember the exact conditions but I have also had to restart all nodes in the cluster simultaneously once or twice as well. I would love to know if there is a better way of doing it. On Wednesday, January 26,

Re: Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem
Hello, from what I know, you don't really have to restart "simultaneously", although of course you don't want to wait. I finally decided to use the "removetoken" command to actually scratch out the sickly node from the cluster. I'll bootstrap it later when it's fixed. -- View this message in cont

Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem
Sorry if this sounds silly, but I can't get my brain around this one: if all nodes contain replicas, why does the cluster stream data every time I move or remove a token? If the data is already there, what needs to be streamed? Thanks Maxim -- View this message in context: http://cassandra-use

Re: Schema Design

2011-01-26 Thread William R Speirs
It makes sense that the single row for a system (with a growing number of columns) will reside on a single machine. With that in mind, here is my updated schema: - A single column family for all the messages. The row keys will be the TimeUUID of the message with the following columns: date/tim

Re: Schema Design

2011-01-26 Thread buddhasystem
Bill, it's all explained here: http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size,the Watch the number of CFs and the memtable sizes. In my experience, this all matters. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Schema-Des

RE: Why does cassandra stream data when moving tokens?

2011-01-26 Thread Sadasivam, Srinivasan
Not all nodes have replicas. First of all, the number of replicas is determined by replication_factor which you set when creating the keyspace - and where they go is determined by replica_placement_strategy. Say, if you pick 3, the first copy of your item is placed based on the token value of the n

Re: Schema Design

2011-01-26 Thread William R Speirs
Ah, sweet... thanks for the link! Bill- On 01/26/2011 08:20 PM, buddhasystem wrote: Bill, it's all explained here: http://wiki.apache.org/cassandra/MemtableThresholds#JVM_Heap_Size,the Watch the number of CFs and the memtable sizes. In my experience, this all matters.

RE: Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem
Thanks, I'll look at the configuration again. In the meantime, I can't "move" the first node in the ring (after I removed the previous node's token) -- it throws an exception and says data is being streamed to it -- however, this is not what netstats says! Weirdness continues... Maxim -- View

[no subject]

2011-01-26 Thread Geoffry Roberts
-- Geoffry Roberts

RE: Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem
The removetoken command just never returns. There is nothing streaming in the cluster. Does anyone know what might be happening? nodetool ring returns different results on two nodes compared to the third one (which is the first in the ring). Weirdness started when I did move 0 on the no-defunct node whi

Re: Schema Design

2011-01-26 Thread Wangpei (Peter)
I am also working on a system that stores logs from hundreds of systems. In my scenario, most queries will look like this: "let's look at login logs (category EQ) of that proxy (host EQ) between this Monday and Wednesday (time range)." My data model looks like this: . only 1 CF. that's enough for this scenario. . group l

Re: the java client problem

2011-01-26 Thread Ashish
I have no clue about this error. Look into the log files; they might reveal something. Can anyone else help here? 2011/1/27 Raoyixuan (Shandy) > It shows an error; I put it in the attachment > > > > *From:* Ashish [mailto:paliwalash...@gmail.com] > *Sent:* Wednesday, January 26, 2011 10:31 PM > > *T

repair cause large number of SSTABLEs

2011-01-26 Thread B. Todd Burruss
i ran out of file handles on the "repairing node" after doing nodetool repair - strange as i have never had this issue until using 0.7.0 (but i should say that i have not truly tested 0.7.0 until now.) up'ed the number of file handles, removed data, restarted nodes, then restarted my test. wa

RE: the java client problem

2011-01-26 Thread Raoyixuan (Shandy)
I have solved this problem. I created the column family first, and then it worked. From: Ashish [mailto:paliwalash...@gmail.com] Sent: Thursday, January 27, 2011 1:16 PM To: user@cassandra.apache.org Subject: Re: the java client problem I have no clue about this error.. look into log files. They migh

Generating tokens for Cassandra cluster with ByteOrderedPartitioner

2011-01-26 Thread Matthew Tovbin
Hey, Can anyone suggest how to manually generate tokens for a Cassandra 0.7.0 cluster when ByteOrderedPartitioner is being used? Thanks in advance. -- Best regards, Matthew Tovbin.

Using Cassandra for storing large objects

2011-01-26 Thread Narendra Sharma
Anyone using Cassandra for storing large number (millions) of large (mostly immutable) objects (200KB-5MB size each)? I would like to understand the experience in general considering that Cassandra is not considered a good fit for large objects. https://issues.apache.org/jira/browse/CASSANDRA-265