Re: Passing client as parameter

2010-06-10 Thread Ran Tavory
You can look at http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/CassandraClientFactory.java so, to close the client you can just get the transport out of the client (bold): private void closeClient(CassandraClient cclient) { log.debug("Closing clie

Re: Range search on keys not working?

2010-06-10 Thread David Boxenhorn
My experience is the same as Philip's. My point was simply that there is no way to get a range more restrictive than "all" if you use random partitioning. 2010/6/9 Philip Stanhope > If you are using random partitioner, and you want to do an EXPENSIVE row > scan ... I found that I could iterate u

Granularity SSTables.

2010-06-10 Thread xavier manach
Hi. I am trying to understand tricks that I can use with SSTables for faster manipulation of data in clusters. I learned how to copy a keyspace from the data directories to a new node and change the replication factor (thx Jonathan). If I understood correctly, each SSTable has 3 files: ColumnFamily-ID-Data.db

single node capacity

2010-06-10 Thread hive13 Wong
Hi, How much data load can a single typical Cassandra instance handle? It seems we get into trouble when the load on one of our nodes grows beyond 200 GB. Both read latency and write latency are increasing, varying from 10 to several thousand milliseconds. Machine config is 16*cpu 32G

RE: single node capacity

2010-06-10 Thread Dr . Martin Grabmüller
Your problem is probably not the amount of data you store, but the number of SSTable files. When these increase, read latency goes up. Write latency maybe goes up because of compaction. Check in the data directory, whether there are many data files, and check via JMX whether compaction is happe
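As a rough illustration of the data-directory check suggested above, the sketch below counts SSTable data files per column family. It assumes the `<ColumnFamily>-<N>-Data.db` file naming used by Cassandra releases of this period. The function name `count_sstables` is hypothetical, and it only inspects file names, not file contents.

```python
import os
from collections import Counter

def count_sstables(data_dir):
    """Count '*-Data.db' files per column family in one keyspace directory."""
    counts = Counter()
    for name in os.listdir(data_dir):
        if not name.endswith("-Data.db"):
            continue  # skip -Index.db, -Filter.db and unrelated files
        # Assumed naming: <ColumnFamily>-<generation>-Data.db
        column_family = name.rsplit("-", 2)[0]
        counts[column_family] += 1
    return dict(counts)
```

Run against a keyspace directory (the actual path depends on the node's configuration), this returns a mapping from column family name to data file count. A count that keeps growing over time would be consistent with compaction falling behind the write load.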

Re: single node capacity

2010-06-10 Thread hive13 Wong
You are right, our write traffic indeed is pretty tense as we are now at the stage of initializing data. Then we do need some more nodes here. Thanks very much Martin. On Thu, Jun 10, 2010 at 9:04 PM, Dr. Martin Grabmüller < martin.grabmuel...@eleven.de> wrote: > Your problem is probably not th

Best way of adding new nodes

2010-06-10 Thread hive13 Wong
Hi, guys The 2 ways of adding new nodes, when add with bootstrapping, since we've already got lots of data, often it will take many hours to complete the bootstrapping and probably affect the performance of existing nodes. But if we add without bootstrapping, the data load on the new node could be

keyrange for get_range_slices

2010-06-10 Thread Dop Sun
Hi, As documented at http://wiki.apache.org/cassandra/API, both ends of the key range for get_range_slices are inclusive. As discussed in this thread: http://groups.google.com/group/jassandra-user/browse_thread/thread/c2e56453cde067d3, there is a case where a user wants to discover all keys (huge

Re: keyrange for get_range_slices

2010-06-10 Thread Philip Stanhope
No ... and I personally don't have a problem with this if you think about what is actually going on under the covers. Note, however, that this is an expensive operation and as a result if there are parallel updates to the indexes while you are performing a full keyscan (rowscan) you will potent

Re: Quick help on Cassandra please: cluster access and performance

2010-06-10 Thread li wei
Thank you very much, Per! - Original Message From: Per Olesen To: "user@cassandra.apache.org" Sent: Wed, June 9, 2010 4:02:52 PM Subject: Re: Quick help on Cassandra please: cluster access and performance On Jun 9, 2010, at 9:47 PM, li wei wrote: > Thanks a lot. > We are set READ

File Descriptor leak

2010-06-10 Thread Matt Conway
Hi All, I'm running a small 4-node cluster with minimal load using the 2010-06-08_12-31-16 build from trunk, and it's exhausting file descriptors pretty quickly (65K in less than an hour). Here's a list of the files I see it leaking; I can do a more specific query if you'd like. Am I doing somet

Running Cassandra as a Windows Service

2010-06-10 Thread Kochheiser,Todd W - TO-DITT1
For various reasons I am required to deploy systems on Windows. As such, I went looking for information on running Cassandra as a Windows service. I've read some of the user threads regarding running Cassandra as a Windows service, such as this one: http://www.mail-archive.com/user@ca

Re: Running Cassandra as a Windows Service

2010-06-10 Thread Gary Dusbabek
IMO this is one of those things that would bitrot fairly quickly if it were not maintained. It may be useful in contrib, where curious parties could pick it up, get it back in shape, and send in the changes to be committed. Judging by the sparse interest so far, this probably wouldn't be a good f

Re: Running Cassandra as a Windows Service

2010-06-10 Thread Ben Standefer
"For various reasons I am required to deploy systems on Windows." I don't think it would be difficult to argue the business case for running Cassandra on Linux. It's still a young project and everybody in IRC and the mailing list is running it on Linux. You should really re-think whatever factor

read operation is slow

2010-06-10 Thread Caribbean410
Hello, I am testing the performance of Cassandra. We write 200k records to the database, each 1k in size, and then read those 200k records back. The read takes more than 400s to finish, which is much slower than MySQL (around 20s). I read some discussion online and someone suggested to make multip

Re: Best way of adding new nodes

2010-06-10 Thread Jonathan Ellis
It's not just a matter of being balanced, if you add new nodes without bootstrapping the others will think it has data on it, that hasn't actually been moved there. On Thu, Jun 10, 2010 at 6:50 AM, hive13 Wong wrote: > Hi, guys > The 2 ways of adding new nodes, when add with bootstrapping, since

Re: cassandra out of heap space crash

2010-06-10 Thread Ran Tavory
I can't say exactly how much memory is the correct amount, but surely 1G is very little. By replicating 3 times your cluster now makes 3 times more work than it used to do, both on reads and on writes while the readers/writers continue hammering it the same pace. So once you've upped your memory (

Re: scans stopped returning values for some keys

2010-06-10 Thread Jonathan Ellis
How is your CF defined? (what comparator?) did you try start=empty byte array instead of Long.MAX_VALUE? On Wed, Jun 9, 2010 at 8:06 AM, Pawel Dabrowski wrote: > Hi, > > I'm using Cassandra to store some aggregated data in a structure like this: > > KEY - product_id > SUPER COLUMN NAME - timest

cassandra out of heap space crash

2010-06-10 Thread Julie
I am running an 8 node cassandra cluster with each node on its own dedicated VM. My app very quickly populates the database with about 100,000 rows of data (each row is about 100K bytes) times the number of nodes in my cluster so there's about 100,000 rows of data on each node (seems very evenly d

RE: Running Cassandra as a Windows Service

2010-06-10 Thread Kochheiser,Todd W - TO-DITT1
I agree that bitrot might happen if all of the core Cassandra developers are using Linux. Your suggestion of putting things in a contrib area where curious (or desperate) parties suffering on the Windows platform could pick it up seems like a reasonable place to start. It might also be an op

RE: keyrange for get_range_slices

2010-06-10 Thread Dop Sun
Thanks for your quick and detailed explanation of the key scan. This is really helpful! Dop From: Philip Stanhope [mailto:pstanh...@wimba.com] Sent: Thursday, June 10, 2010 10:40 PM To: user@cassandra.apache.org Subject: Re: keyrange for get_range_slices No ... and I personally don't have

Re: Range Slices timing question

2010-06-10 Thread Jonathan Ellis
get_range_slices is faster in 0.7 but there's not much you can do in 0.6. On Wed, Jun 9, 2010 at 11:04 AM, Carlos Sanchez wrote: > I have about a million rows (each row with 100 cols) of the form > domain/!date/!id  (e.g. gwm.com/!20100430/!CFRA4500) So I am interested in > getting all the ids

Re: Granularity SSTables.

2010-06-10 Thread Jonathan Ellis
Only if your clusters have the same number of nodes, with the same tokens. Trying to get too clever is not usually advisable. On Thu, Jun 10, 2010 at 3:54 AM, xavier manach wrote: > Hi. > >  I try to understand tricks that I can use with the SSTables, for > faster manipulation of datas in cluste

Re: File Descriptor leak

2010-06-10 Thread Jonathan Ellis
Fixed in https://issues.apache.org/jira/browse/CASSANDRA-1178 On Thu, Jun 10, 2010 at 9:01 AM, Matt Conway wrote: > Hi All, > I'm running a small 4-node cluster with minimal load using > the 2010-06-08_12-31-16 build from trunk, and its exhausting file > descriptors pretty quickly (65K in less th

RE: Range Slices timing question

2010-06-10 Thread Carlos Sanchez
Thx a lot -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Thursday, June 10, 2010 4:28 PM To: user@cassandra.apache.org Subject: Re: Range Slices timing question get_range_slices is faster in 0.7 but there's not much you can do in 0.6. On Wed, Jun 9, 2010 at 11:0

Cassandra Write Performance, CPU usage

2010-06-10 Thread Rishi Bhardwaj
Hi I am investigating Cassandra write performance and see very heavy CPU usage from Cassandra. I have a single node Cassandra instance running on a dual core (2.66 Ghz Intel ) Ubuntu 9.10 server. The writes to Cassandra are being generated from the same server using BatchMutate(). The client ma

Re: Cassandra Write Performance, CPU usage

2010-06-10 Thread vd
Hi Rishi, Writes in Cassandra are not written directly to disk; they are stored in memory and flushed to disk later. Maybe that's why you are not getting much out of iostat. Can't say about the high CPU usage. ___ Vineet Daniel _

Re: Cassandra Write Performance, CPU usage

2010-06-10 Thread Jonathan Shook
You are testing Cassandra in a way that it was not designed to be used. Bandwidth to disk is not a meaningful example for nearly anything except for filesystem benchmarking and things very nearly the same as filesystem benchmarking. Unless the usage patterns of your application match your test data

Re: Cassandra Write Performance, CPU usage

2010-06-10 Thread Rishi Bhardwaj
Hi Jonathan Thanks for such an informative reply. My application may end up doing such continuous bulk writes to Cassandra and thus I was interested in such a performance case. I was wondering as to what are all the CPU overheads for each row/column written to Cassandra? You mentioned updating

Re: Cassandra Write Performance, CPU usage

2010-06-10 Thread Jonathan Shook
Rishi, I am not yet knowledgeable enough to answer your question in more detail. I would like to know more about the specifics as well. There are counters you can use via JMX to show logical events, but this will not always translate to good baseline information that you can use in scaling estimat

Re: keyrange for get_range_slices

2010-06-10 Thread Shuai Yuan
Hi, Since you're iterating over the whole set several records at a time, your code should know when it's the first time. Why not simply:
    if (!_first_time) {
        _iter++;  // skip the first record
    } else {
        _first_time = false;
    }
Kevin Yuan, Supertool Corp. www.yuan-shuai.info On 2010年06月10日 22:0
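The paging idea above can be sketched as follows. Because both ends of a get_range_slices key range are inclusive, each page after the first begins with the last key of the previous page, and that key has to be dropped. The `fetch_range` function here is a hypothetical in-memory stand-in that returns keys in sorted order (as an order-preserving partitioner would); it is not the Thrift API itself.

```python
from bisect import bisect_left

def fetch_range(sorted_keys, start, count):
    """Hypothetical stand-in: up to `count` keys >= start (start inclusive)."""
    i = bisect_left(sorted_keys, start)
    return sorted_keys[i:i + count]

def scan_all_keys(sorted_keys, page_size):
    """Yield every key once, paging with an inclusive start key."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start = ""            # empty start key means 'from the beginning'
    first_page = True
    while True:
        page = fetch_range(sorted_keys, start, page_size)
        # On later pages the first key repeats the previous page's last key.
        if not first_page and page and page[0] == start:
            page = page[1:]
        if not page:
            return
        yield from page
        start = page[-1]  # next page starts (inclusively) at the last key seen
        first_page = False
```

The page size must be at least 2 here: with a page size of 1, every page after the first would contain only the repeated key and the scan could never advance.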