You can look at
http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/CassandraClientFactory.java
so, to close the client you can just get the transport out of the client:
private void closeClient(CassandraClient cclient) {
log.debug("Closing clie
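The snippet above is cut off, but the pattern it describes is simple: pull the underlying transport out of the client and close it if it is still open. A minimal self-contained sketch of that pattern — `Transport` and `FakeTransport` here are stand-ins for Thrift's `TTransport`, not Hector's actual classes:

```java
// Sketch of the close-the-transport pattern from the message above.
// Transport is a minimal stand-in for Thrift's TTransport (isOpen()/close());
// the real Hector code pulls the transport out of the wrapped Cassandra.Client.
interface Transport {
    boolean isOpen();
    void close();
}

final class FakeTransport implements Transport {
    private boolean open = true;
    public boolean isOpen() { return open; }
    public void close() { open = false; }
}

public class ClientCloser {
    // Close the client by closing its underlying transport,
    // guarding against double-close.
    static void closeClient(Transport transport) {
        if (transport.isOpen()) {
            transport.close();
        }
    }

    public static void main(String[] args) {
        FakeTransport t = new FakeTransport();
        closeClient(t);
        System.out.println(t.isOpen() ? "still open" : "closed");
    }
}
```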
My experience is the same as Philip's. My point was simply that there is no
way to get a range more restrictive than "all" if you use random
partitioning.
2010/6/9 Philip Stanhope
> If you are using random partitioner, and you want to do an EXPENSIVE row
> scan ... I found that I could iterate u
Hi.
I'm trying to understand tricks I can use with SSTables for
faster manipulation of data in clusters.
I learned how to copy a keyspace from the data directories to a new node and
change the replication factor (thx Jonathan).
If I understood correctly, each SSTable has 3 files:
ColumnFamily-ID-Data.db
Hi,
How much data load can a single typical cassandra instance handle?
It seems like we are getting into trouble when one of our nodes' load grows
beyond 200 GB. Both read latency and write latency are increasing,
varying from 10 to several thousand milliseconds.
Machine config: 16 cores, 32 GB RAM.
Your problem is probably not the amount of data you store, but the number of
SSTable files. When these increase, read latency goes up. Write latency may also
go up because of compaction. Check in the data directory whether there are many
data files, and check via JMX whether compaction is happe
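The data-directory check suggested above can be scripted; this sketch counts SSTable data files among a directory listing, assuming the `*-Data.db` naming mentioned earlier in this digest:

```java
import java.io.File;

public class SSTableCount {
    // Count SSTable data files among a directory listing; many per
    // column family usually means compaction is falling behind.
    static int countDataFiles(String[] fileNames) {
        if (fileNames == null) return 0;
        int n = 0;
        for (String name : fileNames) {
            if (name.endsWith("-Data.db")) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        File dataDir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(countDataFiles(dataDir.list()) + " data files in " + dataDir);
    }
}
```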
You are right, our write traffic is indeed pretty intense as we are now at the
stage of loading initial data.
Then we do need some more nodes here.
Thanks very much Martin.
On Thu, Jun 10, 2010 at 9:04 PM, Dr. Martin Grabmüller <
martin.grabmuel...@eleven.de> wrote:
> Your problem is probably not th
Hi, guys
There are 2 ways of adding new nodes. When we add with bootstrapping, since we've
already got lots of data, it will often take many hours to complete the
bootstrapping and will probably affect the performance of existing nodes. But if
we add without bootstrapping, the data load on the new node could be
Hi,
As documented in http://wiki.apache.org/cassandra/API, both ends of the key
range for get_range_slices are inclusive.
As discussed in this thread:
http://groups.google.com/group/jassandra-user/browse_thread/thread/c2e56453cde067d3,
there is a case where the user wants to discover all keys (huge
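A small illustration of the inclusive bounds, simulating the key semantics of get_range_slices over a sorted key set (a simulation only, not the real Thrift API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class InclusiveRange {
    // Simulate get_range_slices key semantics over keys in sort order:
    // both start_key and end_key are INCLUSIVE, so both endpoints come back.
    static List<String> rangeSlice(TreeSet<String> keys, String start, String end) {
        return new ArrayList<>(keys.subSet(start, true, end, true));
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(List.of("a", "b", "c", "d"));
        System.out.println(rangeSlice(keys, "a", "c")); // "a" AND "c" are included
    }
}
```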
No ... and I personally don't have a problem with this if you think about what
is actually going on under the covers.
Note, however, that this is an expensive operation, and as a result, if there are
parallel updates to the indexes while you are performing a full keyscan
(rowscan), you will potent
Thank you very much, Per!
- Original Message
From: Per Olesen
To: "user@cassandra.apache.org"
Sent: Wed, June 9, 2010 4:02:52 PM
Subject: Re: Quick help on Cassandra please: cluster access and performance
On Jun 9, 2010, at 9:47 PM, li wei wrote:
> Thanks a lot.
> We are set READ
Hi All,
I'm running a small 4-node cluster with minimal load using
the 2010-06-08_12-31-16 build from trunk, and it's exhausting file
descriptors pretty quickly (65K in less than an hour). Here's a list of the
files I see it leaking, I can do a more specific query if you'd like. Am I
doing somet
For various reasons I am required to deploy systems on Windows. As such, I
went looking for information on running Cassandra as a Windows service. I've
read some of the user threads regarding running Cassandra as a Windows service,
such as this one:
http://www.mail-archive.com/user@ca
IMO this is one of those things that would bitrot fairly quickly if it
were not maintained. It may be useful in contrib, where curious
parties could pick it up, get it back in shape, and send in the
changes to be committed.
Judging by the sparse interest so far, this probably wouldn't be a
good f
"For various reasons I am required to deploy systems on Windows."
I don't think it would be difficult to argue the business case for running
Cassandra on Linux. It's still a young project and everybody in IRC and the
mailing list is running it on Linux. You should really re-think whatever
factor
Hello,
I am testing the performance of Cassandra. We write 200k records to the
database, and each record is 1k in size. Then we read these 200k records back.
It takes more than 400s to finish the reads, which is much slower than
mysql (around 20s). I read some discussion online and someone suggested
to make multip
It's not just a matter of being balanced: if you add new nodes without
bootstrapping, the others will think the new node holds data that hasn't
actually been moved there.
On Thu, Jun 10, 2010 at 6:50 AM, hive13 Wong wrote:
> Hi, guys
> The 2 ways of adding new nodes, when add with bootstrapping, since
I can't say exactly how much memory is the correct amount, but surely 1G is
very little.
By replicating 3 times, your cluster now does 3 times more work than it used
to, on both reads and writes, while the readers/writers keep hammering it
at the same pace.
So once you've upped your memory (
How is your CF defined? (what comparator?)
Did you try start=empty byte array instead of Long.MAX_VALUE?
On Wed, Jun 9, 2010 at 8:06 AM, Pawel Dabrowski wrote:
> Hi,
>
> I'm using Cassandra to store some aggregated data in a structure like this:
>
> KEY - product_id
> SUPER COLUMN NAME - timest
I am running an 8 node cassandra cluster with each node on its own dedicated VM.
My app very quickly populates the database with about 100,000 rows of data
(each row is about 100K bytes) times the number of nodes in my cluster so
there's about 100,000 rows of data on each node (seems very evenly d
I agree that bitrot might happen if all of the core Cassandra developers are
using Linux. Your suggestion of putting things in a contrib area where curious
(or desperate) parties suffering on the Windows platform could pick it up seems
like a reasonable place to start. It might also be an op
Thanks for your quick and detailed explanation of the key scan. This is really
helpful!
Dop
From: Philip Stanhope [mailto:pstanh...@wimba.com]
Sent: Thursday, June 10, 2010 10:40 PM
To: user@cassandra.apache.org
Subject: Re: keyrange for get_range_slices
No ... and I personally don't have
get_range_slices is faster in 0.7 but there's not much you can do in 0.6.
On Wed, Jun 9, 2010 at 11:04 AM, Carlos Sanchez
wrote:
> I have about a million rows (each row with 100 cols) of the form
> domain/!date/!id (e.g. gwm.com/!20100430/!CFRA4500) So I am interested in
> getting all the ids
Only if your clusters have the same number of nodes, with the same tokens.
Trying to get too clever is not usually advisable.
On Thu, Jun 10, 2010 at 3:54 AM, xavier manach wrote:
> Hi.
>
> I try to understand tricks that I can use with the SSTables, for
> faster manipulation of datas in cluste
Fixed in https://issues.apache.org/jira/browse/CASSANDRA-1178
On Thu, Jun 10, 2010 at 9:01 AM, Matt Conway wrote:
> Hi All,
> I'm running a small 4-node cluster with minimal load using
> the 2010-06-08_12-31-16 build from trunk, and it's exhausting file
> descriptors pretty quickly (65K in less th
Thx a lot
-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Thursday, June 10, 2010 4:28 PM
To: user@cassandra.apache.org
Subject: Re: Range Slices timing question
get_range_slices is faster in 0.7 but there's not much you can do in 0.6.
On Wed, Jun 9, 2010 at 11:0
Hi
I am investigating Cassandra write performance and see very heavy CPU usage
from Cassandra. I have a single-node Cassandra instance running on a dual-core
(2.66 GHz Intel) Ubuntu 9.10 server. The writes to Cassandra are being
generated from the same server using BatchMutate(). The client ma
Hi Rishi,
The writes in Cassandra are not written directly to disk; they are
stored in memory and later flushed to disk. Maybe that's why you are
not seeing much in iostat. Can't say about the high CPU usage.
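A toy sketch of the write path just described: writes land in an in-memory table and hit "disk" only when the table fills and is flushed in one go. The flush threshold and structures here are illustrative assumptions, not Cassandra's actual memtable code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MemtableSketch {
    private final int flushThreshold;   // assumed size trigger, for illustration
    private final Map<String, String> memtable = new LinkedHashMap<>();
    final List<Map<String, String>> flushed = new ArrayList<>(); // stands in for SSTables on disk

    MemtableSketch(int flushThreshold) { this.flushThreshold = flushThreshold; }

    // Writes go to the in-memory table first; "disk I/O" only happens
    // when the memtable fills up and is flushed as a whole.
    void write(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= flushThreshold) {
            flushed.add(new LinkedHashMap<>(memtable)); // one sequential flush
            memtable.clear();
        }
    }

    int pendingWrites() { return memtable.size(); }

    public static void main(String[] args) {
        MemtableSketch m = new MemtableSketch(3);
        for (int i = 0; i < 7; i++) m.write("k" + i, "v" + i);
        System.out.println(m.flushed.size() + " flushes, " + m.pendingWrites() + " pending");
    }
}
```

With 7 writes and a threshold of 3, only 2 flushes reach "disk" — which is why iostat shows little activity while the CPU stays busy serializing and buffering.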
Vineet Daniel
You are testing Cassandra in a way that it was not designed to be used.
Bandwidth to disk is not a meaningful example for nearly anything
except for filesystem benchmarking and things very nearly the same as
filesystem benchmarking.
Unless the usage patterns of your application match your test data
Hi Jonathan
Thanks for such an informative reply. My application may end up doing such
continuous bulk writes to Cassandra, so I was interested in this performance
case. I was wondering what the CPU overheads are for each row/column written
to Cassandra? You mentioned updating
Rishi,
I am not yet knowledgeable enough to answer your question in more
detail. I would like to know more about the specifics as well.
There are counters you can use via JMX to show logical events, but
this will not always translate to good baseline information that you
can use in scaling estimat
Hi,
Since you're iterating the whole set several records at a time, your
code should know when it's the first time through.
Why not simply:
if (!_first_time) {
    _iter++; // ignore the first record
} else {
    _first_time = false;
}
Kevin Yuan,
Supertool Corp.
www.yuan-shuai.info
On 2010-06-10 22:0
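The _first_time flag in the thread above is needed because the start key of get_range_slices is inclusive: each batch starts at the last key of the previous one, so every batch after the first repeats one row. A self-contained sketch of the full paging loop, again simulating the server side with a sorted key set rather than the real Thrift API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class KeyPager {
    // Simulated get_range_slices: up to `count` keys from `start` (inclusive),
    // in sort order -- mirroring the real call's inclusive start_key.
    static List<String> rangeSlices(TreeSet<String> keys, String start, int count) {
        List<String> batch = new ArrayList<>();
        for (String k : keys.tailSet(start, true)) {
            if (batch.size() == count) break;
            batch.add(k);
        }
        return batch;
    }

    // Page through every key, dropping the duplicated first record of
    // each batch after the first -- the _first_time trick from the thread.
    static List<String> allKeys(TreeSet<String> keys, int batchSize) {
        List<String> result = new ArrayList<>();
        String start = "";
        boolean firstTime = true;
        while (true) {
            List<String> batch = rangeSlices(keys, start, batchSize);
            int from = firstTime ? 0 : 1;  // skip the repeated start key
            firstTime = false;
            result.addAll(batch.subList(Math.min(from, batch.size()), batch.size()));
            if (batch.size() < batchSize) break;     // short batch = last page
            start = batch.get(batch.size() - 1);     // next batch starts at last key
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>(List.of("a", "b", "c", "d", "e"));
        System.out.println(allKeys(keys, 2)); // each key exactly once
    }
}
```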