I think it would be a good idea to add a bit more explanation to the
storage-conf.xml/wiki regarding the replication factor. It caused some
confusion until we dug around in the mail archive and realized that our
UnavailableExceptions were caused by our incorrect assumption, and that RF=1
does NOT mean tha
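One way to see why the replication factor matters for UnavailableException is to write out the replica arithmetic. A request at a given consistency level needs a minimum number of live replicas for the key, and the coordinator rejects it as unavailable when fewer are up. The sketch below covers only ONE, QUORUM (a strict majority, RF/2 + 1 with integer division) and ALL. The class and method names are hypothetical and are not Cassandra's API.

```java
/** Hypothetical helper: how many replicas a request needs at a given consistency level. */
final class ReplicaMath {
    enum Level { ONE, QUORUM, ALL }

    /** Replicas that must respond for this level, given the keyspace's replication factor. */
    static int required(Level level, int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be at least 1");
        }
        switch (level) {
            case ONE:
                return 1;
            case QUORUM:
                return replicationFactor / 2 + 1; // strict majority of the replicas
            case ALL:
                return replicationFactor;
            default:
                throw new AssertionError("unknown level " + level);
        }
    }

    /** liveReplicas counts only the nodes responsible for this key that are currently up. */
    static boolean available(Level level, int replicationFactor, int liveReplicas) {
        return liveReplicas >= required(level, replicationFactor);
    }
}
```

With RF=1 there is exactly one copy of each key, so if the node owning that key is down even ONE cannot succeed. With RF=2, QUORUM still needs both replicas, so a single failed node makes QUORUM requests for its keys unavailable.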
Hi all,
Can someone post an example of how to define keyspaces in Cassandra 0.7?
My initial Cassandra node does not load the keyspaces defined at
Cassandra.yaml. Is there a way to define the keyspaces at startup or is
runtime defining an absolute must?
thanks,
BoriS
Defining at runtime is, very intentionally, an absolute must. It
would have been very simple and perhaps user-friendly to add a flag
that loads the schema specified in yaml when cassandra starts up. I
decided against it when implementing the feature because I figured it
would have been a disservi
I just realized that I didn't answer the "how" part of your question. :)
http://svn.apache.org/repos/asf/cassandra/trunk/contrib/py_stress/stress.py
and http://svn.apache.org/repos/asf/cassandra/trunk/test/system/__init__.py
both contain examples of how to use the system_* methods to manipulate
k
Certainly I'm using multiple cloud servers for the multiple client tests.
Whether or not they are resident on the same physical machine, I just don't
know.
-- Oren
On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:
On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin
o...@clearspring.
Yes, as the size of the data on disk increases and the OS cannot avoid disk
seeks the read performance degrades. You can see this in the results from the
original post where, as the number of keys in the test goes from 10M to 100M, the
reads drop from 4,600/s to 200/s. 10M keys in the stress.py tes
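The degradation described above can be approximated with a deliberately crude model. If a fraction h of reads is served from memory and the rest pay a disk seek, the average read costs h·t_mem + (1 − h)·t_seek, which caps what a fixed pool of client threads can do; separately, the misses alone cannot exceed what the disk can seek per second. The latencies and IOPS figure are inputs you must supply; nothing here is fitted to the numbers reported in this thread.

```java
/** Crude read-throughput model; every input is an assumption, not a measurement. */
final class SeekModel {
    /**
     * @param hitRatio   fraction of reads served from memory, in [0, 1]
     * @param memMillis  assumed latency of a read served from memory
     * @param seekMillis assumed latency of a read that needs a disk seek
     * @param threads    concurrent client threads (closed loop: each waits for its reply)
     * @param diskIops   assumed random reads per second the disk can sustain
     * @return estimated ceiling on reads per second
     */
    static double readsPerSecond(double hitRatio, double memMillis, double seekMillis,
                                 int threads, double diskIops) {
        if (hitRatio < 0 || hitRatio > 1) {
            throw new IllegalArgumentException("hitRatio must be in [0, 1]");
        }
        double avgMillis = hitRatio * memMillis + (1 - hitRatio) * seekMillis;
        double clientBound = threads * 1000.0 / avgMillis; // limited by per-request latency
        double missRatio = 1 - hitRatio;
        double diskBound = missRatio == 0 ? Double.POSITIVE_INFINITY : diskIops / missRatio;
        return Math.min(clientBound, diskBound);          // whichever saturates first
    }
}
```

Once the working set no longer fits in memory, h falls and the disk bound quickly dominates, which is the qualitative effect described above.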
Added: http://wiki.apache.org/cassandra/StorageConfiguration
On Mon, Jul 19, 2010 at 2:55 AM, Dimitry Lvovsky wrote:
> I think it would be a good idea to add a bit more explanation
> storage-conf.xml/wiki regarding the replication factor. It caused some
> confusion until we dug around the mail
Hey Oren,
The Cloud Servers REST API returns a "hostId" for each server that indicates
which physical host you are on: I'm not sure if you can see it from the control
panel, but a quick curl session should get you the answer.
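Once the server list has been fetched, spotting co-located servers is a group-by on hostId. This sketch assumes you have already extracted (server name, hostId) pairs from the API response; the JSON parsing is left out, and the class name and sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: find physical hosts that carry more than one of your servers. */
final class HostGrouping {
    /** Input: server name -> hostId (as reported by the API). Output: hostId -> servers sharing it. */
    static Map<String, List<String>> sharedHosts(Map<String, String> hostIdByServer) {
        Map<String, List<String>> byHost = new TreeMap<>();
        for (Map.Entry<String, String> e : hostIdByServer.entrySet()) {
            byHost.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        byHost.values().removeIf(servers -> servers.size() < 2); // keep only shared hosts
        byHost.values().forEach(Collections::sort);              // stable, readable output
        return byHost;
    }
}
```

An empty result means no two of the listed servers report the same physical host.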
Thanks,
Stu
-Original Message-
From: "Oren Benjamin"
Sent:
Thanks ;-).
On Mon, Jul 19, 2010 at 5:55 PM, Dave Viner wrote:
> Added: http://wiki.apache.org/cassandra/StorageConfiguration
>
>
> On Mon, Jul 19, 2010 at 2:55 AM, Dimitry Lvovsky wrote:
>
>> I think it would be a good idea to add a bit more explanation
>> storage-conf.xml/wiki regarding the r
Hello all, I'm Oren's partner in crime on all this. I've got a few more numbers
to add.
In an effort to eliminate everything but the scaling issue, I set up a cluster
on dedicated hardware (non-virtualized; 8-core, 16G RAM).
No data was loaded into Cassandra -- 100% of requests were misses. Th
How many physical client machines are running stress.py?
-Original Message-
From: "David Schoonover"
Sent: Monday, July 19, 2010 12:11pm
To: user@cassandra.apache.org
Subject: Re: Cassandra benchmarking on Rackspace Cloud
Hello all, I'm Oren's partner in crime on all this. I've got a few
> I ran this test previously on the cloud, with similar results:
>
> nodes  reads/sec
> 1      24,000
> 2      21,000
> 3      21,000
> 4      21,000
> 5      21,000
> 6      21,000
>
> In fact, I ran it twice out of disbelief (on different nodes the second time)
> to essentially identical
This may be too much work... but you might consider building an Amazon EC2
AMI of your nodes. This would let others quickly boot up your nodes and run
the stress test against it.
I know you mentioned that you're using Rackspace Cloud. I'm not super
familiar with the internals of RSCloud, but per
Another thing: is the py_stress traffic definitely non-deterministic
such that each client will generate a definitely unique series of
requests? If all clients are deterministically requesting the same
sequence of keys, it would otherwise be plausible that they end up in
effective lock-step, if the
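The lock-step concern is easy to demonstrate in isolation: two generators seeded identically produce identical key sequences, while giving each client its own seed decorrelates them. This is a generic java.util.Random sketch, not how stress.py itself picks keys; the key format and the per-client seed mixing are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustration: identical seeds mean identical request sequences across clients. */
final class KeySequences {
    /** Draws `count` keys uniformly from [0, keySpace) with a generator seeded by `seed`. */
    static List<String> keys(long seed, int keySpace, int count) {
        Random rng = new Random(seed);
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            out.add(Integer.toString(rng.nextInt(keySpace)));
        }
        return out;
    }

    /** Hypothetical per-client seed: mix a shared run seed with the client's id. */
    static long seedFor(long runSeed, int clientId) {
        return runSeed * 31 + clientId;
    }
}
```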
> How many physical client machines are running stress.py?
One with 50 threads; it is remote from the cluster but within the same
DC in both cases. I also ran the test with multiple clients and saw
similar results when summing the reqs/sec.
On Mon, Jul 19, 2010 at 1:22 PM, Stu Hood wrote:
> How
> Another thing: Is the py_stress traffic definitely non-determinstic
> such that each client will generate a definitely unique series of
> requests?
The tests were run both with --random and --std 0.1; in both cases, the
key-sequence is non-deterministic.
Cheers,
Dave
On Jul 19, 2010, at 1
This is absolutely your bottleneck, as Brandon mentioned before. Your client
machine is maxing out at 37K requests per second.
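A quick sanity check on that ceiling uses Little's law: for a closed-loop load generator with a fixed number of outstanding requests, throughput equals concurrency divided by mean latency. With the 50 threads reported earlier and 37K requests per second, each request is taking about 50 / 37,000 s ≈ 1.35 ms end to end, so throughput can only rise with more concurrency (more client processes or machines) or lower per-request latency. The helper names are illustrative.

```java
/** Little's law helpers for a closed-loop client where each thread waits for its reply. */
final class LittlesLaw {
    /** Mean latency (ms) implied by a concurrency level and an observed throughput. */
    static double impliedLatencyMillis(int concurrency, double requestsPerSecond) {
        return concurrency * 1000.0 / requestsPerSecond;
    }

    /** Throughput ceiling (requests/s) for a concurrency level at a given mean latency. */
    static double maxThroughput(int concurrency, double latencyMillis) {
        return concurrency * 1000.0 / latencyMillis;
    }
}
```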
-Original Message-
From: "David Schoonover"
Sent: Monday, July 19, 2010 12:30pm
To: user@cassandra.apache.org
Subject: Re: Cassandra benchmarking on Rackspace Cl
On Mon, Jul 19, 2010 at 12:30 PM, David Schoonover
wrote:
>> How many physical client machines are running stress.py?
>
> One with 50 threads; it is remote from the cluster but within the same
> DC in both cases. I also run the test with multiple clients and saw
> similar results when summing the
>> One with 50 threads; it is remote from the cluster but within the same
>> DC in both cases. I also run the test with multiple clients and saw
>> similar results when summing the reqs/sec.
>
> Multiple client processes, or multiple client machines?
In particular, note that the way CPython works,
stress.py uses multiprocessing if it is present, circumventing the GIL; we ran
the tests with Python 2.6.5.
David Schoonover
On Jul 19, 2010, at 1:51 PM, Peter Schuller wrote:
>>> One with 50 threads; it is remote from the cluster but within the same
>>> DC in both cases. I also run the test wi
> Multiple client processes, or multiple client machines?
I ran it with both one and two client machines making requests, and ensured the
sum of the request threads across the clients was 50. That was on the cloud. I
am re-running the multi-host test against the 4-node cluster on dedicated
har
If you put 25 processes on each of the 2 machines, all you are testing is how
fast 50 processes can hit Cassandra... the point of using more machines is that
you can use more processes.
Presumably, for a single machine, there is some limit (K) to the number of
processes that will give you addit
I'm reading this thread and I am a little lost: what should the
expected behavior be?
Should it maintain 53K regardless of the number of nodes?
nodes  reads/sec
1      53,000
2      37,000
4      37,000
I ran this test previously on the cloud, with similar results:
nodes  reads/sec
1      24,000
On Mon, Jul 19, 2010 at 11:02 AM, David Schoonover
wrote:
>> Multiple client processes, or multiple client machines?
>
>
> I ran it with both one and two client machines making requests, and ensured
> the sum of the request threads across the clients was 50. That was on the
> cloud. I am re-runn
Hi Torsten,
When I run bmt_example, the M/R job gets executed and the Cassandra server gets the
data, but it goes as HintedHandoff to 127.0.0.2, and it tries to send data
to 127.0.0.2 as if 127.0.0.2 were an actual node. When the job was done,
close() stopped the StorageService instance. Any idea why Sto
Usually a fixed bottleneck results from a limited resource -- you've
eliminated disk from the test and you don't mention that CPU is a serious
issue, or memory for that matter.
So for me that leaves network I/O and switch capacity. Is it possible that
your test is saturating your local network ca
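Whether the network could be the fixed bottleneck is a back-of-the-envelope calculation: requests per second times bytes on the wire per request, compared with link capacity. The bytes-per-request value is a placeholder you must replace with the real request plus response size (including Thrift framing and TCP/IP overhead) measured on your own traffic.

```java
/** Back-of-the-envelope link utilization; bytesPerRequest is an assumption to measure. */
final class LinkBudget {
    /** Fraction of the link consumed, e.g. 0.25 means a quarter of its capacity. */
    static double utilization(double requestsPerSecond, int bytesPerRequest, double linkBitsPerSecond) {
        double bitsPerSecond = requestsPerSecond * bytesPerRequest * 8.0;
        return bitsPerSecond / linkBitsPerSecond;
    }
}
```

For example, at 37,000 requests per second and an assumed 500 bytes per request, a gigabit link would be about 15% utilized; only if the measured size is several times larger does the link become a plausible ceiling.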
> When i run bmt_example, M/R job gets executed, cassandra server gets the
> data but it goes as HintedHandoff to 127.0.0.2 and it is trying to send data
> to 127.0.0.2 as if 127.0.0.2 is an actual node.
Well, it kind of becomes an actual node.
> Any idea, why does StorageService
> returns 127.0
Hi,
Being fairly new to Cassandra I have a question on the eventual
consistency. I'm currently performing experiments with a single-node
Cassandra system and a single client. In some of my tests I perform an
update to an existing subcolumn in a row and subsequently read it back
from the same
> stress.py uses multiprocessing if it is present, circumventing the GIL; we
> ran the tests with python 2.6.5.
Ah, sorry about that. I was mis-remembering because I had to use
threading with pystress because multiprocessing was broken/unavailable
(can't remember which) on FreeBSD.
I agree with
If your test case is correct, then it sounds like a bug to me. With one node,
unless you're writing with CL=0, you should get full consistency.
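The reasoning behind "with one node you should get full consistency" is the usual overlap rule: a read is guaranteed to see the latest acknowledged write when the replicas written (W) plus the replicas read (R) exceed the replication factor N, because the two sets must then share at least one replica. With a single node, N = 1 and any acknowledged write and any read both touch that node, while a fire-and-forget CL=0 write contributes W = 0. The rule also assumes each newer write carries a strictly newer timestamp; the helper itself is illustrative.

```java
/** Illustrative check of the overlap rule R + W > N for consistent reads. */
final class Overlap {
    /**
     * @param replicationFactor N, replicas per key
     * @param writeReplicas     W, replicas that must acknowledge a write (0 for a CL=0 write)
     * @param readReplicas      R, replicas consulted on a read
     */
    static boolean readSeesLatestWrite(int replicationFactor, int writeReplicas, int readReplicas) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    /** Strict majority, as used for QUORUM. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }
}
```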
On Mon, Jul 19, 2010 at 10:14 PM, Hugo wrote:
> Hi,
>
> Being fairly new to Cassandra I have a question on the eventual
> consistency. I'm currently perfo
I'm using CL=QUORUM (=Hector default) for both reads and writes. Most of
the time, the test passes, but sometimes it fails because I get back
the old value. Since the test is single-threaded, I guess it is a bug.
I'll try to reduce the test to something smaller that can be used for
troubleshoo
> I'm using CL=QUORUM (=Hector default) for both reads and writes. Most of the
> times, the test passes, but sometimes it fails because I get back the old
> value. Since the test is single-threaded, I guess it is a bug. I'll try to
> reduce the test to something smaller that can be used for trouble
On Mon, Jul 19, 2010 at 10:43 PM, Peter Schuller <
peter.schul...@infidyne.com> wrote:
> > I'm using CL=QUORUM (=Hector default) for both reads and writes. Most of
> the
> > times, the test passes, but sometimes it fails because I get back the old
> > value. Since the test is single-threaded, I gu
Sorry, mixed signals in my response. I was partially replying to suggestions
that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no
dice there). I also ran the tests with -t50 on multiple tester machines in the
cloud with no change in performance; I've now rerun those test
I'll just add that CPU usage hovered around 50% during these tests.
On Jul 19, 2010, at 3:51 PM, David Schoonover wrote:
> Sorry, mixed signals in my response. I was partially replying to suggestions
> that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no
> dice there).
The following is completely irrelevant if you are indeed using the
default storage-conf.xml as you said. However, since I wrote it and it
remains relevant for anyone testing with the order-preserving
partitioner, I might as well post it rather than discard it...
Begin probably irrelevant post:
Anot
See my test case attached below. In my setup it usually fails around the
800th try...
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import me.prettyprint.cassandra.service.CassandraClient;
imp
I'm about to extend my two-node cluster with four dedicated nodes and
remove one of the old nodes, leaving a five-node cluster. The
cluster is in production, but I can spare it to do some stress testing
in the meantime, as I'm also interested in my cluster's performance. I
can't dedicate the clus
Thanks a ton, Juho.
The command was:
./stress.py -o read -t 50 -d $NODELIST -n 7500 -k -i 2
I made a few minor modifications to stress.py to count errors instead of
logging them, and avoid the pointless try-catch on missing keys. (There are
also unrelated edits to restart long run
> Did you see about equal CPU usage on the cassandra nodes during the
> test? Is it possible that most or all of the keys generated by
> stress.py simply fall on a single node?
CPU was approximately equal across the cluster; it was around 50%.
stress.py generates keys randomly or using a Gaussian
Now keep adding clients until it stops making the numbers go up...
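The suggestion above amounts to a simple search: keep increasing the number of client processes (ideally spread across machines) and stop once a step no longer buys a meaningful gain. A hypothetical helper for locating that point, given throughput measured at increasing client counts; the minimum-gain threshold (say 5%) is an arbitrary assumption.

```java
import java.util.List;

/** Hypothetical helper: find where adding clients stops improving throughput. */
final class Plateau {
    /**
     * Returns the index of the last measurement that improved on its predecessor by more
     * than minGain (0.05 means 5%); later measurements are on the plateau.
     */
    static int lastUsefulStep(List<Double> throughputs, double minGain) {
        if (throughputs.isEmpty()) {
            throw new IllegalArgumentException("no measurements");
        }
        int last = 0;
        for (int i = 1; i < throughputs.size(); i++) {
            double prev = throughputs.get(i - 1);
            double gain = (throughputs.get(i) - prev) / prev;
            if (gain > minGain) {
                last = i;
            } else {
                break; // first step without a real gain: more clients no longer help
            }
        }
        return last;
    }
}
```

If the plateau appears at the same total throughput no matter how many client machines are used, the bottleneck is on the server or network side rather than in the clients.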
On Mon, Jul 19, 2010 at 2:51 PM, David Schoonover
wrote:
> Sorry, mixed signals in my response. I was partially replying to suggestions
> that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no
> dice there
> CPU was approximately equal across the cluster; it was around 50%.
>
> stress.py generates keys randomly or using a gaussian distribution, both
> methods showed the same results.
>
> Finally, we're using a random partitioner, so Cassandra will hash the keys
> using md5 to map it to a position o
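To make the RandomPartitioner point concrete: each key is hashed with MD5, the digest is read as a non-negative 128-bit integer token, and the key's primary replica is the first node whose token is greater than or equal to it, wrapping around the ring. This sketch mirrors that idea with plain JDK classes. It is not Cassandra's own code, assumes keys are hashed as UTF-8 strings, and ignores placement of additional replicas.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of MD5-based token placement on a ring (primary replica only). */
final class Ring {
    private final TreeMap<BigInteger, String> nodesByToken = new TreeMap<>();

    void addNode(BigInteger token, String node) {
        nodesByToken.put(token, node);
    }

    /** MD5 of the key's UTF-8 bytes, read as a signed big-endian integer, made non-negative. */
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("every JVM must provide MD5", e);
        }
    }

    /** First node token >= t, wrapping to the lowest token past the end of the ring. */
    String ownerOfToken(BigInteger t) {
        if (nodesByToken.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<BigInteger, String> e = nodesByToken.ceilingEntry(t);
        return (e != null ? e : nodesByToken.firstEntry()).getValue();
    }

    String ownerOf(String key) {
        return ownerOfToken(token(key));
    }
}
```

Because MD5 spreads tokens uniformly, neither uniform nor Gaussian key selection should concentrate load on one node, provided the node tokens themselves are balanced.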
> Now keep adding clients until it stops making the numbers go up...
Neither adding additional readers nor additional cluster nodes showed
performance gains. The numbers, they do not move.
--
David Schoonover
On Jul 19, 2010, at 5:18 PM, Jonathan Ellis wrote:
> Now keep adding clients until i
I've put up a bunch of steps to get Cassandra installed on an EC2 instance:
http://wiki.apache.org/cassandra/CloudConfig
Look at the "step-by-step guide".
I haven't AMI-ed the result, since the steps are fairly quick and it would
be just one more thing to update with a new release of Cassandra...
When the test fails, what value does the verify array have? Is it null
or a previous value?
Aaron
On 20 Jul, 2010, at 08:22 AM, Hugo wrote:
See my test case attached below. In my setup it usually fails around
the 800th try...
import java.util.ArrayList;
import java.util.Arrays;
import java.u
What gets logged on the old nodes at debug, when you try to add a
single new machine after a full cluster restart?
Removing Location would blow away the nodes' token information... It
should be safe if you set the InitialToken to what it used to be on
each machine before bringing it up after nuki
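If you ever need fresh, balanced InitialToken values rather than restoring the old ones, the common recipe for the RandomPartitioner on a ring of N nodes is to space tokens evenly over the 0 to 2^127 range: token_i = i · 2^127 / N. The sketch computes those values; when bringing back an existing node, reusing the token it had before, as suggested above, remains the safer choice.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Evenly spaced RandomPartitioner tokens for an N-node ring. */
final class BalancedTokens {
    private static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127); // 2^127

    /** Tokens i * 2^127 / n for i = 0 .. n-1 (integer division). */
    static List<BigInteger> forNodes(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<BigInteger> tokens = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            tokens.add(RANGE.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
        }
        return tokens;
    }
}
```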
Keep it simple. Something like "Cassandra is a row-oriented, fully
distributed database designed for scalability, availability, and
durability."
Trying to explain the data model in two sentences is not going to
work, and "4 or 5 dimension associated arrays" is the wrong tree to
bark up entirely.
cassandra> get system.LocationInfo['L']
Exception Internal error processing get_slice
What's wrong?
Thanks.
Shen
Hi all,
I am new to Cassandra...
I want to use Cassandra for a billing system.
I have read in many places that joins won't work in a BigTable implementation, but
I feel I need them for my app.
I am unable to get the data from multiple tables (column families) like
products and inventory.
As I am tr
Hi, Stuart,
If I may paraphrase what Jonathan said, typically your batch_mutate
operation is idempotent.
That is, you can replay / retry the same operation within a short timeframe
without any undesirable side effect.
The "short timeframe" assumption here is that there is no
other c
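The idempotency point rests on how Cassandra reconciles columns: each column value carries a client-supplied timestamp, and the higher timestamp wins. Replaying a mutation with the same value and timestamp therefore leaves the column as it was. The sketch models a single column that way; it is a simplification (no deletes, and ties simply keep the current value), and the class is hypothetical.

```java
/** One column under last-write-wins reconciliation (simplified: no deletes, ties keep current). */
final class LwwColumn {
    private String value;
    private long timestamp = Long.MIN_VALUE;

    /** Applies a write only if its timestamp is newer than what is stored. */
    void apply(String newValue, long newTimestamp) {
        if (newTimestamp > timestamp) {
            value = newValue;
            timestamp = newTimestamp;
        }
    }

    String value() { return value; }

    long timestamp() { return timestamp; }
}
```

This only holds if a retry reuses the original timestamps. A retry stamped with a fresh timestamp could overwrite a newer write made by another client in the meantime.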
Cassandra may not be the best fit for a billing system. I'm guessing the lack of transactions would be a problem if you want to update inventory levels. If you want to get data from multiple column families, you will need to make multiple calls, or de-normalise the data so you can get all the data yo
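To illustrate the two options with plain in-memory maps standing in for column families ("products" and "inventory" come from the question; everything else is hypothetical): an application-side join needs one lookup per column family, while a denormalised row, rewritten whenever either side changes, can be read back in a single call.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-ins for column families: row key -> (column name -> value). */
final class StockViews {
    private static final Map<String, String> NO_COLUMNS = Collections.emptyMap();

    final Map<String, Map<String, String>> products = new HashMap<>();
    final Map<String, Map<String, String>> inventory = new HashMap<>();
    // Denormalised "view": product and stock columns stored together in one row.
    final Map<String, Map<String, String>> productWithStock = new HashMap<>();

    /** Option 1: join in the client, one read per column family. */
    Map<String, String> joinedRead(String productId) {
        Map<String, String> row = new HashMap<>(products.getOrDefault(productId, NO_COLUMNS));
        row.putAll(inventory.getOrDefault(productId, NO_COLUMNS));
        return row;
    }

    /** Option 2: on every stock update, also rewrite the combined row. */
    void writeStock(String productId, String quantity) {
        // In Cassandra these would be separate writes with no transaction spanning them.
        inventory.computeIfAbsent(productId, k -> new HashMap<>()).put("quantity", quantity);
        Map<String, String> combined = productWithStock.computeIfAbsent(productId, k -> new HashMap<>());
        combined.putAll(products.getOrDefault(productId, NO_COLUMNS));
        combined.put("quantity", quantity);
    }

    /** Single lookup against the denormalised row. */
    Map<String, String> denormalisedRead(String productId) {
        return productWithStock.getOrDefault(productId, NO_COLUMNS);
    }
}
```

The comment in writeStock is the crux of the billing concern: the two writes are independent, so a failure between them leaves the views out of step.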
It's the previous value. I've checked.
Regards, Hugo.
On 20 jul 2010, at 00:19, Aaron Morton wrote:
When the test fails what value does the verify array have ? Is it
null or a previous value?
Aaron
On 20 Jul, 2010,at 08:22 AM, Hugo wrote:
See my test case attached below. In my setup it u
In my cluster, I have set both KeysCached and RowsCached of my column family on
all nodes to "0",
but a few nodes still crashed with OutOfMemory errors
(according to gc.log, a full GC wasn't able to free up any memory).
What else can be consuming the heap?
The heap size is 10G an
A supercolumn/column must fit into node memory.
Could that be it?
/Justus
From: 王一锋 [mailto:wangyif...@aspire-tech.com]
Sent: 20 July 2010 08:48
To: user
Subject: What is consuming the heap?
In my cluster, I have set both KeysCached and RowsCached of my column family on
all nodes to "0",
but it sti
Hi Jonathan,
I fear 'row-oriented' could fuel the holy war between 'row-based RDBMS' and
'column-oriented NoSQL databases'
Some related reads here -
- http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
- http://en.wikipedia.org/wiki/Column-oriented_DBMS
- http://en.wik
No, I don't think so, because I'm not using supercolumns and the size of a column
will not exceed 1M.
2010-07-20
From: Thorvaldsson Justus
Sent: 2010-07-20 14:52:22
To: 'user@cassandra.apache.org'
Cc:
Subject: RE: What is consuming the heap?
Supercolumn/column must fit into node memory
It c