Hello,
I am pretty new to Cassandra and I have some questions; they may seem
trivial, but I am still quite new to the subject. The first is about the lack
of a compareAndSet() operation: as I understand it, this is not supported
currently in Cassandra. Do you know of use cases which really require such
an operation?
Hi all.
I'm benchmarking several NoSQL datastores and I'm going nuts with Cassandra.
The version of Cassandra we are using is 0.4.1; I know 0.4.1 is a bit outdated,
but my implementation is done with that version.
The thing is that every time the test runs, I need to reset the data inside the
database.
I'm not an expert, so take what I say with a grain of salt.
2010/4/21 Даниел Симеонов :
> Hello,
> I am pretty new to Cassandra and I have some questions, they may seem
> trivial, but still I am pretty new to the subject. First is about the lack
> of a compareAndSet() operation, as I understood
Hi Paul,
about the last answer I still need some more clarification: as I
understand it, if QUORUM is used then reads don't get old values either?
Or am I wrong?
Thank you very much!
Best regards, Daniel.
2010/4/21 Paul Prescod
> I'm not an expert, so take what I say with a grain of salt.
Hello,
For my first message I will first thank the Cassandra contributors for their
great work.
I have a parameter issue with Cassandra (I hope it's just a parameter
issue). I'm using Cassandra 0.6.1 with the Hector client on my desktop. It's a
simple dual core with 4GB of RAM on WinXP. I have kept the
Try increasing Xmx. 1G is probably not enough for the amount of inserts
you are doing.
On Wed, Apr 21, 2010 at 8:10 AM, Nicolas Labrot wrote:
> Hello,
>
> For my first message I will first thanks Cassandra contributors for their
> great works.
>
> I have a parameter issue with Cassandra (I ho
I have tried 1400M, and Cassandra OOMs too.
Is there another solution? My data isn't very big.
It seems to be the merge (compaction) of the db that triggers it.
On Wed, Apr 21, 2010 at 2:14 PM, Mark Greene wrote:
> Trying increasing Xmx. 1G is probably not enough for the amount of inserts
> you are doing.
>
>
> On Wed, Apr
Stop the program, wipe the data dir and commit logs, and start the program
again; that's what I'm doing.
I even made a script for it, so it's just a one-line command.
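(For reference, mine boils down to something like the following, run with the
server stopped first; the paths are just the common defaults, adjust to
whatever your storage-conf.xml uses:

    rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*
)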
From: ROGER PUIG GANZA [mailto:rp...@tid.es]
Sent: Wednesday, April 21, 2010 5:20 AM
To: cassandra-u...@incubator.apache.org
Subject:
On my 4GB machine I'm giving it 3GB and having no trouble with 60+ million 500
byte columns
From: Nicolas Labrot [mailto:nith...@gmail.com]
Sent: Wednesday, April 21, 2010 7:47 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra tuning for running test on a desktop
I have try 1400M, and Cass
So does that mean the RAM needed is proportional to the data handled?
Or does Cassandra need a minimum amount of RAM when the dataset is big?
I must confess this OOM behaviour is strange.
On Wed, Apr 21, 2010 at 2:54 PM, Mark Jones wrote:
> On my 4GB machine I’m giving it 3GB and having no trouble
first, upgrade to 0.6.1.
second, the easiest way to wipe everything is at the fs level like Mark said.
On Wed, Apr 21, 2010 at 5:20 AM, ROGER PUIG GANZA wrote:
> Hi all.
>
> I’m benchmarking several nosql datastores and I’m going nuts with
> Cassandra.
>
> The version of Cassandra we are using
Hello,
I am testing how Cassandra behaves on single-node disk failures, to know what
to expect when things go bad.
I had a cluster of 4 Cassandra nodes, stress-loaded it with a client, and ran 2
tests:
1. emulated disk failure of the /data volume during a read-only stress test
2. emulated disk failure of /comm
On Tue, 2010-04-20 at 17:28 -0700, Joseph Boyle wrote:
> We will have people from the Cassandra (including Stu Hood and Matt
> Pfeil) and other NoSQL communities as well as with broader Big Data
> interests, all available for discussion, and you can propose a session
> to learn about anything.
Ga
RAM doesn't necessarily need to be proportional but I would say the number
of nodes does. You can't just throw a bazillion inserts at one node. This is
the main benefit of Cassandra: if you start hitting your capacity,
you add more machines and distribute the keys across them.
On W
Hit send too early.
That being said, a lot of people running Cassandra in production are using
4-6GB max heaps on 8GB machines. I don't know if that helps, but hopefully it
gives you some perspective.
On Wed, Apr 21, 2010 at 10:39 AM, Mark Greene wrote:
> RAM doesn't necessarily need to be proportio
Thanks Mark.
Maybe Cassandra is too much for my needs ;)
On Wed, Apr 21, 2010 at 4:45 PM, Mark Greene wrote:
> Hit send to early
>
> That being said a lot of people running Cassandra in production are using
> 4-6GB max heaps on 8GB machines, don't know if that helps but hopefully
> gives yo
Maybe, maybe not. Presumably if you are running an RDBMS with any reasonable
amount of traffic nowadays, it's sitting on a machine with at least 4-8G of
memory.
On Wed, Apr 21, 2010 at 10:48 AM, Nicolas Labrot wrote:
> Thanks Mark.
>
> Cassandra is maybe too much for my need ;)
>
>
>
> On Wed, A
I'm seeing a cluster of 4 (replication factor=2) to be barely faster overall
than the slowest node in the group. When I run the 4 nodes individually, I see:
For inserts:
2 nodes @ 12000/second
1 node @ 9000/second
1 node @ 7000/second
For reads:
Abysmal, less than 1000/second
Hi Mark,
I'm a relative newcomer to Cassandra, but I believe the common
experience is that you start seeing gains after 5 nodes in a
column-oriented data store. It may also depend on your usage pattern.
Others may know better - hope this helps!
-- Jim R. Wilson (jimbojw)
On Wed, Apr 21, 2010 a
Hi,
I'm still curious whether I got the data movement right in my earlier email.
Anyone? Also, does anyone know if I can scp the data directory from
a node I want to replace to a new machine? The Cassandra streaming seems
much slower than scp.
-Anthony
On Mon, Apr 19, 2010 at 04:48:23PM -0700,
Yes, that looks right, where "token really close" means "slightly less
than" (more than would move it into a different node's range).
You can't really migrate via scp since only one node with a given
token can exist in the cluster at a time.
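(A concrete illustration: a node owns the range from the previous node's
token, exclusive, up to its own token, inclusive. So if the node you want to
sit next to has token T, bootstrapping the new node at T - 1 takes over all of
that node's range except the single value T.)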
-Jonathan
On Wed, Apr 21, 2010 at 11:02 AM, Anthony Mo
Some people might be able to answer this better than me. However: with
quorum consistency you have to communicate with n/2 + 1 nodes, where n is
the replication factor. So unless you are disk-bound, your real expense
is going to be all those extra network latencies. I'd expect that you'll
see a r
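(To make the arithmetic concrete: with replication factor n = 3, QUORUM is
3/2 + 1 = 2 replicas. A QUORUM write to 2 replicas and a QUORUM read from 2
replicas must overlap in at least one node, since 2 + 2 > 3, which is why
QUORUM reads don't return stale values once a QUORUM write has succeeded.)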
On Wed, Apr 21, 2010 at 11:08:19AM -0500, Jonathan Ellis wrote:
> Yes, that looks right, where "token really close" means "slightly less
> than" (more than would move it into a different node's range).
Is it better to go slightly less than (say Token - 1), or slightly more than
the beginning of t
Right, it's a similar concept to DB sharding, where you spread the write load
around to different DB servers: it won't necessarily increase the throughput
of any one DB server, but rather the collective throughput.
On Wed, Apr 21, 2010 at 12:16 PM, Mike Gallamore <
mike.e.gallam...@googlemail.com> wrote:
> Some
I don't have a website ;)
I'm testing the viability of Cassandra for storing XML documents and making
fast search queries. 4000 XML files (80MB of XML) create, with my datamodel
(one SC per XML node), 100 SC, which makes Cassandra go OOM with Xmx at 1GB.
By contrast, an XML DB like eXist handles 4000
Currently running on a single node with intensive write operations.
After running for a while...
Client starts outputting:
TimedOutException()
at
org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:12232)
at
org.apache.cassandra.thrift.Cassandra$Client.recv
On Wed, Apr 21, 2010 at 11:31 AM, Anthony Molinaro
wrote:
>
> On Wed, Apr 21, 2010 at 11:08:19AM -0500, Jonathan Ellis wrote:
>> Yes, that looks right, where "token really close" means "slightly less
>> than" (more than would move it into a different node's range).
>
> Is it better to go slightly
There is a patch attached to
https://issues.apache.org/jira/browse/CASSANDRA-948 that needs
volunteers to test.
On Sun, Apr 18, 2010 at 11:13 PM, Mark Greene wrote:
> With the 0.6.0 release, the windows cassandra.bat file errors out. There's a
> bug filed for this already. There's a README or som
On Mon, Apr 19, 2010 at 2:03 PM, Lee Parker wrote:
> I am working on finalizing our backup and restore procedures for a cassandra
> cluster running on EC2. I understand based on the wiki that in order to
> replace a single node, I don't actually need to put data on that node. I
> just need to boo
I'd like to get something besides "I'm seeing close wait but i have no
idea why" for a bug report, since most people aren't seeing that.
On Tue, Apr 20, 2010 at 9:33 AM, Ingram Chen wrote:
> I trace IncomingStreamReader source and found that incoming socket comes
> from MessagingService$SocketThr
You can serialize any RowMutation for BMT, but if all you're doing is
deleting rows, why bother with BMT? It is not significantly more
efficient than Thrift for that.
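For completeness, a minimal sketch of such a row deletion over the plain 0.6
Thrift API (keyspace/CF names are placeholders):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    // Delete an entire row: a ColumnPath with neither super_column nor
    // column set addresses the whole row.
    void deleteRow(Cassandra.Client client, String key) throws Exception {
        ColumnPath path = new ColumnPath("Standard1"); // placeholder CF name
        long timestamp = System.currentTimeMillis() * 1000; // microseconds, matching other writers
        client.remove("Keyspace1", key, path, timestamp, ConsistencyLevel.QUORUM);
    }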
On Tue, Apr 20, 2010 at 12:47 PM, Sonny Heer wrote:
> How do i delete a row using BMT method?
>
> Do I simply do a mutate with colu
Hi,
I am new to Cassandra. I would like to use Cassandra to store financial data
(time series), and have a question on the data model design.
The example here is daily stock data. This would be a column family
called dailyStockData. The row key is the stock ticker.
Every day there are attributes like clo
if you want to look up "what permissions does user X have on asset Y"
then i would model that as a row keyed by userid, containing
supercolumns named by asset ids, and containing subcolumns of the
permissions granted.
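If it helps to see it concretely, here is a sketch of that write against the
raw 0.6 Thrift API (keyspace, CF, and ids below are invented for
illustration):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    // Grant permission "read" on asset "asset42" to user "user123":
    // row key = userid, supercolumn = asset id, subcolumn = permission name.
    void grant(Cassandra.Client client) throws Exception {
        ColumnPath path = new ColumnPath("Permissions"); // hypothetical super column family
        path.setSuper_column("asset42".getBytes("UTF-8"));
        path.setColumn("read".getBytes("UTF-8"));
        client.insert("Keyspace1", "user123", path, new byte[0],
                      System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);
    }

The lookup "what permissions does user X have on asset Y" is then a get_slice
on row X with super_column set to Y.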
On Mon, Apr 19, 2010 at 12:03 PM, tsuraan wrote:
> Suppose I have a CF that hol
[moving to u...@]
0.6 fixes replaying faster than it can flush.
as for why it backs up in the first place before the restart, you can
either (a) throttle writes [set your timeout lower, make your clients
back off temporarily when they get a TimedOutException] or (b) add
capacity. (b) is recommended
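A minimal sketch of the client-side part of (a), assuming the raw Thrift
client (names and delays are placeholders, tune to taste):

    import org.apache.cassandra.thrift.*;

    // Retry an insert with exponential backoff whenever the node signals overload.
    void insertWithBackoff(Cassandra.Client client, String key, ColumnPath path,
                           byte[] value) throws Exception {
        long delayMs = 100;
        while (true) {
            try {
                client.insert("Keyspace1", key, path, value,
                              System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);
                return;
            } catch (TimedOutException overloaded) {
                Thread.sleep(delayMs);                  // back off temporarily
                delayMs = Math.min(delayMs * 2, 10000); // cap the backoff at 10s
            }
        }
    }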
We have a ticket open for this:
https://issues.apache.org/jira/browse/CASSANDRA-809
Ideally I think we'd like to leave the node up to serve reads if a
disk is erroring out on writes but is still readable. In my experience
this is very common when a disk first begins to fail, as well as in the
"disk is full" case where there is nothing actually wrong with the disk per
I know Cassandra is very flexible.
a. Because a super column cannot contain a large number of columns, you
should not use design 1.
b. Maybe with each query you have to go to each ColumnFamily separately.
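(The underlying reason for a., as I understand it: subcolumns of a
supercolumn are not indexed, so reading any one subcolumn deserializes the
entire supercolumn. The CassandraLimitations wiki page linked elsewhere in
this thread covers this.)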
On Wed, Apr 21, 2010 at 1:17 PM, Steve Lihn wrote:
> Hi,
> I am new to Cassandra. I would like to
On Wed, Apr 21, 2010 at 12:21:31PM -0500, Jonathan Ellis wrote:
> [moving to u...@]
>
> 0.6 fixes replaying faster than it can flush.
Yeah, I noticed some of those fixes, and will probably take the leap into
0.6 if I can keep my cluster running (it's not doing too bad, I do about
400K reads and
I'll try to test this out tonight.
On Wed, Apr 21, 2010 at 1:07 PM, Jonathan Ellis wrote:
> There is a patch attached to
> https://issues.apache.org/jira/browse/CASSANDRA-948 that needs
> volunteers to test.
>
> On Sun, Apr 18, 2010 at 11:13 PM, Mark Greene wrote:
> > With the 0.6.0 release, th
On Wed, Apr 21, 2010 at 12:45 PM, Anthony Molinaro
wrote:
>> as for why it backs up in the first place before the restart, you can
>> either (a) throttle writes [set your timeout lower, make your clients
>> back off temporarily when it gets a timeoutexception]
>
> What timeout is this? Something
http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
On Wed, Apr 21, 2010 at 12:02 PM, Sonny Heer wrote:
> Currently running on a single node with intensive write operations.
>
>
> After running for a while...
>
> Client starts outputting:
>
> TimedOutException()
> at
> org
Hi.
So, I am interested in using Cassandra not because of a large amount of data,
but for the following reasons.
1) It's easy to administer and handle failover (and to scale, of course)
2) It's easy to write an application that makes sense to developers (developers
are fully in control of how data is or
On Wed, Apr 21, 2010 at 12:17 PM, Steve Lihn wrote:
> [...]
> Design 1: Each attribute is a super column. Therefore each date is a
> column. So we have:
>
> AAPL -> closingPrice -> { '2010-04-13' : 242, '2010-04-14': 245 }
> AAPL -> volume -> { '2010-04-13' : 10.9m, '2010-04-14': 14.4m }
> etc
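To make design 1 concrete, here is a rough sketch of reading a date range of
closing prices via the 0.6 Thrift API (the CF and row key follow the quoted
example; the keyspace name and page size are invented):

    import java.util.List;
    import org.apache.cassandra.thrift.*;

    // Design 1 read: row key = ticker, supercolumn = attribute, subcolumns = dates.
    List<ColumnOrSuperColumn> closingPrices(Cassandra.Client client) throws Exception {
        ColumnParent parent = new ColumnParent("dailyStockData");
        parent.setSuper_column("closingPrice".getBytes("UTF-8"));
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange("2010-04-13".getBytes("UTF-8"),
                                                "2010-04-14".getBytes("UTF-8"),
                                                false, 100)); // date range, ascending
        return client.get_slice("Keyspace1", "AAPL", parent, predicate,
                                ConsistencyLevel.ONE);
    }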
On Wed, Apr 21, 2010 at 12:56 PM, Soichi Hayashi wrote:
> So, I am interested in using Cassandra not because of large amount of data,
> but because of following reasons.
>
> 1) It's easy to administrate and handle fail-over (and scale, of course)
> 2) Easy to write an application that makes sense
On Wed, Apr 21, 2010 at 12:52:32PM -0500, Jonathan Ellis wrote:
> On Wed, Apr 21, 2010 at 12:45 PM, Anthony Molinaro
> wrote:
> >> as for why it backs up in the first place before the restart, you can
> >> either (a) throttle writes [set your timeout lower, make your clients
> >> back off temporar
On Wed, Apr 21, 2010 at 1:11 PM, Anthony Molinaro
wrote:
> Interesting, in the config I see
>
>
> <RpcTimeoutInMillis>5000</RpcTimeoutInMillis>
>
> So I thought that timeout was for inter-node communication not the thrift
> API, but I see how you probably consider both inter-node traffic and
> thrift traffic as clients. Does this RPC
On Wed, Apr 21, 2010 at 12:05:07PM -0500, Jonathan Ellis wrote:
> On Wed, Apr 21, 2010 at 11:31 AM, Anthony Molinaro
> wrote:
> >
> > On Wed, Apr 21, 2010 at 11:08:19AM -0500, Jonathan Ellis wrote:
> >> Yes, that looks right, where "token really close" means "slightly less
> >> than" (more than w
On Wed, Apr 21, 2010 at 01:24:45PM -0500, Jonathan Ellis wrote:
> On Wed, Apr 21, 2010 at 1:11 PM, Anthony Molinaro
> wrote:
> > Interesting, in the config I see
> >
> >
> > <RpcTimeoutInMillis>5000</RpcTimeoutInMillis>
> >
> > So I thought that timeout was for inter-node communication not the thrift
> > API, but I see how you probabl
I've encountered a problem on Cassandra 0.6 while using get_range_slices.
I use RP, and when I use get_range_slices the keys are not returned in an
"ordered" manner; that means the last key in the list is not always the
"greatest" key in the list, so I started getting repetitions and once entered
an
On Wed, Apr 21, 2010 at 2:19 PM, Guilherme Kaster
wrote:
> I've encountered a problem on cassandra 0.6 while using get_ranged_slices.
> I use RP and when I use get_range_slices the keys are not returned in an
> "ordered" maner, that means the last key on the list not always the
> "greater" key in
http://wiki.apache.org/cassandra/CassandraLimitations has good coverage
on the limits around columns.
Are there any design (or practical) limits to the number of rows a
keyspace can have?
Bill
No.
On Wed, Apr 21, 2010 at 2:58 PM, Bill de hOra wrote:
> http://wiki.apache.org/cassandra/CassandraLimitations has good coverage on
> the limits around columns.
>
> Are there are design (or practical) limits to the number of rows a keyspace
> can have?
>
> Bill
>
Hey Bill,
Are you asking if there are limits in the context of a single node or a ring
of nodes?
On Wed, Apr 21, 2010 at 3:58 PM, Bill de hOra wrote:
> http://wiki.apache.org/cassandra/CassandraLimitations has good coverage on
> the limits around columns.
>
> Are there are design (or practical)
Anyone know how to unsubscribe from the mailing list? I tried emailing the
server, user-unsubcr...@cassandra.apache.org, and had no luck.
Thanks in advance!!!
note: I'm using the Thrift API to insert. The commitLog directory
continues to grow. The heap size continues to grow as well.
I decreased the MemtableSizeInMB setting, but noticed no change. Any idea
what is causing this, and/or what property I need to tweak to
alleviate it? What is the "insert th
you need to figure out where the memory is going. check tpstats; if
the pending ops are large somewhere, that means you're just generating
insert ops faster than the node can handle.
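(tpstats is exposed via nodetool; in 0.6 that's something like
`nodetool -host localhost tpstats`, adjusting host/port for your setup.)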
On Wed, Apr 21, 2010 at 4:07 PM, Sonny Heer wrote:
> note: I'm using the Thrift API to insert. The commitLog directory
You have a typo: user-unsubscr...@cassandra.apache.org, not
user-unsubcr...@cassandra.apache.org.
:-)
On Wed, Apr 21, 2010 at 3:55 PM, Jennifer Huynh
wrote:
> Anyone know how to unsubscribe to the mailing list? I tried emailing the
> server, user-unsubcr...@cassandra.apache.org, and had no luck.
Is security for remote clients connecting to a Cassandra node done
purely at the network/firewall level?
I.e., there is no username/password like in MySQL/SQL Server, correct?
Or are there permissions at the column family level per user?
They are showing up as completed? Is this correct:
Pool Name          Active   Pending   Completed
STREAM-STAGE            0         0           0
RESPONSE-STAGE          0         0           0
ROW-READ-STAGE          0         0      517446
L
then that's not the problem.
are you writing large rows that OOM during compaction?
On Wed, Apr 21, 2010 at 4:34 PM, Sonny Heer wrote:
> They are showing up as completed? Is this correct:
>
>
> Pool Name Active Pending Completed
> STREAM-STAGE 0
What does OOM stand for?
for a given insert the size is small (meaning a single insert
operation only has about a sentence of data), although as the insert
process continues, the columns under a given row key could potentially
grow to be large. Is that what you mean?
An operation entails:
Re
On Wed, Apr 21, 2010 at 5:05 PM, Sonny Heer wrote:
> What does OOM stand for?
out of memory
> for a given insert the size is small (meaning the a single insert
> operation only has about a sentence of data) although as the insert
> process continues, the columns under a given row key could pote
What I mean by "as data is processed" is that the row will grow in
Cassandra over time, but my client isn't ever writing a large column
under a given row in any single operation...
Any idea what's going on here?
On Wed, Apr 21, 2010 at 3:05 PM, Sonny Heer wrote:
> What does OOM stand for?
>
> for a given insert the size
Gotcha. No, I don't see anything particularly interesting in the log.
Do I need to turn on more verbose logging in log4j?
Here it is after I killed the client:
INFO [main] 2010-04-21 14:25:52,166 DatabaseDescriptor.java (line
229) Auto DiskAccessMode determined to be standard
INFO [main] 2010-04-2
Hey there! Wanted to let you all know about our next meetup, April
28th. We've got a killer new venue thanks to Amazon.
Check out the details at the link:
http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/calendar/13072272/
Our Speakers this month:
1. Nick Dimiduk, Drawn to Scale: Intro to
> Are you asking if there are limits in the context
> of a single node or a
> ring of nodes?
A ring, but across a few (3+) datacenters.
Bill
Mark Greene wrote:
Hey Bill,
Are you asking if there are limits in the context of a single node or a
ring of nodes?
On Wed, Apr 21, 2010 at 3:58 PM,
Sweet.
Bill
Jonathan Ellis wrote:
No.
On Wed, Apr 21, 2010 at 2:58 PM, Bill de hOra wrote:
http://wiki.apache.org/cassandra/CassandraLimitations has good coverage on
the limits around columns.
Are there are design (or practical) limits to the number of rows a keyspace
can have?
Bill
Hi Daniel,
For a general theoretical understanding, try reading some of the papers on
eventual consistency by Werner Vogels.
Reading the SOSP '07 Dynamo paper would also help with some of the
theoretical foundations and academic references.
To get even further into it, try reading Replication T
Hello,
I'm using Cassandra 0.6.1 and the Ruby library. I did some tests on my one-node
development installation using the get_range method to scan a whole CF.
What I want to prove is whether a CF with RandomPartitioner can be used with
get_range, getting a fixed number of keys at a time, until all
I've tried the patch on https://issues.apache.org/jira/browse/THRIFT-347 ,
but still got this error:
PHP Fatal error: Uncaught exception 'TException' with message 'TSocket:
> timed out reading 1024 bytes from 10.0.0.169:9160' in
> /home/phpcassa/include/thrift/transport/TSocket.php:266
> Stack tr
For each "page" of results, start with the key that was last in the
previous iteration, and you will get all the keys back. The order is
random but consistent.
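If it's useful, here's a rough sketch of that loop in Java against the raw
0.6 Thrift API (the Ruby client follows the same pattern; keyspace/CF names
and the page size are made up):

    import java.util.List;
    import org.apache.cassandra.thrift.*;

    // Page through every row of a CF under RandomPartitioner.
    void scanAll(Cassandra.Client client) throws Exception {
        final int pageSize = 100;
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 10));
        String start = "";
        while (true) {
            KeyRange range = new KeyRange(pageSize);
            range.setStart_key(start);
            range.setEnd_key("");
            List<KeySlice> page = client.get_range_slices("Keyspace1",
                    new ColumnParent("Standard1"), predicate, range, ConsistencyLevel.ONE);
            for (KeySlice ks : page) {
                if (ks.getKey().equals(start))
                    continue; // start key is returned again on every page after the first
                // ... process ks ...
            }
            if (page.size() < pageSize)
                break; // ran out of rows
            start = page.get(page.size() - 1).getKey();
        }
    }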
On Wed, Apr 21, 2010 at 7:55 PM, Lucas Di Pentima
wrote:
> Hello,
>
> I'm using Cassandra 0.6.1 and ruby's library. I did some tests on
I agree with your point. I patched the code and logged more information to
find out the real cause.
Here is the code snippet I think may be the cause:
IncomingTcpConnection:
public void run()
{
    while (true)
    {
        try
        {
            MessagingService.validateMagic
But those connections aren't supposed to ever terminate unless a node
dies or is partitioned. So if we "fix" it by adding a socket.close I
worry that we're covering up something more important.
On Wed, Apr 21, 2010 at 8:53 PM, Ingram Chen wrote:
> I agree your point. I patch the code and log mor
Nicolas,
Were all of those super column writes going to the same row?
http://wiki.apache.org/cassandra/CassandraLimitations
Thanks,
Stu
-----Original Message-----
From: "Nicolas Labrot"
Sent: Wednesday, April 21, 2010 11:54am
To: user@cassandra.apache.org
Subject: Re: Cassandra tuning for runn
Argh! That's right.
I checked OutboundTcpConnection and it only does closeSocket() after something
goes wrong. I will log more in OutboundTcpConnection to see what actually
happens.
Thanks for your help.
On Thu, Apr 22, 2010 at 10:03, Jonathan Ellis wrote:
> But those connections aren't supposed to
It isn't very well documented apparently, but if you are using 0.6, you can
look at the 'Authenticator' property in the default config for an explanation
of how to authenticate users.
With the SimpleAuthenticator implementation, there are properties files that
define your users and passwords, a
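For illustration (a sketch based on the stock 0.6 setup; file names are the
defaults, adjust for your install), in storage-conf.xml:

    <Authenticator>org.apache.cassandra.auth.SimpleAuthenticator</Authenticator>

and then point the JVM at the two properties files, e.g.:

    JVM_OPTS="$JVM_OPTS -Dpasswd.properties=conf/passwd.properties -Daccess.properties=conf/access.properties"

where passwd.properties holds user=password lines and access.properties lists
which users may use each keyspace.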
I am using PHP as a client to talk to the Cassandra server, but I found that if
any column value is > 8192 bytes, the client crashes with the following error:
PHP Fatal error: Uncaught exception 'TException' with message 'TSocket:
> timed out reading 1024 bytes from 10.0.0.177:9160' in
> /home/phpcassa/incl
After many attempts I found this error only occurs when using the PHP
thrift_protocol extension. I don't know if there are parameters I
could adjust for this issue. By the way, without the extension the speed is
obviously slow.
On Thu, Apr 22, 2010 at 12:01 PM, Ken Sandney wrote:
> I am using
>
> Ideally I think we'd like to leave the node up to serve reads, if a
> disk is erroring out on writes but still read-able. In my experience
> this is very common when a disk first begins to fail, as well as in
> the "disk is full" case where there is nothing actually wrong with the
> disk per
Hi guys! I'm brand new to Cassandra, and I'm working on a database design.
I don't necessarily know all the advantages/limitations of Cassandra, so I'm
not sure that I'm doing it right...
It seems to me that I can divide my database into two parts:
1. The (mostly) normal data, where every piece