This is my first email, just started with Cassandra. I think you want to use
the Mutation object or something like it: "The Mutation object can also be used to
create new Columns or to delete a Column. If you supply it with a key that
doesn't exist in the database it will create it; if it detects that
The first option: the coordinator node takes care of sending the work to the other nodes. It will return to you when the write has been acknowledged by the number of nodes you specify in the consistency level.
Aaron
On 07 Jul, 2010, at 06:06 PM, ChingShen wrote: Thanks aaron morton, I have an anoth
Is there any strategy for using OPP with a hash algorithm on the client side
to get both uniform distribution of data in the cluster *and* the ability to
do range queries?
I'm thinking of something like this:
cassKey = (key % 97) + "@" + key;
cassRange = 0 + "@" + range; 1 + "@" + range; ... 96
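To make the idea concrete, here is a minimal sketch of the proposed scheme (Python; the 97-bucket count is taken from the example above, and the bucket prefix is zero-padded here, an assumption, so that OPP's lexicographic ordering matches the numeric bucket order):

```python
# Client-side bucketing for the OrderPreservingPartitioner: spread keys
# uniformly across a fixed number of buckets, and fan each range query
# out once per bucket. Keys are assumed to be numeric.

BUCKETS = 97

def bucketed_key(key):
    """Prefix the key with its bucket so rows spread across the ring.
    Zero-padding keeps lexicographic order consistent under OPP."""
    return "%02d@%d" % (key % BUCKETS, key)

def range_queries(start, end):
    """A single logical range query becomes one range query per bucket;
    the client merges the 97 result sets afterwards."""
    return [("%02d@%d" % (b, start), "%02d@%d" % (b, end))
            for b in range(BUCKETS)]
```

The trade-off is exactly the one asked about: uniform distribution at the cost of N-way fan-out on every range query.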
Hi,
We are testing Cassandra here. We would like to use it to store some data:
- about 1000 inserts / second in a CF "RAW":
Column : TimeUUID (timeuuid of the insert, so 1000 new columns / second)
Row : MMDDHH of the insert (to minimize the size of rows; the biggest
one is 2GB of data),
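As an illustration of that layout, a sketch of how the row key and column name might be generated on the client (Python; function names are hypothetical, the MMDDHH format is the one described above):

```python
import datetime
import uuid

def raw_row_key(ts):
    """Row key for the "RAW" CF: the insert hour bucketed as MMDDHH,
    so each row holds at most one hour of inserts."""
    return ts.strftime("%m%d%H")

def raw_column_name():
    """Column name: a version-1 (time-based) UUID, matching Cassandra's
    TimeUUIDType comparator so columns sort by insert time."""
    return uuid.uuid1()
```

At 1000 inserts/second this yields ~3.6M columns per row per hour, which is where the 2GB-row concern above comes from.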
That pattern is discussed here: http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
It's also used in http://github.com/tjake/Lucandra
You can do range queries with the Random Partitioner in 0.6.*; the order of the return is undefined and it's a bit slower
Hi all,
In my little cluster, after I insert many, many records into Cassandra, there
are hundreds of commit log files in the commitlog directory.
Is it normal?
I read the source code of the commitlog; there shouldn't be so many commitlog
files.
Any clue will be appreciated.
--
Best Regards
Anty R
Nice to hear, 150 nodes is quite a lot. I have another question on the
topic: I've read that most of the data in Facebook is stored as
key=>value pairs which are cached in a memcached layer and then stored
in mysql as simple key-value pairs for persistence (so no relations
in mysql). Are you still
Aaron, thank you for the link.
What is discussed there is not exactly what I am thinking of. They propose
distributing the keys with . - which will distribute
the values in a way that cannot easily be reversed. What I am proposing is
to distribute the keys evenly among N buckets, where N is much l
http://wiki.apache.org/cassandra/FAQ#a_long_is_exactly_8_bytes
/**
 * Takes a PHP integer and packs it to a 64-bit (8 byte) long big-endian
 * binary representation.
 * @param $x integer
 * @return string eight-byte-long binary representation of the integer
 * in big endian
Hello.
I added that code and it works on our x86_64 Intel machine (just
tested with your test.php). What environment are you using? I haven't
tested the code on a 32-bit machine and I believe that it will not work
there. I should probably have added a note to the wiki that it won't
work on 32-bit
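For comparison, the same packing is a one-liner in Python via struct, and behaves identically on 32-bit and 64-bit platforms (a sketch; Cassandra just sees the raw 8 bytes, in network byte order as the wiki describes):

```python
import struct

def pack_long_be(x):
    """Pack an integer into an 8-byte big-endian (network order) signed
    long -- the representation the FAQ entry above describes for a
    Cassandra long."""
    return struct.pack(">q", x)
```

Here `>` forces big-endian with no padding and `q` is a signed 64-bit integer, so the result is exactly 8 bytes regardless of the host's word size.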
On Tue, Jul 6, 2010 at 10:05 PM, ChingShen wrote:
> Hi all,
>
> I have A, B, C, D, E, F and G nodes(RF=3), if I run a write
> operation(CL=ALL) on "A" node(coordinator), and the key range between A and
> B, therefore, the data will be stored on B, C and D nodes, right?
depends on replication st
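A sketch of why B, C and D would get the data in that example: with the 0.6 default placement (RackUnawareStrategy), the first replica is the node whose token is at or after the key's token, and the remaining RF-1 replicas are the next nodes clockwise on the ring. A toy model (node names and token values are the example's, not real tokens):

```python
import bisect

def replicas(ring, key_token, rf):
    """ring: sorted list of (token, node) pairs. Returns the rf nodes
    storing the key: the first node at/after key_token, then its
    successors clockwise (RackUnawareStrategy-style placement)."""
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    # Wrap around to the first node if the key token is past the last token.
    i = bisect.bisect_left(tokens, key_token) % len(ring)
    return [ring[(i + j) % len(ring)][1] for j in range(rf)]
```

With seven nodes A..G and a key whose token falls between A and B, the replicas come out as B, C, D, matching the question above. Rack- or datacenter-aware strategies pick differently, hence the "depends on replication strategy" answer.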
the thrift api allows you to optionally specify column and subcolumn
as well. no idea how or if phpCassa exposes this though.
On Wed, Jul 7, 2010 at 1:51 AM, Moses Dinakaran
wrote:
> Hi,
>
> Thanks for the reply,
>
> The remove method
> $cassandraInstance->remove('cache_pages_key_hash', 'hash_1'
commitlogs can be removed after _all_ the CFs they have data for have
been flushed.
On Wed, Jul 7, 2010 at 5:21 AM, Anty wrote:
> Hi all,
> In my little cluster, after I insert many, many records into Cassandra, there
> are hundreds of commit log files in the commitlog directory.
> Is it normal?
>
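A toy model of that rule, just to illustrate the bookkeeping (names are made up; the real logic lives in Cassandra's CommitLog segment handling):

```python
class Segment:
    """A commitlog segment tracks which CFs have unflushed data in it."""
    def __init__(self):
        self.dirty = set()

def on_write(segment, cf):
    """Every write marks its CF dirty in the current segment."""
    segment.dirty.add(cf)

def on_flush(segments, cf):
    """When a memtable for cf flushes, mark cf clean in every segment,
    then drop segments that no longer hold any unflushed data.
    Returns the surviving segments."""
    for seg in segments:
        seg.dirty.discard(cf)
    return [seg for seg in segments if seg.dirty]
```

So hundreds of lingering segments usually means some CF (even a rarely-written system one) still has unflushed data pinning them.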
Thanks Jonathan Ellis,
If so, how does hinted handoff work? I thought the coordinator node
would write the data to another node (e.g. node G); I'm confused.
Shen
> > Second, if
> > B node is down during the write operation, does it return failure(CL=ALL)
> to
> > user?
>
> yes
>
> --
> Jona
I heard a rumor that Digg was moving away from Coca-Cola products in all
of its vending machines and break rooms. Can anyone from Digg comment on
this?
My near-term beverage consumption strategy is based largely on my
understanding of Digg's, so if there has been a change, I may need to
r
>
> My near-term beverage consumption strategy is based largely on my
> understanding of Digg's, so if there has been a change, I may need to
> reevaluate.
>
Strategy? Care to elaborate?
Thanks,
Tom
does http://wiki.apache.org/cassandra/HintedHandoff help?
On Wed, Jul 7, 2010 at 10:16 AM, ChingShen wrote:
> Thanks Jonathan Ellis,
>
> If so, how does the hinted handoff work? I thought the coordinator node
> will write the data to another node(e.g. G node), I'm confused.
>
> Shen
>
>>
>> > S
Yes, I know. I only insert records into one CF.
When a memtable flush completes, the commitlog will check whether there are
obsolete commitlog segments.
I don't know why there are so many commitlog files out there.
Is there a possibility that too many memtables are waiting to flush,
which prevents m
On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans wrote:
>
> I heard a rumor that Digg was moving away from Coca-Cola products in all
> of its vending machines and break rooms. Can anyone from Digg comment on
> this?
>
> My near-term beverage consumption strategy is based largely on my
> understanding o
Dr. Pepper has recently been picked up by Coca Cola as well. I wonder if
the UnCola solutions like 7Up and Fanta are just a fad?
On Wed, Jul 7, 2010 at 10:50 AM, Mike Malone wrote:
> On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans wrote:
>
>>
>> I heard a rumor that Digg was moving away from Coca-C
Hi guys,
I have what may be a dumb question but I am confused by how much disk space is
being used by my Cassandra nodes. I have 10 nodes in my cluster with a
replication factor of 3. After I write 1,000,000 rows to the database (100kB
each), I see that they have been distributed very evenly,
number of memtables waiting to flush has a pretty low bound (# of data
file directories in 0.6.3)
did you check your log for exceptions?
On Wed, Jul 7, 2010 at 10:35 AM, Anty wrote:
> yes, i know. I only insert records into one CF.
>
> when a memtable flush complete, commitlog will check if the
Either way, they all seem to have decided to ship with a dependency on HFCS
instead of Sugar. Even though users seem to have a better experience with
Sugar, the cost benefit of using HFCS is worth the hit in user satisfaction.
On Jul 7, 2010, at 8:55 AM, Miguel Verde wrote:
> Dr. Pepper has rec
On Wed, Jul 7, 2010 at 8:55 AM, Miguel Verde wrote:
> Dr. Pepper has recently been picked up by Coca Cola as well. I wonder if
> the UnCola solutions like 7Up and Fanta are just a fad?
I'm on the fence. I mean, there's really nothing wrong with a nice cold Coke
to satiate your thirst. But we've
On Tue, Jul 6, 2010 at 8:13 PM, Matt Su wrote:
> This thread made us raise a concern: we chose Cassandra because
> FB, Twitter and Digg are using it, and we're doubting whether Cassandra is
> definitely trustable.
Whether Cassandra is definitely trustable is something that you have to
find out for yourself,
Hi Julie --
Keep in mind that there is additional data storage overhead, including
timestamps and column names. Because the schema can vary from row to row,
the column names are stored with each row, in addition to the data. Disk
space-efficiency is not a primary design goal for Cassandra.
Mason
> It runs correctly during several days. Last night, we started to have timeout
> exception on insert and high cpu load on all nodes.
>
> We stopped inserts. But the CPU remains high (without any insert or read).
Has data been written to the cluster faster than background compaction
is proceeding
> Keep in mind that there is additional data storage overhead, including
> timestamps and column names. Because the schema can vary from row to row,
> the column names are stored with each row, in addition to the data. Disk
> space-efficiency is not a primary design goal for Cassandra.
If the row'
We've been experiencing some cluster-wide performance issues if any single
node in the cluster is performing poorly. For example this occurs if
compaction is running on any node in the cluster, or if a new node is being
bootstrapped.
We believe the root cause of this issue is a performance optimiz
Having a few requests time out while the service detects badness is
typical in this kind of system. I don't think writing a completely
separate StorageProxy + supporting classes to allow avoiding this in
exchange for RF times the network bandwidth is a good idea.
On Wed, Jul 7, 2010 at 11:23 AM,
Coke sucks! Only drink it if you want to work hard for 20 minutes then crash.
I started a new cola that's already way better than Coke and it will solve all
your problems. I'm finalizing my results but so far I only need one drink per
WEEK!
On Jul 7, 2010, at 12:10 PM, Mike Malone wrote:
Hahaha.
Well.. I can comment that we do still have coke products, we have been doing
Costco runs of late, and now serve Mexican Coke in glass bottles. :-)
-Chris
On Jul 7, 2010, at 8:17 AM, Eric Evans wrote:
>
> I heard a rumor that Digg was moving away from Coca-Cola products in all
> of it
On Wed, Jul 7, 2010 at 11:33 AM, Jonathan Ellis wrote:
> Having a few requests time out while the service detects badness is
> typical in this kind of system. I don't think writing a completely
> separate StorageProxy + supporting classes to allow avoiding this in
> exchange for RF times the net
Is it true the Mexican engineers have managed to remove the dependency on
HCFS? That should spur uptake.
On Wed, Jul 7, 2010 at 11:45 AM, Chris Goffinet wrote:
> Hahaha.
>
> Well.. I can comment that we do still have coke products, we have been
> doing Costco runs of late, and now serve Mexica
On Wed, 2010-07-07 at 09:45 -0700, Chris Goffinet wrote:
> Well.. I can comment that we do still have coke products, we have been
> doing Costco runs of late, and now serve Mexican Coke in glass
> bottles. :-)
Interesting. What are your thoughts on the following, then?
http://www.boingboing.net/20
You know Eric, at Digg we are really concerned. Last week we called for a Town
Hall meeting to discuss this very issue.
Thankfully we do not technically have 'vending machines'. More like a kitchen
that houses what we call a 'refrigerator'. Our lawyers are checking in to make
sure the city does
Peter Schuller infidyne.com> writes:
> > Keep in mind that there is additional data storage overhead, including
> > timestamps and column names. Because the schema can vary from row to row,
> > the column names are stored with each row, in addition to the data. Disk
> > space-efficiency is not
> I am thinking that the timestamps and column names should be included in the
> column family stats, which basically says 300,000 rows that are 100KB each=30
> GB. My rows only have 1 column so there should only be one timestamp. My
> column name is only 10 bytes long.
>
> This doesn't explain w
On Wed, Jul 7, 2010 at 11:50 AM, Mason Hale wrote:
> I'm curious of what performance benefit is actually being gained from this
> optimization.
It's really pretty easy to saturate gigabit ethernet with a Cassandra
cluster. Multiplying traffic by roughly RF is definitely a lose.
> Sorry to keep
Could we conditionally use an MD5 request only if a node was in a different
zone/datacenter according to the replication strategy? Presumably the bandwidth
usage within a datacenter isn't a concern as much as latency.
-Original Message-
From: "Mason Hale"
Sent: Wednesday, July 7, 2010 1
I see the same thing here. I have tried to do some math including
timestamps, column names, keys and raw data, but in the end Cassandra reports
a cluster size 2 to 3 times bigger than the raw data. I am surely
missing something in my formula + I have a lot of free hard drive space, so
it's not
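For anyone trying this math, a back-of-envelope sketch of the per-row overhead (the constants are assumptions based on 0.6-era serialization: roughly 15 bytes of fixed per-column overhead for lengths, a delete flag, and the 8-byte timestamp, plus some row-header bookkeeping; they are illustrative, not exact):

```python
def estimated_sstable_bytes(rows, cols_per_row, name_len, value_len, key_len):
    """Rough on-disk estimate for a simple SSTable layout.
    Assumed constants: ~15 bytes of fixed serialization overhead per
    column (name/value lengths, delete flag, 8-byte timestamp) and
    ~10 bytes of per-row header bookkeeping."""
    per_column = 15 + name_len + value_len
    per_row = key_len + 10 + cols_per_row * per_column
    return rows * per_row

# Julie's case: 300,000 rows, one 100 KB column, 10-byte column name
```

For 100 KB values this overhead is well under 0.1%, so it cannot explain 30 GB of raw data occupying 106 GB; un-garbage-collected obsolete data and tombstones awaiting major compaction (raised elsewhere in the thread) is the likelier cause.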
On Wed, Jul 7, 2010 at 12:10 PM, Julie wrote:
> I am thinking that the timestamps and column names should be included in the
> column family stats, which basically says 300,000 rows that are 100KB each=30
> GB. My rows only have 1 column so there should only be one timestamp. My
> column name is
On Wed, Jul 7, 2010 at 8:50 AM, Mike Malone wrote:
> On Wed, Jul 7, 2010 at 8:17 AM, Eric Evans wrote:
>>
>> I heard a rumor that Digg was moving away from Coca-Cola products in all
>> of its vending machines and break rooms. Can anyone from Digg comment on
>> this?
>>
>> My near-term beverage co
On 7/7/10 10:10 AM, Julie wrote:
This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours
after all writes have completed. Compactions should be complete, no?
Is your workload straight INSERT or does it contain UPDATE and/or
DELETE? If your workload contains UPDATE/DELETE
Hi, my name is Larry and I am the Beverage Considerer for NUB Enterprises.
I am new to Coke and Digg, but I've already been reading Coke's nutrition
facts on the internet and even purchased a bottle the other day (it's
sitting in my office fridge right now). It looks like quite a promising
office
On Wed, 2010-07-07 at 10:28 -0700, Ryan King wrote:
> I can't really comment on specifics, but Twitter is more of a tea and
> coffee company.
No offense Ryan, but I don't think people actually take what Twitter
does into account when making important decisions like this.
--
Eric Evans
eev...@rac
Rob Coli digg.com> writes:
> Is your workload straight INSERT or does it contain UPDATE and/or
> DELETE? If your workload contains UPDATE/DELETE and GCGraceSeconds (10
> days by default) hasn't passed, you might have a non-trivial number of
> tombstone rows. Only major compactions clean up to
On Wed, Jul 7, 2010 at 2:27 AM, Michael Dürgner wrote:
> Have you done some testing with small nodes already? Because from what we
> saw trying to run IO bound services on small instances is, that their IO
> performance is really bad compared to other instance types as you can read
> in several b
Jonathan Ellis gmail.com> writes:
> On Wed, Jul 7, 2010 at 12:10 PM, Julie nextcentury.com>
wrote:
> >
> > This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours
> > after all writes have completed. Compactions should be complete, no?
>
> http://wiki.apache.org/cassandra/
I am also interested in seeing how Cassandra performs on
various virtual platforms.
On Wed, Jul 7, 2010 at 2:15 PM, Andrew Rollins wrote:
> On Wed, Jul 7, 2010 at 2:27 AM, Michael Dürgner wrote:
>
>> Have you done some testing with small nodes already? Because from what we
>>
Thanks, second funniest thing I've read this month!
On Tue, Jul 6, 2010 at 4:13 PM, Matt Su wrote:
> Thanks for all your guys’ information.
>
> This thread made us raise a concern: we chose Cassandra because
> FB, Twitter and Digg are using it, and we're doubting whether Cassandra is
> definitely
On Wed, Jul 7, 2010 at 1:22 PM, Julie wrote:
> Jonathan Ellis gmail.com> writes:
>
>> On Wed, Jul 7, 2010 at 12:10 PM, Julie nextcentury.com>
> wrote:
>> >
>> > This doesn't explain why 30 GB of data is taking up 106 GB of disk 24 hours
>> > after all writes have completed. Compactions should b
> I need to set my key cache size properly but am not sure how to set it if
> each
> key cached is stored in the key cache 2 or 3 times. I'd really appreciate
> any
> insight into how this works.
> Thanks!
> Julie
>
>
I actually still have this question but now have another, related one
Dear all,
I recently launched a Q+A site about scalability, and someone posted a
question about Cassandra:
http://scale.metaoptimize.com/questions/29/when-should-i-use-cassandra
As part of the launch of my site, I'm following up on questions by
inviting the communities behind the discussed tech
Hi,
If the reason for this happening is compaction,
changing the priority of the compaction thread might be effective
(Cassandra 0.6.3 offers this feature).
See also the 0.6.3 changelog.
2010/7/7 Olivier Rosello
> Hi,
>
> We are testing Cassandra here, we would like to use it to store some data :
>
Hi,
I just updated from 0.6.3 to trunk. With the removal of storage-conf.xml I am
looking to write my API calls to build up my schema, which I am fine with.
I have located the APIs, but figured that there might be a best practice for
doing this, in order to develop the code and populate data the must
Thx Jonathan.
On Wed, Jul 7, 2010 at 11:58 PM, Jonathan Ellis wrote:
> number of memtables waiting to flush has a pretty low bound (# of data
> file directories in 0.6.3)
Oh, I see.
> did you check your log for exceptions?
Yes, but no exceptions.
> On Wed, Jul 7, 2010 at 10:35 AM, Ant
you're not out of disk space, are you?
if not you could try restarting, that should clear them out if nothing else does
On Wed, Jul 7, 2010 at 8:07 PM, Anty wrote:
> Thx Jonathan.
>
>
> On Wed, Jul 7, 2010 at 11:58 PM, Jonathan Ellis wrote:
>>
>> number of memtables waiting to flush has a prett
Let's move this to the user@ list...
On Wed, 2010-07-07 at 16:32 -0700, Dave Viner wrote:
> After starting up my cluster, I see this one of the system.log :
>
>
> ERROR [GMFD:1] 2010-07-07 23:27:46,044 PropertyFileEndPointSnitch.java
> (line 91) Could not find end point information for /10.202
Hi all,
I found that the Cassandra paper (in section 5.7) mentions that "all system
control messages rely on UDP", but when I start up my cluster, I don't see
anything about UDP. Why?
TCP connections:
port 7000 = Gossip
port 9160 = Thrift service
port 8080 = JMX
Right?
Shen
On Thu, Jul 8, 2010 at 9:21 AM, Jonathan Ellis wrote:
> you're not out of disk space, are you?
No.
> if not you could try restarting, that should clear them out if nothing else
> does
Yes. I restarted the node, then the commitlogs were removed.
But recovering so many commitlogs takes so much ti
Because that part of the paper is no longer accurate.
On Wed, Jul 7, 2010 at 8:29 PM, ChingShen wrote:
> Hi all,
>
> I found the Cassandra paper(in 5.7 section) that mentioned "all system
> control messages rely on UDP", but when I start up my cluster, I haven't see
> any informations about UD
After more investigation, as well as a bunch of trial and error, here's what
seems to be happening.
1. The rack.properties file key values (the stuff before the =) must match
the toString() method of the InetAddress object for the host.
2. (In EC2) the InetAddress of a node *other* than the one yo
So, does it mean that only the CL=ZERO and CL=ANY support hinted handoff,
right?
Thanks.
Shen
On Wed, Jul 7, 2010 at 11:28 PM, Jonathan Ellis wrote:
> does http://wiki.apache.org/cassandra/HintedHandoff help?
>
> On Wed, Jul 7, 2010 at 10:16 AM, ChingShen
> wrote:
> > Thanks Jonathan Ellis,
>
Thanks,
yes, it works on x86_64.
My environment:
Darwin 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010;
root:xnu-1504.7.4~1/RELEASE_I386 i386
Is there any solution compatible with both 64-bit and 32-bit?
2010/7/7 Juho Mäkinen
> Hello.
>
> I added that code and it works on our x86_6
No, it means that HH writes don't count towards meeting the requested
ConsistencyLevel.
On Wed, Jul 7, 2010 at 9:36 PM, ChingShen wrote:
> So, does it mean that only the CL=ZERO and CL=ANY support hinted handoff,
> right?
>
> Thanks.
>
> Shen
>
> On Wed, Jul 7, 2010 at 11:28 PM, Jonathan Ellis w
hmm... I'm really confused.
The http://wiki.apache.org/cassandra/API document mentions that write
ConsistencyLevel=ANY will "Ensure that the write has been written to at least 1
node, including hinted recipients." I can't imagine this case. :(
If I have A,B,C and D nodes(RF=1), and write Consis
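One way to untangle the levels is to look at how many replica responses the coordinator blocks for before reporting success. A sketch (the QUORUM formula is standard majority arithmetic; ANY is the one level where a stored hint by itself counts as the response):

```python
def block_for(cl, rf):
    """Number of replica responses the coordinator waits for before
    returning success. Hinted writes do NOT count toward this total,
    except at ANY, where a stored hint alone can satisfy the write."""
    levels = {
        "ZERO": 0,            # fire and forget
        "ANY": 1,             # one response; a hint counts
        "ONE": 1,             # one real replica ack
        "QUORUM": rf // 2 + 1,
        "ALL": rf,
    }
    return levels[cl]
```

So with RF=1 and CL=ANY, a write can succeed even while the sole replica is down, because the hint stored on another node satisfies the single required response; at ONE and above, only real replica acks count.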
Hi all,
What is the recommended strategy for backing up the data stored inside
cassandra?
I realized that Cass. is a distributed database, and with a decent
replication factor, backups are "already done" in some sense. But, as a
relatively new user, I'm always concerned that the data is only wit
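A common baseline is nodetool's snapshot facility, sketched below (host, port, and paths are illustrative; snapshots are cheap hard links into the data directory, but copying them off the node is up to you):

```shell
# Take an on-disk snapshot on each node (hard links, nearly instant)
nodetool -h localhost -p 8080 snapshot

# Snapshots land under the data directory, e.g.:
#   <DataFileDirectory>/<Keyspace>/snapshots/<timestamp>/
# Copy that directory off-node (rsync, S3, tape...), then reclaim space:
nodetool -h localhost -p 8080 clearsnapshot
```

Replication protects you from node failure, not from an application bug or operator error that deletes data on every replica, which is why off-cluster snapshots are still worth having.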
Hi all,
If I want to add a new Keyspace, does it mean I have to distribute my
storage-conf.xml to whole nodes? and restart whole nodes?
Shen
> If I want to add a new Keyspace, does it mean I have to distribute my
> storage-conf.xml to whole nodes? and restart whole nodes?
I *think* that is the case in Cassandra 0.6, but I'll let someone else
comment. In trunk/upcoming 7 there are live schema upgrades that
propagate through the cluste
Here are my notes on how to make schema changes in 0.6:
# Empty the commitlog with "nodetool drain."
=> NOTE while this is running, the node will not accept writes.
# Shutdown Cassandra and verify that there is no remaining data in the
commitlog.
=> HOW to verify?
# Delete the sstable file