On 2010-05-05 04:50, Denis Haskin wrote:
> I've been reading everything I can get my hands on about Cassandra and
> it sounds like a possibly very good framework for our data needs; I'm
> about to take the plunge and do some prototyping, but I thought I'd
> see if I can get a reality check here on whether it makes sense.
I've been reading everything I can get my hands on about Cassandra and
it sounds like a possibly very good framework for our data needs; I'm
about to take the plunge and do some prototyping, but I thought I'd
see if I can get a reality check here on whether it makes sense.
Our schema should be fai
I would lightly hack sstable2json to write rows to the other cluster,
instead of spitting them out as json. That would be a pretty simple
modification.
On Tue, May 4, 2010 at 9:21 PM, Joost Ouwerkerk wrote:
> I want to export data from one Cassandra cluster (production) to
> another (development).
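For concreteness, a minimal sketch of the import half of that hack, assuming
the stock 0.6 Thrift API; the host, keyspace, and column family names are
placeholders, and the code that decodes rows from the sstable (or its JSON
dump) is elided:

    // Sketch only: push one decoded row's columns into the target cluster
    // through the 0.6 Thrift client. Host, keyspace, and CF are made up.
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import java.util.Map;

    public class RowPusher {
        public static void pushRow(String key, Map<byte[], byte[]> columns)
                throws Exception {
            TSocket socket = new TSocket("dev-node", 9160); // hypothetical target
            Cassandra.Client client =
                    new Cassandra.Client(new TBinaryProtocol(socket));
            socket.open();
            for (Map.Entry<byte[], byte[]> col : columns.entrySet()) {
                ColumnPath path = new ColumnPath();
                path.setColumn_family("Standard1");         // hypothetical CF
                path.setColumn(col.getKey());
                // Microsecond timestamps are the usual convention.
                client.insert("Keyspace1", key, path, col.getValue(),
                              System.currentTimeMillis() * 1000,
                              ConsistencyLevel.QUORUM);
            }
            socket.close();
        }
    }

One insert per column keeps the sketch simple; batch_mutate (or batch_insert)
would cut round trips if volume matters.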
On Tue, May 4, 2010 at 8:09 PM, Weijun Li wrote:
> Does anyone use binary memtable to import data into Cassandra?
Yes.
> When you do
> this, how do you determine the destination node that should own the data?
You let the StorageProxy API figure that out.
> Is replication factor taken into consideration when you import binary memtable?
The Streaming service is what moves data around for load balancing,
bootstrap, and decommission operations.
On Tue, May 4, 2010 at 8:08 PM, Weijun Li wrote:
> A dumb question: what is the use of Cassandra streaming service? Any use
> case or example?
>
> Thanks,
> -Weijun
>
--
Jonathan Ellis
On Tue, May 4, 2010 at 4:55 PM, David Rosenstrauch wrote:
> I've had some neat ideas that I'd like to tinker with for a distributed DB
> that implements a very different data model than Cassandra. However, I
> obviously don't want to reinvent the wheel - particularly because in the
> case of dist
get_range_slices with an empty list of column names should work
On Tue, May 4, 2010 at 3:02 PM, Chris Dean wrote:
> I have a ColumnFamily with a small number of keys, but each key has a
> large number of columns.
>
> What's the best way to get just the keys back? I don't want to load all
> the columns if I don't have to.
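Spelled out, a sketch of that call against the 0.6 Thrift API; "client" is an
already-open Cassandra.Client, and the keyspace/CF names are placeholders.
An empty column_names list asks for zero columns per row, so only keys come back:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.cassandra.thrift.*;

    SlicePredicate predicate = new SlicePredicate();
    predicate.setColumn_names(new ArrayList<byte[]>()); // empty: keys only

    ColumnParent parent = new ColumnParent();
    parent.setColumn_family("Standard1");

    KeyRange range = new KeyRange();
    range.setStart_key("");
    range.setEnd_key("");
    range.setCount(100);                                // page size

    List<KeySlice> slices = client.get_range_slices(
            "Keyspace1", parent, predicate, range, ConsistencyLevel.ONE);
    for (KeySlice slice : slices)
        System.out.println(slice.getKey());

Note that under the RandomPartitioner the keys come back in token order, not
sorted key order.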
I have a situation where I need to accumulate values in Cassandra on an
ongoing basis. Atomic increments are still in the works apparently (see
https://issues.apache.org/jira/browse/CASSANDRA-721) so for the time being
I'll be using Hadoop, and attempting to feed in both the existing values and
th
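The interim pattern that message describes is a plain read-modify-write; a
sketch against the 0.6 Thrift API, safe only while a single writer owns each
key (e.g. one reducer per key in the Hadoop job). "client", "rowKey", and
"delta" are assumed to exist; the keyspace and CF names are placeholders:

    // Non-atomic accumulate: read the old value, add, write back.
    long current = 0;
    ColumnPath path = new ColumnPath();
    path.setColumn_family("Counters");              // hypothetical CF
    path.setColumn("total".getBytes());
    try {
        ColumnOrSuperColumn cosc =
                client.get("Keyspace1", rowKey, path, ConsistencyLevel.QUORUM);
        current = Long.parseLong(new String(cosc.getColumn().getValue()));
    } catch (NotFoundException e) {
        // first write for this key
    }
    client.insert("Keyspace1", rowKey, path,
                  Long.toString(current + delta).getBytes(),
                  System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);

Two concurrent writers would race here, which is exactly what CASSANDRA-721
is about.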
On Tue, May 4, 2010 at 8:50 PM, Jonathan Ellis wrote:
> Yes, although when and where are TBD.
>
Having it the day before/after the Velocity conference at the end of June
would be ideal (hint, hint). I'm sure a lot of people with interest
in Cassandra will be in the area.
Did removing Trove collections have a noticeable effect on performance
or memory use at the time?
On Tuesday, May 4, 2010, Avinash Lakshman wrote:
> Hahaha, Jeff - I remember scampering to remove those references to the Trove
> maps, I think around 2 years ago.
> Avinash
>
> On Tue, May 4, 2010
I want to export data from one Cassandra cluster (production) to
another (development). This is not a case of replication, because I
just want a snapshot, not a continuous synchronization. I guess my
options include 'nodetool snapshot' and 'sstable2json'. In our case,
however, the development cl
Ah! Thank you.
Explained better here:
http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency
On Tue, May 4, 2010 at 8:38 PM, Robert Coli wrote:
> On 5/4/10 7:16 AM, Jonathan Shook wrote:
>
>> I may be wrong here. Someone please correct me if I am.
>> ...
>>
On 5/4/10 7:16 AM, Jonathan Shook wrote:
I may be wrong here. Someone please correct me if I am.
...
The ability to set the replication factor on inserts and gets allows
you to decide when (if) and how much (little) to pay the price for
consistency.
You mean "Consistency Level", not "Replicati
Does anyone use binary memtable to import data into Cassandra? When you do
this, how do you determine the destination node that should own the data?
Is replication factor taken into consideration when you import binary
memtable?
Thanks,
-Weijun
A dumb question: what is the use of Cassandra streaming service? Any use
case or example?
Thanks,
-Weijun
Yes, although when and where are TBD.
On Tue, May 4, 2010 at 7:38 PM, Mark Greene wrote:
> Jonathan,
> Awesome! Any plans to offer this training again in the future for those of
> us who can't make it this time around?
> -Mark
>
> On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis wrote:
>>
>> I'll
Jonathan,
Awesome! Any plans to offer this training again in the future for those of
us who can't make it this time around?
-Mark
On Tue, May 4, 2010 at 5:07 PM, Jonathan Ellis wrote:
> I'll be running a day-long Cassandra training class on Friday, May 21.
> I'll cover
>
> - Installation and configuration
;) yeah, it was painful
On Tue, May 4, 2010 at 10:53 AM, Avinash Lakshman <
avinash.laksh...@gmail.com> wrote:
> Hahaha, Jeff - I remember scampering to remove those references to the
> Trove maps, I think around 2 years ago.
>
> Avinash
>
>
> On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher wrote:
You didn't miss anything. There aren't many .bat files yet.
On Tue, May 4, 2010 at 6:29 PM, Dop Sun wrote:
> Hi,
>
>
>
> As of 0.6.1, I don’t find sstable2json.bat. I don’t know if I missed
> anything?
>
>
>
> It would be good if we could have one, which can help import/export data in/out of a
> development machine.
Hi,
As of 0.6.1, I don't find sstable2json.bat. I don't know if I missed
anything?
It would be good if we could have one, which can help import/export data in/out of a
development machine.
Thanks,
Regards,
Dop
I've had some neat ideas that I'd like to tinker with for a distributed
DB that implements a very different data model than Cassandra. However,
I obviously don't want to reinvent the wheel - particularly because in
the case of distributed systems, the wheel is quite complicated and hard
to get
On Tue, May 4, 2010 at 4:17 PM, aaron wrote:
> I was noticing cases under the random partitioner where keys I expected to
> be returned
> were not. Can you give a little advice on the expected behaviour of
> get_range_slices
> with the RP and I'll try to write a JUnit for it. e.g. Is it essentiall
More insight for this sstable: the ArrayList for IndexSummary has 644195
entries, so the total number of entries for this sstable is 644195*128 ≈ 82 mil.
The problem is that the total number of bits for its BloomFilter (long[19400551]
inside BitSet) is 19400551*64 = 1241635264, which means each key is taking
~15 bits.
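For the record, the arithmetic behind those numbers: 644195 index entries *
128 keys/entry = 82,456,960 keys, and 19400551 longs * 64 bits/long =
1,241,635,264 bits, so 1,241,635,264 / 82,456,960 ≈ 15 bits per key.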
Thanks Jonathan.
After looking at the Lucandra code I realized my confusion has to do with
get_range_slices
and the RandomPartitioner. When I switched to the OPP I got the expected
behaviour.
I was noticing cases under the random partitioner where keys I expected to
be returned
were not.
On 4 May 2010 23:07, "Jonathan Ellis" wrote:
I'll be running a day-long Cassandra training class on Friday, May 21.
I'll cover
- Installation and configuration
- Application design
- Basics of Cassandra internals
- Operations
- Tuning and troubleshooting
Details at http://riptanobayarea20100521.eventbrite.com/
I'll be running a day-long Cassandra training class on Friday, May 21.
I'll cover
- Installation and configuration
- Application design
- Basics of Cassandra internals
- Operations
- Tuning and troubleshooting
Details at http://riptanobayarea20100521.eventbrite.com/
--
Jonathan Ellis
Project C
BloomFilter is not redundant, because it stores information about
_all_ keys, while the index summary stores only every 128th key.
On Tue, May 4, 2010 at 3:47 PM, Weijun Li wrote:
> Hello,
>
> We stored about 47mil keys in one Cassandra node and what a memory dump
> shows for one of the SSTableReaders:
Hello,
We stored about 47mil keys in one Cassandra node; here is what a memory dump
shows for one of the SSTableReaders:
SSTableReader: 386MB. Among this 386MB, IndexSummary takes about 231MB
but BloomFilter takes 155MB with an embedded huge array long[19.4mil].
It seems that BloomFilter is taking
I have a ColumnFamily with a small number of keys, but each key has a
large number of columns.
What's the best way to get just the keys back? I don't want to load all
the columns if I don't have to. There also aren't necessarily any column
names in common between the different rows.
Cheers,
Chri
I'm using Ubuntu 8.04 on 64-bit hosts on Rackspace Cloud.
I'm in the middle of repeating some perf tests, but so far, I get as-good or
slightly better read perf by using standard disk access mode vs mmap. So far,
consecutive tests are returning consistent numbers.
I'm not sure how to explain it...
It's a 64-bit host.
When I cancel mmap I see less memory used and zero swapping, but it's slowly
growing, so I'll have to wait and see.
Performance isn't much better, not sure what's the bottleneck now (could
also be the application).
Now on the same host I see:
top - 15:43:59 up 12 days, 4:23, 1
Are you using 32-bit hosts? If not, don't be scared of mmap using a
lot of address space; you have plenty. It won't make you swap more
than using buffered i/o.
On Tue, May 4, 2010 at 1:57 PM, Ran Tavory wrote:
> I canceled mmap and indeed memory usage is sane again. So far performance
> hasn't been great, but I'll wait and see.
You could try mmap_index_only - this would restrict mmap usage to the
index files.
-Nate
On Tue, May 4, 2010 at 11:57 AM, Ran Tavory wrote:
> I canceled mmap and indeed memory usage is sane again. So far performance
> hasn't been great, but I'll wait and see.
> I'm also interested in a way to cap mmap so I can take advantage of it but
> not swap the host to death...
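For anyone hunting for the knob: this is the DiskAccessMode setting in
storage-conf.xml, e.g.

    <DiskAccessMode>mmap_index_only</DiskAccessMode>

The other accepted values are auto, mmap, and standard.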
On Tue, May 4, 2010 at 2:57 PM, Ran Tavory wrote:
> I'm also interested in a way to cap mmap so I can take advantage of it but
> not swap the host to death...
>
Isn't the point of mmap() to just directly access a file as if it were
memory? I can see how it would fool the reporting tools into thi
I canceled mmap and indeed memory usage is sane again. So far performance
hasn't been great, but I'll wait and see.
I'm also interested in a way to cap mmap so I can take advantage of it but
not swap the host to death...
On Tue, May 4, 2010 at 9:38 PM, Kyusik Chung wrote:
> This sounds just like the slowness I was asking about in another thread.
This sounds just like the slowness I was asking about in another thread - after
a lot of reads, the machine uses up all available memory on the box and then
starts swapping.
My understanding was that mmap helps greatly with read and write perf (until
the box starts swapping I guess)...is there
Hahaha, Jeff - I remember scampering to remove those references to the Trove
maps, I think around 2 years ago.
Avinash
On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher wrote:
> Hey,
>
> History repeating itself a bit, here: one delay in getting Cassandra into
> the open source world was removing its use of the Trove collections library.
We went through this with Ode w.r.t. Hibernate. Note that Ode still ships with
Hibernate support there, just not with Hibernate libraries in the distribution
or with a strong dependence on Hibernate.
So, if you made Trove maps optional and provided an adapter, you'd be OK. You
just can't bun
On May 4, 2010, at 6:24 PM, Tatu Saloranta wrote:
> But of course Apache can impose their own, however misguided silly
> rules on projects under their umbrella. :-)
I smell an -ac'esque patch to Cassandra brewing. ;)
--Joe
Oh boy... that stupid, stupid bickering about the true nature of the LGPL.
Both the Apache Foundation and the FSF came across like little kids arguing over
whose dad is stronger (this was a few years back, when it was discussed
whether LGPL components could be used for Apache License projects)
Almost made me explicitly
1. When you initially start up your nodes, plan the InitialToken of each
node evenly (see the note after this message).
2. standard
On Tue, May 4, 2010 at 9:09 PM, Boris Shulman wrote:
> I think that the extra (more than 4GB) memory usage comes from the
> mmaped io, that is why it happens only for reads.
>
> On Tue, May 4, 20
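For reference, the token is set per node in storage-conf.xml and is only
honored the first time a node starts; the value below is just an example
(the midpoint of the RandomPartitioner range, i.e. 2^126, for node i=1 of 2):

    <InitialToken>85070591730234615865843651857942052864</InitialToken>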
Thanks Jonathan!
Yeah, I will just wait until we are ready for the upgrade and hold off on that
project for now.
Erik
One would use batch processes (e.g. through Hadoop) or client-side
aggregation, yes. In theory it would be possible to introduce runtime
sharding across rows into the Cassandra server side, but it's not part
of its design.
In practice, one would want to model their data such that the 'row h
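A sketch of what that client-side sharding can look like, with an
application-defined key convention (all names here are hypothetical):

    // Client-side row sharding: spread one logical day over a fixed
    // number of physical rows, so no single row grows without bound.
    static final int SHARDS_PER_DAY = 4;

    static String rowKeyFor(String day, String columnName) {
        int shard = (columnName.hashCode() & 0x7fffffff) % SHARDS_PER_DAY;
        return day + ":" + shard;   // e.g. "2010-05-04:2"
    }

Reads then fan out over "2010-05-04:0" .. "2010-05-04:3" and merge the
slices client-side.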
Thanks for the prompt reply.
As per your reply, my configuration should be like:
Node 1: Configuration
43.193.211.215 43.193.213.160
Node 2: Configuration
43.193.211.215 43.193.213.160
About replication: in my case it should be 2, as I have two cluster nodes. Am I
right? In C
I may be wrong here. Someone please correct me if I am.
There may be a race condition if you aren't increasing your replication
factor.
If you insert to node A with replication factor 1, and then get from node B
with replication factor 1, it should be possible (and even more likely in
uneven loading
> All other parameters are identical on both servers. I have added some data
> from both nodes,
> but I am confused about which node the data is stored on. Does it get stored on both nodes,
> or only on the one node where it was added? I can retrieve data
> from both nodes,
> but sometimes I cannot. Not sur
Hi,
I am very new to Cassandra 0.6.1. I have set up two nodes on two different
servers. I would like to know how data distribution and replication work.
Node 1 IP: 43.193.211.215
Node 2 IP: 43.193.213.160
Node 1: Configuration 43.193.211.215
Node 2: Configuration 43.193.213.160
4
LGPL is listed as one of the forbidden licenses for Apache projects
(see Excluded Licenses in http://www.apache.org/legal/3party.html)...
On Tue, May 4, 2010 at 12:34 PM, Jeff Hammerbacher wrote:
> Hey,
>
> History repeating itself a bit, here: one delay in getting Cassandra into
> the open source world was removing its use of the Trove collections library.
I think that the extra (more than 4GB) memory usage comes from the
mmaped io, that is why it happens only for reads.
On Tue, May 4, 2010 at 2:02 PM, Jordan Pittier wrote:
> I'm facing the same issue with swap. It only occurs when I perform read
> operations (writes are very fast :)). So I can't help you with the memory
> problem.
Hi Miguel,
I'd like to ask: is it possible to have runtime sharding of rows in
Cassandra, i.e. if a row has too many new columns inserted, then create
another row (let's say if the original time-sharding is one day per row,
then we would have two rows for that day). Maybe batch processes could
2. I have used the same configuration (3 machines with 4GB RAM) and I
got an out-of-memory error on compaction each time when trying to compact 4
x 128MB sstables. Tried different configurations, incl. Java opts, with the same
result. When I used a 16GB RAM machine, everything worked like a charm.
On 04.05
If R + W > N, where R, W, and N are respectively the read replica count, the
write replica count, and the replication factor, all client reads will see
the most recent write.
On Tue, May 4, 2010 at 4:39 PM, vineet daniel wrote:
> Hi
>
> In a Cassandra cluster, if we are updating any key/value and perform the
> fetch query on that same key, we get old/stale data.
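Concretely: with replication factor N = 3, writing at QUORUM (W = 2) and
reading at QUORUM (R = 2) gives 2 + 2 > 3, so every read overlaps every write
on at least one replica and sees the latest value. W = 1 and R = 1 gives
1 + 1 < 3; no overlap is guaranteed, hence the possibility of stale reads.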
Hi
In a Cassandra cluster, if we are updating any key/value and perform the
fetch query on that same key, we get old/stale data. This can be because of
Read Repair.
Is there any way to fetch the latest updated data from the cluster, as the old
data has no significance and showing it to the client is
I'm facing the same issue with swap. It only occurs when I perform read
operations (writes are very fast :)). So I can't help you with the memory
problem.
But to balance the load evenly between nodes in a cluster, just manually fix
their tokens (the "formula" is i * 2^127 / nb_nodes).
Jordzn
On Tue,
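A small helper for that formula (a sketch; run it once and paste each value
into the matching node's InitialToken):

    import java.math.BigInteger;

    public class TokenGen {
        public static void main(String[] args) {
            int nbNodes = Integer.parseInt(args[0]);
            BigInteger range = BigInteger.ONE.shiftLeft(127); // 2^127
            for (int i = 0; i < nbNodes; i++)
                // token(i) = i * 2^127 / nb_nodes
                System.out.println(range.multiply(BigInteger.valueOf(i))
                                        .divide(BigInteger.valueOf(nbNodes)));
        }
    }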
Hey,
History repeating itself a bit, here: one delay in getting Cassandra into
the open source world was removing its use of the Trove collections library,
as the license (LGPL) is not compatible with the Apache 2.0 license.
Later,
Jeff
On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta wrote:
>
As you haven't specified all the details pertaining to filters and your data
layout (structure), at a very high level what I can suggest is that you need
to create a separate CF for each filter.
On Sat, May 1, 2010 at 5:04 PM, Rakesh Rajan wrote:
> I am evaluating cassandra to implement activity
Reduce GCGraceSeconds in storage-conf.xml; that should work.
On Tue, May 4, 2010 at 2:31 PM, vineet daniel wrote:
> Only major compactions can clean out obsolete tombstones.
>
> On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis wrote:
>
>> On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami
>> wrote:
>> >
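For reference, the setting lives in storage-conf.xml; the default of 864000
seconds is ten days:

    <GCGraceSeconds>3600</GCGraceSeconds>

Just don't lower it below the time it takes to run repair on every node, or
deleted data can resurrect.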
Only major compactions can clean out obsolete tombstones.
On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis wrote:
> On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami
> wrote:
> > Let me rephrase my question.
> >
> > How does Cassandra deal with bloom filter's false positives on deleted
> records?
>
:) I think this is simpler and I am just stupid
I retried with clean data and commit log directories and everything works
well.
I must have missed something (maybe when I upgraded from 0.5.1 to 0.6), but
anyway, I am just testing.
On Tue, May 4, 2010 at 8:47 AM, Jonathan Shook wrote:
> I