When creating a new CF, compression is now in fact enabled by default.
On Sat, Mar 17, 2012 at 5:50 AM, R. Verlangen wrote:
> Check your log for messages about rebuilding indices: that might grow your
> dataset some.
>
> One thing is for sure: the data import removed all the crap that lasted in
We do not use Cassandra for search. We made modifications to Lucene.
Here is a blog post on our engineering section that talks about what we did:
http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html
On Sun, Mar 18, 2012 at 11:22 PM, Tharindu Mathew wrote:
> Sasha,
>
No. We built a pluggable cache provider for memcache.
On Sun, Oct 30, 2011 at 7:31 PM, Mohit Anchlia wrote:
> On Sun, Oct 30, 2011 at 6:53 PM, Chris Goffinet
> wrote:
> >
> >
> > On Sun, Oct 30, 2011 at 3:34 PM, Sorin Julean
> > wrote:
> >>
> >>
run memcache on each node
and allocate the remaining to that.
> 2. What your network speed ? Do you use trunks ? Do you have a dedicated
> VLAN for gossip/store traffic ?
>
> No dedicated VLAN for gossip. We run at 2Gb/s. We have bonded NIC's.
> Cheers,
> Sorin
>
>
RE: RAID0 Recommendation
Cassandra supports multiple data file directories. Because we do
compactions, it's just much easier to deal with (1) data file directory
that is striped across all disks as 1 volume (RAID0). There are other ways
to accomplish this though. At Twitter we use software raid (
My best advice on this is, insert a bit of data into the tree, and then do a
heap dump to calculate the extra overhead. It's unfortunately more than you
would like from our testing.
On Tue, Oct 18, 2011 at 8:14 PM, Todd Nine wrote:
> Hi guys,
> We've just built a K tree implementation in
At the time of that project, there weren't enough resources or a dedicated
team. Since then we changed that (based on the presentation I gave). We
decided to focus on other areas, and newer projects. We spent a lot of time
with the community improving failure conditions, performance, etc. We chose
to
g 5 big column families, if I create say 1000 each (5000
> total), do you think it will help me in this case? ( smaller files and so
> smaller load on compaction )
>
> Is it normal to have 5000 column families?
>
> thanks
> Ramesh
>
>
>
> On Mon, Oct 3, 2011 at 2:50 PM
If he puts the mx4j jar (http://mx4j.sourceforge.net/) in his lib/ folder,
he can fetch stats out over HTTP. mx4j is a bridge for JMX->HTTP.
On Mon, Oct 3, 2011 at 2:53 AM, aaron morton wrote:
> Other than manually pull them from JMX, not really.
>
> Most monitoring templates will grab those sta
Most likely what could be happening is you are running single threaded
compaction. Look at cassandra.yaml for how to enable multi-threaded
compaction. As more data comes into the system, bigger files get created
during compaction. You could be in a situation where you might be compacting
at a hi
You could tail the commit log with `strings` to see what keys are being
inserted.
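A rough Python equivalent of what `strings` does here, as a sketch only: it pulls runs of printable ASCII out of raw bytes, which is how row keys become visible in a binary commit log. The function name and the 4-character minimum are illustrative assumptions (the minimum mirrors the usual `strings` default); nothing here parses the actual commit log format.

```python
import re
from typing import List


def printable_runs(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII at least min_len characters long."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    # Hypothetical bytes standing in for a chunk of a commit log segment.
    sample = b"\x00\x01user:1234\x00\xffab\x00row_key_42\x02"
    for run in printable_runs(sample):
        print(run)
```

Short runs such as `ab` are dropped, which keeps most binary noise out of the output.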
On Sat, Sep 10, 2011 at 2:24 PM, Jonathan Ellis wrote:
> Two possibilities:
>
> 1) Hinted handoff (this will show up in the logs on the sending
> machine, on the receiving one it will just look like any other write
For things like rolling restarts, we do:
disablethrift
disablegossip
(...wait for all nodes to see this node go down..)
drain
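The sequence above could be scripted roughly as follows. This is a sketch under assumptions: the helper names are made up, and a fixed sleep stands in for "wait for all nodes to see this node go down" (a real script would check ring state from another node instead). Only the three nodetool commands come from the message itself.

```python
import subprocess
import time
from typing import Callable, List

# Commands issued before waiting for the rest of the ring to notice.
STEPS_BEFORE_WAIT = ("disablethrift", "disablegossip")


def nodetool(host: str, command: str) -> List[str]:
    """Build the argv for one nodetool invocation against host."""
    return ["nodetool", "-h", host, command]


def prepare_for_restart(
    host: str,
    settle_seconds: float = 30.0,  # assumption: placeholder for a real liveness check
    run: Callable[[List[str]], object] = subprocess.check_call,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Stop client and gossip traffic, wait, then drain the node."""
    for command in STEPS_BEFORE_WAIT:
        run(nodetool(host, command))
    sleep(settle_seconds)
    run(nodetool(host, "drain"))
```

Passing `run` and `sleep` as parameters keeps the ordering easy to check without touching a live node.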
2011/9/10 Radim Kolar
> what is recommended node stop method. drain or kill Java process? i haven't
> seen anybody using drain in stop scripts yet
>
> If i kill Java pro
Twitter runs 0.8 in production/closer to trunk. No big issues from us.
On Thu, Sep 8, 2011 at 8:53 PM, Eric Czech wrote:
> We just migrated from .7.5 to .8.4 in our production environment and it was
> definitely the least painful transition yet (coming all the way from the .4
> release series).
It will also depend on how long you can handle recovery time. So imagine
this case:
3 nodes w/ RF of 3
Each node has 30TB of space used (you never want to fill up entire node).
If one node fails and you must recover, that will take over 3.6 days in
just transferring data alone. That's with a susta
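The arithmetic behind an estimate like this can be sketched in a few lines. The 100 MB/s rate below is an assumption chosen for illustration, not a figure recovered from the truncated sentence, and the function name is made up. With 1 TB taken as 1024 * 1024 MB, that rate gives about 3.64 days for 30 TB.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to stream data_tb terabytes at a sustained rate in MB/s."""
    data_mb = data_tb * 1024 * 1024  # binary units: 1 TB = 1024 * 1024 MB
    return data_mb / rate_mb_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed sustained rate of 100 MB/s for a 30 TB node.
    print(f"{transfer_days(30, 100):.2f} days")
```

Doubling the sustained rate halves the recovery time, so the per-node data size and the network budget have to be planned together.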
ong time
>> ago. Twitter is a vocal supporter with a large Apache Cassandra install,
>> e.g. "Twitter currently runs a couple hundred Cassandra nodes across a half
>> dozen clusters. "
>> http://www.datastax.com/2011/06/chris-goffinet-of-twitter-to-speak-at-cassa
We also have a ticket open at
https://issues.apache.org/jira/browse/CASSANDRA-2399
We have observed in production the impact of streaming data to new nodes being
added. We actually have our entire dataset in page cache in one of our
clusters, our 99th percentiles go from 20ms to >1 second on s
Congratulations Sylvain!
On Mon, Mar 28, 2011 at 2:56 PM, Edward Capriolo wrote:
> Congratulations Sylvain!
>
> On Mon, Mar 28, 2011 at 4:33 PM, Jonathan Ellis wrote:
> > The Cassandra PMC has voted to add Sylvain as a committer.
> >
> > Welcome, Sylvain, and thanks for the hard work!
> >
> > --
The easiest way to get memlock to work after putting the jna jar in your
classpath is just run this before:
ulimit -l unlimited
in your init script or before starting cassandra. The default for max locked
memory is 32KB on older kernels, and 64KB on newer ones.
-Chris
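To see what the max locked memory limit actually is for a process, a small check like the following can help. It is a sketch using Python's standard `resource` module (Unix only); the helper names are illustrative. `getrlimit` reports the limit in bytes.

```python
import resource
from typing import Tuple


def memlock_limit() -> Tuple[int, int]:
    """Return the (soft, hard) max locked memory limits in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def describe(limit: int) -> str:
    """Render a limit the way a person would read it."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KB"


if __name__ == "__main__":
    soft, hard = memlock_limit()
    print("soft:", describe(soft), "hard:", describe(hard))
```

Limits are inherited by child processes, so the value that matters is the one in effect in the shell or init script that launches the JVM.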
On Mar 22, 2011, at 12:5
-Dcassandra.join_ring=false
-Chris
On Mar 21, 2011, at 10:32 PM, Jason Harvey wrote:
> I set join_ring=false in my java opts:
> -Djoin_ring=false
>
> However, when the node started up, it joined the ring. Is there
> something I am missing? Using 0.7.4
>
> Thanks,
> Jason
How large are your SSTables on disk? My thought was because you have so many
on disk, we have to store the bloom filter + every 128th key from the index in
memory.
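A back-of-the-envelope way to see why many SSTables add up: count the index entries sampled at one key in every 128. This is a sketch only; the function names are made up, and it deliberately does not guess at bytes per entry or bloom filter size, which depend on key length and version.

```python
import math
from typing import Iterable

INDEX_INTERVAL = 128  # one sampled key per 128 index entries, per the message


def sampled_index_entries(keys_in_sstable: int, interval: int = INDEX_INTERVAL) -> int:
    """Number of index keys held in memory for one SSTable."""
    return math.ceil(keys_in_sstable / interval)


def total_sampled_entries(keys_per_sstable: Iterable[int]) -> int:
    """Sum of sampled index keys across all SSTables on a node."""
    return sum(sampled_index_entries(keys) for keys in keys_per_sstable)


if __name__ == "__main__":
    # Hypothetical node: 500 SSTables with one million keys each.
    print(total_sampled_entries([1_000_000] * 500))
```

The total grows with the number of SSTables as well as with the number of keys, since each file keeps its own sample and its own bloom filter.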
On Mon, Mar 7, 2011 at 4:35 PM, ruslan usifov wrote:
>
>
> 2011/3/8 Chris Goffinet
>
>> The rows you are inserting, w
The rows you are inserting, what is your update ratio to those rows?
On Mon, Mar 7, 2011 at 4:03 PM, ruslan usifov wrote:
>
>
> 2011/3/8 Chris Goffinet
>
> Can you tell me how many SSTables on disk when you see GC pauses? In your 3
>> node cluster, what's the RF f
Can you tell me how many SSTables are on disk when you see GC pauses? In your 3
node cluster, what's the RF?
On Mon, Mar 7, 2011 at 1:50 PM, ruslan usifov wrote:
>
>
> 2011/3/8 Jonathan Ellis
>
> It sounds like you're complaining that the JVM sometimes does
>> stop-the-world GC.
>>
>> You can
I would like to subscribe to your newsletter.
On Tue, Feb 15, 2011 at 8:04 AM, A J wrote:
>
>
Err. I meant, thanks Evan for getting this released so fast :)
On Fri, Jan 28, 2011 at 3:18 PM, Chris Goffinet wrote:
> +1
>
>
> On Fri, Jan 28, 2011 at 3:13 PM, Eric Evans wrote:
>
>>
>> It seems like it was just earlier this week that we announced the
>>
+1
On Fri, Jan 28, 2011 at 3:13 PM, Eric Evans wrote:
>
> It seems like it was just earlier this week that we announced the
> release of 0.6.10. Oh wait, it was. In the time since though,
> CASSANDRA-2058[1] was found and fixed, and that seemed like reason
> enough to fast-track a new release
emtabe thresholds?
> Using mlockall() ?
>
> There are a couple of issues listed in the first paragraphs here that at
> first glance may cause issues
> http://www.oracle.com/technetwork/java/javase/tech/largememory-jsp-137182.html
>
> cheers
> Aaron
>
> On 17/01/201
I've seen about a 13% improvement in practice.
-Chris
On Jan 16, 2011, at 4:01 PM, David Dabbs wrote:
> Hello.
>
> Can anyone comment on the performance impact (positive or negative)
> of running Cassandra configured to use large pages under Linux?
> Yes, YMMV applies, but I thought I'd ask be
What kernel version are you running? On I/O-intensive nodes running kernels
2.6.18 through 2.6.24, I have seen a kernel bug that locks the JVM and spins
to 100%.
On Mon, Dec 20, 2010 at 1:14 PM, Brandon Williams wrote:
> On Mon, Dec 20, 2010 at 2:13 PM, Dan Hendry wrote:
>
>> Yes, I have tried that (
You can disable compaction and enable it later. Use nodetool's
setcompactionthreshold command with the thresholds set to 0 0.
-Chris
On Dec 18, 2010, at 6:05 PM, Wayne wrote:
> Rereading through everything again I am starting to wonder if the page cache
> is being affected by compaction. We have been heavily loading data fo
If you are using Python, and raw Thrift, use the following:
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
The serialization/deserialization is done directly in C.
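For context, the line above fits into connection setup roughly like this. It is a sketch, assuming the Apache Thrift Python package is installed and the server expects a buffered (not framed) transport; the function name is illustrative, and 9160 is used as the default port because that is the port the Cassandra CLI connects to elsewhere in this thread.

```python
def open_accelerated_protocol(host: str, port: int = 9160):
    """Open a buffered Thrift transport and wrap it in the C-accelerated protocol."""
    # Imported lazily so this module can be loaded without thrift installed.
    from thrift.protocol import TBinaryProtocol
    from thrift.transport import TSocket, TTransport

    socket = TSocket.TSocket(host, port)
    # The accelerated protocol needs a transport it can read from in C,
    # which TBufferedTransport provides.
    transport = TTransport.TBufferedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
    transport.open()
    return transport, protocol
```

The returned protocol is what gets passed to the generated Thrift client class; the caller remains responsible for calling `transport.close()`.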
On Wed, Oct 20, 2010 at 11:53 AM, Wayne wrote:
> We did some testing and the object is 23megs that is taking mor
Digg is using redis for such a feature as well. We use it for MyNews - Top
in 24 Hours, since we need timestamp ordering + sorting by how many friends
touch a story.
-Chris
On Aug 15, 2010, at 7:34 PM, Benjamin Black wrote:
> http://code.google.com/p/redis/
>
> On Sat, Aug 14, 2010 at 11:
When you can't get the number of threads, that means you have way too many
running (8,000+) usually.
Try running `ps -eLf | grep cassandra`. How many threads?
-Chris
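The same count can be taken programmatically from `ps -eLf` output, which prints one line per thread. A minimal sketch with made-up helper names; the substring match is as blunt as the grep, so it counts any line mentioning the needle.

```python
import subprocess


def count_threads(ps_output: str, needle: str = "cassandra") -> int:
    """Count lines of `ps -eLf` output (one per thread) that mention needle."""
    return sum(1 for line in ps_output.splitlines() if needle in line)


def capture_ps() -> str:
    """Run `ps -eLf` and return its output (Linux procps syntax)."""
    return subprocess.check_output(["ps", "-eLf"], text=True)


if __name__ == "__main__":
    print(count_threads(capture_ps()))
```

Unlike the shell pipeline, this does not count the grep process itself, so the number can be slightly lower than what `ps -eLf | grep cassandra | wc -l` reports.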
On Jul 29, 2010, at 8:40 PM, Dathan Pattishall wrote:
>
> To Follow up on this thread. I blew away the data for my entire clust
Can you provide the output from `nodetool tpstats`?
-Chris
On Jul 20, 2010, at 8:59 PM, Dathan Pattishall wrote:
> Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
> cassandra> connect cass01/9160
> cassandra> get TimeFrameClicks.Standard2['test_cassandra_alive']
> Exception null
>
>
Not 100% relevant but I found this to be interesting if your nodes are doing
heavy disk I/O:
http://rackerhacker.com/2008/08/07/reduce-disk-io-for-small-reads-using-memory/
-Chris
On Jul 15, 2010, at 11:47 PM, Peter Schuller wrote:
>> This would require that Cassandra run as root on Linux sy
make
sure the city does not try to stipulate the classification as 'vending
machine'. We anticipate longer lines at places like Costco because of this ban.
All in all, we take this matter very seriously.
-Chris
On Jul 7, 2010, at 9:56 AM, Eric Evans wrote:
> On Wed, 2010-07-07 at
Hahaha.
Well.. I can comment that we do still have coke products, we have been doing
Costco runs of late, and now serve Mexican Coke in glass bottles. :-)
-Chris
On Jul 7, 2010, at 8:17 AM, Eric Evans wrote:
>
> I heard a rumor that Digg was moving away from Coca-Cola products in all
> of it
Digg is not forking Cassandra. We use 0.6 for production, with a few in-house
patches (related to our infrastructure). The biggest difference between our
branch and the Apache 0.6 branch is we have the work Kelvin and Twitter have
done in regards to Vector Clocks + Distributed Counters. This will never
We're seeing this as well. We were testing with a 40+ node cluster on the
latest 0.6 branch from a few days ago.
-Chris
On Jun 3, 2010, at 9:55 PM, Lu Ming wrote:
>
> I have ten 0.5.1 Cassandra nodes in my cluster, and I update them to
> cassandra to 0.6.2 yesterday.
> But today I find six cass
My money is on the fact that the serializer is just horribly verbose. It's
using a basic set of the java serializer.
-Chris
On Tue, May 25, 2010 at 10:02 AM, Ryan King wrote:
> Also, timestamps for each column.
>
> -ryan
>
> On Tue, May 25, 2010 at 5:41 AM, Jonathan Ellis wrote:
> > That's tr
If you are running multiple datacenters and intend to have a lot of writes for
counters, I highly advise against it. We got rid of ZK because of that.
-Chris
On May 16, 2010, at 7:04 PM, S Ahmed wrote:
> Can someone quickly go over how you go about using zookeeper if you want to
> store counts an
>
> So my question is: If I properly flush every node after performing a larger
> bulk insert, can Cassandra merge multiple writes on a single row & column
> family when using the BMT interface? Or is using BMT only feasible for
> loading data on rows that don't exist yet?
>
Yes. When you flu
0 0 5
> HINTED-HANDOFF-POOL 0 0 21
>
>
> On Tue, Apr 27, 2010 at 3:32 AM, Chris Goffinet wrote:
> Upgrade to b20 of Sun's version of JVM. This OOM might be related to
> LinkedBlockingQueue issues that were fixed.
>
>
ction is going
> on the individual node is I/O limited
> tpstats: caught me, didn't know this. I will set up a test and try to catch
> a node during the critical time.
>
> Thanks,
> Roland
>
>
> 2010/4/26 Chris Goffinet
>
> Which version of Cassandra?
>>
Which version of Cassandra?
Which version of Java JVM are you using?
What do your I/O stats look like when bulk importing?
When you run `nodeprobe -host tpstats` is any thread pool backing up
during the import?
-Chris
2010/4/26 Roland Hänel
> I have a cluster of 5 machines building a Cass
We don't use PHP to talk to Cassandra directly. But we do have the front-end
communicate to our backend services which are over Thrift. We've used Framed
and Buffered, both required some tweaks. We use the PHP C-extension from the
Thrift repo. I have to admit, it's pretty crappy, we had to make
I wonder if that might be related to this:
https://issues.apache.org/jira/browse/CASSANDRA-896
We switched from a Concurrent structure to LinkedBlockingQueue in 0.6.
-Chris
On Apr 17, 2010, at 9:26 PM, Schubert Zhang wrote:
> We are testing 0.6.0, compares with 0.5.1, and it seems:
>
> 1. 0.
interesting thanks!
>
> On Tue, Apr 6, 2010 at 12:54 PM, Chris Goffinet wrote:
> That's not true. We have been using the Zookeeper work we posted on jira.
> That's what we are using internally and have been for months. We are now just
> wrapping up our vector clocks
That's not true. We have been using the Zookeeper work we posted on jira. That's
what we are using internally and have been for months. We are now just wrapping
up our vector clocks + distributed counter patch so we can begin transitioning
away from the Zookeeper approach because there are proble
> wrote:
> >>>> I would turn debug logging on globally on the new node, that will
> >>>> answer more questions than just the streaming package.
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Dan Di Spaltro
> >>>
> >>
> >
> >
> >
> > --
> > Dan Di Spaltro
> >
>
--
Chris Goffinet
How many columns in each row?
-Chris
On Mar 31, 2010, at 2:54 PM, James Golick wrote:
> I just tried running the same multi_get against cassandra 1000 times,
> assuming that that'd force it in to cache.
>
> I'm definitely seeing a 5-10ms improvement, but it's still looking like
> 20-30ms on a
Awesome! 2 tickets left.
-Chris
On Mar 27, 2010, at 11:42 PM, Evan Weaver wrote:
> Me too.
>
> On Tue, Mar 23, 2010 at 12:48 PM, Jeff Hodges wrote:
>> I'll be there.
>> --
>> Jeff
>>
>> On Mon, Mar 22, 2010 at 8:40 PM, Eric Florenzano wrote:
>>> Nice, I'll go!
>>>
>>> -Eric Florenzano
>>
>
what's the ulimit set to?
-Chris
On Mar 27, 2010, at 10:29 AM, James Golick wrote:
> Hey,
>
> I put our first cluster in to production (writing but not reading) a couple
> of days ago. Right now, it's got two pretty sizeable nodes taking about 200
> writes per second each and virtually no rea
As promised, here is the official invite to register for the hackathon in
SF. The event starts at 6:30pm on April 22nd.
http://cassandrahackathon.eventbrite.com/
--
Chris Goffinet
.com/, I never did hear a
>> final date but they put up a schedule online (april 20-22).
>>
>> But, 22 probably is a better date, and Eric and Stu are fully capable
>> of representing rackspace without me. :)
>>
>> -Jonathan
>>
>> On Wed, Mar 10, 2010 a
's up with this? Thanks!
>
> --
> Toby DiPasquale
>
--
Chris Goffinet
We saw corruption pre 0.4 days. Digg hasn't seen corruption since that got
taken care of. We are only doing this for the "just in case the shit hits the
fan". Cassandra is rapidly changing and it would be completely careless of us
to forgo a path of using a new database as our primary datastore.
On Mar 20, 2010, at 9:10 AM, Jeremy Dunck wrote:
> On Sat, Mar 20, 2010 at 10:40 AM, Chris Goffinet wrote:
>>> 5. Backups : If there is a 4 or 5 TB cassandra cluster what do you
>>> recommend the backup scenarios could be?
>>
>> Worst case scenario
> 5. Backups : If there is a 4 or 5 TB cassandra cluster what do you recommend
> the backup scenarios could be?
For the worst-case scenario (total failure), we opted to do global snapshots
every 24 hours. This creates hard links to SSTables on each node. We copy those
SSTables to HDFS on a daily basis.
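The hard-link idea can be illustrated in a few lines of Python. This is a sketch of the mechanism only, not how Cassandra's own snapshot command is implemented: the function name and directory layout are assumptions, and it treats every `*.db` file in the directory as an SSTable component. The copy to HDFS is left out.

```python
import os
from pathlib import Path
from typing import List


def snapshot_sstables(data_dir: Path, snapshot_dir: Path) -> List[Path]:
    """Hard-link every *.db file in data_dir into snapshot_dir."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for sstable in sorted(data_dir.glob("*.db")):
        target = snapshot_dir / sstable.name
        # A hard link is a second name for the same inode: no data is copied,
        # and the snapshot survives when compaction deletes the original name.
        os.link(sstable, target)
        linked.append(target)
    return linked
```

Because SSTables are immutable once written, the links give a consistent point-in-time view at almost no cost, and the files can then be shipped off the node at leisure.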
> Also, Does cassandra support counters? Digg's article said they are going to
> contribute their work to open source any idea when that would be?
>
All of the custom work has been pushed upstream from Digg and continues. We
have a few operational tools we will be releasing that will go into co