Re: Read IO

2013-02-20 Thread Peter Schuller
> Is this correct ? Yes, at least under optimal conditions and assuming a reasonably sized row. Things like read-ahead (at the kernel level) will play into it; and if your read (even if assumed to be small) straddles two pages you might or might not take another read depending on your kernel setti

Read IO

2013-02-20 Thread Kanwar Sangha
Hi - Can someone explain the worst-case IOPS for a read? No key cache, no row cache, sampling rate say 512. 1) Bloom filter will be checked to see existence of key (in RAM) 2) Index file sample (in RAM) will be checked to find approx. location in index file on disk 3) 1 IOPS

Re: Testing compaction strategies on a single production server?

2013-02-20 Thread aaron morton
I *think* it will work. The steps in the blog post to change the compaction strategy before RING_DELAY expires are there to ensure no sstables are created before the strategy is changed. But I think you will be venturing into uncharted territory where there might be dragons. And not the fun Disney

Re: Cassandra backup

2013-02-20 Thread aaron morton
You'll need to use two CFs to achieve that. Denormalising to support a workload like that is not a terrible idea. Depending on how big the 7-day hot set is, you may get some benefit from using a large row cache with one CF. Maybe worth doing some testing. Cheers - Aaron Morton F

Re: benchmark

2013-02-20 Thread Michael Kjellman
http://www.miraclelinux.com/jp/online-service/labs/pdf/zabbix-write-performance is a recent one that comes to mind. But that was just write performance. If you are really doing a case study you might want to do it yourself, in which case you can use the stress tool distributed with Cassandra to

benchmark

2013-02-20 Thread Sai Kumar Ganji
Hello, I am working on a case study to compare non-relational and relational databases, where I chose Cassandra and MySQL as my alternatives. So I need some benchmarks to run on Cassandra. Can you please point me to some? I am measuring scalability, performance and maintainability. #cassandra, #m

Re: load on cluster nodes

2013-02-20 Thread Hiller, Dean
Check the logs for compaction. CPU will go up while one node may be compacting and the other node is not compacting yet. Compaction can also last quite a long time in some cases. Also, you can do a jstack -l {pid} > thread.txt to do a thread dump to see what that node is doing and compare it

Re: cassandra vs. mongodb quick question(good additional info)

2013-02-20 Thread Wojciech Meler
You have 86400 seconds a day, so 42T could take less than 12 hours on a 10Gb link. On 19 Feb 2013 02:01, "Hiller, Dean" wrote: > I thought about this more, and even with a 10Gbit network, it would take > 40 days to bring up a replacement node if mongodb did truly have a 42T / > node like I had hear

Re: cassandra vs. mongodb quick question(good additional info)

2013-02-20 Thread Edward Capriolo
Write once and compact is generally a bad fit for very large datasets. It is like being able to jump 60 feet in the air, but your legs can not withstand 10 feet drops. http://wiki.apache.org/cassandra/LargeDataSetConsiderations On Wed, Feb 20, 2013 at 3:33 PM, Bryan Talbot wrote: > There seem

Re: cassandra vs. mongodb quick question(good additional info)

2013-02-20 Thread Bryan Talbot
There seem to be some data structures in cassandra which scale with the number of rows stored and consume in-jvm memory without bound (other than number of rows). Even with 1.2, I think that index samples are still kept in-jvm so you may need to tune index_interval. Unfortunately that is a global

Re: cassandra vs. mongodb quick question(good additional info)

2013-02-20 Thread Hiller, Dean
Heh, we just discovered that mistake a few minutes ago….thanks though. I am now wondering and may run a test cluster with a separate 6 nodes and test how compaction is on very large data sets and such. We have tons of research data that sits there so I am wondering if 20T / node is now feasibl

Re: cassandra vs. mongodb quick question(good additional info)

2013-02-20 Thread Bryan Talbot
This calculation is incorrect btw. 10,000 GB transferred at 1.25 GB / sec would complete in about 8,000 seconds, which is just 2.2 hours and not 5.5 days. The error is in the conversion (1hr/60secs), which is off by a factor of 60 (nearly 2 orders of magnitude) since (1hr/3600secs) is the correct conversion. -Bryan On
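The corrected arithmetic can be checked with a short sketch. The function name `transfer_hours` is made up for illustration, and the 1.25 GB/s rate assumes a fully saturated 10 Gbit/s link with no protocol or disk overhead, so real transfers would take longer.

```python
SECONDS_PER_HOUR = 3600  # the correct conversion; the mistaken one used 60

# Assumption: a 10 Gbit/s link running at full line rate, 8 bits per byte.
LINK_RATE_GB_PER_S = 10 / 8  # 1.25 GB/s


def transfer_hours(size_gb: float, rate_gb_per_s: float = LINK_RATE_GB_PER_S) -> float:
    """Return the ideal transfer time in hours for size_gb gigabytes."""
    seconds = size_gb / rate_gb_per_s
    return seconds / SECONDS_PER_HOUR


# 10,000 GB, the figure discussed in the thread.
ten_tb_hours = transfer_hours(10_000)

# 42 TB per node, the figure quoted for the replacement-node scenario.
forty_two_tb_hours = transfer_hours(42_000)

# Reproduce the mistake: dividing seconds by 60 and treating the result as hours.
wrong_days = (10_000 / LINK_RATE_GB_PER_S) / 60 / 24

print(f"10 TB: {ten_tb_hours:.1f} h")
print(f"42 TB: {forty_two_tb_hours:.1f} h")
print(f"10 TB with the 1hr/60secs slip: {wrong_days:.1f} days")
```

Under these idealized assumptions the 42 TB case also lands under 12 hours, which agrees with the estimate given elsewhere in the thread.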

very confused by jmap dump of cassandra

2013-02-20 Thread Hiller, Dean
I took this jmap dump of Cassandra (in production). Before I restarted the whole production cluster, I had some nodes running compaction and it looked like all memory had been consumed (kind of like Cassandra is not clearing out the caches or memtables fast enough). I am trying to still debug co

Data Model - Additional Column Families or one CF?

2013-02-20 Thread Adam Venturella
My data needs only require me to store JSON, and I can handle this in 1 column family by prefixing row keys with a type, for example: comments:{message_id} Where comments: represents the prefix and {message_id} represents some row key to a message object in the same column family. In this case c

If we Open Source our platform, would it be interesting to you?

2013-02-20 Thread Marcelo Elias Del Valle
Hello All, I’m sending this email because I think it may be interesting for Cassandra users, as this project makes heavy use of the Cassandra platform. We are strongly considering opening the source of our DMP (Data Management Platform), if it proves to be technically interesting to other develop

Re: Mutation dropped

2013-02-20 Thread Wei Zhu
What does rpc_timeout control? Only the reads/writes? How about other inter-node communication, like data streaming or merkle tree requests? What is a reasonable value for rpc_timeout? The default value of 10 seconds is way too long. What is the side effect if it's set to a really small number, sa

Re: File Store

2013-02-20 Thread Hiller, Dean
The Astyanax client also has an API to sort of stream a file in, and the file is written to various rows depending on size. Dean From: Kanwar Sangha <kan...@mavenir.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Wednesday,

File Store

2013-02-20 Thread Kanwar Sangha
Hi - I am looking for some input on file storage in Cassandra. Each file can range in size from 200 KB to 3 MB. I don't see any limitation on the column size. But would it be a good idea to store these files as binary in the columns? Thanks, Kanwar

Re: how to debug slowdowns from these log snippets-more info 2

2013-02-20 Thread Hiller, Dean
Oh, and my startup command that cassandra logged was a2.bigde.nrel.gov: xss = -ea -javaagent:/opt/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8021M -Xmx8021M -Xmn1600M -XX:+HeapDumpOnOutOfMemoryError -Xss128k And I remember from docs you don't want to go

SSTable Num

2013-02-20 Thread Kanwar Sangha
Hi - I have around 6TB of data on 1 node and the cfstats show 32 sstables. There is no compaction job running in the background. Is there a limit on the size per sstable ? Or will the sstable compaction continue and eventually we will have 1 file ? Thanks, Kanwar

Re: how to debug slowdowns from these log snippets-more info

2013-02-20 Thread Hiller, Dean
Here is the printout before that log, which is probably important as well... INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 3618 ms for 2 collections, 7038159096 used; max is 8243904512 INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 Status
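As a rough aid for reading the GCInspector line above, the sketch below pulls the numbers out of it and computes how full the heap was when the entry was logged. The regular expression and the variable names are illustrative assumptions that fit this one log line; other Cassandra versions may format the message differently.

```python
import re

# The GCInspector entry quoted in the message above.
LOG_LINE = (
    "INFO [ScheduledTasks:1] 2013-02-20 07:14:00,375 GCInspector.java (line 122) "
    "GC for ConcurrentMarkSweep: 3618 ms for 2 collections, "
    "7038159096 used; max is 8243904512"
)

# Assumed pattern, written to fit this one line only.
GC_PATTERN = re.compile(
    r"GC for (?P<collector>\w+): (?P<pause_ms>\d+) ms for (?P<collections>\d+) "
    r"collections, (?P<used>\d+) used; max is (?P<max>\d+)"
)

match = GC_PATTERN.search(LOG_LINE)
if match is None:
    raise ValueError("log line does not look like a GCInspector entry")

collector = match.group("collector")
pause_ms = int(match.group("pause_ms"))
collections = int(match.group("collections"))
used_bytes = int(match.group("used"))
max_bytes = int(match.group("max"))

# Fraction of the maximum heap reported as in use.
heap_fraction = used_bytes / max_bytes
# Average reported time per collection in this interval.
avg_pause_ms = pause_ms / collections

print(f"{collector}: {avg_pause_ms:.0f} ms per collection on average")
print(f"heap in use: {heap_fraction:.1%} of max")
```

For this entry the heap is roughly 85% full even though two ConcurrentMarkSweep collections were just reported, which may be worth checking against the timeouts described in the thread.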

how to debug slowdowns from these log snippets(we know the keys being accessed as well)

2013-02-20 Thread Hiller, Dean
Cassandra version 1.1.4 I captured all the logs of node causing timeouts (in a 6 node cluster). We seem to get these slowdowns every once in a while and it causes our whole website to be 10 times slower. Since PlayOrm actually logs the rows being accessed we know exactly which row the timeout

Re: UnavailableException() for keyspace

2013-02-20 Thread Abhijit Chanda
It looks like a keyspace creation issue. Can you paste the keyspace creation schema? On Wed, Feb 20, 2013 at 9:09 AM, Marcelo Elias Del Valle wrote: > Hello, > > I have a cluster with 3 nodes, all of them configured as seeds and > with correct listen_address. If I run nodetool ring on any