RE: How to store large columns?

2013-01-21 Thread Jason Brown
The reason for multiple keys (and, by extension, multiple columns) is to better distribute the write/read load across the cluster, as the keys will (hopefully) be distributed across different nodes. This helps to avoid hot spots. Hope this helps, -Jason Brown, Netflix …
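
To illustrate the point, a minimal, self-contained Java sketch (not from the thread): hashing per-chunk row keys MD5-style, as a RandomPartitioner-era cluster does, yields widely separated tokens, so the chunks land on different nodes. The "objectId:chunkIndex" key naming is a hypothetical scheme for illustration.

    import java.math.BigInteger;
    import java.security.MessageDigest;

    // Illustration only: per-chunk row keys hash to very different tokens,
    // so a RandomPartitioner-style ring places them on different nodes.
    public class ChunkTokenDemo {
        public static void main(String[] args) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (int i = 0; i < 4; i++) {
                String rowKey = "bigobject42:" + i; // one row key per chunk
                byte[] digest = md5.digest(rowKey.getBytes("UTF-8"));
                System.out.println(rowKey + " -> token " + new BigInteger(1, digest));
            }
        }
    }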

RE: CQL3 Frame Length

2013-01-21 Thread Pierre Chalamet
Hi, That's not a good reason IMHO. It would have been better to have chunks of data (like in the good old IFF file format): if the client is not able to read a chunk, it just skips it. And frankly, a few more bytes would not have killed us. As an example, request tracing was added …
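
A sketch of the IFF-style framing Pierre is describing (this is not the actual CQL3 binary protocol): each chunk carries a tag and a byte length, so a client that does not recognise a tag can skip it. The tag value and handlers below are hypothetical.

    import java.io.DataInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Length-prefixed chunks: unknown tags are skipped instead of
    // breaking the whole stream.
    public class ChunkReader {
        static void readChunks(InputStream in) throws IOException {
            DataInputStream din = new DataInputStream(in);
            while (din.available() > 0) { // simplistic end-of-stream check, fine for a sketch
                int tag = din.readInt();     // 4-byte chunk identifier
                int length = din.readInt();  // 4-byte payload size
                if (isKnown(tag)) {
                    byte[] payload = new byte[length];
                    din.readFully(payload);
                    handle(tag, payload);
                } else {
                    din.skipBytes(length);   // unknown chunk: skip it
                }
            }
        }
        static boolean isKnown(int tag) { return tag == 0x54524143; } // "TRAC", hypothetical
        static void handle(int tag, byte[] payload) { /* process a known chunk */ }
    }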

Re: Cassandra Performance Benchmarking.

2013-01-21 Thread Pradeep Kumar Mantha
Hi, Thanks for the information. I upgraded my Cassandra version to 1.2.0 and re-ran the experiment to collect statistics. My application took nearly 529 seconds to query 76896 keys. Please find the statistics below for 32 threads (where each thread queries 76896 key…
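
For scale, a back-of-envelope reading of those numbers: if each of the 32 threads issued all 76896 queries, that is 32 × 76896 ≈ 2.46 million reads in 529 s, or roughly 4,650 reads/s; if instead the 76896 keys were shared across the threads, throughput was closer to 76896 / 529 ≈ 145 reads/s. (The snippet is truncated, so which reading applies is an assumption.)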

Re: sstable2json had random behavior

2013-01-21 Thread Binh Nguyen
Hi William, I have seen this one before too, but in my case it always happened when I had only the Data and Index files. The problem goes away when I have all the other files (Compression, Filter...) On Mon, Jan 21, 2013 at 11:36 AM, William Oberman wrote: > I'm running 1.1.6 from the datastax repo. > …

sstable2json had random behavior

2013-01-21 Thread William Oberman
I'm running 1.1.6 from the datastax repo. I ran sstable2json and got the following error: Exception in thread "main" java.io.IOError: java.io.IOException: dataSize of 7020023552240793698 starting at 993981393 would be larger than file /var/lib/cassandra/data/X-Data.db length 7502161255

Re: How to store large columns?

2013-01-21 Thread Vegard Berget
I think the main difference is that by splitting across multiple rows, you get the data evenly distributed over multiple nodes. For large data this is probably better. .vegard, Sávio Teles: > Astyanax splits large objects into multiple keys. Is it a good idea? > Is it better to split into …

Re: How to store large columns?

2013-01-21 Thread Sávio Teles
Astyanax splits large objects into multiple keys. Is it a good idea? Is it better to split into multiple columns? Thanks 2013/1/21 Sávio Teles > Thanks Keith Wright. > 2013/1/21 Keith Wright >> This may be helpful: >> https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store >> …

RE: High Read and write throughput

2013-01-21 Thread Viktor Jevdokimov
For such a generic question, without technical details of the requirements, the answer is: use the defaults. Best regards / Pagarbiai, Viktor Jevdokimov, Senior Developer, Adform

RE: Concurrent write performance

2013-01-21 Thread Viktor Jevdokimov
Are you experiencing any performance problems? This would be the last thing to look at. Best regards / Pagarbiai, Viktor Jevdokimov, Senior Developer, Adform

High Read and write throughput

2013-01-21 Thread Jay Svc
Folks, for a given situation I am expecting multiple read and write requests to the same cluster. What primary design or configuration considerations should we make? Any thoughts or links to such documentation are appreciated. Thanks, Jay

Concurrent write performance

2013-01-21 Thread Jay Svc
Folks, I would like to write (insert or update) to a single row in a column family. I have concurrent requests which will all write to that single row. Do we see any performance implications from concurrent writes to a single row, where the comparator has to sort the columns at the same time? Please …
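
Writes to a row land in the memtable, which (in the 1.x era) keeps the row's columns in a concurrent sorted structure, so each insert goes into sorted position as it arrives rather than triggering a re-sort of the whole row. A rough, self-contained Java illustration of that data-structure behaviour (not Cassandra's actual memtable code) using ConcurrentSkipListMap:

    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.CountDownLatch;

    // Eight writers hammer one "row"; the map keeps its columns sorted
    // throughout, with no separate sort pass needed.
    public class ConcurrentRowWrites {
        public static void main(String[] args) throws InterruptedException {
            final ConcurrentSkipListMap<String, byte[]> row =
                    new ConcurrentSkipListMap<String, byte[]>();
            final CountDownLatch done = new CountDownLatch(8);
            for (int t = 0; t < 8; t++) {
                final int id = t;
                new Thread(new Runnable() {
                    public void run() {
                        for (int c = 0; c < 1000; c++) {
                            row.put("col-" + id + "-" + c, new byte[0]); // sorted on insert
                        }
                        done.countDown();
                    }
                }).start();
            }
            done.await();
            System.out.println(row.size() + " columns, first = " + row.firstKey());
        }
    }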

Re: How to store large columns?

2013-01-21 Thread Sávio Teles
Thanks Keith Wright. 2013/1/21 Keith Wright > This may be helpful: > https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store …

Re: How to store large columns?

2013-01-21 Thread Keith Wright
This may be helpful: https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store …

Re: Cassandra timeout even though it is not very busy

2013-01-21 Thread Nicolas Lalevée
On 17 Jan 2013, at 05:00, aaron morton wrote: > Check the disk utilisation using iostat -x 5 > If you are on a VM / in the cloud, check for CPU steal. > Check the logs for messages from the GCInspector; the ParNew events are times > the JVM is paused. I have seen logs about that. I didn't …

Re: How to store large columns?

2013-01-21 Thread Vegard Berget
Hi, You could split it into multiple columns on the client side: RowKey Data: Part1: [1mb], Part2: [1mb], Part3: [1mb] ... PartN: [1mb]. Now you can use multiple get() calls in parallel to fetch the parts back and then join them into one file. I _think_ maybe the new …
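
The split/join half of that scheme is plain client-side code. A sketch under Vegard's naming (Part1..PartN, 1 MB each); this is illustrative, not Astyanax's Chunked Object Store:

    import java.util.ArrayList;
    import java.util.List;

    // Split a blob into 1 MB parts for storage as columns Part1..PartN
    // under a single row key, and rejoin them after parallel reads.
    public class BlobChunker {
        static final int CHUNK = 1024 * 1024; // 1 MB per column value

        static List<byte[]> split(byte[] blob) {
            List<byte[]> parts = new ArrayList<byte[]>();
            for (int off = 0; off < blob.length; off += CHUNK) {
                int len = Math.min(CHUNK, blob.length - off);
                byte[] part = new byte[len];
                System.arraycopy(blob, off, part, 0, len);
                parts.add(part); // stored as column "Part" + parts.size()
            }
            return parts;
        }

        static byte[] join(List<byte[]> parts) {
            int total = 0;
            for (byte[] p : parts) total += p.length;
            byte[] blob = new byte[total];
            int off = 0;
            for (byte[] p : parts) {
                System.arraycopy(p, 0, blob, off, p.length);
                off += p.length;
            }
            return blob;
        }
    }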

How to store large columns?

2013-01-21 Thread Sávio Teles
We wish to store a column in a row with size larger than thrift_framed_transport_size_in_mb. But Thrift has a maximum frame size, configured by thrift_framed_transport_size_in_mb in cassandra.yaml. So, how can we store columns larger than thrift_framed_transport_size_in_mb? Increasing this …
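
For reference, the Thrift transport limits live in cassandra.yaml; the values below are believed to be the 1.1/1.2-era defaults (verify against your own yaml), and the message length limit should stay a little larger than the frame size:

    # cassandra.yaml (Thrift transport limits)
    thrift_framed_transport_size_in_mb: 15   # max frame a client may send
    thrift_max_message_length_in_mb: 16      # must exceed the frame size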

Re: Efficiency between SimpleStrategy and NetworkTopologyStrategy

2013-01-21 Thread Francisco Sobral
Thanks! Francisco Sobral. On Jan 21, 2013, at 5:55 AM, aaron morton wrote: > Use the NetworkTopologyStrategy, it's the default and it saves a lot of > trouble later. > > There is no real performance difference between NTS and SS. The NTS uses the > information provided by the snitch; it …

RE: Cassandra at Amazon AWS

2013-01-21 Thread Roland Gude
On a side note: if you are going for Priam AND you are using LeveledCompaction, think carefully about whether you need incremental backups. The S3 upload cost can be very high, because Leveled Compaction tends to create a lot of files and each PUT request to S3 costs money. We had this setup in relative…

Re: Cassandra pending compaction tasks keeps increasing

2013-01-21 Thread aaron morton
The main guarantee LCS gives you is that most reads will only touch 1 SSTable (http://www.datastax.com/dev/blog/when-to-use-leveled-compaction). If compaction is falling behind, this may not hold. nodetool cfhistograms tells you how many SSTables were touched per read. It's a recent histogram that…

Re: Cassandra Performance Benchmarking.

2013-01-21 Thread aaron morton
You can also see what it looks like from the server side. nodetool proxyhistograms will show you the full request latency recorded by the coordinator. nodetool cfhistograms will show you the local read latency; this is just the time it takes to read data on a replica and does not include network …