Re: High IO and poor read performance on 3.11.2 cassandra cluster

2018-09-11 Thread Elliott Sims
>>> Could you decrease chunk_length_in_kb to 16 or 8 and repeat the test. >>> >>> On Wed, Sep 5, 2018, 5:51 AM wxn...@zjqunshuo.com >>> wrote: >>> >>>> How large is your row? You may meet reading wide row problem. >>>> >>>

Re: High IO and poor read performance on 3.11.2 cassandra cluster

2018-09-09 Thread Laxmikant Upadhyay
wxn...@zjqunshuo.com >> wrote: >> >>> How large is your row? You may meet reading wide row problem. >>> >>> -Simon >>> >>> *From:* Laxmikant Upadhyay >>> *Date:* 2018-09-05 01:01 >>> *To:* user >>> *Subject:* High IO

Re: High IO and poor read performance on 3.11.2 cassandra cluster

2018-09-04 Thread Alexander Dejanovski
>> -Simon >> >> *From:* Laxmikant Upadhyay >> *Date:* 2018-09-05 01:01 >> *To:* user >> *Subject:* High IO and poor read performance on 3.11.2 cassandra cluster >> >> We have 3 node cassandra cluster (3.11.2) in single dc. >> >> We

Re: High IO and poor read performance on 3.11.2 cassandra cluster

2018-09-04 Thread CPC
> *Subject:* High IO and poor read performance on 3.11.2 cassandra cluster > We have 3 node cassandra cluster (3.11.2) in single dc. > > We have written 450 million records on the table with LCS. The write > latency is fine. After write we perform read and update operations. >

Re: High IO and poor read performance on 3.11.2 cassandra cluster

2018-09-04 Thread wxn...@zjqunshuo.com
How large is your row? You may meet reading wide row problem. -Simon From: Laxmikant Upadhyay Date: 2018-09-05 01:01 To: user Subject: High IO and poor read performance on 3.11.2 cassandra cluster We have 3 node cassandra cluster (3.11.2) in single dc. We have written 450 million records on

High IO and poor read performance on 3.11.2 cassandra cluster

2018-09-04 Thread Laxmikant Upadhyay
We have 3 node cassandra cluster (3.11.2) in single dc. We have written 450 million records on the table with LCS. The write latency is fine. After write we perform read and update operations. When we run read+update operations on newly inserted 1 million records (on top of 450 m records) then t

Re: cassandra concurrent read performance problem

2018-05-28 Thread Alain RODRIGUEZ
Hi, Would you share some more context with us? - What Cassandra version do you use? - What is the data size per node? - How much RAM does the hardware have? - Does your client use paging? A few ideas to explore: - Try tracing the query, see what's taking time (and resources) - From the tracing,

cassandra concurrent read performance problem

2018-05-26 Thread onmstester onmstester
By reading 90 partitions concurrently(each having size > 200 MB), My single node Apache Cassandra became unresponsive, no read and write works for almost 10 minutes. I'm using this configs: memtable_allocation_type: offheap_buffers gc: G1GC heap: 128GB concurrent_reads: 128 (having more tha

Re: Would User Defined Type(UDT) nested in a LIST collections column type give good read performance

2017-10-30 Thread Bill Walters
n't care about insertion order it's better to use Set > rather than list. List implementation requires read before write for some > operations. > > Second, the read performance of the collection itself depends on 2 factors > : > > 1) collection cardinality e.g. the nu

Re: Would User Defined Type(UDT) nested in a LIST collections column type give good read performance

2017-10-30 Thread DuyHai Doan
Hello Bill First if you don't care about insertion order it's better to use Set rather than list. List implementation requires read before write for some operations. Second, the read performance of the collection itself depends on 2 factors : 1) collection cardinality e.g. the

Would User Defined Type(UDT) nested in a LIST collections column type give good read performance

2017-10-29 Thread Bill Walters
Hi Everyone, We need some help in deciding whether to use User Defined Type(UDT) nested in LIST collection columns in our table. In a couple of months, we are planning to roll out a new solution that will incorporate a Read heavy use case. We have one big table which will hold around 250 million

Re: Cassandra Poor Read Performance Response Time

2016-11-02 Thread Jens Rantil
all_ad_impressions_counter_1d > > WHERE ad_id = ? > AND time_bucket = ? > > the cluster is running on servers with 16 GB RAM, and 4 CPU cores and 3 > 100GB datastores, the storage is not local and these VMs are being managed > through openstack. There are rough

Re: Cassandra Poor Read Performance Response Time

2016-11-01 Thread Kant Kodali
time_bucket, > uc, > count > FROM > all_ad_impressions_counter_1d > > WHERE ad_id = ? > AND time_bucket = ? > > the cluster is running on servers with 16 GB RAM, and 4 CPU cores and 3 > 100GB dat

Cassandra Poor Read Performance Response Time

2016-11-01 Thread _ _
PU cores and 3 100GB datastores, the storage is not local and these VMs are being managed through openstack. There are roughly 200 million records being written per day (1 time_bucket) and maybe a few thousand records per partition (time_bucket, ad_id) at most. The amount of writes is not having

Re: Read performance

2015-05-11 Thread Alprema
According to the trace log, only one was read, the compaction strategy is size tiered. I attached a more readable version of my trace for details. On Mon, May 11, 2015 at 11:35 AM, Anishek Agarwal wrote: > how many sst tables were there? what compaction are you using ? These > properties defi

Re: Read performance

2015-05-11 Thread Anishek Agarwal
how many sst tables were there? what compaction are you using ? These properties define how many possible disk reads cassandra has to do to get all the data you need depending on which SST Tables have data for your partition key. On Fri, May 8, 2015 at 6:25 PM, Alprema wrote: > I was planning

Re: Read performance

2015-05-08 Thread Alprema
I was planning on using a more "server-friendly" strategy anyway (by parallelizing my workload on multiple metrics) but my concern here is more about the raw numbers. According to the trace and my estimation of the data size, the read from disk was done at about 30MByte/s and the transfer between

Re: Read performance

2015-05-08 Thread Bryan Holladay
Try breaking it up into smaller chunks using multiple threads and token ranges. 86400 is pretty large. I found ~1000 results per query is good. This will spread the burden across all servers a little more evenly. On Thu, May 7, 2015 at 4:27 AM, Alprema wrote: > Hi, > > I am writing an applicatio
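The token-range suggestion above can be sketched roughly as follows. This is a minimal illustration, not the poster's code: it assumes the cluster uses Murmur3Partitioner (signed 64-bit tokens), and the table name `series_data` and partition key columns `series_id, day` are hypothetical placeholders. It only builds the per-range query strings; running them in parallel threads, and limiting or paging each query to roughly 1000 rows, is left to the client.

```python
# Sketch: split the full token ring into contiguous sub-ranges so that each
# sub-range can be queried by a separate worker thread.
# Assumption: Murmur3Partitioner, whose tokens are signed 64-bit integers.

MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(n):
    """Return n contiguous (start, end) token bounds covering the whole ring."""
    if n < 1:
        raise ValueError("n must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (span * i) // n for i in range(n)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(table, partition_key_columns, start, end, include_start=False):
    """Build a CQL string restricted to one token sub-range.

    table and partition_key_columns are hypothetical placeholders.
    The lower bound is exclusive unless include_start is True, so that
    adjacent sub-ranges never return the same partition twice.
    """
    lower = ">=" if include_start else ">"
    tok = f"token({partition_key_columns})"
    return f"SELECT * FROM {table} WHERE {tok} {lower} {start} AND {tok} <= {end}"


ranges = split_token_ring(8)
queries = [
    range_query("series_data", "series_id, day", start, end, include_start=(i == 0))
    for i, (start, end) in enumerate(ranges)
]
```

Each string in `queries` could then be handed to its own worker; the number of sub-ranges (8 here) is arbitrary and would need tuning against the cluster.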

Read performance

2015-05-07 Thread Alprema
Hi, I am writing an application that will periodically read big amounts of data from Cassandra and I am experiencing odd performances. My column family is a classic time series one, with series ID and Day as partition key and a timestamp as clustering key, the value being a double. The query I r

Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Timo Ahokas
g) to handle large queries, or >> maybe simple, Cassandra-style “horizontal scaling” (adding nodes) will be >> sufficient. Sure, you can tune Cassandra for single-node performance, but >> that seems like a lot of extra work, to me, compared to adding more cheap >> nodes

Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Diane Griffith
> nodes. > > -- Jack Krupansky > > *From:* Diane Griffith > *Sent:* Thursday, July 17, 2014 9:31 AM > *To:* user > *Subject:* Re: trouble showing cluster scalability for read performance > > Duncan, > > Thanks for that feedback. I'll give a bit more info and the

Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Jack Krupansky
tune Cassandra for single-node performance, but that seems like a lot of extra work, to me, compared to adding more cheap nodes. -- Jack Krupansky From: Diane Griffith Sent: Thursday, July 17, 2014 9:31 AM To: user Subject: Re: trouble showing cluster scalability for read performance Duncan

Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Diane Griffith
ame keys is not long enough given the fact we are not doing different offsets for each client thread. Thanks, Diane On Thu, Jul 17, 2014 at 3:53 AM, Duncan Sands wrote: > Hi Diane, > > > On 17/07/14 06:19, Diane Griffith wrote: > >> We have been struggling proving out linear re

Re: trouble showing cluster scalability for read performance

2014-07-17 Thread Duncan Sands
Hi Diane, On 17/07/14 06:19, Diane Griffith wrote: We have been struggling proving out linear read performance with our cassandra configuration, that it is horizontally scaling. Wondering if anyone has any suggestions for what minimal configuration and approach to use to demonstrate this. We

trouble showing cluster scalability for read performance

2014-07-16 Thread Diane Griffith
We have been struggling proving out linear read performance with our cassandra configuration, that it is horizontally scaling. Wondering if anyone has any suggestions for what minimal configuration and approach to use to demonstrate this. We were trying to go for a simple set up, so on the

Re: Read performance in map data type

2014-04-04 Thread Tyler Hobbs
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html On Fri, Apr 4, 2014 at 11:34 AM, Apoorva Gaurav wrote: > > > On Fri, Apr 4, 2014 at 9:37 PM, Tyler Hobbs wrote: > >> >> On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav < >> apoorva.gau...@myntra.com> wrot

Re: Read performance in map data type

2014-04-04 Thread Apoorva Gaurav
On Fri, Apr 4, 2014 at 9:37 PM, Tyler Hobbs wrote: > > On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav > wrote: > >> If we store the same data as a json using text data type i.e (studentID >> int, subjectMarksJson text) we are getting a latency of ~10ms from the same >> client for even bigger. I

Re: Read performance in map data type

2014-04-04 Thread Tyler Hobbs
On Fri, Apr 4, 2014 at 12:41 AM, Apoorva Gaurav wrote: > If we store the same data as a json using text data type i.e (studentID > int, subjectMarksJson text) we are getting a latency of ~10ms from the same > client for even bigger. I understand that json is not the preferred storage > for cassand

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
On Fri, Apr 4, 2014 at 3:32 AM, Robert Coli wrote: > On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav > wrote: > >> At the client side we are getting a latency of ~350ms, we are using >> datastax driver 2.0.0 and have kept the fetch size as 500. And these are >> coming while reading rows having ~

Re: Read performance in map data type

2014-04-03 Thread Robert Coli
On Thu, Apr 3, 2014 at 12:20 AM, Apoorva Gaurav wrote: > At the client side we are getting a latency of ~350ms, we are using > datastax driver 2.0.0 and have kept the fetch size as 500. And these are > coming while reading rows having ~200 columns. > And you're sure that the 300ms between what ca

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
~200 columns. >> >> >> On Thu, Apr 3, 2014 at 12:45 PM, Shrikar archak wrote: >> >>> Hi Apoorva, >>> As per the cfhistogram there are some rows which have more than 75k >>> columns and around 150k reads hit 2 SStables. >>> >>> Ar

Re: Read performance in map data type

2014-04-03 Thread Shrikar archak
eads hit 2 SStables. >> >> Are you sure that you are seeing more than 500ms latency? The >> cfhistogram shows the worst read performance was around 51ms, >> which looks reasonable with many reads hitting 2 sstables. >> >> Thanks, >> Shrikar >> >

Re: Read performance in map data type

2014-04-03 Thread Apoorva Gaurav
re are some rows which have more than 75k > columns and around 150k reads hit 2 SStables. > > Are you sure that you are seeing more than 500ms latency? The cfhistogram > shows the worst read performance was around 51ms, > which looks reasonable with many reads hitting 2 sstables.

Re: Read performance in map data type

2014-04-03 Thread Shrikar archak
Hi Apoorva, As per the cfhistogram there are some rows which have more than 75k columns and around 150k reads hit 2 SStables. Are you sure that you are seeing more than 500ms latency? The cfhistogram shows the worst read performance was around 51ms, which looks reasonable with many reads hitting

Re: Read performance in map data type

2014-04-02 Thread Apoorva Gaurav
Hello Shrikar, We are still facing read latency issue, here is the histogram http://pastebin.com/yEvMuHYh On Sat, Mar 29, 2014 at 8:11 AM, Apoorva Gaurav wrote: > Hello Shrikar, > > Yes primary key is (studentID, subjectID). I had dropped the test table, > recreating and populating it post whic

Re: Read performance in map data type

2014-04-01 Thread Apoorva Gaurav
I've observed that reducing fetch size results in better latency (isn't that obvious :-)), tried from fetch size varying from 100 to 1, seeing a lot of errors for 1. Haven't tried modifying the number of columns. Let me start a new thread focused on fetch size. On Wed, Apr 2, 2014 at 9:5

Re: Read performance in map data type

2014-04-01 Thread Sourabh Agrawal
From the doc : The fetch size controls how much resulting rows will be retrieved simultaneously. So, I guess it does not depend on the number of columns as such. As all the columns for a key reside on the same node, I think it wouldn't matter much whatever be the number of columns as long as we ha

Re: Read performance in map data type

2014-04-01 Thread Apoorva Gaurav
Thanks Sourabh, I've modelled my table as "studentID int, subjectID int, marks int, PRIMARY KEY(studentID, subjectID)" as primarily I'll be querying using studentID and sometime using studentID and subjectID. I've tried driver 2.0.0 and its giving good results. Also using its auto paging feature.

Re: Read performance in map data type

2014-04-01 Thread Robert Coli
On Mon, Mar 31, 2014 at 9:13 PM, Apoorva Gaurav wrote: > Thanks Robert, Is there a workaround, as in our test setups we keep > dropping and recreating tables. > Use unique keyspace (or table) names for each test? That's the approach they're taking in 5202... =Rob

Re: Read performance in map data type

2014-03-31 Thread Apoorva Gaurav
Thanks Robert, Is there a workaround, as in our test setups we keep dropping and recreating tables. On Mon, Mar 31, 2014 at 11:51 PM, Robert Coli wrote: > On Fri, Mar 28, 2014 at 7:41 PM, Apoorva Gaurav > wrote: > >> Yes primary key is (studentID, subjectID). I had dropped the test table, >> r

Re: Read performance in map data type

2014-03-31 Thread Robert Coli
On Fri, Mar 28, 2014 at 7:41 PM, Apoorva Gaurav wrote: > Yes primary key is (studentID, subjectID). I had dropped the test table, > recreating and populating it post which will share the cfhistogram. In such > case is there any practical limit on the rows I should fetch, for e.g. > should I do >

Re: Read performance in map data type

2014-03-29 Thread Sourabh Agrawal
Hi, I don't think there is a problem with the driver. Regarding the schema, you may want to choose between wide rows and skinny rows. http://stackoverflow.com/questions/19039123/cassandra-wide-vs-skinny-rows-for-large-columns http://thelastpickle.com/blog/2013/01/11/primary-keys-in-cql.html When

Re: Read performance in map data type

2014-03-29 Thread Apoorva Gaurav
Hello Sourabh, I'd prefer to do query like select * from marks_table where studentID = ? and subjectID in (?, ?, ??) but if its costly then can happily delegate the responsibility to the application layer. Haven't tried 2.x java driver for this specific issue but tried it once earlier and fou

Re: Read performance in map data type

2014-03-29 Thread Sourabh Agrawal
Hi Apoorva, Do you always query on studentID only or do you need to query on both studentID and subjectID? Also, I think using the latest driver (2.x) can make querying large number of rows efficient. http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0 On Sat, Mar 29, 2

Re: Read performance in map data type

2014-03-28 Thread Apoorva Gaurav
Hello Shrikar, Yes primary key is (studentID, subjectID). I had dropped the test table, recreating and populating it post which will share the cfhistogram. In such case is there any practical limit on the rows I should fetch, for e.g. should I do select * from marks_table where studentID =

Re: Read performance in map data type

2014-03-28 Thread Shrikar archak
Hi Apoorva, I assume this is the table with studentId and subjectId as primary keys and not other like like marks in that. create table marks_table(studentId int, subjectId int, marks int, PRIMARY KEY(studentId,subjectId)); Also could you give the cfhistogram stats? nodetool cfhistograms mark

Read performance in map data type

2014-03-28 Thread Apoorva Gaurav
Hello All, We've a schema which can be modeled as (studentID, subjectID, marks) where combination of studentID and subjectID is unique. Number of studentID can go up to 100 million and for each studentID we can have up to 10k subjectIDs. We are using Apache Cassandra 2.0.4 and DataStax Java driv

RE: cassandra read performance jumps from one row to next

2014-01-22 Thread NEWBROUGH, JONATHAN
Trying to find out why a cassandra read is taking so long, I used tracing and limited the number of rows. Strangely, when I query 600 rows, I get results in ~50 milliseconds. But 610 rows takes nearly 1 second! cqlsh> select containerdefinitionid from containerdefinition limit 600; ... lots of o

Re: Is read performance improved by moving more volatile data to different CF?

2013-10-21 Thread Edward Capriolo
Stupid cell phone. I would say no. If you design around row cache and your data access patterns change, the original assertions may be invalidated and the performance might be worse than the simple design. On Mon, Oct 21, 2013 at 12:03 PM, Edward Capriolo wrote: > I would say no. If you design

Re: Is read performance improved by moving more volatile data to different CF?

2013-10-21 Thread Edward Capriolo
I would say no. If you design around row cache and your data access patterns change, your assertions will be invalidated and your performance may get worse over time. I would apply KISS here: keep it simple, using one column family. Experiment with size-tiered vs. leveled compaction. On Thursday,

Is read performance improved by moving more volatile data to different CF?

2013-10-17 Thread Jan Algermissen
Hi, my rows consist of ~70 columns each, some containing small values, some containing larger amounts of content (think "small documents"). My data is occasionally updated and read several times per day as complete paging through all rows. The updates usually affect only about 10% of the smal

Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
ction={'class': 'LeveledCompactionStrategy'} AND > compression={'chunk_length_kb': '8', 'crc_check_chance': '0.1', > 'sstable_compression': 'LZ4Compressor'}; > > From: Igor > Reply-To: "user@cassand

Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
ssion': 'LZ4Compressor'}; From: Igor <i...@4friends.od.ua> Reply-To: "user@cassandra.apache.org" Date: Thursday, May 16, 2013 4:27 PM To: "user@cassandra.apache.org"

Re: SSTable size versus read performance

2013-05-16 Thread Igor
just in case it will be useful to somebody - here is my checklist for better read performance from SSD 1. limit read-ahead to 16 or 32 2. enable 'trickle_fsync' (available starting from cassandra 1.1.x) 3. use 'deadline' io-scheduler (much more important for rotational dri

Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
e tried decreasing my SSTable size >> to 5 MB and changing the chunk size to 8 kb >> >> From: Igor >> Reply-To: "user@cassandra.apache.org" >> Date: Thursday, May 16, 2013 1:55 PM >> >> To: "user@cassandra.apache.org" >> Subj

Re: SSTable size versus read performance

2013-05-16 Thread Bryan Talbot
set to 512. I have tried decreasing my SSTable size > to 5 MB and changing the chunk size to 8 kb > > From: Igor > Reply-To: "user@cassandra.apache.org" > Date: Thursday, May 16, 2013 1:55 PM > > To: "user@cassandra.apache.org" > Subject: Re: SSTable s

Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
PM To: "user@cassandra.apache.org" Subject: Re: SSTable size versus read performance My 5 cents: I'd check blockdev --getra for data drives - too high values for readahead (default to 256 for debian) can

Re: SSTable size versus read performance

2013-05-16 Thread Igor
My 5 cents: I'd check blockdev --getra for data drives - too high values for readahead (default to 256 for debian) can hurt read performance. On 05/16/2013 05:14 PM, Keith Wright wrote: Hi all, I currently have 2 clusters, one running on 1.1.10 using CQL2 and one running on 1.2.4

Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
Subject: Re: SSTable size versus read performance When you use compression you should play with your block size. I believe the default may be 32K but I had more success with 8K, nearly same compression ratio, less young gen memory pressure. On T

Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
y, May 16, 2013 10:23 AM > To: "user@cassandra.apache.org" > Subject: Re: SSTable size versus read performance > > I am not sure of the new default is to use compression, but I do not > believe compression is a good default. I find compression is better for > larger column fa

Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
y-To: "user@cassandra.apache.org" Date: Thursday, May 16, 2013 10:23 AM To: "user@cassandra.apache.org" Subject: Re: SSTable

Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
fference. > > Thanks! > > Relevant table definition if helpful (note that I also changed to the LZ4 > compressor expecting better read performance and I decreased the crc change > again to minimize read latency): > > CREATE TABLE global_user ( > user_id BIGINT, > ap

SSTable size versus read performance

2013-05-16 Thread Keith Wright
the OOTB chunk size and BF settings)? I just decreased the sstable size to 5 MB and am waiting for compactions to complete to see if that makes a difference. Thanks! Relevant table definition if helpful (note that I also changed to the LZ4 compressor expecting better read performance and I

Re: tuning for read performance

2012-10-23 Thread aaron morton
nce of different query techniques at Cassandra SF http://www.datastax.com/events/cassandrasummit2012/presentations > 1. Consider Leveled compaction instead of Size Tiered. LCS improves > read performance at the cost of more writes. I would look at other options first. If you want to know how m

Re: tuning for read performance

2012-10-22 Thread Aaron Turner
ly only the > metadata is read/written. Would splitting out the document into a separate > column family help? > Some un-expert advice: 1. Consider Leveled compaction instead of Size Tiered. LCS improves read performance at the cost of more writes. 2. You said "skinny column fa

tuning for read performance

2012-10-22 Thread feedly team
Hi, I have a small 2 node cassandra cluster that seems to be constrained by read throughput. There are about 100 writes/s and 60 reads/s mostly against a skinny column family. Here's the cfstats for that family: SSTable count: 13 Space used (live): 231920026568 Space used (total): 231920026

Re: read performance plumetted

2012-10-12 Thread B. Todd Burruss
did the amount of data finally exceed your per machine RAM capacity? is it the same 20% each time you read? or do your periodic reads eventually work through the entire dataset? if you are essentially table scanning your data set, and the size exceeds available RAM, then a degradation like that i

read performance plumetted

2012-10-12 Thread Brian Tarbox
I have a two node cluster hosting a 45 gig dataset. I periodically have to read a high fraction (20% or so) of my 'rows', grabbing a few thousand at a time and then processing them. This used to result in about 300-500 reads a second which seemed quite good. Recently that number has plummeted to

Re: Running repair negatively impacts read performance?

2012-10-03 Thread Charles Brophy
JVM 1.6.31 > > I'll do a repair and get some before/after stats to answer your remaining > questions. > > Thanks Aaron > > On Wed, Sep 26, 2012 at 2:51 PM, aaron morton wrote: > >> Sounds very odd. >> >> Is read performance degrading _after_ repair and co

Re: Running repair negatively impacts read performance?

2012-09-28 Thread Charles Brophy
Sep 26, 2012 at 2:51 PM, aaron morton wrote: > Sounds very odd. > > Is read performance degrading _after_ repair and compactions that normally > result have completed? > What Compaction Strategy? > What OS and JVM? > > What are the bloom filter false positive stats f

Re: Running repair negatively impacts read performance?

2012-09-26 Thread aaron morton
Sounds very odd. Is read performance degrading _after_ repair and compactions that normally result have completed? What Compaction Strategy? What OS and JVM? What are the bloom filter false positive stats from cfstats? Do you have some read latency numbers from cfstats? Also

Running repair negatively impacts read performance?

2012-09-25 Thread Charles Brophy
Hey guys, I've begun to notice that read operations take a performance nose-dive after a standard (full) repair of a fairly large column family: ~11 million records. Interestingly, I've then noticed that read performance returns to normal after a full scrub of the column family. Is i

Re: Data modeling for read performance

2012-05-20 Thread aaron morton
I would bucket the time stats as well. If you write all the attributes at the same time, and always want to read them together, storing them in something like a JSON blob is legitimate approach. Other Aaron, can you elaborate on > I'm not using composite row keys (it's just > AsciiType) as th

Re: Data modeling for read performance

2012-05-17 Thread Aaron Turner
On Thu, May 17, 2012 at 8:55 AM, jason kowalewski wrote: > We have been attempting to change our data model to provide more > performance in our cluster. > > Currently there are a couple ways to model the data and i was > wondering if some people out there could help us out. > > We are storing tim

Re: about the compaction and read performance

2012-02-19 Thread aaron morton
If you run a manual compaction with nodetool, each CF will be compacted to a single SSTable. Note that this is not normally recommended, as it means that automatic compaction will take a long time to get to the file. Take a look at nodetool cfhistograms to get an idea of how spread out your dat

about the compaction and read performance

2012-02-16 Thread zhangcheng
Cassandra has no way of knowing that all the data is in the most recent sstable, and will have to check the others too, and this bring a lot of difficulty to data compaction. I have a question that if I want a high performance data compaction, how can I implement that all the columns are all

Re: cassandra read performance on large dataset

2011-12-02 Thread Radim Kolar
Dne 1.12.2011 23:30, Bill napsal(a): > Our largest dataset has 1200 billion rows. Radim, out of curiosity, how many nodes is that running across? 32

Re: cassandra read performance on large dataset

2011-12-01 Thread Bill
ch means average 4 iops per read in cassandra on cold system. After OS cache warms enough to cache indirect seek blocks it gets faster to almost ideal: Workload took 79.76 seconds, thruput 200.59 ops/sec Ideal cassandra read performance (without caches) is 2 IOPS per read -> one io to read index

cassandra read performance on large dataset

2011-11-28 Thread Radim Kolar
kload took 79.76 seconds, thruput 200.59 ops/sec Ideal cassandra read performance (without caches) is 2 IOPS per read -> one io to read index, second to data. pure write workload: Running workload in 40 threads 10 ops each. Workload took 302.51 seconds, thruput 13222.62 ops/sec write is s
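The I/O-per-read reasoning above can be turned into a small back-of-the-envelope calculation. This is only a sketch: the function names are made up for illustration, and it assumes the reported 200.59 ops/sec run was at the "ideal" 2 disk I/Os per read (the post only says "almost ideal") and that the disk's I/O budget stays the same when the cold system needs 4 I/Os per read.

```python
# Sketch: relate disk I/O operations per read to achievable read throughput.
# The function names are hypothetical; the inputs come from the figures
# quoted in the message.


def implied_disk_iops(reads_per_sec, ios_per_read):
    """Disk I/O operations per second needed to sustain a given read rate."""
    return reads_per_sec * ios_per_read


def max_reads_per_sec(disk_iops, ios_per_read):
    """Read rate a fixed disk I/O budget can sustain."""
    return disk_iops / ios_per_read


# Warm case: 200.59 reads/sec, assumed to cost 2 I/Os each (index + data).
warm_iops = implied_disk_iops(200.59, 2)

# Cold case: same I/O budget, but 4 I/Os per read.
cold_reads = max_reads_per_sec(warm_iops, 4)

print(f"implied disk budget: {warm_iops:.2f} IOPS")
print(f"estimated cold read rate: {cold_reads:.2f} reads/sec")
```

Under these assumptions, doubling the I/Os per read halves the read rate, which is why warming the OS cache matters so much on this workload.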

Re: read performance problem

2011-11-21 Thread Kent Tong
Tong Sent: Monday, November 21, 2011 5:22 AM Subject: Re: read performance problem There is something wrong with the system. Your benchmarks are way off. How are you benchmarking? Are you using the stress lib included? On Nov 19, 2011 8:58 PM, "Kent Tong" wrote: Hi, > > >

Re: read performance problem

2011-11-20 Thread Jahangir Mohammed
There is something wrong with the system. Your benchmarks are way off. How are you benchmarking? Are you using the stress lib included? On Nov 19, 2011 8:58 PM, "Kent Tong" wrote: > Hi, > > On my computer with 2G RAM and a core 2 duo CPU E4600 @ 2.40GHz, I am > testing the > performance of Cassan

Re: read performance problem

2011-11-19 Thread Maxim Potekhin
Try to see if there is a lot of paging going on, and run some benchmarks on the disk itself. Are you running Windows or Linux? Do you think the disk may be fragmented? Maxim On 11/19/2011 8:58 PM, Kent Tong wrote: Hi, On my computer with 2G RAM and a core 2 duo CPU E4600 @ 2.40GHz, I am tes

read performance problem

2011-11-19 Thread Kent Tong
Hi, On my computer with 2G RAM and a Core 2 Duo CPU E4600 @ 2.40GHz, I am testing the performance of Cassandra. The write performance is good: It can write a million records in 10 minutes. However, the query performance is poor and it takes 10 minutes to read 10K records with sequential keys

Re: Read Performance / Schema Design

2011-10-26 Thread David Jeske
On Wed, Oct 26, 2011 at 7:35 PM, Ben Gambley wrote: > Our requirement is to store per user, many unique results (which is > basically an attempt at some questions ..) so I had thought of having the > userid as the row key and the result id as columns. > > The keys for the result ids are maintaine

Re: Read Performance / Schema Design

2011-10-26 Thread Tyler Hobbs
On Wed, Oct 26, 2011 at 9:35 PM, Ben Gambley wrote: > > Hi Everyone > > I have a question with regards read performance and schema design if > someone could help please. > > > Our requirement is to store per user, many unique results (which is > basically an attempt at

Read Performance / Schema Design

2011-10-26 Thread Ben Gambley
Hi Everyone I have a question with regards read performance and schema design if someone could help please. Our requirement is to store per user, many unique results (which is basically an attempt at some questions ..) so I had thought of having the userid as the row key and the result id

Read performance vs. vmstat + your experience with read optimizations

2011-06-20 Thread Philippe
erQuery: measuring the time spent in Hector shows that I'm getting at most 20-30 rows per ms and sometimes I get My questions: 1) Any idea where the discrepancy can come from? I'd like to believe there is some magic setting that will x10 my read performance... 2) How do you recommend allocatin

Re: Read Performance

2010-04-02 Thread James Golick
Yes. On Fri, Apr 2, 2010 at 10:35 AM, Ryan King wrote: > On Thu, Apr 1, 2010 at 8:37 PM, James Golick > wrote: > > Well, folks, I'm feeling a little stupid right now (adding to the injury > > inflicted by one Mr. Stump :-P). > > So, here's the story. The cache hit rate is up around 97% now. The

Re: Read Performance

2010-04-02 Thread Ryan King
On Thu, Apr 1, 2010 at 8:37 PM, James Golick wrote: > Well, folks, I'm feeling a little stupid right now (adding to the injury > inflicted by one Mr. Stump :-P). > So, here's the story. The cache hit rate is up around 97% now. The ruby code > is down to around 20-25ms to multiget the 20 rows. I di

Re: Read Performance

2010-04-01 Thread James Golick
Yes. J. Sent from my iPhone. On 2010-04-01, at 9:21 PM, Brandon Williams wrote: On Thu, Apr 1, 2010 at 9:37 PM, James Golick wrote: Well, folks, I'm feeling a little stupid right now (adding to the injury inflicted by one Mr. Stump :-P). So, here's the story. The cache hit rate is up a

Re: Read Performance

2010-04-01 Thread Brandon Williams
On Thu, Apr 1, 2010 at 9:37 PM, James Golick wrote: > Well, folks, I'm feeling a little stupid right now (adding to the injury > inflicted by one Mr. Stump :-P). > > So, here's the story. The cache hit rate is up around 97% now. The ruby > code is down to around 20-25ms to multiget the 20 rows. I

Re: Read Performance

2010-04-01 Thread James Golick
Well, folks, I'm feeling a little stupid right now (adding to the injury inflicted by one Mr. Stump :-P). So, here's the story. The cache hit rate is up around 97% now. The ruby code is down to around 20-25ms to multiget the 20 rows. I did some profiling, though, and realized that a lot of time wa

Re: Read Performance

2010-04-01 Thread Peter Chang
pwned. On Thu, Apr 1, 2010 at 2:09 PM, James Golick wrote: > Damnit! > > > On Thu, Apr 1, 2010 at 2:05 PM, Jeremy Dunck wrote: > >> Or rackspace. ;) >> >> On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump wrote: >> > Taking our flamewar offline. :-D >> > >> > On Thu, Apr 1, 2010 at 1:36 PM, Ja

Re: Read Performance

2010-04-01 Thread James Golick
Damnit! On Thu, Apr 1, 2010 at 2:05 PM, Jeremy Dunck wrote: > Or rackspace. ;) > > On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump wrote: > > Taking our flamewar offline. :-D > > > > On Thu, Apr 1, 2010 at 1:36 PM, James Golick > wrote: > >> I don't have the additional hardware to try to iso

Re: Read Performance

2010-04-01 Thread Jeremy Dunck
Or rackspace. ;) On Thu, Apr 1, 2010 at 2:49 PM, Joseph Stump wrote: > Taking our flamewar offline. :-D > > On Thu, Apr 1, 2010 at 1:36 PM, James Golick wrote: >> I don't have the additional hardware to try to isolate this issue atm > > You'd be able to spin up hardware to isolate that issu

Re: Read Performance

2010-04-01 Thread Joseph Stump
Taking our flamewar offline. :-D On Thu, Apr 1, 2010 at 1:36 PM, James Golick wrote: > I don't have the additional hardware to try to isolate this issue atm You'd be able to spin up hardware to isolate that issue on AWS. ;) --Joe

Re: Read Performance

2010-04-01 Thread James Golick
I don't have the additional hardware to try to isolate this issue atm, so I decided to push some code that performs 20% of reads directly from cassandra. The cache hit rate has gone up to about 88% now and it's still climbing, albeit slowly. There remains plenty of free cache space. So far, the av

Re: Read Performance

2010-04-01 Thread Cemal Dalar
Hi James, I don't know how to get the below statistics data and calculate the access times (read/write in ms) in your previous mails. Can you explain a little? I'd like to work on it also. CD On Thu, Apr 1, 2010 at 4:15 AM, Jonathan Ellis wrote: > On Wed, Mar 31, 2010 at 6:21 PM, James Golick > w

Re: Read Performance

2010-03-31 Thread Jonathan Ellis
On Wed, Mar 31, 2010 at 6:21 PM, James Golick wrote: > Keyspace: ActivityFeed >         Read Count: 699443 >         Read Latency: 16.11017477192566 ms. >                 Column Family: Events >                 Read Count: 232378 >                 Read Latency: 0.396 ms. >                 Row cac

Re: Read Performance

2010-03-31 Thread James Golick
Keyspace: ActivityFeed Read Count: 699443 Read Latency: 16.11017477192566 ms. Write Count: 69264920 Write Latency: 0.020393242755495856 ms. Pending Tasks: 0 ...snip Column Family: Events SSTable count: 5 Sp
