I notice that the documentation on the read path is quite compressed
on this page:
* http://wiki.apache.org/cassandra/ArchitectureOverview
What is the best documentation of the read path? I'm also curious
about the granularity and policies around caching.
Paul Prescod
On Tue, Apr 13, 2010 at 1:55 PM, Paul Prescod wrote:
> What do you mean by "bad practice"? The document above implies that it
> is nearly impossible. It implies that you will have between 1 and 4
> SSTables. Does the administrator have a choice in this matter?
You can tune the 4 number via JMX (p
On Tue, Apr 13, 2010 at 12:00 PM, Benjamin Black wrote:
>> I am probably being totally naive, but is the answer to the question
>> "worst iops on read" just:
>>
>> 3 reads per SSTable * 4 SStables * ReplicationFactor ?
>>
>> = 3 * 4 * 3 = 36?
>>
>
> Why does RF enter this?
A simplistic model for
On Tue, Apr 13, 2010 at 11:55 AM, Paul Prescod wrote:
>
> What do you mean by "bad practice"? The document above implies that it
> is nearly impossible. It implies that you will have between 1 and 4
> SSTables. Does the administrator have a choice in this matter?
>
Hey, I am arguing the proposed
On Tue, Apr 13, 2010 at 11:52 AM, Scott White wrote:
>
>...
>
> Agreed.
Kind of sorry to see Scott White and Benjamin Black being in
agreement, but I guess that's the way yin and yang works. Opposition
is illusory in any case.
Paul Prescod
On Tue, Apr 13, 2010 at 11:31 AM, Benjamin Black wrote:
> ...
> How frequently do you want to write SSTables? How much memory do you
> want Memtables to consume? How long do you want to wait between
> Memtable flushes? There is an entire wiki page on Memtable tuning:
> http://wiki.apache.org/c
> Do you understand you are assuming there have been no compactions,
> which would be extremely bad practice given this number of SSTables?
> A major compaction, as would be best practice given this volume, would
> result in 1 SSTable per CF per node. One. Similarly, you are
> assuming the update
On Tue, Apr 13, 2010 at 11:31 AM, Paul Prescod wrote:
> I am just checking math, not model.
>
> On Tue, Apr 13, 2010 at 10:48 AM, Time Less wrote:
>
>>
>> numRowsOnNode = 10B / 20 = 500M.
>
> 50 million
>
10B / 20 is 500M. The rest of the analysis from our pseudonymous
friend remains faulty.
On Tue, Apr 13, 2010 at 10:48 AM, Time Less wrote:
>
>
>> > If I have 10B rows in my CF, and I can fit 10k rows per
>> > SStable, and the SStables are spread across 5 nodes, and I have 1 bloom
The error you are making is in thinking the Memtable thresholds are
the SSTable limits. They are not.
I am just checking math, not model.
On Tue, Apr 13, 2010 at 10:48 AM, Time Less wrote:
>
> numRowsOnNode = 10B / 20 = 500M.
50 million
> replicationFactor = 3.
> rowsPerSStable = 128MB / 1K = 131k.
>
> Therefore worst-case iops per read on this cluster are:
> (500M * 3 / 131k) * 3 = 150M / 131
> If I have 10B rows in my CF, and I can fit 10k rows per
> > SStable, and the SStables are spread across 5 nodes, and I have 1 bloom
> > filter false positive and 1 tombstone and ask the wrong node for the key,
> > then:
> >
> > Mv = (((2B/10k)+1+1)*3)+1 == ((200,000)+2)*3+1 == 300,007 iops to rea
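Evaluating the quoted expression as written gives a different total from the one stated. The check below is a minimal sketch in Python: it takes its numbers from the quoted message, uses illustrative variable names, and makes no claim about whether the model itself is sound.

```python
# Numbers copied from the quoted expression; names are illustrative only.
rows = 2_000_000_000        # "2B" in the quoted expression
rows_per_sstable = 10_000   # "10k" rows per SSTable

sstables = rows // rows_per_sstable   # 2B / 10k
mv = ((sstables + 1 + 1) * 3) + 1     # (((2B/10k)+1+1)*3)+1

print(sstables)  # 200000
print(mv)        # 600007
```

So ((200,000)+2)*3+1 comes to 600,007 rather than 300,007.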
On Mon, Apr 12, 2010 at 4:27 PM, Time Less wrote:
> With this formula, we can already begin to formulate more useful answers to
> the question. If I have 10B rows in my CF, and I can fit 10k rows per
> SStable, and the SStables are spread across 5 nodes, and I have 1 bloom
> filter false positive
> > What if we have 10B rows in the column family? What sort of index do you
> use
> > that would only require one iop to find the row index block?
>
> basically what is described in sections 5.3 and 5.4 here:
> http://labs.google.com/papers/bigtable.html
>
Incorrect. Section 4 of the paper descri
On Mon, Apr 12, 2010 at 3:45 PM, Time Less wrote:
> I'm confused. That's really worst-case? 3 iops?
max 3 per sstable, as RK pointed out.
> What if we have 10B rows in the column family? What sort of index do you use
> that would only require one iop to find the row index block?
basically wha
> >> worst case is 2 or 3, depending on row size:
> >>
> >> one seek to read the right row index block
> >> one seek to read the row header (bloom filter + column index)
> >> if it's a big row, one seek to read the column block (block size is
> >> configurable, default is 256KB)
> >
> > [This is al
Thanks, that is helpful.
S.
- Original Message
From: Jonathan Ellis
To: user@cassandra.apache.org
Sent: Fri, April 9, 2010 11:39:26 AM
Subject: Re: Worst case #iops to read a row
worst case is 2 or 3, depending on row size:
one seek to read the right row index block
one seek to read
Right.
On Fri, Apr 9, 2010 at 11:23 AM, Ryan King wrote:
> On Fri, Apr 9, 2010 at 8:39 AM, Jonathan Ellis wrote:
>> worst case is 2 or 3, depending on row size:
>>
>> one seek to read the right row index block
>> one seek to read the row header (bloom filter + column index)
>> if it's a big row,
On Fri, Apr 9, 2010 at 8:39 AM, Jonathan Ellis wrote:
> worst case is 2 or 3, depending on row size:
>
> one seek to read the right row index block
> one seek to read the row header (bloom filter + column index)
> if it's a big row, one seek to read the column block (block size is
> configurable,
worst case is 2 or 3, depending on row size:
one seek to read the right row index block
one seek to read the row header (bloom filter + column index)
if it's a big row, one seek to read the column block (block size is
configurable, default is 256KB)
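The seek count above can be sketched as a small Python function. This is only an illustration of the counting, not Cassandra code: the function names and the `is_big_row` flag are made up here, and the multi-SSTable total assumes the worst case where every SSTable has to be consulted.

```python
def worst_case_seeks_per_sstable(is_big_row: bool) -> int:
    """Count worst-case disk seeks to read one row from one SSTable."""
    seeks = 1       # one seek to read the right row index block
    seeks += 1      # one seek to read the row header (bloom filter + column index)
    if is_big_row:
        seeks += 1  # one seek to read the column block
    return seeks


def worst_case_seeks(num_sstables: int, is_big_row: bool) -> int:
    """Assumption: every SSTable is consulted, each at its worst case."""
    return num_sstables * worst_case_seeks_per_sstable(is_big_row)


print(worst_case_seeks_per_sstable(False))  # 2
print(worst_case_seeks_per_sstable(True))   # 3
print(worst_case_seeks(4, True))            # 12
```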
On Thu, Apr 8, 2010 at 5:21 PM, Scott Shealy w