I notice that the documentation on the read path is quite compressed
on this page:
* http://wiki.apache.org/cassandra/ArchitectureOverview
What is the best documentation of the read path? I'm also curious
about the granularity and policies around caching.
Paul Prescod
On Tue, Apr 13, 2010 at 1:55 PM, Paul Prescod wrote:
> What do you mean by "bad practice"? The document above implies that it
> is nearly impossible. It implies that you will have between 1 and 4
> SSTables. Does the administrator have a choice in this matter?
You can tune the 4 number via JMX (p
On Tue, Apr 13, 2010 at 12:00 PM, Benjamin Black wrote:
>> I am probably being totally naive, but is the answer to the question
>> "worst iops on read" just:
>>
>> 3 reads per SSTable * 4 SStables * ReplicationFactor ?
>>
>> = 3 * 4 * 3 = 36?
>>
>
> Why does RF enter this?
A simplistic model for
On Tue, Apr 13, 2010 at 11:55 AM, Paul Prescod wrote:
>
> What do you mean by "bad practice"? The document above implies that it
> is nearly impossible. It implies that you will have between 1 and 4
> SSTables. Does the administrator have a choice in this matter?
>
Hey, I am arguing the proposed
On Tue, Apr 13, 2010 at 11:52 AM, Scott White wrote:
>
>...
>
> Agreed.
Kind of sorry to see Scott White and Benjamin Black being in
agreement, but I guess that's the way yin and yang works. Opposition
is illusory in any case.
Paul Prescod
On Tue, Apr 13, 2010 at 11:31 AM, Benjamin Black wrote:
> ...
> How frequently do you want to write SSTables? How much memory do you
> want Memtables to consume? How long do you want to wait between
> Memtable flushes? There is an entire wiki page on Memtable tuning:
> http://wiki.apache.org/c
> Do you understand you are assuming there have been no compactions,
> which would be extremely bad practice given this number of SSTables?
> A major compaction, as would be best practice given this volume, would
> result in 1 SSTable per CF per node. One. Similarly, you are
> assuming the update
On Tue, Apr 13, 2010 at 11:31 AM, Paul Prescod wrote:
> I am just checking math, not model.
>
> On Tue, Apr 13, 2010 at 10:48 AM, Time Less wrote:
>
>>
>> numRowsOnNode = 10B / 20 = 500M.
>
> 50 million
>
10B / 20 is 500M. The rest of the analysis from our pseudonymous
friend remains faulty.
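For anyone re-checking the division, here is a minimal sketch using only the two figures quoted in the thread (the variable names are illustrative, not from any Cassandra tool):

```python
# Figures taken from the quoted message; the names are illustrative only.
total_rows = 10_000_000_000   # 10B rows in the column family
divisor = 20                  # the divisor used in the quoted message

rows_on_node = total_rows // divisor
print(rows_on_node)  # 500000000, i.e. 500M
```

This checks only the arithmetic, not the model behind it.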
On Tue, Apr 13, 2010 at 10:48 AM, Time Less wrote:
>
>
>> > If I have 10B rows in my CF, and I can fit 10k rows per
>> > SStable, and the SStables are spread across 5 nodes, and I have 1 bloom
The error you are making is in thinking the Memtable thresholds are
the SSTable limits. They are not.
I am just checking math, not model.
On Tue, Apr 13, 2010 at 10:48 AM, Time Less wrote:
>
> numRowsOnNode = 10B / 20 = 500M.
50 million
> replicationFactor = 3.
> rowsPerSStable = 128MB / 1K = 131k.
>
> Therefore worst-case iops per read on this cluster are:
> (500M * 3 / 131k) * 3 = 150M / 131
> If I have 10B rows in my CF, and I can fit 10k rows per
> > SStable, and the SStables are spread across 5 nodes, and I have 1 bloom
> > filter false positive and 1 tombstone and ask the wrong node for the key,
> > then:
> >
> > Mv = (((2B/10k)+1+1)*3)+1 == ((200,000)+2)*3+1 == 300,007 iops to rea
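A quick way to check the arithmetic in the quoted expression is to evaluate it as written. This is only a sketch: the variable names are hypothetical, the meaning of each term is the original poster's, and the quoted message is truncated here, so the intended grouping is not certain.

```python
# Inputs as stated in the quoted message.
rows_per_node = 2_000_000_000   # the "2B" in the expression (10B rows across 5 nodes)
rows_per_sstable = 10_000       # "10k rows per SStable"

sstables = rows_per_node // rows_per_sstable   # 200,000

# The expression exactly as quoted: (((2B/10k)+1+1)*3)+1
mv = ((sstables + 1 + 1) * 3) + 1
print(sstables, mv)  # 200000 600007
```

Evaluated this way, the expression comes to 600,007 rather than the 300,007 quoted. Separately, replies elsewhere in the thread dispute the model itself, pointing out that Memtable thresholds are not SSTable limits and that compaction reduces the number of SSTables.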
On Mon, Apr 12, 2010 at 4:27 PM, Time Less wrote:
> With this formula, we can already begin to formulate more useful answers to
> the question. If I have 10B rows in my CF, and I can fit 10k rows per
> SStable, and the SStables are spread across 5 nodes, and I have 1 bloom
> filter false positive
> > What if we have 10B rows in the column family? What sort of index do you
> use
> > that would only require one iop to find the row index block?
>
> basically what is described in sections 5.3 and 5.4 here:
> http://labs.google.com/papers/bigtable.html
>
Incorrect. Section 4 of the paper descri
On Mon, Apr 12, 2010 at 3:45 PM, Time Less wrote:
> I'm confused. That's really worst-case? 3 iops?
max 3 per sstable, as RK pointed out.
> What if we have 10B rows in the column family? What sort of index do you use
> that would only require one iop to find the row index block?
basically wha
> >> worst case is 2 or 3, depending on row size:
> >>
> >> one seek to read the right row index block
> >> one seek to read the row header (bloom filter + column index)
> >> if it's a big row, one seek to read the column block (block size is
> >> configurable, default is 256KB)
> >
> > [This is al
Thanks, that is helpful
S.
- Original Message
From: Jonathan Ellis
To: user@cassandra.apache.org
Sent: Fri, April 9, 2010 11:39:26 AM
Subject: Re: Worst case #iops to read a row
worst case is 2 or 3, depending on row size:
one seek to read the right row index block
one seek to read
Right.
On Fri, Apr 9, 2010 at 11:23 AM, Ryan King wrote:
> On Fri, Apr 9, 2010 at 8:39 AM, Jonathan Ellis wrote:
>> worst case is 2 or 3, depending on row size:
>>
>> one seek to read the right row index block
>> one seek to read the row header (bloom filter + column index)
>> if it's a big row,
On Fri, Apr 9, 2010 at 8:39 AM, Jonathan Ellis wrote:
> worst case is 2 or 3, depending on row size:
>
> one seek to read the right row index block
> one seek to read the row header (bloom filter + column index)
> if it's a big row, one seek to read the column block (block size is
> configurable,
worst case is 2 or 3, depending on row size:
one seek to read the right row index block
one seek to read the row header (bloom filter + column index)
if it's a big row, one seek to read the column block (block size is
configurable, default is 256KB)
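The count above can be written out as a small sketch. This is not Cassandra code: the function name and the `big_row` flag are hypothetical, and the figure is read as a per-SSTable worst case with no caching, as clarified elsewhere in the thread.

```python
def worst_case_seeks_per_sstable(big_row: bool) -> int:
    """Worst-case seeks for one SSTable, following the list in the message above."""
    seeks = 1        # one seek to read the right row index block
    seeks += 1       # one seek to read the row header (bloom filter + column index)
    if big_row:
        seeks += 1   # big row only: one seek to read the column block
    return seeks


print(worst_case_seeks_per_sstable(big_row=False))  # 2
print(worst_case_seeks_per_sstable(big_row=True))   # 3
```

The total for a read would then depend on how many SSTables have to be consulted, which is what the later replies in the thread argue about.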
On Thu, Apr 8, 2010 at 5:21 PM, Scott Shealy w
Not knowing anything about the physical layout of the data on disk or how
it is accessed when it is read... Could someone who does help
estimate the worst case scenario (no caching at any level) for the number of
iops to read a row of modest size and modest number of columns in a
large col