[ 
https://issues.apache.org/jira/browse/KUDU-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16303900#comment-16303900
 ] 

Todd Lipcon commented on KUDU-2243:
-----------------------------------

bq. The current BlockBuilder and BlockDecoder classes appear to wrap an entire 
cblock, though. Any finer-grained organization internally isn't exposed in the 
interface. I may be misunderstanding what you mean - could you point out a 
specific item that could be renamed?

Yea, I think this is the confusion... the hierarchy is:

- each diskrowset contains a "block" for each column. This "block" maps 1:1 to 
the BlockManager's view of a block. It's the unit of allocation and deletion on 
the filesystem, but not the unit of read IO. I'm suggesting that these be 
called "cblocks" -- ie they are blocks (as named by the block manager) which 
store column data. Currently they're called cfiles (because way-back-when we 
had no block manager and they were actually files on disk)
- inside each such cblock the data is split into smaller chunks: header, index 
blocks, data blocks, and footer. These are the units of read IO and caching, 
and also the units of encoding/decoding. I'm suggesting we rename these to 
"pages" (index page / data page) to be more consistent with Parquet and most 
other DBs.

bq. Interesting. I think this may actually be happening, for instance I don't 
see anyplace in PlainBlockDecoder::CopyNextValues which is copying the data.
I think you should be looking at BinaryPlainBlockDecoder, not PlainBlockDecoder 
(which is only used for primitive values, not indirect)

bq. One thing I can't figure out is why cfile_reader has the needs_rewind_ 
field, and all the checks that go along with it. I guess it's so you can call 
Scan multiple times in a row without seek/prepping, but that doesn't seem like 
something we'd actually want to do in the normal case of a scan.

Yea, this might have been a case of something designed-for-but-not-used but I 
can't quite recall the details.



> CFile Reader improvements
> -------------------------
>
>                 Key: KUDU-2243
>                 URL: https://issues.apache.org/jira/browse/KUDU-2243
>             Project: Kudu
>          Issue Type: Improvement
>          Components: cfile
>    Affects Versions: 1.6.0
>            Reporter: Dan Burkert
>
> I've done a pretty thorough review of all the CFile reader code over the last 
> few days in order to make a targeted bug fix, and I've got some ideas for how 
> we can simplify it.  I'd like to get others thoughts.
> * To reduce confusion between CFile data blocks and FS manager blocks, I 
> think we should change all references in code and docs of CFile data blocks 
> to 'cblock'.
> * Much of the complexity of the CFileIterator is due to it's complex public 
> API, which requires separate {{Seek(idx) -> Prepare(nrows) -> Scan(output 
> buf, predicates)}} calls.  Additionally, the Prepare step can materialize 
> many blocks, which then need to be put in a queue. I think all of this could 
> be simplified by changing the API to be {{Seek(idx) -> Scan(nrows, output 
> buf, predicates)}}, and have the CFile iterator only cache the 
> most-recently-materialized block (instead of the queue). For really big scan 
> batches, this will change the internal scan/materialize pattern from 
> materializing all cblocks up front then copying, to materializing and copying 
> of cblocks being interleaved.  Since in most cases cblocks are usually much 
> bigger (256kib) than scan batches (100 cells), I think it won't actually lead 
> to measurably different behavior.
> * {{QueueCurrentDataBlock}} and {{ReadCurrentDataBlock}} should drop 
> {{Current}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to