[ 
https://issues.apache.org/jira/browse/HUDI-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339382#comment-17339382
 ] 

Simon Zhou commented on HUDI-431:
---------------------------------

[~vinoth] what is the intention of having inline files? In which cases would we 
use them? I couldn't find relevant info in HUDI-430 or its linked PR.

A given inline file cannot be a mix of formats, e.g. both parquet and hfile, 
correct?

If we want to add support for inline parquet, we should also have a new file 
format defined in HoodieFileFormat, something like .inlineParquet. Is my 
understanding correct?

Regarding the code structure, are you saying that we want to expose 
ParquetWriter/ParquetReader from HoodieLogFile? It's uncommon for a file class 
to return a reader/writer. Instead, the reader/writer should take the file 
object as a parameter when reading/writing. I'm thinking of some classes in 
the JDK.
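
For illustration, the JDK convention referred to above might look like the 
following sketch. It uses only java.io classes; the class and method names 
(ReaderWriterTakesFile, writeLine, readFirstLine) are hypothetical and are not 
part of Hudi. The point is only that the reader/writer wraps the file, rather 
than the file handing out a reader/writer.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical illustration only; none of these names come from Hudi.
class ReaderWriterTakesFile {

    // The writer is constructed around the file object.
    // java.io.File itself exposes no writer.
    static void writeLine(File file, String line) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
            writer.write(line);
            writer.newLine();
        }
    }

    // Likewise, the reader takes the file as a constructor argument.
    static String readFirstLine(File file) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("log-sketch", ".txt");
        file.deleteOnExit();
        writeLine(file, "record-1");
        System.out.println(readFirstLine(file));
    }
}
```

Under this convention, a parquet reader/writer for log files would presumably 
accept a HoodieLogFile (or its path) as an argument; that is an assumption about 
one possible design, not a description of existing Hudi code.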

> Design and develop parquet logging in Log file
> ----------------------------------------------
>
>                 Key: HUDI-431
>                 URL: https://issues.apache.org/jira/browse/HUDI-431
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Storage Management
>            Reporter: sivabalan narayanan
>            Assignee: Vinoth Chandar
>            Priority: Major
>              Labels: help-requested
>
> We have a basic implementation of an inline filesystem, to read a file format 
> like Parquet, embedded "inline" into another file. See 
> [https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystem.java]
>  for sample usage.
>  The idea here is to see if we can embed parquet/hfile formats into the Hudi 
> log files, to get columnar reads on the delta log files as well. This helps 
> us speed up query performance, given the log is row based today. Once Inline 
> FS is available, enable parquet logging support with HoodieLogFile. LogFile 
> can expose a writer (essentially ParquetWriter) and users can write records 
> as though writing to parquet files. Similarly on the read path, a reader 
> (parquetReader) will be exposed which the user can use to read data out of 
> it. 
> This Jira tracks work to implement such parquet inlining into the log format 
> and have the writer and reader use it. 
>  
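
The "inline" embedding described in the quoted issue can be pictured as a 
payload stored at a known offset and length inside an outer file. The sketch 
below shows only that byte-range idea with java.io.RandomAccessFile. It does 
not use Hudi's InLineFileSystem, and every name in it (InlineRangeSketch, 
appendPayload, readInline) is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of an embedded byte range; not Hudi's API.
class InlineRangeSketch {

    // Appends the payload to the end of the outer file and returns the
    // offset at which the payload starts.
    static long appendPayload(File outer, byte[] payload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(outer, "rw")) {
            long offset = raf.length();
            raf.seek(offset);
            raf.write(payload);
            return offset;
        }
    }

    // Reads exactly `length` bytes starting at `offset` in the outer file.
    static byte[] readInline(File outer, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(outer, "r")) {
            raf.seek(offset);
            byte[] buffer = new byte[length];
            raf.readFully(buffer);
            return buffer;
        }
    }

    public static void main(String[] args) throws IOException {
        File outer = File.createTempFile("outer-log", ".bin");
        outer.deleteOnExit();

        // Some unrelated leading content, standing in for log-block headers.
        appendPayload(outer, "header|".getBytes(StandardCharsets.UTF_8));

        // The embedded payload, standing in for a parquet/hfile byte stream.
        byte[] payload = "embedded-bytes".getBytes(StandardCharsets.UTF_8);
        long offset = appendPayload(outer, payload);

        byte[] back = readInline(outer, offset, payload.length);
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

A reader that is handed only the (offset, length) pair can treat the range as 
a standalone file, which is what would allow a columnar format to be read out 
of a log block.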



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
