[
https://issues.apache.org/jira/browse/HUDI-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339382#comment-17339382
]
Simon Zhou commented on HUDI-431:
---------------------------------
[~vinoth] what is the intention behind inline files? In which cases would we
use them? I couldn't find relevant info in HUDI-430 or its linked PR.
A given inline file cannot be a mix of formats, e.g. both parquet and hfile,
correct?
If we want to add support for inline parquet, we should also have a new file
format defined in HoodieFileFormat, something like .inlineParquet. Is my
understanding correct?
Regarding the code structure, are you saying that we want to expose
ParquetWriter/ParquetReader from HoodieLogFile? It's uncommon for a file class
to return a reader/writer. Instead, the reader/writer should take the file
object as a parameter when reading/writing. I'm thinking of some classes in
the JDK.
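A minimal sketch of the JDK convention referred to above, where the file
object is a plain handle and the reader/writer is constructed around it. The
class name ReaderTakesFileSketch and the method roundTrip are hypothetical
names used only for illustration; nothing here is Hudi API.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of Hudi.
public class ReaderTakesFileSketch {

    // Writes one line to a temp file and reads it back.
    // The File never hands out a reader or writer itself; instead the
    // FileWriter/FileReader constructors take the File as a parameter.
    static String roundTrip(String line) {
        try {
            File file = File.createTempFile("sketch", ".log");
            file.deleteOnExit();
            try (BufferedWriter writer = new BufferedWriter(new FileWriter(file))) {
                writer.write(line);
            }
            try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
                return reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("record-1"));
    }
}
```

Applied to this issue, the analogous shape would be a parquet reader/writer
that accepts a HoodieLogFile, rather than HoodieLogFile returning one. That
mapping is an assumption about the design being discussed, not existing code.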
> Design and develop parquet logging in Log file
> ----------------------------------------------
>
> Key: HUDI-431
> URL: https://issues.apache.org/jira/browse/HUDI-431
> Project: Apache Hudi
> Issue Type: New Feature
> Components: Storage Management
> Reporter: sivabalan narayanan
> Assignee: Vinoth Chandar
> Priority: Major
> Labels: help-requested
>
> We have a basic implementation of an inline filesystem, which reads a file
> format like Parquet embedded "inline" in another file. See
> [https://github.com/apache/hudi/blob/master/hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystem.java]
> for sample usage.
> The idea here is to see if we can embed parquet/hfile formats into the Hudi
> log files, to get columnar reads on the delta log files as well. This helps
> us speed up query performance, given the log is row based today. Once Inline
> FS is available, enable parquet logging support with HoodieLogFile. LogFile
> can expose a writer (essentially ParquetWriter) and users can write records
> as though writing to parquet files. Similarly on the read path, a reader
> (parquetReader) will be exposed which the user can use to read data out of
> it.
> This Jira tracks work to implement such parquet inlining into the log format
> and have the writer and reader use it.
>
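The "inline" idea in the description can be sketched with plain JDK classes: a
payload is stored as a byte range inside an outer file, and a reader is pointed
at that range by offset and length. This is only an illustration under stated
assumptions. InlineRangeSketch, readInline and demo are hypothetical names, the
header/footer bytes are made up, and the sketch does not use or reproduce
Hudi's InLineFileSystem.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Hudi.
public class InlineRangeSketch {

    // Reads exactly `length` bytes starting at `offset` from the outer file.
    static byte[] readInline(File outer, long offset, int length) {
        try (RandomAccessFile raf = new RandomAccessFile(outer, "r")) {
            byte[] buf = new byte[length];
            raf.seek(offset);
            raf.readFully(buf);
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes header + payload + footer into one outer file, then reads back
    // only the embedded payload using its offset and length.
    static String demo() {
        byte[] header = "HDR".getBytes(StandardCharsets.US_ASCII);
        byte[] payload = "embedded-bytes".getBytes(StandardCharsets.US_ASCII);
        byte[] footer = "FTR".getBytes(StandardCharsets.US_ASCII);
        try {
            File outer = File.createTempFile("outer", ".log");
            outer.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(outer, "rw")) {
                raf.write(header);
                raf.write(payload);
                raf.write(footer);
            }
            // The payload starts right after the header.
            byte[] back = readInline(outer, header.length, payload.length);
            return new String(back, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the real feature the embedded range would hold a complete parquet or hfile
payload instead of a short ASCII string, so a format reader could be opened
over it as though it were a standalone file.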