[GitHub] [flink] echauchot commented on pull request #17501: [Draft][FLINK-21406][RecordFormat] build AvroParquetRecordFormat for the new FileSource

GitBox Wed, 27 Oct 2021 02:51:07 -0700


echauchot commented on pull request #17501:
URL: https://github.com/apache/flink/pull/17501#issuecomment-952742574



   > @echauchot
   > 
   > > Sure, I saw the comments about split and data types etc... But I feel 
uncomfortable about draft PRs because they usually cannot be merged as is. In 
the case of your PR, merging it without the split support could not be done. So 
I guess the correct way to proceed is to use this PR as an environment for 
design discussions and add extra commits until the PR is ready for prod @JingGe 
@fapaul WDYT ?
   > 
   > you are right, that was the idea of the draft PR. Speaking of the 
splitting support specifically, which will make the implementation way more 
complicated, this PR might be merged without it, because we didn't get any 
requirement for it from the business side. If you have any strong requirement 
w.r.t. the splitting, we'd like to know and reconsider it.
   
   I think splitting is mandatory: if you read a big Parquet file without 
split support, all of its content ends up in a single task manager, which 
will lead to an OOM.
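
To make the concern concrete, here is a minimal sketch of how splitting bounds the amount of data any one reader is handed. It is not Flink's `FileSource` API: `SplitSketch`, `ByteRange` and `planSplits` are hypothetical names, and the file and split sizes are made up for illustration. A real Parquet reader would additionally have to align these ranges with row group boundaries, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of byte-range split planning; not a Flink API. */
public class SplitSketch {

    /** A byte range [offset, offset + length) of a single file. */
    public static final class ByteRange {
        public final long offset;
        public final long length;

        ByteRange(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Cuts a file of fileLength bytes into ranges of at most maxSplitSize bytes.
     * Passing maxSplitSize == fileLength models "no split support":
     * the whole file becomes one range handled by one reader.
     */
    public static List<ByteRange> planSplits(long fileLength, long maxSplitSize) {
        if (fileLength < 0 || maxSplitSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and maxSplitSize > 0");
        }
        List<ByteRange> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += maxSplitSize) {
            long length = Math.min(maxSplitSize, fileLength - offset);
            splits.add(new ByteRange(offset, length));
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 10L * 1024 * 1024 * 1024; // assumed 10 GiB file
        long splitSize = 128L * 1024 * 1024;        // assumed 128 MiB per split

        // Without splitting, one reader is responsible for the entire file.
        System.out.println("unsplit ranges: " + planSplits(fileLength, fileLength).size());
        // With splitting, the ranges can be assigned to different parallel readers.
        System.out.println("split ranges:   " + planSplits(fileLength, splitSize).size());
    }
}
```

With ranges like these, the work for one large file can be spread over several parallel readers instead of being assigned to a single task manager.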


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
