ArnavBalyan commented on code in PR #3269:
URL: https://github.com/apache/parquet-java/pull/3269#discussion_r2304981961
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileWriter.java:
##########
@@ -1804,14 +1804,27 @@ private static void copy(SeekableInputStream from, PositionOutputStream to, long
* @throws IOException if there is an error while writing
*/
public void end(Map<String, String> extraMetaData) throws IOException {
+ final long footerStart = out.getPos();
+
+ // Build the footer metadata in memory using the helper stream
+ InMemoryPositionOutputStream buffer = new InMemoryPositionOutputStream(footerStart);
Review Comment:
Thanks for the reviews, @Fokko and @wgtmac. I'm happy to abandon this PR; I
agree that it would increase memory pressure. Would you suggest an alternative
way to mitigate this? One possibility I see is to stage the write in a
temporary directory and then do an atomic rename, at the expense of more
intermediate storage (possibly 2x in the worst case). Has anything along these
lines been discussed before? Thanks!
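
For illustration, here is a minimal sketch of the temp-file-plus-rename idea.
It assumes a local filesystem and uses plain java.nio.file rather than the
Hadoop FileSystem or parquet-java APIs; AtomicRenameSketch and writeAtomically
are hypothetical names, not anything in the PR.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not part of parquet-java.
class AtomicRenameSketch {

  // Writes `contents` to a temp file next to `target`, then renames it into place.
  // Readers either see no file or the complete file, never a partial one
  // (assuming the filesystem supports an atomic move).
  static void writeAtomically(Path target, byte[] contents) throws IOException {
    Path dir = target.toAbsolutePath().getParent();
    // Keep the temp file in the same directory so the move stays on one filesystem.
    Path tmp = Files.createTempFile(dir, ".inprogress-", ".tmp");
    try {
      try (OutputStream out = Files.newOutputStream(tmp)) {
        out.write(contents);
      }
      // Throws AtomicMoveNotSupportedException if the move cannot be atomic.
      Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    } finally {
      // No-op after a successful move; cleans up the temp file on failure.
      Files.deleteIfExists(tmp);
    }
  }
}
```

While the write is in progress the temp file and any previous copy of the
target both exist, which is where the extra intermediate storage comes from.
On object stores a rename may not be atomic at all, so this would probably
need a per-filesystem check.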
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.