Hello,

Can you please share why Flink is not able to handle this exception and keeps creating files continuously without closing them?
Rgds,
Kamal

From: Kamal Mittal via user <user@flink.apache.org>
Sent: 21 September 2023 07:58 AM
To: Feng Jin <jinfeng1...@gmail.com>
Cc: user@flink.apache.org
Subject: RE: About Flink parquet format

Yes. Due to the error below, the Flink bulk writer never closes the part file and keeps creating new part files continuously. Is Flink not handling exceptions like the one below?

From: Feng Jin <jinfeng1...@gmail.com>
Sent: 20 September 2023 05:54 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: About Flink parquet format

Hi,

I tested it on my side and got the same error. This appears to be a limitation of Parquet.

```
java.lang.IllegalArgumentException: maxCapacityHint can't be less than initialSlabSize 64 1
	at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.bytes.CapacityByteArrayOutputStream.<init>(CapacityByteArrayOutputStream.java:153) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.<init>(RunLengthBitPackingHybridEncoder.jav
```

So I think the minimum page size that can currently be set for Parquet is 64 bytes.

Best,
Feng

On Tue, Sep 19, 2023 at 6:06 PM Kamal Mittal <kamal.mit...@ericsson.com> wrote:

Hello,

If the page size is given as 1 byte, then the exception 'maxCapacityHint can't be less than initialSlabSize %d %d' is encountered. It is thrown from the class CapacityByteArrayOutputStream, contained in the parquet-common library.

Rgds,
Kamal

From: Feng Jin <jinfeng1...@gmail.com>
Sent: 19 September 2023 01:01 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: About Flink parquet format

Hi Kamal,

What exception did you encounter?
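The stack trace above comes from a precondition check in Parquet's `CapacityByteArrayOutputStream` constructor: the requested page size (1 byte) ends up as the `maxCapacityHint`, while Parquet's computed initial slab size is 64 bytes, so the check fails before any page data can be written. A minimal sketch of that check (a hypothetical simplification for illustration, not the actual Parquet source):

```java
// Sketch of the precondition that produces the error above.
// The real logic lives in org.apache.parquet.bytes.CapacityByteArrayOutputStream;
// this standalone class only reproduces the failing check.
public class SlabCheck {

    static void checkCapacity(int maxCapacityHint, int initialSlabSize) {
        if (maxCapacityHint < initialSlabSize) {
            // Message format mirrors the exception in the stack trace:
            // "maxCapacityHint can't be less than initialSlabSize 64 1"
            throw new IllegalArgumentException(String.format(
                    "maxCapacityHint can't be less than initialSlabSize %d %d",
                    initialSlabSize, maxCapacityHint));
        }
    }

    public static void main(String[] args) {
        // A 1-byte page size becomes the capacity hint, while the
        // initial slab size is 64, so construction fails immediately.
        try {
            checkCapacity(1, 64);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is consistent with Feng's observation that 64 bytes is the effective minimum page size: any smaller value trips the check before the column writer is usable.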
I have tested it locally and it works fine.

Best,
Feng

On Mon, Sep 18, 2023 at 11:04 AM Kamal Mittal <kamal.mit...@ericsson.com> wrote:

Hello,

Checkpointing is enabled and works fine if the configured Parquet page size is at least 64 bytes; otherwise an exception is thrown at the back-end. Looks to be an issue which is not handled by the file sink bulk writer?

Rgds,
Kamal

From: Feng Jin <jinfeng1...@gmail.com>
Sent: 15 September 2023 04:14 PM
To: Kamal Mittal <kamal.mit...@ericsson.com>
Cc: user@flink.apache.org
Subject: Re: About Flink parquet format

Hi Kamal,

Check whether checkpointing is enabled for the job and is triggering correctly. By default, the Parquet bulk writer rolls to a new file on each checkpoint.

Best,
Feng

On Thu, Sep 14, 2023 at 7:27 PM Kamal Mittal via user <user@flink.apache.org> wrote:

Hello,

I tried Parquet file creation with the file sink bulk writer. If the Parquet page size is configured as low as 1 byte (an allowed configuration), then Flink keeps creating multiple 'in-progress' state files whose only content is 'PAR1', and never closes them. I want to know the reason for not closing the files and creating multiple 'in-progress' part files, and why no error is surfaced, if applicable?

Rgds,
Kamal
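For context on the 'PAR1' observation: 'PAR1' is the four-byte magic header that a Parquet writer emits as soon as the output file is opened. If the writer then dies on the page-size precondition before flushing any pages or the footer, the in-progress part file is left holding only those four bytes. A small stdlib-only sketch illustrating such a stub file (the file name and helper method are illustrative, not Flink API):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ParquetMagicCheck {

    // True if the file holds nothing but the 4-byte Parquet magic "PAR1",
    // i.e. the writer opened the file but never wrote pages or a footer.
    static boolean isOnlyMagic(byte[] bytes) {
        return bytes.length == 4
                && new String(bytes, StandardCharsets.US_ASCII).equals("PAR1");
    }

    public static void main(String[] args) throws IOException {
        // Simulate an abandoned in-progress part file: only the header
        // was written before the exception killed the writer.
        Path stub = Files.createTempFile("part-", ".inprogress");
        Files.write(stub, "PAR1".getBytes(StandardCharsets.US_ASCII));

        System.out.println(isOnlyMagic(Files.readAllBytes(stub))
                ? "stub file: only the Parquet magic header, no data pages"
                : "file contains data beyond the header");
        Files.delete(stub);
    }
}
```

Because the failure happens after the file is opened but before the first flush, each recovery attempt opens a fresh part file, writes the header, and fails again, which matches the growing set of 'PAR1'-only in-progress files described above.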