Hi Kamal

Indeed, Flink does not handle this exception. When it occurs, the Flink
job fails and then keeps restarting internally, continuously creating new
files.

Personally, I think this logic can be optimized: when this exception
occurs, the file that hit the exception should be deleted before the
Flink job exits, to avoid generating too many unnecessary files.
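
A rough sketch of the idea (illustrative only, not Flink's actual API;
real deletion of the in-progress file would need support from the file
sink internals):

```
// Illustrative wrapper: if opening the writer fails (e.g. the Parquet
// precondition throws), close the stream instead of leaving the
// half-written part file dangling. Actual deletion of the in-progress
// file would have to happen inside Flink's file sink itself.
public class CleanupOnFailureFactory<T> implements BulkWriter.Factory<T> {
    private final BulkWriter.Factory<T> delegate;

    public CleanupOnFailureFactory(BulkWriter.Factory<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public BulkWriter<T> create(FSDataOutputStream out) throws IOException {
        try {
            return delegate.create(out);
        } catch (RuntimeException e) {
            out.close(); // release the stream before the job fails
            throw e;
        }
    }
}
```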


Best,
Feng

On Mon, Sep 25, 2023 at 10:27 AM Kamal Mittal <kamal.mit...@ericsson.com>
wrote:

> Hello,
>
>
>
> Can you please share why Flink is not able to handle the exception and
> keeps creating files continuously without closing them?
>
>
>
> Rgds,
>
> Kamal
>
>
>
> *From:* Kamal Mittal via user <user@flink.apache.org>
> *Sent:* 21 September 2023 07:58 AM
> *To:* Feng Jin <jinfeng1...@gmail.com>
> *Cc:* user@flink.apache.org
> *Subject:* RE: About Flink parquet format
>
>
>
> Yes.
>
>
>
> Due to the error below, the Flink bulk writer never closes the part file
> and keeps creating new part files continuously. Is Flink not handling
> exceptions like the one below?
>
>
>
> *From:* Feng Jin <jinfeng1...@gmail.com>
> *Sent:* 20 September 2023 05:54 PM
> *To:* Kamal Mittal <kamal.mit...@ericsson.com>
> *Cc:* user@flink.apache.org
> *Subject:* Re: About Flink parquet format
>
>
>
> Hi
>
>
>
> I tested it on my side and got the same error. This appears to be a
> limitation of Parquet.
>
>
>
> ```
> java.lang.IllegalArgumentException: maxCapacityHint can't be less than initialSlabSize 64 1
>     at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:57) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
>     at org.apache.parquet.bytes.CapacityByteArrayOutputStream.<init>(CapacityByteArrayOutputStream.java:153) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
>     at org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.<init>(RunLengthBitPackingHybridEncoder.java)
> ```
>
>
>
>
>
> So I think the minimum page size that can currently be set for Parquet
> is 64 bytes.
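>
> As a minimal sketch of how to keep the page size at or above that
> minimum (assuming flink-parquet with Avro on the classpath; the variable
> schema is a placeholder for your Avro schema):
>
> ```
> // ParquetBuilder is a serializable single-method interface, so a lambda
> // can wire a custom AvroParquetWriter into Flink's ParquetWriterFactory
> // while pinning the page size to the 64-byte minimum.
> ParquetWriterFactory<GenericRecord> factory = new ParquetWriterFactory<>(
>     (ParquetBuilder<GenericRecord>) out ->
>         AvroParquetWriter.<GenericRecord>builder(out)
>             .withSchema(schema)   // placeholder Avro schema
>             .withPageSize(64)     // never below 64 bytes
>             .build());
> ```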
>
>
>
> Best,
>
> Feng
>
>
>
>
>
> On Tue, Sep 19, 2023 at 6:06 PM Kamal Mittal <kamal.mit...@ericsson.com>
> wrote:
>
> Hello,
>
>
>
> If the page size is set to 1 byte, the following exception is
> encountered: ‘maxCapacityHint can't be less than initialSlabSize %d %d’.
>
>
>
> This comes from the class CapacityByteArrayOutputStream in the
> parquet-common library.
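>
> For reference, the failing guard in CapacityByteArrayOutputStream looks
> roughly like the sketch below (paraphrased from the exception message;
> the exact code may differ by Parquet version). The RLE encoder requests
> an initial slab of 64 bytes, so any page size below 64 fails the check:
>
> ```
> // Paraphrased sketch: initialSlabSize is the encoder's 64-byte minimum,
> // maxCapacityHint is the configured page size (1), hence "64 1".
> Preconditions.checkArgument(
>     maxCapacityHint >= initialSlabSize,
>     String.format("maxCapacityHint can't be less than initialSlabSize %d %d",
>         initialSlabSize, maxCapacityHint));
> ```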
>
>
>
> Rgds,
>
> Kamal
>
>
>
> *From:* Feng Jin <jinfeng1...@gmail.com>
> *Sent:* 19 September 2023 01:01 PM
> *To:* Kamal Mittal <kamal.mit...@ericsson.com>
> *Cc:* user@flink.apache.org
> *Subject:* Re: About Flink parquet format
>
>
>
> Hi Kamal
>
>
>
> What exception did you encounter? I have tested it locally and it works
> fine.
>
>
>
>
>
> Best,
>
> Feng
>
>
>
>
>
> On Mon, Sep 18, 2023 at 11:04 AM Kamal Mittal <kamal.mit...@ericsson.com>
> wrote:
>
> Hello,
>
>
>
> Checkpointing is enabled and works fine if the configured Parquet page
> size is at least 64 bytes; otherwise, an exception is thrown in the back
> end.
>
>
>
> This looks like an issue that is not handled by the file sink bulk
> writer?
>
>
>
> Rgds,
>
> Kamal
>
>
>
> *From:* Feng Jin <jinfeng1...@gmail.com>
> *Sent:* 15 September 2023 04:14 PM
> *To:* Kamal Mittal <kamal.mit...@ericsson.com>
> *Cc:* user@flink.apache.org
> *Subject:* Re: About Flink parquet format
>
>
>
> Hi Kamal
>
>
>
> Check whether checkpointing is enabled for the job and triggered
> correctly. By default, Parquet writes roll over to a new file on each
> checkpoint.
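>
> As a minimal sketch (the checkpoint interval and output path are
> placeholders; AvroParquetWriters assumes flink-parquet with Avro on the
> classpath):
>
> ```
> // Bulk formats such as Parquet can only roll part files on checkpoint,
> // so checkpointing must be enabled for part files to ever be finalized.
> StreamExecutionEnvironment env =
>     StreamExecutionEnvironment.getExecutionEnvironment();
> env.enableCheckpointing(60_000); // e.g. every 60 seconds
>
> FileSink<GenericRecord> sink = FileSink
>     .forBulkFormat(new Path("/tmp/out"),
>         AvroParquetWriters.forGenericRecord(schema)) // placeholder schema
>     .build(); // bulk formats default to OnCheckpointRollingPolicy
> ```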
>
>
>
>
>
> Best,
>
> Feng
>
>
>
> On Thu, Sep 14, 2023 at 7:27 PM Kamal Mittal via user <
> user@flink.apache.org> wrote:
>
> Hello,
>
>
>
> I tried Parquet file creation with the file sink bulk writer.
>
>
>
> If the Parquet page size is configured as low as 1 byte (an allowed
> configuration), then Flink keeps creating multiple ‘in-progress’ files
> whose only content is ‘PAR1’, and the files are never closed.
>
>
>
> I want to know the reason for not closing the file and creating multiple
> ‘in-progress’ part files, and why no error is surfaced, if applicable.
>
>
>
> Rgds,
>
> Kamal
>
>
