Hi Frank,

Parquet always requires Hadoop. There is a Parquet ticket to make it
possible to read/write Parquet without depending on Hadoop, but that's
still open. So in order for Flink to work with Parquet, it needs the
necessary Hadoop dependencies on the classpath, as outlined in
https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/configuration/advanced/#hadoop-dependencies.
When I made a recipe for writing Parquet, I needed to add at least
org.apache.hadoop:hadoop-common and
org.apache.hadoop:hadoop-mapreduce-client-core.
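
For illustration, a minimal sketch of those two dependencies in a
pom.xml (the 3.3.4 version is my assumption; match it to the Hadoop
version in your environment, and mark them "provided" if Hadoop is
already on your cluster's classpath). In a customized operator image
you can equivalently download the same jars into Flink's lib directory:

  <!-- Hadoop pieces needed by the Parquet format; 3.3.4 is an assumed version -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.3.4</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>3.3.4</version>
  </dependency>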

Best regards,

Martijn

On Thu, Feb 9, 2023 at 10:07 AM Frank Lyaruu <flya...@gmail.com> wrote:

> Hi all, I’m using the Flink k8s operator to run a SQL stream to/from
> various connectors, and just added the Parquet format. I customized the
> image a bit per the example (mostly by adding Maven downloads of
> flink-connector* jars). If I do that for flink-parquet-1.16.1 it fails on
> a missing class: org/apache/hadoop/conf/Configuration
>
> I started adding hadoop-common (which contains that class), but that jar
> is huge and pulls in a bunch of dependencies of its own, so that would be
> quite the rabbit hole. I see an old thread that seems very similar:
> https://www.mail-archive.com/user@flink.apache.org/msg43028.html, but
> without any conclusion.
>
> How _is_ this supposed to work? The Flink docs on the Parquet format
> don't mention anything special:
>
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/formats/parquet/
>
> regards, Frank
