Hi Frank,

Parquet always requires Hadoop. There is an open Parquet ticket to make it possible to read/write Parquet without depending on Hadoop, but it hasn't been resolved yet. So for Flink to work with Parquet, it needs the necessary Hadoop dependencies, as outlined in https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/configuration/advanced/#hadoop-dependencies. When I made a recipe for writing Parquet, I needed to add at least org.apache.hadoop:hadoop-common and org.apache.hadoop:hadoop-mapreduce-client-core.
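For reference, a minimal sketch of what that looks like in a pom.xml; the Hadoop version here is just an example, pick whichever version matches your environment:

    <!-- Hadoop dependencies needed by flink-parquet;
         3.3.4 is an assumed version, not a requirement -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.3.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>3.3.4</version>
    </dependency>

If you're baking the jars into a custom image instead of building with Maven, the same two artifacts would need to end up on Flink's classpath (e.g. in the lib/ directory), alongside flink-parquet.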
Best regards,

Martijn

On Thu, Feb 9, 2023 at 10:07 AM Frank Lyaruu <flya...@gmail.com> wrote:

> Hi all,
>
> I'm using the Flink k8s operator to run a SQL stream to/from various
> connectors, and just added a Parquet format. I customized the image a bit
> per the example (mostly by adding maven downloads of flink-connector*
> jars). If I do that for flink-parquet-1.16.1, it fails on missing
> org/apache/hadoop/conf/Configuration.
>
> I started adding hadoop-common (which contains that class), but that one
> is huge and has a bunch of deps, even just for that one class, so that
> would be quite the rabbit hole. I see an old thread that seems very
> similar: https://www.mail-archive.com/user@flink.apache.org/msg43028.html
> but without any conclusion.
>
> How _is_ this supposed to work? The Flink docs on the Parquet format
> don't mention anything special:
> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/table/formats/parquet/
>
> regards, Frank