Hi Dian, thanks a lot for the explanation and help. Option 2) is what I needed and it works.
Regards,
Eduard

On Tue, May 18, 2021 at 6:21 AM Dian Fu <dian0511...@gmail.com> wrote:

> Hi,
>
> 1) The cause of the exception:
> The dependencies added via pipeline.jars / pipeline.classpaths are used to
> construct the user class loader.
>
> For your job, the exception happens when HadoopUtils.getHadoopConfiguration
> is called. The reason is that HadoopUtils is provided by Flink and is loaded
> by the system classloader instead of the user classloader. Since the Hadoop
> dependencies are only available in the user classloader, a
> ClassNotFoundException was raised.
>
> So the Hadoop dependencies are a bit special compared with the other
> dependencies, as they are also depended on by Flink itself, not only by the
> user code.
>
> 2) How to support HADOOP_CLASSPATH in local mode (e.g. executing in an IDE)
>
> Currently, you have to copy the Hadoop jars into the PyFlink installation
> directory: site-packages/pyflink/lib/
> However, it makes sense to support the HADOOP_CLASSPATH environment variable
> in local mode to avoid copying the jars manually. I will create a ticket for
> this.
>
> 3) How to support HADOOP_CLASSPATH in remote mode (e.g. submitted via
> `flink run`)
>
> You could export HADOOP_CLASSPATH=`hadoop classpath` manually before
> executing `flink run`.
>
> Regards,
> Dian
>
> On May 17, 2021, at 8:45 PM, Eduard Tudenhoefner <edu...@dremio.com> wrote:
>
> Hello,
>
> I was wondering whether anyone has tried and/or had any luck creating a
> custom catalog with Iceberg + Flink via the Python API
> (https://iceberg.apache.org/flink/#custom-catalog)?
>
> When doing so, the docs mention that dependencies need to be specified via
> *pipeline.jars* / *pipeline.classpaths*
> (https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/python/dependency_management/).
> I'm providing the Iceberg Flink runtime + the Hadoop libs via those and I
> can confirm that I'm landing in the *FlinkCatalogFactory*, but then things
> fail because it doesn't see the *Hadoop dependencies* for some reason.
>
> What would be the right way to provide the *HADOOP_CLASSPATH* when using
> the Python API? I have a minimal code example that shows the issue here:
> https://gist.github.com/nastra/92bc3bc7b7037d956aa5807988078b8d#file-flink-py-L38
>
> Thanks
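PS: in case it helps anyone who finds this thread later, here is roughly what the pieces above look like in code. First, registering the Iceberg runtime via pipeline.jars and creating the custom catalog. This is only a minimal sketch of what the linked gist does; the jar path, catalog name, implementation class and warehouse location are placeholders:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

env_settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
t_env = TableEnvironment.create(env_settings)

# Jars listed here end up on the *user* classloader only (see point 1 above).
# The path is a placeholder for the actual Iceberg Flink runtime jar.
t_env.get_config().get_configuration().set_string(
    "pipeline.jars",
    "file:///path/to/iceberg-flink-runtime.jar")

# Register the custom Iceberg catalog; the implementation class and the
# warehouse location are placeholders.
t_env.execute_sql("""
    CREATE CATALOG my_catalog WITH (
        'type' = 'iceberg',
        'catalog-impl' = 'com.example.MyCustomCatalog',
        'warehouse' = 'hdfs://namenode:8020/warehouse/path'
    )
""")
```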
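Option 2) then amounts to putting the Hadoop jars next to the Flink jars that ship with PyFlink (site-packages/pyflink/lib/). A small sketch, assuming HADOOP_HOME points at a local Hadoop installation and using an illustrative glob; copy whichever subset of Hadoop jars your job actually needs:

```python
import glob
import os
import shutil

import pyflink

# site-packages/pyflink/lib is where PyFlink keeps the Flink jars it ships with.
pyflink_lib_dir = os.path.join(os.path.dirname(os.path.abspath(pyflink.__file__)), "lib")

# Assumes HADOOP_HOME points at a local Hadoop installation; adjust the pattern
# to the Hadoop jars your job needs.
hadoop_home = os.environ["HADOOP_HOME"]
for jar in glob.glob(os.path.join(hadoop_home, "share", "hadoop", "**", "*.jar"),
                     recursive=True):
    shutil.copy(jar, pyflink_lib_dir)
```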
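For remote mode (point 3), the export can simply be done in the shell before `flink run`; if you drive the submission from Python anyway, the equivalent looks roughly like this (the job file name is a placeholder):

```python
import os
import subprocess

env = dict(os.environ)
# Equivalent of: export HADOOP_CLASSPATH=`hadoop classpath`
env["HADOOP_CLASSPATH"] = subprocess.check_output(["hadoop", "classpath"]).decode().strip()

# Submit the PyFlink job; "flink.py" is a placeholder for the actual job file.
subprocess.run(["flink", "run", "--python", "flink.py"], env=env, check=True)
```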