Thanks for the answer Gabor!

Just for the sake of clarity:
- The issue is that `flink-s3-fs-hadoop` does not even read
`core-site.xml` unless it is on the classpath

Do I understand correctly that the proposal is:
- Write a new `getHadoopConfiguration` method somewhere that does not use
the Hadoop dependencies and reads the files as plain configuration files
- Start using this new way of accessing these configurations everywhere
in the Flink code?
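
To make sure we are talking about the same thing, here is a rough,
dependency-free sketch of what I imagine such a loader could do (the class
and method names are made up by me, and the real Hadoop Configuration does
more, e.g. `<final>` handling and variable substitution):

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public final class PlainHadoopConfLoader {

    /**
     * Parses the <property><name>..</name><value>..</value></property>
     * entries of a Hadoop-style XML config (core-site.xml etc.) using only
     * the JDK, so no hadoop-hdfs-client is needed on the classpath.
     */
    public static Map<String, String> parse(String xml) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String name = prop.getElementsByTagName("name")
                    .item(0).getTextContent().trim();
            String value = prop.getElementsByTagName("value")
                    .item(0).getTextContent().trim();
            result.put(name, value);
        }
        return result;
    }
}
```

The caller would then read core-site.xml / hdfs-site.xml from
HADOOP_CONF_DIR and feed the contents through this, instead of relying on
HdfsConfiguration to pick the files up from the classpath.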

Thanks,
Peter

Gabor Somogyi <gabor.g.somo...@gmail.com> wrote (on Tue, Oct 25, 2022 at
13:31):

> Hi Peter,
>
> > would this cause issues for the users?
>
> I think so, yes: it is going to cause trouble for users who want to use S3
> without the HDFS client.
> Adding the HDFS client may be acceptable, but enforcing it is not a good
> direction.
>
> As mentioned, I've realized that we have 6 different ways in which the
> Hadoop conf is loaded, but I'm not sure one can make a single generic
> loader out of them. Sometimes one needs HdfsConfiguration or
> YarnConfiguration instances, which is hard to generalize.
>
> What I can imagine is the following (but it is super time consuming):
> * Create the specific configuration instance in the connector
> (HdfsConfiguration, YarnConfiguration)
> * Cast it to a Configuration instance
> * Call a generic loadConfiguration(Configuration conf, List<String>
> filesToLoad)
> * Use the locations which are covered in HadoopUtils.getHadoopConfiguration
> (except the deprecated ones)
> * Use this function in all the places around Flink
>
> In filesToLoad one could specify core-site.xml, hdfs-site.xml, etc.
> I've never tried it out, but this idea has been in my head for quite some
> time...
>
> BR,
> G
>
>
> On Tue, Oct 25, 2022 at 11:43 AM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
> > Hi Team,
> >
> > I have recently faced the issue that the S3 FileSystem read my
> > core-site.xml while it was on the classpath, but when I later tried to
> > add it using HADOOP_CONF_DIR, the configuration file was not loaded.
> > I filed a Jira [1] and created a PR [2] to fix it.
> >
> > HadoopUtils.getHadoopConfiguration is the method which considers all the
> > relevant settings for finding and loading the Hadoop configuration
> > files, so I used it to fix the issue. The downside is that this method
> > instantiates an HdfsConfiguration object, which required me to add
> > hadoop-hdfs-client as a provided dependency.
> >
> > My question for the more experienced folks - would this cause issues for
> > the users? Could we assume that if the hadoop-common is on the classpath
> > then hadoop-hdfs-client is on the classpath as well? Do you see other
> > possible drawbacks or issues with my approach?
> >
> > Thanks,
> > Peter
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-29754
> > [2] https://github.com/apache/flink/pull/21148
> >
>
