Hi Peter,

> would this cause issues for the users?

I think so: it would cause trouble for users who want to use S3
without the HDFS client.
Adding the HDFS client as an option is fine, but enforcing it is not a good
direction.

As mentioned, I've realized that we have 6 different ways of loading the
Hadoop conf, but I'm not sure one can make a single generic one out of them.
Sometimes one needs HdfsConfiguration or YarnConfiguration instances, which
is hard to generalize.

What I can imagine is the following (but it is super time consuming):
* Create the specific configuration instance in the connector
(HdfsConfiguration, YarnConfiguration)
* Cast it to a Configuration instance
* Call a generic loadConfiguration(Configuration conf, List<String>
filesToLoad)
* Use the locations which are covered in HadoopUtils.getHadoopConfiguration
(except the deprecated ones)
* Use this function in all the places around Flink

In filesToLoad one could specify core-site.xml, hdfs-site.xml, etc.
I've never tried it out, but this idea has been in my head for quite some time...
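
A minimal sketch of the generic loader idea, only to make the shape of it
concrete. Everything here is an assumption: the class name HadoopConfLoader,
the extra confDirs parameter and the callback are hypothetical and exist in
neither Flink nor Hadoop. To keep the sketch free of Hadoop dependencies, the
Configuration instance is replaced by a callback that receives each file
found; with Hadoop on the classpath the callback could forward to
conf.addResource(...).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Flink or Hadoop. */
final class HadoopConfLoader {

    private HadoopConfLoader() {}

    /**
     * Walks the given directories in order and hands every existing file
     * named in filesToLoad to the addResource callback.
     *
     * With Hadoop available, the callback could be something like:
     *   p -> conf.addResource(new org.apache.hadoop.fs.Path(p.toUri()))
     * where conf is an HdfsConfiguration or YarnConfiguration created by
     * the connector and used as a plain Configuration.
     *
     * @return the files that were handed to the callback, in load order
     */
    static List<Path> loadConfiguration(
            Consumer<Path> addResource, List<Path> confDirs, List<String> filesToLoad) {
        List<Path> loaded = new ArrayList<>();
        for (Path dir : confDirs) {
            for (String name : filesToLoad) {
                Path candidate = dir.resolve(name);
                // Skip files that are missing instead of failing.
                if (Files.isRegularFile(candidate)) {
                    addResource.accept(candidate);
                    loaded.add(candidate);
                }
            }
        }
        return loaded;
    }
}
```

A caller would pass the directories it wants to consider (for example the
one from HADOOP_CONF_DIR) and a list such as core-site.xml, hdfs-site.xml.
Which directories to pass, and in which order, is left open here.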

BR,
G


On Tue, Oct 25, 2022 at 11:43 AM Péter Váry <peter.vary.apa...@gmail.com>
wrote:

> Hi Team,
>
> I have recently faced the issue that the S3 FileSystem read my
> core-site.xml as long as it was on the classpath, but later when I tried to
> add it using the HADOOP_CONF_DIR the configuration file was not loaded.
> Filed a jira [1] and created a PR [2] for fixing it.
>
> HadoopUtils.getHadoopConfiguration is the method which considers all the
> relevant configurations for accessing / loading the hadoop configuration
> files, so I used it to fix the issue. The downside is that in this method
> we instantiate the HdfsConfiguration object which requires me to add the
> hadoop-hdfs-client as a provided dependency.
>
> My question for the more experienced folks - would this cause issues for
> the users? Could we assume that if the hadoop-common is on the classpath
> then hadoop-hdfs-client is on the classpath as well? Do you see other
> possible drawbacks or issues with my approach?
>
> Thanks,
> Peter
>
> [1] https://issues.apache.org/jira/browse/FLINK-29754
> [2] https://github.com/apache/flink/pull/21148
>
