Hi Martijn, I agree with your opinion, one must consider it well whether it's a good tradeoff. In my view adding an extra load directory would worth because it's relatively small change/risk (+ structural solution is going this direction) but enforcing HDFS lib is not such good bandaid.
No matter what will happen here I'm intended to clean up this area as much as possible with structural solutions. BR, G On Wed, Oct 26, 2022 at 3:57 PM Martijn Visser <martijnvis...@apache.org> wrote: > Hi Gabor, > > They are absolutely not coupled, I'm merely mentioning it because it could > "waste" potentially someone's time: first by creating a bandaid which would > then need to be replaced with a more permanent solution. Could be fine, but > it all depends on how much effort & errorprone the bandaid is vs creating a > structural solution. > > Thanks, > > Martijn > > On Wed, Oct 26, 2022 at 11:27 AM Gabor Somogyi <gabor.g.somo...@gmail.com> > wrote: > > > Hi Martijn, > > > > > I'm not sure that creating another bandaid is a good idea. > > > > I think we have multiple issues here which are not coupled: > > * Hadoop config handling in Flink is not consistent (for example runtime > > uses hadoop FS connector code and each connector has it's own > > implementation) > > * S3 connector for Flink quality can be definitely improved > > > > Fixing the first doesn't require a full S3 connector rewrite though I > agree > > with your direction. > > > > > I'm not sure if there are any experts on S3 in the Flink Community but > it > > would be great if we could find them and see if/how we can improve. > > > > I'm not calling myself S3 expert but I'm planning to implement a token > > provider for S3 so intended to drive that effort. > > While I'm there I can have a look how it's implemented. > > > > BR, > > G > > > > > > On Tue, Oct 25, 2022 at 8:52 PM Martijn Visser <martijnvis...@apache.org > > > > wrote: > > > > > Hi all, > > > > > > I have been thinking that we should consider creating one new, rock > solid > > > S3 connector for Flink. I think it's confusing for users that there is > an > > > S3 Presto and an S3 Hadoop implementation, which both are not perfect. > > I'm > > > not sure that creating another bandaid is a good idea. > > > > > > I'm not sure if there are any experts on S3 in the Flink Community but > it > > > would be great if we could find them and see if/how we can improve. > > > > > > Thanks, > > > > > > Martijn > > > > > > Op di 25 okt. 2022 om 16:56 schreef Péter Váry < > > > peter.vary.apa...@gmail.com> > > > > > > > Thanks for the answer Gabor! > > > > > > > > Just for the sake of clarity: > > > > - The issue is that the `flink-s3-fs-hadoop` does not even read the > > > > `core-site.xml` if it is not on the classpath > > > > > > > > Do I understand correctly that the proposal is: > > > > - Write a new `getHadoopConfiguration` method somewhere without using > > the > > > > dependencies, and reading the files as plain Configuration files > > > > - Start using the new way of accessing these Configurations > everywhere > > in > > > > the Flink code? > > > > > > > > Thanks, > > > > Peter > > > > > > > > Gabor Somogyi <gabor.g.somo...@gmail.com> ezt írta (időpont: 2022. > > okt. > > > > 25., K, 13:31): > > > > > > > > > Hi Peter, > > > > > > > > > > > would this cause issues for the users? > > > > > > > > > > I think yes, it is going to make trouble for users who want to use > S3 > > > > > without HDFS client. > > > > > Adding HDFS client may happen but enforcing it is not a good > > direction. > > > > > > > > > > As mentioned I've realized that we have 6 different ways how Hadoop > > > conf > > > > is > > > > > loaded > > > > > but not sure one can make one generic from it. Sometimes one need > > > > > HdfsConfiguration > > > > > or YarnConfiguration instances which is hard to generalize. > > > > > > > > > > What I can imagine is the following (but super time consuming): > > > > > * One creates specific configuration instance in the connector > > > > > (HdfsConfiguration, YarnConfiguration) > > > > > * Casting it to Configuration instance > > > > > * Calling a generic loadConfiguration(Configuration conf, > > List<String> > > > > > filesToLoad) > > > > > * Use locations which are covered in > > HadoopUtils.getHadoopConfiguration > > > > > (except the deprecated ones) > > > > > * Use this function on all the places around Flink > > > > > > > > > > In filesToLoad one could specify core-site.xml, hdfs-site.xml etc. > > > > > Never tried it out but this idea is in my head for quite some > time... > > > > > > > > > > BR, > > > > > G > > > > > > > > > > > > > > > On Tue, Oct 25, 2022 at 11:43 AM Péter Váry < > > > peter.vary.apa...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > Hi Team, > > > > > > > > > > > > I have recently faced the issue that the S3 FileSystem read my > > > > > > core-site.xml until it was on the classpath, but later when I > tried > > > to > > > > > add > > > > > > it using the HADOOP_CONF_DIR then the configuration file was not > > > > loaded. > > > > > > Filed a jira [1] and created a PR [2] for fixing it. > > > > > > > > > > > > HadoopUtils.getHadoopConfiguration is the method which considers > > all > > > > the > > > > > > relevant configurations for accessing / loading the hadoop > > > > configuration > > > > > > files, so I used it to fix the issue. The downside is that in > this > > > > method > > > > > > we instantiate the HdfsConfiguration object which requires me to > > add > > > > the > > > > > > hadoop-hdfs-client as a provided dependency. > > > > > > > > > > > > My question for the more experienced folks - would this cause > > issues > > > > for > > > > > > the users? Could we assume that if the hadoop-common is on the > > > > classpath > > > > > > then hadoop-hdfs-client is on the classpath as well? Do you see > > other > > > > > > possible drawbacks or issues with my approach? > > > > > > > > > > > > Thanks, > > > > > > Peter > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-29754 > > > > > > [2] https://github.com/apache/flink/pull/21148 > > > > > > > > > > > > > > > > > > -- > > > Martijn > > > https://twitter.com/MartijnVisser82 > > > https://github.com/MartijnVisser > > > > > >