Awesome. :) Thanks, Robert, for signing up to be the reviewer. I will create a Jira ticket and share the link here.
Stay safe.

- Sivaprasanna

On Thu, May 28, 2020 at 12:13 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi Sivaprasanna,
>
> thanks a lot for your proposal. Now that I have run into a
> HadoopUtils-related issue myself [1], I see the benefit of this proposal.
>
> I'm happy to be the Flink committer that mentors this change. If we do
> this, I would like to have a small scope for the initial change:
> - create a "flink-hadoop-utils" module
> - move generic, common utils into that module (for example
>   SerializableHadoopConfiguration)
>
> I agree with Till that we should initially leave out the Hadoop
> compatibility modules.
>
> You can go ahead with filing a JIRA ticket! Let's discuss the exact scope
> there.
>
> [1] https://github.com/apache/flink/pull/12146
>
> On Thu, Apr 30, 2020 at 6:54 PM Sivaprasanna <sivaprasanna...@gmail.com> wrote:
>
> > Bump.
> >
> > Please let me know if someone is interested in reviewing this one. I am
> > willing to start working on this. BTW, a small and new addition to the
> > list: With FLINK-10114 merged, OrcBulkWriterFactory can also reuse
> > `SerializableHadoopConfiguration` along with SequenceFileWriterFactory
> > and CompressWriterFactory.
> >
> > CC - Kostas Kloudas since he has a better understanding of
> > `SerializableHadoopConfiguration`.
> >
> > Cheers,
> > Sivaprasanna
> >
> > On Mon, Mar 30, 2020 at 3:17 PM Chesnay Schepler <ches...@apache.org> wrote:
> >
> > > I would recommend waiting until a committer has signed up for reviewing
> > > your changes before preparing any PR.
> > > Otherwise the chances are high that you invest a lot of time but the
> > > changes never get in.
> > >
> > > On 30/03/2020 11:42, Sivaprasanna wrote:
> > > > Hello Till,
> > > >
> > > > I agree with having the scope limited and more concentrated. I can
> > > > file a Jira and get started with the code changes; the review can be
> > > > done as and when someone has some bandwidth. What do you think?
> > > >
> > > > Cheers,
> > > > Sivaprasanna
> > > >
> > > > On Mon, Mar 30, 2020 at 3:00 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > >
> > > >> Hi Sivaprasanna,
> > > >>
> > > >> thanks for starting this discussion. In general I like the idea to
> > > >> remove duplications and move common code to a shared module. As a
> > > >> recommendation, I would exclude the whole part about Flink's Hadoop
> > > >> compatibility modules because they are legacy code and hardly used
> > > >> anymore. This would also have the benefit of making the scope of the
> > > >> proposal a bit smaller.
> > > >>
> > > >> What we now need is a committer who wants to help with this effort.
> > > >> It might be that this takes a bit of time as many of the committers
> > > >> are quite busy.
> > > >>
> > > >> Cheers,
> > > >> Till
> > > >>
> > > >> On Thu, Mar 19, 2020 at 2:15 PM Sivaprasanna <sivaprasanna...@gmail.com> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> Continuing on an earlier discussion[1] regarding having a separate
> > > >>> module for Hadoop related utility components, I have gone through our
> > > >>> project briefly and found the following components which I feel could
> > > >>> be moved to a separate module for reusability, and better module
> > > >>> structure.
> > > >>>
> > > >>> Module Name / Class Name / Used at or Remarks
> > > >>>
> > > >>> Module: flink-hadoop-fs
> > > >>>   Class: flink.runtime.util.HadoopUtils
> > > >>>   Used at:
> > > >>>     flink-runtime => HadoopModule & HadoopModuleFactory
> > > >>>     flink-swift-fs-hadoop => SwiftFileSystemFactory
> > > >>>     flink-yarn => Utils, YarnClusterDescriptor
> > > >>>
> > > >>> Module: flink-hadoop-compatibility
> > > >>>   Classes: api.java.hadoop.mapred.utils.HadoopUtils
> > > >>>            api.java.hadoop.mapreduce.utils.HadoopUtils
> > > >>>   Remarks: Both belong to the same module but with different packages
> > > >>>     (api.java.hadoop.mapred and api.java.hadoop.mapreduce)
> > > >>>
> > > >>> Module: flink-sequence-file
> > > >>>   Class: formats.sequencefile.SerializableHadoopConfiguration
> > > >>>   Remarks: Currently, it is used at
> > > >>>     formats.sequencefile.SequenceFileWriterFactory but can also be used
> > > >>>     at HadoopCompressionBulkWriter, a potential OrcBulkWriter and pretty
> > > >>>     much everywhere to avoid NotSerializableException.
> > > >>>
> > > >>> *Proposal*
> > > >>> To summarise, I believe we can create a new module
> > > >>> (flink-hadoop-utils?) and move these reusable components to this new
> > > >>> module, which will have an optional/provided dependency on
> > > >>> flink-shaded-hadoop-2.
> > > >>>
> > > >>> *Structure*
> > > >>> In the present form, I think we will have two classes with the
> > > >>> packaging structure being *org.apache.flink.hadoop.[utils/serialization]*
> > > >>> 1. HadoopUtils with all static methods (after combining and
> > > >>> eliminating the duplicate code fragments from the three HadoopUtils
> > > >>> classes mentioned above)
> > > >>> 2. Move the existing SerializableHadoopConfiguration from
> > > >>> flink-sequence-file to this new module.
> > > >>>
> > > >>> *Justification*
> > > >>> * With this change, we would be stripping away the dependency on
> > > >>> flink-hadoop-fs from flink-runtime, as I don't see any other classes
> > > >>> from flink-hadoop-fs being used anywhere in the flink-runtime module.
> > > >>> * We will have a common place where all the utilities related to
> > > >>> Hadoop can go, which can be reused easily without leading to jar hell.
> > > >>>
> > > >>> In addition to this, if you are aware of any other classes that fit
> > > >>> in this approach, please share the details here.
> > > >>>
> > > >>> *Note*
> > > >>> I don't have a complete understanding here but I did see two
> > > >>> implementations of the following classes under two different packages
> > > >>> *.mapred and *.mapreduce.
> > > >>> * HadoopInputFormat
> > > >>> * HadoopInputFormatBase
> > > >>> * HadoopOutputFormat
> > > >>> * HadoopOutputFormatBase
> > > >>>
> > > >>> Can we somehow figure this out and have them in this new module?
> > > >>>
> > > >>> Thanks,
> > > >>> Sivaprasanna
> > > >>>
> > > >>> [1]
> > > >>> https://lists.apache.org/thread.html/r198f09496ba46885adbcc41fe778a7a34ad1cd685eeae8beb71e6fbb%40%3Cdev.flink.apache.org%3E
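
For readers new to the thread, here is a rough sketch of the pattern that `SerializableHadoopConfiguration` is described as providing: a Serializable holder around an object that is not itself Serializable, so that writer factories do not hit NotSerializableException. This is not the Flink class. `StandInConfiguration`, `SerializableConfigHolder` and `roundTrip` are hypothetical names made up for illustration, and the stand-in is a plain map so the example runs without Hadoop on the classpath. With the real Hadoop `Configuration`, one would presumably delegate to its `write(DataOutput)` / `readFields(DataInput)` methods instead of copying entries by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class SerializableWrapperSketch {

    /** Hypothetical stand-in for Hadoop's Configuration: deliberately NOT Serializable. */
    static final class StandInConfiguration {
        private final Map<String, String> entries = new HashMap<>();

        void set(String key, String value) {
            entries.put(key, value);
        }

        String get(String key) {
            return entries.get(key);
        }

        Map<String, String> asMap() {
            return entries;
        }
    }

    /** Serializable holder that writes the wrapped object's contents by hand. */
    static final class SerializableConfigHolder implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: default Java serialization skips this field, which is what
        // avoids the NotSerializableException for the wrapped object.
        private transient StandInConfiguration config;

        SerializableConfigHolder(StandInConfiguration config) {
            this.config = config;
        }

        StandInConfiguration get() {
            return config;
        }

        // Custom hook: write the entry count followed by each key/value pair.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            Map<String, String> entries = config.asMap();
            out.writeInt(entries.size());
            for (Map.Entry<String, String> e : entries.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }

        // Custom hook: rebuild the wrapped object from the pairs written above.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            config = new StandInConfiguration();
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                config.set(key, value);
            }
        }
    }

    /** Serializes the holder to bytes and reads it back, as shipping it to a task would. */
    static SerializableConfigHolder roundTrip(SerializableConfigHolder holder)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(holder);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SerializableConfigHolder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        StandInConfiguration conf = new StandInConfiguration();
        conf.set("fs.defaultFS", "hdfs://example:8020"); // illustrative value only
        SerializableConfigHolder copy = roundTrip(new SerializableConfigHolder(conf));
        System.out.println(copy.get().get("fs.defaultFS"));
    }
}
```

The holder has no dependency on any particular file format, which is in line with the suggestion in the thread that it could live in a shared module and be reused by several writer factories.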