The next release will still be compatible, because we keep the “hadoop-input-format” module but mark it as deprecated, and propose using it through the new “hadoop-format” module and its proxy class HadoopFormatIO (or HadoopMapReduceFormatIO, whatever we name it), which will provide Read/Write functionality using the MapReduce InputFormat or OutputFormat classes. Then, in a release after the next one, we can drop “hadoop-input-format”, since it will have been deprecated and users will have had time to move to the new API. I think this is the least painful way for users but the most complicated one for us, if the final goal is to merge “hadoop-input-format” and “hadoop-output-format” together.
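To make the idea concrete, here is a rough sketch of the proxy arrangement in plain Java. The class names `OldInputFormatIO` and `NewFormatIO` and their string-returning methods are hypothetical stand-ins invented for illustration; they are not the actual Beam classes or signatures. The old class keeps the read implementation and is only marked deprecated, while the new class forwards reads to it and adds the write side.

```java
// Hypothetical stand-in for the existing class in "hadoop-input-format".
// It keeps its implementation for now and is only marked as deprecated.
@Deprecated
final class OldInputFormatIO {
    private OldInputFormatIO() {}

    // Stand-in for the existing Read side (backed by a MapReduce InputFormat).
    static String read(String inputFormatClass) {
        return "read via " + inputFormatClass;
    }
}

// Hypothetical stand-in for the proxy class in the new "hadoop-format" module.
final class NewFormatIO {
    private NewFormatIO() {}

    // Read is forwarded to the old module, so behaviour stays the same.
    @SuppressWarnings("deprecation")
    static String read(String inputFormatClass) {
        return OldInputFormatIO.read(inputFormatClass);
    }

    // Write is the new functionality (backed by a MapReduce OutputFormat).
    static String write(String outputFormatClass) {
        return "write via " + outputFormatClass;
    }
}

class ProxyDemo {
    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        String format = "org.apache.hadoop.mapreduce.lib.db.DBInputFormat";
        // Existing user code keeps working against the old entry point.
        System.out.println(OldInputFormatIO.read(format));
        // New user code goes through the proxy and gets the same result.
        System.out.println(NewFormatIO.read(format));
    }
}
```

Once the old module is dropped, the read implementation would move into the new class and callers of the new entry point would not need to change.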

> On 7 Sep 2018, at 13:45, Robert Bradshaw <rober...@google.com> wrote:
>
> Agree about not impacting users. Perhaps I misread (3), isn't it fully
> backwards compatible as well?
>
> On Fri, Sep 7, 2018 at 1:33 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi,
>
> in order to limit the impact for the existing users on Beam 2.x series,
> I would go for (1).
>
> Regards
> JB
>
> On 06/09/2018 17:24, Alexey Romanenko wrote:
> > Hello everyone,
> >
> > I’d like to discuss the following topic (see below) with community since
> > the optimal solution is not clear for me.
> >
> > There is Java IO module, called “/hadoop-input-format/”, which allows to
> > use MapReduce InputFormat implementations to read data from different
> > sources (for example, org.apache.hadoop.mapreduce.lib.db.DBInputFormat).
> > According to its name, it has only “Read” and it's missing “Write” part,
> > so, I'm working on “/hadoop-output-format/” to support MapReduce
> > OutputFormat (PR 6306 <https://github.com/apache/beam/pull/6306>). For
> > this I created another module with this name. So, in the end, we will
> > have two different modules “/hadoop-input-format/” and
> > “/hadoop-output-format/” and it looks quite strange for me since, afaik,
> > every existed Java IO, that we have, incapsulates Read and Write parts
> > into one module. Additionally, we have “/hadoop-common/” and
> > “/hadoop-file-system/” as other hadoop-related modules.
> >
> > Now I’m thinking how it will be better to organise all these Hadoop
> > modules better. There are several options in my mind:
> >
> > 1) Add new module “/hadoop-output-format/” and leave all Hadoop modules
> > “as it is”.
> > Pros: no breaking changes, no additional work
> > Cons: not logical for users to have the same IO in two different modules
> > and with different names.
> >
> > 2) Merge “/hadoop-input-format/” and “/hadoop-output-format/” into one
> > module called, say, “/hadoop-format/” or “/hadoop-mapreduce-format/”,
> > keep the other Hadoop modules “as it is”.
> > Pros: to have InputFormat/OutputFormat in one IO module which is logical
> > for users
> > Cons: breaking changes for user code because of module/IO renaming
> >
> > 3) Add new module “/hadoop-format/” (or “/hadoop-mapreduce-format/”)
> > which will include new “write” functionality and be a proxy for old
> > “/hadoop-input-format/”. In its turn, “/hadoop-input-format/” should
> > become deprecated and be finally moved to common “/hadoop-format/”
> > module in future releases. Keep the other Hadoop modules “as it is”.
> > Pros: finally it will be only one module for hadoop MR format; changes
> > are less painful for user
> > Cons: hidden difficulties of implementation this strategy; a bit
> > confusing for user
> >
> > 4) Add new module “/hadoop/” and move all already existed modules there
> > as submodules (like we have for “/io/google-cloud-platform/”), merge
> > “/hadoop-input-format/” and “/hadoop-output-format/” into one module.
> > Pros: unification of all hadoop-related modules
> > Cons: breaking changes for user code, additional complexity with deps
> > and testing
> >
> > 5) Your suggestion?..
> >
> > My personal preferences are lying between 2 and 3 (if 3 is possible).
> >
> > I’m wondering if there were similar situations in Beam before and how it
> > was finally resolved. If yes then probably we need to do here in similar
> > way.
> > Any suggestions/advices/comments would be very appreciated.
> >
> > Thanks,
> > Alexey
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com