I agree with Moritz. To answer a few specifics in my own words:

- It is a perfectly sensible refactor, but as a counterpoint: without file-based IO the SDK isn't functional, so keeping it in core is also a reasonable design choice. There are other things in the core SDK that are far less "core" and could be moved out with greater benefit. IMO, the main goal of any module separation would be lighter-weight transitive dependencies.
- No, Beam has not made any deliberate breaking changes of this nature, which is why we are still on major version 2. We have made some bug fixes for data-loss risks that could be called "breaking changes", but since the affected feature was unsafe to use in the first place, we did not bump the major version.

- It is sometimes possible to do such a refactor and have the deprecated location proxy to the new location. In this case that seems hard to achieve.

- It is not actually necessary to maintain both locations: we can declare that the old location will be unmaintained (but left in place) and that all new development goes to the new location. That isn't a great choice for users, though, who may simply upgrade their SDK version without noticing that their old code now points at a version that will not receive, e.g., security updates.

- I like the style where, if/when we transition from Beam 2 to Beam 3, the exact functionality of Beam 3 is first available behind an opt-in flag. So if a user passes --beam-3, they get exactly what will become the default when we bump the major version. It really is a problem to do a whole bunch of stuff feverishly right before a major version bump. The other style that I think works well is the (old) Linux kernel style, where version series alternate between stable and unstable (in other words, returning to the 0.x style with every other version).

- I do think Beam suffers from fear of, and an inability to do, significant code gardening. I don't think backwards compatibility in the code sense is the biggest blocker. I think the "pipeline update" feature is perhaps the thing most holding Beam back from making radical, rapid forward progress.

Kenn

On Mon, Dec 12, 2022 at 2:25 AM Moritz Mack <mm...@talend.com> wrote:

> Hi Damon,
>
> I fear the current release / versioning strategy of Beam doesn't lend
> itself well to such breaking changes.
> Alexey and I have spent quite some
> time discussing how to proceed with the problematic Avro dependency in core
> (and, correspondingly, AvroIO).
>
> Such changes essentially always require duplicating code to keep
> supporting a deprecated legacy code path so as not to break users' code. But this
> comes at a very high price: until the deprecated code path can finally be
> removed, it must be maintained in two places.
>
> Unfortunately, removing deprecated code is rather problematic
> without a major version release, as it would break semantic versioning and
> people's expectations. As a result, deprecations carry the inherent risk of
> unintentionally degrading quality rather than improving it.
>
> I'd therefore recommend against such efforts unless there are very strong
> reasons to do so.
>
> Best, Moritz
>
> On 07.12.22, 18:05, "Damon Douglas via dev" <dev@beam.apache.org> wrote:
>
> Hello Everyone,
>
> *If you identify yourself on the Beam learning journey, even if this is
> your first day, please see yourself as a welcome participant in this
> conversation and consider reviewing the bottom portion of this email for
> guidance.*
>
> *The Short Version (For those with Java Beam SDK knowledge)*:
>
> Should we migrate FileIO / TextIO and related classes from :sdks:java:core
> to :sdks:java:io:file? If so, should we target such a migration to a
> future Beam version with repeated announcements? Does the Beam repository
> have any example of a similar change in the past? What lessons from that
> past change could potentially be applied to this one?
>
> *The Long Version (For those on the learning path)*:
>
> This email is more about our repository organization than about Beam itself.
> The proposal is to move two highly used classes in our Java SDK, FileIO [1]
> and TextIO [2] (and anything related). The Beam GitHub repository uses a
> build automation tool called Gradle [3] to automate routine code tasks such
> as building and testing. Gradle projects, such as Beam, organize code into
> what are called modules [4]. The three main ingredients of a module are
> 1) a unique directory path, 2) a file called build.gradle (or
> build.gradle.kts) in that directory, and 3) a reference to the module in a
> settings.gradle (or settings.gradle.kts) file at the root of the repository.
>
> The Gradle documentation discusses why such organization might matter and
> how to achieve it in large projects [5]. Essentially, modules allow us to
> have mini-projects inside our large project and to scope related automation
> to one focused portion of the larger repository. In Beam, the module
> :sdks:java:core [6] contains everything related to the core of Beam,
> whereas the modules for reading from and writing to various resources live
> under :sdks:java:io [7].
>
> The proposal suggests moving the aforementioned file reading and writing
> classes, FileIO and TextIO, and anything related, into their own
> :sdks:java:io:file module. This would correspond to a new
> sdks/java/io/file directory, with these classes moved into
> sdks/java/io/file/src/main/java/org/apache/beam/sdk/io/file.
>
> *Definitions / References*:
>
> 1. FileIO - General-purpose transforms for working with files: listing
> files (matching), reading and writing.
> See https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html
>
> 2. TextIO - Similar to FileIO but focused on text files. See
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html
>
> 3. Gradle - A build automation tool used by the Apache Beam repository to
> automate code-related tasks. See
> https://docs.gradle.org/current/userguide/what_is_gradle.html
>
> 4. Gradle Module - A subsection of your larger repository. See
> https://docs.gradle.org/current/userguide/dependency_management_terminology.html#sub:terminology_module
>
> 5. Structuring Large Projects with Gradle -
> https://docs.gradle.org/current/userguide/structuring_software_products.html
>
> 6. sdks:java:core - Corresponds to the sdks/java/core repository
> directory.
> See https://github.com/apache/beam/tree/master/sdks/java/core
>
> 7. sdks:java:io - Corresponds to the sdks/java/io repository directory.
> See https://github.com/apache/beam/tree/master/sdks/java/io
>
> Best,
>
> Damon
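For readers on the learning path, here is a minimal sketch of the three module ingredients described in Damon's long version, applied to the proposed :sdks:java:io:file module. It assumes the Gradle Kotlin DSL and the standard `java-library` plugin; Beam's actual build files follow the project's own Gradle conventions, so treat this as an illustration of the idea rather than the real build configuration.

```kotlin
// Ingredient 1: a unique directory path, e.g. sdks/java/io/file/
// (sources would live under sdks/java/io/file/src/main/java/...)

// --- Ingredient 2: sdks/java/io/file/build.gradle.kts (hypothetical) ---
plugins {
    `java-library`
}

dependencies {
    // `api` rather than `implementation`: the public signatures of FileIO and
    // TextIO expose core SDK types (e.g. PTransform), so consumers need core too.
    api(project(":sdks:java:core"))
}

// --- Ingredient 3: settings.gradle.kts at the repository root ---
// Registers the new module so Gradle includes it in the build.
include(":sdks:java:io:file")
```

Note the dependency direction in this sketch: the new module depends on core, so core itself could no longer reference FileIO or TextIO once they move.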