Hi,

"As for the pipeline update feature, we've long discussed
having "pick-your-implementation" transforms that specify
alternative, equivalent implementations."

Could someone point me to where this was discussed please? I seem to have
missed that whole topic. Is it like a dependency injection type of thing?
If so, it's one thing I would love to see in Beam.
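
To make the question concrete, here is a minimal sketch of how I picture a
"pick-your-implementation" transform. It is purely hypothetical: the class
VersionedTransform and the methods register and select are names I made up
for illustration, not Beam APIs, and plain java.util.function.Function stands
in for a real transform. The idea is that an updated pipeline pins the
implementation it started with, while a new pipeline gets the latest one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: none of these names exist in Beam.
final class VersionedTransform<I, O> {
    // Equivalent implementations, keyed by a version label, in registration order.
    private final Map<String, Function<I, O>> implementations = new LinkedHashMap<>();
    // Label of the most recently registered implementation.
    private String latest;

    // Registers an alternative implementation and marks it as the latest.
    VersionedTransform<I, O> register(String version, Function<I, O> impl) {
        implementations.put(version, impl);
        latest = version;
        return this;
    }

    // A pipeline being updated passes the version it was started with;
    // a new pipeline passes null and gets the latest implementation.
    Function<I, O> select(String pinnedVersion) {
        String version = pinnedVersion != null ? pinnedVersion : latest;
        Function<I, O> impl = implementations.get(version);
        if (impl == null) {
            throw new IllegalArgumentException("Unknown implementation version: " + version);
        }
        return impl;
    }
}
```

Whether the choice would be driven by a pipeline option, by the runner, or by
something closer to dependency injection is exactly what I would like to
understand from the earlier discussion.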

Thanks,
Cristian

On Mon, Dec 12, 2022 at 4:23 PM Robert Bradshaw via dev <dev@beam.apache.org>
wrote:

> Saving up all the breaking changes until a major release definitely
> has its downsides (look at Python 3). The migration path is often as
> important (if not more so) than the final destination.
>
> As for this particular change, I would question how the benefit (it's
> unclear what the exact benefit is--better internal organization?)
> exceeds the pain of making every user refactor their code. I think a
> stronger case can be made for things like the Avro dependency that
> cause real pain.
>
> As for the pipeline update feature, we've long discussed having
> "pick-your-implementation" transforms that specify alternative,
> equivalent implementations. Upgrades can choose the old one whereas
> new pipelines can get the latest and greatest. It won't solve all
> issues, and requires keeping old codepaths around, but could be an
> important step forward.
>
> On Mon, Dec 12, 2022 at 10:20 AM Kenneth Knowles <k...@apache.org> wrote:
> >
> > I agree with Moritz. To answer a few specifics in my own words:
> >
> >  - It is a perfectly sensible refactor, but as a counterpoint without
> file-based IO the SDK isn't functional so it is also a reasonable design
> point to have this included. There are other things in the core SDK that
> are far less "core" and could be moved out with greater benefit. The main
> goal for any separation of modules would be lighter weight transitive
> dependencies, IMO.
> >
> >  - No, Beam has not made any deliberate breaking changes of this nature.
> Hence we are still on major version 2. We have made some bugfixes for data
> loss risks that could be called "breaking changes" but since the feature
> was unsafe to use in the first place we did not bump the major version.
> >
> >  - It is sometimes possible to do such a refactor and have the
> deprecated location proxy to the new location. In this case that seems hard
> to achieve.
> >
> >  - It is not actually necessary to maintain both locations, as we can
> declare the old location will be unmaintained (but left alone) and all new
> development goes to the new location. That isn't a great choice for users
> who may simply upgrade their SDK version and not notice that their old code
> is now pointing at a version that will not receive e.g. security updates.
> >
> >  - I like the style where if/when we transition from Beam 2 to Beam 3 we
> should have the exact functionality of Beam 3 available as an opt-in flag
> first. So if a user passes --beam-3 they get exactly what will be the
> default functionality when we bump the major version. It really is a
> problem to do a whole bunch of stuff feverishly before a major version
> bump. The other style that I think works well is the Linux kernel style
> where major versions alternate between stable and unstable (in other words,
> returning to the 0.x style with every alternating version).
> >
> >  - I do think Beam suffers from fear and inability to do significant
> code gardening. I don't think backwards compatibility in the code sense is
> the biggest blocker. I think the "pipeline update" feature is perhaps the
> thing most holding Beam back from making radical rapid forward progress.
> >
> > Kenn
> >
> > On Mon, Dec 12, 2022 at 2:25 AM Moritz Mack <mm...@talend.com> wrote:
> >>
> >> Hi Damon,
> >>
> >>
> >>
> >> I fear the current release / versioning strategy of Beam doesn’t lend
> itself well to such breaking changes. Alexey and I have spent quite some
> time discussing how to proceed with the problematic Avro dependency in core
> (and respectively AvroIO, of course).
> >>
> >> Such changes essentially always require duplicating code to continue
> supporting a deprecated legacy code path to not break users’ code. But this
> comes at a very high price. Until the deprecated code path can be finally
> removed again, it must be maintained in two places.
> >>
> >> Unfortunately, the removal of deprecated code is rather problematic
> without a major version release as it would break semantic versioning and
> people’s expectations. As a result, deprecations carry the inherent risk of
> unintentionally degrading quality rather than improving it.
> >>
> >> I’d therefore recommend against such efforts unless there are very strong
> reasons to do so.
> >>
> >>
> >>
> >> Best, Moritz
> >>
> >>
> >>
> >> On 07.12.22, 18:05, "Damon Douglas via dev" <dev@beam.apache.org>
> wrote:
> >>
> >>
> >>
> >> Hello Everyone,
> >>
> >>
> >>
> >> If you identify yourself on the Beam learning journey, even if this is
> your first day, please see yourself as a welcome participant in this
> conversation and consider reviewing the bottom portion of this email for
> guidance.
> >>
> >>
> >>
> >> The Short Version (For those with Java Beam SDK knowledge):
> >>
> >>
> >>
> >> Should we migrate FileIO / TextIO and related classes from
> :sdks:java:core to :sdks:java:io:file?  If so, should we target such a
> migration to a future Beam version with repeated announcements?  Does the
> Beam repository have any example of a similar change in the past?  What
> learnings from said past change could be potentially applied to this one?
> >>
> >>
> >>
> >> The Long Version (For those on the learning path):
> >>
> >>
> >>
> >> This email is more about our repository organization than about Beam
> itself.  The proposal is to move two highly used classes (and anything
> related) in our Java SDK called FileIO [1] and TextIO [2].  The Beam GitHub
> repository uses a tool called Gradle [3] to automate routine code tasks such as
> build and test.  Gradle projects, such as Beam, organize code in what are
> called modules [4].  The three main ingredients that make a module are 1) a
> unique directory path, 2) a file called build.gradle (or build.gradle.kts)
> in this directory, 3) referencing the gradle module in a settings.gradle
> (or settings.gradle.kts) file at the root of the repository.
> >>
> >>
> >>
> >> The gradle documentation discusses why such organization might matter
> and how to achieve this with large projects [5].  Essentially, modules
> allow us to have mini-projects inside our large project and focus related
> automations to this one focused portion of our larger repository.  In Beam,
> we have the module :sdks:java:core [6] with all things related to the core
> of Beam, whereas we have separate modules related to reading from and
> writing to various resources within :sdks:java:io [7].
> >>
> >>
> >>
> >> The proposal suggests moving the aforementioned file reading and
> writing classes, FileIO and TextIO, and anything related, to their own
> :sdks:java:io:file module.  This would correspond to a new
> sdks/java/io/file directory and moving these classes into
> sdks/java/io/file/main/java/org/apache/beam/sdk/io/file.
> >>
> >>
> >>
> >> Definitions / References:
> >>
> >>
> >>
> >> 1. FileIO - General-purpose transforms for working with files:
> listing files (matching), reading and writing.  See
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html
> >>
> >>
> >>
> >> 2. TextIO - Similar to FileIO but focused on text files.  See
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html
> >>
> >>
> >>
> >> 3. Gradle - a build automation tool used by the Apache Beam repository
> to automate code-related tasks.  See
> https://docs.gradle.org/current/userguide/what_is_gradle.html
> >>
> >>
> >>
> >> 4. Gradle Module - a subsection of your larger repository.  See
> https://docs.gradle.org/current/userguide/dependency_management_terminology.html#sub:terminology_module
> >>
> >>
> >>
> >> 5. Structuring Large Projects with Gradle -
> https://docs.gradle.org/current/userguide/structuring_software_products.html
> >>
> >>
> >>
> >> 6. sdks:java:core - Corresponds to the sdks/java/core repository
> directory. See https://github.com/apache/beam/tree/master/sdks/java/core
> >>
> >>
> >>
> >> 7. sdks:java:io - Corresponds to the sdks/java/io repository
> directory.  See https://github.com/apache/beam/tree/master/sdks/java/io
> >>
> >>
> >>
> >> Best,
> >>
> >>
> >>
> >> Damon
> >>
> >>
> >>
>
