I agree with Sachin. Keeping together the components that users will have to
combine anyway leads to a better user experience. The GCP libraries are a
counterexample, in my opinion. It was a frequent struggle for users to
find a working set of libraries until there was a BOM. Even with the
BOM it is still somewhat of a struggle for users, and the developers of
those libraries still have to take on some of the toil of testing them
together anyway.

re: Take it with a grain of salt since I'm not even a committer - All
input is welcome here. I do not think my comments should carry more
weight just because I am a committer.

On Wed, Dec 14, 2022 at 9:36 AM Sachin Agarwal via dev <dev@beam.apache.org>
wrote:

> I strongly believe that we should continue to have Beam optimize for the
> user - and while having separate components would allow those of us who are
> contributors and committers to move faster, the downsides of not having
> everything "in one box" for a new user where the components are all
> relatively guaranteed to work together at that version level are very high.
>
> Beam having everything included is absolutely a competitive advantage for
> Beam and I would not want to lose that.
>
> On Wed, Dec 14, 2022 at 9:31 AM Byron Ellis via dev <dev@beam.apache.org>
> wrote:
>
>> Take it with a grain of salt since I'm not even a committer, but is
>> perhaps the reorganization of Beam into smaller components the real work of
>> a 3.0 effort? Splitting Beam into smaller, more independently managed
>> components would be a pretty huge breaking change from a dependency
>> management perspective, which would potentially be largely separate from any
>> code changes.
>>
>> Best,
>> B
>>
>> On Wed, Dec 14, 2022 at 9:23 AM Alexey Romanenko <
>> aromanenko....@gmail.com> wrote:
>>
>>> On 12 Dec 2022, at 22:23, Robert Bradshaw via dev <dev@beam.apache.org>
>>> wrote:
>>>
>>>
>>> Saving up all the breaking changes until a major release definitely
>>> has its downsides (look at Python 3). The migration path is often as
>>> important (if not more so) than the final destination.
>>>
>>>
>>> Actually, it proves that major releases *should not* be delayed for
>>> a long period of time and *should* be issued more often to reduce the
>>> number of breaking changes in each one (which, of course, are likely to
>>> happen). That will help users do much smoother and less risky upgrades, and
>>> spare developers from carrying the burden forever. Beam 2.0.0 was released
>>> back in May 2017 and we've almost never talked about Beam 3.0 and what the
>>> criteria for it are. I understand that it’s a completely different
>>> discussion, but it seems that the time has come =)
>>>
>>> As for this particular change, I would question how the benefit (it's
>>> unclear what the exact benefit is--better internal organization?)
>>> exceeds the pain of making every user refactor their code. I think a
>>> stronger case can be made for things like the Avro dependency that
>>> cause real pain.
>>>
>>>
>>> Agreed. I think that if it doesn’t cause any pain through additional
>>> external dependencies and this code is used in almost every other SDK
>>> module, then there is no reason for such a breaking change. On the other
>>> hand, the Avro case that you mentioned above is a good example of why it is
>>> sometimes better to keep such code outside of “core”.
>>>
>>> As for the pipeline update feature, we've long discussed having
>>> "pick-your-implementation" transforms that specify alternative,
>>> equivalent implementations. Upgrades can choose the old one whereas
>>> new pipelines can get the latest and greatest. It won't solve all
>>> issues, and requires keeping old codepaths around, but could be an
>>> important step forward.
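
To make the "pick-your-implementation" idea above concrete, here is a minimal sketch in plain Java. It is not a Beam API: the Tokenizer interface, the PickYourImplementation class, and the version numbers are all hypothetical names chosen for illustration. The point is only that an upgraded pipeline can pin the old code path while a new pipeline takes the latest one.

```java
// Hypothetical sketch, not a Beam API: two implementations of one logical
// step, selected by an explicit implementation version.
interface Tokenizer {
  String[] tokenize(String line);
}

final class PickYourImplementation {
  // Assumed convention: new pipelines use the highest known version.
  static final int LATEST_VERSION = 2;

  private PickYourImplementation() {}

  /** Returns the implementation registered under the given version. */
  static Tokenizer tokenizer(int version) {
    switch (version) {
      case 1:
        // Old code path, kept around so that updated pipelines behave as before.
        return line -> line.split(" ");
      case 2:
        // Newer code path; gives the same tokens for single-space-separated input.
        return line -> line.trim().split("\\s+");
      default:
        throw new IllegalArgumentException("Unknown implementation version: " + version);
    }
  }

  /** What a brand-new pipeline would get. */
  static Tokenizer latest() {
    return tokenizer(LATEST_VERSION);
  }
}
```

A pipeline being updated would keep calling tokenizer(1), while a new one would call latest(). As noted above, the cost is that the version 1 branch has to stay in the codebase.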
>>>
>>> On Mon, Dec 12, 2022 at 10:20 AM Kenneth Knowles <k...@apache.org>
>>> wrote:
>>>
>>>
>>> I agree with Moritz. To answer a few specifics in my own words:
>>>
>>> - It is a perfectly sensible refactor, but as a counterpoint, without
>>> file-based IO the SDK isn't functional, so it is also a reasonable design
>>> point to have this included. There are other things in the core SDK that
>>> are far less "core" and could be moved out with greater benefit. The main
>>> goal for any separation of modules would be lighter weight transitive
>>> dependencies, IMO.
>>>
>>> - No, Beam has not made any deliberate breaking changes of this nature.
>>> Hence we are still on major version 2. We have made some bugfixes for data
>>> loss risks that could be called "breaking changes" but since the feature
>>> was unsafe to use in the first place we did not bump the major version.
>>>
>>> - It is sometimes possible to do such a refactor and have the deprecated
>>> location proxy to the new location. In this case that seems hard to achieve.
>>>
>>> - It is not actually necessary to maintain both locations, as we can
>>> declare the old location will be unmaintained (but left alone) and all new
>>> development goes to the new location. That isn't a great choice for users
>>> who may simply upgrade their SDK version and not notice that their old code
>>> is now pointing at a version that will not receive e.g. security updates.
>>>
>>> - I like the style where if/when we transition from Beam 2 to Beam 3 we
>>> should have the exact functionality of Beam 3 available as an opt-in flag
>>> first. So if a user passes --beam-3 they get exactly what will be the
>>> default functionality when we bump the major version. It really is a
>>> problem to do a whole bunch of stuff feverishly before a major version
>>> bump. The other style that I think works well is the Linux kernel style
>>> where major versions alternate between stable and unstable (in other words,
>>> returning to the 0.x style with every alternating version).
>>>
>>> - I do think Beam suffers from fear and inability to do significant code
>>> gardening. I don't think backwards compatibility in the code sense is the
>>> biggest blocker. I think the "pipeline update" feature is perhaps the thing
>>> most holding Beam back from making radical rapid forward progress.
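
For readers less familiar with the "deprecated location proxies to the new location" pattern mentioned in the list above, here is a minimal sketch in plain Java. Both class names are hypothetical, and the line-counting logic is only a stand-in. In a real move the two classes would live in different packages and modules; they share one file here only so the example is self-contained.

```java
// Hypothetical stand-in for a class after it has moved to its new location.
final class NewTextReader {
  private NewTextReader() {}

  /** Counts non-blank lines; a placeholder for real file-reading logic. */
  static int countNonEmptyLines(String contents) {
    int count = 0;
    for (String line : contents.split("\n", -1)) {
      if (!line.trim().isEmpty()) {
        count++;
      }
    }
    return count;
  }
}

/**
 * Hypothetical old location, kept only as a thin forwarding shim.
 *
 * @deprecated use {@link NewTextReader} instead.
 */
@Deprecated
final class LegacyTextReader {
  private LegacyTextReader() {}

  /** Forwards to the new location so there is a single maintained code path. */
  static int countNonEmptyLines(String contents) {
    return NewTextReader.countNonEmptyLines(contents);
  }
}
```

Existing callers of LegacyTextReader keep compiling (with a deprecation warning) and automatically pick up fixes made in NewTextReader. This only illustrates the simple static-method case; it says nothing about how feasible the same approach would be for any particular Beam class.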
>>>
>>> Kenn
>>>
>>> On Mon, Dec 12, 2022 at 2:25 AM Moritz Mack <mm...@talend.com> wrote:
>>>
>>>
>>> Hi Damon,
>>>
>>>
>>>
>>> I fear the current release / versioning strategy of Beam doesn’t lend
>>> itself well to such breaking changes. Alexey and I have spent quite some
>>> time discussing how to proceed with the problematic Avro dependency in core
>>> (and respectively AvroIO, of course).
>>>
>>> Such changes essentially always require duplicating code to keep
>>> supporting a deprecated legacy code path so as not to break users’ code.
>>> But this comes at a very high price. Until the deprecated code path can
>>> finally be removed, it must be maintained in two places.
>>>
>>> Unfortunately, removing deprecated code is rather problematic
>>> without a major version release, as it would break semantic versioning and
>>> people’s expectations. As a result, deprecations carry the inherent risk of
>>> unintentionally degrading quality rather than improving it.
>>>
>>> I’d therefore recommend against such efforts unless there are very strong
>>> reasons to do so.
>>>
>>>
>>>
>>> Best, Moritz
>>>
>>>
>>>
>>> On 07.12.22, 18:05, "Damon Douglas via dev" <dev@beam.apache.org> wrote:
>>>
>>>
>>>
>>> Hello Everyone,
>>>
>>>
>>>
>>> If you identify yourself on the Beam learning journey, even if this is
>>> your first day, please see yourself as a welcome participant in this
>>> conversation and consider reviewing the bottom portion of this email for
>>> guidance.
>>>
>>>
>>>
>>> The Short Version (For those with Java Beam SDK knowledge):
>>>
>>>
>>>
>>> Should we migrate FileIO / TextIO and related classes from
>>> :sdks:java:core to :sdks:java:io:file?  If so, should we target such a
>>> migration to a future Beam version with repeated announcements?  Does the
>>> Beam repository have any example of a similar change in the past?  What
>>> learnings from said past change could be potentially applied to this one?
>>>
>>>
>>>
>>> The Long Version (For those on the learning path):
>>>
>>>
>>>
>>> This email is more about our repository organization than about Beam itself.
>>> The proposal is to move two highly used classes (and anything related) in
>>> our Java SDK called FileIO [1] and TextIO [2].  The Beam GitHub repository
>>> uses a build tool called Gradle [3] to automate routine code tasks such as
>>> build and test.  Gradle projects, such as Beam, organize code in what are
>>> called modules [4].  The three main ingredients that make a module are 1) a
>>> unique directory path, 2) a file called build.gradle (or build.gradle.kts)
>>> in this directory, and 3) a reference to the module in a settings.gradle
>>> (or settings.gradle.kts) file at the root of the repository.
>>>
>>>
>>>
>>> The gradle documentation discusses why such organization might matter
>>> and how to achieve this with large projects [5].  Essentially, modules
>>> allow us to have mini-projects inside our large project and to scope related
>>> automation to that one portion of our larger repository.  In Beam,
>>> we have the module :sdks:java:core [6] with all things related to the core
>>> of Beam, whereas we have separate modules related to reading from and
>>> writing to various resources within :sdks:java:io [7].
>>>
>>>
>>>
>>> The proposal suggests moving the aforementioned file reading and writing
>>> classes, FileIO and TextIO, and anything related, to their own
>>> :sdks:java:io:file module.  This would correspond to a new
>>> sdks/java/io/file directory and moving these classes into
>>> sdks/java/io/file/main/java/org/apache/beam/sdk/io/file.
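
For those on the learning path, the relationship between a module name such as :sdks:java:io:file and its directory can be sketched with a few lines of plain Java. This is only an illustration of Gradle's default convention, under the assumption that the project directory has not been overridden in the settings file. GradlePaths and toDirectory are hypothetical names, not part of Gradle or Beam.

```java
// Hypothetical helper, not part of Gradle or Beam: shows how a colon-separated
// project path maps to a directory under Gradle's default convention.
final class GradlePaths {
  private GradlePaths() {}

  /** Converts e.g. ":sdks:java:io:file" to "sdks/java/io/file". */
  static String toDirectory(String projectPath) {
    // The leading colon denotes the root project and has no directory segment.
    String relative = projectPath.startsWith(":") ? projectPath.substring(1) : projectPath;
    // Every remaining colon separates one directory level from the next.
    return relative.replace(':', '/');
  }
}
```

For example, GradlePaths.toDirectory(":sdks:java:core") gives the sdks/java/core directory listed in reference [6] below.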
>>>
>>>
>>>
>>> Definitions / References:
>>>
>>>
>>>
>>> 1. FileIO - general-purpose transforms for working with files: listing
>>> files (matching), reading and writing.  See
>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html
>>>
>>>
>>>
>>> 2. TextIO - Similar to FileIO but focused on text files.  See
>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html
>>>
>>>
>>>
>>> 3. Gradle - a build automation tool used by the Apache Beam repository
>>> to automate code-related tasks.  See
>>> https://docs.gradle.org/current/userguide/what_is_gradle.html
>>>
>>>
>>>
>>> 4. Gradle Module - a subsection of your larger repository.  See
>>> https://docs.gradle.org/current/userguide/dependency_management_terminology.html#sub:terminology_module
>>>
>>>
>>>
>>> 5. Structuring Large Projects with Gradle -
>>> https://docs.gradle.org/current/userguide/structuring_software_products.html
>>>
>>>
>>>
>>> 6. sdks:java:core - Corresponds to the sdks/java/core repository
>>> directory. See https://github.com/apache/beam/tree/master/sdks/java/core
>>>
>>>
>>>
>>> 7. sdks:java:io - Corresponds to the sdks/java/io repository directory.
>>> See https://github.com/apache/beam/tree/master/sdks/java/io
>>>
>>>
>>>
>>> Best,
>>>
>>>
>>>
>>> Damon
>>>
