On 12 Dec 2022, at 22:23, Robert Bradshaw via dev <dev@beam.apache.org> wrote:
> 
> Saving up all the breaking changes until a major release definitely
> has its downsides (look at Python 3). The migration path is often as
> important as (if not more important than) the final destination.

Actually, I think it shows that major releases should not be delayed for long 
periods of time and should be issued more often, so that each one carries fewer 
breaking changes (which, of course, are still likely to happen). That would let 
users upgrade more smoothly and with less risk, and spare developers from 
carrying the burden forever. Beam 2.0.0 was released back in May 2017, and we 
have almost never talked about Beam 3.0 or the criteria for it. I understand 
that it’s a completely different discussion, but it seems that the time has 
come =)

> As for this particular change, I would question how the benefit (it's
> unclear what the exact benefit is--better internal organization?)
> exceeds the pain of making every user refactor their code. I think a
> stronger case can be made for things like the Avro dependency that
> cause real pain.

Agree. I think that if it doesn’t cause any pain through additional external 
dependencies, and this code is used in almost every other SDK module, then there 
is no reason for such a breaking change. On the other hand, the Avro case that 
you mentioned above is a good example of why it is sometimes better to keep 
such code outside of “core”.

> As for the pipeline update feature, we've long discussed having
> "pick-your-implementation" transforms that specify alternative,
> equivalent implementations. Upgrades can choose the old one whereas
> new pipelines can get the latest and greatest. It won't solve all
> issues, and requires keeping old codepaths around, but could be an
> important step forward.
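To make the "pick-your-implementation" idea more concrete, here is a minimal 
sketch in plain Java, assuming a transform can be told whether it is being used 
to update a running pipeline. Nothing here is a Beam API: CountWords, Impl, 
choose() and stepNames() are hypothetical names, and the list of step names is 
a simplified stand-in for the graph structure that an in-place update has to 
keep compatible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// The first class holds main, so the file can be run directly (Java 11+).
class PickYourImplementationDemo {
  public static void main(String[] args) {
    List<String> lines = List.of("to be or not to be");
    // Updating a running pipeline in place: keep the legacy structure.
    CountWords.Impl forUpdate = CountWords.choose(true);
    System.out.println(CountWords.stepNames(forUpdate));
    // A brand-new pipeline gets the latest implementation; results must match.
    CountWords.Impl forNewPipeline = CountWords.choose(false);
    System.out.println(
        CountWords.run(forUpdate, lines).equals(CountWords.run(forNewPipeline, lines)));
  }
}

// Hypothetical transform (not a Beam API) with two equivalent implementations.
final class CountWords {
  enum Impl { LEGACY, LATEST }

  private CountWords() {}

  // Assumption: the caller knows whether it is updating a running pipeline.
  static Impl choose(boolean updatingRunningPipeline) {
    return updatingRunningPipeline ? Impl.LEGACY : Impl.LATEST;
  }

  // Simplified stand-in for the graph shape an in-place update must preserve.
  static List<String> stepNames(Impl impl) {
    return impl == Impl.LEGACY
        ? List.of("Split", "PairWithOne", "SumPerKey")
        : List.of("SplitAndCount");
  }

  static Map<String, Long> run(Impl impl, List<String> lines) {
    if (impl == Impl.LEGACY) {
      // Split: break lines into words.
      List<String> words = new ArrayList<>();
      for (String line : lines) {
        for (String w : line.split("\\s+")) {
          if (!w.isEmpty()) {
            words.add(w);
          }
        }
      }
      // PairWithOne: emit (word, 1) for every word.
      List<Map.Entry<String, Long>> pairs = new ArrayList<>();
      for (String w : words) {
        pairs.add(Map.entry(w, 1L));
      }
      // SumPerKey: add up the ones for each word.
      Map<String, Long> sums = new HashMap<>();
      for (Map.Entry<String, Long> p : pairs) {
        sums.merge(p.getKey(), p.getValue(), Long::sum);
      }
      return sums;
    }
    // SplitAndCount: the same computation fused into a single pass.
    return lines.stream()
        .flatMap(line -> Arrays.stream(line.split("\\s+")))
        .filter(w -> !w.isEmpty())
        .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
  }
}
```

Both implementations have to keep producing identical results, and the legacy 
one stays in the codebase for as long as running pipelines might be updated 
from it, which is the cost of keeping old codepaths around.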
> 
> On Mon, Dec 12, 2022 at 10:20 AM Kenneth Knowles <k...@apache.org> wrote:
>> 
>> I agree with Moritz. To answer a few specifics in my own words:
>> 
>> - It is a perfectly sensible refactor, but as a counterpoint: without 
>> file-based IO the SDK isn't functional, so including it in core is also a 
>> reasonable design point. There are other things in the core SDK that are 
>> far less "core" and could be moved out with greater benefit. The main goal 
>> for any separation of modules would be lighter-weight transitive 
>> dependencies, IMO.
>> 
>> - No, Beam has not made any deliberate breaking changes of this nature. 
>> Hence we are still on major version 2. We have made some bugfixes for data 
>> loss risks that could be called "breaking changes" but since the feature was 
>> unsafe to use in the first place we did not bump the major version.
>> 
>> - It is sometimes possible to do such a refactor and have the deprecated 
>> location proxy to the new location. In this case that seems hard to achieve.
>> 
>> - It is not actually necessary to maintain both locations, as we can declare 
>> that the old location will be unmaintained (but left alone) and all new 
>> development goes to the new location. That isn't a great choice for users 
>> who may simply upgrade their SDK version and not notice that their old code 
>> is now pointing at a version that will not receive, e.g., security updates.
>> 
>> - I like the style where, if/when we transition from Beam 2 to Beam 3, the 
>> exact functionality of Beam 3 is available first as an opt-in flag. So if a 
>> user passes --beam-3, they get exactly what will be the default 
>> functionality when we bump the major version. It really is a problem to do 
>> a whole bunch of stuff feverishly before a major version bump. The other 
>> style that I think works well is the Linux kernel style, where major 
>> versions alternate between stable and unstable (in other words, returning to 
>> the 0.x style with every alternating version).
>> 
>> - I do think Beam suffers from a fear of, and an inability to do, significant 
>> code gardening. I don't think backwards compatibility in the code sense is 
>> the biggest blocker. I think the "pipeline update" feature is perhaps the 
>> thing most holding Beam back from making radical, rapid forward progress.
>> 
>> Kenn
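A minimal sketch of the "deprecated location proxies to the new location" 
approach, in plain Java. OldTextIO, NewTextIO and Read are hypothetical names, 
not Beam's actual classes; in a real refactor the two classes would live in 
different packages and modules rather than in one file.

```java
// The first class holds main, so the file can be run directly (Java 11+).
class ProxyDemo {
  public static void main(String[] args) {
    // Code that only chains calls on the old entry point keeps working...
    String viaOld = OldTextIO.read().from("input-*.txt").getFilepattern();
    // ...and behaves exactly like code written against the new location.
    String viaNew = NewTextIO.read().from("input-*.txt").getFilepattern();
    System.out.println(viaOld.equals(viaNew));
  }
}

// Hypothetical "new location" (illustrative only, not Beam's TextIO).
final class NewTextIO {
  private NewTextIO() {}

  static Read read() {
    return new Read(null);
  }

  // Immutable configuration object; each setter returns a modified copy.
  static final class Read {
    private final String filepattern;

    private Read(String filepattern) {
      this.filepattern = filepattern;
    }

    Read from(String filepattern) {
      return new Read(filepattern);
    }

    String getFilepattern() {
      return filepattern;
    }
  }
}

/**
 * Hypothetical "old location", kept only for source compatibility.
 *
 * @deprecated use {@link NewTextIO}; every method here just forwards to it.
 */
@Deprecated
final class OldTextIO {
  private OldTextIO() {}

  // Forwarding means there is a single implementation to maintain.
  static NewTextIO.Read read() {
    return NewTextIO.read();
  }
}
```

Forwarding static factory methods is the straightforward part. Code that 
declares variables of the old nested type (here, OldTextIO.Read) would not 
compile against this sketch, because the old class no longer defines that type.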
>> 
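The opt-in flag style could look roughly like the sketch below. Only the 
--beam-3 flag name comes from the discussion; VersionBehavior, beam3Enabled() 
and the idea of gating deprecated code paths behind it are assumptions for 
illustration, not an existing Beam pipeline option.

```java
import java.util.List;

// The first class holds main, so the file can be run directly (Java 11+).
class Beam3FlagDemo {
  public static void main(String[] args) {
    // Run with --beam-3 to preview the next major version's defaults.
    System.out.println(VersionBehavior.deprecatedPathsAvailable(List.of(args)));
  }
}

// Hypothetical sketch: one switch that turns on exactly the next-major-version behavior.
final class VersionBehavior {
  // Flag name taken from the discussion; the mechanism around it is an assumption.
  static final String BEAM_3_FLAG = "--beam-3";
  // At the major version bump, only this default changes (false -> true).
  static final boolean BEAM_3_BY_DEFAULT = false;

  private VersionBehavior() {}

  static boolean beam3Enabled(List<String> args) {
    return BEAM_3_BY_DEFAULT || args.contains(BEAM_3_FLAG);
  }

  // Stand-in for a behavior that changes in the next major version (hypothetical):
  // deprecated code paths are gone when Beam 3 behavior is on.
  static boolean deprecatedPathsAvailable(List<String> args) {
    return !beam3Enabled(args);
  }
}
```

The design point is that the major version bump reduces to flipping one 
default, so users who already run with the flag see no change on upgrade.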
>> On Mon, Dec 12, 2022 at 2:25 AM Moritz Mack <mm...@talend.com> wrote:
>>> 
>>> Hi Damon,
>>> 
>>> I fear the current release / versioning strategy of Beam doesn’t lend 
>>> itself well to such breaking changes. Alexey and I have spent quite some 
>>> time discussing how to proceed with the problematic Avro dependency in core 
>>> (and, correspondingly, AvroIO, of course).
>>> 
>>> Such changes essentially always require duplicating code, so that a 
>>> deprecated legacy code path stays supported and users’ code doesn’t break. 
>>> But this comes at a very high price: until the deprecated code path can 
>>> finally be removed, it must be maintained in two places.
>>> 
>>> Unfortunately, removing deprecated code is rather problematic without a 
>>> major version release, as it would break semantic versioning and people’s 
>>> expectations. As a result, deprecations bear the inherent risk of 
>>> unintentionally degrading quality rather than improving it.
>>> 
>>> I’d therefore recommend against such efforts unless there are very strong 
>>> reasons to do so.
>>> 
>>> Best, Moritz
>>> 
>>> On 07.12.22, 18:05, "Damon Douglas via dev" <dev@beam.apache.org> wrote:
>>> 
>>> Hello Everyone,
>>> 
>>> If you identify yourself on the Beam learning journey, even if this is your 
>>> first day, please see yourself as a welcome participant in this 
>>> conversation and consider reviewing the bottom portion of this email for 
>>> guidance.
>>> 
>>> The Short Version (For those with Java Beam SDK knowledge):
>>> 
>>> Should we migrate FileIO / TextIO and related classes from :sdks:java:core 
>>> to :sdks:java:io:file?  If so, should we target such a migration to a 
>>> future Beam version with repeated announcements?  Does the Beam repository 
>>> have any example of a similar change in the past?  What learnings from said 
>>> past change could potentially be applied to this one?
>>> 
>>> The Long Version (For those on the learning path):
>>> 
>>> This email is more about our repository organization than about Beam 
>>> itself.  The proposal is to move two heavily used classes in our Java SDK, 
>>> FileIO [1] and TextIO [2], along with anything related.  The Beam GitHub 
>>> repository uses a tool called Gradle [3] to automate routine code tasks 
>>> such as building and testing.  Gradle projects, such as Beam, organize code 
>>> in what are called modules [4].  The three main ingredients that make a 
>>> module are 1) a unique directory path, 2) a file called build.gradle (or 
>>> build.gradle.kts) in this directory, and 3) a reference to the module in a 
>>> settings.gradle (or settings.gradle.kts) file at the root of the repository.
>>> 
>>> The Gradle documentation discusses why such organization might matter and 
>>> how to achieve it in large projects [5].  Essentially, modules allow us to 
>>> have mini-projects inside our large project and to scope related 
>>> automation to one focused portion of the larger repository.  In Beam, the 
>>> :sdks:java:core module [6] holds everything related to the core of Beam, 
>>> whereas separate modules under :sdks:java:io [7] handle reading from and 
>>> writing to various resources.
>>> 
>>> The proposal suggests moving the aforementioned file reading and writing 
>>> classes, FileIO and TextIO, and anything related, to their own 
>>> :sdks:java:io:file module.  This would mean creating a new 
>>> sdks/java/io/file directory and moving these classes into 
>>> sdks/java/io/file/src/main/java/org/apache/beam/sdk/io/file.
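As a sketch of how the names in the proposal relate to each other, the 
plain-Java helper below (ModuleLayout is a hypothetical name) mirrors two 
conventions, assuming neither default is overridden: Gradle maps a project 
included as :sdks:java:io:file to the sdks/java/io/file directory relative to 
the repository root, and the standard Gradle Java layout keeps main sources 
under src/main/java, with one directory per package segment.

```java
// The first class holds main, so the file can be run directly (Java 11+).
class ModuleLayoutDemo {
  public static void main(String[] args) {
    System.out.println(ModuleLayout.sourceDir(":sdks:java:io:file", "org.apache.beam.sdk.io.file"));
  }
}

// Hypothetical helper; it only mirrors default conventions, it does not query Gradle.
final class ModuleLayout {
  private ModuleLayout() {}

  // Default mapping for an included project: ":a:b:c" lives in "a/b/c".
  static String projectDir(String projectPath) {
    if (!projectPath.startsWith(":") || projectPath.length() < 2) {
      throw new IllegalArgumentException("expected a path like :a:b, got " + projectPath);
    }
    return projectPath.substring(1).replace(':', '/');
  }

  // Main Java sources sit under src/main/java, one directory per package segment.
  static String sourceDir(String projectPath, String javaPackage) {
    return projectDir(projectPath) + "/src/main/java/" + javaPackage.replace('.', '/');
  }
}
```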
>>> 
>>> Definitions / References:
>>> 
>>> 1. FileIO - General-purpose transforms for working with files: listing 
>>> files (matching), reading and writing.  See 
>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html
>>> 
>>> 2. TextIO - Similar to FileIO but focused on text files.  See 
>>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html
>>> 
>>> 3. Gradle - a build automation tool used by the Apache Beam repository to 
>>> automate code-related tasks.  See 
>>> https://docs.gradle.org/current/userguide/what_is_gradle.html
>>> 
>>> 4. Gradle Module - a subsection of your larger repository.  See 
>>> https://docs.gradle.org/current/userguide/dependency_management_terminology.html#sub:terminology_module
>>> 
>>> 5. Structuring Large Projects with Gradle - 
>>> https://docs.gradle.org/current/userguide/structuring_software_products.html
>>> 
>>> 6. :sdks:java:core - Corresponds to the sdks/java/core repository directory. 
>>> See https://github.com/apache/beam/tree/master/sdks/java/core
>>> 
>>> 7. :sdks:java:io - Corresponds to the sdks/java/io repository directory.  
>>> See https://github.com/apache/beam/tree/master/sdks/java/io
>>> 
>>> Best,
>>> 
>>> Damon