Saving up all the breaking changes until a major release definitely
has its downsides (look at Python 3). The migration path is often as
important as (if not more important than) the final destination.

As for this particular change, I would question whether the benefit (it's
unclear what the exact benefit is; better internal organization?)
outweighs the pain of making every user refactor their code. I think a
stronger case can be made for things that cause real pain, like the Avro
dependency.

As for the pipeline update feature, we've long discussed having
"pick-your-implementation" transforms that specify alternative,
equivalent implementations. Upgrades can choose the old one whereas
new pipelines can get the latest and greatest. It won't solve all
issues, and requires keeping old codepaths around, but could be an
important step forward.
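A minimal Java sketch of the "pick-your-implementation" idea, assuming nothing about Beam's real APIs: the class `PickYourImplementation`, its integer version keys, and the `register`/`select` methods are hypothetical. Equivalent implementations are registered under the version that introduced them. An updated pipeline selects by the version it was first launched with, while a new pipeline asks for the latest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (not a Beam API) of a "pick-your-implementation" transform:
 * equivalent implementations are registered under the version that introduced them.
 */
public class PickYourImplementation {

  // Sorted by the (hypothetical) version number that introduced each implementation.
  private final TreeMap<Integer, UnaryOperator<List<String>>> implementations = new TreeMap<>();

  public PickYourImplementation register(int sinceVersion, UnaryOperator<List<String>> impl) {
    implementations.put(sinceVersion, impl);
    return this;
  }

  /** Returns the newest implementation introduced at or before {@code pinnedVersion}. */
  public UnaryOperator<List<String>> select(int pinnedVersion) {
    Map.Entry<Integer, UnaryOperator<List<String>>> entry =
        implementations.floorEntry(pinnedVersion);
    if (entry == null) {
      throw new IllegalArgumentException("No implementation for version " + pinnedVersion);
    }
    return entry.getValue();
  }

  /** An example transform with two interchangeable implementations. */
  static PickYourImplementation upperCaseTransform() {
    return new PickYourImplementation()
        // Original code path, kept so that updated pipelines keep their old behavior.
        .register(1, in -> in.stream().map(String::toUpperCase).collect(Collectors.toList()))
        // Newer, equivalent code path offered to new pipelines.
        .register(2, in -> {
          List<String> out = new ArrayList<>(in.size());
          for (String s : in) {
            out.add(s.toUpperCase());
          }
          return out;
        });
  }

  public static void main(String[] args) {
    PickYourImplementation transform = upperCaseTransform();
    List<String> input = List.of("a", "b");
    // Updating a pipeline first launched at version 1: keep the old code path.
    System.out.println(transform.select(1).apply(input)); // prints [A, B]
    // Launching a brand-new pipeline: take the latest code path.
    System.out.println(transform.select(Integer.MAX_VALUE).apply(input)); // prints [A, B]
  }
}
```

A sorted map with `floorEntry` keeps the lookup rule simple (the newest implementation no newer than the pinned version). It also makes the cost visible: the version-1 code path has to stay around for as long as pipelines pinned to it may still be updated.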

On Mon, Dec 12, 2022 at 10:20 AM Kenneth Knowles <k...@apache.org> wrote:
>
> I agree with Moritz. To answer a few specifics in my own words:
>
>  - It is a perfectly sensible refactor, but as a counterpoint: without
> file-based IO the SDK isn't functional, so it is also a reasonable design
> point to keep it in core. There are other things in the core SDK that are
> far less "core" and could be moved out with greater benefit. The main goal
> for any separation of modules would be lighter-weight transitive
> dependencies, IMO.
>
>  - No, Beam has not made any deliberate breaking changes of this nature.
> Hence we are still on major version 2. We have made some bugfixes for data
> loss risks that could be called "breaking changes", but since the feature was
> unsafe to use in the first place, we did not bump the major version.
>
>  - It is sometimes possible to do such a refactor and have the deprecated 
> location proxy to the new location. In this case that seems hard to achieve.
>
>  - It is not actually necessary to maintain both locations, as we can declare
> that the old location is unmaintained (but left alone) and that all new
> development goes to the new location. That isn't a great choice for users who
> may simply upgrade their SDK version and not notice that their old code is
> now pointing at code that will not receive, e.g., security updates.
>
>  - I like the style where, if/when we transition from Beam 2 to Beam 3, the
> exact functionality of Beam 3 is first available behind an opt-in flag. So if
> a user passes --beam-3 they get exactly what will be the default
> functionality when we bump the major version. It really is a problem to do a
> whole bunch of stuff feverishly before a major version bump. The other style
> that I think works well is the (older) Linux kernel style, where release
> series alternate between stable and unstable (in other words, returning to
> the 0.x style with every alternating version).
>
>  - I do think Beam suffers from a fear of, and an inability to do, significant
> code gardening. I don't think backwards compatibility in the code sense is the
> biggest blocker. I think the "pipeline update" feature is perhaps the thing
> most holding Beam back from making rapid, radical forward progress.
>
> Kenn
>
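Kenn's point about having the deprecated location proxy to the new location can be sketched as a thin facade that forwards every call. This is a hypothetical, self-contained Java illustration: `OldTextFormatter` and `NewTextFormatter` are invented names, and nested classes stand in for the old and new packages.

```java
/**
 * Hypothetical sketch: a class that moved to a new location, with the old
 * location kept as a deprecated facade that forwards every call. The nested
 * classes stand in for the old and new packages so the file is self-contained.
 */
public class DeprecatedProxy {

  /** Stands in for the class at its new location (for example, a new module). */
  static final class NewTextFormatter {
    private final String prefix;

    private NewTextFormatter(String prefix) {
      this.prefix = prefix;
    }

    static NewTextFormatter withPrefix(String prefix) {
      return new NewTextFormatter(prefix);
    }

    String format(String line) {
      return prefix + line;
    }
  }

  /**
   * Stands in for the class at its old location. It holds no logic of its own,
   * so there is still only one implementation to maintain.
   *
   * @deprecated use {@link NewTextFormatter} instead.
   */
  @Deprecated
  static final class OldTextFormatter {
    private final NewTextFormatter delegate;

    private OldTextFormatter(NewTextFormatter delegate) {
      this.delegate = delegate;
    }

    static OldTextFormatter withPrefix(String prefix) {
      return new OldTextFormatter(NewTextFormatter.withPrefix(prefix));
    }

    String format(String line) {
      // Forward to the new location; no behavior is duplicated here.
      return delegate.format(line);
    }
  }

  public static void main(String[] args) {
    // Existing user code keeps compiling against the old name...
    System.out.println(OldTextFormatter.withPrefix("old: ").format("hello"));
    // ...while new code uses the new location directly.
    System.out.println(NewTextFormatter.withPrefix("new: ").format("hello"));
  }
}
```

Because the facade holds no logic, only one implementation needs maintaining, which is the duplication concern raised further down this thread. The pattern gets harder when the old API's public signatures expose its own types (for example, `TextIO.read()` returns the nested type `TextIO.Read`): every such type would need its own forwarding wrapper, which may be part of why this case looks hard to achieve.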
> On Mon, Dec 12, 2022 at 2:25 AM Moritz Mack <mm...@talend.com> wrote:
>>
>> Hi Damon,
>>
>> I fear the current release/versioning strategy of Beam doesn’t lend itself
>> well to such breaking changes. Alexey and I have spent quite some time
>> discussing how to proceed with the problematic Avro dependency in core (and,
>> by extension, AvroIO, of course).
>>
>> Such changes essentially always require duplicating code to keep
>> supporting a deprecated legacy code path so as not to break users’ code. But
>> this comes at a very high price: until the deprecated code path can finally
>> be removed, it must be maintained in two places.
>>
>> Unfortunately, removing deprecated code is problematic without a major
>> version release, as it would break semantic versioning and people’s
>> expectations. As a result, deprecations carry the inherent risk of
>> unintentionally degrading quality rather than improving it.
>>
>> I’d therefore recommend against such efforts unless there are very strong
>> reasons to do so.
>>
>> Best, Moritz
>>
>> On 07.12.22, 18:05, "Damon Douglas via dev" <dev@beam.apache.org> wrote:
>>
>> Hello Everyone,
>>
>> If you are on the Beam learning journey, even if this is your first day,
>> please see yourself as a welcome participant in this conversation and
>> consider reviewing the bottom portion of this email for guidance.
>>
>> The Short Version (For those with Java Beam SDK knowledge):
>>
>> Should we migrate FileIO / TextIO and related classes from :sdks:java:core
>> to :sdks:java:io:file?  If so, should we target such a migration at a future
>> Beam version, with repeated announcements?  Does the Beam repository have any
>> example of a similar change in the past?  What lessons from that past change
>> could potentially be applied to this one?
>>
>> The Long Version (For those on the learning path):
>>
>> This email is about how our repository is organized rather than about Beam
>> itself.  The proposal is to move two heavily used classes in our Java SDK,
>> FileIO [1] and TextIO [2], along with anything related to them.  The Beam
>> GitHub repository uses a build automation tool called Gradle [3] to automate
>> routine tasks such as building and testing code.  Gradle projects, such as
>> Beam, organize code in what are called modules [4].  The three main
>> ingredients that make a module are 1) a unique directory path, 2) a file
>> called build.gradle (or build.gradle.kts) in this directory, and 3) a
>> reference to the module in a settings.gradle (or settings.gradle.kts) file
>> at the root of the repository.
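The three ingredients can be checked mechanically. A minimal Java sketch, assuming Gradle's default convention that a project path such as `:sdks:java:io:file` maps to the directory `sdks/java/io/file` under the root (a build can override this). `GradleModuleCheck` and its methods are hypothetical names, and the settings check is only a naive text search for the quoted project path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch that checks the three ingredients of a Gradle module
 * (a subproject of a multi-project build), assuming Gradle's default mapping
 * from project path to directory.
 */
public class GradleModuleCheck {

  /** Maps a project path like ":sdks:java:io:file" to its default directory under root. */
  static Path defaultProjectDir(Path root, String projectPath) {
    Path dir = root;
    for (String segment : projectPath.split(":")) {
      if (!segment.isEmpty()) { // skip the empty segment before the leading ':'
        dir = dir.resolve(segment);
      }
    }
    return dir;
  }

  /** Ingredient 2: a build.gradle or build.gradle.kts file in the module directory. */
  static boolean hasBuildFile(Path dir) {
    return Files.isRegularFile(dir.resolve("build.gradle"))
        || Files.isRegularFile(dir.resolve("build.gradle.kts"));
  }

  /**
   * Ingredient 3: the root settings file mentions the quoted project path. This is a
   * naive text search; real settings files may compute their include(...) calls instead.
   */
  static boolean isReferencedInSettings(Path root, String projectPath) throws IOException {
    for (String name : new String[] {"settings.gradle", "settings.gradle.kts"}) {
      Path settings = root.resolve(name);
      if (!Files.isRegularFile(settings)) {
        continue;
      }
      String text = Files.readString(settings);
      if (text.contains("\"" + projectPath + "\"") || text.contains("'" + projectPath + "'")) {
        return true;
      }
    }
    return false;
  }

  /** Ingredient 1 (the directory exists) plus ingredients 2 and 3. */
  static boolean isModule(Path root, String projectPath) throws IOException {
    Path dir = defaultProjectDir(root, projectPath);
    return Files.isDirectory(dir) && hasBuildFile(dir) && isReferencedInSettings(root, projectPath);
  }

  public static void main(String[] args) throws IOException {
    Path root = Path.of(args.length > 0 ? args[0] : ".");
    String project = args.length > 1 ? args[1] : ":sdks:java:io:file";
    System.out.println(project + " -> " + defaultProjectDir(root, project)
        + " (module: " + isModule(root, project) + ")");
  }
}
```

Running it from a checkout root with, say, `:sdks:java:core` as the second argument prints the derived directory and whether all three ingredients were found.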
>>
>> The Gradle documentation discusses why such organization might matter and
>> how to achieve it in large projects [5].  Essentially, modules allow us to
>> have mini-projects inside our large project and to scope related automation
>> to one focused portion of the larger repository.  In Beam, we have the
>> module :sdks:java:core [6] with everything related to the core of Beam,
>> whereas separate modules for reading from and writing to various resources
>> live under :sdks:java:io [7].
>>
>> The proposal suggests moving the aforementioned file reading and writing
>> classes, FileIO and TextIO, and anything related, to their own
>> :sdks:java:io:file module.  This would correspond to a new sdks/java/io/file
>> directory, with these classes moving into
>> sdks/java/io/file/src/main/java/org/apache/beam/sdk/io/file.
>>
>> Definitions / References:
>>
>> 1. FileIO - General-purpose transforms for working with files: listing
>> files (matching), reading and writing.  See
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html
>>
>> 2. TextIO - Similar to FileIO, but focused on text files.  See
>> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html
>>
>> 3. Gradle - A build automation tool used by the Apache Beam repository to
>> automate code-related tasks.  See
>> https://docs.gradle.org/current/userguide/what_is_gradle.html
>>
>> 4. Gradle Module - A subsection (in Gradle's terms, a subproject) of your
>> larger repository.  See
>> https://docs.gradle.org/current/userguide/dependency_management_terminology.html#sub:terminology_module
>>
>> 5. Structuring Large Projects with Gradle - See
>> https://docs.gradle.org/current/userguide/structuring_software_products.html
>>
>> 6. :sdks:java:core - Corresponds to the sdks/java/core repository directory.
>> See https://github.com/apache/beam/tree/master/sdks/java/core
>>
>> 7. :sdks:java:io - Corresponds to the sdks/java/io repository directory.  See
>> https://github.com/apache/beam/tree/master/sdks/java/io
>>
>> Best,
>>
>> Damon
