Hello Everyone,

*If you consider yourself to be on the Beam learning journey, even if this
is your first day, please see yourself as a welcome participant in this
conversation and consider reviewing the bottom portion of this email for
guidance.*

*The Short Version (For those with Java Beam SDK knowledge)*:

Should we migrate FileIO / TextIO and related classes from :sdks:java:core
to :sdks:java:io:file?  If so, should we target the migration at a future
Beam version and announce it repeatedly beforehand?  Does the Beam
repository have any example of a similar change in the past?  What lessons
from that change could apply to this one?

*The Long Version (For those on the learning path)*:

This email is more about how our repository is organized than about Beam
itself.  The proposal is to move two highly used classes in our Java SDK,
FileIO [1] and TextIO [2], along with anything related to them.  The Beam
GitHub repository uses a build tool called Gradle [3] to automate routine
code tasks such as building and testing.  Gradle projects, such as Beam,
organize code into what are called modules [4].  The three main ingredients
that make a module are 1) a unique directory path, 2) a file called
build.gradle (or build.gradle.kts) in that directory, and 3) a reference to
the module in a settings.gradle (or settings.gradle.kts) file at the root
of the repository.
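
As a minimal sketch of the third ingredient, assuming the Kotlin DSL variant
of the settings file and using the proposed (not yet existing) module path
purely for illustration, registering a module might look like this:

```kotlin
// settings.gradle.kts at the repository root (illustrative sketch only)

// Ingredient 3: reference the module from the settings file.
// By default Gradle maps the project path ":sdks:java:io:file" to the
// directory sdks/java/io/file (ingredient 1) and looks there for a
// build.gradle or build.gradle.kts file (ingredient 2).
include(":sdks:java:io:file") // hypothetical module; does not exist today
```

Existing modules such as :sdks:java:core are registered the same way, one
include per module path.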

The Gradle documentation discusses why such organization matters and how
to achieve it in large projects [5].  Essentially, modules let us keep
mini-projects inside our large project and scope build and test automation
to just that portion of the repository.  In Beam, the module
:sdks:java:core [6] holds everything related to the core of Beam, whereas
separate modules under :sdks:java:io [7] handle reading from and writing to
various resources.

The proposal suggests moving the aforementioned file reading and writing
classes, FileIO and TextIO, and anything related, to their own
:sdks:java:io:file module.  This would correspond to a new
sdks/java/io/file directory, with these classes moving into
sdks/java/io/file/src/main/java/org/apache/beam/sdk/io/file.
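
To make the proposed layout more concrete, here is a rough sketch of what a
build file for the new module could contain.  The file itself, the use of
the Kotlin DSL, and the plain java-library plugin are all assumptions for
illustration; Beam's real build files follow the repository's own
conventions and would likely look different.

```kotlin
// sdks/java/io/file/build.gradle.kts (hypothetical file, illustrative only)

plugins {
    // Assumption: a plain Java library; Beam's actual modules apply
    // their own shared build conventions instead.
    `java-library`
}

dependencies {
    // FileIO and TextIO build on core types such as PTransform and
    // PCollection, so the new module would depend on :sdks:java:core.
    api(project(":sdks:java:core"))
}
```

Under this layout, the classes would live in the org.apache.beam.sdk.io.file
package, so existing pipelines that import them from org.apache.beam.sdk.io
would need to update their imports and depend on the new module.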

*Definitions / References*:

1. FileIO - General-purpose transforms for working with files: listing
files (matching), reading and writing.  See
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html

2. TextIO - Similar to FileIO but focused on text files.  See
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html

3. Gradle - a build automation tool used by the Apache Beam repository to
automate code-related tasks.  See
https://docs.gradle.org/current/userguide/what_is_gradle.html

4. Gradle Module - a subsection of your larger repository.  See
https://docs.gradle.org/current/userguide/dependency_management_terminology.html#sub:terminology_module

5. Structuring Large Projects with Gradle -
https://docs.gradle.org/current/userguide/structuring_software_products.html

6. :sdks:java:core - Corresponds to the sdks/java/core repository directory.
See https://github.com/apache/beam/tree/master/sdks/java/core

7. :sdks:java:io - Corresponds to the sdks/java/io repository directory.
See https://github.com/apache/beam/tree/master/sdks/java/io

Best,

Damon
