Hello Everyone,

*For those new to Beam, even if this is your first day, consider yourself
a welcome contributor to this conversation.  I remember what it was like
first learning Beam on my own, and I am passionate about everyone's learning
experience.  Below are definitions/references and a suggested learning path
to help you understand this email.*

*Short Version (assumes Beam knowledge)*:  *Could someone review
<https://github.com/apache/beam/pull/24479>?*  Based on the design document
[1], it is the first of a series of pull requests that enable FileIO.Write
[2] support for Schema Transforms [3].

*Long Version (for those first learning Beam)*:

Here is the same request explained without Beam-specific language.

Suppose my team needs to quickly write to a file or object storage system
without writing the specific code to accomplish this final step.  This pull
request begins the work of enabling that.  I can specify the format, such
as Avro, JSON, or XML, in a configuration file, and a backend service will
deal with the remaining details of how to achieve this at scale.
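
To make the idea concrete, here is a minimal plain-Python sketch of a
configuration-driven write.  It is not Beam code and it is not the design in
the pull request; the names `RENDERERS`, `render`, and `write_records` and the
configuration keys `format` and `path` are hypothetical, chosen only for
illustration.  It also uses JSON and CSV simply because Python's standard
library supports them.

```python
import csv
import io
import json

# Hypothetical illustration only: NOT Beam code and NOT the pull request's design.


def _to_json(records):
    # One JSON object per line, keys sorted for stable output.
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)


def _to_csv(records):
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# The caller only names a format; this table decides how to serialize it.
RENDERERS = {"json": _to_json, "csv": _to_csv}


def render(config, records):
    """Serialize records in the format named by config['format']."""
    fmt = config["format"]
    if fmt not in RENDERERS:
        raise ValueError(f"unsupported format: {fmt}")
    return RENDERERS[fmt](records)


def write_records(config, records):
    """Write the rendered records to the file named by config['path']."""
    with open(config["path"], "w", newline="") as f:
        f.write(render(config, records))
```

With this sketch, switching the output from JSON to CSV means changing only
the `format` value in the configuration, not the calling code.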

If you are interested in how this works, please see the design document [1].

*Definitions/References*:

1. bit.ly/fileioschematransformwriteprovider
2. FileIO.Write - A Beam transform that writes to file or object storage
systems.  See
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.Write.html
3. Schema Transform - A transform, defined declaratively from a single
configuration input, that enables Schema Aware reading from and writing to
sources and sinks through Beam IOs
4. Schema Awareness refers to transforms that know how to process pipeline
elements with inherent knowledge of their properties and types.
This collection of properties and types is called a Schema.  A Beam Row
contains properties and a data structure described by a Schema.  Think of
the Row as the data element described by its Schema.
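
As a rough illustration of definition 4, the plain-Python sketch below models
a Schema as an ordered list of named, typed fields and a Row as values checked
against that Schema.  Beam's real classes live in the SDK (in Java,
org.apache.beam.sdk.schemas.Schema and org.apache.beam.sdk.values.Row); the
`Field`, `Schema`, and `Row` classes here are hypothetical stand-ins and do not
mirror the Beam API.

```python
from dataclasses import dataclass
from typing import Any, Tuple

# Hypothetical stand-ins for illustration only; these do NOT mirror the Beam API.


@dataclass(frozen=True)
class Field:
    name: str
    field_type: type


@dataclass(frozen=True)
class Schema:
    fields: Tuple[Field, ...]


@dataclass(frozen=True)
class Row:
    schema: Schema
    values: Tuple[Any, ...]

    def __post_init__(self):
        # A Row is only valid if its values match the Schema's fields.
        if len(self.values) != len(self.schema.fields):
            raise ValueError("value count does not match schema")
        for field, value in zip(self.schema.fields, self.values):
            if not isinstance(value, field.field_type):
                raise TypeError(f"field {field.name!r} expects {field.field_type.__name__}")

    def get(self, name):
        # Look a value up by the property name declared in the Schema.
        for field, value in zip(self.schema.fields, self.values):
            if field.name == name:
                return value
        raise KeyError(name)


schema = Schema((Field("name", str), Field("age", int)))
row = Row(schema, ("Ada", 36))
```

Because the Row carries its Schema, code that receives it can ask for
`row.get("age")` without any prior knowledge of the element's shape.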

Best,

Damon
