Hello Everyone,

*For those new to Beam, even if this is your first day, consider yourself a welcome contributor to this conversation. I remember what it was like first learning Beam on my own, and I am passionate about everyone's learning experience. Below are definitions/references and a suggested learning path to understand this email.*
*Short Version (assumes Beam knowledge)*:

Could someone review https://github.com/apache/beam/pull/24479? Based on the design document [1], it is the first of a series of pull requests that enable FileIO.Write [2] support for Schema Transforms [3].

*Long Version (for those first learning Beam)*:

To explain this without Beam-specific language: suppose my team needs to quickly write to a file or object storage system without writing the specific code to accomplish this final step. This pull request begins the work of enabling that ability. I can specify the format, such as avro, json, xml, etc., in a configuration file, and a backend service will deal with the remaining details of how to achieve this at scale. If you are interested in how this works, please see the design document [1].

*Definitions/References*:

1. bit.ly/fileioschematransformwriteprovider
2. FileIO.Write - A Beam transform that writes to file or object storage systems. See https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.Write.html
3. Schema Transform - Enables Schema Aware transforms when reading from and writing to sources and sinks using Beam IOs, defined declaratively using a single configuration input.
4. Schema Awareness - Refers to transforms that know how to process pipeline elements with inherent knowledge of their properties and types. This collection of properties and types is called a Schema. A Beam Row contains properties and a data structure described by a Schema. Think of the Row as the data element described by its Schema.

Best,
Damon