Thanks for looking into this and for the careful write-up. I've read the
design doc and it looks great, but I have a couple of questions.

(1) Why did you decide on a single top-level FileWrite transform whose
config is ([common_parameters], [xml-params], [csv-params], ...) rather
than separate schema transforms for each format?
(2) Is there a plan to do a similar thing for the Read side?
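To make question (1) concrete, the two configuration shapes can be sketched in plain Java (16+, for records). This is a hypothetical illustration only: `FileWriteConfig`, `XmlWriteConfig`, `CsvWriteConfig`, `XmlParams`, `CsvParams`, and all of their fields are invented names, not the design doc's or Beam's actual configuration classes.

```java
import java.util.Optional;

// Hypothetical sketch: none of these names come from the design doc or from Beam.

// Format-specific parameter blocks.
record XmlParams(String rootElement, String charset) {}

record CsvParams(String delimiter, boolean includeHeader) {}

// Shape (a): one top-level FileWrite configuration. Common parameters sit at
// the top level; each format's parameters are an optional nested block.
record FileWriteConfig(
    String format, String filenamePrefix, Optional<XmlParams> xml, Optional<CsvParams> csv) {
  FileWriteConfig {
    // A block for a format other than the selected one is rejected at runtime.
    if (xml.isPresent() && !format.equals("xml")) {
      throw new IllegalArgumentException("xml params given but format is " + format);
    }
    if (csv.isPresent() && !format.equals("csv")) {
      throw new IllegalArgumentException("csv params given but format is " + format);
    }
  }
}

// Shape (b): a separate schema transform configuration per format. The common
// parameters are repeated in each, but a mismatched block cannot be expressed.
record XmlWriteConfig(String filenamePrefix, XmlParams xml) {}

record CsvWriteConfig(String filenamePrefix, CsvParams csv) {}

final class ConfigShapesDemo {
  public static void main(String[] args) {
    FileWriteConfig single =
        new FileWriteConfig(
            "csv", "out/part", Optional.empty(), Optional.of(new CsvParams(",", true)));
    CsvWriteConfig perFormat = new CsvWriteConfig("out/part", new CsvParams(",", true));
    System.out.println(single);
    System.out.println(perFormat);
  }
}
```

In this sketch, shape (a) keeps the common parameters in one place but has to reject mismatched blocks at runtime, while shape (b) makes such mismatches unrepresentable at the cost of repeating the common parameters for every format.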

On Fri, Dec 2, 2022 at 9:48 AM Damon Douglas <douglas.da...@gmail.com> wrote:
>
> Hello Everyone,
>
> For those new to Beam, even if this is your first day, consider yourself a
> welcome contributor to this conversation. I remember what it was like to
> first learn Beam on my own, and I am passionate about everyone's learning
> experience. Below are definitions/references and a suggested learning path
> for understanding this email.
>
> Short Version (assumes Beam knowledge): Could someone review
> https://github.com/apache/beam/pull/24479? Based on the design document [1],
> it is the first of a series of pull requests that add Schema Transform [3]
> support for FileIO.Write [2].
>
> Long Version (for those first learning Beam):
>
> Here is the same request explained without Beam-specific language.
>
> Suppose my team needs to quickly write to a file or object storage system
> without writing the code for this final step ourselves. This pull request
> begins the work of enabling that. I would specify the format, such as Avro,
> JSON, or XML, in a configuration file, and a backend service would handle
> the remaining details of doing this at scale.
>
> If you are interested in how this works, please see the design document [1].
>
> Definitions/References:
>
> 1. Design document - bit.ly/fileioschematransformwriteprovider
> 2. FileIO.Write - A Beam transform that writes to file or object storage
> systems. See
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.Write.html
> 3. Schema Transform - A transform, defined declaratively from a single
> configuration input, that enables schema-aware reading from sources and
> writing to sinks using Beam IOs.
> 4. Schema Awareness - Refers to transforms that know how to process pipeline
> elements with inherent knowledge of their properties and types. This
> collection of properties and types is called a Schema. A Beam Row holds the
> values of the properties that its Schema describes. Think of the Row as the
> data element described by its Schema.
>
> Best,
>
> Damon
>
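For readers new to Beam, definition 4 can be pictured with a small plain-Java sketch (Java 16+). The `Field`, `Schema`, and `Row` records below are hypothetical stand-ins written for illustration; they are not Beam's `org.apache.beam.sdk.schemas.Schema` or `org.apache.beam.sdk.values.Row` classes and do not mirror their APIs. They only show the idea that a Row carries values whose names and types its Schema describes, so a schema-aware step can look fields up by name and reject values of the wrong type.

```java
import java.util.List;

// Conceptual sketch only: plain-Java stand-ins, NOT Beam's Schema or Row classes.

// A named, typed property; the type is modeled as a Java Class for simplicity.
record Field(String name, Class<?> type) {}

// An ordered collection of fields: the "properties and types" of an element.
record Schema(List<Field> fields) {
  int indexOf(String name) {
    for (int i = 0; i < fields.size(); i++) {
      if (fields.get(i).name().equals(name)) {
        return i;
      }
    }
    throw new IllegalArgumentException("No field named " + name);
  }
}

// A data element whose values are checked against, and looked up through, its Schema.
record Row(Schema schema, List<Object> values) {
  Row {
    if (values.size() != schema.fields().size()) {
      throw new IllegalArgumentException("Expected " + schema.fields().size() + " values");
    }
    for (int i = 0; i < values.size(); i++) {
      Field f = schema.fields().get(i);
      // isInstance(null) is false, so nulls are rejected here as well.
      if (!f.type().isInstance(values.get(i))) {
        throw new IllegalArgumentException(
            "Field " + f.name() + " expects " + f.type().getSimpleName());
      }
    }
    values = List.copyOf(values);
  }

  // Schema-aware access: read a value by property name rather than by position.
  Object get(String name) {
    return values.get(schema.indexOf(name));
  }
}

final class SchemaRowDemo {
  public static void main(String[] args) {
    Schema person =
        new Schema(List.of(new Field("name", String.class), new Field("age", Integer.class)));
    Row row = new Row(person, List.of("Ada", 36));
    System.out.println(row.get("name") + " is " + row.get("age")); // prints "Ada is 36"
  }
}
```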
