Thanks for looking into this and for the careful writeup. I've read the design doc and it looks great, but I have a couple of questions.
(1) Why did you decide on a single top-level FileWrite transform whose config is ([common_parameters], [xml-params], [csv-params], ...) rather than separate schema transforms for each format?
(2) Is there a plan to do something similar for the Read side?

On Fri, Dec 2, 2022 at 9:48 AM Damon Douglas <douglas.da...@gmail.com> wrote:
>
> Hello Everyone,
>
> For those new to Beam, even if this is your first day, consider yourselves a
> welcome contributor to this conversation. I remember what it was like first
> learning Beam on my own and I am passionate about everyone's learning
> experience. Below are definitions/references and a suggested learning path
> to understand this email.
>
> Short Version (assumes Beam knowledge): Could someone review
> https://github.com/apache/beam/pull/24479? Based on the design document [1],
> It's the first of a series of pull requests that enable FileIO.Write [2]
> support for Schema Transforms [3].
>
> Long Version (for those first learning Beam):
>
> Explaining this without using Beam specific language.
>
> Suppose my team needs to quickly write to a file or object storage system
> without writing the specific code to accomplish this final step. This pull
> request begins work in enabling such ability. I can specify the format such
> as avro, json, xml, etc in the configuration file and a backend service will
> deal with the remaining details of how to achieve this at scale.
>
> If you are interested in how this works, please see the design document [1].
>
> Definitions/References:
>
> 1. bit.ly/fileioschematransformwriteprovider
> 2. FileIO.Write - A Beam transform that writes to file or object storage
> systems
> See
> https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.Write.html
> 3. Schema Transform - Enables Schema Aware transforms when reading from and
> writing to sources and sinks using Beam IOs defined declaratively using a
> single configuration input
> 4. Schema Awareness refers to transforms that know how to process pipeline
> elements with inherent knowledge of their properties and types. This
> collection of properties and types is called a Schema. A Beam Row contains
> properties and a data structure described by a Schema. Think of the Row as
> the data element described by its Schema.
>
> Best,
>
> Damon
>