Hello all,

The proposed I/O Standards are now ready as a new page for the Apache Beam website. Please review this PR: https://github.com/apache/beam/pull/24962
Thanks!

Herman Mak | Customer Engineer, Hong Kong, Google Cloud | herman...@google.com | +852-3923-5417

On Wed, Dec 21, 2022 at 6:02 PM Herman Mak <herman...@google.com> wrote:

> Hello all,
>
> I've addressed the commented areas with updated explanations and
> responses where necessary.
>
> Please do have a quick read if you have time.
> I shall follow up with this content as Markdown changes to the Beam site in a
> couple of days for feedback.
>
> Thanks!
>
> Herman Mak | Customer Engineer, Hong Kong, Google Cloud |
> herman...@google.com | +852-3923-5417
>
> On Sat, Dec 17, 2022 at 2:13 AM Andrew Pilloud <apill...@google.com>
> wrote:
>
>> By "Relational" I mean things like: Column Pruning, Filter Pushdown,
>> Table Statistics, Partition Metadata, Metastore. We have a bunch of one-off
>> implementations in various IOs (mostly BigQueryIO) and have been waiting
>> for IO standards to push them out to all IOs. This was section "F5 -
>> Relational" from https://s.apache.org/beam-io-api-standard-documentation
>>
>> On Thu, Dec 15, 2022 at 6:50 PM Herman Mak <herman...@google.com> wrote:
>>
>>> Hey all,
>>>
>>> Firstly, apologies for the confusion.
>>>
>>> The scope of this effort is to *finalize and have this added to the
>>> Beam public documentation* to be used as a PR reference once we have
>>> resolved the comments.
>>> YES, this document is a continuation of the docs below, with some
>>> additional components such as testing!
>>>
>>> The idea is to convert this to an MD file and add a page under
>>> "Developing new I/O connectors" with some small cleanup work around this
>>> area in other pages.
>>> [image: image.png]
>>>
>>> Docs that this is a continuation of:
>>> https://s.apache.org/beam-io-api-standard-documentation
>>> https://s.apache.org/beam-io-api-standard
>>>
>>> @Andrew Pilloud <apill...@google.com> Totally not intending to start
>>> from the beginning here. By relational, do you mean having this hosted in
>>> the Beam Confluence?
>>>
>>> Thanks all, and keep the feedback on the docs coming
>>>
>>> Herman Mak | Customer Engineer, Hong Kong, Google Cloud |
>>> herman...@google.com | +852-3923-5417
>>>
>>> On Fri, Dec 16, 2022 at 1:36 AM Chamikara Jayalath <chamik...@google.com>
>>> wrote:
>>>
>>>> On Thu, Dec 15, 2022, 8:33 AM Alexey Romanenko <
>>>> aromanenko....@gmail.com> wrote:
>>>>
>>>>> Cham, do you remember what the reason was for not finalising that doc?
>>>>>
>>>>
>>>> I think this is a continuation of those docs (so we are trying to
>>>> finalize) but probably Herman can explain better.
>>>>
>>>>> Personally, I find having such standards very useful (if they stay
>>>>> flexible over time, of course), especially for new developers and PR
>>>>> reviewers, and it’d be great to finally have such a doc as part of
>>>>> the contribution guide.
>>>>>
>>>>
>>>> +1
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>>>
>>>>> --
>>>>> Alexey
>>>>>
>>>>> On 13 Dec 2022, at 04:32, Chamikara Jayalath via dev <
>>>>> dev@beam.apache.org> wrote:
>>>>>
>>>>> Yeah, I don't think we either finalized or documented (on the website)
>>>>> the previous iteration. This doc seems to contain details from the
>>>>> documents shared in the previous iteration.
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>> On Mon, Dec 12, 2022 at 6:49 PM Robert Burke <rob...@frantil.com>
>>>>> wrote:
>>>>>
>>>>>> I think ultimately: until the docs are clearly available on the Beam
>>>>>> site itself, it's not documentation. See also: design docs, previous
>>>>>> emails, and similar.
>>>>>>
>>>>>> On Mon, Dec 12, 2022, 6:07 PM Andrew Pilloud via dev <
>>>>>> dev@beam.apache.org> wrote:
>>>>>>
>>>>>>> I believe the previous iteration was here:
>>>>>>> https://lists.apache.org/thread/3o8glwkn70kqjrf6wm4dyf8bt27s52hk
>>>>>>>
>>>>>>> The associated docs are:
>>>>>>> https://s.apache.org/beam-io-api-standard-documentation
>>>>>>> https://s.apache.org/beam-io-api-standard
>>>>>>>
>>>>>>> This is missing all the relational stuff that was in those docs;
>>>>>>> this appears to be another attempt starting from the beginning?
>>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>> On Mon, Dec 12, 2022 at 9:57 AM Alexey Romanenko <
>>>>>>> aromanenko....@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks for writing this!
>>>>>>>>
>>>>>>>> IIRC, a similar design doc was sent for review here a while ago.
>>>>>>>> Is this just an updated version or a new one?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Alexey
>>>>>>>>
>>>>>>>> On 11 Dec 2022, at 15:16, Herman Mak via dev <dev@beam.apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hello Everyone,
>>>>>>>>
>>>>>>>> *TLDR*
>>>>>>>>
>>>>>>>> Should we adopt a set of standards that Connector I/Os should
>>>>>>>> adhere to?
>>>>>>>> Attached is a first version of a Beam I/O Standards guideline that
>>>>>>>> includes opinionated best practices across important components of a
>>>>>>>> Connector I/O, namely Documentation, Development and Testing.
>>>>>>>>
>>>>>>>> *The Long Version*
>>>>>>>>
>>>>>>>> Apache Beam is a unified open-source programming model for both
>>>>>>>> batch and streaming. It runs on multiple platform runners and
>>>>>>>> integrates with over 50 services using individually developed
>>>>>>>> I/O Connectors
>>>>>>>> <https://beam.apache.org/documentation/io/connectors/>.
>>>>>>>>
>>>>>>>> Given that Apache Beam connectors are written by many different
>>>>>>>> developers and at varying points in time, they vary in syntax style,
>>>>>>>> documentation completeness and testing done.
>>>>>>>> For a new adopter of Apache Beam, that can definitely cause some
>>>>>>>> uncertainty.
>>>>>>>>
>>>>>>>> So should we adopt a set of standards that Connector I/Os should
>>>>>>>> adhere to?
>>>>>>>> Attached is a first version, in Doc format, of a Beam I/O Standards
>>>>>>>> guideline that includes opinionated best practices across important
>>>>>>>> components of a Connector I/O, namely Documentation, Development and
>>>>>>>> Testing. The aim is to incorporate this into the documentation and
>>>>>>>> to have it referenced as standards for new Connector I/Os (and
>>>>>>>> ideally have existing Connectors upgraded over time). If it looks
>>>>>>>> helpful, the immediate next step is that we can convert it into
>>>>>>>> a .md as a PR into the Beam repo!
>>>>>>>>
>>>>>>>> Thanks, and looking forward to feedback and discussion,
>>>>>>>>
>>>>>>>> [PUBLIC] Beam I/O Standards
>>>>>>>> <https://docs.google.com/document/d/1BCTpSZDUjK90hYZjcn8aAnPd9vuRfj8YU1j3mpSgRwI/edit?usp=drive_web>
>>>>>>>>
>>>>>>>> Herman Mak | Customer Engineer, Hong Kong, Google Cloud |
>>>>>>>> herman...@google.com | +852-3923-5417