Hi all, I've chatted with a couple of people offline about this and my impression is that folks are generally amenable to a separate repo to match the target community? I have no idea what the next steps would be though other than guessing that there's probably some sort of PMC thing involved? Should I write something up somewhere?
Best, B On Thu, Sep 14, 2023 at 9:00 AM Byron Ellis <byronel...@google.com> wrote: > Hi all, > > I've been on vacation, but mostly working on getting External Transform > support going (which in turn basically requires Schema support as well). It > also looks like macros landed in Swift 5.9 for Linux so we'll be able to > use those to do some compile-time automation. In particular, this lets us > do something similar to what Java does with ByteBuddy for generating schema > coders though it has to be ahead of time so not quite the same. (As far as > I can tell this is a reason why macros got added to the language in the > first place---Apple's SwiftData library makes heavy use of the feature). > > I do have one question for the group though: should the Swift SDK > distribution take on Beam community properties or Swift community > properties? Specifically, in the Swift world the Swift SDK would live in > its own repo (beam-swift for example), which allows it to be most easily > consumed and keeps the checkout size under control for users. "Releases" in > the Swift world (much like Go) are just repo tags. The downside here is > that there's overhead in setting up the various github actions and other > CI/CD bits and bobs. > > The alternative would be to keep it in the beam repo itself like it is > now, but we'd probably want to move Package.swift to the root since for > whatever reason the Swift community (much to some people's annoyance) has > chosen to have packages only really able to live at the top of a repo. This > has less overhead from a CI/CD perspective, but lots of overhead for users > as they'd be checking out the entire Beam repo to use the SDK, which > happens a lot. > > There's a third option which is basically "do both" but honestly that just > seems like the worst of both worlds as it would require constant syncing if > we wanted to make it possible for Swift users to target unreleased SDKs for > development and testing. 
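To make the repo-layout point above concrete: a standalone beam-swift repo would carry its manifest at the root. Here's a minimal Package.swift sketch; the package name, platform, and target paths are illustrative guesses, not a settled layout.

```swift
// swift-tools-version:5.9
// Hypothetical manifest for a standalone beam-swift repo; names and
// paths here are illustrative only. Living at the repo root is what
// lets "https://github.com/apache/beam-swift" be consumed directly
// by the Swift Package Manager, with releases as plain repo tags.
import PackageDescription

let package = Package(
    name: "ApacheBeam",
    platforms: [.macOS(.v13)],
    products: [
        .library(name: "ApacheBeam", targets: ["ApacheBeam"])
    ],
    targets: [
        .target(name: "ApacheBeam", path: "Sources/ApacheBeam"),
        .testTarget(name: "ApacheBeamTests", dependencies: ["ApacheBeam"])
    ]
)
```

Consumers would then depend on a tag, e.g. `.package(url: "...", from: "0.1.0")`, which is why keeping the checkout small matters.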
> > Personally, I would lean towards the former option (and would volunteer to > set up & document the various automations) as it is lighter for the actual > users of the SDK and more consistent with the community experience they > expect. The CI/CD stuff is mostly a "do it once" whereas checking out the > entire repo with many updates the user doesn't care about is something they > will be doing all the time. FWIW some of our dependencies also chose this > route---most notably GRPC which started with the latter approach and has > moved to the former. > > Interested to hear any feedback on the subject since I'm guessing it > probably came up with the Go SDK back in the day? > > Best, > B > > > > On Tue, Aug 29, 2023 at 7:59 AM Byron Ellis <byronel...@google.com> wrote: > >> After a couple of iterations (thanks rebo!) we've also gotten the Swift >> SDK working with the new Prism runner. The fact that it doesn't do fusion >> caught a couple of configuration bugs (e.g. that the grpc message receiver >> buffer should be fairly large). It would seem that at the moment Prism and >> the Flink runner have similar orders of strictness when interpreting the >> pipeline graph while the Python portable runner is far more forgiving. >> >> Also added support for bounded vs unbounded pcollections through the >> "type" parameter when adding a pardo. Impulse is a bounded pcollection I >> believe? >> >> On Fri, Aug 25, 2023 at 2:04 PM Byron Ellis <byronel...@google.com> >> wrote: >> >>> Okay, after a brief detour through "get this working in the Flink >>> Portable Runner" I think I have something pretty workable. >>> >>> PInput and POutput can actually be structs rather than protocols, which >>> simplifies things quite a bit. It also allows us to use them with property >>> wrappers for a SwiftUI-like experience if we want when defining DoFns >>> (which is what I was originally intending to use them for). 
That also means >>> the function signature you use for closures would match full-fledged DoFn >>> definitions for the most part, which is satisfying. >>> >>> >>> >>> On Thu, Aug 24, 2023 at 5:55 PM Byron Ellis <byronel...@google.com> >>> wrote: >>> >>>> Okay, I tried a couple of different things. >>>> >>>> Implicitly passing the timestamp and window during iteration did not go >>>> well. While physically possible, it introduces an invisible side effect into >>>> loop iteration that confused me when I tried to use it, even though I was the one >>>> who implemented it. Also, I'm pretty sure there'd end up being some sort of race condition >>>> nightmare continuing down that path. >>>> >>>> What I decided to do instead was the following: >>>> >>>> 1. Rename the existing "pardo" functions to "pstream" and require that >>>> they always emit a window and timestamp along with their value. This >>>> eliminates the side effect but lets us keep iteration in a bundle where >>>> that might be convenient. For example, in my cheesy GCS implementation it >>>> means that I can keep an OAuth token around for the lifetime of the bundle >>>> as a local variable, which is convenient. It's a bit more typing for users >>>> of pstream, but the expectation here is that if you're using pstream >>>> functions You Know What You Are Doing and most people won't be using them >>>> directly. >>>> >>>> 2. Introduce a new set of pardo functions (I didn't do all of them yet, >>>> but enough to test the functionality and decide I liked it) which take a >>>> function signature of (any PInput<InputType>,any POutput<OutputType>). >>>> PInput takes the (InputType,Date,Window) tuple and converts it into a >>>> struct with friendlier names. Not strictly necessary, but it makes the code >>>> nicer to read, I think. POutput introduces emit functions that optionally >>>> allow you to specify a timestamp and a window. If you don't specify either one, >>>> it will take the timestamp and/or window of the input.
>>>> >>>> Trying it out was pretty pleasant, so I think we should >>>> continue down that path. If you'd like to see it in use, I reimplemented >>>> map() and flatMap() in terms of this new pardo functionality. >>>> >>>> Code has been pushed to the branch/PR if you're interested in taking a >>>> look. >>>> >>>> >>>> On Thu, Aug 24, 2023 at 2:15 PM Byron Ellis <byronel...@google.com> >>>> wrote: >>>> >>>>> Gotcha, I think there's a fairly easy solution to link input and >>>>> output streams.... Let me try it out... might even be possible to have >>>>> both >>>>> element and stream-wise closure pardos. Definitely possible to have that >>>>> at >>>>> the DoFn level (called SerializableFn in the SDK because I want to >>>>> use @DoFn as a macro) >>>>> >>>>> On Thu, Aug 24, 2023 at 1:09 PM Robert Bradshaw <rober...@google.com> >>>>> wrote: >>>>> >>>>>> On Thu, Aug 24, 2023 at 12:58 PM Chamikara Jayalath < >>>>>> chamik...@google.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, Aug 24, 2023 at 12:27 PM Robert Bradshaw < >>>>>>> rober...@google.com> wrote: >>>>>>> >>>>>>>> I would like to figure out a way to get the stream-y interface to >>>>>>>> work, as I think it's more natural overall. >>>>>>>> >>>>>>>> One hypothesis is that if any elements are carried over loop >>>>>>>> iterations, there will likely be some that are carried over beyond the >>>>>>>> loop >>>>>>>> (after all the callee doesn't know when the loop is supposed to end). >>>>>>>> We >>>>>>>> could reject "plain" elements that are emitted after this point, >>>>>>>> requiring >>>>>>>> one to emit timestamp-windowed-values. >>>>>>>> >>>>>>> >>>>>>> Are you assuming that the same stream (or overlapping sets of data) >>>>>>> is pushed to multiple workers? 
I thought that the set of data streamed >>>>>>> here are the data that belong to the current bundle (hence already >>>>>>> assigned >>>>>>> to the current worker) so any output from the current bundle invocation >>>>>>> would be a valid output of that bundle. >>>>>>> >>>>>>>> >>>>>> Yes, the content of the stream is exactly the contents of the bundle. >>>>>> The question is how to do the input_element:output_element correlation >>>>>> for >>>>>> automatically propagating metadata. >>>>>> >>>>>> >>>>>>> Related to this, we could enforce that the only (user-accessible) >>>>>>>> way to get such a timestamped value is to start with one, e.g. a >>>>>>>> WindowedValue<T>.withValue(O) produces a WindowedValue<O> with the same >>>>>>>> metadata but a new value. Thus a user wanting to do anything "fancy" >>>>>>>> would >>>>>>>> have to explicitly request iteration over these windowed values rather >>>>>>>> than >>>>>>>> over the raw elements. (This is also forward compatible with expanding >>>>>>>> the >>>>>>>> metadata that can get attached, e.g. pane infos, and makes the right >>>>>>>> thing >>>>>>>> the easiest/most natural.) >>>>>>>> >>>>>>>> On Thu, Aug 24, 2023 at 12:10 PM Byron Ellis <byronel...@google.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Ah, that is a good point—being element-wise would make managing >>>>>>>>> windows and time stamps easier for the user. Fortunately it’s a >>>>>>>>> fairly easy >>>>>>>>> change to make and maybe even less typing for the user. I was >>>>>>>>> originally >>>>>>>>> thinking side inputs and metrics would happen outside the loop, but I >>>>>>>>> think >>>>>>>>> you want a class and not a closure at that point for sanity. >>>>>>>>> >>>>>>>>> On Thu, Aug 24, 2023 at 12:02 PM Robert Bradshaw < >>>>>>>>> rober...@google.com> wrote: >>>>>>>>> >>>>>>>>>> Ah, I see. 
>>>>>>>>>> >>>>>>>>>> Yeah, I've thought about using an iterable for the whole bundle >>>>>>>>>> rather than start/finish bundle callbacks, but one of the questions >>>>>>>>>> is how >>>>>>>>>> that would impact implicit passing of the timestamp (and other) >>>>>>>>>> metadata >>>>>>>>>> from input elements to output elements. (You can of course attach the >>>>>>>>>> metadata to any output that happens in the loop body, but it's very >>>>>>>>>> easy to >>>>>>>>>> implicitly break the 1:1 relationship here, e.g. by doing >>>>>>>>>> buffering or >>>>>>>>>> otherwise modifying local state, and this would be hard to detect.) (I >>>>>>>>>> suppose trying to output after the loop finishes could require >>>>>>>>>> something more explicit.) >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Wed, Aug 23, 2023 at 6:56 PM Byron Ellis < >>>>>>>>>> byronel...@google.com> wrote: >>>>>>>>>> >>>>>>>>>>> Oh, I also forgot to mention that I included element-wise >>>>>>>>>>> collection operations like "map" that eliminate the need for pardo >>>>>>>>>>> in many >>>>>>>>>>> cases. The groupBy command is actually a map + groupByKey under the >>>>>>>>>>> hood. >>>>>>>>>>> That was to be more consistent with Swift's collection protocol >>>>>>>>>>> (and is >>>>>>>>>>> also why PCollection and PCollectionStream are different types... >>>>>>>>>>> PCollection implements map and friends as pipeline construction >>>>>>>>>>> operations >>>>>>>>>>> whereas PCollectionStream is an actual stream) >>>>>>>>>>> >>>>>>>>>>> I just happened to push some "IO primitives" that use map >>>>>>>>>>> rather than pardo in a couple of places to do a true wordcount >>>>>>>>>>> using good >>>>>>>>>>> ol' Shakespeare and very very primitive GCS IO. 
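The "groupBy is map + groupByKey under the hood" decomposition above can be illustrated with plain Swift collections (no SDK involved; the identity key function here is just for a word count):

```swift
// groupBy(keyFn) decomposes into map { (keyFn($0), $0) } followed by a
// group-by-key step, shown here with Swift's Dictionary(grouping:by:).
let words = ["the", "quick", "the", "lazy", "the"]

// The "map" half: pair each element with its key.
let keyed = words.map { (key: $0, value: $0) }

// The "groupByKey" half, then count each group for a word count.
let grouped = Dictionary(grouping: keyed, by: { $0.key })
    .mapValues { $0.count }

print(grouped["the"] ?? 0) // 3
```

This mirrors why PCollection can expose `map` and friends as pipeline-construction operations in the style of Swift's collection protocol, while the actual per-bundle iteration happens on the stream type.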
>>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> B >>>>>>>>>>> >>>>>>>>>>> On Wed, Aug 23, 2023 at 6:08 PM Byron Ellis < >>>>>>>>>>> byronel...@google.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Indeed :-) Yeah, I went back and forth on the pardo syntax >>>>>>>>>>>> quite a bit before settling on where I ended up. Ultimately I >>>>>>>>>>>> decided to go >>>>>>>>>>>> with something that felt more Swift-y than anything else, which >>>>>>>>>>>> means that >>>>>>>>>>>> rather than dealing with a single element like you do in the other >>>>>>>>>>>> SDKs >>>>>>>>>>>> you're dealing with a stream of elements (which of course will >>>>>>>>>>>> often be of >>>>>>>>>>>> size 1). That's a really natural paradigm in the Swift world, >>>>>>>>>>>> especially >>>>>>>>>>>> with the async / await structures. So when you see something like: >>>>>>>>>>>> >>>>>>>>>>>> pardo(name:"Read Files") { filenames,output,errors in >>>>>>>>>>>> >>>>>>>>>>>> for try await (filename,_,_) in filenames { >>>>>>>>>>>> ... >>>>>>>>>>>> output.emit(data) >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> filenames is the input stream and then output and errors are >>>>>>>>>>>> both output streams. In theory you can have as many output streams >>>>>>>>>>>> as you >>>>>>>>>>>> like, though at the moment there's a compiler bug in the new type >>>>>>>>>>>> pack >>>>>>>>>>>> feature that limits it to "as many as I felt like supporting". >>>>>>>>>>>> Presumably >>>>>>>>>>>> this will get fixed before the official 5.9 release, which will >>>>>>>>>>>> probably be >>>>>>>>>>>> in the October timeframe if history is any guide. >>>>>>>>>>>> >>>>>>>>>>>> If you had parameterization you wanted to send, that would look >>>>>>>>>>>> like pardo("Parameter") { param,filenames,output,error in ... } >>>>>>>>>>>> where >>>>>>>>>>>> "param" would take on the value of "Parameter." All of this is >>>>>>>>>>>> being >>>>>>>>>>>> typechecked at compile time BTW. 
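The stream-of-(element, timestamp, window) shape described above can be sketched with a plain AsyncStream standing in for the prototype's PCollectionStream (the tuple layout matches the thread's description; everything else here is a self-contained stand-in):

```swift
import Foundation

// A stand-in for PCollectionStream: each element is a
// (value, timestamp, window) tuple, with String for the window type.
let stream = AsyncStream<(String, Date, String)> { continuation in
    continuation.yield(("a.txt", Date(timeIntervalSince1970: 0), "global"))
    continuation.yield(("b.txt", Date(timeIntervalSince1970: 1), "global"))
    continuation.finish()
}

var names: [String] = []
// "_" is Swift for "ignore": here we drop the timestamp and window,
// exactly like the (filename,_,_) destructuring in the pardo example.
for await (filename, _, _) in stream {
    names.append(filename)
}
print(names) // ["a.txt", "b.txt"]
```

A bundle of size 1 is just a stream that yields once, so the single-element case the other SDKs expose falls out of the same loop.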
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> the (filename,_,_) is a tuple spreading construct like you have >>>>>>>>>>>> in ES6 and other things where "_" is Swift for "ignore." In this >>>>>>>>>>>> case >>>>>>>>>>>> PCollectionStreams have an element signature of (Of,Date,Window) >>>>>>>>>>>> so you can >>>>>>>>>>>> optionally extract the timestamp and the window if you want to >>>>>>>>>>>> manipulate >>>>>>>>>>>> it somehow. >>>>>>>>>>>> >>>>>>>>>>>> That said it would also be natural to provide elementwise >>>>>>>>>>>> pardos--- that would probably mean having explicit type signatures >>>>>>>>>>>> in the >>>>>>>>>>>> closure. I had that at one point, but it felt less natural the >>>>>>>>>>>> more I used >>>>>>>>>>>> it. I'm also slowly working towards adding a more "traditional" >>>>>>>>>>>> DoFn >>>>>>>>>>>> implementation approach where you implement the DoFn as an object >>>>>>>>>>>> type. In >>>>>>>>>>>> that case it would be very very easy to support both by having a >>>>>>>>>>>> default >>>>>>>>>>>> stream implementation call the equivalent of processElement. To >>>>>>>>>>>> make that >>>>>>>>>>>> performant I need to implement an @DoFn macro and I just haven't >>>>>>>>>>>> gotten to >>>>>>>>>>>> it yet. >>>>>>>>>>>> >>>>>>>>>>>> It's a bit more work and I've been prioritizing implementing >>>>>>>>>>>> composite and external transforms for the reasons you suggest. :-) >>>>>>>>>>>> I've got >>>>>>>>>>>> the basics of a composite transform (there's an equivalent >>>>>>>>>>>> wordcount >>>>>>>>>>>> example) and am hooking it into the pipeline generation, which >>>>>>>>>>>> should also >>>>>>>>>>>> give me everything I need to successfully hook in external >>>>>>>>>>>> transforms as >>>>>>>>>>>> well. That will give me the jump on IOs as you say. I can also >>>>>>>>>>>> treat the >>>>>>>>>>>> pipeline itself as a composite transform which lets me get rid of >>>>>>>>>>>> the >>>>>>>>>>>> Pipeline { pipeline in ... 
} and just instead have things attach >>>>>>>>>>>> themselves >>>>>>>>>>>> to the pipeline implicitly. >>>>>>>>>>>> >>>>>>>>>>>> That said, there are some interesting IO possibilities that >>>>>>>>>>>> would be Swift native. In particular, I've been looking at the >>>>>>>>>>>> native >>>>>>>>>>>> Swift binding for DuckDB (which is C++ based). DuckDB is SQL based >>>>>>>>>>>> but not >>>>>>>>>>>> distributed in the same way as, say, Beam SQL... but it would >>>>>>>>>>>> allow for SQL >>>>>>>>>>>> statements on individual files with projection pushdown supported >>>>>>>>>>>> for >>>>>>>>>>>> things like Parquet, which could have some cool and performant data >>>>>>>>>>>> lake >>>>>>>>>>>> applications. I'll probably do a couple of the simpler IOs as >>>>>>>>>>>> well---there's a Swift AWS SDK binding that's pretty good that >>>>>>>>>>>> would give >>>>>>>>>>>> me S3 and there's a Cloud auth library as well that makes it >>>>>>>>>>>> pretty easy to >>>>>>>>>>>> work with GCS. >>>>>>>>>>>> >>>>>>>>>>>> In any case, I'm updating the branch as I find a minute here >>>>>>>>>>>> and there. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> B >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Aug 23, 2023 at 5:02 PM Robert Bradshaw < >>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Neat. >>>>>>>>>>>>> >>>>>>>>>>>>> Nothing like writing an SDK to actually understand how the >>>>>>>>>>>>> FnAPI works :). I like the use of groupBy. I have to admit I'm a >>>>>>>>>>>>> bit >>>>>>>>>>>>> mystified by the syntax for parDo (I don't know swift at all, >>>>>>>>>>>>> which is >>>>>>>>>>>>> probably tripping me up). The addition of external >>>>>>>>>>>>> (cross-language) >>>>>>>>>>>>> transforms could let you steal everything (e.g. IOs) pretty >>>>>>>>>>>>> quickly from >>>>>>>>>>>>> other SDKs. 
>>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Aug 18, 2023 at 7:55 AM Byron Ellis via user < >>>>>>>>>>>>> user@beam.apache.org> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> For everyone who is interested, here's the draft PR: >>>>>>>>>>>>>> >>>>>>>>>>>>>> https://github.com/apache/beam/pull/28062 >>>>>>>>>>>>>> >>>>>>>>>>>>>> I haven't had a chance to test it on my M1 machine yet though >>>>>>>>>>>>>> (there's a good chance there are a few places that need to >>>>>>>>>>>>>> properly address >>>>>>>>>>>>>> endianness. Specifically timestamps in windowed values and >>>>>>>>>>>>>> length in >>>>>>>>>>>>>> iterable coders as those both use specifically bigendian >>>>>>>>>>>>>> representations) >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Aug 17, 2023 at 8:57 PM Byron Ellis < >>>>>>>>>>>>>> byronel...@google.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks Cham, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Definitely happy to open a draft PR so folks can >>>>>>>>>>>>>>> comment---there's not as much code as it looks like since most >>>>>>>>>>>>>>> of the LOC >>>>>>>>>>>>>>> is just generated protobuf. As for the support, I definitely >>>>>>>>>>>>>>> want to add >>>>>>>>>>>>>>> external transforms and may actually add that support before >>>>>>>>>>>>>>> adding the >>>>>>>>>>>>>>> ability to make composites in the language itself. With the way >>>>>>>>>>>>>>> the SDK is >>>>>>>>>>>>>>> laid out adding composites to the pipeline graph is a separate >>>>>>>>>>>>>>> operation >>>>>>>>>>>>>>> than defining a composite. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Aug 17, 2023 at 4:28 PM Chamikara Jayalath < >>>>>>>>>>>>>>> chamik...@google.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks Byron. This sounds great. I wonder if there is >>>>>>>>>>>>>>>> interest in Swift SDK from folks currently subscribed to the >>>>>>>>>>>>>>>> +user <user@beam.apache.org> list. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Aug 16, 2023 at 6:53 PM Byron Ellis via dev < >>>>>>>>>>>>>>>> d...@beam.apache.org> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hello everyone, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> A couple of months ago I decided that I wanted to really >>>>>>>>>>>>>>>>> understand how the Beam FnApi works and how it interacts with >>>>>>>>>>>>>>>>> the Portable >>>>>>>>>>>>>>>>> Runner. For me at least that usually means I need to write >>>>>>>>>>>>>>>>> some code so I >>>>>>>>>>>>>>>>> can see things happening in a debugger and to really prove to >>>>>>>>>>>>>>>>> myself I >>>>>>>>>>>>>>>>> understood what was going on I decided I couldn't use an >>>>>>>>>>>>>>>>> existing SDK >>>>>>>>>>>>>>>>> language to do it since there would be the temptation to read >>>>>>>>>>>>>>>>> some code and >>>>>>>>>>>>>>>>> convince myself that I actually understood what was going on. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> One thing led to another and it turns out that to get a >>>>>>>>>>>>>>>>> minimal FnApi integration going you end up writing a fair bit >>>>>>>>>>>>>>>>> of an SDK. So >>>>>>>>>>>>>>>>> I decided to take things to a point where I had an SDK that >>>>>>>>>>>>>>>>> could execute a >>>>>>>>>>>>>>>>> word count example via a portable runner backend. I've now >>>>>>>>>>>>>>>>> reached that >>>>>>>>>>>>>>>>> point and would like to submit my prototype SDK to the list >>>>>>>>>>>>>>>>> for feedback. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It's currently living in a branch on my fork here: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> https://github.com/byronellis/beam/tree/swift-sdk/sdks/swift >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> At the moment it runs via the most recent XCode Beta using >>>>>>>>>>>>>>>>> Swift 5.9 on Intel Macs, but should also work using beta >>>>>>>>>>>>>>>>> builds of 5.9 for >>>>>>>>>>>>>>>>> Linux running on Intel hardware. 
I haven't had a chance to >>>>>>>>>>>>>>>>> try it on ARM >>>>>>>>>>>>>>>>> hardware and make sure all of the endian checks are complete. >>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>> "IntegrationTests.swift" file contains a word count example >>>>>>>>>>>>>>>>> that reads some >>>>>>>>>>>>>>>>> local files (as well as a missing file to exercise DLQ >>>>>>>>>>>>>>>>> functionality) and >>>>>>>>>>>>>>>>> output counts through two separate group by operations to get >>>>>>>>>>>>>>>>> it past the >>>>>>>>>>>>>>>>> "map reduce" size of pipeline. I've tested it against the >>>>>>>>>>>>>>>>> Python Portable >>>>>>>>>>>>>>>>> Runner. Since my goal was to learn FnApi there is no Direct >>>>>>>>>>>>>>>>> Runner at this >>>>>>>>>>>>>>>>> time. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I've shown it to a couple of folks already and >>>>>>>>>>>>>>>>> incorporated some of that feedback already (for example pardo >>>>>>>>>>>>>>>>> was >>>>>>>>>>>>>>>>> originally called dofn when defining pipelines). In general >>>>>>>>>>>>>>>>> I've tried to >>>>>>>>>>>>>>>>> make the API as "Swift-y" as possible, hence the heavy >>>>>>>>>>>>>>>>> reliance on closures >>>>>>>>>>>>>>>>> and while there aren't yet composite PTransforms there's the >>>>>>>>>>>>>>>>> beginnings of >>>>>>>>>>>>>>>>> what would be needed for a SwiftUI-like declarative API for >>>>>>>>>>>>>>>>> creating them. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> There are of course a ton of missing bits still to be >>>>>>>>>>>>>>>>> implemented, like counters, metrics, windowing, state, >>>>>>>>>>>>>>>>> timers, etc. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This should be fine and we can get the code documented >>>>>>>>>>>>>>>> without these features. 
I think support for composites and >>>>>>>>>>>>>>>> adding an >>>>>>>>>>>>>>>> external transform (see, Java >>>>>>>>>>>>>>>> <https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java>, >>>>>>>>>>>>>>>> Python >>>>>>>>>>>>>>>> <https://github.com/apache/beam/blob/c7b7921185686da573f76ce7320817c32375c7d0/sdks/python/apache_beam/transforms/external.py#L556>, >>>>>>>>>>>>>>>> Go >>>>>>>>>>>>>>>> <https://github.com/apache/beam/blob/c7b7921185686da573f76ce7320817c32375c7d0/sdks/go/pkg/beam/xlang.go#L155>, >>>>>>>>>>>>>>>> TypeScript >>>>>>>>>>>>>>>> <https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/transforms/external.ts>) >>>>>>>>>>>>>>>> to add support for multi-lang will bring in a lot of features >>>>>>>>>>>>>>>> (for example, >>>>>>>>>>>>>>>> I/O connectors) for free. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Any and all feedback welcome and happy to submit a PR if >>>>>>>>>>>>>>>>> folks are interested, though the "Swift Way" would be to have >>>>>>>>>>>>>>>>> it in its own >>>>>>>>>>>>>>>>> repo so that it can easily be used from the Swift Package >>>>>>>>>>>>>>>>> Manager. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> +1 for creating a PR (may be as a draft initially). Also >>>>>>>>>>>>>>>> it'll be easier to comment on a PR :) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Cham >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> B >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>