On Wed, Sep 20, 2023 at 9:54 AM Byron Ellis <byronel...@google.com> wrote:
> Hi all,
>
> I've chatted with a couple of people offline about this and my impression
> is that folks are generally amenable to a separate repo to match the
> target community. I have no idea what the next steps would be, though,
> other than guessing that there's probably some sort of PMC thing involved.
> Should I write something up somewhere?
>
> Best,
> B

I think the process should be similar to other code/design reviews for
large contributions. I don't think you need PMC involvement here.

On Thu, Sep 14, 2023 at 9:00 AM Byron Ellis <byronel...@google.com> wrote:

> Hi all,
>
> I've been on vacation, but mostly working on getting External Transform
> support going (which in turn basically requires Schema support as well).
> It also looks like macros landed in Swift 5.9 for Linux, so we'll be able
> to use those to do some compile-time automation. In particular, this lets
> us do something similar to what Java does with ByteBuddy for generating
> schema coders, though it has to happen ahead of time, so it's not quite
> the same. (As far as I can tell this is a reason macros got added to the
> language in the first place---Apple's SwiftData library makes heavy use
> of the feature.)
>
> I do have one question for the group though: should the Swift SDK
> distribution take on Beam community properties or Swift community
> properties? Specifically, in the Swift world the Swift SDK would live in
> its own repo (beam-swift, for example), which allows it to be most easily
> consumed and keeps the checkout size under control for users. "Releases"
> in the Swift world (much like Go) are just repo tags. The downside here
> is that there's overhead in setting up the various GitHub Actions and
> other CI/CD bits and bobs.
>
> The alternative would be to keep it in the beam repo itself like it is
> now, but we'd probably want to move Package.swift to the root, since for
> whatever reason the Swift community (much to some people's annoyance) has
> chosen to make packages only really able to live at the top of a repo
> (see the sketch below). This has less overhead from a CI/CD perspective,
> but lots of overhead for users, as they'd be checking out the entire Beam
> repo to use the SDK---which happens a lot.
>
> There's a third option which is basically "do both," but honestly that
> just seems like the worst of both worlds, as it would require constant
> syncing if we wanted to make it possible for Swift users to target
> unreleased SDKs for development and testing.
>
> Personally, I would lean towards the former option (and would volunteer
> to set up & document the various automations) as it is lighter for the
> actual users of the SDK and more consistent with the community experience
> they expect. The CI/CD stuff is mostly a "do it once" cost, whereas
> checking out the entire repo with many updates the user doesn't care
> about is something they will be doing all the time. FWIW some of our
> dependencies also chose this route---most notably gRPC, which started
> with the latter approach and has moved to the former.
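For concreteness, a standalone beam-swift repo would need little more than
a top-level manifest along these lines. This is a hypothetical sketch: the
package name, targets, and the gRPC/protobuf dependencies are illustrative,
not an actual manifest:

    // swift-tools-version:5.9
    // Hypothetical top-level Package.swift for a standalone beam-swift
    // repo. Names and dependencies are illustrative only.
    import PackageDescription

    let package = Package(
        name: "ApacheBeam",
        platforms: [.macOS(.v13)],
        products: [
            .library(name: "ApacheBeam", targets: ["ApacheBeam"])
        ],
        dependencies: [
            // The SDK would need gRPC and protobuf support for the FnApi.
            .package(url: "https://github.com/grpc/grpc-swift.git", from: "1.0.0"),
            .package(url: "https://github.com/apple/swift-protobuf.git", from: "1.0.0"),
        ],
        targets: [
            .target(name: "ApacheBeam", dependencies: [
                .product(name: "GRPC", package: "grpc-swift"),
                .product(name: "SwiftProtobuf", package: "swift-protobuf"),
            ]),
            .testTarget(name: "ApacheBeamTests", dependencies: ["ApacheBeam"]),
        ]
    )

With that at the root, "releases are just repo tags" falls out naturally:
consumers would depend on the package by URL and tag via the Swift Package
Manager.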
I believe existing SDKs benefit from living in the same repo. For example,
it's easier to keep them consistent with any model/proto changes, it's
easier to manage distributions/tags, and it's easier to keep components
consistent for multi-lang. If we add Swift to a separate repo, we'll
probably have to add tooling/scripts to keep things consistent.

Is it possible to create a separate repo, but also add a reference (and
Gradle tasks) under "beam/sdks/swift" so that we can add Beam tests to
make sure that things stay consistent?

Thanks,
Cham

> Interested to hear any feedback on the subject, since I'm guessing it
> probably came up with the Go SDK back in the day?
>
> Best,
> B

On Tue, Aug 29, 2023 at 7:59 AM Byron Ellis <byronel...@google.com> wrote:

> After a couple of iterations (thanks rebo!) we've also gotten the Swift
> SDK working with the new Prism runner. The fact that it doesn't do fusion
> caught a couple of configuration bugs (e.g. that the grpc message
> receiver buffer should be fairly large). It would seem that at the moment
> Prism and the Flink runner are similarly strict when interpreting the
> pipeline graph, while the Python portable runner is far more forgiving.
>
> Also added support for bounded vs. unbounded pcollections through the
> "type" parameter when adding a pardo. Impulse is a bounded pcollection,
> I believe?

On Fri, Aug 25, 2023 at 2:04 PM Byron Ellis <byronel...@google.com> wrote:

> Okay, after a brief detour through "get this working in the Flink
> Portable Runner" I think I have something pretty workable.
>
> PInput and POutput can actually be structs rather than protocols, which
> simplifies things quite a bit. It also allows us to use them with
> property wrappers for a SwiftUI-like experience if we want when defining
> DoFns (which is what I was originally intending to use them for). That
> also means the function signature you use for closures would match
> full-fledged DoFn definitions for the most part, which is satisfying.

On Thu, Aug 24, 2023 at 5:55 PM Byron Ellis <byronel...@google.com> wrote:

> Okay, I tried a couple of different things.
>
> Implicitly passing the timestamp and window during iteration did not go
> well. While physically possible, it introduces an invisible side effect
> into loop iteration, which confused me when I tried to use it---and I'm
> the one who implemented it. Also, I'm pretty sure there'd end up being
> some sort of race condition nightmare continuing down that path.
>
> What I decided to do instead was the following:
>
> 1. Rename the existing "pardo" functions to "pstream" and require that
> they always emit a window and timestamp along with their value. This
> eliminates the side effect but lets us keep iteration in a bundle where
> that might be convenient. For example, in my cheesy GCS implementation it
> means that I can keep an OAuth token around for the lifetime of the
> bundle as a local variable, which is convenient. It's a bit more typing
> for users of pstream, but the expectation here is that if you're using
> pstream functions You Know What You Are Doing and most people won't be
> using it directly.
>
> 2. Introduce a new set of pardo functions (I didn't do all of them yet,
> but enough to test the functionality and decide I liked it) which take a
> function signature of (any PInput<InputType>, any POutput<OutputType>).
> PInput takes the (InputType, Date, Window) tuple and converts it into a
> struct with friendlier names. Not strictly necessary, but makes the code
> nicer to read, I think. POutput introduces emit functions that optionally
> allow you to specify a timestamp and a window; if you don't specify
> either one, it will take the timestamp and/or window of the input.
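A minimal sketch of what those two types might look like under that
scheme. Every name here (including the Window placeholder and the sink
closure) is an assumption, not the branch's actual code:

    import Foundation

    // Placeholder standing in for the SDK's real window type.
    protocol Window {}
    struct GlobalWindow: Window {}

    // Wraps the (Value, Date, Window) element tuple with friendlier names.
    struct PInput<Value> {
        let value: Value
        let timestamp: Date
        let window: any Window
    }

    // Emits values, defaulting to the input element's timestamp and window.
    struct POutput<Value> {
        let defaultTimestamp: Date
        let defaultWindow: any Window
        let sink: (Value, Date, any Window) -> Void

        func emit(_ value: Value,
                  timestamp: Date? = nil,
                  window: (any Window)? = nil) {
            sink(value, timestamp ?? defaultTimestamp, window ?? defaultWindow)
        }
    }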
> Trying that out was pretty pleasant, so I think we should continue down
> that path. If you'd like to see it in use, I reimplemented map() and
> flatMap() in terms of this new pardo functionality.
>
> Code has been pushed to the branch/PR if you're interested in taking a
> look.

On Thu, Aug 24, 2023 at 2:15 PM Byron Ellis <byronel...@google.com> wrote:

> Gotcha, I think there's a fairly easy solution to link input and output
> streams... Let me try it out... it might even be possible to have both
> element- and stream-wise closure pardos. Definitely possible to have that
> at the DoFn level (called SerializableFn in the SDK because I want to
> use @DoFn as a macro).

On Thu, Aug 24, 2023 at 1:09 PM Robert Bradshaw <rober...@google.com> wrote:

> On Thu, Aug 24, 2023 at 12:58 PM Chamikara Jayalath
> <chamik...@google.com> wrote:
>
> > On Thu, Aug 24, 2023 at 12:27 PM Robert Bradshaw
> > <rober...@google.com> wrote:
> >
> > > I would like to figure out a way to get the stream-y interface to
> > > work, as I think it's more natural overall.
> > >
> > > One hypothesis is that if any elements are carried over loop
> > > iterations, there will likely be some that are carried over beyond
> > > the loop (after all, the callee doesn't know when the loop is
> > > supposed to end). We could reject "plain" elements that are emitted
> > > after this point, requiring one to emit timestamp-windowed values.
> >
> > Are you assuming that the same stream (or overlapping sets of data)
> > are pushed to multiple workers? I thought that the set of data
> > streamed here is the data that belongs to the current bundle (hence
> > already assigned to the current worker), so any output from the
> > current bundle invocation would be a valid output of that bundle.
>
> Yes, the content of the stream is exactly the contents of the bundle.
> The question is how to do the input_element:output_element correlation
> for automatically propagating metadata.
>
> > > Related to this, we could enforce that the only (user-accessible)
> > > way to get such a timestamped value is to start with one, e.g. a
> > > WindowedValue<T>.withValue(O) produces a WindowedValue<O> with the
> > > same metadata but a new value. Thus a user wanting to do anything
> > > "fancy" would have to explicitly request iteration over these
> > > windowed values rather than over the raw elements. (This is also
> > > forward compatible with expanding the metadata that can get
> > > attached, e.g. pane infos, and makes the right thing the
> > > easiest/most natural.)
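Concretely, that enforcement might look something like the following
sketch (the types and member names are hypothetical):

    import Foundation

    // Placeholder window/pane types standing in for the SDK's real ones.
    protocol Window {}
    struct PaneInfo {}

    // The only way to get a timestamped value is to start from one:
    // withValue(_) copies the metadata, so the 1:1 correlation between
    // an input element and its outputs is preserved by construction.
    struct WindowedValue<T> {
        let value: T
        let timestamp: Date
        let window: any Window
        let pane: PaneInfo

        func withValue<O>(_ newValue: O) -> WindowedValue<O> {
            WindowedValue<O>(value: newValue, timestamp: timestamp,
                             window: window, pane: pane)
        }
    }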
On Thu, Aug 24, 2023 at 12:10 PM Byron Ellis <byronel...@google.com> wrote:

> Ah, that is a good point—being element-wise would make managing windows
> and timestamps easier for the user. Fortunately it's a fairly easy change
> to make, and maybe even less typing for the user. I was originally
> thinking side inputs and metrics would happen outside the loop, but I
> think you want a class and not a closure at that point, for sanity.

On Thu, Aug 24, 2023 at 12:02 PM Robert Bradshaw <rober...@google.com> wrote:

> Ah, I see.
>
> Yeah, I've thought about using an iterable for the whole bundle rather
> than start/finish bundle callbacks, but one of the questions is how that
> would impact implicit passing of the timestamp (and other) metadata from
> input elements to output elements. You can of course attach the metadata
> to any output that happens in the loop body, but it's very easy to
> implicitly break the 1:1 relationship here (e.g. by doing buffering or
> otherwise modifying local state) and this would be hard to detect. (I
> suppose trying to output after the loop finishes could require something
> more explicit.)

On Wed, Aug 23, 2023 at 6:56 PM Byron Ellis <byronel...@google.com> wrote:

> Oh, I also forgot to mention that I included element-wise collection
> operations like "map" that eliminate the need for pardo in many cases.
> The groupBy command is actually a map + groupByKey under the hood. That
> was to be more consistent with Swift's collection protocol (and is also
> why PCollection and PCollectionStream are different types...
> PCollection implements map and friends as pipeline construction
> operations, whereas PCollectionStream is an actual stream).
>
> I just happened to push some "IO primitives" that use map rather than
> pardo in a couple of places to do a true wordcount using good ol'
> Shakespeare and very, very primitive GCS IO.
>
> Best,
> B
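As a plain-Swift illustration of that "map + groupByKey" decomposition,
using only the standard library (an analogy, not the SDK's API):

    // groupBy(key:) can be decomposed into a map step that attaches the
    // key, followed by a group-by-key. The same shape over an array:
    let words = ["the", "quick", "brown", "the", "lazy", "the"]
    let keyed = words.map { (key: $0, value: 1) }             // map step
    let counts = Dictionary(grouping: keyed, by: { $0.key })  // groupByKey
        .mapValues { group in group.reduce(0) { $0 + $1.value } }
    print(counts) // ["the": 3, "quick": 1, "brown": 1, "lazy": 1]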
On Wed, Aug 23, 2023 at 6:08 PM Byron Ellis <byronel...@google.com> wrote:

> Indeed :-) Yeah, I went back and forth on the pardo syntax quite a bit
> before settling on where I ended up. Ultimately I decided to go with
> something that felt more Swift-y than anything else, which means that
> rather than dealing with a single element like you do in the other SDKs
> you're dealing with a stream of elements (which of course will often be
> of size 1). That's a really natural paradigm in the Swift world,
> especially with the async/await structures. So when you see something
> like:
>
>     pardo(name: "Read Files") { filenames, output, errors in
>         for try await (filename, _, _) in filenames {
>             ...
>             output.emit(data)
>         }
>     }
>
> filenames is the input stream, and then output and errors are both
> output streams. In theory you can have as many output streams as you
> like, though at the moment there's a compiler bug in the new type pack
> feature that limits it to "as many as I felt like supporting".
> Presumably this will get fixed before the official 5.9 release, which
> will probably be in the October timeframe if history is any guide.
>
> If you had parameterization you wanted to send, that would look like
> pardo("Parameter") { param, filenames, output, errors in ... }, where
> "param" would take on the value of "Parameter". All of this is being
> typechecked at compile time, BTW.
>
> The (filename, _, _) is a tuple destructuring construct like you have in
> ES6 and other languages, where "_" is Swift for "ignore". In this case
> PCollectionStreams have an element signature of (Of, Date, Window), so
> you can optionally extract the timestamp and the window if you want to
> manipulate them somehow.
>
> That said, it would also be natural to provide element-wise
> pardos---that would probably mean having explicit type signatures in the
> closure. I had that at one point, but it felt less natural the more I
> used it. I'm also slowly working towards adding a more "traditional"
> DoFn implementation approach where you implement the DoFn as an object
> type (sketched below). In that case it would be very, very easy to
> support both by having a default stream implementation call the
> equivalent of processElement. To make that performant I need to
> implement an @DoFn macro and I just haven't gotten to it yet.
>
> It's a bit more work, and I've been prioritizing implementing composite
> and external transforms for the reasons you suggest. :-) I've got the
> basics of a composite transform (there's an equivalent wordcount
> example) and am hooking it into the pipeline generation, which should
> also give me everything I need to successfully hook in external
> transforms as well. That will give me the jump on IOs, as you say. I can
> also treat the pipeline itself as a composite transform, which lets me
> get rid of the Pipeline { pipeline in ... } construct and instead have
> things attach themselves to the pipeline implicitly.
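A rough sketch of what that object-type DoFn with a default stream
implementation could look like. All names are hypothetical (per the
earlier message the SDK's actual DoFn type is called SerializableFn, and
its real signatures may differ entirely):

    // Hypothetical object-style DoFn: the stream-wise entry point gets a
    // default implementation that loops over the bundle and delegates to
    // an element-wise processElement, so simple DoFns implement only the
    // latter.
    protocol ElementwiseFn {
        associatedtype Input
        associatedtype Output
        func processElement(_ value: Input, emit: (Output) -> Void) throws
    }

    extension ElementwiseFn {
        // Default stream-wise driver: one processElement call per element.
        func processBundle(_ bundle: [Input], emit: (Output) -> Void) throws {
            for element in bundle {
                try processElement(element, emit: emit)
            }
        }
    }

    // Example: an element-wise DoFn that splits lines into words.
    struct SplitWords: ElementwiseFn {
        func processElement(_ value: String, emit: (String) -> Void) throws {
            for word in value.split(separator: " ") {
                emit(String(word))
            }
        }
    }

    // Usage: try SplitWords().processBundle(["to be or not"]) { print($0) }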
> That said, there are some interesting IO possibilities that would be
> Swift native. In particular, I've been looking at the native Swift
> binding for DuckDB (which is C++ based). DuckDB is SQL based but not
> distributed in the same way as, say, Beam SQL... but it would allow for
> SQL statements on individual files with projection pushdown supported
> for things like Parquet, which could have some cool and performant data
> lake applications. I'll probably do a couple of the simpler IOs as
> well---there's a Swift AWS SDK binding that's pretty good that would
> give me S3, and there's a Cloud auth library as well that makes it
> pretty easy to work with GCS.
>
> In any case, I'm updating the branch as I find a minute here and there.
>
> Best,
> B

On Wed, Aug 23, 2023 at 5:02 PM Robert Bradshaw <rober...@google.com> wrote:

> Neat.
>
> Nothing like writing an SDK to actually understand how the FnAPI works
> :). I like the use of groupBy. I have to admit I'm a bit mystified by
> the syntax for parDo (I don't know Swift at all, which is probably
> tripping me up). The addition of external (cross-language) transforms
> could let you steal everything (e.g. IOs) pretty quickly from other
> SDKs.

On Fri, Aug 18, 2023 at 7:55 AM Byron Ellis via user
<user@beam.apache.org> wrote:

> For everyone who is interested, here's the draft PR:
>
> https://github.com/apache/beam/pull/28062
>
> I haven't had a chance to test it on my M1 machine yet, though. (There's
> a good chance there are a few places that need to properly address
> endianness---specifically, timestamps in windowed values and lengths in
> iterable coders, as those both use big-endian representations.)
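For the curious, Swift's built-in byte-order helpers make that big-endian
detail straightforward; a runnable illustration (the timestamp value is
made up):

    import Foundation

    // Beam's wire format writes windowed-value timestamps (and
    // iterable-coder lengths) big-endian, so on little-endian hosts like
    // x86 and ARM Macs the bytes must be swapped before writing.
    let millis: Int64 = 1_692_374_400_000  // hypothetical epoch millis
    let bigEndian = millis.bigEndian       // no-op on big-endian hosts
    let bytes = withUnsafeBytes(of: bigEndian) { Array($0) }
    print(bytes.map { String(format: "%02x", $0) }.joined())
    // prints 16 hex digits, most significant byte first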
On Thu, Aug 17, 2023 at 8:57 PM Byron Ellis <byronel...@google.com> wrote:

> Thanks Cham,
>
> Definitely happy to open a draft PR so folks can comment---there's not
> as much code as it looks like, since most of the LOC is just generated
> protobuf. As for the support, I definitely want to add external
> transforms and may actually add that support before adding the ability
> to make composites in the language itself. With the way the SDK is laid
> out, adding composites to the pipeline graph is a separate operation
> from defining a composite.

On Thu, Aug 17, 2023 at 4:28 PM Chamikara Jayalath
<chamik...@google.com> wrote:

Thanks Byron. This sounds great. I wonder if there is interest in a Swift
SDK from folks currently subscribed to the +user <user@beam.apache.org>
list.

On Wed, Aug 16, 2023 at 6:53 PM Byron Ellis via dev
<d...@beam.apache.org> wrote:

> Hello everyone,
>
> A couple of months ago I decided that I wanted to really understand how
> the Beam FnApi works and how it interacts with the Portable Runner. For
> me at least that usually means I need to write some code so I can see
> things happening in a debugger, and to really prove to myself I
> understood what was going on I decided I couldn't use an existing SDK
> language to do it, since there would be the temptation to read some code
> and convince myself that I actually understood what was going on.
>
> One thing led to another and it turns out that to get a minimal FnApi
> integration going you end up writing a fair bit of an SDK. So I decided
> to take things to a point where I had an SDK that could execute a word
> count example via a portable runner backend. I've now reached that point
> and would like to submit my prototype SDK to the list for feedback.
>
> It's currently living in a branch on my fork here:
>
> https://github.com/byronellis/beam/tree/swift-sdk/sdks/swift
>
> At the moment it runs via the most recent Xcode beta using Swift 5.9 on
> Intel Macs, but should also work using beta builds of 5.9 for Linux
> running on Intel hardware. I haven't had a chance to try it on ARM
> hardware and make sure all of the endian checks are complete. The
> "IntegrationTests.swift" file contains a word count example that reads
> some local files (as well as a missing file to exercise DLQ
> functionality) and outputs counts through two separate group-by
> operations to get it past the "map reduce" size of pipeline. I've tested
> it against the Python Portable Runner. Since my goal was to learn the
> FnApi, there is no Direct Runner at this time.
>
> I've shown it to a couple of folks already and incorporated some of that
> feedback (for example, pardo was originally called dofn when defining
> pipelines). In general I've tried to make the API as "Swift-y" as
> possible, hence the heavy reliance on closures, and while there aren't
> yet composite PTransforms there are the beginnings of what would be
> needed for a SwiftUI-like declarative API for creating them.
>
> There are of course a ton of missing bits still to be implemented, like
> counters, metrics, windowing, state, timers, etc.
This should be fine, and we can get the code documented without these
features. I think support for composites and adding an external transform
(see Java
<https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java>,
Python
<https://github.com/apache/beam/blob/c7b7921185686da573f76ce7320817c32375c7d0/sdks/python/apache_beam/transforms/external.py#L556>,
Go
<https://github.com/apache/beam/blob/c7b7921185686da573f76ce7320817c32375c7d0/sdks/go/pkg/beam/xlang.go#L155>,
TypeScript
<https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/transforms/external.ts>)
to add support for multi-lang will bring in a lot of features (for
example, I/O connectors) for free.

> Any and all feedback welcome, and happy to submit a PR if folks are
> interested, though the "Swift Way" would be to have it in its own repo
> so that it can easily be used from the Swift Package Manager.

+1 for creating a PR (maybe as a draft initially). Also, it'll be easier
to comment on a PR :)

- Cham

> Best,
> B