I would say the first actual release would still be a ways out, though this would make it easier to contribute and accelerate that process :-) (I am implicitly +1 of course if that matters). FWIW the release process for Swift libraries is "create a github tag" + any relevant testing.
On Mon, Sep 25, 2023 at 10:07 AM Valentyn Tymofieiev via user < user@beam.apache.org> wrote: > Do we anticipate any short-term changes to the release process to start > releasing swift SDK artifacts or we can hold that off for a certain time > while SDK is in active development? > > On Mon, Sep 25, 2023 at 9:56 AM Robert Burke <rob...@frantil.com> wrote: > >> I lost this thread for a bit. I'm glad Prism showed some use while it's >> doing unfused stages! >> >> I have no objections to a separate repo, and in a "Beam Go SDK V3" world >> that's what I'd want as well, because it works better for the Go usage >> patterns and is more natural for the tooling. And it would be a cleaner way >> to do a full overhaul of the user API given the way Go has evolved since >> its initial design, and our own experience with it. But that's a very >> different topic for when I have a real proposal around it. >> >> I do see the clean thread Kenn started, but since I have no objections, >> I'll leave it to silent consensus. >> >> I agree that copying/building the protos isn't a burden, since that's >> entirely what protos are for. We're already treating them as properly >> stable and not making breaking proto changes, so compatibility is maintained by >> normal proto behavior. >> >> Robert Burke >> Beam Go Busybody >> >> On Thu, Sep 21, 2023, 9:52 AM Byron Ellis via user <user@beam.apache.org> >> wrote: >> >>> Also, seems like we're getting something like a consensus? Once the repo >>> exists I'm happy to do the slog work of moving everything around (though >>> I'm not a committer so somebody else actually has to do the pushes). We can >>> do that in chunks to make life easier on people and I'm not super concerned >>> with losing the commit history on my current branch >>> >>> On Wed, Sep 20, 2023 at 11:10 AM Byron Ellis <byronel...@google.com> >>> wrote: >>> >>>> I actually don't think we'll need any of the multi-repo github actions, >>>> Swift packages are basically 1:1 with repos so the build process will >>>> actually do all the checkouts. What we'd do is put a test package in >>>> sdks/swift (which works fine since it doesn't ever get used as a dependency) >>>> that depends on the swift SDKs with the appropriate dependencies we want to >>>> make sure we're testing. This should also catch breaking changes to the >>>> protos (which in theory proto is helping us avoid). >>>> >>>> Syncing the protos hasn't been a huge deal and it's already scripted so >>>> definitely easy to automate. I also don't think we would want to do that >>>> all the time anyway as that would require pipeline authors to install >>>> protoc for something that doesn't happen all that often. We can take care >>>> of that for users. >>>> >>>> >>>> On Wed, Sep 20, 2023 at 10:48 AM Danny McCormick < >>>> dannymccorm...@google.com> wrote: >>>> >>>>> > I think the process should be similar to other code/design reviews >>>>> for large contributions. I don't think you need a PMC involvement here. >>>>> >>>>> I think it does require PMC involvement to create the actual repo once >>>>> we have public consensus. I tried the flow at >>>>> https://infra.apache.org/version-control.html#create but it seems >>>>> like it's PMC only. It's unclear to me if consensus has been achieved, >>>>> maybe >>>>> a dedicated voting thread with implied lazy consensus would help here.
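(To make the test-package idea above concrete: a minimal sketch of what an sdks/swift test-only manifest could look like. The beam-swift repository URL, the "ApacheBeam" product name, and the version tag are assumptions for illustration, not existing artifacts.)

// swift-tools-version:5.9
// Hypothetical sdks/swift/Package.swift kept in the main apache/beam repo.
// It is never consumed as a dependency itself; it only pulls in the
// (assumed) separate beam-swift package so CI here can compile and run
// tests against the current protos and multi-language pieces.
import PackageDescription

let package = Package(
    name: "BeamSwiftCompatibilityTests",
    dependencies: [
        // Assumed repository location and tag.
        .package(url: "https://github.com/apache/beam-swift.git", from: "0.1.0")
    ],
    targets: [
        .testTarget(
            name: "BeamSwiftCompatibilityTests",
            dependencies: [.product(name: "ApacheBeam", package: "beam-swift")]
        )
    ]
)

Because the test target lives in this manifest, running swift test from sdks/swift would check out the SDK repo and exercise it, which is the "the build process will actually do all the checkouts" behavior described above.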
>>>>> >>>>> > Sure, we could definitely include things as a submodule for stuff >>>>> like testing multi-language, though I think there's actually a cleaner way >>>>> just using the Swift package manager's test facilities to access the swift >>>>> sdk repo. >>>>> >>>>> +1 on avoiding submodules. If needed we could also use multi-repo >>>>> checkout with GitHub Actions. I think my biggest question is what we'd >>>>> actually be enforcing though. In general, I'd expect the normal update >>>>> flow >>>>> to be >>>>> >>>>> 1) Update Beam protos and/or multi-lang components (though the set of >>>>> things that needs updated for multi-lang is unclear to me) >>>>> 2) Mirror those changes to the Swift SDK. >>>>> >>>>> The thing that is most likely to be forgotten is the 2nd step, and >>>>> that is hard to enforce with automation since the automation would either >>>>> be on the first step which doesn't have anything to enforce or on some >>>>> sort >>>>> of schedule in the swift repo, which is less likely to be visible. I'm a >>>>> little worried we wouldn't notice breakages until release time. >>>>> >>>>> I wonder how much stuff happens outside of the proto directory that >>>>> needs to be mirrored. Could we just create scheduled automation to exactly >>>>> copy changes in the proto directory and version changes for multi-lang >>>>> stuff to the swift SDK repo? >>>>> >>>>> --------------------------------------------------------------------- >>>>> >>>>> Regardless, I'm +1 on a dedicated repo; I'd rather we take on some >>>>> organizational weirdness than push that pain to users. >>>>> >>>>> Thanks, >>>>> Danny >>>>> >>>>> On Wed, Sep 20, 2023 at 1:38 PM Byron Ellis via user < >>>>> user@beam.apache.org> wrote: >>>>> >>>>>> Sure, we could definitely include things as a submodule for stuff >>>>>> like testing multi-language, though I think there's actually a cleaner >>>>>> way >>>>>> just using the Swift package manager's test facilities to access the >>>>>> swift >>>>>> sdk repo. >>>>>> >>>>>> That would also be consistent with the user-side experience and let >>>>>> us test things like build-time integrations with multi-language as well >>>>>> (which is possible in Swift through compiler plugins) in the same way as >>>>>> a >>>>>> pipeline author would. You also maybe get backwards compatibility testing >>>>>> as a side effect in that case as well. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Wed, Sep 20, 2023 at 10:20 AM Chamikara Jayalath < >>>>>> chamik...@google.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Sep 20, 2023 at 9:54 AM Byron Ellis <byronel...@google.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I've chatted with a couple of people offline about this and my >>>>>>>> impression is that folks are generally amenable to a separate repo to >>>>>>>> match >>>>>>>> the target community? I have no idea what the next steps would be >>>>>>>> though >>>>>>>> other than guessing that there's probably some sort of PMC thing >>>>>>>> involved? >>>>>>>> Should I write something up somewhere? >>>>>>>> >>>>>>> >>>>>>> I think the process should be similar to other code/design reviews >>>>>>> for large contributions. I don't think you need a PMC involvement here. 
>>>>>>> >>>>>>> >>>>>>>> >>>>>>>> Best, >>>>>>>> B >>>>>>>> >>>>>>>> On Thu, Sep 14, 2023 at 9:00 AM Byron Ellis <byronel...@google.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I've been on vacation, but mostly working on getting External >>>>>>>>> Transform support going (which in turn basically requires Schema >>>>>>>>> support as >>>>>>>>> well). It also looks like macros landed in Swift 5.9 for Linux so >>>>>>>>> we'll be >>>>>>>>> able to use those to do some compile-time automation. In particular, >>>>>>>>> this >>>>>>>>> lets us do something similar to what Java does with ByteBuddy for >>>>>>>>> generating schema coders though it has to be ahead of time so not >>>>>>>>> quite the >>>>>>>>> same. (As far as I can tell this is a reason why macros got added to >>>>>>>>> the >>>>>>>>> language in the first place---Apple's SwiftData library makes heavy >>>>>>>>> use of >>>>>>>>> the feature). >>>>>>>>> >>>>>>>>> I do have one question for the group though: should the Swift SDK >>>>>>>>> distribution take on Beam community properties or Swift community >>>>>>>>> properties? Specifically, in the Swift world the Swift SDK would live >>>>>>>>> in >>>>>>>>> its own repo (beam-swift for example), which allows it to be most >>>>>>>>> easily >>>>>>>>> consumed and keeps the checkout size under control for users. >>>>>>>>> "Releases" in >>>>>>>>> the Swift world (much like Go) are just repo tags. The downside here >>>>>>>>> is >>>>>>>>> that there's overhead in setting up the various github actions and >>>>>>>>> other >>>>>>>>> CI/CD bits and bobs. >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>>> The alternative would be to keep it in the beam repo itself like it >>>>>>>>> is now, but we'd probably want to move Package.swift to the root >>>>>>>>> since for >>>>>>>>> whatever reason the Swift community (much to some people's annoyance) >>>>>>>>> has >>>>>>>>> chosen to have packages only really able to live at the top of a >>>>>>>>> repo. This >>>>>>>>> has less overhead from a CI/CD perspective, but lots of overhead for >>>>>>>>> users >>>>>>>>> as they'd be checking out the entire Beam repo to use the SDK, which >>>>>>>>> happens a lot. >>>>>>>>> >>>>>>>>> There's a third option which is basically "do both" but honestly >>>>>>>>> that just seems like the worst of both worlds as it would require >>>>>>>>> constant >>>>>>>>> syncing if we wanted to make it possible for Swift users to target >>>>>>>>> unreleased SDKs for development and testing. >>>>>>>>> >>>>>>>>> Personally, I would lean towards the former option (and would >>>>>>>>> volunteer to set up & document the various automations) as it is >>>>>>>>> lighter >>>>>>>>> for the actual users of the SDK and more consistent with the community >>>>>>>>> experience they expect. The CI/CD stuff is mostly a "do it once" >>>>>>>>> whereas >>>>>>>>> checking out the entire repo with many updates the user doesn't care >>>>>>>>> about >>>>>>>>> is something they will be doing all the time. FWIW some of our >>>>>>>>> dependencies >>>>>>>>> also chose this route---most notably GRPC which started with the >>>>>>>>> latter >>>>>>>>> approach and has moved to the former. >>>>>>>>> >>>>>>>> >>>>>>> I believe existing SDKs benefit from living in the same repo. For >>>>>>> example, it's easier to keep them consistent with any model/proto >>>>>>> changes >>>>>>> and it's easier to manage distributions/tags. Also it's easier to keep >>>>>>> components consistent for multi-lang. 
If we add Swift to a separate >>>>>>> repo, >>>>>>> we'll probably have to add tooling/scripts to keep things consistent. >>>>>>> Is it possible to create a separate repo, but also add a reference >>>>>>> (and Gradle tasks) under "beam/sdks/swift" so that we can add Beam >>>>>>> tests to >>>>>>> make sure that things stay consistent ? >>>>>>> >>>>>>> Thanks, >>>>>>> Cham >>>>>>> >>>>>>> >>>>>>>> >>>>>>>>> Interested to hear any feedback on the subject since I'm guessing >>>>>>>>> it probably came up with the Go SDK back in the day? >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> B >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, Aug 29, 2023 at 7:59 AM Byron Ellis <byronel...@google.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> After a couple of iterations (thanks rebo!) we've also gotten the >>>>>>>>>> Swift SDK working with the new Prism runner. The fact that it >>>>>>>>>> doesn't do >>>>>>>>>> fusion caught a couple of configuration bugs (e.g. that the grpc >>>>>>>>>> message >>>>>>>>>> receiver buffer should be fairly large). It would seem that at the >>>>>>>>>> moment >>>>>>>>>> Prism and the Flink runner have similar orders of strictness when >>>>>>>>>> interpreting the pipeline graph while the Python portable runner is >>>>>>>>>> far >>>>>>>>>> more forgiving. >>>>>>>>>> >>>>>>>>>> Also added support for bounded vs unbounded pcollections through >>>>>>>>>> the "type" parameter when adding a pardo. Impulse is a bounded >>>>>>>>>> pcollection >>>>>>>>>> I believe? >>>>>>>>>> >>>>>>>>>> On Fri, Aug 25, 2023 at 2:04 PM Byron Ellis < >>>>>>>>>> byronel...@google.com> wrote: >>>>>>>>>> >>>>>>>>>>> Okay, after a brief detour through "get this working in the >>>>>>>>>>> Flink Portable Runner" I think I have something pretty workable. >>>>>>>>>>> >>>>>>>>>>> PInput and POutput can actually be structs rather than >>>>>>>>>>> protocols, which simplifies things quite a bit. It also allows us >>>>>>>>>>> to use >>>>>>>>>>> them with property wrappers for a SwiftUI-like experience if we >>>>>>>>>>> want when >>>>>>>>>>> defining DoFns (which is what I was originally intending to use >>>>>>>>>>> them for). >>>>>>>>>>> That also means the function signature you use for closures would >>>>>>>>>>> match >>>>>>>>>>> full-fledged DoFn definitions for the most part which is satisfying. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Aug 24, 2023 at 5:55 PM Byron Ellis < >>>>>>>>>>> byronel...@google.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Okay, I tried a couple of different things. >>>>>>>>>>>> >>>>>>>>>>>> Implicitly passing the timestamp and window during iteration >>>>>>>>>>>> did not go well. While physically possible it introduces an >>>>>>>>>>>> invisible side >>>>>>>>>>>> effect into loop iteration which confused me when I tried to use >>>>>>>>>>>> it and I >>>>>>>>>>>> implemented it. Also, I'm pretty sure there'd end up being some >>>>>>>>>>>> sort of >>>>>>>>>>>> race condition nightmare continuing down that path. >>>>>>>>>>>> >>>>>>>>>>>> What I decided to do instead was the following: >>>>>>>>>>>> >>>>>>>>>>>> 1. Rename the existing "pardo" functions to "pstream" and >>>>>>>>>>>> require that they always emit a window and timestamp along with >>>>>>>>>>>> their >>>>>>>>>>>> value. This eliminates the side effect but lets us keep iteration >>>>>>>>>>>> in a >>>>>>>>>>>> bundle where that might be convenient. 
For example, in my cheesy >>>>>>>>>>>> GCS >>>>>>>>>>>> implementation it means that I can keep an OAuth token around for >>>>>>>>>>>> the >>>>>>>>>>>> lifetime of the bundle as a local variable, which is convenient. >>>>>>>>>>>> It's a bit >>>>>>>>>>>> more typing for users of pstream, but the expectation here is that >>>>>>>>>>>> if >>>>>>>>>>>> you're using pstream functions You Know What You Are Doing and >>>>>>>>>>>> most people >>>>>>>>>>>> won't be using it directly. >>>>>>>>>>>> >>>>>>>>>>>> 2. Introduce a new set of pardo functions (I didn't do all of >>>>>>>>>>>> them yet, but enough to test the functionality and decide I liked >>>>>>>>>>>> it) which >>>>>>>>>>>> take a function signature of (any PInput<InputType>,any >>>>>>>>>>>> POutput<OutputType>). PInput takes the (InputType,Date,Window) >>>>>>>>>>>> tuple and >>>>>>>>>>>> converts it into a struct with friendlier names. Not strictly >>>>>>>>>>>> necessary, >>>>>>>>>>>> but makes the code nicer to read I think. POutput introduces emit >>>>>>>>>>>> functions >>>>>>>>>>>> that optionally allow you to specify a timestamp and a window. If >>>>>>>>>>>> you don't >>>>>>>>>>>> for either one it will take the timestamp and/or window of the >>>>>>>>>>>> input. >>>>>>>>>>>> >>>>>>>>>>>> Trying to use that was pretty pleasant to use so I think we >>>>>>>>>>>> should continue down that path. If you'd like to see it in use, I >>>>>>>>>>>> reimplemented map() and flatMap() in terms of this new pardo >>>>>>>>>>>> functionality. >>>>>>>>>>>> >>>>>>>>>>>> Code has been pushed to the branch/PR if you're interested in >>>>>>>>>>>> taking a look. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Aug 24, 2023 at 2:15 PM Byron Ellis < >>>>>>>>>>>> byronel...@google.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Gotcha, I think there's a fairly easy solution to link input >>>>>>>>>>>>> and output streams.... Let me try it out... might even be >>>>>>>>>>>>> possible to have >>>>>>>>>>>>> both element and stream-wise closure pardos. Definitely possible >>>>>>>>>>>>> to have >>>>>>>>>>>>> that at the DoFn level (called SerializableFn in the SDK because >>>>>>>>>>>>> I want to >>>>>>>>>>>>> use @DoFn as a macro) >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Aug 24, 2023 at 1:09 PM Robert Bradshaw < >>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Aug 24, 2023 at 12:58 PM Chamikara Jayalath < >>>>>>>>>>>>>> chamik...@google.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Aug 24, 2023 at 12:27 PM Robert Bradshaw < >>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I would like to figure out a way to get the stream-y >>>>>>>>>>>>>>>> interface to work, as I think it's more natural overall. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> One hypothesis is that if any elements are carried over >>>>>>>>>>>>>>>> loop iterations, there will likely be some that are carried >>>>>>>>>>>>>>>> over beyond the >>>>>>>>>>>>>>>> loop (after all the callee doesn't know when the loop is >>>>>>>>>>>>>>>> supposed to end). >>>>>>>>>>>>>>>> We could reject "plain" elements that are emitted after this >>>>>>>>>>>>>>>> point, >>>>>>>>>>>>>>>> requiring one to emit timestamp-windowed-values. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Are you assuming that the same stream (or overlapping sets >>>>>>>>>>>>>>> of data) are pushed to multiple workers ? 
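(Going back to the element-wise pardo variant described a little further up: a rough sketch of the shape it could take, based only on the description in this thread. The property names value/timestamp/window, the upstream readSomeText() transform, and the pardo call itself are assumptions about an API that is still in flux, not final signatures.)

// Hypothetical element-wise form: the closure runs once per element rather
// than once per bundle stream.
let lines: PCollection<String> = readSomeText()   // assumed upstream transform
let firstWords = lines.pardo(name: "First Word") { (input: any PInput<String>, output: any POutput<String>) in
    let word = input.value.split(separator: " ").first.map(String.init) ?? ""
    // Emitting with no timestamp or window inherits both from the input element.
    output.emit(word)
    // Either can also be overridden explicitly, e.g.:
    // output.emit(word, timestamp: input.timestamp, window: input.window)
}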
I thought that the >>>>>>>>>>>>>>> set of data >>>>>>>>>>>>>>> streamed here are the data that belong to the current bundle >>>>>>>>>>>>>>> (hence already >>>>>>>>>>>>>>> assigned to the current worker) so any output from the current >>>>>>>>>>>>>>> bundle >>>>>>>>>>>>>>> invocation would be a valid output of that bundle. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, the content of the stream is exactly the contents of the >>>>>>>>>>>>>> bundle. The question is how to do the >>>>>>>>>>>>>> input_element:output_element >>>>>>>>>>>>>> correlation for automatically propagating metadata. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Related to this, we could enforce that the only >>>>>>>>>>>>>>>> (user-accessible) way to get such a timestamped value is to >>>>>>>>>>>>>>>> start with one, >>>>>>>>>>>>>>>> e.g. a WindowedValue<T>.withValue(O) produces a >>>>>>>>>>>>>>>> WindowedValue<O> with the >>>>>>>>>>>>>>>> same metadata but a new value. Thus a user wanting to do >>>>>>>>>>>>>>>> anything "fancy" >>>>>>>>>>>>>>>> would have to explicitly request iteration over these windowed >>>>>>>>>>>>>>>> values >>>>>>>>>>>>>>>> rather than over the raw elements. (This is also forward >>>>>>>>>>>>>>>> compatible with >>>>>>>>>>>>>>>> expanding the metadata that can get attached, e.g. pane infos, >>>>>>>>>>>>>>>> and makes >>>>>>>>>>>>>>>> the right thing the easiest/most natural.) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Aug 24, 2023 at 12:10 PM Byron Ellis < >>>>>>>>>>>>>>>> byronel...@google.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Ah, that is a good point—being element-wise would make >>>>>>>>>>>>>>>>> managing windows and timestamps easier for the user. >>>>>>>>>>>>>>>>> Fortunately it's a >>>>>>>>>>>>>>>>> fairly easy change to make and maybe even less typing for the >>>>>>>>>>>>>>>>> user. I was >>>>>>>>>>>>>>>>> originally thinking side inputs and metrics would happen >>>>>>>>>>>>>>>>> outside the loop, >>>>>>>>>>>>>>>>> but I think you want a class and not a closure at that point >>>>>>>>>>>>>>>>> for sanity. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Aug 24, 2023 at 12:02 PM Robert Bradshaw < >>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Ah, I see. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Yeah, I've thought about using an iterable for the whole >>>>>>>>>>>>>>>>>> bundle rather than start/finish bundle callbacks, but one of >>>>>>>>>>>>>>>>>> the questions >>>>>>>>>>>>>>>>>> is how that would impact implicit passing of the timestamp >>>>>>>>>>>>>>>>>> (and other) >>>>>>>>>>>>>>>>>> metadata from input elements to output elements. (You can of >>>>>>>>>>>>>>>>>> course attach >>>>>>>>>>>>>>>>>> the metadata to any output that happens in the loop body, >>>>>>>>>>>>>>>>>> but it's very >>>>>>>>>>>>>>>>>> easy to implicitly break the 1:1 relationship here (e.g. >>>>>>>>>>>>>>>>>> by doing >>>>>>>>>>>>>>>>>> buffering or otherwise modifying local state) and this would >>>>>>>>>>>>>>>>>> be hard to >>>>>>>>>>>>>>>>>> detect.) (I suppose trying to output after the loop finishes >>>>>>>>>>>>>>>>>> could require >>>>>>>>>>>>>>>>>> something more explicit). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Aug 23, 2023 at 6:56 PM Byron Ellis < >>>>>>>>>>>>>>>>>> byronel...@google.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Oh, I also forgot to mention that I included >>>>>>>>>>>>>>>>>>> element-wise collection operations like "map" that >>>>>>>>>>>>>>>>>>> eliminate the need for >>>>>>>>>>>>>>>>>>> pardo in many cases.
the groupBy command is actually a map >>>>>>>>>>>>>>>>>>> + groupByKey >>>>>>>>>>>>>>>>>>> under the hood. That was to be more consistent with Swift's >>>>>>>>>>>>>>>>>>> collection >>>>>>>>>>>>>>>>>>> protocol (and is also why PCollection and PCollectionStream >>>>>>>>>>>>>>>>>>> are different >>>>>>>>>>>>>>>>>>> types... PCollection implements map and friends as pipeline >>>>>>>>>>>>>>>>>>> construction >>>>>>>>>>>>>>>>>>> operations whereas PCollectionStream is an actual stream) >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I just happened to push some "IO primitives" that uses >>>>>>>>>>>>>>>>>>> map rather than pardo in a couple of places to do a true >>>>>>>>>>>>>>>>>>> wordcount using >>>>>>>>>>>>>>>>>>> good ol' Shakespeare and very very primitive GCS IO. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> B >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Aug 23, 2023 at 6:08 PM Byron Ellis < >>>>>>>>>>>>>>>>>>> byronel...@google.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Indeed :-) Yeah, I went back and forth on the pardo >>>>>>>>>>>>>>>>>>>> syntax quite a bit before settling on where I ended up. >>>>>>>>>>>>>>>>>>>> Ultimately I >>>>>>>>>>>>>>>>>>>> decided to go with something that felt more Swift-y than >>>>>>>>>>>>>>>>>>>> anything else >>>>>>>>>>>>>>>>>>>> which means that rather than dealing with a single element >>>>>>>>>>>>>>>>>>>> like you do in >>>>>>>>>>>>>>>>>>>> the other SDKs you're dealing with a stream of elements >>>>>>>>>>>>>>>>>>>> (which of course >>>>>>>>>>>>>>>>>>>> will often be of size 1). That's a really natural paradigm >>>>>>>>>>>>>>>>>>>> in the Swift >>>>>>>>>>>>>>>>>>>> world especially with the async / await structures. So >>>>>>>>>>>>>>>>>>>> when you see >>>>>>>>>>>>>>>>>>>> something like: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> pardo(name:"Read Files") { filenames,output,errors in >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> for try await (filename,_,_) in filenames { >>>>>>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>>>>>> output.emit(data) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> filenames is the input stream and then output and >>>>>>>>>>>>>>>>>>>> errors are both output streams. In theory you can have as >>>>>>>>>>>>>>>>>>>> many output >>>>>>>>>>>>>>>>>>>> streams as you like though at the moment there's a >>>>>>>>>>>>>>>>>>>> compiler bug in the new >>>>>>>>>>>>>>>>>>>> type pack feature that limits it to "as many as I felt >>>>>>>>>>>>>>>>>>>> like supporting". >>>>>>>>>>>>>>>>>>>> Presumably this will get fixed before the official 5.9 >>>>>>>>>>>>>>>>>>>> release which will >>>>>>>>>>>>>>>>>>>> probably be in the October timeframe if history is any >>>>>>>>>>>>>>>>>>>> guide) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> If you had parameterization you wanted to send that >>>>>>>>>>>>>>>>>>>> would look like pardo("Parameter") { >>>>>>>>>>>>>>>>>>>> param,filenames,output,error in ... } >>>>>>>>>>>>>>>>>>>> where "param" would take on the value of "Parameter." All >>>>>>>>>>>>>>>>>>>> of this is being >>>>>>>>>>>>>>>>>>>> typechecked at compile time BTW. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> the (filename,_,_) is a tuple spreading construct like >>>>>>>>>>>>>>>>>>>> you have in ES6 and other things where "_" is Swift for >>>>>>>>>>>>>>>>>>>> "ignore." 
In this >>>>>>>>>>>>>>>>>>>> case PCollectionStreams have an element signature of >>>>>>>>>>>>>>>>>>>> (Of,Date,Window) so >>>>>>>>>>>>>>>>>>>> you can optionally extract the timestamp and the window if >>>>>>>>>>>>>>>>>>>> you want to >>>>>>>>>>>>>>>>>>>> manipulate it somehow. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> That said it would also be natural to provide >>>>>>>>>>>>>>>>>>>> elementwise pardos--- that would probably mean having >>>>>>>>>>>>>>>>>>>> explicit type >>>>>>>>>>>>>>>>>>>> signatures in the closure. I had that at one point, but it >>>>>>>>>>>>>>>>>>>> felt less >>>>>>>>>>>>>>>>>>>> natural the more I used it. I'm also slowly working >>>>>>>>>>>>>>>>>>>> towards adding a more >>>>>>>>>>>>>>>>>>>> "traditional" DoFn implementation approach where you >>>>>>>>>>>>>>>>>>>> implement the DoFn as >>>>>>>>>>>>>>>>>>>> an object type. In that case it would be very very easy to >>>>>>>>>>>>>>>>>>>> support both by >>>>>>>>>>>>>>>>>>>> having a default stream implementation call the equivalent >>>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>> processElement. To make that performant I need to >>>>>>>>>>>>>>>>>>>> implement an @DoFn macro >>>>>>>>>>>>>>>>>>>> and I just haven't gotten to it yet. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> It's a bit more work and I've been prioritizing >>>>>>>>>>>>>>>>>>>> implementing composite and external transforms for the >>>>>>>>>>>>>>>>>>>> reasons you suggest. >>>>>>>>>>>>>>>>>>>> :-) I've got the basics of a composite transform (there's >>>>>>>>>>>>>>>>>>>> an equivalent >>>>>>>>>>>>>>>>>>>> wordcount example) and am hooking it into the pipeline >>>>>>>>>>>>>>>>>>>> generation, which >>>>>>>>>>>>>>>>>>>> should also give me everything I need to successfully hook >>>>>>>>>>>>>>>>>>>> in external >>>>>>>>>>>>>>>>>>>> transforms as well. That will give me the jump on IOs as >>>>>>>>>>>>>>>>>>>> you say. I can >>>>>>>>>>>>>>>>>>>> also treat the pipeline itself as a composite transform >>>>>>>>>>>>>>>>>>>> which lets me get >>>>>>>>>>>>>>>>>>>> rid of the Pipeline { pipeline in ... } and just instead >>>>>>>>>>>>>>>>>>>> have things attach >>>>>>>>>>>>>>>>>>>> themselves to the pipeline implicitly. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> That said, there are some interesting IO possibilities >>>>>>>>>>>>>>>>>>>> that would be Swift native. In particular, I've been >>>>>>>>>>>>>>>>>>>> looking at the >>>>>>>>>>>>>>>>>>>> native Swift binding for DuckDB (which is C++ based). >>>>>>>>>>>>>>>>>>>> DuckDB is SQL based >>>>>>>>>>>>>>>>>>>> but not distributed in the same way as, say, Beam SQL... >>>>>>>>>>>>>>>>>>>> but it would allow >>>>>>>>>>>>>>>>>>>> for SQL statements on individual files with projection >>>>>>>>>>>>>>>>>>>> pushdown supported >>>>>>>>>>>>>>>>>>>> for things like Parquet which could have some cool and >>>>>>>>>>>>>>>>>>>> performant data lake >>>>>>>>>>>>>>>>>>>> applications. I'll probably do a couple of the simpler IOs >>>>>>>>>>>>>>>>>>>> as >>>>>>>>>>>>>>>>>>>> well---there's a Swift AWS SDK binding that's pretty good >>>>>>>>>>>>>>>>>>>> that would give >>>>>>>>>>>>>>>>>>>> me S3 and there's a Cloud auth library as well that makes >>>>>>>>>>>>>>>>>>>> it pretty easy to >>>>>>>>>>>>>>>>>>>> work with GCS. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> In any case, I'm updating the branch as I find a minute >>>>>>>>>>>>>>>>>>>> here and there.
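(For readability, here is the stream-style "Read Files" example from a couple of messages up written out with its braces closed and the tuple destructuring shown in full. It is still only a sketch: readFile(named:) stands in for whatever the GCS read actually does, and the error stream used as a dead-letter output follows the description above rather than a finalized API.)

pardo(name: "Read Files") { filenames, output, errors in
    // filenames is a PCollectionStream whose elements arrive as
    // (value, timestamp, window) tuples; the underscores ignore the
    // timestamp and window since this DoFn doesn't need them.
    for try await (filename, _, _) in filenames {
        do {
            let data = try await readFile(named: filename)   // hypothetical helper
            output.emit(data)
        } catch {
            // Route failures to the second output stream (the DLQ).
            errors.emit("\(filename): \(error)")
        }
    }
}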
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>> B >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Wed, Aug 23, 2023 at 5:02 PM Robert Bradshaw < >>>>>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Neat. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Nothing like writing an SDK to actually understand >>>>>>>>>>>>>>>>>>>>> how the FnAPI works :). I like the use of groupBy. I have >>>>>>>>>>>>>>>>>>>>> to admit I'm a >>>>>>>>>>>>>>>>>>>>> bit mystified by the syntax for parDo (I don't know Swift >>>>>>>>>>>>>>>>>>>>> at all which is >>>>>>>>>>>>>>>>>>>>> probably tripping me up). The addition of external >>>>>>>>>>>>>>>>>>>>> (cross-language) >>>>>>>>>>>>>>>>>>>>> transforms could let you steal everything (e.g. IOs) >>>>>>>>>>>>>>>>>>>>> pretty quickly from >>>>>>>>>>>>>>>>>>>>> other SDKs. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Fri, Aug 18, 2023 at 7:55 AM Byron Ellis via user < >>>>>>>>>>>>>>>>>>>>> user@beam.apache.org> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> For everyone who is interested, here's the draft PR: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/beam/pull/28062 >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I haven't had a chance to test it on my M1 machine >>>>>>>>>>>>>>>>>>>>>> yet though (there's a good chance there are a few places >>>>>>>>>>>>>>>>>>>>>> that need to >>>>>>>>>>>>>>>>>>>>>> properly address endianness. Specifically timestamps in >>>>>>>>>>>>>>>>>>>>>> windowed values and >>>>>>>>>>>>>>>>>>>>>> length in iterable coders as those both use specifically >>>>>>>>>>>>>>>>>>>>>> big-endian representations) >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 17, 2023 at 8:57 PM Byron Ellis < >>>>>>>>>>>>>>>>>>>>>> byronel...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks Cham, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Definitely happy to open a draft PR so folks can >>>>>>>>>>>>>>>>>>>>>>> comment---there's not as much code as it looks like >>>>>>>>>>>>>>>>>>>>>>> since most of the LOC >>>>>>>>>>>>>>>>>>>>>>> is just generated protobuf. As for the support, I >>>>>>>>>>>>>>>>>>>>>>> definitely want to add >>>>>>>>>>>>>>>>>>>>>>> external transforms and may actually add that support >>>>>>>>>>>>>>>>>>>>>>> before adding the >>>>>>>>>>>>>>>>>>>>>>> ability to make composites in the language itself. With >>>>>>>>>>>>>>>>>>>>>>> the way the SDK is >>>>>>>>>>>>>>>>>>>>>>> laid out adding composites to the pipeline graph is a >>>>>>>>>>>>>>>>>>>>>>> separate operation >>>>>>>>>>>>>>>>>>>>>>> from defining a composite. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Aug 17, 2023 at 4:28 PM Chamikara Jayalath < >>>>>>>>>>>>>>>>>>>>>>> chamik...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>> Thanks Byron. This sounds great. I wonder if there >>>>>>>>>>>>>>>>>>>>>>>> is interest in Swift SDK from folks currently >>>>>>>>>>>>>>>>>>>>>>>> subscribed to the >>>>>>>>>>>>>>>>>>>>>>>> +user <user@beam.apache.org> list.
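(A small aside on the endianness point above: Swift's fixed-width integers make it straightforward to emit big-endian bytes regardless of whether the host is an Intel or ARM machine. A sketch only; the helper name and the surrounding coder plumbing are illustrative, not code from the branch.)

import Foundation

// Append a 64-bit value (e.g. the timestamp carried in a windowed value) in
// big-endian byte order, independent of the host CPU's native endianness.
func appendBigEndian(_ value: Int64, to data: inout Data) {
    withUnsafeBytes(of: value.bigEndian) { data.append(contentsOf: $0) }
}

var buffer = Data()
appendBigEndian(1_692_000_000_000, to: &buffer)   // appends 8 bytes, most significant first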
>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, Aug 16, 2023 at 6:53 PM Byron Ellis via dev >>>>>>>>>>>>>>>>>>>>>>>> <d...@beam.apache.org> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hello everyone, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> A couple of months ago I decided that I wanted to >>>>>>>>>>>>>>>>>>>>>>>>> really understand how the Beam FnApi works and how it >>>>>>>>>>>>>>>>>>>>>>>>> interacts with the >>>>>>>>>>>>>>>>>>>>>>>>> Portable Runner. For me at least that usually means I >>>>>>>>>>>>>>>>>>>>>>>>> need to write some >>>>>>>>>>>>>>>>>>>>>>>>> code so I can see things happening in a debugger and >>>>>>>>>>>>>>>>>>>>>>>>> to really prove to >>>>>>>>>>>>>>>>>>>>>>>>> myself I understood what was going on I decided I >>>>>>>>>>>>>>>>>>>>>>>>> couldn't use an existing >>>>>>>>>>>>>>>>>>>>>>>>> SDK language to do it since there would be the >>>>>>>>>>>>>>>>>>>>>>>>> temptation to read some code >>>>>>>>>>>>>>>>>>>>>>>>> and convince myself that I actually understood what >>>>>>>>>>>>>>>>>>>>>>>>> was going on. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> One thing led to another and it turns out that to >>>>>>>>>>>>>>>>>>>>>>>>> get a minimal FnApi integration going you end up >>>>>>>>>>>>>>>>>>>>>>>>> writing a fair bit of an >>>>>>>>>>>>>>>>>>>>>>>>> SDK. So I decided to take things to a point where I >>>>>>>>>>>>>>>>>>>>>>>>> had an SDK that could >>>>>>>>>>>>>>>>>>>>>>>>> execute a word count example via a portable runner >>>>>>>>>>>>>>>>>>>>>>>>> backend. I've now >>>>>>>>>>>>>>>>>>>>>>>>> reached that point and would like to submit my >>>>>>>>>>>>>>>>>>>>>>>>> prototype SDK to the list >>>>>>>>>>>>>>>>>>>>>>>>> for feedback. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> It's currently living in a branch on my fork here: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/byronellis/beam/tree/swift-sdk/sdks/swift >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> At the moment it runs via the most recent XCode >>>>>>>>>>>>>>>>>>>>>>>>> Beta using Swift 5.9 on Intel Macs, but should also >>>>>>>>>>>>>>>>>>>>>>>>> work using beta builds >>>>>>>>>>>>>>>>>>>>>>>>> of 5.9 for Linux running on Intel hardware. I haven't >>>>>>>>>>>>>>>>>>>>>>>>> had a chance to try >>>>>>>>>>>>>>>>>>>>>>>>> it on ARM hardware and make sure all of the endian >>>>>>>>>>>>>>>>>>>>>>>>> checks are complete. The >>>>>>>>>>>>>>>>>>>>>>>>> "IntegrationTests.swift" file contains a word count >>>>>>>>>>>>>>>>>>>>>>>>> example that reads some >>>>>>>>>>>>>>>>>>>>>>>>> local files (as well as a missing file to exercise >>>>>>>>>>>>>>>>>>>>>>>>> DLQ functionality) and >>>>>>>>>>>>>>>>>>>>>>>>> output counts through two separate group by >>>>>>>>>>>>>>>>>>>>>>>>> operations to get it past the >>>>>>>>>>>>>>>>>>>>>>>>> "map reduce" size of pipeline. I've tested it against >>>>>>>>>>>>>>>>>>>>>>>>> the Python Portable >>>>>>>>>>>>>>>>>>>>>>>>> Runner. Since my goal was to learn FnApi there is no >>>>>>>>>>>>>>>>>>>>>>>>> Direct Runner at this >>>>>>>>>>>>>>>>>>>>>>>>> time. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I've shown it to a couple of folks already and >>>>>>>>>>>>>>>>>>>>>>>>> incorporated some of that feedback already (for >>>>>>>>>>>>>>>>>>>>>>>>> example pardo was >>>>>>>>>>>>>>>>>>>>>>>>> originally called dofn when defining pipelines). 
In >>>>>>>>>>>>>>>>>>>>>>>>> general I've tried to >>>>>>>>>>>>>>>>>>>>>>>>> make the API as "Swift-y" as possible, hence the >>>>>>>>>>>>>>>>>>>>>>>>> heavy reliance on closures >>>>>>>>>>>>>>>>>>>>>>>>> and while there aren't yet composite PTransforms >>>>>>>>>>>>>>>>>>>>>>>>> there's the beginnings of >>>>>>>>>>>>>>>>>>>>>>>>> what would be needed for a SwiftUI-like declarative >>>>>>>>>>>>>>>>>>>>>>>>> API for creating them. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> There are of course a ton of missing bits still to >>>>>>>>>>>>>>>>>>>>>>>>> be implemented, like counters, metrics, windowing, >>>>>>>>>>>>>>>>>>>>>>>>> state, timers, etc. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> This should be fine and we can get the code >>>>>>>>>>>>>>>>>>>>>>>> documented without these features. I think support for >>>>>>>>>>>>>>>>>>>>>>>> composites and >>>>>>>>>>>>>>>>>>>>>>>> adding an external transform (see, Java >>>>>>>>>>>>>>>>>>>>>>>> <https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/External.java>, >>>>>>>>>>>>>>>>>>>>>>>> Python >>>>>>>>>>>>>>>>>>>>>>>> <https://github.com/apache/beam/blob/c7b7921185686da573f76ce7320817c32375c7d0/sdks/python/apache_beam/transforms/external.py#L556>, >>>>>>>>>>>>>>>>>>>>>>>> Go >>>>>>>>>>>>>>>>>>>>>>>> <https://github.com/apache/beam/blob/c7b7921185686da573f76ce7320817c32375c7d0/sdks/go/pkg/beam/xlang.go#L155>, >>>>>>>>>>>>>>>>>>>>>>>> TypeScript >>>>>>>>>>>>>>>>>>>>>>>> <https://github.com/apache/beam/blob/master/sdks/typescript/src/apache_beam/transforms/external.ts>) >>>>>>>>>>>>>>>>>>>>>>>> to add support for multi-lang will bring in a lot of >>>>>>>>>>>>>>>>>>>>>>>> features (for example, >>>>>>>>>>>>>>>>>>>>>>>> I/O connectors) for free. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Any and all feedback welcome and happy to submit a >>>>>>>>>>>>>>>>>>>>>>>>> PR if folks are interested, though the "Swift Way" >>>>>>>>>>>>>>>>>>>>>>>>> would be to have it in >>>>>>>>>>>>>>>>>>>>>>>>> its own repo so that it can easily be used from the >>>>>>>>>>>>>>>>>>>>>>>>> Swift Package Manager. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> +1 for creating a PR (may be as a draft initially). >>>>>>>>>>>>>>>>>>>>>>>> Also it'll be easier to comment on a PR :) >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> - Cham >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>>>>>> [3] >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>> B >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>