On Wed, Jul 13, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:

> First we'll want to choose whether we want to target Wasm, WASI or Wagi.
>
These terms are defined here
<https://www.fermyon.com/blog/wasm-wasi-wagi?gclid=CjwKCAjw2rmWBhB4EiwAiJ0mtVhiTuMZmy4bJSlk4nJj1deNX3KueomLgkG8JMyGeiHJ3FJRPpVn7BoCs58QAvD_BwE>
if anybody is as confused as I am :)

> WASI adds a lot of simple things like access to a clock, random number
> generator, ... that would expand the scope of what transpiled code can do.
> It is debatable whether we'll want the power to run the transpiled code as
> a microservice. Using UDFs for XLang and UDFs and UDAFs for SQL as our
> expected use cases seems to make WASI the best choice. The issue is in the
> details as there is a hodgepodge of what language runtimes support and what
> are the limits of transpiling from a language to WebAssembly.
>

Agree that WASI seems like a good target since it gives access to
additional system resources/tooling.

> Assuming WASI then it breaks down to these two aspects:
> 1) Does the host language have a runtime?
> Java: https://github.com/wasmerio/wasmer-java
> Python: https://github.com/wasmerio/wasmer-python
> Go: https://github.com/wasmerio/wasmer-go
>
> 2) How good is compilation from source language to WebAssembly
> <https://github.com/appcypher/awesome-wasm-langs>?
> Java (very limited):
> Issues with garbage collection and the need to transpile/replace much of
> the VM's capabilities plus the large standard library that everyone uses
> cause a lot of challenges.
> JWebAssembly can do simple things like basic classes, strings, method
> calls. Should be able to compile trivial lambdas to Wasm. There are other
> choices but to my knowledge all are very limited.
>

That's unfortunate. But hopefully Java support will be implemented soon?

> Python <https://pythondev.readthedocs.io/wasm.html> (quite good):
>
> Features                  CPython Emscripten browser  CPython Emscripten node  Pyodide
> subprocess (fork, exec)   no                          no                       no
> threads                   no                          YES                      WIP
> file system               no (only MEMFS)             YES (Node raw FS)        YES (IDB, Node, …)
> shared extension modules  WIP                         WIP                      YES
> PyPI packages             no                          no                       YES
> sockets                   ?                           ?                        ?
> urllib, asyncio           no                          no                       WebAPI fetch / WebSocket
> signals                   no                          WIP                      YES
>
> Go (excellent): Native support in go compiler
>

Great. Could executing Go UDFs in Python x-lang transforms (for example,
Dataframe, RunInference, Python Map) be a good first target?

Thanks,
Cham

> On Tue, Jul 12, 2022 at 5:51 PM Chamikara Jayalath via dev <dev@beam.apache.org> wrote:
>
>> On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> I have had interest in integrating Wasm within Beam as well as I have
>>> had a lot of interest in improving language portability.
>>>
>>> Wasm has a lot of benefits over using docker containers to provide a
>>> place for code to execute. From experience working on Beam's
>>> portability layer and internal Flume knowledge:
>>> * encoding and decoding data is expensive, anything which ensures that
>>> in-memory representations for data are transferred from the host to the
>>> guest and back without transcoding/re-interpreting will be a big win.
>>> * reducing the number of times we need to pass data between guest and
>>> host and back is important
>>> * fusing transforms reduces the number of data passing points
>>> * batching (row or columnar) data reduces the number of times we need
>>> to pass data at each data passing point
>>> * there are enough complicated use cases (state & timers, large
>>> iterables, side inputs) where handling the trivial map/flatmap use case will
>>> provide little value since it will prevent fusion
>>>
>>> I have been meaning to work on a prototype where we replace the current
>>> gRPC + docker path with one in which we use Wasm to execute a fused graph
>>> re-using large parts of the existing code base written to support
>>> portability.
>>>
>>
>> This sounds very interesting. Probably using Wasm to implement proper UDF
>> support for x-lang (for example, executing Python timestamp/watermark
>> functions provided through the Kafka Python x-lang wrapper on the Java
>> Kafka transform) will be a good first target? My main question for this at
>> this point is whether Wasm has adequate support for existing SDKs that use
>> x-lang to implement this in a useful way.
>>
>> Thanks,
>> Cham
>>
>>> On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bhule...@google.com> wrote:
>>>
>>>> Re: Arrow - it's long been my dream to use Arrow for interchange in
>>>> Beam [1]. I'm trying to move us in that direction with
>>>> https://s.apache.org/batched-dofns (arrow is discussed briefly in the
>>>> Future Work section). This gives the Python SDK a concept of batches of
>>>> logical elements. My goal is Beam schemas + batches of logical elements ->
>>>> Arrow RecordBatches.
>>>>
>>>> The Batched DoFn infrastructure is stable as of the 2.40.0 release cut
>>>> and I'm currently working on adding what I'm calling a "BatchConverter" [2]
>>>> for Beam Rows -> Arrow RecordBatch.
>>>> Once that's done it could be interesting to experiment with a
>>>> "WasmDoFn" that uses Arrow for interchange.
>>>>
>>>> Brian
>>>>
>>>> [1] https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
>>>> [2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
>>>>
>>>> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <jenseng...@google.com> wrote:
>>>>
>>>>> Interesting.
>>>>>
>>>>> Robert, I was just served an ad for Redpanda when I searched for
>>>>> "golang wasm" :)
>>>>>
>>>>> The storage and execution grid systems are all embracing wasm in some
>>>>> way.
>>>>>
>>>>> https://redpanda.com/
>>>>> https://www.fluvio.io/
>>>>> https://temporal.io/ (Cadence fork by the Cadence folks, I met Maxim
>>>>> the lead at Temporal at the 2020 Wasm Summit)
>>>>> https://github.com/pachyderm/pachyderm no mention of wasm, yet.
>>>>>
>>>>> Keep the Wasm+Beam demos coming.
>>>>>
>>>>> Sean
>>>>>
>>>>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <sjvanros...@google.com> wrote:
>>>>>
>>>>>> I caught up with all the replies through the web interface, but I
>>>>>> didn't have my list subscription set up correctly so my reply (TL;DR
>>>>>> sample code available at https://github.com/sjvanrossum/beam-wasm) didn't
>>>>>> come through until a bit later yesterday I think.
>>>>>>
>>>>>> Sean, I agree with your suggestion of Arrow as the interchange format
>>>>>> for Wasm transforms and it's something I thought about exploring when I
>>>>>> was adding serialization/deserialization of complex (meaning anything that's
>>>>>> not an integer or float in the context of Wasm) data types in the demo.
>>>>>> It's an unfortunate bit of overhead which could very well be solved with
>>>>>> Arrow and shared memory between Wasm modules.
>>>>>> I've seen Wasm transforms pop up in a few other places, notably in
>>>>>> streaming data platforms like Fluvio and Redpanda and they seem to incur
>>>>>> the same overhead when moving data into and out of the guest context so
>>>>>> maybe it's negligible, but I haven't done any serious benchmark yet to
>>>>>> validate that.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <rob...@frantil.com> wrote:
>>>>>>
>>>>>>> Obligatory mention that WASM is basically an architecture that any
>>>>>>> well meaning compiler can target, eg the Go compiler
>>>>>>>
>>>>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>>>>
>>>>>>> (Among many articles for the last few years)
>>>>>>>
>>>>>>> Robert Burke
>>>>>>> Beam Go Busybody
>>>>>>>
>>>>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <jenseng...@google.com> wrote:
>>>>>>>
>>>>>>>> Heh, my stage fright was so strong, I didn't realize that the talk
>>>>>>>> was recorded. :)
>>>>>>>>
>>>>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit
>>>>>>>> rough.
>>>>>>>>
>>>>>>>> I haven't explored Wasm in Beam much since that talk. I think the
>>>>>>>> most compelling use is in the portability of logic between data processing
>>>>>>>> systems. Esp in the use of probabilistic data structures like Bloom
>>>>>>>> Filters, Count-Min-Sketch, HyperLogLog, where it is nice to persist the
>>>>>>>> data structure and use it on a different system. Like generating a bloom
>>>>>>>> filter in Beam and using it inside of a BQ query w/o having to reimplement
>>>>>>>> and test across many platforms.
>>>>>>>>
>>>>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>>>>>> exists, Wasm support exists for free unless the embedder goes out of their
>>>>>>>> way to disable it. So it is supported in Deno/Node as well.
>>>>>>>> In Python, Wasm support via Wasmtime <https://github.com/bytecodealliance/wasmtime>
>>>>>>>> is really good. There are *many* options for execution environments.
>>>>>>>> One of the downsides of passing through JS is string and number
>>>>>>>> support (float/int64) issues, afaik. I could be wrong, maybe JS has fixed
>>>>>>>> all this by now.
>>>>>>>>
>>>>>>>> The qualities in order of importance (for me) are
>>>>>>>>
>>>>>>>> 1. Portability, run the same code everywhere
>>>>>>>> 2. Security, memory safety for the caller. Running Wasm inside
>>>>>>>> of Python should never crash your Python interpreter. The capability model
>>>>>>>> ensures that the Wasm module can only do what you allow it to
>>>>>>>> 3. Performance (portable), compile once and run everywhere
>>>>>>>> within some margin of native. Python makes this look good :)
>>>>>>>>
>>>>>>>> I think something worth exploring is moving opaque-ish Arrow
>>>>>>>> objects around via Beam, so that Beam is now mostly in the control plane
>>>>>>>> and computation happens in Wasm. This should reduce the serialization
>>>>>>>> overhead and also get Python out of the datapath.
>>>>>>>>
>>>>>>>> I see someone exploring Wasm+Arrow here,
>>>>>>>> https://github.com/domoritz/arrow-wasm
>>>>>>>>
>>>>>>>> Another possibly interesting avenue to explore is compiling command
>>>>>>>> line programs to WASI (WebAssembly System Interface), the POSIX-like shim,
>>>>>>>> so that they can be run in-process without the fork/exec/pipe overhead of
>>>>>>>> running a subprocess. A neat demo might be running something like
>>>>>>>> Jq <https://stedolan.github.io/jq/> inside of a Beam job.
>>>>>>>>
>>>>>>>> Not to make Wasm sound like a Python-only technology, it can be
>>>>>>>> used from Java/JVM via
>>>>>>>>
>>>>>>>> - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>>>> - https://github.com/kawamuray/wasmtime-java
>>>>>>>>
>>>>>>>> Sean
>>>>>>>>
>>>>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pabl...@google.com> wrote:
>>>>>>>>
>>>>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>>>>
>>>>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <dpcoll...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> If we ever do anything with the JS runtime, this would seem to be
>>>>>>>>>> the best place to run WASM.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bhule...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> FYI: @Sean Jensen-Grey <jenseng...@google.com> gave a talk back
>>>>>>>>>>> in 2020 where he had integrated Rust with the Python SDK. I thought he used
>>>>>>>>>>> WebAssembly for that, but it looks like he used some other approaches, and
>>>>>>>>>>> his talk mentioned WebAssembly as future work. Not sure if that was ever
>>>>>>>>>>> explored.
>>>>>>>>>>>
>>>>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>>>>
>>>>>>>>>>> Brian
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in
>>>>>>>>>>>> the WebAssembly topic.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pabl...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Would you open a pull request for it? Or at least share a branch?
>>>>>>>>>>>>> : ) Even if we don't want to merge it, it would be great to have a
>>>>>>>>>>>>> PR as a way to showcase the work, its usefulness, and receive comments on
>>>>>>>>>>>>> this thread once we can see something more specific.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <sjvanros...@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I had some spare time yesterday and thought it'd be fun to
>>>>>>>>>>>>>> implement a transform which runs WebAssembly modules as a lightweight way
>>>>>>>>>>>>>> to implement cross language transforms for languages which don't (yet) have
>>>>>>>>>>>>>> an SDK implementation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've got a small proof of concept running in the Python SDK
>>>>>>>>>>>>>> as a DoFn with Wasmer as the WebAssembly runtime and simple support for
>>>>>>>>>>>>>> marshalling between the host and guest environment with the RowCoder. The
>>>>>>>>>>>>>> module I've constructed is mostly useless, but demonstrates the host
>>>>>>>>>>>>>> copying the encoded element into the guest's memory, the guest copying
>>>>>>>>>>>>>> those bytes elsewhere in its linear memory buffer, the guest calling back
>>>>>>>>>>>>>> to the host with the offset and size and the host copying and decoding from
>>>>>>>>>>>>>> the guest's memory.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any thoughts/interest? I'm not sure where I was going with
>>>>>>>>>>>>>> this, since it was mostly just a "wouldn't it be cool if..." on a Monday
>>>>>>>>>>>>>> afternoon, but I can see a few use cases for this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Steven van Rossum | Strategic Cloud Engineer | sjvanros...@google.com
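
For readers who want to picture the host/guest exchange described in the first message of this thread (host copies the encoded element into guest memory, guest copies the bytes elsewhere in its linear memory, guest calls back with offset and size, host copies out and decodes), here is a minimal sketch. It is not the code from the proof of concept. It assumes no Wasm runtime at all: a bytearray stands in for linear memory, JSON stands in for the RowCoder, and the names FakeGuest, process, on_result and host_roundtrip are all hypothetical.

```python
import json

PAGE_SIZE = 64 * 1024  # linear memory is sized in 64 KiB pages; one page here


class FakeGuest:
    """Hypothetical stand-in for a Wasm module instance (no real runtime)."""

    def __init__(self):
        self.memory = bytearray(PAGE_SIZE)  # plays the role of linear memory
        self.on_result = None  # host-provided callback: (offset, size) -> None

    def process(self, offset, size):
        # Guest step: copy the input bytes elsewhere in its own memory,
        # then call back to the host with the location of the result.
        out_offset = offset + size
        self.memory[out_offset:out_offset + size] = self.memory[offset:offset + size]
        self.on_result(out_offset, size)


def host_roundtrip(element):
    # Assumption: JSON is only a placeholder for the RowCoder from the thread.
    encoded = json.dumps(element).encode("utf-8")
    if 2 * len(encoded) > PAGE_SIZE:
        raise ValueError("element too large for this single-page sketch")

    guest = FakeGuest()
    decoded = []

    def on_result(offset, size):
        # Host step: copy the result out of guest memory and decode it.
        raw = bytes(guest.memory[offset:offset + size])
        decoded.append(json.loads(raw.decode("utf-8")))

    guest.on_result = on_result
    guest.memory[0:len(encoded)] = encoded  # host copies input into guest memory
    guest.process(0, len(encoded))
    return decoded[0]


if __name__ == "__main__":
    print(host_roundtrip({"id": 1, "name": "beam"}))
```

Even in this toy form the element is encoded once, copied three times (into the guest, inside the guest, and back out), and decoded once, which is the per-element overhead that the Arrow and batching suggestions elsewhere in the thread aim to reduce.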