On Wed, Jul 13, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:

> First we'll want to choose whether we want to target Wasm, WASI or Wagi.
>

These terms are defined here
<https://www.fermyon.com/blog/wasm-wasi-wagi?gclid=CjwKCAjw2rmWBhB4EiwAiJ0mtVhiTuMZmy4bJSlk4nJj1deNX3KueomLgkG8JMyGeiHJ3FJRPpVn7BoCs58QAvD_BwE>
if anybody is as confused as I am :)


> WASI adds a lot of simple things like access to a clock, random number
> generator, ... that would expand the scope of what transpiled code can do.
> It is debatable whether we'll want the power to run the transpiled code as
> a microservice. Using UDFs for XLang and UDFs and UDAFs for SQL as our
> expected use cases seems to make WASI the best choice. The issue is in the
> details as there is a hodgepodge of what language runtimes support and what
> are the limits of transpiling from a language to WebAssembly.
>

Agree that WASI seems like a good target since it gives access to
additional system resources/tooling.


>
> Assuming WASI then it breaks down to these two aspects:
> 1) Does the host language have a runtime?
> Java: https://github.com/wasmerio/wasmer-java
> Python: https://github.com/wasmerio/wasmer-python
> Go: https://github.com/wasmerio/wasmer-go
>
> 2) How good is compilation from source language to WebAssembly
> <https://github.com/appcypher/awesome-wasm-langs>?
> Java (very limited):
> Issues with garbage collection and the need to transpile/replace much of
> the VM's capabilities plus the large standard library that everyone uses
> causes a lot of challenges.
> JWebAssembly can do simple things like basic classes, strings, method
> calls. Should be able to compile trivial lambdas to Wasm. There are other
> choices but to my knowledge all are very limited.
>

That's unfortunate. But hopefully Java support will be implemented soon ?


>
> Python <https://pythondev.readthedocs.io/wasm.html> (quite good):
> Features                 | CPython Emscripten browser | CPython Emscripten node | Pyodide
> subprocess (fork, exec)  | no                         | no                      | no
> threads                  | no                         | YES                     | WIP
> file system              | no (only MEMFS)            | YES (Node raw FS)       | YES (IDB, Node, …)
> shared extension modules | WIP                        | WIP                     | YES
> PyPI packages            | no                         | no                      | YES
> sockets                  | ?                          | ?                       | ?
> urllib, asyncio          | no                         | no                      | WebAPI fetch / WebSocket
> signals                  | no                         | WIP                     | YES
>
> Go (excellent): Native support in go compiler
>

Great. Could executing Go UDFs in Python x-lang transforms (for example,
Dataframe, RunInference, Python Map) be a good first target ?

Thanks,
Cham


>
> On Tue, Jul 12, 2022 at 5:51 PM Chamikara Jayalath via dev <
> dev@beam.apache.org> wrote:
>
>>
>>
>> On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:
>>
>>> I have had interest in integrating Wasm within Beam as well as I have
>>> had a lot of interest in improving language portability.
>>>
>>> Wasm has a lot of benefits over using docker containers to provide a
>>> place for code to execute. From experience working on Beam's
>>> portability layer and internal Flume knowledge:
>>> * encoding and decoding data is expensive; anything which ensures that
>>> in-memory representations of data can be transferred from the host to the
>>> guest and back without transcoding/re-interpreting will be a big win.
>>> * reducing the number of times we need to pass data between guest and
>>> host and back is important
>>>   * fusing transforms reduces the number of data passing points
>>>   * batching (row or columnar) data reduces the number of times we need
>>> to pass data at each data passing point
>>> * there are enough complicated use cases (state & timers, large
>>> iterables, side inputs) where handling only the trivial map/flatmap use
>>> case will provide little value since it will prevent fusion
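To make the fusion and batching points above concrete, here is a rough sketch of a cost model that only counts host/guest hand-offs. The function name `crossings` and the model itself are assumptions made for illustration; they are not taken from Beam or from any Wasm runtime.

```python
def crossings(num_elements, num_transforms, fused, batch_size):
    """Count host -> guest -> host hand-offs under a deliberately simple model.

    Assumptions (illustrative only): every unfused transform is its own
    data passing point, a fused graph has exactly one, and each batch is
    handed over once per data passing point.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    passing_points = 1 if fused else num_transforms
    # Ceiling division: a partially filled final batch still costs one hand-off.
    batches = -(-num_elements // batch_size)
    return passing_points * batches


if __name__ == "__main__":
    for fused, batch in [(False, 1), (True, 1), (True, 100)]:
        print(fused, batch, crossings(1000, 3, fused, batch))
```

Under this model, fusing removes the per-transform multiplier and batching divides what is left, which is why the two are listed together above.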
>>>
>>> I have been meaning to work on a prototype where we replace the current
>>> gRPC + docker path with one in which we use Wasm to execute a fused graph
>>> re-using large parts of the existing code base written to support
>>> portability.
>>>
>>
>> This sounds very interesting. Probably using Wasm to implement proper UDF
>> support for x-lang (for example, executing Python timestamp/watermark
>> functions provided through the Kafka Python x-lang wrapper on the Java
>> Kafka transform) will be a good first target ? My main question for this at
>> this point is whether Wasm has adequate support for existing SDKs that use
>> x-lang to implement this in a useful way.
>>
>> Thanks,
>> Cham
>>
>>
>>>
>>>
>>> On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bhule...@google.com>
>>> wrote:
>>>
>>>> Re: Arrow - it's long been my dream to use Arrow for interchange in
>>>> Beam [1]. I'm trying to move us in that direction with
>>>> https://s.apache.org/batched-dofns (arrow is discussed briefly in the
>>>> Future Work section). This gives the Python SDK a concept of batches of
>>>> logical elements. My goal is Beam schemas + batches of logical elements ->
>>>> Arrow RecordBatches.
>>>>
>>>> The Batched DoFn infrastructure is stable as of the 2.40.0 release cut
>>>> and I'm currently working on adding what I'm calling a "BatchConverter" [2]
>>>> for Beam Rows -> Arrow RecordBatch. Once that's done it could be
>>>> interesting to experiment with a "WasmDoFn" that uses Arrow for interchange.
>>>>
>>>> Brian
>>>>
>>>> [1]
>>>> https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
>>>> [2]
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
>>>>
>>>>
>>>> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <
>>>> jenseng...@google.com> wrote:
>>>>
>>>>> Interesting.
>>>>>
>>>>> Robert, I was just served an ad for Redpanda when I searched for
>>>>> "golang wasm" :)
>>>>>
>>>>> The storage and execution grid systems are all embracing wasm in some
>>>>> way.
>>>>>
>>>>> https://redpanda.com/
>>>>> https://www.fluvio.io/
>>>>> https://temporal.io/ (Cadence fork by the Cadence folks, I met Maxim
>>>>> the lead at Temporal at the 2020 Wasm Summit)
>>>>> https://github.com/pachyderm/pachyderm no mention of wasm, yet.
>>>>>
>>>>> Keep the Wasm+Beam demos coming.
>>>>>
>>>>> Sean
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <
>>>>> sjvanros...@google.com> wrote:
>>>>>
>>>>>> I caught up with all the replies through the web interface, but I
>>>>>> didn't have my list subscription set up correctly so my reply (TL;DR:
>>>>>> sample code available at https://github.com/sjvanrossum/beam-wasm)
>>>>>> didn't come through until a bit later yesterday I think.
>>>>>>
>>>>>> Sean, I agree with your suggestion of Arrow as the interchange format
>>>>>> for Wasm transforms and it's something I thought about exploring when I
>>>>>> was adding serialization/deserialization of complex (meaning anything
>>>>>> that's not an integer or float in the context of Wasm) data types in the
>>>>>> demo. It's an unfortunate bit of overhead which could very well be solved
>>>>>> with Arrow and shared memory between Wasm modules.
>>>>>> I've seen Wasm transforms pop up in a few other places, notably in
>>>>>> streaming data platforms like Fluvio and Redpanda and they seem to incur
>>>>>> the same overhead when moving data into and out of the guest context so
>>>>>> maybe it's negligible, but I haven't done any serious benchmarking yet to
>>>>>> validate that.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <rob...@frantil.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Obligatory mention that WASM is basically an architecture that any
>>>>>>> well-meaning compiler can target, e.g. the Go compiler
>>>>>>>
>>>>>>>
>>>>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>>>>
>>>>>>> (Among many articles for the last few years)
>>>>>>>
>>>>>>> Robert Burke
>>>>>>> Beam Go Busybody
>>>>>>>
>>>>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <
>>>>>>> jenseng...@google.com> wrote:
>>>>>>>
>>>>>>>> Heh, my stage fright was so strong, I didn't realize that the talk
>>>>>>>> was recorded. :)
>>>>>>>>
>>>>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit
>>>>>>>> rough.
>>>>>>>>
>>>>>>>> I haven't explored Wasm in Beam much since that talk. I think the
>>>>>>>> most compelling use is in the portability of logic between data
>>>>>>>> processing systems, especially in the use of probabilistic data
>>>>>>>> structures like Bloom Filters, Count-Min-Sketch, HyperLogLog, where it
>>>>>>>> is nice to persist the data structure and use it on a different
>>>>>>>> system. Like generating a bloom filter in Beam and using it inside of
>>>>>>>> a BQ query w/o having to reimplement and test across many platforms.
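As a rough illustration of the "persist the data structure and use it on a different system" idea, here is a minimal Bloom filter sketch with a byte-level serialization. The class, its byte layout (4-byte bit count, 1-byte hash count, then the bit array) and the SHA-256 based hashing are all assumptions made for this example and are not an existing Beam or BigQuery format. The portability argument is that this one implementation, compiled to Wasm, could then run unchanged on each system instead of being rewritten per platform.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter with a portable byte encoding (hypothetical layout)."""

    def __init__(self, num_bits=1024, num_hashes=3, bits=None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray(bits) if bits is not None else bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def to_bytes(self):
        # Header: bit count (4 bytes, little-endian) + hash count (1 byte).
        header = self.num_bits.to_bytes(4, "little") + self.num_hashes.to_bytes(1, "little")
        return header + bytes(self.bits)

    @classmethod
    def from_bytes(cls, data):
        num_bits = int.from_bytes(data[:4], "little")
        num_hashes = data[4]
        return cls(num_bits, num_hashes, data[5:])


if __name__ == "__main__":
    built = BloomFilter()
    built.add(b"user-123")
    restored = BloomFilter.from_bytes(built.to_bytes())  # e.g. on another system
    print(b"user-123" in restored)
```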
>>>>>>>>
>>>>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>>>>>> exists, Wasm support exists for free unless the embedder goes out of
>>>>>>>> their way to disable it. So it is supported in Deno/Node as well. In
>>>>>>>> Python, Wasm support via Wasmtime
>>>>>>>> <https://github.com/bytecodealliance/wasmtime> is really good. There
>>>>>>>> are *many* options for execution environments. One of the downsides of
>>>>>>>> passing through a JS one is issues with string and number
>>>>>>>> (float/int64) support, afaik. I could be wrong, maybe JS has fixed all
>>>>>>>> this by now.
>>>>>>>>
>>>>>>>> The qualities in order of importance (for me) are
>>>>>>>>
>>>>>>>>    1. Portability, run the same code everywhere
>>>>>>>>    2. Security, memory safety for the caller. Running Wasm inside
>>>>>>>>    of Python should never crash your Python interpreter. The
>>>>>>>>    capability model ensures that the Wasm module can only do what
>>>>>>>>    you allow it to
>>>>>>>>    3. Performance (portable), compile once and run everywhere
>>>>>>>>    within some margin of native.  Python makes this look good :)
>>>>>>>>
>>>>>>>> I think something worth exploring is moving opaque-ish Arrow
>>>>>>>> objects around via Beam, so that Beam is now mostly in the control
>>>>>>>> plane and computation happens in Wasm. This should reduce the
>>>>>>>> serialization overhead and also get Python out of the datapath.
>>>>>>>>
>>>>>>>> I see someone exploring Wasm+Arrow here,
>>>>>>>> https://github.com/domoritz/arrow-wasm
>>>>>>>>
>>>>>>>> Another possibly interesting avenue to explore is compiling command
>>>>>>>> line programs to WASI (WebAssembly System Interface), the POSIX-like
>>>>>>>> shim, so that they can be run in-process without the fork/exec/pipe
>>>>>>>> overhead of running a subprocess. A neat demo might be running
>>>>>>>> something like Jq <https://stedolan.github.io/jq/> inside of a Beam
>>>>>>>> job.
>>>>>>>>
>>>>>>>> Not to make Wasm sound like a Python only technology, it can be
>>>>>>>> used via Java/JVM via
>>>>>>>>
>>>>>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>>>>>
>>>>>>>> Sean
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pabl...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>>>>
>>>>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <
>>>>>>>>> dpcoll...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> If we ever do anything with the JS runtime, this would seem to be
>>>>>>>>>> the best place to run WASM.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <
>>>>>>>>>> bhule...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> FYI: @Sean Jensen-Grey <jenseng...@google.com> gave a talk back
>>>>>>>>>>> in 2020 where he had integrated Rust with the Python SDK. I thought
>>>>>>>>>>> he used WebAssembly for that, but it looks like he used some other
>>>>>>>>>>> approaches, and his talk mentioned WebAssembly as future work. Not
>>>>>>>>>>> sure if that was ever explored.
>>>>>>>>>>>
>>>>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>>>>
>>>>>>>>>>> Brian
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in
>>>>>>>>>>>> the WebAssembly topic.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <
>>>>>>>>>>>> pabl...@google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Would you open a pull request for it? Or at least share a
>>>>>>>>>>>>> branch? : )
>>>>>>>>>>>>> Even if we don't want to merge it, it would be great to have a
>>>>>>>>>>>>> PR as a way to showcase the work, its usefulness, and receive
>>>>>>>>>>>>> comments on this thread once we can see something more specific.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>>>>>>>>>> sjvanros...@google.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I had some spare time yesterday and thought it'd be fun to
>>>>>>>>>>>>>> implement a transform which runs WebAssembly modules as a
>>>>>>>>>>>>>> lightweight way to implement cross language transforms for
>>>>>>>>>>>>>> languages which don't (yet) have an SDK implementation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've got a small proof of concept running in the Python SDK
>>>>>>>>>>>>>> as a DoFn with Wasmer as the WebAssembly runtime and simple
>>>>>>>>>>>>>> support for marshalling between the host and guest environment
>>>>>>>>>>>>>> with the RowCoder. The module I've constructed is mostly
>>>>>>>>>>>>>> useless, but demonstrates the host copying the encoded element
>>>>>>>>>>>>>> into the guest's memory, the guest copying those bytes
>>>>>>>>>>>>>> elsewhere in its linear memory buffer, the guest calling back
>>>>>>>>>>>>>> to the host with the offset and size and the host copying and
>>>>>>>>>>>>>> decoding from the guest's memory.
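The round trip described above can be sketched without any Wasm runtime by letting a bytearray stand in for the guest's linear memory. Everything here is a simulation under stated assumptions: `GuestSim`, `run_element` and the length-prefixed encoding are hypothetical names and formats made up for illustration, and they are neither the Wasmer API nor Beam's RowCoder wire format.

```python
import struct


class GuestSim:
    """Stand-in for a Wasm guest: a linear memory plus a bump allocator."""

    def __init__(self, size=1024):
        self.memory = bytearray(size)  # plays the role of Wasm linear memory
        self._next = 0                 # next free offset

    def alloc(self, n):
        offset = self._next
        if offset + n > len(self.memory):
            raise MemoryError("guest linear memory exhausted")
        self._next += n
        return offset

    def process(self, offset, size, host_callback):
        # The guest copies the input bytes elsewhere in its own memory ...
        out = self.alloc(size)
        self.memory[out:out + size] = self.memory[offset:offset + size]
        # ... and calls back to the host with only (offset, size).
        host_callback(out, size)


def encode(name, count):
    # Hypothetical encoding: uint32 length + UTF-8 bytes + int64, little-endian.
    raw = name.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw + struct.pack("<q", count)


def decode(buf):
    (n,) = struct.unpack_from("<I", buf, 0)
    name = buf[4:4 + n].decode("utf-8")
    (count,) = struct.unpack_from("<q", buf, 4 + n)
    return name, count


def run_element(guest, element):
    encoded = encode(*element)
    offset = guest.alloc(len(encoded))
    guest.memory[offset:offset + len(encoded)] = encoded  # host -> guest copy
    results = []

    def on_result(out_offset, out_size):
        # Host copies out of guest memory, then decodes.
        copied = bytes(guest.memory[out_offset:out_offset + out_size])
        results.append(decode(copied))

    guest.process(offset, len(encoded), on_result)
    return results[0]


if __name__ == "__main__":
    print(run_element(GuestSim(), ("beam", 42)))
```

Each element is copied at least twice and encoded and decoded once, which is the per-element overhead that the Arrow and shared-memory ideas elsewhere in this thread aim to reduce.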
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any thoughts/interest? I'm not sure where I was going with
>>>>>>>>>>>>>> this, since it was mostly just a "wouldn't it be cool if..."
>>>>>>>>>>>>>> on a Monday afternoon, but I can see a few use cases for this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Steve
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
