On Wed, Jun 29, 2022 at 9:31 AM Luke Cwik <lc...@google.com> wrote:

> I have been interested in integrating Wasm within Beam, and I have also had
> a lot of interest in improving language portability.
>
> Wasm has a lot of benefits over using docker containers to provide a place
> for code to execute. From experience working on Beam's portability layer
> and from internal Flume knowledge:
> * encoding and decoding data is expensive, so anything that lets in-memory
> representations of data be transferred from the host to the guest and back
> without transcoding/re-interpreting will be a big win.
> * reducing the number of times we need to pass data between guest and host
> and back is important
>   * fusing transforms reduces the number of data passing points
>   * batching (row or columnar) data reduces the number of times we need to
> pass data at each data passing point
> * there are enough complicated use cases (state & timers, large iterables,
> side inputs) that handling only the trivial map/flatmap use case will
> provide little value, since it will prevent fusion
>
> I have been meaning to work on a prototype where we replace the current
> gRPC + docker path with one in which we use Wasm to execute a fused graph
> re-using large parts of the existing code base written to support
> portability.
>

This sounds very interesting. Using Wasm to implement proper UDF support for
x-lang (for example, executing Python timestamp/watermark functions provided
through the Kafka Python x-lang wrapper on the Java Kafka transform) would
probably be a good first target? My main question at this point is whether
Wasm has adequate support for the existing SDKs that use x-lang to implement
this in a useful way.

Thanks,
Cham


>
>
> On Fri, Jun 17, 2022 at 2:19 PM Brian Hulette <bhule...@google.com> wrote:
>
>> Re: Arrow - it's long been my dream to use Arrow for interchange in Beam
>> [1]. I'm trying to move us in that direction with
>> https://s.apache.org/batched-dofns (Arrow is discussed briefly in the
>> Future Work section). This gives the Python SDK a concept of batches of
>> logical elements. My goal is Beam schemas + batches of logical elements ->
>> Arrow RecordBatches.
>>
>> The Batched DoFn infrastructure is stable as of the 2.40.0 release cut,
>> and I'm currently working on adding what I'm calling a "BatchConverter" [2]
>> for Beam Rows -> Arrow RecordBatch. Once that's done, it could be
>> interesting to experiment with a "WasmDoFn" that uses Arrow for interchange.
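For readers unfamiliar with the row-vs-columnar distinction, the sketch below transposes a batch of schema'd rows into one list per field, which is the layout an Arrow RecordBatch uses. It is a stdlib-only illustration that assumes elements are `typing.NamedTuple`s; `rows_to_columns` and `columns_to_rows` are hypothetical helpers, not the BatchConverter API in `apache_beam/typehints/batch.py`, and no Arrow library is involved.

```python
from typing import Any, Dict, List, NamedTuple


class Purchase(NamedTuple):
    # Hypothetical schema'd element type; Beam's Python SDK can infer a schema
    # from a typing.NamedTuple.
    user: str
    amount: float


def rows_to_columns(rows: List[Any]) -> Dict[str, List[Any]]:
    """Transpose a batch of NamedTuple rows into one list per field (columnar)."""
    if not rows:
        return {}
    return {name: [getattr(row, name) for row in rows] for name in rows[0]._fields}


def columns_to_rows(row_type: Any, columns: Dict[str, List[Any]]) -> List[Any]:
    """Inverse of rows_to_columns for a given NamedTuple type."""
    if not columns:
        return []
    field_lists = (columns[name] for name in row_type._fields)
    return [row_type(*values) for values in zip(*field_lists)]


batch = [Purchase("alice", 3.5), Purchase("bob", 1.25)]
columns = rows_to_columns(batch)
print(columns)  # {'user': ['alice', 'bob'], 'amount': [3.5, 1.25]}
assert columns_to_rows(Purchase, columns) == batch
```

A columnar batch like this lets a guest operate on a whole field at once, which is what makes one host/guest crossing per batch worthwhile.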
>>
>> Brian
>>
>> [1]
>> https://docs.google.com/presentation/d/1D9vigwYTCuAuz_CO8nex3GK3h873acmQJE5Ui8TFsDY/edit#slide=id.g608e662464_0_160
>> [2]
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
>>
>>
>> On Thu, Jun 16, 2022 at 10:55 AM Sean Jensen-Grey <jenseng...@google.com>
>> wrote:
>>
>>> Interesting.
>>>
>>> Robert, I was just served an ad for Redpanda when I searched for "golang
>>> wasm" :)
>>>
>>> The storage and execution grid systems are all embracing wasm in some
>>> way.
>>>
>>> https://redpanda.com/
>>> https://www.fluvio.io/
>>> https://temporal.io/ (a Cadence fork by the Cadence folks; I met Maxim,
>>> the lead at Temporal, at the 2020 Wasm Summit)
>>> https://github.com/pachyderm/pachyderm no mention of wasm, yet.
>>>
>>> Keep the Wasm+Beam demos coming.
>>>
>>> Sean
>>>
>>>
>>>
>>> On Thu, Jun 16, 2022 at 4:23 AM Steven van Rossum <
>>> sjvanros...@google.com> wrote:
>>>
>>>> I caught up with all the replies through the web interface, but I
>>>> didn't have my list subscription set up correctly, so my reply (TL;DR:
>>>> sample code available at https://github.com/sjvanrossum/beam-wasm) didn't
>>>> come through until a bit later yesterday, I think.
>>>>
>>>> Sean, I agree with your suggestion of Arrow as the interchange format
>>>> for Wasm transforms, and it's something I thought about exploring when I
>>>> was adding serialization/deserialization of complex (meaning anything
>>>> that's not an integer or float in the context of Wasm) data types in the
>>>> demo. It's an unfortunate bit of overhead which could very well be solved
>>>> with Arrow and shared memory between Wasm modules.
>>>> I've seen Wasm transforms pop up in a few other places, notably in
>>>> streaming data platforms like Fluvio and Redpanda, and they seem to incur
>>>> the same overhead when moving data into and out of the guest context, so
>>>> maybe it's negligible, but I haven't done any serious benchmarks yet to
>>>> validate that.
>>>>
>>>> Regards,
>>>>
>>>> Steve
>>>>
>>>> On Thu, Jun 16, 2022 at 3:04 AM Robert Burke <rob...@frantil.com>
>>>> wrote:
>>>>
>>>>> Obligatory mention that WASM is basically an architecture that any
>>>>> well-meaning compiler can target, e.g. the Go compiler
>>>>>
>>>>>
>>>>> https://www.bradcypert.com/an-introduction-to-targeting-web-assembly-with-golang/
>>>>>
>>>>> (One among many articles from the last few years)
>>>>>
>>>>> Robert Burke
>>>>> Beam Go Busybody
>>>>>
>>>>> On Wed, Jun 15, 2022, 2:04 PM Sean Jensen-Grey <jenseng...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Heh, my stage fright was so strong, I didn't realize that the talk
>>>>>> was recorded. :)
>>>>>>
>>>>>> Steven, I'd love to chat about Wasm in Beam. This email is a bit
>>>>>> rough.
>>>>>>
>>>>>> I haven't explored Wasm in Beam much since that talk. I think the
>>>>>> most compelling use is the portability of logic between data processing
>>>>>> systems, especially for probabilistic data structures like Bloom
>>>>>> filters, Count-Min Sketch, and HyperLogLog, where it is nice to persist
>>>>>> the data structure and use it on a different system. For example, you
>>>>>> could generate a Bloom filter in Beam and use it inside of a BQ query
>>>>>> without having to reimplement and test it across many platforms.
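A persisted probabilistic structure is only portable if every system hashes items the same way and agrees on the byte layout. The following is a minimal Python sketch of that idea, assuming SHA-256-based double hashing and a made-up layout (big-endian `num_bits` and `num_hashes` as uint32, then the packed bit array); it is not the format of BigQuery or of any existing library. The thread's idea is that compiling one such implementation to Wasm would let a Beam job and a BQ UDF share it.

```python
import hashlib
import struct
from typing import Iterator, Optional


class BloomFilter:
    """Minimal Bloom filter with a fixed, language-neutral byte format.

    Assumed (hypothetical) layout: big-endian uint32 num_bits, uint32
    num_hashes, then the bit array packed least-significant-bit first.
    """

    def __init__(self, num_bits: int, num_hashes: int,
                 bits: Optional[bytearray] = None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits if bits is not None else bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Deterministic hashing (unlike Python's salted hash()) so every system
        # computes the same bit positions. Double hashing: h1 + i * h2.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

    def to_bytes(self) -> bytes:
        return struct.pack(">II", self.num_bits, self.num_hashes) + bytes(self.bits)

    @classmethod
    def from_bytes(cls, data: bytes) -> "BloomFilter":
        num_bits, num_hashes = struct.unpack(">II", data[:8])
        return cls(num_bits, num_hashes, bytearray(data[8:]))


bf = BloomFilter(num_bits=1024, num_hashes=5)
bf.add(b"alice")
restored = BloomFilter.from_bytes(bf.to_bytes())  # e.g. after writing to storage
print(restored.might_contain(b"alice"))  # True
```

Fixing the hash function and serialized layout up front is the design choice that matters here; any reimplementation (or Wasm build) that honours both can read filters written by another.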
>>>>>>
>>>>>> I have used Wasm in BQ, as BQ UDFs are driven by V8. Anywhere V8
>>>>>> exists, Wasm support exists for free unless the embedder goes out of
>>>>>> their way to disable it, so it is supported in Deno/Node as well. In
>>>>>> Python, Wasm support via Wasmtime <https://github.com/bytecodealliance/wasmtime>
>>>>>> is really good. There are *many* options for execution environments;
>>>>>> one of the downsides of going through JS is string and number
>>>>>> (float/int64) support issues, afaik. I could be wrong; maybe JS has
>>>>>> fixed all this by now.
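The float/int64 concern stems from JavaScript's Number type being an IEEE-754 double, which Python's `float` also is. The sketch below shows the precision loss above 2**53 and, as one possible workaround, passes the value as 8 raw bytes in the little-endian order that Wasm linear memory uses. The helper names are hypothetical, and this does not describe how any particular JS embedder maps Wasm `i64` values.

```python
import struct

# Python floats, like JavaScript Numbers, are IEEE-754 doubles with a 53-bit
# significand, so not every int64 value survives a round trip through one.
big = 2**53 + 1               # well within the int64 range
print(big, int(float(big)))   # 9007199254740993 9007199254740992


def int64_to_wasm_bytes(value: int) -> bytes:
    """Pack a signed 64-bit integer little-endian, as Wasm linear memory stores it."""
    return struct.pack("<q", value)


def int64_from_wasm_bytes(data: bytes) -> int:
    """Unpack 8 little-endian bytes back into a signed 64-bit integer."""
    return struct.unpack("<q", data)[0]


# Passing the raw bytes instead of a double preserves the exact value.
assert int64_from_wasm_bytes(int64_to_wasm_bytes(big)) == big
```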
>>>>>>
>>>>>> The qualities in order of importance (for me) are
>>>>>>
>>>>>>    1. Portability: run the same code everywhere
>>>>>>    2. Security: memory safety for the caller. Running Wasm inside of
>>>>>>    Python should never crash your Python interpreter. The capability
>>>>>>    model ensures that the Wasm module can only do what you allow it to.
>>>>>>    3. Performance (portable): compile once and run everywhere within
>>>>>>    some margin of native. Python makes this look good :)
>>>>>>
>>>>>> I think something worth exploring is moving opaque-ish Arrow objects
>>>>>> around via Beam, so that Beam is mostly in the control plane and
>>>>>> computation happens in Wasm; this should reduce the serialization
>>>>>> overhead and also get Python out of the datapath.
>>>>>>
>>>>>> I see someone exploring Wasm+Arrow here,
>>>>>> https://github.com/domoritz/arrow-wasm
>>>>>>
>>>>>> Another possibly interesting avenue to explore is compiling command
>>>>>> line programs to WASI (WebAssembly System Interface), the POSIX-like
>>>>>> shim, so that they can be run in-process without the fork/exec/pipe
>>>>>> overhead of running a subprocess. A neat demo might be running
>>>>>> something like jq <https://stedolan.github.io/jq/> inside of a Beam job.
>>>>>>
>>>>>> Wasm isn't a Python-only technology, either; it can be used from
>>>>>> Java/the JVM via
>>>>>>
>>>>>>    - https://www.graalvm.org/22.1/reference-manual/wasm/
>>>>>>    - https://github.com/kawamuray/wasmtime-java
>>>>>>
>>>>>> Sean
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 15, 2022 at 9:35 AM Pablo Estrada <pabl...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> adding Steven in case he didn't get the replies : )
>>>>>>>
>>>>>>> On Wed, Jun 15, 2022 at 9:29 AM Daniel Collins <dpcoll...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If we ever do anything with the JS runtime, this would seem to be
>>>>>>>> the best place to run WASM.
>>>>>>>>
>>>>>>>> On Tue, Jun 14, 2022 at 8:13 PM Brian Hulette <bhule...@google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> FYI: @Sean Jensen-Grey <jenseng...@google.com> gave a talk back
>>>>>>>>> in 2020 where he had integrated Rust with the Python SDK. I thought
>>>>>>>>> he used WebAssembly for that, but it looks like he used some other
>>>>>>>>> approaches, and his talk mentioned WebAssembly as future work. Not
>>>>>>>>> sure if that was ever explored.
>>>>>>>>>
>>>>>>>>> https://www.youtube.com/watch?v=fZK_Tiu7q1o
>>>>>>>>> https://github.com/seanjensengrey/beam-rust-python-java
>>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 14, 2022 at 5:05 PM Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Adding @Lukasz Cwik <lc...@google.com> - he was interested in
>>>>>>>>>> the WebAssembly topic.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 14, 2022 at 3:09 PM Pablo Estrada <pabl...@google.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Would you open a pull request for it? Or at least share a
>>>>>>>>>>> branch? : )
>>>>>>>>>>> Even if we don't want to merge it, it would be great to have a
>>>>>>>>>>> PR as a way to showcase the work and its usefulness, and to
>>>>>>>>>>> receive comments on this thread once we can see something more
>>>>>>>>>>> specific.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 14, 2022 at 3:05 PM Steven van Rossum <
>>>>>>>>>>> sjvanros...@google.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi folks,
>>>>>>>>>>>>
>>>>>>>>>>>> I had some spare time yesterday and thought it'd be fun to
>>>>>>>>>>>> implement a transform which runs WebAssembly modules as a
>>>>>>>>>>>> lightweight way to implement cross-language transforms for
>>>>>>>>>>>> languages which don't (yet) have an SDK implementation.
>>>>>>>>>>>>
>>>>>>>>>>>> I've got a small proof of concept running in the Python SDK as
>>>>>>>>>>>> a DoFn with Wasmer as the WebAssembly runtime and simple support
>>>>>>>>>>>> for marshalling between the host and guest environment with the
>>>>>>>>>>>> RowCoder. The module I've constructed is mostly useless, but it
>>>>>>>>>>>> demonstrates the host copying the encoded element into the guest's
>>>>>>>>>>>> memory, the guest copying those bytes elsewhere in its linear
>>>>>>>>>>>> memory buffer, the guest calling back to the host with the offset
>>>>>>>>>>>> and size, and the host copying and decoding from the guest's
>>>>>>>>>>>> memory.
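The round trip described above can be simulated without any Wasm runtime. In this minimal Python sketch, the guest's linear memory is a plain `bytearray` with a bump allocator, the guest is an ordinary Python function, and `encode`/`decode` are a hypothetical length-prefixed format standing in for RowCoder. None of the names come from the proof of concept (sample code at https://github.com/sjvanrossum/beam-wasm) or from Wasmer; they only mirror the four steps: host copies in, guest copies elsewhere, guest calls back with offset and size, host copies out and decodes.

```python
import struct
from typing import Callable, List, Tuple


class LinearMemory:
    """Simulated Wasm linear memory: a flat bytearray plus a bump allocator.

    Hypothetical stand-in; a real runtime exposes the guest's linear memory
    to the host as a byte buffer. Memory is never freed in this sketch.
    """

    def __init__(self, size: int = 64 * 1024):  # one 64 KiB Wasm page
        self.buf = bytearray(size)
        self.next_free = 0

    def alloc(self, n: int) -> int:
        if self.next_free + n > len(self.buf):
            raise MemoryError("out of guest memory")
        offset = self.next_free
        self.next_free += n
        return offset


def encode(user: str, amount: int) -> bytes:
    # Stand-in for RowCoder: length-prefixed UTF-8 string + big-endian int64.
    raw = user.encode("utf-8")
    return struct.pack(">I", len(raw)) + raw + struct.pack(">q", amount)


def decode(data: bytes) -> Tuple[str, int]:
    (n,) = struct.unpack_from(">I", data, 0)
    user = data[4:4 + n].decode("utf-8")
    (amount,) = struct.unpack_from(">q", data, 4 + n)
    return user, amount


def guest_process(mem: LinearMemory, offset: int, size: int,
                  emit: Callable[[int, int], None]) -> None:
    # Guest side: copy the input bytes elsewhere in linear memory, then call
    # back into the host with the new (offset, size).
    out = mem.alloc(size)
    mem.buf[out:out + size] = mem.buf[offset:offset + size]
    emit(out, size)


def host_process(mem: LinearMemory,
                 element: Tuple[str, int]) -> List[Tuple[str, int]]:
    outputs: List[Tuple[str, int]] = []

    def emit(offset: int, size: int) -> None:
        # Host callback: copy the bytes out of guest memory and decode them.
        outputs.append(decode(bytes(mem.buf[offset:offset + size])))

    encoded = encode(*element)
    offset = mem.alloc(len(encoded))
    mem.buf[offset:offset + len(encoded)] = encoded  # host -> guest copy
    guest_process(mem, offset, len(encoded), emit)
    return outputs


mem = LinearMemory()
print(host_process(mem, ("alice", 42)))  # [('alice', 42)]
```

In a real Wasm module, the callback would be a host function imported by the module, and memory management would typically be done by the guest; every copy shown here is the overhead discussed later in the thread.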
>>>>>>>>>>>>
>>>>>>>>>>>> Any thoughts/interest? I'm not sure where I was going with
>>>>>>>>>>>> this, since it was mostly just a "wouldn't it be cool if..." on a
>>>>>>>>>>>> Monday afternoon, but I can see a few use cases for this.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Steve
>>>>>>>>>>>>
>>>>>>>>>>>> Steven van Rossum |  Strategic Cloud Engineer |
>>>>>>>>>>>> sjvanros...@google.com |  (+31) (0)6 21174069
>>>>>>>>>>>>
>>>>>>>>>>>> *Google Netherlands B.V.*
>>>>>>>>>>>>
>>>>>>>>>>>
