Hey everyone,

This is long overdue, but I wanted to share a doc
(https://docs.google.com/document/d/1Rin_5Vm3qT1Mkb5PcHgTDrjXc3j0Admzi3fEGEHB2-4/edit?usp=sharing)
that outlines a new implementation of BatchElements in the Python SDK. It
is state-aware and can therefore batch elements across bundles, with
dynamic batch sizing. The transform is actually already in the Beam repo,
since a prototype PR quickly turned into the bulk of the finished work.
At the very least, the doc outlines some interesting mechanics and nuances
involved in putting this together.
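
The core idea of keeping a buffer alive across bundle boundaries can be sketched in plain Python. This is a toy model under stated assumptions, not the code in util.py: `HypotheticalStatefulBatcher`, its parameters, and its resizing rule are all illustrative. The list buffer stands in for per-key state, `on_timer` stands in for a timer that flushes a partial batch, and `record_batch_time` uses a simple proportional rule (assuming cost grows linearly with batch size) rather than whatever estimator Beam actually uses.

```python
import time
from typing import Any, Callable, List, Optional


class HypotheticalStatefulBatcher:
    """Toy model of state-aware batching (not Beam's implementation)."""

    def __init__(self, min_size: int = 1, max_size: int = 64,
                 initial_size: int = 4, max_wait_secs: float = 5.0,
                 target_batch_secs: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_size = min_size
        self.max_size = max_size
        self.target_size = initial_size
        self.max_wait_secs = max_wait_secs
        self.target_batch_secs = target_batch_secs
        self.clock = clock
        # Stands in for per-key bag state: it outlives any single bundle.
        self.buffer: List[Any] = []
        # Stands in for a timer armed when the first element is buffered.
        self.first_arrival: Optional[float] = None

    def _flush(self) -> List[Any]:
        batch, self.buffer, self.first_arrival = self.buffer, [], None
        return batch

    def process_bundle(self, elements: List[Any]) -> List[List[Any]]:
        """Buffers one bundle's elements and emits only full batches."""
        batches = []
        for element in elements:
            if self.first_arrival is None:
                self.first_arrival = self.clock()
            self.buffer.append(element)
            if len(self.buffer) >= self.target_size:
                batches.append(self._flush())
        # No flush at the end of the bundle: leftovers wait for the next
        # bundle (or the timer), which is what lets batches span bundles.
        return batches

    def on_timer(self) -> Optional[List[Any]]:
        """Emits a partial batch once the oldest element has waited too long."""
        if self.buffer and self.clock() - self.first_arrival >= self.max_wait_secs:
            return self._flush()
        return None

    def record_batch_time(self, batch_len: int, elapsed_secs: float) -> None:
        """Rescales the target size toward target_batch_secs per batch."""
        if elapsed_secs <= 0:
            return
        proposed = round(batch_len * self.target_batch_secs / elapsed_secs)
        self.target_size = max(self.min_size, min(self.max_size, proposed))


# Example run with a fake clock so the timing is deterministic.
fake_now = [0.0]
batcher = HypotheticalStatefulBatcher(initial_size=4, clock=lambda: fake_now[0])
print(batcher.process_bundle([1, 2, 3]))  # [] (held in state, below target)
print(batcher.process_bundle([4, 5]))     # [[1, 2, 3, 4]] (spans two bundles)
fake_now[0] = 6.0
print(batcher.on_timer())                 # [5] (waited past max_wait_secs)
batcher.record_batch_time(4, 0.5)         # 4 elements took 0.5 s
print(batcher.target_size)                # 8
```

In this sketch, the timer-driven partial flush is what bounds latency when traffic is light; without it, leftover elements could sit in state indefinitely waiting for a full batch.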

The DoFn is primarily targeted at the RunInference framework; however,
it will also be available for general use through BatchElements as of
2.53.0. The code itself is at
https://github.com/apache/beam/blob/5becfb8ed430fe9a317cd2ffded576fe2ab8e980/sdks/python/apache_beam/transforms/util.py#L651

Thanks,

Jack McCluskey

-- 
Jack McCluskey
SWE - DataPLS PLAT/ Dataflow ML
RDU
jrmcclus...@google.com