While implementing a runner, we tried annotating a CombineByKey transform.
I noticed that the annotations for the CBK are then lost in the fusion
optimization stage when the CBK is broken into components. Is this
intentional?
I can put up a PR if this seems worth fixing (it is for us at least), ju
Ended up just filing a PR [1]
[1] https://github.com/apache/beam/pull/28489
On Fri, Sep 15, 2023 at 12:51 PM Joey Tran
wrote:
> While implementing a runner, we tried annotating a CombineByKey transform.
> I noticed that the annotations for the CBK are then lost in the fusion
> opt
Writing a runner and the first strategy for determining bundling size was
to just start with a bundle size of one and double it until we reach a size
that we expect to take some target per-bundle runtime (e.g. maybe 10
minutes). I realize that this isn't the greatest strategy for high sized
cost t
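A rough sketch of the doubling strategy described above, in plain Python rather than against any Beam API. The names `run_with_doubling_bundles`, `process_bundle`, and `clock` are hypothetical; the sketch assumes bundle runtime can be measured around a single call.

```python
import time
from typing import Callable, List, Sequence


def run_with_doubling_bundles(
    elements: Sequence,
    process_bundle: Callable[[Sequence], None],
    target_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[int]:
    """Process elements in bundles, doubling the bundle size until a
    bundle takes at least target_seconds. Returns the sizes used."""
    sizes = []
    size = 1  # start with a bundle size of one
    start = 0
    while start < len(elements):
        bundle = elements[start:start + size]
        began = clock()
        process_bundle(bundle)
        elapsed = clock() - began
        sizes.append(len(bundle))
        start += len(bundle)
        # Keep doubling only while bundles finish faster than the target.
        if elapsed < target_seconds:
            size *= 2
    return sizes
```

Passing in `clock` makes the loop testable with a fake clock. Note the sketch never shrinks the bundle size, so a single expensive element late in the input can still overshoot the target.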
.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.BatchElements
On Thu, Sep 21, 2023 at 7:23 PM Joey Tran wrote:
> Writing a runner and the first strategy for determining bundling size was
> to just start with a bundle size of one and doubl
then it is
> the runner's job to put as many calls to @ProcessElement as possible to
> amortize.
>
> Kenn
>
> On Fri, Sep 22, 2023 at 9:39 AM Joey Tran
> wrote:
>
>> Whoops, I typoed my last email. I meant to write "this isn't the
>> greatest st
> On Tue, Oct 3, 2023 at 9:15 PM Joey Tran
> wrote:
>
>> Not sure what that suggests
>>
>> On Tue, Oct 3, 2023, 6:24 PM XQ Hu via user wrote:
>>
>>> Looks like this is the current behaviour. If you have `t =
>>> beam.Filter(identity_filter)`, `t.l
>
> [1] Note that this applies to the fully qualified transform name, so the
> naming only has to be distinct within a composite transform (or at the top
> level--the pipeline itself is isomorphic to a single composite transform).
>
> On Wed, Oct 4, 2023 at 3:43 AM Joey Tran
> wro
's togglable
> with an option now. We should probably add the option to toggle Python too.
> (Unclear what the default should be, but this probably ties into
> re-thinking how pipeline update should work.)
>
> On Thu, Oct 5, 2023 at 4:58 AM Joey Tran
> wrote:
>
>> Makes
Bump on this. Sorry to pester - I'm trying to get a few teams to adopt
Apache Beam at my company and I'm trying to foresee parts of the API they
might find inconvenient.
If there's a conclusion to make the behavior similar to java, I'm happy to
put up a PR
On Thu, Oct 5, 2023
behavior similar to java, I'm happy
>> to put up a PR
>>
>> On Thu, Oct 5, 2023, 12:49 PM Joey Tran
>> wrote:
>>
>>> Is it really toggleable in Java? I imagine that if it's a toggle it'd be
>>> a very sticky toggle since i
uld then attribute old B_2's state
> to the new B_2 (and also possibly mis-direct any inflight messages). At
> least with the old, intersecting names we can detect this problem
> rather than silently give corrupt data.
>
>
> On Fri, Oct 13, 2023 at 7:15 AM Joey Tran
>
On Fri, Oct 13, 2023 at 1:18 PM Robert Bradshaw wrote:
> On Fri, Oct 13, 2023 at 10:08 AM Joey Tran
> wrote:
>
>> Are there places on the SDK side that expect unique labels? Or in
>> non-updateable runners?
>>
>
> That's a good question. The label eventually en
Hey all,
While writing a few pipelines, I was surprised by how few partitioners
there were in the python SDK. I wrote a couple that are pretty generic and
possibly generally useful. Just wanted to do a quick poll to see if they
seem useful enough to be in the sdk's library of transforms. If so, I
r PR.
>
> Thanks,
> Danny
>
> On Thu, Oct 19, 2023 at 10:06 AM Joey Tran
> wrote:
>
>> Hey all,
>>
>> While writing a few pipelines, I was surprised by how few partitioners
>> there were in the python SDK. I wrote a couple that are pretty generic and
>>
/github.com/apache/beam/blob/68e9c997a9085b0cb045238ae406d534011e7c21/sdks/python/apache_beam/transforms/combiners.py#L191
>
> On Thu, Oct 19, 2023 at 3:21 PM Joey Tran
> wrote:
>
>> Yes, both need to be small enough to fit into state.
>>
>> Yeah a percentage sam
ed
[1]
https://github.com/apache/beam/blob/e7a6405800a83dd16437b8b1b372e020e010a042/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java#L630
On Fri, Oct 13, 2023 at 1:32 PM Joey Tran wrote:
>
>
> On Fri, Oct 13, 2023 at 1:18 PM Robert Bradshaw
> wrote:
>
>> On Fri,
PR for top: https://github.com/apache/beam/pull/29106
On Mon, Oct 23, 2023 at 10:11 AM XQ Hu via dev wrote:
> +1 on this idea. Thanks!
>
> On Thu, Oct 19, 2023 at 3:40 PM Joey Tran
> wrote:
>
>> Yeah, I already implemented these partitioners for my use case (I just
>
Hi all,
I just had a workshop to demo beam for people at my company and there was a
bit of confusion about whether the beam python playground examples were
even working and it turned out they just got confused by all the runner
logging that is output.
Is this worth keeping? It seems like it'd be
> wrote:
>
>> +1 to at least setting the log level to higher than info. Some runner
>> logging (e.g. job started/done) may be useful.
>>
>> On Tue, Nov 14, 2023 at 9:37 AM Joey Tran
>> wrote:
>> >
>> > Hi all,
>> >
>> >
could try to make it
> crash and maybe find a stacktrace? Setting logging could look like so:
> https://github.com/apache/beam/blob/729c4de416b8252ec99f0a1253ac7af3023733df/sdks/python/apache_beam/examples/wordcount.py#L110
>
> On Wed, Nov 15, 2023 at 12:06 PM Joey Tran
> wrote:
>
Hey all,
We have a pretty big pipeline and while I was inspecting the stages, I
noticed there is less fusion than I expected. I suspect it has to do with
the heavy use of side inputs in our workflow. In the python sdk, I see that
side inputs are considered when determining whether two stages are f
but they principally boil down to
> shapes that look like this.
>
> Though this does not introduce a global barrier in streaming, there is
> still the analogous per window/watermark barrier that prevents fusion for
> the same reasons.
>
>
>
>
> On Thu, Dec 14, 2023 a
also uses Stateful transforms, or SplittableDoFns, those
> are usually relegated to the root of a fused stage, and avoid fusing with
> each other. That can also cause additional stages.
>
> If Beam adopted a rigorous notion of Key Preserving for transforms,
> multiple stateful tr
/manage hints as they like.
> Transform annotations might be an alternative, but how those are managed
> would be more SDK specific.
>
> On Fri, Dec 15, 2023, 5:21 AM Joey Tran wrote:
>
>> I figured out my issue. I thought side inputs were breaking up my
>> pipeline but after ex
ow them over time.
>
> On Fri, Dec 15, 2023 at 5:57 AM Joey Tran
> wrote:
>
>> Yeah I can confirm for the python runners (based on my reading of the
>> translations.py [1]) that only identical environments are merged together.
>>
>> The funny thing is that we _origi
Hey all,
Is there a particular reason we hard code "beam:runner:executable_stage:v1"
everywhere in the python SDK instead of putting it in common_urns?
t of the model so much as an implementation
> detail, but it likely does make sense to put somewhere common.
>
> On Wed, Dec 20, 2023 at 12:55 PM Joey Tran
> wrote:
>
>> Hey all,
>>
>> Is there a particular reason we hard code
>> "beam:runner:executab
I've been working with a few data types that are in practice
unpicklable and I've run into a couple issues stemming from the `Any` type
hint, which when used, will result in the PickleCoder getting used even if
there's a coder in the coder registry that matches the data element.
This was pretty un
non-obvious downstream issues.*
On Fri, Jan 5, 2024 at 12:05 PM Robert Bradshaw via dev
wrote:
> On Fri, Jan 5, 2024 at 7:38 AM Joey Tran
> wrote:
>
>> I've been working with a few data types that are in practice
>> unpicklable and I've run into a couple issu
Oh actually, overriding the fallback coder doesn't actually do anything
because the issue is not with the fallback coders in the registry but the
FastPrimitivesCoder's fallback coder.
On Fri, Jan 5, 2024 at 12:42 PM Joey Tran wrote:
>
> I think my original message made it s
Hey all,
I was poking around and looking at `Distinct` and was confused about why it
was implemented the way it was.
Reproduced here:
@ptransform_fn
@typehints.with_input_types(T)
@typehints.with_output_types(T)
def Distinct(pcoll): # pylint: disable=invalid-name
"""Produces a PCollection cont
hine traffic, with this form the first
> worker will only emit [A, B] and the second [B, C] and only the B
> needs to be deduplicated post-shuffle.
>
> Wouldn't hurt to have a comment to that effect there.
>
> https://beam.apache.org/documentation/programming-guide/#combine
>
>
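To make the quoted explanation concrete, here is a minimal plain-Python sketch (not Beam code) of deduplicating on each worker before the shuffle. The worker inputs and the function names are made up for illustration.

```python
from typing import Hashable, Iterable, List


def pre_shuffle_dedup(worker_elements: Iterable[Hashable]) -> List[Hashable]:
    """Drop duplicates seen on a single worker, keeping first-seen order."""
    return list(dict.fromkeys(worker_elements))


def distinct_across_workers(workers: List[List[Hashable]]) -> List[Hashable]:
    """Simulate the shuffle: only per-worker-distinct values are sent,
    then the remaining cross-worker duplicates are removed."""
    shuffled: List[Hashable] = []
    for elements in workers:
        shuffled.extend(pre_shuffle_dedup(elements))
    return list(dict.fromkeys(shuffled))


# Two hypothetical workers with local duplicates.
workers = [["A", "A", "B", "A"], ["B", "C", "C", "B"]]
sent = [pre_shuffle_dedup(w) for w in workers]
```

Here the first worker sends only `["A", "B"]` and the second only `["B", "C"]`, so the post-shuffle step has just one duplicate (`"B"`) left to remove.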
hem back to the full
groupbykey.
Thanks for the clarification!
On Fri, Jan 26, 2024 at 12:03 PM Robert Bradshaw
wrote:
> On Fri, Jan 26, 2024 at 8:43 AM Joey Tran
> wrote:
> >
> > Hmm, I think I might still be missing something. CombinePerKey is made
> up of "GBK() | Co
Hey all,
I've been really trying to use Playground for educating new Beam users but
it feels like there's something missing. A lot of examples (e.g. Multiple
ParDo Outputs) for at least the python API don't seem to do anything
observable. For example, the Multiple ParDo Outputs example writes to a
g to? I checked a few
> examples and usually we use beam.Map(print) to display some output values.
>
> On Wed, Feb 7, 2024 at 8:55 PM Joey Tran
> wrote:
>
>> Hey all,
>>
>> I've been really trying to use Playground for educating new Beam users
>> but it
4.0.
>
> On Thu, Feb 8, 2024, 7:18 AM Joey Tran wrote:
>
>> Here's two:
>>
>> https://play.beam.apache.org/?path=SDK_PYTHON_MultipleOutputPardo&sdk=python
>> https://play.beam.apache.org/?path=SDK_PYTHON_WordCount&sdk=python
>>
>> Also, how
Hey all,
I'm trying to get a beam python SDK dev environment going but I'm a bit
stuck. I'm just setting things up with a virtual env as specified in the
docs[1], but `pip install -e .[gcp,test]` ends with a clang error:
```
clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DND
""""""""""""""""""""""""""""""""""""""""""""""""""
andrey.devyat...@akvelon.com> wrote:
> Hi Joey,
>
> Thanks for reaching out! I see that your changes haven't been deployed yet,
> so I've triggered the corresponding job and Playground will be updated soon.
>
> Thanks,
> Andrey
>
>
>
> *From: *Joey Tran
Hey all,
Using an identity function for FlatMap comes up more often than using
FlatMap without an identity function. Would it make sense to use the
identity function as a default?
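As a small illustration of the proposal, the sketch below models FlatMap semantics in plain Python with the identity function as the default. `flat_map` and `identity` are hypothetical stand-ins, not the Beam implementation.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def identity(x):
    """Return the argument unchanged."""
    return x


def flat_map(
    elements: Iterable[T],
    fn: Callable[[T], Iterable[U]] = identity,
) -> Iterator[U]:
    """Apply fn to each element and yield every item of each result.

    With the default fn, each element must itself be iterable, so the
    call simply flattens one level of nesting."""
    for element in elements:
        yield from fn(element)


flattened = list(flat_map([[1, 2], [3], []]))
words = list(flat_map(["a b", "c"], str.split))
```

With the default, `flat_map(nested)` behaves like `flat_map(nested, lambda x: x)`, which is the case the question says comes up most often.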
> wrote:
> Hi, you can use beam.Flatten() instead.
>
> On Thu, Mar 21, 2024 at 10:55 AM Joey Tran
> wrote:
>
>> Hey all,
>>
>> Using an identity function for FlatMap comes up more often than using
>> FlatMap without an identity function. Would it make sense to use the
>> identity function as a default?
>>
>>
>>
>>
latMap a default arg of lambda x: x is an interesting idea. The
>> only downside I see is a less clear error if one forgets to provide this
>> (now mandatory) parameter, but maybe that's low enough to be worth the
>> convenience?
>>
>> On Thu, Mar 21, 2024 at 12:02
bmission time in Python if you pass a single
> PCollection to Flatten. The scenario you describe concerns a one-element
> list.
>
> On Thu, Mar 21, 2024, 13:43 Joey Tran wrote:
>
>> I think it'd be quite surprising if beam.Flatten would become equivalent
>> to FlatMap
>
> seems a bit error prone.
>
>
> On Thu, Mar 21, 2024 at 2:23 PM Joey Tran
> wrote:
>
>> Ah, I misunderstood your original suggestion then. That makes sense then.
>> I have already seen someone get a little confused about the names and
>> surprised that Flatten does
Hi,
I've been banging my head trying to get a dev environment working. I gave
up trying to get a local python environment working after I got some weird
clang errors and proto generation issues so I've been trying to just use
the docker container by running `bash start-build-env.sh` but I'm runni
t all. I
> would remove that 'go get' line.
>
> There's a different issue at play here too since it was written with
> pre-module Go in mind. I'm unfamiliar with that script though.
>
> I'll take a proper look in a few hours.
>
> On Fri, Mar 22, 2024, 5
I posted a PoC PR [1] for fixing deferred side inputs with combiners in the
python SDK. Would someone be willing to take a look at it?
I have it working but could use some feedback on where to take it next. It
looks like bundle processor combiner operations don't currently support
side inputs [2]
I think I might be doing something silly with my environment.
I'm trying to lint using tox in a dev container, but running tox ends with
this error:
```
(env) jtran@[Beam Build Env.]:~/beam {flatmapdefault} ]
$ tox
File "/usr/lib/python3/dist-packages/tox/reporter.py", line 32, in
__init__
Yeah that was the tox command I was running
On Fri, Apr 5, 2024, 4:37 PM XQ Hu via dev wrote:
>
> https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks
>
> This generally works well. Have you checked this?
>
> On Fri, Apr 5, 2024 at
are relevant.
>- proposed approach and considered alternatives
>- any runner-specific considerations.
>
> Thanks,
> Valentyn
>
> On Fri, Mar 29, 2024 at 5:06 AM Joey Tran
> wrote:
>
>> I posted a PoC PR [1] for fixing deferred side inputs with combiners in
>>
Hey all,
We're running into a fairly common situation where we get a TypeCheckError
when we supply an int value where a float parameter is expected.
It's easy enough to work around this on a case-by-case basis by just
casting ints to floats, but this is an easy thing to forget to do, which
means
There are a handful of bugs with the interactive examples on the Beam docs
page
https://beam.apache.org/documentation/transforms/python/elementwise/flatmap/#example-3-flatmap-without-a-function
https://beam.apache.org/documentation/transforms/python/elementwise/map/
(example 8)
I wonder if it mig
Hey all,
I have a python runner I've written and I'm debugging the case when an SDK
worker crashes. Currently my driver/runner starts the SDK worker and then
pushes data / instructions through the data and control connections, but
this just silently waits forever if the worker has actually crashed
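One way to avoid waiting forever, sketched under the assumption that the runner starts the SDK worker as a local subprocess and collects responses on a queue. `wait_for_response` and `WorkerCrashed` are hypothetical names and none of this is part of the Beam SDK.

```python
import queue
import subprocess
import sys


class WorkerCrashed(RuntimeError):
    """Raised when the worker process exits before responding."""


def wait_for_response(
    worker: subprocess.Popen,
    responses: "queue.Queue",
    poll_seconds: float = 0.1,
):
    """Block until a response arrives, but fail fast if the worker dies."""
    while True:
        try:
            return responses.get(timeout=poll_seconds)
        except queue.Empty:
            exit_code = worker.poll()  # None while the process is alive
            if exit_code is None:
                continue
            # The worker may have responded just before exiting.
            try:
                return responses.get_nowait()
            except queue.Empty:
                raise WorkerCrashed(
                    f"SDK worker exited with code {exit_code}"
                ) from None
```

This only detects a dead process. A worker that is alive but hung would still need some form of heartbeat or status check.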
Ah I realize now that `standard_optimize_phases` is not actually currently
used by the fn runner so these changes don't effectively do anything
On Sun, Sep 29, 2024 at 3:19 PM Joey Tran wrote:
> I took a crack at trying to replace GBK+CombineValues with CBK so then it
> doesn't
I took a crack at trying to replace GBK+CombineValues with CBK so then it
doesn't matter what the user chooses to do.
https://github.com/apache/beam/pull/32592
On Fri, Sep 27, 2024 at 4:32 PM Joey Tran wrote:
> Hmm, it makes sense from a runner optimization perspective, but I think
https://beam.apache.org/releases/pydoc/current/search.html?q=with_exception_handling&check_keywords=yes&area=default
This page for example doesn't seem to load for me or for a couple of people I asked
to try it. Just wanted to report.
Would anyone be up to take a look at the approach here? I think the
standard translation phases are currently used by the portable runner so
the portable runner tests might indicate it works?
On Sun, Sep 29, 2024, 3:19 PM Joey Tran wrote:
> I took a crack at trying to replace
om the status message, so I don't think there's value in another
> RPC.
>
> Grpc possibly has something built in as well, if only for connection
> monitoring.
>
> On Fri, Oct 4, 2024, 10:46 AM Joey Tran wrote:
>
>> Hey all,
>>
>> I have a python runner
> tells you it is alive and doing work as needed. I had to do that via the
> control channel.
>
> Also, I am very curious about this new runner.. mind sharing?
>
> On Fri, Oct 4, 2024 at 1:46 PM Joey Tran
> wrote:
>
>> Hey all,
>>
>> I have a python runn
Hey all,
Is there an RPC for cancelling a ProcessBundleRequest? e.g. if a bundle is
taking a while and you want it to time out and start a different bundle?
If the bundle is a single unsplittable element though a split request won't
do anything, right?
On Wed, Oct 30, 2024, 6:45 PM Robert Bradshaw via dev
wrote:
> No, but you can ask it to split ASAP.
>
> On Wed, Oct 30, 2024 at 2:32 PM Joey Tran
> wrote:
> >
> > Hey
Looks like this is working with Beam 2.59.0
>> https://beam.apache.org/releases/pydoc/2.59.0/search.html?q=with_exception_handling&check_keywords=yes&area=default
>>
>> But it is broken on current and 2.60.0.
>>
>> On Fri, Oct 18, 2024 at 3:39 PM Joey Tran
>> wr
The gRPC worker handlers in the Python SDK don't define any helpers for
using the status channel, right? If I write these for my runner (in the same
vein as how the other gRPC servers are implemented), would this be worth
contributing upstream?
On Fri, Oct 4, 2024 at 2:33 PM Joey Tran wrote:
>
Hey all,
The ToList python example seems broken. Not sure if there's an issue for
this already. Hopefully it's an easy fix (thanks @Efrem Braun
for the report)
https://beam.apache.org/documentation/transforms/python/aggregation/tolist/
Best,
Joey
--
Joey Tran | Staff
I added it just yesterday :)
On Tue, Sep 24, 2024, 6:45 PM XQ Hu via dev wrote:
> Please check this:
> https://github.com/apache/beam/blob/master/website/ADD_LOGO.md
>
> On Tue, Sep 24, 2024 at 6:05 PM Ahmet Altay via dev
> wrote:
>
>> Hi Hope,
>>
>> Just checking: Were you able to add your log
create a pull request when we decide.
>
> We expect to have a decision in October.
>
> I also want to see the company logo on the Beam website.
>
> Thank you :)
>
> On Wed, Sep 25, 2024 at 10:10 AM Joey Tran wrote:
>
>> I added it just yesterday :)
>>
>> On Tu
Thoughts @dev for a `GroupedValues` version of
combiners? Named `PerGroup` or `PerGroupedValues`?
On Fri, Sep 27, 2024 at 2:57 PM Valentyn Tymofieiev
wrote:
>
> On Fri, Sep 27, 2024 at 11:13 AM Joey Tran
> wrote:
>
>> Ah! That is exactly the kind of primitive I was looki
s/portability/fn_api_runner/translations.py#L953
>
> https://github.com/apache/beam/blob/release-2.48.0/sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py#L1161
>
>
> I don't remember what the status is of enabling this by default (IIRC,
> they're condit
ough, if you get it
> working for yourself. (Tag lostluck in GitHub).
>
> I think if you search the dev list, you might find previous discussions on
> the topic of the dev container.
>
> Robert Burke
>
>
> On Thu, Sep 19, 2024, 2:17 PM Joey Tran wrote:
>
>> Hi
-o pipefail -c go get
github.com/linkedin/goavro/v2" did not complete successfully: exit code: 1
Any tips?
Cheers,
--
Joey Tran
[image: Schrödinger, Inc.] <https://schrodinger.com/>
cython isn't installed in the virtualenv. Running `pip uninstall cython`
resulted in:
`WARNING: Skipping cython as it is not installed.`
On Thu, Sep 19, 2024 at 7:05 PM Robert Bradshaw via dev
wrote:
>
>
> On Thu, Sep 19, 2024 at 3:43 PM Joey Tran
> wrote:
>
>> Ah
g-out the cython dep in pyproject.toml might
> remove cythonization too.
>
> On Thu, Sep 19, 2024 at 4:16 PM Joey Tran
> wrote:
>
>> cython isn't installed in the virtualenv. Running `pip uninstall cython`
>> resulted in:
>> `WARNING: Skipping cython as it is
here because I was wondering if the same line of thought
might've led to the removal of PObject (just out of curiosity)
On Fri, Jan 31, 2025 at 11:57 AM Kenneth Knowles wrote:
>
>
> On Fri, Jan 31, 2025 at 11:20 AM Joey Tran
> wrote:
>
>> Is there an equivalent
nt of PObject. Given that the Beam API
>>> needed to work with the windowing model, we needed something like a PObject
>>> that could be windowed. This is what PCollectionView provides.
>>>
>>
>> +1, I'd forgotten about the windowing complexities as well.
>>
>
I'm trying to work with windows and I was hoping my current batch workflows
could work transparently with and without windows (I know "without windows"
really means with GlobalWindows), but any batch workflow that uses a
`CombineGlobally` doesn't work without using `without_defaults` otherwise
it g
I read the FlumeJava paper and I was just curious what happened to
PObjects. They seem like a useful construct. Do they exist in the java SDK
in some version still? Or were they done away with because they made
pipeline optimization more difficult?
Is there any kind of multiprocess parallelism built into the Python sdk
worker? In other words, is there a way for my runner to start a worker and
have it use multiple cores instead of having one worker per core? I thought
I saw this capability somewhere but now that I look I can only see the
pipel
I heard mention that there is a flatten unzipping optimization implemented
by some runners. I didn't see that in the python optimizations
in translations.py[1]. Just curious what this optimization is?
I think I get the general gist in that you don't necessarily need to combine
the input pcollection
25 at 4:09 PM Robert Bradshaw via dev
wrote:
> On Mon, Jan 27, 2025 at 1:00 PM Joey Tran
> wrote:
> >
> > I heard mention that there is a flatten unzipping optimization
> implemented by some runners. I didn't see that in the python optimizations
> in translatio
ecause I had reached my limit
> in dealing with the Graph at the time.
> >
> > The Python SDK approach to the optimizations is very set theory based,
> which isn't as natural (to me at least) for handling the flatten unzipping.
> >
> > Not impossible,
nd doesn't take into account the size of the data being
> materialized).
whoa! If the optimization takes into account the size of the data being
materialized, does that mean it happens at runtime?
On Mon, Jan 27, 2025 at 5:47 PM Robert Bradshaw wrote:
> On Mon, Jan 27, 2025 at 2:29
Got an access denied on the doc. Is it intended to be shared publicly?
On Mon, Mar 17, 2025 at 6:17 PM Robert Bradshaw via dev
wrote:
> I've been thinking a bit about productionizing yaml pipelines, and a
> large part of that involves being able to write and run tests.
>
> I've put my thoughts up a
erate over it twice.
I'm considering implementing CombineFn execution with my runner such
that only two accumulators are ever loaded into memory at a time. Not sure
if CombineFn contract supports this though.
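A minimal sketch of what merging with only two accumulators in memory could look like. `MeanFn` is a hypothetical stand-in that borrows CombineFn-style method names and is not a Beam class. Whether the CombineFn contract actually permits calling merge this way is the open question above, and the sketch does not answer it.

```python
from typing import Iterable, Tuple

Accumulator = Tuple[float, int]


class MeanFn:
    """Stand-in combiner with CombineFn-style method names."""

    def create_accumulator(self) -> Accumulator:
        return (0.0, 0)

    def merge_accumulators(self, accumulators: Iterable[Accumulator]) -> Accumulator:
        total, count = 0.0, 0
        for acc_total, acc_count in accumulators:
            total += acc_total
            count += acc_count
        return (total, count)

    def extract_output(self, accumulator: Accumulator) -> float:
        total, count = accumulator
        return total / count if count else float("nan")


def merge_two_at_a_time(combine_fn, accumulators: Iterable[Accumulator]) -> Accumulator:
    """Fold lazily loaded accumulators into a running result, so only the
    running result and the next accumulator are held at once."""
    merged = combine_fn.create_accumulator()
    for accumulator in accumulators:  # may be a generator reading from storage
        merged = combine_fn.merge_accumulators([merged, accumulator])
    return merged
```

The `accumulators` argument can be a generator, so each accumulator is loaded only when the fold reaches it.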
On Fri, Mar 14, 2025 at 8:20 AM Joey Tran wrote:
> My intuition says no says al
Naive question, but why is beam upgrading to cloudpickle?
I saw this doc:
https://docs.google.com/document/d/1G5Q0ckX5sKQRQD1yEkLCPQL7N6B-AL9Cb1p0zlOOfQU/edit?tab=t.0
Is the main reason because cloudpickle is more actively maintained?
On Mon, Apr 28, 2025 at 6:51 PM Claudius van der Merwe
wrot
AM Robert Bradshaw wrote:
> On Tue, Apr 29, 2025 at 7:51 PM Joey Tran
> wrote:
> >
> > Does cloudpickle make --save_main_session unnecessary? As in, will more
> transforms defined in __main__ "just work"?
>
> Yes. Or at least it "just works" much more oft
sion and runtime.
> - previously, some bugs and feature requests Beam requested in dill took
> a long time to be implemented and released.
>
> [1] https://lists.apache.org/thread/dvxvclhok0fx48955x6szvw4kotxh87n
> [2] https://github.com/apache/beam/issues/22893#issuecomment-150235
Looking forward to this example. I never knew about this transform :-)
On Sun, Apr 27, 2025, 1:52 PM XQ Hu via dev wrote:
> If you create a PR, the precommit website stage GCS workflow should be
> triggered. You should be able to view this with your PR. Example:
> https://github.com/apache/beam/
Not currently
On Sat, May 10, 2025, 12:48 AM Reuven Lax wrote:
> Does this work with nested fields? Can you specify Input_field="a.b.c"?
>
> On Fri, May 9, 2025 at 7:18 PM Joey Tran
> wrote:
>
>> Sure!
>>
>> Given a DoFn that has...
>>
I've written a `SchemadParDo(input_field: str, output_field, dofn:DoFn)`
transform for more easily writing a Schemad transform given a DoFn.
Is this something worth upstreaming into the Beam Python SDK? I wrote it to
make it easier to convert our current set of dofn's into schemad dofns for
use wi
",
output_field="word"))
And it'd produce Row(word="hello", id="id") and Row(word="world", id="id")
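The behavior described here can be sketched in plain Python with dicts standing in for rows. This is not the actual `SchemadParDo` implementation. `apply_to_field` is a hypothetical name, and the input field name `element` and the splitting function are assumptions chosen to match the output above.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Row = Dict[str, Any]


def apply_to_field(
    rows: Iterable[Row],
    fn: Callable[[Any], Iterable[Any]],
    input_field: str,
    output_field: str,
) -> Iterator[Row]:
    """Run fn on one field of each row and emit one row per output,
    carrying the remaining fields along and dropping the input field."""
    for row in rows:
        rest = {k: v for k, v in row.items() if k != input_field}
        for output in fn(row[input_field]):
            yield {output_field: output, **rest}


rows = [{"element": "hello world", "id": "id"}]
result = list(apply_to_field(rows, str.split, "element", "word"))
```

Each output row keeps `id` unchanged. The input field is dropped in this sketch, which matches the rows shown above.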
On Fri, May 9, 2025, 9:57 PM Reuven Lax via dev wrote:
> Can you explain a bit how SchemadParDo works?
>
> On Fri, May 9, 2
dofn's
that implement buffers.
On Mon, May 12, 2025 at 11:30 AM Robert Bradshaw wrote:
> On Mon, May 12, 2025 at 8:25 AM Joey Tran
> wrote:
>
>> and preserve rather than elide the input field (at least as an option).
>>>
>>
>> Oh yes, that is act
ds.
That also doesn't feel like the greatest solution.
Are there any other ideas floating around?
On Mon, May 12, 2025 at 3:08 PM Joey Tran wrote:
> Does this handle the full DoFn spec (e.g. bundle start/finish, WindowFn
>>> params, state and timers, etc.)?
>>
>> It
e to adapt a
> DoFn to apply to a single field of a Row? This certainly seems to
> have value. I might suggest a syntax like
>
> schema_pcoll | DoToField(SomeDoFn(), input_field="element",
> output_field="word")
>
> and preserve rather than elide the input
On Tue, Jun 17, 2025 at 7:52 PM Robert Bradshaw wrote:
> On Tue, Jun 17, 2025 at 3:57 PM Robert Burke wrote:
> >
> > +1
> >
> > You should be able to use the Prism runner to implement this locally.
> >
> > Prism passes the full suite of java MultimapState tests, and will ensure
> the implementat
ts.
>
> I'd vote we disable this for Prism and can take that on as part of
> enabling prism as the default runner [1].
>
> [1] WIP - https://github.com/apache/beam/pull/34612
>
> On Wed, Jun 4, 2025 at 10:48 AM Joey Tran
> wrote:
>
>> Hey all,
>>
>>
Hey all,
We recently upgraded to Beam 2.63 from 2.50. After the upgrade, our unit
tests testing our runner saw a 4x-5x performance hit. It turned out it was
because for every pipeline run, the default `PipelineRunner.run_pipeline`
invokes `PipelineRunner.default_environment`[1] which eventually res
On Wed, Jul 16, 2025 at 2:24 AM Chamikara Jayalath via dev <
dev@beam.apache.org> wrote:
>
>
> On Tue, Jul 15, 2025 at 5:32 PM Joey Tran
> wrote:
>
>> "Environment" has become almost-but-not-quite assumed to be a docker
>>> container image specifica
Hey all,
@Jack McCluskey's great talk on python type
hinting at the beam summit taught me that the "trivially" inferred types
are only used if the beam Map/FlatMap/Filter/ParDo functions aren't already
type hinted. This seems like it could be a waste of information. For
example, the following pip
```
def row_identity(inp_row: beam.Row) -> beam.Row:
    return inp_row

(p
 | Create([Row(x=1, y=2)])
 | Map(row_identity))
```
Would be able to infer the row schema automatically rather than resorting
to using the user-typed and general `beam.Row`.
That said, I do agree that the user confusi
On Tue, Jul 1, 2025 at 2:37 PM Danny McCormick via dev
wrote:
> I think it is probably reasonable to automate this when a GPU resource
> hint is used. I think we still need to expose this as a config option for
> the ML containers (and it is the same with distroless) since it is pretty
> difficul