Dataproc+Flink+Beam Portable Runner Tutorial

2021-07-09 Thread Joey Tran
Hello! I'm just trying to demo Beam/Flink, and I tried following the instructions for Google's Dataproc, but I get a bunch of errors ranging from Jackson dependency issues to an issue about "No container id". Does anyone know if these Dataproc instructions[1] are complete? I ran through it prett

Re: Dataproc+Flink+Beam Portable Runner Tutorial

2021-07-09 Thread Joey Tran
heck out a patch for the issue, such as
>>>> https://github.com/apache/beam/pull/14953
>>>> 3. Build the Flink runner using command "./gradlew :runners:flink:1.13:job-server:shadowJar"
>>>> 4. Use the outputted Flink runner jar in your P

Re: Dataproc+Flink+Beam Portable Runner Tutorial

2021-07-09 Thread Joey Tran
uld be
> "runners/flink/1.13/job-server/build/libs/beam-runners-flink-1.13-job-server-2.31.0-SNAPSHOT.jar"
> (or similar depending on your Beam/Flink version choices).
>
> On Fri, Jul 9, 2021 at 1:43 PM Joey Tran wrote:
>
>> Hi all,
>>
>> Thank yo
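
A minimal sketch of step 4, pointing the Python SDK's FlinkRunner at a locally built job-server jar. It assumes the `--flink_master`, `--flink_version` and `--flink_job_server_jar` pipeline options (check them against `FlinkRunnerOptions` in your Beam release); the master address is a hypothetical placeholder and the jar path is the example quoted in the reply.

```python
from pathlib import Path
from typing import List

# Jar path quoted in the reply above; it will differ for other Beam/Flink versions.
JOB_SERVER_JAR = Path(
    "runners/flink/1.13/job-server/build/libs/"
    "beam-runners-flink-1.13-job-server-2.31.0-SNAPSHOT.jar"
)


def flink_pipeline_args(jar: Path, flink_master: str, flink_version: str) -> List[str]:
    """Build command-line style options for the portable FlinkRunner.

    Assumes the Python SDK's --flink_master, --flink_version and
    --flink_job_server_jar options; verify them for your Beam release.
    """
    return [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        f"--flink_version={flink_version}",
        # Point at the locally built (e.g. patched) job server jar.
        f"--flink_job_server_jar={jar.resolve()}",
    ]


# "flink-master.example:8081" is a hypothetical placeholder for your cluster.
args = flink_pipeline_args(JOB_SERVER_JAR, "flink-master.example:8081", "1.13")
print("\n".join(args))
```

The resulting list would typically be passed to `PipelineOptions(args)` from `apache_beam.options.pipeline_options` when constructing the pipeline.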

Re: Dataproc+Flink+Beam Portable Runner Tutorial

2021-07-12 Thread Joey Tran
inkle here is that support
> for Flink 1.9 was dropped, so 2.29.0 is the last release that includes
> support. So you will have to build from that branch or earlier.
>
> On Mon, Jul 12, 2021 at 11:13 AM Joey Tran wrote:
>
>> Ah so sorry Kyle, I attached the wrong logs...

Re: Dataproc+Flink+Beam Portable Runner Tutorial

2021-07-13 Thread Joey Tran
mation.
>
> Beam+Flink+Dataproc isn't unheard of, but Java is definitely more common
> than Python (and simpler to operate). And overall Dataflow is usually the
> preferred way to run Beam on GCP.
>
> On Tue, Jul 13, 2021 at 7:27 AM Joey Tran wrote:
>
>> I f

Getting Started With Implementing a Runner

2023-06-22 Thread Joey Tran
Hi Beam community! I'm interested in trying to implement a runner for my company's execution environment, but I'm struggling to get started. I've read the docs page on implementing a runner, but it's quite high level. Anyone hav

Re: Getting Started With Implementing a Runner

2023-06-22 Thread Joey Tran
lack has a link on the Beam Contact Us page
> <https://beam.apache.org/community/contact-us/>, and I'd highly recommend
> routing questions towards the developer mailing (d...@beam.apache.org)
> list rather than the user one for runner implementation things.
>
> Thanks,

Re: Getting Started With Implementing a Runner

2023-06-23 Thread Joey Tran
>
>> Then, depending on answers, I'd suggest to take as an example one of the
>> most similar Beam runners and use it as a more detailed source of
>> information along with Beam runner doc mentioned before.
>>
>> —
>> Alexey
>>
>> On 22 Jun 2023, a

Re: Getting Started With Implementing a Runner

2023-06-23 Thread Joey Tran
Fri, Jun 23, 2023 at 4:02 PM Robert Bradshaw wrote:
> On Fri, Jun 23, 2023 at 11:15 AM Joey Tran wrote:
>
>> Thanks all for the responses!
>>
>> If Beam Runner Authoring Guide is rather high-level for you, then, at
>>> first, I'd suggest to answer t

Re: Getting Started With Implementing a Runner

2023-07-09 Thread Joey Tran
s,
> Cham
>
> On Fri, Jun 23, 2023 at 1:57 PM Robert Bradshaw via user <user@beam.apache.org> wrote:
>
>>
>> On Fri, Jun 23, 2023 at 1:43 PM Joey Tran wrote:
>>
>>> Totally doable by one pers

Re: Getting Started With Implementing a Runner

2023-07-13 Thread Joey Tran
ul 10, 2023 at 1:07 PM Robert Bradshaw wrote:
> On Sun, Jul 9, 2023 at 9:22 AM Joey Tran wrote:
>
>> Working on this on and off now and getting some pretty good traction.
>>
>> One thing I'm a little worried about is all the classes that are marked
>> "in

Re: Getting Started With Implementing a Runner

2023-07-21 Thread Joey Tran
Could you let me know when you update it? I would be interested in rereading after the rewrite. Thanks! Joey
On Fri, Jul 14, 2023 at 4:38 PM Robert Bradshaw wrote:
> I'm taking an action item to update that page, as it is *way* out of date.
>
> On Thu, Jul 13, 2023 at 6:5

Options for visualizing the pipeline DAG

2023-08-31 Thread Joey Tran
For example, I might have a large codebase that's used to construct and run a pipeline, and in this case I don't think any of those three solutions would be very easy to use to visualize my pipeline (though I could be wrong). Best, Joey -- Joey Tran | Senior Developer Il | AutoDesi

Re: Options for visualizing the pipeline DAG

2023-09-01 Thread Joey Tran
ctrunner>
> without an extension. Go has a dot runner
> <https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.50.0/go/pkg/beam/runners/dot>
> that produces a visual representation of a pipeline. Java has a similar dot
> renderer <https://mehmandarov.com/apache-beam-pipeline-graph/
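
Both tools mentioned in the reply emit Graphviz DOT text. As a rough illustration of that output format only (not the API of either tool), here is a hypothetical helper that turns a list of (upstream, downstream) transform-name pairs into DOT; extracting those pairs from a real pipeline is out of scope and assumed.

```python
def _quote(name):
    # Quote DOT identifiers so names like "Read/ReadFromText" stay intact.
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(edges, graph_name="pipeline"):
    """Render (upstream, downstream) transform-name pairs as Graphviz DOT text."""
    lines = [f"digraph {graph_name} {{", "  rankdir=LR;"]
    nodes = []
    for upstream, downstream in edges:
        for name in (upstream, downstream):
            if name not in nodes:  # declare each node once, in first-seen order
                nodes.append(name)
    lines += [f"  {_quote(n)};" for n in nodes]
    lines += [f"  {_quote(a)} -> {_quote(b)};" for a, b in edges]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical transform names for a word-count style pipeline.
edges = [("Read", "ParseLines"), ("ParseLines", "CountWords"), ("CountWords", "Write")]
dot = to_dot(edges)
print(dot)
```

Saving the text to `pipeline.dot` and running `dot -Tpng pipeline.dot -o pipeline.png` renders it with Graphviz.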

"Decorator" pattern for PTramsforms

2023-09-15 Thread Joey Tran
Is there a way to extend already defined PTransforms? My question is probably better illustrated with an example. Let's say I have a PTransform that generates a highly variable number of outputs. I'd like to "wrap" that PTransform such that if it ever creates more than, say, 1,000 outputs, then I just

Re: "Decorator" pattern for PTramsforms

2023-09-15 Thread Joey Tran
more suitable solutions for that.
>
> [1] https://beam.apache.org/documentation/programming-guide/#composite-transforms
> [2] https://beam.apache.org/releases/javadoc/2.50.0/org/apache/beam/sdk/transforms/Top.html
>
> —
> Alexey
>
> On 15 Sep 2023, at 14:

Re: "Decorator" pattern for PTramsforms

2023-09-15 Thread Joey Tran
prepended to the problematic consuming PTransform
> as well.
>
> - Robert
>
> On Fri, Sep 15, 2023 at 8:13 AM Joey Tran wrote:
>
>> I'm aware of composite transforms and of the distributed nature of
>> PTransforms. I'm not suggesting limiti

[QUESTION] Why no auto labels?

2023-09-30 Thread Joey Tran
After writing a few pipelines now, I keep getting tripped up by accidentally having duplicate labels from using multiple instances of the same transform without labeling them. I figure this must be a common complaint, so I was just curious: what was the rationale behind this design? My naive thought off the

Re: [QUESTION] Why no auto labels?

2023-10-01 Thread Joey Tran
't care about update, Beam can auto generate these
> names for you! When you call PCollection.apply (if using Beam Java), simply
> omit the name parameter and Beam will auto generate a unique name for you.
>
> Reuven
>
> On Sat, Sep 30, 2023 at 11:54 AM Joey Tran wrote:
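
A plain-Python model of the auto-naming described here: when a requested name is already taken, append an increasing numeric suffix. The `Name`, `Name2`, `Name3` scheme is modeled loosely on Java's `Pipeline` and is an assumption, not a copy of Beam's code; `NameRegistry` is a hypothetical class.

```python
class NameRegistry:
    """Hands out unique full transform names, appending 2, 3, ... on collisions."""

    def __init__(self):
        self._used = set()

    def uniquify(self, prefix: str, name: str) -> str:
        candidate, suffix = name, 2
        # Keep bumping the suffix until the fully qualified name is free.
        while prefix + candidate in self._used:
            candidate = f"{name}{suffix}"
            suffix += 1
        self._used.add(prefix + candidate)
        return candidate


registry = NameRegistry()
names = [registry.uniquify("", "Map") for _ in range(3)]
print(names)  # ['Map', 'Map2', 'Map3']
```

The catch, raised later in the thread, is that such suffixes depend on application order, so inserting or reordering transforms can shift names and break pipeline update.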

Re: [QUESTION] Why no auto labels?

2023-10-02 Thread Joey Tran
n the Python SDK, you can simply use `pcollection | map_fn`, instead
> of `pcollection | 'Map' >> map_fn`.
>
> See an example here
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/cookbook/group_with_coder.py#L100-L116
>
> On Sun, Oct 1, 2023

[PYTHON] Yapf configurations to prevent workflow mangling

2023-10-02 Thread Joey Tran
knobs recommended for mitigating this? Best, Joey

Re: [QUESTION] Why no auto labels?

2023-10-03 Thread Joey Tran
Not sure what that suggests.
On Tue, Oct 3, 2023, 6:24 PM XQ Hu via user wrote:
> Looks like this is the current behaviour. If you have `t = beam.Filter(identity_filter)`,
> `t.label` is defined as `Filter(identity_filter)`.
>
> On Mon, Oct 2, 2023 at 9:25 AM Joey Tran wro

Re: [PYTHON] Yapf configurations to prevent workflow mangling

2023-10-04 Thread Joey Tran
have a different version installed?
>
> Best,
> Ahmed
>
> On Mon, Oct 2, 2023 at 12:59 PM Joey Tran wrote:
>
>> Does anyone have any recommendations on how to get yapf to play nicely
>> with Beam workflows? Left to its own devices, it absolutely destroys the
>> readab

Re: [PYTHON] Yapf configurations to prevent workflow mangling

2023-10-04 Thread Joey Tran
e Google style to do what I want; just wanted to update in case anyone happens to already have a solution.
On Wed, Oct 4, 2023 at 9:37 AM Joey Tran wrote:
> Huh, the default yapf settings _do_ seem to play nicely with Beam after
> all. I tried creating a bare dir and created a wordcount.p

Re: [QUESTION] Why no auto labels?

2023-10-05 Thread Joey Tran
> [1] Note that this applies to the fully qualified transform name, so the
> naming only has to be distinct within a composite transform (or at the top
> level--the pipeline itself is isomorphic to a single composite transform).
>
> On Wed, Oct 4, 2023 at 3:43 AM Joey Tran wro
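
To make the footnote concrete, a plain-Python model (not Beam's implementation) in which a full label is the `/`-joined path of enclosing composite labels and only full labels must be unique. Default labels are modeled as `Name(fn)`, matching the `Filter(identity_filter)` example earlier in the thread; `default_label` and `find_collisions` are hypothetical helpers.

```python
def default_label(transform_name, fn):
    # Mirrors the observed default label shape, e.g. Filter(identity_filter).
    return f"{transform_name}({fn.__name__})"


def find_collisions(applications):
    """Return full labels that appear more than once.

    `applications` is a list of (parent_path, label) pairs, where parent_path
    is a tuple of enclosing composite labels (empty tuple = top level).
    """
    seen, dupes = set(), []
    for parent_path, label in applications:
        full = "/".join((*parent_path, label))
        if full in seen:
            dupes.append(full)
        seen.add(full)
    return dupes


def identity_filter(x):
    return True


lbl = default_label("Filter", identity_filter)
# The same default label twice at the top level: a collision.
top_level = [((), lbl), ((), lbl)]
# The same label inside two different composites: distinct full labels.
nested = [(("CleanA",), lbl), (("CleanB",), lbl)]
print(find_collisions(top_level))  # ['Filter(identity_filter)']
print(find_collisions(nested))     # []
```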

Re: [QUESTION] Why no auto labels?

2023-10-05 Thread Joey Tran
's togglable
> with an option now. We should probably add the option to toggle Python too.
> (Unclear what the default should be, but this probably ties into
> re-thinking how pipeline update should work.)
>
> On Thu, Oct 5, 2023 at 4:58 AM Joey Tran wrote:
>
>> Makes

Re: [QUESTION] Why no auto labels?

2023-10-10 Thread Joey Tran
Bump on this. Sorry to pester; I'm trying to get a few teams to adopt Apache Beam at my company, and I'm trying to foresee parts of the API they might find inconvenient. If there's a conclusion to make the behavior similar to Java, I'm happy to put up a PR.
On Thu, Oct 5, 2023

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Joey Tran
behavior similar to Java, I'm happy
>> to put up a PR
>>
>> On Thu, Oct 5, 2023, 12:49 PM Joey Tran wrote:
>>
>>> Is it really toggleable in Java? I imagine that if it's a toggle it'd be
>>> a very sticky toggle since i

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Joey Tran
uld then attribute old B_2's state
> to the new B_2 (and also possibly misdirect any in-flight messages). At
> least with the old, intersecting names we can detect this problem
> rather than silently give corrupt data.
>
> On Fri, Oct 13, 2023 at 7:15 AM Joey Tran

Re: [QUESTION] Why no auto labels?

2023-10-13 Thread Joey Tran
On Fri, Oct 13, 2023 at 1:18 PM Robert Bradshaw wrote:
> On Fri, Oct 13, 2023 at 10:08 AM Joey Tran wrote:
>> Are there places on the SDK side that expect unique labels? Or in
>> non-updateable runners?
>>
>
> That's a good question. The label eventually en

Advanced Composite Transform Documentation

2023-10-19 Thread Joey Tran
For the Python SDK, is there somewhere where we document more "advanced" composite transform operations? E.g., I've been stumbling over questions like "How do I use a transform that expects a PBegin in a composite transform?", "What's the proper way to return multiple output PCollections?", "What's th

Re: [QUESTION] Why no auto labels?

2023-10-20 Thread Joey Tran
ed
[1] https://github.com/apache/beam/blob/e7a6405800a83dd16437b8b1b372e020e010a042/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java#L630
On Fri, Oct 13, 2023 at 1:32 PM Joey Tran wrote:
>
> On Fri, Oct 13, 2023 at 1:18 PM Robert Bradshaw wrote:
>
>> On Fri,

[python] Side Inputs to CombineGlobally

2024-03-08 Thread Joey Tran
In the Python SDK, should we be able to supply side inputs to CombineGlobally? I created an example here that fails at the pipeline translation stage:
https://play.beam.apache.org/?sdk=python&shared=vjM_k2TvNrf
It fails with: ``` File "/Users/jtran/builds/2024-2/build/internal/lib/python3.11/site

Re: Transform Pattern Question

2024-10-12 Thread Joey Tran
On Sat, 12 Oct 2024 at 08:51, XQ Hu via user wrote:
>>
>>> This sounds like what CDC (Change Data Capture) typically does, which
>>> usually runs as a streaming pipeline.
>>>
>>> On Fri, Oct 11, 2024 at 3:51 PM Joey Tran wrote:

Re: Transform Pattern Question

2024-10-15 Thread Joey Tran
e
> these external libraries flexible and testable.
>
> On Sat, Oct 12, 2024, 12:31 PM Joey Tran wrote:
>
>> Yes. But this is a hypothetical; there could also be many operations you
>> might want to do with the initial data.
>>
>> On Sat, Oct 12, 2024, 1:47 P

Transform Pattern Question

2024-10-11 Thread Joey Tran
? Should they have included a `SquareRoot.WithKey()` transform that ignores the key? This feels like it'd be a common pattern, but how to approach it feels awkward. Not sure if I'm missing something obvious, so I thought I'd ask the group. Cheers, Joey
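
One common way to reuse a value-only operation on keyed data is to lift the function so the key passes through untouched. A minimal plain-Python sketch; `with_key` is a hypothetical helper, and in a Beam pipeline the lifted function would be handed to something like `beam.Map`.

```python
import math


def with_key(fn):
    """Lift a value-only function to (key, value) pairs, leaving the key alone."""

    def keyed(kv):
        key, value = kv
        return key, fn(value)

    return keyed


keyed_sqrt = with_key(math.sqrt)
print([keyed_sqrt(kv) for kv in [("a", 4.0), ("b", 9.0)]])  # [('a', 2.0), ('b', 3.0)]
```

This spares a library author from shipping a `WithKey` variant of every element-wise transform, but it only covers element-wise functions; aggregations still need per-key variants such as `CombinePerKey`.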

Re: Globally, PerKey, and... PerGrouped?

2024-09-27 Thread Joey Tran
aven't thought too much about this but from looking at
> https://github.com/apache/beam/blob/c2c640f8c33071d5bb3e854e82c554c03a0bc851/sdks/python/apache_beam/transforms/combiners.py#L90,
> I could see us adding Mean.GroupedValues or Mean.PerGroupedValues there.
>
> On Fri, Sep 2

Re: Globally, PerKey, and... PerGrouped?

2024-09-27 Thread Joey Tran
ombiner)
On Fri, Sep 27, 2024 at 1:31 PM Valentyn Tymofieiev via user <user@beam.apache.org> wrote:
>
> On Fri, Sep 27, 2024 at 8:35 AM Joey Tran wrote:
>
>> Hey all,
>>
>> Just curious if this pattern comes up for others and if people have
>> wor

Globally, PerKey, and... PerGrouped?

2024-09-27 Thread Joey Tran
ouped_nums | Mean.PerGrouped()) keyed_counts = (grouped_nums | Count.PerGrouped()) ``` But these "PerGrouped" variants don't actually exist currently. Does anyone else run into this pattern often? I might be missing an obvious pattern here.
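
The missing "PerGrouped" behaviour amounts to applying a combine function to the already-grouped iterable in each `(key, values)` element. A plain-Python model of that semantics follows; `per_grouped` is a hypothetical name. In the Python SDK, `beam.CombineValues` may be the closest existing building block, since it applies a CombineFn to the values of `(key, iterable)` elements, but verify that against your SDK version.

```python
from statistics import mean


def per_grouped(fn):
    """Model of a hypothetical Combine.PerGrouped: reduce each group's values."""

    def apply(grouped):
        # `values` may be any iterable; materialize it so fns like len() work.
        return [(key, fn(list(values))) for key, values in grouped]

    return apply


grouped_nums = [("a", [1.0, 2.0, 3.0]), ("b", [10.0, 20.0])]
keyed_means = per_grouped(mean)(grouped_nums)
keyed_counts = per_grouped(len)(grouped_nums)
print(keyed_means)   # [('a', 2.0), ('b', 15.0)]
print(keyed_counts)  # [('a', 3), ('b', 2)]
```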

Non-time based windowing

2025-01-31 Thread Joey Tran
I have some use cases where I have some global-ish context, not based on time, that I'd like to partition my pipeline by. Does it seem reasonable to use windowing to encapsulate this kind of global context anyway? Contrived example: imagine I have a workflow for figuring out the highest scor

Re: Non-time based windowing

2025-02-07 Thread Joey Tran
I'll try my code on a more recent release to see if it works. Thanks for pointing that fix out.
On Fri, Feb 7, 2025 at 3:58 PM Joey Tran wrote:
> It is not different! I am using Beam version 2.50 and it looks like the
> changes associated with that test were in 2.56. Were custom w

Re: Non-time based windowing

2025-02-07 Thread Joey Tran
It is not different! I am using Beam version 2.50 and it looks like the changes associated with that test were in 2.56. Were custom window types not supported before then?
On Fri, Feb 7, 2025 at 3:44 PM Robert Bradshaw wrote:
> On Thu, Feb 6, 2025 at 8:39 AM Joey Tran wrote:
>

Re: Non-time based windowing

2025-02-06 Thread Joey Tran
>>
>> In practice, there's no design / implementation / API / protocol for
>> windows with a notion of completeness that is not event time. But IIRC in
>> early Spark Runner (and maybe today?) the checking of window completeness
>> was literally just querying state (

[python] Does python SDK support typechecking of NewTypes?

2025-01-24 Thread Joey Tran
Hey all, does the Python SDK support type checking with Python NewTypes? E.g., if you specify `UserId = NewType("UserId", str)`, and a transform expects UserIds, will it fail if it gets a PCollection of strs? My tests say no, but I just wanted to double-check. If not, are there plans to?

Re: [python] Does python SDK support typechecking of NewTypes?

2025-01-24 Thread Joey Tran
Never mind, I see now that it is not implemented; it is tracked by this issue: https://github.com/apache/beam/issues/20076
On Fri, Jan 24, 2025 at 5:16 PM Joey Tran wrote:
> Hey all,
>
> Does the python sdk support type checking with python NewTypes? e.g. if
> you specify `UserId = New
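
Part of why runtime checking cannot catch this: `typing.NewType` is erased at runtime, so a `UserId` value is just a `str`. A small standard-library demonstration, independent of Beam:

```python
from typing import NewType

UserId = NewType("UserId", str)

uid = UserId("alice")
plain = "alice"

# At runtime NewType just returns its argument, so the two are indistinguishable.
print(type(uid) is str, uid == plain)  # True True
# The declared base type is still recorded on the NewType object itself.
print(UserId.__supertype__)  # <class 'str'>
```

Any distinction between `UserId` and `str` therefore has to come from declared type hints rather than from inspecting the elements themselves.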

Re: CombinePerKey with hot key fanout

2025-01-17 Thread Joey Tran
bucket). This will result in increased shuffling and lower efficiency.
>
> Best,
>
> Jan
>
> On 1/16/25 22:01, Joey Tran wrote:
> > Hi,
> >
> > I've read the documentation for CombinePerKey with hot key fanout and
> > I think I understand it at a hi

CombinePerKey with hot key fanout

2025-01-16 Thread Joey Tran
Hi, I've read the documentation for CombinePerKey with hot key fanout, and I think I understand it at a high level (split up and combine sharded keys before merging all values in one key), but I'm confused by the parameter this method takes and how it affects the behavior of the transform. Is
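
A plain-Python simulation of the two-stage combine behind hot key fanout, assuming the parameter (as in Python's `CombinePerKey(...).with_hot_key_fanout(fanout)`) is the number of intermediate sub-keys each key's values are spread across. Beam assigns sub-keys randomly; this sketch uses round-robin so the result is deterministic. A larger fanout gives more parallel partial combines per hot key, at the cost of more partial accumulators to shuffle and merge.

```python
from collections import defaultdict


def combine_with_fanout(pairs, fanout, create, add, merge, extract):
    """Simulate CombinePerKey with hot key fanout on (key, value) pairs.

    Returns (per-key results, number of intermediate partial accumulators).
    """
    # Stage 1: spread each key over `fanout` sub-keys and pre-combine each one.
    partial = {}
    counters = defaultdict(int)
    for key, value in pairs:
        shard = counters[key] % fanout  # Beam picks a random shard instead.
        counters[key] += 1
        acc = partial.get((key, shard), create())
        partial[(key, shard)] = add(acc, value)
    # Stage 2: drop the shard and merge the partial accumulators per key.
    by_key = defaultdict(list)
    for (key, _shard), acc in partial.items():
        by_key[key].append(acc)
    results = {key: extract(merge(accs)) for key, accs in by_key.items()}
    return results, len(partial)


# A mean combiner whose accumulator is a (sum, count) tuple.
def create():
    return 0, 0


def add(acc, v):
    return acc[0] + v, acc[1] + 1


def merge(accs):
    return sum(a[0] for a in accs), sum(a[1] for a in accs)


def extract(acc):
    return acc[0] / acc[1]


pairs = [("hot", v) for v in range(1, 101)] + [("cold", 5)]
means, n_partials = combine_with_fanout(pairs, 4, create, add, merge, extract)
print(means)       # {'hot': 50.5, 'cold': 5.0}
print(n_partials)  # 5: four sub-keys for 'hot', one for 'cold'
```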

Re: [python] is merge_accumulators called with stream of accumulators?

2025-03-14 Thread Joey Tran
My intuition says no: all the accumulators will just be grouped together into a single key/accumulators element, and that entire element will get loaded into memory, but I'm not sure if there's any kind of special magic to handle this.
On Tue, Mar 11, 2025 at 11:16 PM Joey Tran wrote:
>

Re: Beam YAML is great!

2025-04-29 Thread Joey Tran
for Beam 3.0 as well.
>
> On Tue, Apr 29, 2025 at 7:39 PM Ahmet Altay via user wrote:
>
>> Great to hear and thank you for the feedback Joey!
>>
>> Would you be interested in publishing a case study on Beam's website? We
>> will all very much ap

Beam YAML is great!

2025-04-29 Thread Joey Tran
forms, and error_handling). Just wanted to give some positive feedback. Thanks to all who worked on it!

Beam YAML Side Inputs?

2025-05-01 Thread Joey Tran
Are side inputs supported with Beam YAML? If not, is there a plan to support them, or will they never be supported? Best, Joey

[python] Beam Education Material for Workshops

2025-03-08 Thread Joey Tran
Hey all, we're starting to adopt Beam more widely amongst our engineers, so we're trying to put together a workshop to teach Beam, as it's proven a little bit difficult for some developers to get started on their own. Just wanted to see if there are any slide decks in the community for this kind of

Re: [python] Beam Education Material for Workshops

2025-03-11 Thread Joey Tran
ot particular links for you but I hope they can give you some
>> places to check out.
>>
>> On Sat, Mar 8, 2025 at 12:45 PM Joey Tran wrote:
>>
>>> Hey all,
>>>
>>> We're starting to adopt Beam more widely amongst

[python] is merge_accumulators called with stream of accumulators?

2025-03-11 Thread Joey Tran
Hey all, just wondering whether CombineFn.merge_accumulators is called with a lazy iterator of accumulators or if it's expected to be called with all the accumulators in memory? Best, Joey
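
Whatever the runner actually passes, a defensive `merge_accumulators` can treat its argument as a one-pass iterable and fold it without building a list, so it behaves the same for a list or a lazy iterator. A standalone sketch shaped like `beam.CombineFn`'s methods; `MeanFn` is hypothetical and does not subclass `CombineFn`, so it runs without Beam installed.

```python
class MeanFn:
    """CombineFn-shaped mean; accumulators are (sum, count) tuples."""

    def create_accumulator(self):
        return 0.0, 0

    def add_input(self, acc, x):
        return acc[0] + x, acc[1] + 1

    def merge_accumulators(self, accumulators):
        # Iterate exactly once; never index, call len() on, or re-iterate the argument.
        total, count = 0.0, 0
        for s, c in accumulators:
            total += s
            count += c
        return total, count

    def extract_output(self, acc):
        return acc[0] / acc[1] if acc[1] else float("nan")


fn = MeanFn()
# Two partial accumulators: one over inputs 1 and 2, one over 5 and 6.
parts = [fn.add_input(fn.add_input(fn.create_accumulator(), x), x + 1) for x in (1, 5)]
# A generator can only be consumed once, like a lazy stream of accumulators.
merged = fn.merge_accumulators(acc for acc in parts)
print(fn.extract_output(merged))  # 3.5
```

If the caller does stream accumulators, this version holds only the running total in memory rather than the whole collection.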

[Python] Proper way to type dofns with multiple output tags?

2025-05-09 Thread Joey Tran
Is the proper way just to type it based on the main output?
    def process(self, x) -> str:
        yield "x"
        yield TaggedOutput("numbers", 5)

Re: [Python] Proper way to type dofns with multiple output tags?

2025-05-09 Thread Joey Tran
d for
> multiple-output Fns (though I think perhaps Jack was looking into this?)
>
> On Fri, May 9, 2025 at 2:40 PM Joey Tran wrote:
>
>> Is it to just type it based on the main output?
>> def process(self, x) -> str:
>>     yield "x"
>>     yield TaggedOutput("numbers", 5)
>>

Twister2 Runner

2025-06-27 Thread Joey Tran
Hi, I was just wondering whether Twister2 and the Twister2 runner are still maintained or used. The Twister2 repo [1] hasn't seen a change in 5 years, and the Twister2 runner directory has had a similarly low level of activity [2].
[1] https://github.com/cylondata/twister2
[2] https://github.com/apache/be