Re: Options for visualizing the pipeline DAG

2023-09-01 Thread Danny McCormick via user
Hey Joey, Dataflow and the Beam Playground are two options, as you mentioned. Locally, many SDKs have local runner options with a visual component. For example, in Python you can use the interactive runner with the apache-beam-jupyterlab-sidepanel extension

Re: Issue with growing state/checkpoint size

2023-09-01 Thread Sachin Mittal
Yes, a very high and non-deterministic cardinality can make the stored state of a join operation unbounded. In my case we knew the cardinality and it was not very high, so we could go with a lookup-based approach using Redis to enrich the stream and avoid joins. On Wed, Aug 30, 2023 at 5:04 AM Ruben
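A rough sketch of the lookup-based enrichment described above, written in plain Python rather than against the Beam or Redis APIs. The function name `enrich_stream`, the field names `user_id` and `user_name`, and the in-memory `reference_data` dict standing in for the Redis store are all hypothetical, chosen only for illustration. The idea it shows is that each event is enriched by a point lookup, so no per-key join state has to be kept by the pipeline itself.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def enrich_stream(
    events: Iterable[Dict[str, str]],
    lookup: Callable[[str], Optional[str]],
    key_field: str = "user_id",    # hypothetical field name
    out_field: str = "user_name",  # hypothetical field name
) -> Iterator[Dict[str, Optional[str]]]:
    """Attach reference data to each event via a point lookup instead of a join."""
    # Local cache of lookup results. This only stays small because the key
    # cardinality is assumed to be known and low, as in the message above.
    cache: Dict[str, Optional[str]] = {}
    for event in events:
        key = event[key_field]
        if key not in cache:
            cache[key] = lookup(key)  # one call to the external store per new key
        enriched: Dict[str, Optional[str]] = dict(event)  # leave the input untouched
        enriched[out_field] = cache[key]  # None when the key is unknown
        yield enriched


# In-memory stand-in for the external store (Redis in the message above).
reference_data = {"u1": "Ada", "u2": "Grace"}

events = [
    {"user_id": "u1", "action": "click"},
    {"user_id": "u2", "action": "view"},
    {"user_id": "u1", "action": "buy"},
    {"user_id": "u9", "action": "view"},  # no reference entry for this key
]

enriched = list(enrich_stream(events, reference_data.get))
```

In a Beam pipeline the same logic would presumably live in a `DoFn`, with the store client created in `setup()` and the lookup done in `process()`; that wiring is left out here to keep the sketch self-contained.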

Re: Issue with growing state/checkpoint size

2023-09-01 Thread Ruben Vargas
Ohh, I see. That makes sense. Wondering if there is a strategy for my use case, where I have an ID unique per pair of messages. Thanks for all your help! On Fri, Sep 1, 2023 at 6:51 AM Sachin Mittal wrote: > Yes a very high and non deterministic cardinality can make the stored > state of join op

Re: Issue with growing state/checkpoint size

2023-09-01 Thread Byron Ellis via user
Depends on why you're using a fan-out approach in the first place. You might actually be better off doing all the work at the same time. On Fri, Sep 1, 2023 at 6:43 AM Ruben Vargas wrote: > Ohh I see > > That makes sense. Wondering if there is an strategy for my use case, where > I have an ID un

Re: Options for visualizing the pipeline DAG

2023-09-01 Thread Joey Tran
Perfect, the `pipeline_graph` Python module in the Stack Overflow post [1] was exactly what I was looking for. The dependencies I'm working with are a bit heavyweight and likely difficult to install into a notebook, so I was looking for something I could do on my local machine. Thanks! Joey [1] - htt

Re: Options for visualizing the pipeline DAG

2023-09-01 Thread Robert Bradshaw via user
You can also use Python's RenderRunner, e.g.

    python -m apache_beam.examples.wordcount --output out.txt \
        --runner=apache_beam.runners.render.RenderRunner \
        --render_output=pipeline.svg

This also has an interactive mode, triggered by passing --port=N (where 0 can be used to pick an unuse

Re: Options for visualizing the pipeline DAG

2023-09-01 Thread Robert Bradshaw via user
(As an aside, I think all of these options would make for a great blog post if anyone is interested in authoring one of those...) On Fri, Sep 1, 2023 at 9:26 AM Robert Bradshaw wrote: > You can also use Python's RenderRunner, e.g. > > python -m apache_beam.examples.wordcount --output out.txt \