Re: Keeping keys in a state for a very very long time (keys expiry unknown)

2020-04-06 Thread Reza Ardeshir Rokni
Are you able to make use of the following pattern? Store StateA-metadata until no activity for Duration X, you can use a Timer to check this, then expire the value, but store in an external system. If you get a record that does want this value after expiry, call out to the external system and stor

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

2020-04-06 Thread Reza Ardeshir Rokni
, Apr 6, 2020 at 5:37 PM Reza Ardeshir Rokni > wrote: > >> Are you able to make use of the following pattern? >> >> Store StateA-metadata until no activity for Duration X, you can use a >> Timer to check this, then expire the value, but store in an >> external

Re: Keeping keys in a state for a very very long time (keys expiry unknown)

2020-04-06 Thread Reza Ardeshir Rokni
Since I wanted to have a long window size > per key and since we can't use state with session windows, I am using a > sliding window for let's say 72 hrs which starts every hour. > > Thanks a lot Reza for your input. > > Regards > Mohil > > On Mon, Apr 6,

Re: Non-trivial joins examples

2020-05-03 Thread Reza Ardeshir Rokni
A couple of things that are really nice here, 1- Domain specific (CTR in your example). We may find that eventually it's not possible / practical to build out generic joins for all situations. But with the primitives available in Beam and good 'patterns' domain specific joins could be added for di

Re: PubSub latency for Beam pipeline on Flink runner

2020-05-12 Thread Reza Ardeshir Rokni
Hi Vincent, Did you mean <=3000 or did you want that to be <=3? Cheers Reza On Fri, 8 May 2020 at 04:23, Vincent Domingues < vincent.doming...@dailymotion.com> wrote: > Hi all, > > We are trying to work with Beam on Flink runner to consume PubSub messages. > > We are facing latency issue ev

Re: Re: Join daily update Bigquery table with pubsub topic message

2020-05-23 Thread Reza Ardeshir Rokni
If things fit in memory please have a look at the following pattern: https://beam.apache.org/documentation/patterns/side-inputs/#slowly-updating-global-window-side-inputs Note there is a nicer API coming for this pattern, https://issues.apache.org/jira/browse/BEAM-9650 On Sun, 24 May 2020 at

Re: Limiting elements on streaming pipelines

2020-11-03 Thread Reza Ardeshir Rokni
Hi, You may want to use more than one element in your Create to start the FlatMap process as with a runner that does Fusion , the code will end up only being able to parallelize to 1. So make use of a Create wi

Re: Session window ad sideinput

2020-12-22 Thread Reza Ardeshir Rokni
Hi, Why the need for session windows? Could you make use of a Global Window for the side input, as per the following pattern: https://beam.apache.org/documentation/patterns/side-inputs/ Cheers Reza On Tue, 22 Dec 2020 at 01:17, Manninger, Matyas wrote: > Dear Beam users, > > I am writing a

Re: Session window ad sideinput

2021-01-04 Thread Reza Ardeshir Rokni
hanks for the suggestion, that is the solution I was going for but > unfortunately PeriodicImpulse has some bugs. I also posted a question about > that in this mail list but no success there so far so I am looking for > alternatives. > > On Tue, 22 Dec 2020 at 12:36, Reza Ardeshir

Re: Looping timer, global windows, and direct runner

2021-01-12 Thread Reza Ardeshir Rokni
Are you making use of TestStream for your Unit test? On Wed, 13 Jan 2021 at 00:27, Raman Gupta wrote: > Your reply made me realize I removed the condition from my local copy of > the looping timer that brings the timer forward if it encounters an earlier > element later in the stream: > > if (cu

Re: Looping timer, global windows, and direct runner

2021-01-13 Thread Reza Ardeshir Rokni
Hi, Late data, in general, can be problematic for the looping timer pattern, as well as other use cases, where the arrival of data in the @process function creates a EventTime domain Timer. The solution you have, which essentially passes it through, is a nice option, another solution would be to d

How to isSet() check for a Timer

2018-11-02 Thread Reza Ardeshir Rokni
Hi, I have a need to set an alarm in both the Element DoFn as well as the OnTimer code block. I need to ensure that I do not overwrite a already set timer, is there a way to check if a Timer has been set? One thought was to use a ValueState which I can manually set / unset as part of the operatio

Re: Inferring Csv Schemas

2018-11-30 Thread Reza Ardeshir Rokni
Hi Joe, You may find some of the info in this blog of interest, its based on streaming pipelines but useful ideas. https://cloud.google.com/blog/products/gcp/how-to-handle-mutating-json-schemas-in-a-streaming-pipeline-with-square-enix Cheers Reza On Thu, 29 Nov 2018 at 06:53, Joe Cullen wrote

Re: Joining bounded and unbounded data not working using non-global window

2018-12-10 Thread Reza Ardeshir Rokni
Hi, A couple of thoughts; 1- If the amount of data in Hbase that you need to join with is small and does not change, could you use a Side Input? If it does change you could try making use of pattern slowly changing lookup cache (ref below). 2- If the amount of data is large, would a direct hbase

OnTimer / OnProcess timing

2018-12-14 Thread Reza Ardeshir Rokni
Hi, I believe a bug in my timeseries code is because of something I missed in the sequence of OnTimer / ProcessElement when in stream mode. If a timer has been set at the lower boundary of a window and elements arrive in that windows keyed state, which will fire first? The @OnTimer or @ProcessEle

Re: OnTimer / OnProcess timing

2018-12-15 Thread Reza Ardeshir Rokni
rs > are > fired when they are ready. It is best not to make assumptions based on > when > elements arrive which belong to the same window. However, you can be sure > that > timers fire after they become eligible. > > Thanks, > Max > > On 14.12.18 10:43, Reza Ar

Re: GroupByKey and number of workers

2019-01-02 Thread Reza Ardeshir Rokni
Hi Mohamed, I believe this is related to fusion which is a feature of some of the runners, you will be able to find more information on fusion on: https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#fusion-optimization Cheers Reza On Thu, 3 Jan 2019 at 04:09, Mohamed Haseeb wro

Re: Beam Summits!

2019-01-04 Thread Reza Ardeshir Rokni
Hi, Are there any other folk here based out of Singapore, or APAC in general? Cheers Reza On Fri, 4 Jan 2019 at 04:39, Austin Bennett wrote: > Hi Matthias, etc, > > Trying to get thoughts on formalizing a process for getting proposals > together. I look forward to the potential day that there

Re: clear State using business logic

2022-12-09 Thread Reza Ardeshir Rokni
Have you explored processing time timers? https://beam.apache.org/releases/javadoc/2.43.0/org/apache/beam/sdk/state/TimeDomain.html#PROCESSING_TIME On Wed, 23 Nov 2022 at 13:46, Sigalit Eliazov wrote: > Hi all, > > the flow in our pipeline is: > > 1. read event X from kafka. open fixed window o

Re: [Question] state scope to only key

2023-01-26 Thread Reza Ardeshir Rokni
Hi, For these types of use cases, folks will generally make use of the Global Window which is -/+ inf and Timers. Some key considerations when using the Global Window: 1- GC is not done by the system as the window will never close. 2- There are no order guarantees, so you will often need to make

Re: [E] Re: [Question] state scope to only key

2023-01-26 Thread Reza Ardeshir Rokni
certain max > count, the newer events are skipped until the state is cleared after the > throttle period. I am looking at something similar to a stateful keyed > parDo so that all events of same key go to the same worker (assuming state > is local to worker as in flink) > > On

Re: [E] Re: [Question] state scope to only key

2023-01-26 Thread Reza Ardeshir Rokni
PS for the elements that flow through when < x you will need to add a data driven trigger to after the global window. On Thu, 26 Jan 2023 at 20:11, Reza Ardeshir Rokni wrote: > So it sounds like the timestamp of the event is not important here? If > that is correct then order is not