It’s more like I have a window and I create a side input for that window.
Once I am done processing the window, I want to discard that side input
and create new ones for subsequent windows.
On Tue, May 15, 2018 at 21:54 Harshvardhan Agrawal <
harshvardhan.ag...@gmail.com> wrote:
In my case, since I am performing a lookup for some reference data that can
change periodically, I can’t really have a global window. I would want to
update/re-look up the data per window.
On Tue, May 15, 2018 at 21:06 Raghu Angadi wrote:
You are applying windowing to 'billingDataPairs' in the example above. A side
input pairs with all the main-input windows that exactly match, or fall
completely within, the side-input window. The common use case is a side
input defined in the default global window, which matches all of the main-input
windows.
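A minimal sketch of that common case with the Beam Java SDK, assuming a
hypothetical accountInfo PCollection of KV<String, AccountInfo> left in the
default global window; AccountInfo, EnrichFn, and the window size are
placeholders and the usual Beam imports are omitted:

    // Side input built in the default (global) window: it pairs with
    // every window of the main input.
    PCollectionView<Map<String, AccountInfo>> accountsView =
        accountInfo.apply(View.<String, AccountInfo>asMap());

    // Main input in fixed windows: each of these windows falls within the
    // global window, so the side input is visible from all of them.
    billingDataPairs
        .apply(Window.<KV<String, BillingModel>>into(
            FixedWindows.of(Duration.standardMinutes(5))))
        // EnrichFn is a placeholder DoFn that reads the map via
        // c.sideInput(accountsView).
        .apply("Enrich", ParDo.of(new EnrichFn(accountsView))
            .withSideInputs(accountsView));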
Thank you for your help!
On Tue, May 15, 2018 at 20:05 Lukasz Cwik wrote:
Yes, that is correct.
On Tue, May 15, 2018 at 4:40 PM Harshvardhan Agrawal <
harshvardhan.ag...@gmail.com> wrote:
Got it.
Since I am not applying any windowing strategy to the side input, does Beam
automatically pick up the windowing strategy for the side input from the
main input? By that I mean the scope of the side input would be per-window,
and it would be different for every window. Is that correct?
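For reference, a minimal sketch of the per-window variant being described,
assuming the side-input PCollection (a hypothetical accountInfo of
KV<String, AccountInfo>) is put into the same fixed windows as the main input:

    // Give the side input the same windowing as the main input, so each
    // main-input window pairs with exactly one side-input window and the
    // map is rebuilt per window.
    PCollectionView<Map<String, AccountInfo>> perWindowAccounts =
        accountInfo
            .apply(Window.<KV<String, AccountInfo>>into(
                FixedWindows.of(Duration.standardMinutes(5))))
            .apply(View.<String, AccountInfo>asMap());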
Using deduplicate + side inputs will allow you to have a consistent view of
the account information for the entire window, which can be nice since it
gives consistent processing semantics, but using a simple in-memory cache to
reduce the amount of lookups will likely be much simpler and easier to debug.
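A minimal sketch of the in-memory-cache alternative, assuming hypothetical
AccountInfo/EnrichedBilling classes and a fetchAccountFromDb() placeholder for
the external lookup (imports omitted); the cache lives only as long as the
DoFn instance on a given worker:

    class EnrichWithCacheFn extends DoFn<KV<String, BillingModel>, EnrichedBilling> {

      // Per-instance cache: cuts down repeated lookups, but gives no
      // cross-worker or per-window consistency guarantees.
      private transient Map<String, AccountInfo> cache;

      @Setup
      public void setup() {
        cache = new HashMap<>();
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        String accountId = c.element().getKey();
        AccountInfo account = cache.computeIfAbsent(accountId, this::fetchAccountFromDb);
        c.output(new EnrichedBilling(c.element().getValue(), account));
      }

      private AccountInfo fetchAccountFromDb(String accountId) {
        // Placeholder: query the external datastore here.
        throw new UnsupportedOperationException("not implemented in this sketch");
      }
    }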
Thanks Raghu!
Lukasz,
Do you think lookups would be a better option than side inputs in my case?
On Tue, May 15, 2018 at 16:33 Raghu Angadi wrote:
It should work. I think you need to apply Distinct before looking up the
account info:

    billingDataPairs
        .apply(Keys.create())
        .apply(Distinct.create())
        .apply("LookupAccounts", ...)

Note that all of the accounts end up in a single in-memory map, so the
account data should be small enough for that.
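A hedged sketch of how that suggestion might be wired up end to end;
AccountInfo, EnrichedBilling, and LookupAccountFn (a DoFn<String,
KV<String, AccountInfo>> that queries the external store) are placeholder
names, not from the thread:

    // 1. Deduplicate the account ids and look each one up exactly once.
    PCollection<KV<String, AccountInfo>> accounts =
        billingDataPairs
            .apply(Keys.<String>create())
            .apply(Distinct.<String>create())
            .apply("LookupAccounts", ParDo.of(new LookupAccountFn()));

    // 2. Materialize the lookups as a map-valued side input.
    PCollectionView<Map<String, AccountInfo>> accountsView =
        accounts.apply(View.<String, AccountInfo>asMap());

    // 3. Join the account info back onto the main input via the side input.
    PCollection<EnrichedBilling> enriched =
        billingDataPairs.apply("Enrich", ParDo.of(
            new DoFn<KV<String, BillingModel>, EnrichedBilling>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                AccountInfo account =
                    c.sideInput(accountsView).get(c.element().getKey());
                c.output(new EnrichedBilling(c.element().getValue(), account));
              }
            }).withSideInputs(accountsView));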
On Tue, May 15, 2018, Harshvardhan Agrawal <harshvardhan.ag...@gmail.com> wrote:
Well, ideally we would. I actually simplified the example a little. In the
actual pipeline I have multiple reference datasets. Say I have a tuple of
Account and Product as the key. The reason we don’t do the lookup in the DoFn
directly is that we don’t want to look up the data for the same account or
same product multiple times.
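A small sketch of deduplicating such a composite key before the lookups,
assuming the key is modelled as a KV<String, String> of (accountId, productId)
and that billingByAccountProduct, ReferenceData, and LookupReferenceFn are
placeholder names:

    // Keys of the main input are (accountId, productId) pairs in this
    // variant; deduplicate them so each combination is looked up only once.
    PCollection<KV<String, String>> distinctKeys =
        billingByAccountProduct
            .apply(Keys.<KV<String, String>>create())
            .apply(Distinct.<KV<String, String>>create());

    PCollection<KV<KV<String, String>, ReferenceData>> reference =
        distinctKeys.apply("LookupReference", ParDo.of(new LookupReferenceFn()));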
Is there a reason you don't want to read the account information directly
from the datastore within the DoFn? It seems like that would be your
simplest approach.
On Tue, May 15, 2018 at 12:43 PM Harshvardhan Agrawal <
harshvardhan.ag...@gmail.com> wrote:
Hi,
No, we don’t receive any such information from Kafka.
The account information in the external store does change. Every time we
have a change in the account information, we will have to recompute all the
billing info. Our source systems will make sure that they publish messages
for those accounts.
For each BillingModel you receive over Kafka, how "fresh" should the
account information be?
Does the account information in the external store change?
On Tue, May 15, 2018 at 11:22 AM Harshvardhan Agrawal <
harshvardhan.ag...@gmail.com> wrote:
Hi,
We have certain billing data that arrives from Kafka. The billing data is in
JSON and it contains an account ID. In order for us to generate the final
report, we need to use some account data that is associated with the account
ID and stored in an external database.
It is possible that we get
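For context, a minimal sketch of that ingestion step with KafkaIO, assuming
the JSON is parsed into the BillingModel mentioned elsewhere in the thread by
a hypothetical parseBillingJson() helper and getAccountId() accessor; broker
and topic names are placeholders, and the usual Beam/Kafka imports are
omitted:

    PCollection<KV<String, BillingModel>> billingDataPairs = pipeline
        // Read raw JSON billing records from Kafka.
        .apply(KafkaIO.<String, String>read()
            .withBootstrapServers("broker:9092")
            .withTopic("billing-events")
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            .withoutMetadata())
        .apply(Values.<String>create())
        // Parse each record and key it by its account ID for the later join.
        .apply("ParseAndKeyByAccount",
            ParDo.of(new DoFn<String, KV<String, BillingModel>>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                BillingModel billing = parseBillingJson(c.element());  // hypothetical parser
                c.output(KV.of(billing.getAccountId(), billing));
              }
            }));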
Hi,
I am currently working on building a pipeline using Apache Beam on top of
Flink. One of the things we observed is that the DAG that gets created in
the Flink UI doesn't contain the names we gave to the PTransforms. As a
result it becomes hard to debug the pipeline or even understand the DAG.
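For reference, the names in question are the ones passed as the first
argument to apply(); a minimal sketch with placeholder transform and DoFn
names (kafkaSource, ParseBillingFn, EnrichFn are not from the thread):

    // Each apply() is given an explicit name; these are the labels the
    // poster expects to see on the Flink UI's job graph.
    pipeline
        .apply("ReadBillingFromKafka", kafkaSource)
        .apply("ParseJson", ParDo.of(new ParseBillingFn()))
        .apply("EnrichWithAccounts", ParDo.of(new EnrichFn()));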
Hi Guys,
I am currently in the process of developing a pipeline using Apache Beam
with Flink as the execution engine. As part of the process, I read data
from Kafka and perform a bunch of transformations that involve joins and
aggregations, as well as lookups to an external DB.
The idea is that we wa
Also,
3. With Spark 2.x Structured Streaming, if we want to switch across
different modes, like from micro-batching to continuous streaming mode, how
can that be done while using Beam?
These are some of the initial questions which I am not able to figure out
currently.
Regards,
Chandan
Hi Everyone,
I have just started exploring and understanding Apache Beam for a new project
in my firm.
In particular, we have to decide whether to implement our product over
Spark Streaming (as Spark batch is already in our ecosystem) or whether we
should use Beam over the Spark runner to have future lib