Hi everybody,

I would like to hear your thoughts on which technique should be used in
Apache Beam for the following problem:

*Problem definition*:

I have two streams of data, one with pageviews of users and another with
requests of the users. They share the key session_id, which identifies the
user's session, but each stream has additional data of its own.

The goal is to append the number of pageviews in a session to the requests
of that session. That means I want a stream of data that has every
request together with the number of pageviews before the request. It
suffices to have the pageviews of, let's say, the last 5 minutes, and it is
not important to have all the pageviews if there is late data. The main
requirement is that the requests are emitted with low latency.
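
To make the desired output concrete, here is a minimal plain-Python sketch
of the result I am after. It is not Beam code and ignores streaming,
triggers and late data. Only session_id comes from the real data; the field
names ts and path, the function name, and the choice to exclude a pageview
that has exactly the same timestamp as the request are assumptions made for
illustration.

```python
from bisect import bisect_left
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # "the last 5 minutes" from the description


def attach_pageview_counts(pageviews, requests, window=WINDOW_SECONDS):
    """Return each request plus the number of pageviews of the same
    session seen in the `window` seconds before that request."""
    # Group pageview timestamps by session_id and sort each group once.
    by_session = defaultdict(list)
    for pv in pageviews:
        by_session[pv["session_id"]].append(pv["ts"])
    for timestamps in by_session.values():
        timestamps.sort()

    enriched = []
    for req in requests:
        timestamps = by_session.get(req["session_id"], [])
        # Count pageviews with req.ts - window <= ts < req.ts
        # (assumption: a pageview at exactly req.ts is not counted).
        lo = bisect_left(timestamps, req["ts"] - window)
        hi = bisect_left(timestamps, req["ts"])
        enriched.append({**req, "pageview_count": hi - lo})
    return enriched


# Hypothetical sample data; timestamps are seconds.
pageviews = [
    {"session_id": "a", "ts": 0},
    {"session_id": "a", "ts": 100},
    {"session_id": "a", "ts": 400},
    {"session_id": "b", "ts": 50},
]
requests = [
    {"session_id": "a", "ts": 420, "path": "/checkout"},
    {"session_id": "b", "ts": 40, "path": "/cart"},
]
result = attach_pageview_counts(pageviews, requests)
```

In the streaming case the same per-session count would have to be kept up
to date as pageviews arrive, which is where my question about the right
Beam technique comes in.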

What would be the appropriate technique? Side inputs? CoGroupByKey? Here
are my first attempts:
https://stackoverflow.com/questions/65625961/windowed-joins-in-apache-beam

Kind regards,
Hendrik Gruß

----
Hendrik Gruß
Data Engineer
Diginet Gmbh & Co. KG
