Hi everybody, I would like to hear your thoughts on which technique to use in Apache Beam for the following problem:
*Problem definition*: I have two streams of data, one with users' pageviews and another with users' requests. They share the key session_id, which identifies the user's session, but each stream carries additional data of its own.

The goal is to append the number of pageviews in a session to the requests of that session. That is, I want an output stream containing every request together with the number of pageviews that occurred before that request. It suffices to count the pageviews of, say, the last 5 minutes, and it is not important to count every pageview if some data arrives late. Low latency matters only for the requests.

What would be the appropriate technique? Side inputs? CoGroupByKey?

Here are my first attempts: https://stackoverflow.com/questions/65625961/windowed-joins-in-apache-beam

Kind regards,
Hendrik Gruß
Data Engineer, Diginet GmbH & Co. KG
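
For reference, the desired output can be pinned down with a small non-streaming sketch in plain Python. This is not a Beam pipeline and says nothing about latency or late data; it only illustrates the join semantics described above. The function name `enrich_requests`, the dict fields `ts` and `pageview_count`, and the choice to count pageviews with timestamps in the closed interval from five minutes before the request up to the request itself are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # "the last 5 minutes" from the problem definition


def enrich_requests(pageviews, requests, window=WINDOW_SECONDS):
    """Attach to each request the number of pageviews of the same session
    whose timestamp lies in [request_ts - window, request_ts].

    Hypothetical record layout: dicts with "session_id" and "ts" (seconds).
    """
    # Collect pageview timestamps per session and sort them for binary search.
    by_session = defaultdict(list)
    for pv in pageviews:
        by_session[pv["session_id"]].append(pv["ts"])
    for stamps in by_session.values():
        stamps.sort()

    enriched = []
    for req in requests:
        stamps = by_session.get(req["session_id"], [])
        lo = bisect_left(stamps, req["ts"] - window)  # first pageview inside the window
        hi = bisect_right(stamps, req["ts"])          # first pageview after the request
        enriched.append({**req, "pageview_count": hi - lo})
    return enriched


# Made-up example data, only to show the shape of the input and output.
example_pageviews = [
    {"session_id": "a", "ts": 0},
    {"session_id": "a", "ts": 100},
    {"session_id": "a", "ts": 350},
    {"session_id": "b", "ts": 200},
]
example_requests = [
    {"session_id": "a", "ts": 360, "path": "/checkout"},
    {"session_id": "b", "ts": 150, "path": "/cart"},
]
example_output = enrich_requests(example_pageviews, example_requests)
```

In this example the request of session "a" at ts=360 gets a count of 2 (the pageviews at 100 and 350; the one at 0 is older than five minutes), and the request of session "b" gets 0 because its only pageview happens after the request. A streaming solution would need to produce the same counts per request while the pageviews are still arriving.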