… well yes and no:

  *   If the second table is a small table used only for enrichment, you can also mark it as a broadcast table, but I don’t know how to do that in the Table API
  *   If the second table carries significant data and frequent updates, then you need to configure watermarking/event-time semantics on the second table as well
  *   The logic is this:
     *   Your join operator only emits the output for a window once event time has passed the end of that time window
     *   The event time (watermark) of your join operator is the minimum watermark of all its inputs
     *   Because your second table does not emit watermarks, its watermark remains at Long.MinValue, hence the operator’s time stays there as well
  *   Another way to make progress, in case your second table does not update watermarks/data often enough, is to mark the source with an idle watermark generator, in which case it is treated as ‘timeless’ and does not hold back time progress in your join operator
     *   Again, not sure how to configure this
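
To make the logic above concrete, here is a minimal plain-Java sketch with no Flink dependency. It only models the rule described in the bullets: the operator's watermark is the minimum over its non-idle inputs, and a window can fire once that watermark reaches the window end. The names (WatermarkSketch, operatorWatermark, windowFires) are made up for illustration, and the firing condition is simplified compared to what Flink does internally.

```java
final class WatermarkSketch {

    /**
     * Hypothetical model of a multi-input operator: its watermark is the
     * minimum watermark over all inputs that are not marked idle.
     */
    static long operatorWatermark(long[] inputWatermarks, boolean[] idle) {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int i = 0; i < inputWatermarks.length; i++) {
            if (idle[i]) {
                continue; // idle inputs do not hold back event time
            }
            anyActive = true;
            min = Math.min(min, inputWatermarks[i]);
        }
        // Assumption of this sketch: with no active input, time does not advance.
        return anyActive ? min : Long.MIN_VALUE;
    }

    /** Simplified firing rule: the window fires once the watermark reaches its end. */
    static boolean windowFires(long windowEndMillis, long operatorWatermark) {
        return operatorWatermark >= windowEndMillis;
    }

    public static void main(String[] args) {
        long windowEnd = 1_000L;
        // Input 0: first table, with watermarks. Input 1: second table, never emits one.
        long[] watermarks = {5_000L, Long.MIN_VALUE};

        long stuck = operatorWatermark(watermarks, new boolean[] {false, false});
        System.out.println(windowFires(windowEnd, stuck));    // prints "false"

        long advancing = operatorWatermark(watermarks, new boolean[] {false, true});
        System.out.println(windowFires(windowEnd, advancing)); // prints "true"
    }
}
```

With the second input stuck at Long.MIN_VALUE the window never fires; once that input is flagged idle, the first table's watermark alone drives the operator. For the Table API, the `table.exec.source.idle-timeout` option appears to serve this purpose, but whether it helps for a table that declares no watermark at all is something to verify.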

Kind regards again

Thias





From: Eugenio Marotti <ing.eugenio.maro...@gmail.com>
Sent: Thursday, September 21, 2023 2:35 PM
To: Schwalbe Matthias <matthias.schwa...@viseca.ch>
Cc: user@flink.apache.org
Subject: Re: Window aggregation on two joined table

Hi Matthias,

No, the second table doesn’t have an event time or a watermark specified. For 
the window to work, do I need a watermark on the second table as well?

Thanks
Eugenio


On 21 Sep 2023, at 13:45, Schwalbe Matthias <matthias.schwa...@viseca.ch> wrote:

Hi Eugenio,

I might be mistaken, but did you specify the event time for the second table 
like you did for the first table (watermark(….))?
I am not so well acquainted with the Table API (I mostly work with the DataStream 
API directly), but I assume this join and windowing should be by event time.

What do you think?

Kind regards

Thias


From: Eugenio Marotti <ing.eugenio.maro...@gmail.com>
Sent: Thursday, September 21, 2023 8:56 AM
To: user@flink.apache.org
Subject: Window aggregation on two joined table

Hi,

I’m trying to run a window aggregation on two joined tables backed by two Kafka 
topics (read in upsert fashion), but I get no output. Here’s the code I’m using:

This is the first table from Kafka, with an event-time watermark on the 
‘data_inizio’ attribute:


// Imports needed by the table definitions in this message
import java.util.List;
import org.apache.flink.streaming.connectors.kafka.table.KafkaConnectorOptions;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableDescriptor;

final TableDescriptor phasesDurationsTableDescriptor = TableDescriptor.forConnector("upsert-kafka")
       .schema(Schema.newBuilder()
             .column("id_fascicolo", DataTypes.BIGINT().notNull())
             .column("nrg", DataTypes.STRING())
             .column("giudice", DataTypes.STRING())
             .column("oggetto", DataTypes.STRING())
             .column("codice_oggetto", DataTypes.STRING())
             .column("ufficio", DataTypes.STRING())
             .column("sezione", DataTypes.STRING())
             .column("fase_completata", DataTypes.BOOLEAN())
             .column("fase", DataTypes.STRING().notNull())
             .column("durata", DataTypes.BIGINT())
             .column("data_inizio", DataTypes.TIMESTAMP_LTZ(3))
             .column("data_fine", DataTypes.TIMESTAMP_LTZ(3))
             // Event-time attribute: watermark lags data_inizio by one second
             .watermark("data_inizio", "data_inizio - INTERVAL '1' SECOND")
             .primaryKey("id_fascicolo", "fase")
             .build())
       .option(KafkaConnectorOptions.TOPIC, List.of("sicid.processor.phases-durations"))
       .option(KafkaConnectorOptions.PROPS_BOOTSTRAP_SERVERS, KAFKA_HOST)
       .option(KafkaConnectorOptions.KEY_FORMAT, "json")
       .option(KafkaConnectorOptions.VALUE_FORMAT, "json")
       .build();
tEnv.createTable("PhasesDurations_Kafka", phasesDurationsTableDescriptor);
Table phasesDurationsTable = tEnv.from("PhasesDurations_Kafka");

Here’s the second table:

final TableDescriptor averageJudgeByPhaseReportTableDescriptor = TableDescriptor.forConnector("upsert-kafka")
       .schema(Schema.newBuilder()
             .column("giudice", DataTypes.STRING().notNull())
             .column("fase", DataTypes.STRING().notNull())
             .column("media_mobile", DataTypes.BIGINT())
             .primaryKey("giudice", "fase")
             .build())
       .option(KafkaConnectorOptions.TOPIC, List.of("sicid.processor.average-judge-by-phase-report"))
       .option(KafkaConnectorOptions.PROPS_BOOTSTRAP_SERVERS, KAFKA_HOST)
       .option(KafkaConnectorOptions.KEY_FORMAT, "json")
       .option(KafkaConnectorOptions.VALUE_FORMAT, "json")
       .option(KafkaConnectorOptions.PROPS_GROUP_ID, "average-judge-by-phase-report")
       .build();
tEnv.createTable("AverageJudgeByPhaseReport_Kafka", averageJudgeByPhaseReportTableDescriptor);
Table averageJudgeByPhaseReportTable = tEnv.from("AverageJudgeByPhaseReport_Kafka");

Table renamedAverageJudgeByPhaseReportTable = averageJudgeByPhaseReportTable
       .select(
             $("giudice").as("giudice_media"),
             $("fase").as("fase_media"),
             $("media_mobile")
       );



And here’s the code I’m experimenting with:

phasesDurationsTable
       .join(renamedAverageJudgeByPhaseReportTable)
       .where($("giudice").isEqual($("giudice_media")))
       .window(Tumble.over(lit(30).days()).on($("data_inizio")).as("w"))
       .groupBy(
             $("giudice"),
             $("w")
       )
       .select(
             $("giudice")
       )
       .execute().print();



Am I doing something wrong?

This message is intended only for the named recipient and may contain 
confidential or privileged information. As the confidentiality of email 
communication cannot be guaranteed, we do not accept any responsibility for the 
confidentiality and the intactness of this message. If you have received it in 
error, please advise the sender by return e-mail and delete this message and 
any attachments. Any unauthorised use or dissemination of this information is 
strictly prohibited.
