Hi,

I am using Flink version 1.7.1 and flink-mongodb-sql-connector version 
1.0.1-1.17.

Below is the data pipeline flow.

Source (Kafka topic using the Kafka connector) -> Window Aggregation (legacy 
grouped window aggregation) -> Sink (Kafka topic using the upsert-kafka 
connector) -> Sink (MongoDB collection).
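For reference, here is a minimal Flink SQL sketch of the first leg of the 
pipeline. All table names, fields, topics, and connection settings below are 
illustrative placeholders, not my actual job:

```sql
-- Source: raw events from Kafka (names and fields are placeholders)
CREATE TABLE events (
  user_id STRING,
  amount  DOUBLE,
  ts      TIMESTAMP(3),
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

-- Keyed sink: upsert-kafka writes updates as key/value records and
-- deletes as tombstones (null-value records) for the primary key
CREATE TABLE agg_sink (
  user_id STRING,
  total   DOUBLE,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'agg',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);

-- Legacy grouped window aggregation
INSERT INTO agg_sink
SELECT user_id, SUM(amount) AS total
FROM events
GROUP BY user_id, TUMBLE(ts, INTERVAL '1' MINUTE);
```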


I noticed a couple of issues with the MongoDB sink connector.

Issue 1: The MongoDB sink delays writes by one event once a tombstone message 
is received.

When a new key comes in, the aggregation is made and the result is written 
successfully to the Kafka topic and also to MongoDB immediately.

When another new key comes in, the aggregation is made and the result is 
likewise available in MongoDB immediately.

But when a key repeats for the first time, the aggregation is made and the 
results are written to the Kafka topic: one delete (tombstone) record and one 
latest record for the key. The existing data for the key is deleted in 
MongoDB, but the latest record for the key is not inserted.

Only when the next event arrives (a new key or any other key) is the pending 
latest record for the previous key inserted into MongoDB.

The same pattern exists for subsequent records.

There is always a delay of one event from the moment the connector encounters 
a tombstone record.

Issue 2: The MongoDB sink waits for a new record before writing previous records.

I have an upsert-kafka topic that already contains some events.
I start a new upsert-kafka-to-MongoDB sink job.

I expect all the data from the topic to be loaded into MongoDB right away.
Instead, only the first record is written to MongoDB.
The remaining records do not arrive in MongoDB until a new event is written to 
the Kafka topic, and that new event is in turn delayed until the next event 
arrives.
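For completeness, a sketch of the second leg (upsert-kafka to MongoDB). Again, 
all names, URIs, and settings are illustrative assumptions; the two 
'sink.buffer-flush.*' options shown are the connector's documented buffering 
settings, written out here at their documented defaults:

```sql
-- Changelog source: read the aggregated topic back (placeholders)
CREATE TABLE agg_source (
  user_id STRING,
  total   DOUBLE,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'agg',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);

-- MongoDB sink; writes are batched and flushed by row count or interval
CREATE TABLE mongo_sink (
  user_id STRING,
  total   DOUBLE,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'mongodb',
  'uri' = 'mongodb://localhost:27017',
  'database' = 'mydb',
  'collection' = 'agg',
  'sink.buffer-flush.max-rows' = '1000',
  'sink.buffer-flush.interval' = '1s'
);

INSERT INTO mongo_sink SELECT user_id, total FROM agg_source;
```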

I am not able to understand this behaviour; it does not feel expected.
Can someone please advise whether I am missing something or whether a known 
issue exists?

Regards,
Harish.