Hi Team,
I'm using keyBy to maintain field-based ordering across tasks, but I'm
running into data skew among the task slots. I have 96 task slots, and the
data I send has 500 distinct keys used in keyBy. While reviewing the
Flink UI, I noticed that a few task slots are underutilized while others
are heavily utilized.
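
A minimal sketch of why this can happen: keyBy hashes each key into a key
group (out of maxParallelism groups) and maps key groups onto the 96
subtasks, so with only 500 keys the per-subtask key counts (and per-key
traffic) can end up quite uneven. The assignment can be inspected offline;
the Integer keys and the maxParallelism of 128 below are assumptions, not
values from the original message:

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

public class KeySkewCheck {
    public static void main(String[] args) {
        int parallelism = 96;     // task slots from the question
        int maxParallelism = 128; // assumption; use your job's actual maxParallelism

        Map<Integer, Integer> keysPerSubtask = new HashMap<>();
        for (int key = 0; key < 500; key++) {
            int subtask = KeyGroupRangeAssignment.assignKeyToParallelOperator(
                    key, maxParallelism, parallelism);
            keysPerSubtask.merge(subtask, 1, Integer::sum);
        }
        // Uneven counts here (plus uneven per-key traffic) show up in the UI
        // as a few busy subtasks and many idle ones.
        keysPerSubtask.forEach((subtask, count) ->
                System.out.println("subtask " + subtask + " -> " + count + " keys"));
    }
}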
Hi Users,
Any clues on a configurable regex path for the FileSource/FileSystem
connector in the streaming API would be appreciated.
Regards,
Amogh.
On Thu, 15 Aug, 2024, 11:18 amogh joshi, wrote:
> Hi,
>
> I am building a pretty straightforward processing pipeline as described
> below, using the *DataStream API*.
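
A hedged sketch of one possible approach: filter files by a regex on the
file name through a custom FileEnumerator. The setFileEnumerator hook, the
NonSplittingRecursiveEnumerator filter constructor, and the path/pattern
below are assumptions to verify against the FileSource version in use:

import java.time.Duration;
import java.util.regex.Pattern;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.enumerate.NonSplittingRecursiveEnumerator;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RegexFileSourceSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical base directory and file-name pattern.
        Pattern namePattern = Pattern.compile(".*\\.json");

        FileSource<String> source =
                FileSource.forRecordStreamFormat(
                                new TextLineInputFormat(), new Path("/data/input"))
                        // Assumption: a custom enumerator whose filter applies the
                        // regex to each file name before splits are generated.
                        .setFileEnumerator(
                                () -> new NonSplittingRecursiveEnumerator(
                                        path -> namePattern.matcher(path.getName()).matches()))
                        .monitorContinuously(Duration.ofSeconds(30))
                        .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "regex-filtered-files")
                .print();
        env.execute("FileSource with regex file filter");
    }
}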
Hi Christian,
As I didn’t find a well-documented reference to MultipleInputStreamOperator,
let me sketch it (it’s Scala code, easy enough to transform to Java):
* Implements 4 inputs; the second is a broadcast stream
* ACHEP, CEIEP, V1EP, AMAP are type aliases for the respective input types
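
For readers without the original Scala sketch, here is a minimal,
hypothetical Java skeleton of the operator shape only (two inputs instead
of four, made-up input types, and the wiring via an operator factory and
MultipleInputTransformation is omitted):

import java.util.Arrays;
import java.util.List;

import org.apache.flink.streaming.api.operators.AbstractInput;
import org.apache.flink.streaming.api.operators.AbstractStreamOperatorV2;
import org.apache.flink.streaming.api.operators.Input;
import org.apache.flink.streaming.api.operators.MultipleInputStreamOperator;
import org.apache.flink.streaming.api.operators.StreamOperatorParameters;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

/** Hypothetical two-input operator; input ids are 1-based. */
public class TwoInputSketchOperator extends AbstractStreamOperatorV2<String>
        implements MultipleInputStreamOperator<String> {

    public TwoInputSketchOperator(StreamOperatorParameters<String> parameters) {
        super(parameters, 2); // number of inputs
    }

    @Override
    public List<Input> getInputs() {
        return Arrays.asList(
                new AbstractInput<Long, String>(this, 1) {
                    @Override
                    public void processElement(StreamRecord<Long> element) {
                        output.collect(new StreamRecord<>("events: " + element.getValue()));
                    }
                },
                new AbstractInput<String, String>(this, 2) {
                    @Override
                    public void processElement(StreamRecord<String> element) {
                        // In the original sketch this input carries broadcast control data.
                        output.collect(new StreamRecord<>("control: " + element.getValue()));
                    }
                });
    }
}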
Ok. This makes sense.
I will try it out.
One more question on performance: which of the two connectors would scan
the existing collection faster? Say the existing collection has 10 million
records and is about 1 GB in storage size.
Thanks
Sachin
On Fri, 16 Aug 2024 at 4:09 PM, Jiabao Sun
Yes, you can use flink-connector-mongodb-cdc to process both existing and new
data.
See
https://nightlies.apache.org/flink/flink-cdc-docs-release-3.1/docs/connectors/flink-sources/mongodb-cdc/#startup-reading-position
Best,
Jiabao
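
A rough DataStream-API sketch of the startup option mentioned above; the
package, builder, and option names below follow the Flink CDC 3.1 line but
are assumptions to verify against the linked page. StartupOptions.initial()
first snapshots the existing collection, then keeps streaming new changes:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.cdc.connectors.base.options.StartupOptions;
import org.apache.flink.cdc.connectors.mongodb.source.MongoDBSource;
import org.apache.flink.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MongoCdcInitialSnapshotSketch {
    public static void main(String[] args) throws Exception {
        MongoDBSource<String> source =
                MongoDBSource.<String>builder()
                        .hosts("localhost:27017")            // assumed connection details
                        .databaseList("mydb")
                        .collectionList("mydb.mycollection")
                        .username("flink")
                        .password("secret")
                        // Snapshot the existing documents first, then tail the change stream.
                        .startupOptions(StartupOptions.initial())
                        .deserializer(new JsonDebeziumDeserializationSchema())
                        .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "mongodb-cdc")
                .print();
        env.execute("MongoDB CDC initial snapshot + streaming");
    }
}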
On 2024/08/16 10:26:55 Sachin Mittal wrote:
> Hi Jiabao,
> My u
Hi Jiabao,
My use case is that when I start my Flink job it should load and process all
the existing data in a collection, and also wait for and process any new data
that comes along the way.
Since I notice that flink-connector-mongodb would process all the existing
data, do I still need this connector, or can flink-connector-mongodb-cdc
alone handle both the existing and the new data?
Hi Sachin,
flink-connector-mongodb supports batch reading and writing to MongoDB, similar
to flink-connector-jdbc, while flink-connector-mongodb-cdc supports streaming
MongoDB changes.
If you need to stream MongoDB changes, you should use
flink-connector-mongodb-cdc.
You can refer to the fol
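
For contrast with the CDC source, a rough sketch of a bounded scan with
flink-connector-mongodb (the builder method names and the inline
deserializer are assumptions to verify against the connector docs):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.connector.mongodb.source.MongoSource;
import org.apache.flink.connector.mongodb.source.reader.deserializer.MongoDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.bson.BsonDocument;

public class MongoBatchReadSketch {
    public static void main(String[] args) throws Exception {
        MongoSource<String> source =
                MongoSource.<String>builder()
                        .setUri("mongodb://localhost:27017")  // assumed connection details
                        .setDatabase("mydb")
                        .setCollection("mycollection")
                        .setDeserializationSchema(
                                new MongoDeserializationSchema<String>() {
                                    @Override
                                    public String deserialize(BsonDocument document) {
                                        return document.toJson();
                                    }

                                    @Override
                                    public TypeInformation<String> getProducedType() {
                                        return Types.STRING;
                                    }
                                })
                        .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Reads the collection as a bounded scan; it does not follow later changes.
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "mongodb-batch-scan")
                .print();
        env.execute("MongoDB bounded read");
    }
}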
Hi,
I have a scenario where I load a collection from MongoDB inside Flink using
flink-connector-mongodb.
What I additionally want is that any future changes (inserts/updates) to that
collection are also streamed into my Flink job.
What I was thinking of is to use a CDC connector to stream data to my Flink job.
Context
We have recently upgraded from Flink 1.13.6 to Flink 1.19. We consume data
from ~ 40k Kafka topic partitions in some environments. We are using
aligned checkpoints. We set state.storage.fs.memory-threshold: 500kb.
Problem
At the point when the state for operator using
topic-partition-off
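
As background for that setting, a minimal sketch of configuring it
programmatically (an assumption about how the job is set up; the key and
value mirror the flink-conf.yaml entry): state chunks smaller than
state.storage.fs.memory-threshold are stored inline in the checkpoint
metadata file instead of being written as separate files on checkpoint
storage.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointThresholdSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Same setting as flink-conf.yaml: state.storage.fs.memory-threshold: 500kb
        conf.setString("state.storage.fs.memory-threshold", "500kb");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.enableCheckpointing(60_000); // aligned checkpoints are the default mode

        // ... define sources and operators here ...
    }
}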