Hi Minglei, 1. Not sure if you are asking for a specific problem, but IMO the main challenge is that there are many different ways (and meanings) to join two streams. The required semantics always depend on the concrete use case. If you want to perform an simple equality join with SQL semantics, you typically need to fully materialize both streams which is often too expensive. Often, you want to add some kind of time-based predicate which allows to evaluate the join more efficiently (both in terms of state and computation). Flink's DataStream API adds a new "interval" join with the upcoming 1.6.0 release. Flink's Table API offers a few more built-in joins.
2. Flink provides exactly-once consistency for application state using checkpoints and resettable sources. Of course checkpointing dose not come for free and can be quite expensive, but typically the difference to at-least-once checkpointing is not too large. The real challenge though is end-to-end exactly-once which requires sophisticated sink connectors. Again, the complexity depends on the concrete use case. The stricter the guarantees, the more expensive the application becomes. Best, Fabian 2018-07-30 11:59 GMT+02:00 zhangminglei <18717838...@163.com>: > Hi, I would like to ask 2 questions. > > 1. Currently, what is the problem of flink join ? And what is the > essential difference between batch join and stream join ? > 2. What are the shortcomings of current exactly-once ? > > Thanks > minglei. >