Thanks Shixiong, I read in the documentation as well that duplicates might exist
because of task retries.
On Mon, 1 Apr 2019 at 9:43 PM, Shixiong(Ryan) Zhu wrote:
The Kafka source doesn’t support transactions. You may see partial data or
duplicated data if a Spark task fails.
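A minimal sketch of one way to cope with those duplicates downstream, assuming each
record was written with a unique message key (the broker address and topic name
below are placeholders, not values from this thread):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dedup-read").getOrCreate()

    # Batch-read the topic back (broker and topic names are placeholders).
    raw = (spark.read
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "my_topic")
           .load())

    # If every record carries a unique message key, duplicates introduced
    # by task retries can be dropped when the data is consumed.
    deduped = (raw.selectExpr("CAST(key AS STRING) AS key",
                              "CAST(value AS STRING) AS value")
               .dropDuplicates(["key"]))
    deduped.show()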
On Wed, Mar 27, 2019 at 1:15 AM hemant singh wrote:
We are using Spark batch to write a DataFrame to a Kafka topic, via the DataFrame
write function with write.format(source = "kafka").
Does Spark provide a guarantee similar to what it provides when saving a DataFrame
to disk, i.e. that partial data is not written to Kafka: either the full DataFrame
is saved, or, if the job fails, no data is written?
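A minimal sketch of the kind of batch write being described, assuming a toy
DataFrame and made-up broker/topic names:

    from pyspark.sql import SparkSession

    # Requires the Kafka connector on the classpath, e.g.
    # --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 (version is an example).
    spark = SparkSession.builder.appName("batch-write-to-kafka").getOrCreate()

    # Toy DataFrame standing in for the real data.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "payload"])

    # The Kafka sink expects string/binary "key" and "value" columns,
    # so serialize each row to JSON and use the id as the message key.
    (df.selectExpr("CAST(id AS STRING) AS key", "to_json(struct(*)) AS value")
       .write
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
       .option("topic", "my_topic")                        # placeholder topic
       .save())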