Thanks Shixiong, I read in the documentation as well that duplicates might exist because of task retries.
On Mon, 1 Apr 2019 at 9:43 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:

> The Kafka source doesn’t support transaction. You may see partial data or
> duplicated data if a Spark task fails.
>
> On Wed, Mar 27, 2019 at 1:15 AM hemant singh <hemant2...@gmail.com> wrote:
>
>> We are using spark batch to write Dataframe to Kafka topic. The spark
>> write function with write.format(source = Kafka).
>> Does spark provide similar guarantee like it provides with saving
>> dataframe to disk; that partial data is not written to Kafka i.e. full
>> dataframe is saved or if job fails no data is written to Kafka topic.
>>
>> Thanks.
>>
> --
>
> Best Regards,
> Ryan
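Since the Kafka sink gives at-least-once delivery (duplicates on task retry, as discussed above), a common mitigation is to make the consumer side idempotent by deduplicating on a unique record key. A minimal plain-Python sketch of the idea follows; the function name and the sample records are hypothetical, not from the thread:

```python
# Hypothetical sketch: deduplicate at-least-once Kafka deliveries by a
# unique key embedded in each record (e.g. a primary key or event id).

def dedupe(records, seen=None):
    """Yield each (key, value) record at most once, tracking keys in `seen`."""
    seen = set() if seen is None else seen
    for key, value in records:
        if key not in seen:
            seen.add(key)
            yield key, value

# Simulated redelivery after a Spark task retry: ("b", 2) arrives twice.
delivered = [("a", 1), ("b", 2), ("b", 2), ("c", 3)]
unique = list(dedupe(delivered))
print(unique)  # [('a', 1), ('b', 2), ('c', 3)]
```

In a real pipeline the `seen` set would live in a durable store (or the sink table would use an upsert keyed on the record id) so that deduplication survives consumer restarts.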