I figured out why. We are not persisting the data after
.load(), so every action like count() goes back to Kafka
and re-reads the data.
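A minimal sketch of the fix, assuming a batch read via the Kafka source with
the assign option (broker address and topic/partition assignment below are
placeholders, not values from this thread):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaBatchRead {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("kafka-batch")
        .getOrCreate();

    // Batch read from Kafka using explicit partition assignment.
    Dataset<Row> df = spark.read()
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  // placeholder broker
        .option("assign", "{\"my-topic\":[0,1]}")          // placeholder assignment
        .load();

    // Persist so later actions reuse the cached rows
    // instead of polling the same data from Kafka again.
    df.persist();

    long first = df.count();   // materializes and caches the Dataset
    long second = df.count();  // served from cache; no second Kafka read

    spark.stop();
  }
}
```

Without the persist(), each action re-triggers the Kafka read, which is why
the executors appear to poll the same data repeatedly.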
On Fri, Mar 1, 2019 at 10:10 AM Kristopher Kane wrote:
>
> We are using the assign API to do batch work with Spark and Kafka.
> What I'm seeing
We are using the assign API to do batch work with Spark and Kafka.
What I'm seeing is the Spark executor work happening in the
background, constantly polling the same data over and over until the
main thread commits the offsets.
Is the below a blocking operation?
Dataset<Row> df = spark.read()
    .format("kafka")
    .option("kafka.bootstrap.servers", "...")
    .option("assign", "...")
    .load();