Hi,
I am running a streaming job with 4 executors and 16 cores, so that each
executor has two cores to work with. The input Kafka topic has 4 partitions.
With this configuration I was expecting the MapWithStateRDD to be evenly
distributed across all executors; however, I see that it uses only two
executors.
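Would pinning the state partitioning explicitly be the right way to spread
it out? Roughly what I have in mind, as a sketch only (the socket source,
checkpoint path, and the count of 8 are placeholders for my real job):

import org.apache.spark.{HashPartitioner, SparkConf}
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object StatePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("state-partitioning-sketch"), Seconds(10))
    ssc.checkpoint("/tmp/state-checkpoint") // mapWithState needs checkpointing

    // Placeholder keyed input; the real job reads the 4-partition Kafka topic.
    val keyed = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))

    // Running count per key.
    def track(key: String, value: Option[Int], state: State[Int]): (String, Int) = {
      val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (key, sum)
    }

    // One state partition per core (4 executors x 2 cores = 8), so the
    // scheduler has enough state tasks to place on every executor.
    val spec = StateSpec.function(track _).partitioner(new HashPartitioner(8))

    keyed.mapWithState(spec).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
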
Hi,
I have two applications: App1 and App2.
On a single cluster I have to spawn 5 instances of App1 and 1 instance of
App2.
What would be the best way to send data from the 5 App1 instances to the
single App2 instance?
Right now I am using Kafka to send data from one Spark application to the
second.
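Concretely, the pattern looks roughly like this; a sketch only, where the
broker address and the topic name "app1-output" are placeholders. Each of
the 5 App1 instances would run the producer side, and the single App2
instance the consumer side:

import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// App1 side (all 5 instances): publish results into the shared topic.
def publish(results: DStream[String]): Unit = {
  results.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // One producer per partition per batch; pooling it would be cheaper.
      val props = new Properties()
      props.put("bootstrap.servers", "broker1:9092")
      props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      records.foreach { r =>
        producer.send(new ProducerRecord[String, String]("app1-output", r))
      }
      producer.close()
    }
  }
}

// App2 side (single instance): one consumer group drains everything the
// five producers wrote.
def consume(ssc: StreamingContext): DStream[String] = {
  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "broker1:9092",
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "app2",
    "auto.offset.reset" -> "latest"
  )
  KafkaUtils.createDirectStream[String, String](
    ssc,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](Seq("app1-output"), kafkaParams)
  ).map(_.value)
}

With all five instances producing to the same topic and App2 in a single
consumer group, Kafka handles the fan-in; the number of topic partitions
then caps App2's read parallelism.
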
Hi,
I am relatively new to Spark and am using the updateStateByKey() operation to
maintain state in my Spark Streaming application. The input data is coming
through a Kafka topic.
1. I want to understand how DStreams are partitioned.
2. How does the partitioning work with mapWithState() or
updateStateByKey()?
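My current understanding, which I would like confirmed, is that the Kafka
direct stream creates one RDD partition per Kafka topic partition, and that
the stateful operations then shuffle the records by key. For example, is
passing a partitioner explicitly, as below, the right way to control the
state layout? A rough sketch; the stream contents and the count of 8 are
made up:

import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

// Placeholder keyed stream, e.g. Kafka records mapped to (userId, 1).
def runningCounts(pairs: DStream[(String, Int)]): DStream[(String, Int)] = {
  // Fold each batch's values for a key into the stored state.
  val updateFunc = (values: Seq[Int], state: Option[Int]) =>
    Some(values.sum + state.getOrElse(0))

  // Without an explicit partitioner, updateStateByKey hash-partitions the
  // state (typically into spark.default.parallelism partitions); passing
  // one makes the layout explicit: 8 partitions hashed on the key.
  pairs.updateStateByKey(updateFunc, new HashPartitioner(8))
}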