Hi,
I started a Spark Streaming job with 96 executors that reads from 96 Kafka
partitions and applies mapWithState to the incoming DStream.
Why would it cache only 77 partitions? Do I have to allocate more memory?
Currently each executor gets 10 GB, and it is not clear why it can't cache all
96.
Good day everyone!
Have you tried to de-duplicate records based on Avro-generated classes? These
classes extend SpecificRecord, which has equals and hashCode implementations,
but when I try to use .distinct on my PairRDD (both key and value are Avro
classes), it eliminates records which are
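(The message is truncated, so the exact symptom is unknown. For context: RDD.distinct de-duplicates by shuffling elements and comparing them via their equals/hashCode, so whatever the generated Avro classes define there determines which records count as duplicates. A minimal non-Spark sketch of that mechanism, using a hypothetical UserEvent class standing in for an Avro-generated one:)

```java
import java.util.*;

// Hypothetical stand-in for an Avro-generated class. Like classes extending
// SpecificRecordBase, it defines equals/hashCode over its field values, so
// hash-based de-duplication treats two field-equal instances as duplicates
// even though they are distinct objects.
class UserEvent {
    final String userId;
    final long ts;
    UserEvent(String userId, long ts) { this.userId = userId; this.ts = ts; }
    @Override public boolean equals(Object o) {
        if (!(o instanceof UserEvent)) return false;
        UserEvent e = (UserEvent) o;
        return ts == e.ts && userId.equals(e.userId);
    }
    @Override public int hashCode() { return Objects.hash(userId, ts); }
}

public class DistinctDemo {
    // Hash-based dedup, analogous to what distinct() does after the shuffle.
    static Set<UserEvent> dedup(List<UserEvent> events) {
        return new LinkedHashSet<>(events);
    }

    public static void main(String[] args) {
        List<UserEvent> events = Arrays.asList(
            new UserEvent("a", 1L),
            new UserEvent("a", 1L),   // field-equal duplicate, distinct instance
            new UserEvent("b", 2L));
        System.out.println(dedup(events).size()); // prints 2
    }
}
```

If equals/hashCode instead compared object identity (or ignored some fields), the same code would keep or drop different records, which is the first thing worth checking when distinct behaves unexpectedly on generated classes.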