Hi,

I am running a Spark Streaming application on EMR that reads from a Kinesis stream and processes the data. Recently, we tried switching from Java's built-in serializer to the Kryo serializer. To quantify the performance improvement, I sent 30,000 input records to the application over a period of 5 minutes and measured the task deserialization time, with the following results:

Java serializer: median 3 ms, mean 8.21 ms
Kryo serializer: median 4 ms, mean 9.64 ms
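To put these figures in perspective, here is a small Java sketch that computes the median, the mean and the relative change between the two runs. The class name `DeserStats` and the five-value sample are hypothetical and do not come from the application's task metrics. With the figures above, the Kryo median is about 33% higher and the Kryo mean about 17% higher. The sample shows how a few slow tasks can pull the mean well above the median. The gap between mean and median in both runs suggests a similar skew.

```java
import java.util.Arrays;

// Hypothetical helper for summarizing per-task deserialization times (in ms).
public class DeserStats {

    // Median of the sample: middle value, or the average of the two middle values.
    public static double median(double[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("empty sample");
        double[] s = xs.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Arithmetic mean of the sample.
    public static double mean(double[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("empty sample");
        return Arrays.stream(xs).average().getAsDouble();
    }

    // Relative change from baseline a to value b, in percent.
    public static double percentChange(double a, double b) {
        return (b - a) / a * 100.0;
    }

    public static void main(String[] args) {
        // Summary figures reported in the post (Java serializer vs. Kryo serializer).
        System.out.printf("Median change: %+.1f%%%n", percentChange(3.0, 4.0));
        System.out.printf("Mean change:   %+.1f%%%n", percentChange(8.21, 9.64));

        // Hypothetical sample: one slow task drags the mean far above the median.
        double[] sample = {2, 3, 3, 4, 40};
        System.out.printf("median=%.1f mean=%.1f%n", median(sample), mean(sample));
    }
}
```

For the hypothetical sample, the median is 3 ms but the mean is 10.4 ms. This is why the median and mean are worth reporting side by side.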
By this measure, the Kryo serializer is slower than the Java serializer. I'd appreciate advice on anything I might have missed. Please let me know if more information is needed.

Thanks,
Rajkiran