When I tested in local mode with and without partitionBy, the throughput was quite different: partitionBy lowered throughput dramatically. That surprised me, because I expected partitionBy to add latency but not to affect throughput. Is that normal, or should I check my test?