Re: Speeding up CoGroup in batch job

2020-09-17 Thread Ken Krugler
Hi Robert, Thanks for the input. I did increase the amount of managed memory, and confirmed that both SSDs (on each slave) are being used for temp data. I haven’t been able to figure out why the server CPU usage is low, but I did notice that it fluctuated from very low (10%) on up to 95+%, with…

Re: Speeding up CoGroup in batch job

2020-09-11 Thread Robert Metzger
Hi Ken,
Some random ideas that pop up in my head:
- make sure you use data types that are efficient to serialize, and cheap to compare (ideally use primitive types in TupleN or POJOs)
- Maybe try the TableAPI batch support (if you have time to experiment).
- optimize memory usage on the TaskManage…
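Why cheap-to-compare key types matter for a batch CoGroup: the operator (in Flink's DataSet API, `left.coGroup(right).where(...).equalTo(...)`) is typically executed by sorting both inputs on the key and then merging the sorted runs. That means every key is compared many times. The following is a simplified, in-memory sketch of that sort-merge co-grouping using primitive `long` keys. It is not Flink's implementation. The class `CoGroupSketch`, the `Group` type and the `{key, value}` row layout are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: sort-merge co-grouping of two inputs keyed by a primitive long. */
public class CoGroupSketch {

    /** One co-group result: a key plus the values from each side that share it. */
    public static final class Group {
        public final long key;
        // Values are boxed here only for readability; the keys stay primitive.
        public final List<Long> left = new ArrayList<>();
        public final List<Long> right = new ArrayList<>();

        Group(long key) { this.key = key; }
    }

    /**
     * Each input row is a long[2] of {key, value}. Both inputs are sorted by key
     * using primitive long comparison, then merged in a single pass. Keys present
     * on only one side still produce a group, as in a CoGroup.
     */
    public static List<Group> coGroup(long[][] leftRows, long[][] rightRows) {
        long[][] l = leftRows.clone();
        long[][] r = rightRows.clone();
        Arrays.sort(l, (a, b) -> Long.compare(a[0], b[0]));
        Arrays.sort(r, (a, b) -> Long.compare(a[0], b[0]));

        List<Group> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < l.length || j < r.length) {
            // The next key is the smaller head of the two sorted runs.
            long key;
            if (j >= r.length || (i < l.length && l[i][0] <= r[j][0])) {
                key = l[i][0];
            } else {
                key = r[j][0];
            }
            Group g = new Group(key);
            // Drain every row with this key from each side.
            while (i < l.length && l[i][0] == key) { g.left.add(l[i][1]); i++; }
            while (j < r.length && r[j][0] == key) { g.right.add(r[j][1]); j++; }
            out.add(g);
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] orders = {{2, 20}, {1, 10}, {2, 21}};
        long[][] payments = {{2, 200}, {3, 300}};
        for (Group g : coGroup(orders, payments)) {
            System.out.println(g.key + " " + g.left + " " + g.right);
        }
    }
}
```

Running `main` prints one line per key (`1 [10] []`, `2 [20, 21] [200]`, `3 [] [300]`). With String or nested-object keys, each of those sort comparisons would cost more, which is the intuition behind preferring primitive fields in TupleN or POJOs.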