It is just a goal… However, I would not tune the number of regions or the region size yet. Simply specify the GC algorithm and the max heap size. Try to tune other options only if there is a need, only one at a time (otherwise it is difficult to determine cause and effect), and have a performance testing framework in place so you can measure the differences.

Do you need those large heaps in Spark? Why not split the work further to have more tasks with less memory? I understand that each job is different and there can be reasons for it, but I often try to just use the defaults and then tune individual options. I also try to avoid extreme values (of course there are cases where they are needed). Especially when upgrading from one Spark version to another, I often find it is better to start from a Spark job with default settings, because Spark itself has improved/changed how it works.

To reduce the needed heap you can try to increase the number of tasks (see https://spark.apache.org/docs/latest/configuration.html): adjust spark.executor.cores (to a few) and spark.sql.shuffle.partitions (the default is 200; you can try how much it brings to change it to 400, etc.), and reduce spark.executor.memory. A short sketch of these settings follows below the quoted-message header.

On 10.12.2023 at 02:33, Faiz Halde <haldef...@gmail.com> wrote:
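[Not part of the original message: a minimal sketch of how the settings discussed above could be wired together, assuming a self-contained Scala application. The app name, heap size, core count, and partition count are illustrative assumptions, not recommendations; in practice these values are usually passed via spark-submit --conf rather than hard-coded.]

    import org.apache.spark.sql.SparkSession

    object HeapTuningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("heap-tuning-sketch")
          // GC: just pick the algorithm; leave region count/size at their defaults.
          .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
          // Max heap per executor is set via spark.executor.memory, not -Xmx.
          .config("spark.executor.memory", "8g")          // illustrative, reduced heap
          // More, smaller tasks instead of a few memory-hungry ones:
          .config("spark.executor.cores", "4")            // "a few" cores per executor
          .config("spark.sql.shuffle.partitions", "400")  // default is 200; try 400 etc.
          .getOrCreate()

        // ... job logic goes here ...

        spark.stop()
      }
    }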