Hi all,
I have the following Spark configuration:
spark.app.name=Test
spark.cassandra.connection.host=127.0.0.1
spark.cassandra.connection.keep_alive_ms=5000
spark.cassandra.connection.port=10000
spark.cassandra.connection.timeout_ms=30000
spark.cleaner.ttl=3600
spark.default.parallelism=4
spark.master=local[2]
spark.ui.enabled=false
spark.ui.showConsoleProgress=false
Because I am setting spark.default.parallelism to 4, I was expecting
only 4 Spark partitions, but that does not seem to be the case.
When I do the following:
df.foreachPartition { partition =>
  val groupedPartition = partition.toList.grouped(3).toList
  println("Grouped partition " + groupedPartition)
}
There are many print statements showing an empty list at the top of
the output; only the partitions that contain data appear at the
bottom. Is there a way to control the number of partitions?
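To make the symptom concrete, here is a minimal plain-Scala sketch (no Spark involved) of what I think is happening: each partition reaches the closure as an iterator, and an empty one still produces a println. The object name, the groupPartition helper, and the sample data are all made up for illustration; they are not from my actual job.

```scala
// Plain-Scala stand-in for what foreachPartition hands to the closure:
// one iterator per partition, some of which may be empty.
object EmptyPartitionSketch {
  // Hypothetical helper: group a partition's rows in chunks of `size`,
  // or return None when the partition has no rows at all.
  def groupPartition[T](partition: Iterator[T], size: Int = 3): Option[List[List[T]]] =
    if (partition.hasNext) Some(partition.grouped(size).map(_.toList).toList)
    else None

  def main(args: Array[String]): Unit = {
    // Made-up data: three empty partitions followed by two that hold rows.
    val partitions: Seq[List[Int]] = Seq(Nil, Nil, Nil, List(1, 2, 3, 4), List(5, 6))
    partitions.foreach { rows =>
      // Without this guard, every empty partition would print an empty list.
      groupPartition(rows.iterator).foreach(g => println("Grouped partition " + g))
    }
  }
}
```

A guard like this only hides the empty output. On the real DataFrame, df.rdd.partitions.length should report how many partitions there actually are, and df.repartition(4) is the call I would try in order to get exactly four, though I have not confirmed either against this setup.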
Regards,
Noorul