I try to train a big model.
I have 40 million instances and 50 million feature set, and it is sparse.
I am using 40 executors with 20 GB each + driver with 40 GB. The number of
data partitions is 5000, the treeAggregate depth is 4, the
spark.kryoserializer.buffer.max is 2016m, the spark.driver.maxR
I'm working on large scale logistic regression for ctr prediction, and when
user interaction for feature engineer, driver OOM. For detail, I interact
among userid(one-hot, 30w dimension, sparse) and base features(60
dimensions, dense), driver memory is set to 40g.
So, I try to debug from remote, a