Hello all,
I am running the Spark recommendation algorithm in MLlib and I have been
studying its output with various model configurations. Ideally I would like to
be able to run one job that trains the recommendation model with many different
configurations to try to optimize for performance. Sample Python code is
copied below.
The issue I have is that each newly trained model caches a set of RDDs, and
eventually the executors run out of memory. Is there any way in PySpark to
unpersist() these RDDs after each iteration? The names of the RDDs, which I
gathered from the UI, are:
itemInBlocks
itemOutBlocks
Products
ratingBlocks
userInBlocks
userOutBlocks
users
I am using Spark 1.3. Thank you for any help!
Regards,
Jonathan
import itertools
from pyspark.mllib.recommendation import ALS, Rating

data_train, data_cv, data_test = data.randomSplit([99, 1, 1], 2)
functions = [rating]  # defined elsewhere
ranks = [10, 20]
iterations = [10, 20]
lambdas = [0.01, 0.1]
alphas = [1.0, 50.0]
results = []
for ratingFunction, rank, numIterations, m_lambda, m_alpha in itertools.product(
        functions, ranks, iterations, lambdas, alphas):
    # train model
    ratings_train = data_train.map(
        lambda l: Rating(l.user, l.product, ratingFunction(l)))
    model = ALS.trainImplicit(ratings_train, rank, numIterations,
                              lambda_=float(m_lambda), alpha=float(m_alpha))
    # test performance on CV data (areaUnderCurve is a helper not shown here)
    ratings_cv = data_cv.map(
        lambda l: Rating(l.user, l.product, ratingFunction(l)))
    auc = areaUnderCurve(ratings_cv, model.predictAll)
    # save results as one comma-separated line per configuration
    result = ",".join(str(l) for l in
                      [ratingFunction.__name__, rank, numIterations,
                       m_lambda, m_alpha, auc])
    results.append(result)
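
To give a sense of scale, the grid above can be enumerated without Spark. This is only a minimal sketch: it assumes the same parameter lists as the job, uses the string "rating" as a stand-in for the rating function, and the names `configs` and `cached_per_model` are illustrative. It counts how many models one job trains and how many cached RDDs would pile up if each run leaves behind the seven RDDs listed earlier.

```python
import itertools

# Same parameter lists as in the job above; "rating" is a stand-in for the
# rating function, which is defined elsewhere.
functions = ["rating"]
ranks = [10, 20]
iterations = [10, 20]
lambdas = [0.01, 0.1]
alphas = [1.0, 50.0]

# One tuple per model that the loop would train.
configs = list(itertools.product(functions, ranks, iterations, lambdas, alphas))

# RDD names reported in the UI for each trained model.
cached_per_model = ["itemInBlocks", "itemOutBlocks", "Products", "ratingBlocks",
                    "userInBlocks", "userOutBlocks", "users"]

print(len(configs))                          # number of models trained
print(len(configs) * len(cached_per_model))  # cached RDDs if none are released
```

Even this small grid trains 16 models, so anything left cached per model adds up quickly.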