Re: Long-Running Spark application doesn't clean old shuffle data correctly

2019-07-20 Thread Aayush Ranaut
This is the job of the ContextCleaner. There are a few properties you can tweak to see if they help:

spark.cleaner.periodicGC.interval
spark.cleaner.referenceTracking
spark.cleaner.referenceTracking.blocking.shuffle

Regards
Prathmesh Ranaut

> On Jul 21, 2019, at 11:36 AM, Prathmesh Ranaut
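For readers wanting to try this, a spark-defaults.conf sketch of the three cleaner properties named above (the values shown are the stock Spark defaults, not recommendations from the thread; shortening the GC interval is the usual first experiment):

```properties
# How often the ContextCleaner triggers a driver GC so that weak
# references to stale shuffles/RDDs get processed (default: 30min).
spark.cleaner.periodicGC.interval              30min

# Master switch for reference tracking / context cleaning (default: true).
spark.cleaner.referenceTracking                true

# Whether the cleaner blocks on shuffle cleanup tasks (default: false).
spark.cleaner.referenceTracking.blocking.shuffle  false
```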

Long-Running Spark application doesn't clean old shuffle data correctly

2019-07-20 Thread Alex Landa
Hi, We are running a long-running Spark application (which executes lots of quick jobs using our scheduler) on a Spark standalone cluster, version 2.4.0. We see that old shuffle files (a week old, for example) are not deleted during the execution of the application, which leads to an out-of-disk-space error
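One related knob worth knowing on standalone clusters is the worker-side cleanup, sketched below with its stock defaults. Note the caveat: it only removes directories of *stopped* applications, so for a single long-running application it does not delete live shuffle files; those are the ContextCleaner's job:

```properties
# Standalone-mode worker cleanup (defaults shown; disabled by default).
# Periodically sweeps work dirs of applications that have STOPPED.
spark.worker.cleanup.enabled     false
# Sweep interval in seconds (default: 30 minutes).
spark.worker.cleanup.interval    1800
# How long stopped applications' data is retained, in seconds (7 days).
spark.worker.cleanup.appDataTtl  604800
```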

How to get loss per iteration in Spark MultilayerPerceptronClassificationModel?

2019-07-20 Thread Shamshad Ansari
Hello All, Apache Spark ML's LogisticRegressionModel has a summary().objectiveHistory() method. Is there any such method available for MultilayerPerceptronClassificationModel? If not, what's a way to get loss per iteration? Any help is greatly appreciated. Thank you.
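When a model exposes no training summary, one generic fallback is to run the optimization yourself and record the objective at every step. The sketch below is plain Python (not Spark ML, and not the MLP model from the question): gradient descent on logistic loss over a hypothetical toy dataset, purely to illustrate the "loss per iteration" bookkeeping:

```python
import math

def train_and_log_loss(data, lr=0.5, iters=20):
    """data: list of (x, label) pairs, label in {0, 1}.
    Returns the average logistic loss recorded at each iteration."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(iters):
        loss, gw, gb = 0.0, 0.0, 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid prediction
            eps = 1e-12                                # guard against log(0)
            loss += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
            gw += (p - y) * x
            gb += (p - y)
        n = len(data)
        history.append(loss / n)   # <- the per-iteration loss we keep
        w -= lr * gw / n           # plain gradient descent step
        b -= lr * gb / n
    return history

losses = train_and_log_loss([(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)])
```

The same idea ports to any trainer that lets you step manually; with Spark ML's MLP the iterations are internal, so there you would instead check whether your Spark version's model exposes a training summary at all.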

Re: Spark SaveMode

2019-07-20 Thread Mich Talebzadeh
A JDBC read from an Oracle table requires the Oracle JDBC driver ojdbc6.jar or higher. ojdbc6.jar works for 11g and 12c, added as --jars /ojdbc6.jar. Example with a parallel read (4 connections) to Oracle, with ID being your PK in the Oracle table:

var _ORACLEserver = "jdbc:oracle:thin:@rhes564:1521:mydb12"
var _use
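The parallel read described above rests on four standard options of Spark's JDBC data source. A minimal sketch of that option set follows; the option names are real, but the table name and bounds are placeholders, not values from the original post:

```python
# Option set for a partitioned (parallel) JDBC read.
# Spark splits the ID range [lowerBound, upperBound] into
# numPartitions slices, one connection per slice.
jdbc_options = {
    "url": "jdbc:oracle:thin:@rhes564:1521:mydb12",  # URL from the post
    "dbtable": "my_table",      # placeholder table name
    "partitionColumn": "ID",    # numeric PK used to split the read
    "lowerBound": "1",          # placeholder: min value of ID
    "upperBound": "1000000",    # placeholder: max value of ID
    "numPartitions": "4",       # 4 parallel connections, as in the post
}
```

With a live SparkSession this would be consumed as `spark.read.format("jdbc").options(**jdbc_options).load()`; the bounds only steer partitioning, they do not filter rows.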