possible cause: same TeraGen job sometimes slow and sometimes fast

2017-10-18 Thread Gil Vernik
I performed a series of TeraGen jobs via spark-submit (each job generated an equal-size dataset into a different S3 bucket), and I noticed that some jobs were fast and some were slow. Slow jobs always had many log lines like DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1.0, runningTasks: 1 ( o

Spark-13979: issues with hadoopConf

2016-07-02 Thread Gil Vernik
Hello, any ideas about this one: https://issues.apache.org/jira/browse/SPARK-13979 ? Do others see the same issues? Thanks, Gil.

new object store driver for Spark

2016-03-22 Thread Gil Vernik
We recently released an object store connector for Spark: https://github.com/SparkTC/stocator Currently this connector contains a driver for Swift-based object stores (like SoftLayer or any other Swift cluster), but it can easily support additional object stores. There is a pending patch to s

problems with my code that simulates task failure - task is not resubmitted

2016-03-05 Thread Gil Vernik
I have some code that simulates task failure in speculative mode. I compile the code to a jar and execute it with ./bin/spark-submit --class com.test.SparkTest --jars --driver-memory 2g --executor-memory 1g --master local[4] --conf spark.speculation=true --conf spark.task.maxFailures=4 SparkTes

disable log4j for spark-shell

2014-08-03 Thread Gil Vernik
Hi, I would like to run spark-shell without any INFO messages printed. To achieve this, I edited /conf/log4j.properties and added the line log4j.rootLogger=OFF, which is supposed to disable all logging. However, when I run ./spark-shell I see the message 4/08/03 16:02:15 INFO SecurityManager: Using Spar