I've helped to build conda-installable Spark packages in the past. You can
find an older recipe here:
https://github.com/conda/conda-recipes/tree/master/spark
And I've been updating packages here:
https://anaconda.org/anaconda-cluster/spark
`conda install -c anaconda-cluster spark`
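If you'd rather keep it out of your root environment, installing into a fresh env should work too (the environment name here is just an example):

    conda create -n spark-env -c anaconda-cluster spark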
The above should
yikes.
Was this a one-time thing? Or does it happen consistently? Can you turn
on debug logging for o.a.s.scheduler (dunno if it will help, but maybe ...)
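In case it saves you a lookup, with the stock logging setup that usually just means adding a line like this to conf/log4j.properties (assuming you haven't swapped out the default log4j config):

    log4j.logger.org.apache.spark.scheduler=DEBUG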
On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das
wrote:
> Hi
>
> My Spark job (running in local[*] with spark 1.4.1) reads data from a
> thrift
Thanks for finding this. Should we just switch to Java's process library
for now?
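For reference, a rough sketch of what the Java route could look like in place of sys.process; the command and the lack of error handling are just illustrative, not a drop-in for what CliSuite does:

    import scala.io.Source

    // Launch an external command, merge stderr into stdout, read everything
    // it prints, then wait for the exit code.
    val pb = new ProcessBuilder("echo", "hello")
    pb.redirectErrorStream(true)
    val proc = pb.start()
    val output = Source.fromInputStream(proc.getInputStream).mkString
    val exitCode = proc.waitFor()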
On Wed, Aug 12, 2015 at 1:30 AM, Tim Preece wrote:
> I was just debugging an intermittent timeout failure in the testsuite
> CliSuite.scala
>
> I traced it down to a timing window in the Scala library class
> sys.
Hi, I'm sending this email to both the dev and user lists; I just want to see if someone
familiar with the Spark/Maven build procedure can provide any help.
I am building Spark 1.2.2 with the following command:
mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -Phive -Phive-0.12.0
The spark-assembly-1.2.2-hadoop2.2.0.jar
Hi!
I’d like to perform an action (store / print something) inside a transformation (map
or mapPartitions). This approach has some flaws, but here is the question: might
it happen that Spark optimises the (RDD or DataFrame) processing so that my
mapPartitions simply won’t happen?
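To make the question concrete, this is roughly the kind of thing I mean (assuming an existing SparkContext sc, e.g. in spark-shell; the println stands in for the real store/print step):

    val rdd = sc.parallelize(1 to 100)
    val result = rdd.mapPartitions { iter =>
      iter.map { record =>
        println(s"side effect for $record")  // the store / print happens inside the transformation
        record * 2
      }
    }
    result.count()  // a downstream action on the transformed data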
--
Eugene Morozov
fa
Hi Cheng,
Thanks a lot for responding to it.
I'm still missing some points on the efficiency part, and I would be very thankful if
you could expand on it a little bit more.
As I see it, both HadoopFsRelation and FileInputFormat.listStatus perform
listings, and eventually both call the FileSystem.listStatus method.
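To make sure we are talking about the same call, I mean roughly this (the path is just an example, nothing specific to my setup):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // List the immediate children of a directory through FileSystem.listStatus,
    // the call both code paths eventually reach.
    val conf = new Configuration()
    val path = new Path("hdfs:///user/hive/warehouse/some_table")
    val fs: FileSystem = path.getFileSystem(conf)
    val statuses = fs.listStatus(path)
    statuses.foreach(s => println(s"${s.getPath} ${s.getLen} bytes"))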
F
I was just debugging an intermittent timeout failure in the testsuite CliSuite.scala
I traced it down to a timing window in the Scala library class sys.process.ProcessImpl.scala. Sometimes the input pipe to a process becomes 'None' before the process has had a chance to read any input at all.
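For anyone who wants to see the shape of it, the relevant kind of usage is piping input into an external process through sys.process; the command and input here are only illustrative, not what CliSuite actually runs:

    import java.io.ByteArrayInputStream
    import scala.sys.process._

    // Feed some bytes to an external process's stdin and capture its stdout.
    val input = new ByteArrayInputStream("hello\n".getBytes("UTF-8"))
    val output = (Process(Seq("cat")) #< input).!!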
Hi Gil,
Sorry for the late reply and thanks for raising this question. The file
listing logic in HadoopFsRelation is intentionally made different from
Hadoop FileInputFormat. Here are the reasons:
1. Efficiency: when computing RDD partitions,
FileInputFormat.listStatus() is called on the dri