I've helped to build conda-installable Spark packages in the past. You can
find an older recipe here:
https://github.com/conda/conda-recipes/tree/master/spark
And I've been updating packages here:
https://anaconda.org/anaconda-cluster/spark
`conda install -c anaconda-cluster spark`
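If you'd rather keep it out of your root environment, installing into a fresh env should work too (the environment name here is just an example):

    conda create -n spark-env -c anaconda-cluster spark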
The above should
yikes.
Was this a one-time thing? Or does it happen consistently? Can you turn
on debug logging for o.a.s.scheduler (dunno if it will help, but maybe ...)
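In case it saves you a lookup, with the stock logging setup that usually just means adding a line like this to conf/log4j.properties (assuming you haven't swapped out the default log4j config):

    log4j.logger.org.apache.spark.scheduler=DEBUG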
On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das
wrote:
> Hi
>
> My Spark job (running in local[*] with spark 1.4.1) reads data from a
> thrift
Thanks for finding this. Should we just switch to Java's process library
for now?
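For reference, a rough sketch of what the Java route could look like in place of sys.process; the command and the lack of error handling are just illustrative, not a drop-in for what CliSuite does:

    import scala.io.Source

    // Launch an external command, merge stderr into stdout, read everything
    // it prints, then wait for the exit code.
    val pb = new ProcessBuilder("echo", "hello")
    pb.redirectErrorStream(true)
    val proc = pb.start()
    val output = Source.fromInputStream(proc.getInputStream).mkString
    val exitCode = proc.waitFor()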
On Wed, Aug 12, 2015 at 1:30 AM, Tim Preece wrote:
> I was just debugging an intermittent timeout failure in the testsuite
> CliSuite.scala
>
> I traced it down to a timing window in the Scala library class
> sys.
Hi, I'm sending this email to both the dev and user lists; I just want to see if someone
familiar with the Spark/Maven build procedure can provide any help.
I am building Spark 1.2.2 with the following command:
mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -Phive -Phive-0.12.0
The spark-assembly-1.2.2-hadoop2.2.0.jar
Hi!
I’d like to perform an action (store / print something) inside a transformation (map
or mapPartitions). This approach has some flaws, but here is the question: might
it happen that Spark optimises the (RDD or DataFrame) processing so that my
mapPartitions simply won’t happen?
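To make the question concrete, this is roughly the kind of thing I mean (assuming an existing SparkContext sc, e.g. in spark-shell; the println stands in for the real store/print step):

    val rdd = sc.parallelize(1 to 100)
    val result = rdd.mapPartitions { iter =>
      iter.map { record =>
        println(s"side effect for $record")  // the store / print happens inside the transformation
        record * 2
      }
    }
    result.count()  // a downstream action on the transformed data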
--
Eugene Morozov
fa
Hi Cheng,
Thanks a lot for responding to it.
I'm still missing some points on the efficiency part, and I would be very thankful if
you could expand on it a little bit more.
As I see it, both HadoopFsRelation and FileInputFormat.listStatus perform
listings, and eventually both call the FileSystem.listStatus method.
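To make sure we are talking about the same call, I mean roughly this (the path is just an example, nothing specific to my setup):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // List the immediate children of a directory through FileSystem.listStatus,
    // the call both code paths eventually reach.
    val conf = new Configuration()
    val path = new Path("hdfs:///user/hive/warehouse/some_table")
    val fs: FileSystem = path.getFileSystem(conf)
    val statuses = fs.listStatus(path)
    statuses.foreach(s => println(s"${s.getPath} ${s.getLen} bytes"))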
F
I was just debugging an intermittent timeout failure in the testsuite CliSuite.scala
I traced it down to a timing window in the Scala library class sys.process.ProcessImpl.scala. Sometimes the input pipe to a process becomes 'None' before the process has had a chance to read any input at all.
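For anyone who wants to see the shape of it, the relevant kind of usage is piping input into an external process through sys.process; the command and input here are only illustrative, not what CliSuite actually runs:

    import java.io.ByteArrayInputStream
    import scala.sys.process._

    // Feed some bytes to an external process's stdin and capture its stdout.
    val input = new ByteArrayInputStream("hello\n".getBytes("UTF-8"))
    val output = (Process(Seq("cat")) #< input).!!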
Hi Gil,
Sorry for the late reply and thanks for raising this question. The file
listing logic in HadoopFsRelation is intentionally made different from
Hadoop FileInputFormat. Here are the reasons:
1. Efficiency: when computing RDD partitions,
FileInputFormat.listStatus() is called on the dri