Hi Gil,
Sorry for the late reply, and thanks for raising this question. The file
listing logic in HadoopFsRelation is intentionally different from that of
Hadoop FileInputFormat. Here are the reasons:
1. Efficiency: when computing RDD partitions,
FileInputFormat.listStatus() is called on the dri
I was just debugging an intermittent timeout failure in the test suite CliSuite.scala.
I traced it down to a timing window in the Scala library class sys.process.ProcessImpl.scala. Sometimes the input pipe to a process becomes 'None' before the process has had a chance to read any input at all.
Hi Cheng,
Thanks a lot for responding to it.
I'm still missing some points regarding the efficiency argument, and I would be
very thankful if you could expand on it a little bit more.
As I see it, both HadoopFsRelation and FileInputFormat.listStatus perform
listings, and eventually both call the FileSystem.listStatus method.
F
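For concreteness, here is a minimal, hypothetical sketch (not Spark's actual listing code) of the shared primitive both code paths eventually rely on, Hadoop's FileSystem.listStatus; the path below is a made-up placeholder:

```scala
// Minimal sketch: the call both HadoopFsRelation and FileInputFormat
// ultimately depend on is FileSystem.listStatus. Path is hypothetical.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

object ListStatusSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val dir = new Path("hdfs:///tmp/example-input")   // hypothetical input directory
    val fs: FileSystem = dir.getFileSystem(conf)

    // One listStatus RPC per directory; how many such calls are issued,
    // and where they run, is what the efficiency discussion is about.
    val statuses: Array[FileStatus] = fs.listStatus(dir)
    statuses.foreach(s => println(s"${s.getPath} (${s.getLen} bytes)"))
  }
}
```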
Hi!
I’d like to perform an action (store / print something) inside of a transformation (map
or mapPartitions). This approach has some flaws, but here is my question: might
it happen that Spark will optimise (RDD or DataFrame) processing so that my
mapPartitions simply won't run?
--
Eugene Morozov
fa
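For reference, a small hedged sketch of the pattern being asked about (a side effect inside mapPartitions); the class and value names are made up, and it assumes a plain RDD job:

```scala
// Hedged sketch: a side effect inside mapPartitions. Transformations are lazy,
// so the side effect only runs once an action (count, collect, save...) forces
// evaluation of that branch of the lineage.
import org.apache.spark.{SparkConf, SparkContext}

object SideEffectSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("side-effect-sketch").setMaster("local[*]"))

    val rdd = sc.parallelize(1 to 100, numSlices = 4)

    val withSideEffect = rdd.mapPartitions { iter =>
      val buffered = iter.toList                        // materialize so we can log and return
      println(s"processing ${buffered.size} records")   // the side effect
      buffered.iterator
    }

    // Without an action depending on it, the mapPartitions above never executes.
    println(withSideEffect.count())

    // If only the side effect is wanted, foreachPartition is an action and
    // does not rely on a transformation being evaluated.
    rdd.foreachPartition(iter => println(s"partition size: ${iter.size}"))

    sc.stop()
  }
}
```

The key point the sketch illustrates is laziness: the mapPartitions closure only runs when some action forces it, so if nothing downstream depends on its output it will simply never execute.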
Hi, this email is sent to both the dev and user lists; I just want to see if someone
familiar with the Spark/Maven build procedure can provide any help.
I am building Spark 1.2.2 with the following command:
mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -Phive -Phive-0.12.0
The spark-assembly-1.2.2-hadoop2.2.0.jar
Thanks for finding this. Should we just switch to Java's process library
for now?
On Wed, Aug 12, 2015 at 1:30 AM, Tim Preece wrote:
> I was just debugging an intermittent timeout failure in the testsuite
> CliSuite.scala
>
> I traced it down to a timing window in the Scala library class
> sys.
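As a rough illustration of what "Java's process library" could mean here, a hedged sketch using java.lang.ProcessBuilder from Scala; the command and I/O handling are purely illustrative, not the actual CliSuite change:

```scala
// Hedged sketch: drive a child process with java.lang.ProcessBuilder instead
// of scala.sys.process. Command ("cat") is a placeholder.
import java.io.{BufferedReader, InputStreamReader, PrintWriter}

object ProcessBuilderSketch {
  def main(args: Array[String]): Unit = {
    val pb = new ProcessBuilder("cat")
    pb.redirectErrorStream(true)
    val process = pb.start()

    // Write to the child's stdin explicitly; the stream stays open until we
    // close it, so there is no window in which the input pipe disappears.
    val stdin = new PrintWriter(process.getOutputStream)
    stdin.println("hello")
    stdin.close()

    // Read the child's combined stdout/stderr until EOF.
    val stdout = new BufferedReader(new InputStreamReader(process.getInputStream))
    Iterator.continually(stdout.readLine()).takeWhile(_ != null).foreach(println)

    println(s"exit code: ${process.waitFor()}")
  }
}
```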
Yikes.
Was this a one-time thing? Or does it happen consistently? Can you turn
on debug logging for o.a.s.scheduler? (Dunno if it will help, but maybe...)
On Tue, Aug 11, 2015 at 8:59 AM, Akhil Das
wrote:
> Hi
>
> My Spark job (running in local[*] with spark 1.4.1) reads data from a
> thrift
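For the debug-logging suggestion above, one hedged sketch, assuming the log4j 1.x API that Spark 1.x bundles (editing conf/log4j.properties is the equivalent file-based route):

```scala
// Hedged sketch: raise the log level for the Spark scheduler package
// programmatically via the log4j 1.x API.
import org.apache.log4j.{Level, Logger}

object SchedulerDebugLogging {
  def enable(): Unit = {
    // o.a.s.scheduler is shorthand for the org.apache.spark.scheduler package.
    Logger.getLogger("org.apache.spark.scheduler").setLevel(Level.DEBUG)
  }
}
```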
I've helped to build conda-installable Spark packages in the past. You can find
an older recipe here:
https://github.com/conda/conda-recipes/tree/master/spark
And I've been updating packages here:
https://anaconda.org/anaconda-cluster/spark
`conda install -c anaconda-cluster spark`
The above shoul