2016-04-24 13:38 GMT+02:00 Stefan Falk :
> sc.parallelize(cfile.toString().split("\n"), 1)
Try `sc.textFile(pathToFile)` instead.
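A minimal sketch of that suggestion (`pathToFile` standing in for whatever `cfile` points at):

    // Let Spark read and split the file itself instead of reading it on the
    // driver and parallelizing the lines; gives an RDD[String], one line per element.
    val lines = sc.textFile(pathToFile)

That also keeps the driver from having to ship the whole file contents along with the job.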
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketD
2016-03-29 11:25 GMT+02:00 Robert Schmidtke :
> Is there a meaningful way for me to find out what exactly is going wrong
> here? Any help and hints are greatly appreciated!
Maybe a version mismatch between the jars on the cluster?
---
2016-03-24 11:09 GMT+01:00 Shishir Anshuman :
> I am using two slaves to run the ALS algorithm. I am saving the predictions
> in a text file using:
> saveAsTextFile(path)
>
> The predictions are getting stored on the slaves, but I want the predictions
> to be saved on the master.
Yes, that is expected.
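(Not from the reply above, just a sketch of one workaround: if the result is small enough to fit in driver memory, collect it and write it locally on the master; otherwise write to shared storage such as HDFS. The path below is made up.)

    import java.io.PrintWriter
    // Pull the predictions back to the driver and write a local file there.
    val out = new PrintWriter("/home/user/predictions.txt")
    predictions.collect().foreach(p => out.println(p.toString))
    out.close()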
2016-03-24 9:54 GMT+01:00 Max Schmidt :
> we're using the Java API (1.6.0) with a ScheduledExecutor that continuously
> submits a Spark job to a standalone cluster.
I'd recommend Scala.
> After each job we close the JavaSparkContext and create a new one.
Why do that? You can happily reuse it. Pret
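A rough sketch of what reusing it could look like (the names and the dummy job are made up, not from the original code):

    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.spark.{SparkConf, SparkContext}

    // One long-lived context shared by every scheduled run, instead of
    // closing and recreating it per job.
    val sc = new SparkContext(new SparkConf().setAppName("scheduled-jobs"))
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      def run(): Unit = {
        val n = sc.parallelize(1 to 1000).count()  // the actual job goes here
        println(s"run finished, count = $n")
      }
    }, 0, 10, TimeUnit.MINUTES)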
I'd try `brew install spark` or `brew install apache-spark` and see where that
gets you. https://github.com/Homebrew/homebrew
2016-03-04 21:18 GMT+01:00 Aida :
> Hi all,
>
> I am a complete novice and was wondering whether anyone would be willing to
> provide me with a step-by-step guide on how to install Spark
2016-02-15 14:02 GMT+01:00 Sun, Rui :
> On computation, RRDD launches one R process for each partition, so there
> won't be a thread-safety issue
>
> Could you give more details on your new environment?
Running on EC2, I start the executors via
/usr/bin/R CMD javareconf -e "/usr/lib/spark/sbin/
2016-02-15 4:35 GMT+01:00 Sun, Rui :
> Yes, JRI loads an R dynamic library into the executor JVM, which faces a
> thread-safety issue when there are multiple task threads within the executor.
>
> I am thinking that if demand like yours (calling R code in RDD
> transformations) is strong, we may
Hello
I'm currently running R code in an executor via JRI. Because R is
single-threaded, any call to R needs to be wrapped in a
`synchronized`. Now I can only use a bit more than one core per executor,
which is undesirable. Is there a way to tell Spark that this specific
application (or even specific U
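One possible knob, sketched here as a guess rather than something from this thread: in standalone mode, capping executors at one core lets the worker start several single-core executors per node, so the serialized R calls stop idling the other task slots. The app name is invented.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("r-via-jri")
      .set("spark.executor.cores", "1") // one task slot per executor (standalone mode)
    val sc = new SparkContext(conf)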
The occasional type error if the casting goes wrong for whatever reason.
2016-01-19 1:22 GMT+08:00 Michael Armbrust :
> What error?
>
> On Mon, Jan 18, 2016 at 9:01 AM, Simon Hafner wrote:
>>
>> And for deserializing,
>> `sqlContext.read.parquet("path/to/parquet
> [...] combining the classes in Spark 2.0 to remove this awkwardness.
>
> On Tue, Jan 12, 2016 at 11:20 PM, Simon Hafner
> wrote:
>>
>> What's the proper way to write DataSets to disk? Convert them to a
>> DataFrame and use the writers there?
>>
>> ---
What's the proper way to write DataSets to disk? Convert them to a
DataFrame and use the writers there?
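For what it's worth, a small sketch against the 1.6-era API being discussed (the Person case class and paths are invented):

    import sqlContext.implicits._

    case class Person(name: String, age: Int)

    val ds = Seq(Person("a", 1), Person("b", 2)).toDS()
    // Go through the DataFrame writer...
    ds.toDF().write.parquet("path/to/parquet")
    // ...and re-attach the type on the way back in.
    val back = sqlContext.read.parquet("path/to/parquet").as[Person]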
[...]solved the problem.
>
> On Fri, Oct 16, 2015 at 9:54 AM, Simon Hafner wrote:
>>
>> Fresh clone of spark 1.5.1, java version "1.7.0_85"
>>
>> build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean
>> package
>>
>> [error] bad symbol
2015-11-03 23:20 GMT+01:00 Ionized :
> TypeUtils.getInterpretedOrdering currently only supports AtomicType and
> StructType. Is it possible to add support for UserDefinedType as well?
Yes, make a PR to Spark.
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/
2015-11-03 20:26 GMT+01:00 xenocyon :
> I want to save an mllib model to disk, and am trying the model.save
> operation as described in
> http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples:
>
> model.save(sc, "myModelPath")
>
> But after running it, I am unable to find
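A hedged note with a sketch (the HDFS path is invented, and `model` is assumed to be the MatrixFactorizationModel from that ALS example): a bare path like "myModelPath" is resolved against the default Hadoop filesystem, so on a cluster the files typically land in HDFS rather than on the local disk you may be checking. An explicit scheme removes the ambiguity, and the matching load call reads the model back.

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    model.save(sc, "hdfs:///user/me/myModelPath")
    val reloaded = MatrixFactorizationModel.load(sc, "hdfs:///user/me/myModelPath")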
2015-11-03 20:07 GMT+01:00 Sebastian Kuepers:
> Hey,
>
> with collect(), an RDD's elements are sent back to the driver as a list.
>
> I have a 4-node cluster (based on Mesos) in a datacenter, and I have my
> local dev machine.
>
> I work with a small 200MB dataset just for testing during development ri
Fresh clone of spark 1.5.1, java version "1.7.0_85"
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
[error] bad symbolic reference. A signature in WebUI.class refers to term eclipse
[error] in package org which is not available.
[error] It may be completely missing
Hi everyone
is it possible to return multiple values from a UDAF defined in Spark
1.5.0? The documentation [1] mentions
abstract def dataType: DataType
The DataType of the returned value of this UserDefinedAggregateFunction.
so it's only possible to return a single value. Should I use ArrayType
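One route worth sketching (just an illustration, not a reply from this thread): declare a StructType as the dataType and return a Row from evaluate(), which effectively yields several values per group. The SumAndCount name and its fields are made up.

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    class SumAndCount extends UserDefinedAggregateFunction {
      def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
      def bufferSchema: StructType = StructType(
        StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
      // A struct return type carries both values out of the aggregate.
      def dataType: DataType = StructType(
        StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
      def deterministic: Boolean = true

      def initialize(buffer: MutableAggregationBuffer): Unit = {
        buffer(0) = 0.0
        buffer(1) = 0L
      }
      def update(buffer: MutableAggregationBuffer, input: Row): Unit =
        if (!input.isNullAt(0)) {
          buffer(0) = buffer.getDouble(0) + input.getDouble(0)
          buffer(1) = buffer.getLong(1) + 1L
        }
      def merge(b1: MutableAggregationBuffer, b2: Row): Unit = {
        b1(0) = b1.getDouble(0) + b2.getDouble(0)
        b1(1) = b1.getLong(1) + b2.getLong(1)
      }
      def evaluate(buffer: Row): Any = Row(buffer.getDouble(0), buffer.getLong(1))
    }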
I have 20 nodes on EC2 and an application that reads the data via
wholeTextFiles. I've tried to copy the data into HDFS via
copyFromLocal, and I get
14/11/24 02:00:07 INFO hdfs.DFSClient: Exception in
createBlockOutputStream 172.31.2.209:50010 java.io.IOException: Bad
connect ack with firstBadL
I've tried to set the log4j logger to WARN only via a log4j.properties file at
src/test/resources/log4j.properties:
log4j.logger.org.apache.spark=WARN
or in sbt via
javaOptions += "-Dlog4j.logger.org.apache.spark=WARN"
But the logger still gives me INFO messages to stdout when I run my tests v
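If sbt is involved (an assumption, given the javaOptions line), one detail that often bites: javaOptions only reaches the test JVM when sbt forks it. A build.sbt sketch, where the -Dlog4j.configuration flag is my own guess at pointing log4j at that file rather than something from the original message:

    fork in Test := true
    javaOptions in Test += "-Dlog4j.configuration=file:src/test/resources/log4j.properties"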
I tried using shapeless HLists as data storage inside Spark.
Unsurprisingly, it failed. The deserialization isn't well-defined because of
all the implicits used by shapeless. How could I make it work?
Sample Code:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apac