2016-04-24 13:38 GMT+02:00 Stefan Falk :
> sc.parallelize(cfile.toString().split("\n"), 1)
Try `sc.textFile(pathToFile)` instead.
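A minimal sketch of that suggestion (`pathToFile` standing in for whatever `cfile` points at):

    // Let Spark read and split the file itself instead of reading it on the
    // driver and parallelizing the lines; gives an RDD[String], one line per element.
    val lines = sc.textFile(pathToFile)

That also keeps the driver from having to ship the whole file contents along with the job.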
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketD
2016-03-29 11:25 GMT+02:00 Robert Schmidtke :
> Is there a meaningful way for me to find out what exactly is going wrong
> here? Any help and hints are greatly appreciated!
Maybe a version mismatch between the jars on the cluster?
---
2016-03-24 11:09 GMT+01:00 Shishir Anshuman :
> I am using two slaves to run the ALS algorithm. I am saving the predictions
> in a text file using:
> saveAsTextFile(path)
>
> The predictions are getting stored on the slaves, but I want the predictions
> to be saved on the master.
Yes, that is expected.
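(Not from the reply above, just a sketch of one workaround: if the result is small enough to fit in driver memory, collect it and write it locally on the master; otherwise write to shared storage such as HDFS. The path below is made up.)

    import java.io.PrintWriter
    // Pull the predictions back to the driver and write a local file there.
    val out = new PrintWriter("/home/user/predictions.txt")
    predictions.collect().foreach(p => out.println(p.toString))
    out.close()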
2016-03-24 9:54 GMT+01:00 Max Schmidt :
> we're using the Java API (1.6.0) with a ScheduledExecutor that continuously
> submits a Spark job to a standalone cluster.
I'd recommend Scala.
> After each job we close the JavaSparkContext and create a new one.
Why do that? You can happily reuse it. Pret
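A rough sketch of what reusing it could look like (the names and the dummy job are made up, not from the original code):

    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.spark.{SparkConf, SparkContext}

    // One long-lived context shared by every scheduled run, instead of
    // closing and recreating it per job.
    val sc = new SparkContext(new SparkConf().setAppName("scheduled-jobs"))
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      def run(): Unit = {
        val n = sc.parallelize(1 to 1000).count()  // the actual job goes here
        println(s"run finished, count = $n")
      }
    }, 0, 10, TimeUnit.MINUTES)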
I'd try `brew install spark` or `brew install apache-spark` and see where that
gets you. https://github.com/Homebrew/homebrew
2016-03-04 21:18 GMT+01:00 Aida :
> Hi all,
>
> I am a complete novice and was wondering whether anyone would be willing to
> provide me with a step-by-step guide on how to install Spark
2016-02-15 14:02 GMT+01:00 Sun, Rui :
> On computation, RRDD launches one R process for each partition, so there
> won't be a thread-safety issue
>
> Could you give more details on your new environment?
Running on EC2, I start the executors via
/usr/bin/R CMD javareconf -e "/usr/lib/spark/sbin/
2016-02-15 4:35 GMT+01:00 Sun, Rui :
> Yes, JRI loads an R dynamic library into the executor JVM, which faces a
> thread-safety issue when there are multiple task threads within the executor.
>
> I am thinking that if demand like yours (calling R code in RDD
> transformations) is strong, we may
Hello
I'm currently running R code in an executor via JRI. Because R is
single-threaded, any call to R needs to be wrapped in a
`synchronized`. Now I can only use a bit more than one core per executor,
which is undesirable. Is there a way to tell Spark that this specific
application (or even specific U
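One possible knob, sketched here as a guess rather than something from this thread: in standalone mode, capping executors at one core lets the worker start several single-core executors per node, so the serialized R calls stop idling the other task slots. The app name is invented.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("r-via-jri")
      .set("spark.executor.cores", "1") // one task slot per executor (standalone mode)
    val sc = new SparkContext(conf)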
The occasional type error if the casting goes wrong for whatever reason.
2016-01-19 1:22 GMT+08:00 Michael Armbrust :
> What error?
>
> On Mon, Jan 18, 2016 at 9:01 AM, Simon Hafner wrote:
>>
>> And for deserializing,
>> `sqlContext.read.parquet("path/to/parquet
> [...] combining the classes in Spark 2.0 to remove this awkwardness.
>
> On Tue, Jan 12, 2016 at 11:20 PM, Simon Hafner
> wrote:
>>
>> What's the proper way to write DataSets to disk? Convert them to a
>> DataFrame and use the writers there?
>>
>> ---
What's the proper way to write DataSets to disk? Convert them to a
DataFrame and use the writers there?
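For what it's worth, a small sketch against the 1.6-era API being discussed (the Person case class and paths are invented):

    import sqlContext.implicits._

    case class Person(name: String, age: Int)

    val ds = Seq(Person("a", 1), Person("b", 2)).toDS()
    // Go through the DataFrame writer...
    ds.toDF().write.parquet("path/to/parquet")
    // ...and re-attach the type on the way back in.
    val back = sqlContext.read.parquet("path/to/parquet").as[Person]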
[...]solved the problem.
>
> On Fri, Oct 16, 2015 at 9:54 AM, Simon Hafner wrote:
>>
>> Fresh clone of spark 1.5.1, java version "1.7.0_85"
>>
>> build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean
>> package
>>
>> [error] bad symbol
2015-11-03 23:20 GMT+01:00 Ionized :
> TypeUtils.getInterpretedOrdering currently only supports AtomicType and
> StructType. Is it possible to add support for UserDefinedType as well?
Yes, make a PR to Spark.
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/
2015-11-03 20:26 GMT+01:00 xenocyon :
> I want to save an mllib model to disk, and am trying the model.save
> operation as described in
> http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples:
>
> model.save(sc, "myModelPath")
>
> But after running it, I am unable to find
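A hedged note with a sketch (the HDFS path is invented, and `model` is assumed to be the MatrixFactorizationModel from that ALS example): a bare path like "myModelPath" is resolved against the default Hadoop filesystem, so on a cluster the files typically land in HDFS rather than on the local disk you may be checking. An explicit scheme removes the ambiguity, and the matching load call reads the model back.

    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

    model.save(sc, "hdfs:///user/me/myModelPath")
    val reloaded = MatrixFactorizationModel.load(sc, "hdfs:///user/me/myModelPath")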
2015-11-03 20:07 GMT+01:00 Sebastian Kuepers:
> Hey,
>
> with collect(), an RDD's elements are sent back to the driver as a list.
>
> I have a 4-node cluster (based on Mesos) in a datacenter, and I have my
> local dev machine.
>
> I work with a small 200MB dataset just for testing during development ri
Fresh clone of spark 1.5.1, java version "1.7.0_85"
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
[error] bad symbolic reference. A signature in WebUI.class refers to term eclipse
[error] in package org which is not available.
[error] It may be completely missing
Hi everyone
is it possible to return multiple values from a UDAF defined in Spark
1.5.0? The documentation [1] mentions
abstract def dataType: DataType
The DataType of the returned value of this UserDefinedAggregateFunction.
so it's only possible to return a single value. Should I use ArrayType
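One route worth sketching (just an illustration, not a reply from this thread): declare a StructType as the dataType and return a Row from evaluate(), which effectively yields several values per group. The SumAndCount name and its fields are made up.

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    class SumAndCount extends UserDefinedAggregateFunction {
      def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
      def bufferSchema: StructType = StructType(
        StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
      // A struct return type carries both values out of the aggregate.
      def dataType: DataType = StructType(
        StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
      def deterministic: Boolean = true

      def initialize(buffer: MutableAggregationBuffer): Unit = {
        buffer(0) = 0.0
        buffer(1) = 0L
      }
      def update(buffer: MutableAggregationBuffer, input: Row): Unit =
        if (!input.isNullAt(0)) {
          buffer(0) = buffer.getDouble(0) + input.getDouble(0)
          buffer(1) = buffer.getLong(1) + 1L
        }
      def merge(b1: MutableAggregationBuffer, b2: Row): Unit = {
        b1(0) = b1.getDouble(0) + b2.getDouble(0)
        b1(1) = b1.getLong(1) + b2.getLong(1)
      }
      def evaluate(buffer: Row): Any = Row(buffer.getDouble(0), buffer.getLong(1))
    }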
I have 20 nodes on EC2 and an application that reads the data via
wholeTextFiles. I've tried to copy the data into HDFS via
copyFromLocal, and I get
14/11/24 02:00:07 INFO hdfs.DFSClient: Exception in
createBlockOutputStream 172.31.2.209:50010 java.io.IOException: Bad
connect ack with firstBadL
I've tried to set the log4j logger to WARN only via a log4j.properties file at
src/test/resources/log4j.properties:
log4j.logger.org.apache.spark=WARN
or in sbt via
javaOptions += "-Dlog4j.logger.org.apache.spark=WARN"
But the logger still gives me INFO messages to stdout when I run my tests v
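If sbt is involved (an assumption, given the javaOptions line), one detail that often bites: javaOptions only reaches the test JVM when sbt forks it. A build.sbt sketch, where the -Dlog4j.configuration flag is my own guess at pointing log4j at that file rather than something from the original message:

    fork in Test := true
    javaOptions in Test += "-Dlog4j.configuration=file:src/test/resources/log4j.properties"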
I tried using shapeless HLists as data storage inside Spark.
Unsurprisingly, it failed. The deserialization isn't well-defined because of
all the implicits used by shapeless. How could I make it work?
Sample Code:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apac