Anyone know of a solution compatible with Spark 1.4 or 1.5?
Thanks again!
From: Reynold Xin
Date: Friday, September 4, 2015 at 5:19 PM
To: Eron Wright
Cc: "dev@spark.apache.org"
Subject: Re: (Spark SQL) partition-scoped UDF
Can you say more about your transformer?
This is a good idea
…performed in batch for efficiency to amortize some overhead. How may I accomplish this?
One option appears to be to invoke DataFrame::mapPartitions, yielding an RDD
that is then converted back to a DataFrame. I'm unsure about the viability or
consequences of that approach.
Thanks!
Eron Wright
…but reiterating it here.
Thanks,
Eron Wright
I filed an issue about a problem I see with PrunedScan that causes sub-optimal
performance in ML pipelines.
Sorry if the issue is already known.
Having tried a few approaches to working with large binary files with Spark ML,
I prefer loading the data into a vector-type column from a relation.
The deeplearning4j project provides neural net algorithms for Spark ML. You
may consider it sample code for extending Spark with new ML algorithms.
http://deeplearning4j.org/sparkml
https://github.com/deeplearning4j/deeplearning4j/tree/master/deeplearning4j-scaleout/spark/dl4j-spark-ml
-Eron
Options include:
- use the 'spark.driver.host' and 'spark.driver.port' settings to stabilize the driver-side endpoint (ref)
- use host networking for your container, i.e. "docker run --net=host ..."
- use yarn-cluster mode (see SPARK-5162)
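For the first option, the submission might look like the following sketch. The address, port, class name, and jar are placeholders, not values from the thread:

```shell
# Pin the driver endpoint so executors outside the container can reach it.
# 203.0.113.10 and 7078 are placeholder values for your host's address/port.
spark-submit \
  --conf spark.driver.host=203.0.113.10 \
  --conf spark.driver.port=7078 \
  --class com.example.App app.jar
```

The chosen port must also be published by the container for executors to connect back to the driver.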
Hope this helps,
Eron
Date: Wed, 10 Jun 2015 13:43:04 -0700
Subject:
We have Spark working with multiple GPUs on AWS and we're looking forward to
optimizations that will speed neural net training even more.
Eron Wright
Contributor | deeplearning4j.org
I saw something like this last night, with a similar message. Is this what
you’re referring to?
[error]
org.deeplearning4j#dl4j-spark-ml;0.0.3.3.4.alpha1-SNAPSHOT!dl4j-spark-ml.jar
origin location must be absolute:
file:/Users/eron/.m2/repository/org/deeplearning4j/dl4j-spark-ml/0.0.3.3.4.alp
Hello,
I'm working on SPARK-7400 for DataFrame support for PortableDataStream, i.e.
the data type associated with the RDD from sc.binaryFiles(...).
Assuming a patch is available soon, what is the likelihood of inclusion in
Spark 1.4?
Thanks