Cool! One small improvement: there's a typo, 'avaragingFunction'. :)
Thanks for sharing.
2015-10-17 5:55 GMT+08:00 Reynold Xin:
> Thanks for sharing, Bill.
>
>
> On Fri, Oct 16, 2015 at 2:06 PM, Bill Bejeck wrote:
>>
>> All,
>>
>> I just did a post on adding groupByKeyToList and groupByKeyUnique u
The decrease in running time from N=6 to N=7 makes some sense to me; that
should be when tree aggregation kicks in. I'd also call it an improvement
that the running time stays at the same ~13 sec as N increases from 6 to 9.
"Does this mean that for 5 nodes with treeAggregate of depth 1 it will take
5*3.1 ~ 15.5 seconds?"
-->
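For anyone following along: depth is the last argument to
RDD.treeAggregate, so depth 1 combines all partition results directly at
the driver (like a plain aggregate), while depth 2 adds one intermediate
combine level. A minimal sketch (the RDD contents and numbers are made up;
assumes the Spark shell's sc):

val rdd = sc.parallelize(1 to 1000000, numSlices = 100)
val sum = rdd.treeAggregate(0L)(
  seqOp = (acc, x) => acc + x,  // fold within each partition
  combOp = (a, b) => a + b,     // merge partial sums across partitions
  depth = 2)                    // depth = 1 merges everything at the driver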
I just saw this happening:
[info] - map stage submission with multiple shared stages and failures *** FAILED *** (566 milliseconds)
[info] java.lang.IndexOutOfBoundsException: 2
[info]   at scala.collection.mutable.ResizableArray$class.apply(ResizableArray.scala:43)
[info]   at scala.collection.
Yes, remember that your bandwidth is the maximum number of bytes per second
that can be shipped to the driver. So if you've got 5 blocks of that size,
then it looks like you're basically saturating the network.
Aggregation trees help for many partitions/nodes, and butterfly mixing can
help use all
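As a back-of-the-envelope check (numbers hypothetical): a ~310 MB block
over a ~100 MB/s link to the driver takes ~3.1 s, so pulling 5 such blocks
through the same link costs about 5 * 3.1 ~ 15.5 s, which is where the
estimate quoted above comes from.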
Thanks for bringing this up! We need to add PMML export methods to the
spark.ml API. I just made a JIRA for tracking that:
https://issues.apache.org/jira/browse/SPARK-11171
Joseph
On Thu, Oct 15, 2015 at 2:58 AM, Fazlan Nazeem wrote:
> OK, it turns out I was using the wrong LinearRegressionMod
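For context, the existing spark.mllib models already expose PMML export
through the PMMLExportable trait; a minimal sketch (the training data and
output path are made up):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

val training = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(1.0)),
  LabeledPoint(2.0, Vectors.dense(2.0))))
val model = LinearRegressionWithSGD.train(training, 10)  // 10 iterations
model.toPMML("/tmp/linear-regression.pmml")  // write the PMML XML locally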
I was using JDK 1.7, and the Maven version is the same as in the pom file.
᚛ |(v1.5.1)|$ java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
Using build/sbt still fails the same way with -Denforcer.skip, with mv
Have you set MAVEN_OPTS to the following?
-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m
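(That is, export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" in the shell before invoking the Maven build.)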
Cheers
On Sat, Oct 17, 2015 at 2:35 PM, Chester Chen wrote:
> I was using JDK 1.7, and the Maven version is the same as in the pom file.
>
> ᚛ |(v1.5.1)|$ java -version
> java version "1.7.0_51"
> Java(T
Hi, it'd be great to share your implementation with the community. I'd
recommend:
(1) Share it immediately by creating a Spark package:
http://spark-packages.org/
You can use this helper package to create your own:
http://spark-packages.org/package/databricks/sbt-spark-package
After you create a
Yes, I have tried MAVEN_OPTS with
-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m
-Xmx4g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m
-Xmx2g -XX:MaxPermSize=1g -XX:ReservedCodeCacheSize=512m
None of them worked; all failed with the same error.
thanks
On Sat, Oct 17, 2015
I have a streaming application that reads from Kafka (direct stream) and then
writes Parquet files. It is a pretty simple app that gets a Kafka direct
stream (8 partitions), calls `stream.foreachRDD`, and then stores to Parquet
using a DataFrame. Batch intervals are set to 10 seconds. During
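For reference, a minimal sketch of that shape (the brokers, topic, and
output path are hypothetical):

import kafka.serializer.StringDecoder
import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sc, Seconds(10))  // 10-second batches, as above
val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> "broker:9092"), Set("events"))
stream.foreachRDD { rdd =>
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  import sqlContext.implicits._
  rdd.map(r => Tuple1(r._2)).toDF("value")  // keep only the message payload
    .write.mode(SaveMode.Append).parquet("/data/parquet")
}
ssc.start()
ssc.awaitTermination()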
Nicholas Chammas wrote
> The funny thing is that Spark seems to accept this only if the value of
> SPARK_MASTER_IP is a DNS name and not an IP address.
>
> When I provide an IP address, I get errors in the log when starting the
> master:
>
> 15/10/15 01:47:31 ERROR NettyTransport: failed to bind
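In conf/spark-env.sh terms, the form that worked is a resolvable name,
e.g. SPARK_MASTER_IP=spark-master.example.com (hostname hypothetical),
rather than the raw IP address itself.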
Hi,
I noticed that when you checkpoint a given RDD, it results in performing the
action twice; I can see 2 jobs being executed in the Spark UI.
Example:
val logFile = "/data/pagecounts"
sc.setCheckpointDir("/checkpoints")
val logData = sc.textFile(logFile, 2)
val as = logData.filter(line => line.contains("a"))
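The RDD.checkpoint() API docs recommend persisting the RDD in memory
first, because saving the checkpoint otherwise re-runs the lineage, which
is exactly the second job seen here. Continuing the example above:

as.cache()       // persist first, so the checkpoint write reuses computed partitions
as.checkpoint()
as.count()       // one action; without cache(), a second job recomputes `as` to save it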