Re: R - Scala interface used in Spark?

2015-06-26 Thread Vasili I. Galchin
thx Reynold! Vasya On Fri, Jun 26, 2015 at 7:03 PM, Reynold Xin wrote: > Take a look at this for Python: > > https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals > > > On Fri, Jun 26, 2015 at 6:06 PM, Reynold Xin wrote: > >> You doing something for Haskell?? >> >> On Fri, Jun 26

Re: R - Scala interface used in Spark?

2015-06-26 Thread Reynold Xin
Take a look at this for Python: https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals On Fri, Jun 26, 2015 at 6:06 PM, Reynold Xin wrote: > You doing something for Haskell?? > > On Fri, Jun 26, 2015 at 5:21 PM, Vasili I. Galchin > wrote: > >> How about Python?? >> >> On Friday,

Re: R - Scala interface used in Spark?

2015-06-26 Thread Reynold Xin
You doing something for Haskell?? On Fri, Jun 26, 2015 at 5:21 PM, Vasili I. Galchin wrote: > How about Python?? > > On Friday, June 26, 2015, Shivaram Venkataraman < > shiva...@eecs.berkeley.edu> wrote: > >> We don't use the rscala package in SparkR -- We have an in built R-JVM >> bridge that i

Re: R - Scala interface used in Spark?

2015-06-26 Thread Vasili I. Galchin
How about Python?? On Friday, June 26, 2015, Shivaram Venkataraman wrote: > We don't use the rscala package in SparkR -- We have an in built R-JVM > bridge that is customized to work with various deployment modes. You can > find more details in my Spark Summit 2015 talk. > > Thanks > Shivaram >

Re: R - Scala interface used in Spark?

2015-06-26 Thread Shivaram Venkataraman
You can see the slides and video at https://spark-summit.org/2015/events/sparkr-the-past-the-present-and-the-future/ On Fri, Jun 26, 2015 at 5:19 PM, Vasili I. Galchin wrote: > URL, please!! The URL of your work, please. > > > On Friday, June 26, 2015, Shivaram Venkataraman < > shiva...@eecs.berkeley.e

Re: R - Scala interface used in Spark?

2015-06-26 Thread Vasili I. Galchin
URL, please!! The URL of your work, please. On Friday, June 26, 2015, Shivaram Venkataraman wrote: > We don't use the rscala package in SparkR -- We have an in built R-JVM > bridge that is customized to work with various deployment modes. You can > find more details in my Spark Summit 2015 talk. > >

Re: R - Scala interface used in Spark?

2015-06-26 Thread Shivaram Venkataraman
We don't use the rscala package in SparkR -- We have an in built R-JVM bridge that is customized to work with various deployment modes. You can find more details in my Spark Summit 2015 talk. Thanks Shivaram On Fri, Jun 26, 2015 at 3:19 PM, Vasili I. Galchin wrote: > A friend sent the below: >

Re: Time is ugly in Spark Streaming....

2015-06-26 Thread Emrehan Tüzün
On Fri, Jun 26, 2015 at 12:30 PM, Sea <261810...@qq.com> wrote: > Hi, all > I find a problem in spark streaming, when I use the time in function > foreachRDD... I find the time is very interesting. > val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, > StringDecoder](s

R - Scala interface used in Spark?

2015-06-26 Thread Vasili I. Galchin
A friend sent the below: http://cran.r-project.org/web/packages/rscala/index.html Is this the "glue" between R and Scala that is used in Spark? Vasili

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-26 Thread Ted Yu
Pardon. During earlier test run, I got: StreamingContextSuite: - from no conf constructor - from no conf + spark home - from no conf + spark home + env - from conf with settings - from existing SparkContext - from existing Spa

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-26 Thread Ted Yu
I got the following when running test suite: [INFO] compiler plugin: BasicArtifact(org.scalamacros,paradise_2.10.4,2.0.1,null) [info] Compiling 2 Scala sources and 1 Java source to /home/hbase/spark-1.4.1/streaming/target/scala-2.10/test-classes... [error]

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-26 Thread Patrick Wendell
Hey Tom - no one voted on this yet, so I need to keep it open until people vote. But I'm not aware of specific things we are waiting for. Anyone else? - Patrick On Fri, Jun 26, 2015 at 7:10 AM, Tom Graves wrote: > So is this open for vote then or are we waiting on other things? > > Tom > > > > O

Re: [VOTE] Release Apache Spark 1.4.1

2015-06-26 Thread Tom Graves
So is this open for vote then or are we waiting on other things? Tom On Thursday, June 25, 2015 10:32 AM, Andrew Ash wrote: I would guess that many tickets targeted at 1.4.1 were set that way during the tail end of the 1.4.0 voting process as people realized they wouldn't make the

Re: Time is ugly in Spark Streaming....

2015-06-26 Thread Sea
Yes, I make it. -- Original message -- From: "Gerard Maas"; Sent: 2015-06-26 5:40; To: "Sea" <261810...@qq.com>; Cc: "user"; "dev"; Subject: Re: Time is ugly in Spark Streaming Are you sharing the SimpleDateFormat instance? This looks a lo

RE: [SparkScore]Performance portal for Apache Spark - WW26

2015-06-26 Thread Huang, Jie
Thanks. In general, we can see a stable trend in the Spark master branch and the latest release. We are also considering adding more benchmarks/workloads to this automated perf tool. Any comments and feedback are warmly welcome. Thank you && Best Regards, Grace (Huang Jie) From: Nan Zhu [mailto:

Re: [SparkScore]Performance portal for Apache Spark - WW26

2015-06-26 Thread Nan Zhu
Thank you, Jie! Very nice work! -- Nan Zhu http://codingcat.me On Friday, June 26, 2015 at 8:17 AM, Huang, Jie wrote: > Correct. Your calculation is right! > > We have been aware of that kmeans performance drop also. According to our > observation, it is caused by some unbalanced execut

RE: [SparkScore]Performance portal for Apache Spark - WW26

2015-06-26 Thread Huang, Jie
Correct. Your calculation is right! We have been aware of that kmeans performance drop also. According to our observation, it is caused by some unbalanced executions among different tasks. Even we used the same test data between different versions (i.e., not caused by the data skew). And the c

Re: [SparkScore]Performance portal for Apache Spark - WW26

2015-06-26 Thread Nan Zhu
Hi, Jie, Thank you very much for this work! Very helpful! I just would like to confirm that I understand the numbers correctly: if we take the running time of 1.2 release as 100s 9.1% - means the running time is 109.1 s? -4% - means it comes 96s? If that’s the true meaning of the numbers, w

Re: [SQL] codegen on wide dataset throws StackOverflow

2015-06-26 Thread Peter Rudenko
I'm using spark-1.4.0. Sure will try to make steps to reproduce and file a JIRA ticket. Thanks, Peter Rudenko On 2015-06-26 11:14, Josh Rosen wrote: Which Spark version are you using? Can you file a JIRA for this issue? On Thu, Jun 25, 2015 at 6:35 AM, Peter Rudenko mailto:petro.rude...@gma

Re: Time is ugly in Spark Streaming....

2015-06-26 Thread Gerard Maas
Are you sharing the SimpleDateFormat instance? This looks a lot more like the non-thread-safe behaviour of SimpleDateFormat (that has claimed many unsuspecting victims over the years), than any 'ugly' Spark Streaming. Try writing the timestamps in millis to Kafka and compare. -kr, Gerard. On Fri,
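
The thread-safety point above can be sketched in a few lines of Java (the same JDK classes are callable from Scala). This is a minimal illustration, not the code from the original report: the class name `BatchTimeFormat`, the pattern string, the UTC zone, and the thread counts are all assumptions made for the example. It swaps a shared `SimpleDateFormat` for `java.time.format.DateTimeFormatter`, which is immutable and documented as thread-safe, so a single instance can be used from many tasks at once.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Spark or of the original job.
final class BatchTimeFormat {
    // DateTimeFormatter is immutable, so sharing one instance across threads is safe.
    // Pattern and zone are assumptions chosen for the example.
    private static final DateTimeFormatter SHARED =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    // Formats an epoch-millisecond timestamp, e.g. a batch time passed to foreachRDD.
    public static String format(long epochMillis) {
        return SHARED.format(Instant.ofEpochMilli(epochMillis));
    }

    // Formats the same timestamp from several threads and returns the distinct results.
    // With a thread-safe formatter the set should contain exactly one string.
    public static Set<String> formatConcurrently(long epochMillis, int threads, int perThread) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    seen.add(format(epochMillis));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return seen;
    }

    public static void main(String[] args) {
        // Print the distinct formatted values seen across 8 threads.
        System.out.println(formatConcurrently(0L, 8, 10_000));
    }
}
```

If the job has to stay on `SimpleDateFormat`, creating a new instance per call or holding one per thread in a `ThreadLocal` avoids the shared mutable state. Logging the raw millisecond value alongside the formatted string, as suggested above, makes it easy to tell a formatting problem from a genuinely wrong timestamp.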

Time is ugly in Spark Streaming....

2015-06-26 Thread Sea
Hi, all I find a problem in spark streaming, when I use the time in function foreachRDD... I find the time is very interesting. val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicsSet) dataStream.map(x => createGroup(x._2, dimensio

Re: [SQL] codegen on wide dataset throws StackOverflow

2015-06-26 Thread Josh Rosen
Which Spark version are you using? Can you file a JIRA for this issue? On Thu, Jun 25, 2015 at 6:35 AM, Peter Rudenko wrote: > Hi, i have a small but very wide dataset (2000 columns). Trying to > optimize Dataframe pipeline for it, since it behaves very poorly comparing > to rdd operation. > W

Re: Spark for distributed dbms cluster

2015-06-26 Thread Akhil Das
Which distributed database are you referring to here? Spark can connect to almost all of those databases (you just need to pass the Input/Output Format classes, or use one of the many connectors available). Thanks Best Regards On Fri, Jun 26, 2015 at 12:07 PM, louis.hust wrote: > Hi,

Re: External Shuffle service over yarn

2015-06-26 Thread Aaron Davidson
A second advantage is that it allows individual Executors to go into GC pause (or even crash) and still allow other Executors to read shuffle data and make progress, which tends to improve stability of memory-intensive jobs. On Thu, Jun 25, 2015 at 11:42 PM, Sandy Ryza wrote: > Hi Yash, > > One