Re: Gradient Descent with large model size

2015-10-14 Thread Joseph Bradley
For those numbers of partitions, I don't think you'll actually use tree aggregation. The number of partitions needs to reach a certain threshold (>= 7) before treeAggregate really operates on a tree structure: https://github.com/apache/spark/blob/9808052b5adfed7dafd6c1b3971b998e45b2799a/core/src
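
To illustrate what "operating on a tree structure" means here, the following is a minimal sketch in plain Python. It is not Spark's implementation and does not reproduce Spark's actual threshold: the function names, the `scale` grouping factor, and the stopping rule are all assumptions made for illustration. It shows only the general idea that, with few partitions, no intermediate level is created and the aggregate degenerates to a single flat merge.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Vector = List[float]


def add_vectors(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, standing in for combining two partial gradients."""
    return [x + y for x, y in zip(a, b)]


def flat_aggregate(partials: Sequence[Vector]) -> Vector:
    """Single level: one combiner merges every partition's partial result."""
    return reduce(add_vectors, partials)


def tree_aggregate(partials: Sequence[Vector], scale: int = 2) -> Tuple[Vector, int]:
    """Merge non-empty `partials` in groups of `scale` until at most `scale` remain.

    Returns the final result and the number of intermediate levels used.
    The stopping rule is a simplification, not the one Spark uses.
    """
    if scale < 2:
        raise ValueError("scale must be at least 2")
    level = list(partials)
    intermediate_levels = 0
    while len(level) > scale:
        level = [
            reduce(add_vectors, level[i:i + scale])
            for i in range(0, len(level), scale)
        ]
        intermediate_levels += 1
    return reduce(add_vectors, level), intermediate_levels


# Hypothetical data: 8 partitions, each holding a partial sum of length 3.
partials = [[float(p), 1.0, 2.0 * p] for p in range(8)]
tree_result, levels = tree_aggregate(partials, scale=2)

# With only 2 partitions there is nothing to build a tree from.
few_result, few_levels = tree_aggregate(partials[:2], scale=2)
```

In this sketch the 8-partition case goes through intermediate levels before the final merge, while the 2-partition case uses none, which is the kind of behaviour the reply describes for small partition counts.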

Re: [SQL] Memory leak with spark streaming and spark sql in spark 1.5.1

2015-10-14 Thread Shixiong Zhu
Thanks for reporting it Terry. I submitted a PR to fix it: https://github.com/apache/spark/pull/9132 Best Regards, Shixiong Zhu 2015-10-15 2:39 GMT+08:00 Reynold Xin : > +dev list > > On Wed, Oct 14, 2015 at 1:07 AM, Terry Hoo wrote: > >> All, >> >> Does anyone meet memory leak issue with spark

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-14 Thread Ted Yu
Some old bits: http://stackoverflow.com/questions/28162991/cant-run-spark-1-2-in-standalone-mode-on-mac http://stackoverflow.com/questions/29412157/passing-hostname-to-netty FYI On Wed, Oct 14, 2015 at 7:10 PM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > I’m setting the Spark master

Should enforce the uniqueness of field name in DataFrame ?

2015-10-14 Thread Jeff Zhang
Currently it seems that DataFrame doesn't enforce the uniqueness of field names, so it is possible to have duplicate field names in a DataFrame. This usually happens after a join, especially a self-join. Although the user can rename the columns before the join, or rename them after the join (DataFrame#withColumnRenamed i
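
The ambiguity described above can be sketched with a toy model in plain Python. This is not the DataFrame API: the schema is modelled as a bare list of field names, and `resolve`, `join_schemas`, and `rename_all` are hypothetical helpers introduced only for illustration.

```python
from typing import List


def resolve(schema: List[str], name: str) -> int:
    """Return the index of `name` in `schema`; fail if it is missing or ambiguous."""
    matches = [i for i, field in enumerate(schema) if field == name]
    if not matches:
        raise KeyError(f"no such field: {name}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous field: {name} matches positions {matches}")
    return matches[0]


def join_schemas(left: List[str], right: List[str]) -> List[str]:
    """Concatenate field names as-is, without enforcing uniqueness."""
    return left + right


def rename_all(schema: List[str], suffix: str) -> List[str]:
    """Hypothetical helper: make names unique by appending a suffix before joining."""
    return [f"{field}{suffix}" for field in schema]


people = ["id", "name"]

# Self-join without renaming: both sides contribute "id" and "name".
joined = join_schemas(people, people)
try:
    resolve(joined, "id")
    ambiguous = False
except ValueError:
    ambiguous = True

# Renaming one side first keeps every name unique, so lookups succeed.
renamed = join_schemas(people, rename_all(people, "_right"))
id_right_index = resolve(renamed, "id_right")
```

In this toy model the self-joined schema cannot resolve `id` by name, while renaming one side beforehand avoids the clash. That is the manual workaround the message mentions.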

SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-14 Thread Nicholas Chammas
I’m setting the Spark master address via the SPARK_MASTER_IP environment variable in spark-env.sh, like spark-ec2 does. The funny thing is that Spark seems to accept this

RE: Understanding code/closure shipment to Spark workers

2015-10-14 Thread Arijit
Hi Xiao, Thank you very much for the pointers. I looked into that part of the code, and I now understand how the main method is invoked. It is still not clear to me how the code is distributed to the executors. Is it the whole jar or some serialized object? I was expecting to see the part of the code where the

Re: [Streaming] join events in last 10 minutes

2015-10-14 Thread Renyi Xiong
Hi TD, The scenario here is to let events from topic1 wait a fixed 10 minutes for events with the same key from topic2 to arrive, and then left outer join them by key. Does the query do what is expected? If not, what is the right way to achieve this? Thanks, Renyi. On Tue, Oct 13, 2015 at 5:14 PM, Danie

Gradient Descent with large model size

2015-10-14 Thread Ulanov, Alexander
Dear Spark developers, I have noticed that Gradient Descent in Spark MLlib takes a long time if the model is large. It is implemented with treeAggregate. I've extracted the code from GradientDescent.scala to perform a benchmark. It allocates an Array of a given size and then aggregates it: val

Strange spark problems among different versions

2015-10-14 Thread zhaoxia
Hi. I am trying to run Spark Pi on the cluster, but some strange errors occur and I do not know what causes them. Although I have already posted this error to user@spark, I think it may not be a simple configuration error, and the developers may know it well. When I am using the hadoop2.6 and spark-

Re: Status of SBT Build

2015-10-14 Thread Patrick Wendell
Jakob, this is now being tested by our harness. I've created a JIRA for the issue. If you want to take a stab at fixing these, that would be great: https://issues.apache.org/jira/browse/SPARK-0 - Patrick On Wed, Oct 14, 2015 at 12:20 PM, Patrick Wendell wrote: > Hi Jakob, > > There is a tem

Re: Status of SBT Build

2015-10-14 Thread Patrick Wendell
Hi Jakob, There is a temporary issue with the Scala 2.11 build in SBT. The problem is this wasn't previously covered by our automated tests so it broke without us knowing - this has been actively discussed on the dev list in the last 24 hours. I am trying to get it working in our test harness toda

Status of SBT Build

2015-10-14 Thread Jakob Odersky
Hi everyone, I've been having trouble building Spark with SBT recently. Scala 2.11 doesn't work and in all cases I get large amounts of warnings and even errors on tests. I was therefore wondering what the official status of spark with sbt is? Is it very new and still buggy or unmaintained and "f

If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-14 Thread Reynold Xin
Can you reply to this email and tell us why you disabled it? Thanks.

Re: [SQL] Memory leak with spark streaming and spark sql in spark 1.5.1

2015-10-14 Thread Reynold Xin
+dev list On Wed, Oct 14, 2015 at 1:07 AM, Terry Hoo wrote: > All, > > Does anyone meet memory leak issue with spark streaming and spark sql in > spark 1.5.1? I can see the memory is increasing all the time when running > this simple sample: > > val sc = new SparkContext(conf) >

Re: Is "mllib" no longer Experimental?

2015-10-14 Thread Patrick Wendell
I would tend to agree with this approach. We should audit all @Experimental labels before the 1.6 release and clear them out when appropriate. - Patrick On Wed, Oct 14, 2015 at 2:13 AM, Sean Owen wrote: > Someone asked, is "ML pipelines" stable? I said, no, most of the key > classes are still

Contributing Receiver based Low Level Kafka Consumer from Spark-Packages to Apache Spark Project

2015-10-14 Thread Dibyendu Bhattacharya
Hi, I have raised a JIRA (https://issues.apache.org/jira/browse/SPARK-11045) to track the discussion, and I am also mailing the dev group for your opinions. Some discussion has already happened in the JIRA, and I would love to hear what others think. You can comment directly on the JIRA if you wish. This ka

Is "mllib" no longer Experimental?

2015-10-14 Thread Sean Owen
Someone asked, is "ML pipelines" stable? I said, no, most of the key classes are still marked @Experimental, which matches my impression that things may still be subject to change. But then, I see that MLlib classes, which are de facto not seeing much further work and no API change, are also mostl