Re: Feedback: Feature request

2015-08-27 Thread Manish Amde
Hi James, It's a good idea. A JSON format is more convenient for visualization, though a little inconvenient to read. How about a toJson() method? It might make the MLlib API inconsistent across models, though. You should probably create a JIRA for this. CC: dev list -Manish > On Aug 26, 2015,
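
A hedged sketch of what such a method might look like, using json4s (already a Spark dependency); the method, the SplitInfo shape, and its fields are illustrative, not an existing MLlib API:

    import org.json4s.JsonDSL._
    import org.json4s.jackson.JsonMethods._

    // Hypothetical example: serialize a decision-tree split to JSON for
    // visualization. The field names are illustrative, not MLlib's API.
    case class SplitInfo(feature: Int, threshold: Double, gain: Double)

    def toJson(split: SplitInfo): String = {
      val json =
        ("feature" -> split.feature) ~
        ("threshold" -> split.threshold) ~
        ("gain" -> split.gain)
      compact(render(json))
    }

    // toJson(SplitInfo(0, 1.5, 0.2)) yields {"feature":0,"threshold":1.5,"gain":0.2}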

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread Krishna Sankar
+1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK Total time: 42:36 min mvn clean package -Pyarn -Phadoop-2.6 -DskipTests 2. Tested pyspark, mllib 2.1. statistics (min,max,mean,Pearson,Spearman) OK 2.2. Linear/Ridge/Lasso Regression OK 2.3. Decision Tree, Naive Bayes OK 2.4. KMea

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread Reynold Xin
Marcelo - please submit a patch anyway. If we don't include it in this release, it will go into 1.5.1. On Thu, Aug 27, 2015 at 4:56 PM, Marcelo Vanzin wrote: > On Thu, Aug 27, 2015 at 4:42 PM, Marcelo Vanzin > wrote: > > The Windows issue Sen raised could be considered a regression / > > blo

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread Marcelo Vanzin
On Thu, Aug 27, 2015 at 4:42 PM, Marcelo Vanzin wrote: > The Windows issue Sen raised could be considered a regression / > blocker, though, and it's a one line fix. If we feel that's important, > let me know and I'll put up a PR against branch-1.5. Looks like Josh just found a blocker, so maybe w

Re: Opening up metrics interfaces

2015-08-27 Thread Thomas Dudziak
+1. I'd love to simply define a timer in my code (maybe metrics-scala?) using Spark's metrics registry. Also, maybe switch to the newer version (io.dropwizard.metrics)? On Thu, Aug 27, 2015 at 4:42 PM, Reynold Xin wrote: > I'd like this to happen, but it hasn't been super high priority on > any
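
For reference, a minimal sketch of defining a timer against a Dropwizard MetricRegistry; the wiring that would hand you Spark's own registry is exactly the part that isn't public today, so the registry here is assumed to be supplied from outside:

    import com.codahale.metrics.MetricRegistry

    // Times an arbitrary block of code. The metric name is illustrative.
    def timed[T](registry: MetricRegistry)(body: => T): T = {
      val ctx = registry.timer(MetricRegistry.name("myapp", "lookup")).time()
      try body finally ctx.stop()
    }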

Re: Opening up metrics interfaces

2015-08-27 Thread Reynold Xin
I'd like this to happen, but it hasn't been a super high priority on anybody's mind. There are a couple of things that could be good to do: 1. At the application level: consolidate task metrics and accumulators. They have substantial overlap and, from a high level, should just be consolidated. Maybe ther
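
To make the overlap concrete, a small sketch of the accumulator side in the 1.x API; a named accumulator surfaces in the web UI much like a per-task metric, which is the duplication being described:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("acc-demo").setMaster("local[*]"))

    // Named accumulators are updated per task and aggregated on the
    // driver, overlapping with what task metrics already track.
    val records = sc.accumulator(0L, "recordsProcessed")
    sc.parallelize(1 to 1000).foreach(_ => records += 1L)
    println(records.value)  // 1000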

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread Marcelo Vanzin
+1. I tested the "without hadoop" binary package and ran our internal tests on it with dynamic allocation both on and off. The Windows issue Sen raised could be considered a regression / blocker, though, and it's a one line fix. If we feel that's important, let me know and I'll put up a PR against

Opening up metrics interfaces

2015-08-27 Thread Atsu Kakitani
Hi, I was wondering if there are any plans to open up the API for Spark's metrics system. I want to write custom sources and sinks, but these interfaces aren't public right now. I saw that there was also an issue open for this (https://issues.apache.org/jira/browse/SPARK-5630), but it hasn't been
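
A sketch of the kind of source being requested. The Source trait below mirrors org.apache.spark.metrics.source.Source, which is private[spark] at the time of writing (the point of this thread), so treat this as what the code could look like if the interface were opened up:

    import com.codahale.metrics.{Gauge, MetricRegistry}

    // Mirrors Spark's (currently private) Source trait.
    trait Source {
      def sourceName: String
      def metricRegistry: MetricRegistry
    }

    // A custom source exposing a single gauge; names are illustrative.
    class QueueDepthSource(depth: () => Int) extends Source {
      override val sourceName = "myapp.queue"
      override val metricRegistry = new MetricRegistry()
      metricRegistry.register("depth", new Gauge[Int] {
        override def getValue: Int = depth()
      })
    }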

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread Sen Fang
Agree on the one-line fix. I'm submitting from Windows to YARN running on Linux. I imagine this isn't that uncommon, especially for developers working in a corporate setting. On Thu, Aug 27, 2015 at 12:52 PM Marcelo Vanzin wrote: > Are you just submitting from Windows or are you also running YARN

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread Marcelo Vanzin
Are you just submitting from Windows or are you also running YARN on Windows? If the former, I think the only fix that would be needed is this line (from that same patch): https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L434 I don't believ
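
The failure is easy to reproduce: handed a raw Windows path, java.net.URI parses "D:" as a scheme and rejects the backslash that follows, which is exactly the "index 2" in the reported exception. A sketch of the repro and the usual fix of converting through java.io.File (path illustrative):

    import java.io.File
    import java.net.URI

    // Throws URISyntaxException("Illegal character in opaque part at
    // index 2"): "D:" parses as a scheme, "\TEMP\..." as an opaque part.
    // new URI("""D:\TEMP\__spark_conf__.zip""")

    // The usual fix: let File build the URI (on Windows this yields
    // file:/D:/TEMP/__spark_conf__.zip).
    val uri: URI = new File("""D:\TEMP\__spark_conf__.zip""").toURI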

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread saurfang
Never mind. It looks like this has been fixed in https://github.com/apache/spark/pull/8053 but didn't make the cut? Even though the associated JIRA is targeted for 1.6, I was able to submit to YARN from Windows without a problem with 1.4. I'm wondering if this fix will be merged into the 1.5 branch. Let m

Re: [VOTE] Release Apache Spark 1.5.0 (RC2)

2015-08-27 Thread saurfang
Compiled on Windows with YARN and Hive. However, I got an exception when submitting an application to YARN due to: java.net.URISyntaxException: Illegal character in opaque part at index 2: D:\TEMP\spark-b32c5b5b-a9fa-4cfd-a233-3977588d4092\__spark_conf__1960856096319316224.zip at java.net.URI$Pa

Re: High Availability of Spark Driver

2015-08-27 Thread Steve Loughran
On 27 Aug 2015, at 08:42, Ashish Rawat <ashish.ra...@guavus.com> wrote: Hi Patrick, As discussed in another thread, we are looking for a solution to the problem of lost state on Spark Driver failure. Can you please share Spark's long-term strategy for resolving this problem? <-- Origi

Re: Maven issues with 1.5-RC

2015-08-27 Thread Steve Loughran
what happens if you edit your env to remove mvn 3.0.4, and unset any MAVEN_HOME? On 26 Aug 2015, at 16:08, Chris Freeman <cfree...@alteryx.com> wrote: Currently trying to compile 1.5-RC2 (from https://github.com/apache/spark/commit/727771352855dbb780008c449a877f5aaa5fc27a) and running i

Re: Building with sbt "impossible to get artifacts when data has not been loaded"

2015-08-27 Thread Jacek Laskowski
On Wed, Aug 26, 2015 at 11:23 PM, Holden Karau wrote: > Has anyone else run into "impossible to get artifacts when data has not been > loaded. IvyNode = org.scala-lang#scala-library;2.10.3" during hive/update > when building with sbt. Working around it is pretty simple (just add it as a > dependen
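
The workaround described would look roughly like this in an sbt build definition (the pin is illustrative; match it to whatever Scala version the build actually expects):

    // Pin scala-library so Ivy resolves a single, loadable version
    // instead of tripping over the stale 2.10.3 artifact.
    dependencyOverrides += "org.scala-lang" % "scala-library" % scalaVersion.value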

Re: High Availability of Spark Driver

2015-08-27 Thread Ashish Rawat
If anyone else is also facing similar problems and has figured out a good workaround within the current design, please share. Regards, Ashish From: Ashish Rawat <ashish.ra...@guavus.com> Date: Thursday, 27 August 2015 1:12 pm To: "dev@spark.apache.org"

FW: High Availability of Spark Driver

2015-08-27 Thread Ashish Rawat
Hi Patrick, As discussed in another thread, we are looking for a solution to the problem of lost state on Spark Driver failure. Can you please share Spark's long-term strategy for resolving this problem? <-- Original Mail Content Below --> We have come across the problem of Spark Applications
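
For streaming applications there is at least a partial mitigation today: checkpoint the StreamingContext and rebuild it on driver restart. A minimal sketch (checkpoint path and batch interval are illustrative); recovering state for non-streaming, long-running applications is the open problem this thread raises:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val checkpointDir = "hdfs:///user/app/checkpoints"  // illustrative

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("ha-demo")
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint(checkpointDir)
      // ... define the streaming computation here ...
      ssc
    }

    // After a driver failure, this restores the context (and its state)
    // from the checkpoint instead of starting from scratch.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()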