Re: Graphical display of metrics on application UI page

2015-04-21 Thread Akhil Das
There were some PRs about graphical representation with D3.js; you can find them on GitHub. Here are a few of them: https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=d3 Thanks Best Regards On Wed, Apr 22, 2015 at 8:08 AM, Punyashloka Biswal wrote: > Dear Spark devs, > > Would peo

Re: python/run-tests fails at spark master branch

2015-04-21 Thread Saisai Shao
Hi Hrishikesh, It seems the behavior of the Kafka assembly is a little different when using Maven compared to sbt. The assembly jar name and location are different when using `mvn package`. This is actually a bug; I'm fixing it now. Thanks Jerry 2015-04-22 13:37 GMT+08:00 Hrishikesh Subramonian < hrishikesh.

Re: python/run-tests fails at spark master branch

2015-04-21 Thread Hrishikesh Subramonian
Hi, python/run-tests executes successfully after I run the 'build/sbt assembly' command, but the tests fail if I run it after the 'mvn -DskipTests clean package' command. Why does it work with sbt assembly and not with mvn package? -- Hrishikesh On Wednesday 22 April 2015 07:38 AM, Saisai S

Graphical display of metrics on application UI page

2015-04-21 Thread Punyashloka Biswal
Dear Spark devs, Would people find it useful to have a graphical display of metrics (such as duration, GC time, etc) on the application UI page? Has anybody worked on this before? Punya

Re: python/run-tests fails at spark master branch

2015-04-21 Thread Saisai Shao
Hi Hrishikesh, We have now added a Kafka unit test for Python which relies on the Kafka assembly jar, so you need to run `sbt assembly` or `mvn package` first to get an assembly jar. 2015-04-22 1:15 GMT+08:00 Marcelo Vanzin : > On Tue, Apr 21, 2015 at 1:30 AM, Hrishikesh Subramonian > wrote: > > > Run

Re: Can't find postgresql jdbc driver when using external datasource

2015-04-21 Thread JaeSung Jun
Thanks Felix, It worked with the Spark class path variable as follows: SPARK_CLASSPATH=postgresql-9.3-1102-jdbc41.jar I think it should be working with the driver class path. Thanks Jason On 21 April 2015 at 22:27, Felix C wrote: > It works with --driver-class-path? > > Please see > https://eradiatin

Re: [discuss] new Java friendly InputSource API

2015-04-21 Thread Soren Macbeth
I'm also super interested in this. Flambo (our Clojure DSL) wraps the Java API and it would be great to have this. On Tue, Apr 21, 2015 at 4:10 PM, Reynold Xin wrote: > It can reuse. That's a good point and we should document it in the API > contract. > > > On Tue, Apr 21, 2015 at 4:06 PM, Punya

Re: [discuss] new Java friendly InputSource API

2015-04-21 Thread Reynold Xin
It can reuse. That's a good point and we should document it in the API contract. On Tue, Apr 21, 2015 at 4:06 PM, Punyashloka Biswal wrote: > Reynold, thanks for this! At Palantir we're heavy users of the Java APIs > and appreciate being able to stop hacking around with fake ClassTags :) > > Re

Re: [discuss] new Java friendly InputSource API

2015-04-21 Thread Punyashloka Biswal
Reynold, thanks for this! At Palantir we're heavy users of the Java APIs and appreciate being able to stop hacking around with fake ClassTags :) Regarding this specific proposal, is the contract of RecordReader#get intended to be that it returns a fresh object each time? Or is it allowed to mutate
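The question above, whether a reader hands back a fresh object on each call or mutates and returns one shared object, matters to any caller that holds on to records. A minimal Python sketch of the pitfall follows. ReusingReader, collect_unsafe and collect_safe are hypothetical names made up for illustration; they are not part of the proposed InputSource API.

```python
import copy


class ReusingReader:
    """Hypothetical reader that mutates and returns one shared record."""

    def __init__(self, values):
        self._values = list(values)
        self._record = {"value": None}  # single buffer reused for every row

    def __iter__(self):
        for v in self._values:
            self._record["value"] = v  # overwrite the buffer in place
            yield self._record         # the same object is handed out each time


def collect_unsafe(reader):
    # Keeps references to the shared buffer, so every element ends up
    # showing whatever the reader wrote last.
    return [rec for rec in reader]


def collect_safe(reader):
    # Copies each record before the reader overwrites the buffer.
    return [copy.copy(rec) for rec in reader]
```

If reuse is allowed by the contract, a caller that buffers records needs the copying variant; a caller that processes each record immediately does not.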

Re: Is spark-ec2 for production use?

2015-04-21 Thread Nicholas Chammas
Nate, could you point us to an example of how one would use Big Top as a "more production-ish" replacement for spark-ec2? I took a look at the project page, but couldn't find any usage examples. Perhaps we can link to them from the spark-ec2 docs. Regarding te

RE: Is spark-ec2 for production use?

2015-04-21 Thread nate
Several of the Bigtop folks got together last week at ApacheCon; this was a popular topic for next enhancements with Spark-related components after getting 1.0 out the door. Some leading topics were: -deployment of spark specific clusters -spark standalone, hdfs -spark over yarn, hdfs

RE: Spark build time

2015-04-21 Thread Alex
If you are using MVN there are some parameters (MAVEN_OPTS) which need to be set in order to give the underlying environment enough memory. See the instructions here: https://spark.apache.org/docs/latest/building-spark.html -Original Message- From: "Reynold Xin" Sent: 4/21/2015 4:21

[discuss] new Java friendly InputSource API

2015-04-21 Thread Reynold Xin
I created a pull request last night for a new InputSource API that is essentially a stripped down version of the RDD API for providing data into Spark. Would be great to hear the community's feedback. Spark currently has two de facto input source APIs: 1. RDD 2. Hadoop MapReduce InputFormat Neithe

Re: Spark build time

2015-04-21 Thread Reynold Xin
It runs tons of integration tests. I think most developers just let Jenkins run the full suite of them. On Tue, Apr 21, 2015 at 12:54 PM, Olivier Girardot wrote: > Hi everyone, > I was just wondering about the Spark full build time (including tests), > 1h48 seems to me quite... spacious. What's

Spark build time

2015-04-21 Thread Olivier Girardot
Hi everyone, I was just wondering about the full Spark build time (including tests); 1h48 seems to me quite... spacious. What's taking most of the time? Is the build mainly integration tests? Is there any roadmap or are there JIRAs dedicated to this that we can chip in on? Regards, Olivier.

Re: Spark Streaming updateStateByKey throws OutOfMemory Error

2015-04-21 Thread Olivier Girardot
Hi Sourav, Can you post your updateFunc as well, please? Regards, Olivier. On Tue, Apr 21, 2015 at 12:48, Sourav Chandra wrote: > Hi, > > We are building a spark streaming application which reads from kafka, does > updateStateByKey based on the received message type and finally stores into >
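Sourav's actual updateFunc is not shown in this snippet, and nothing here establishes the cause of the OutOfMemory error. As background only, the sketch below shows the general shape of an update function in the PySpark style: it receives the new values for a key together with the previous state, and returning None drops the key from the state. The names update_state and MAX_IDLE_BATCHES and the eviction rule are assumptions made up for illustration.

```python
MAX_IDLE_BATCHES = 3  # hypothetical threshold: evict keys idle this many batches


def update_state(new_values, state):
    """Keep a small fixed-size state per key: (running_sum, idle_batches)."""
    total, idle = state if state is not None else (0, 0)
    if new_values:
        # The key was seen in this batch: fold the values in and reset idleness.
        return (total + sum(new_values), 0)
    idle += 1
    if idle >= MAX_IDLE_BATCHES:
        # Returning None asks the framework to remove this key's state.
        return None
    return (total, idle)
```

A function like this would be passed as `stream.updateStateByKey(update_state)`. Keeping the per-key state small and evicting stale keys are common ways to limit state growth, though whether that applies to this thread is unknown.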

Re: Spark 1.2.2 prebuilt release for Hadoop 2.4 didn't get deployed

2015-04-21 Thread Olivier Girardot
Thanks Patrick! I'll update https://registry.hub.docker.com/u/ogirardot/spark-docker-shell/ when you're done. Regards, Olivier. On Tue, Apr 21, 2015 at 20:47, Patrick Wendell wrote: > Good catch Olivier - I'll take care of it. Tracking this on SPARK-7027. > > On Tue, Apr 21, 2015 at 6:06 AM

Re: Is spark-ec2 for production use?

2015-04-21 Thread Shivaram Venkataraman
I'm not sure it's exactly easy to define 'production' use. One thing we could stress is that spark-ec2 is meant to be run manually (i.e. it outputs errors, asks for prompts etc.) and that automating it is not in our scope right now. Shivaram On Tue, Apr 21, 2015 at 12:05 PM, Nicholas Chammas < nic

Re: Is spark-ec2 for production use?

2015-04-21 Thread Patrick Wendell
It could be a good idea to document this a bit. The original goals were to give people an easy way to get started with Spark and also to provide a consistent environment for our own experiments and benchmarking of Spark at the AMPLab. Over time I've noticed a huge amount of scope increase in terms

Is spark-ec2 for production use?

2015-04-21 Thread Nicholas Chammas
Is spark-ec2 intended for spinning up production Spark clusters? I think the answer is no. However, the docs for spark-ec2 very much leave that possibility open, and indeed I see many people asking questions or opening issues that stem from

Re: [pyspark] Drop __getattr__ on DataFrame

2015-04-21 Thread Reynold Xin
I replied on JIRA. Let's move the discussion there. On Tue, Apr 21, 2015 at 8:13 AM, Karlson wrote: > I think the __getattr__ method should be removed from the DataFrame API in > pyspark. > > May I draw the Python folk's attention to the issue > https://issues.apache.org/jira/browse/SPARK-7035

Re: Spark 1.2.2 prebuilt release for Hadoop 2.4 didn't get deployed

2015-04-21 Thread Patrick Wendell
Good catch Olivier - I'll take care of it. Tracking this on SPARK-7027. On Tue, Apr 21, 2015 at 6:06 AM, Olivier Girardot wrote: > Hi everyone, > It seems that some of the Spark 1.2.2 prebuilt versions (I tested mainly for > Hadoop 2.4 and later) didn't get deployed on all the mirrors and cloudfront

Re: python/run-tests fails at spark master branch

2015-04-21 Thread Marcelo Vanzin
On Tue, Apr 21, 2015 at 1:30 AM, Hrishikesh Subramonian wrote: > Run streaming tests ... > Failed to find Spark Streaming Kafka assembly jar in > /home/xyz/spark/external/kafka-assembly > You need to build Spark with 'build/sbt assembly/assembly > streaming-kafka-assembly/assembly' or 'build/mvn
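The error quoted above says the test script could not find the Kafka assembly jar in the directory it searched, and a later reply in this thread notes that Maven and sbt give the jar a different name and location. One way to see what a given build actually produced is to walk the module directory and list matching jars. The sketch below is a generic helper; find_jars and its default pattern are assumptions for illustration, not the logic used by python/run-tests.

```python
import os
from fnmatch import fnmatch


def find_jars(search_dir, pattern="*assembly*.jar"):
    """Return sorted paths of all files under search_dir matching pattern.

    The default pattern is an assumption for illustration, not the name
    that python/run-tests actually looks for.
    """
    matches = []
    for root, _dirs, files in os.walk(search_dir):
        for name in files:
            if fnmatch(name, pattern):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Running it once after each build tool against the same directory shows whether the two builds leave the jar in different places.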

[pyspark] Drop __getattr__ on DataFrame

2015-04-21 Thread Karlson
I think the __getattr__ method should be removed from the DataFrame API in pyspark. May I draw the Python folk's attention to the issue https://issues.apache.org/jira/browse/SPARK-7035 and invite comments? Thank you! - To un
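For context on why attribute-style column access can be contentious: Python only calls __getattr__ when normal attribute lookup fails, so a column whose name matches an existing method is silently shadowed by that method. The sketch below uses ToyFrame, a hypothetical stand-in class; it is not pyspark code and does not reproduce the arguments made in SPARK-7035.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame-like class; not pyspark code."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def count(self):
        # A regular method, found by normal attribute lookup.
        return len(next(iter(self._columns.values()), []))

    def __getattr__(self, name):
        # Only called when normal lookup fails, so it never sees "count".
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __getitem__(self, name):
        # Explicit item access is unambiguous.
        return self._columns[name]
```

With columns named "age" and "count", `frame.age` returns the column but `frame.count` returns the bound method; only `frame["count"]` reaches the column.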

Spark 1.2.2 prebuilt release for Hadoop 2.4 didn't get deployed

2015-04-21 Thread Olivier Girardot
Hi everyone, It seems that some of the Spark 1.2.2 prebuilt versions (I tested mainly for Hadoop 2.4 and later) didn't get deployed on all the mirrors and cloudfront. Both the direct download and apache mirrors fail with dead links, for example: http://d3kbcqa49mib13.cloudfront.net/spark-1.2.2-bin-h

RE: Can't find postgresql jdbc driver when using external datasource

2015-04-21 Thread Felix C
It works with --driver-class-path? Please see https://eradiating.wordpress.com/2015/04/17/using-spark-data-sources-to-load-data-from-postgresql/ --- Original Message --- From: "JaeSung Jun" Sent: April 21, 2015 1:05 AM To: dev@spark.apache.org Subject: Can't find postgresql jdbc driver when us

python/run-tests fails at spark master branch

2015-04-21 Thread hrishikesh91
Hi, I cloned the Spark master branch from GitHub and built it successfully using the mvn -DskipTests clean package command. But the python/run-tests command fails. Please see the log below: Running PySpark tests. Output is in python/unit-tests.log. Testing with Python version: Python 2.7.3 Run core

python/run-tests fails at spark master branch

2015-04-21 Thread Hrishikesh Subramonian
Hi, I cloned the Spark master branch from GitHub and built it successfully using the mvn -DskipTests clean package command. But the python/run-tests command fails. Please see the log below: Running PySpark tests. Output is in python/unit-tests.log. Testing with Python version: Python 2.7.3 Run co

Can't find postgresql jdbc driver when using external datasource

2015-04-21 Thread JaeSung Jun
Hi, I tried to get an external database table on PostgreSQL running. I got a java.lang.ClassNotFoundException even though I added the driver jar using the --jars option as shown below. Is it a class loader hierarchy problem? Any ideas? Thanks - spark-sql --jars ../lib/postgresql-9.