Re: Multi-Line JSON in SparkSQL

2015-05-03 Thread Reynold Xin
I took a quick look at that implementation. I'm not sure if it actually handles JSON correctly, because it attempts to find the first { starting from a random point. However, that random point could be in the middle of a string, and thus the first { might just be part of a string, rather than a rea
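
A minimal sketch of the failure mode described above, not the linked library's actual code. The helper `naive_record_start` and the sample data are hypothetical; the point is only that the first `{` found from an arbitrary offset can sit inside a string value.

```python
import json


def naive_record_start(data: str, offset: int) -> int:
    """Hypothetical helper: index of the first '{' at or after offset."""
    return data.find("{", offset)


# Two records; the first one carries a '{' inside a string value.
data = '{"msg": "literal { brace"}\n{"msg": "ok"}'

# Where the second record really begins (right after the newline).
true_start = data.index("\n") + 1

# Pretend a split boundary landed at offset 5, inside the first record.
start = naive_record_start(data, 5)

# Try to parse from the guessed start; the guess is inside a string,
# so the text that follows is not a valid JSON object.
try:
    json.JSONDecoder().raw_decode(data[start:])
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

print(start, true_start, parsed_ok)
```

Here the guessed start fails to parse, which at least signals the problem. With other inputs the text after a brace inside a string could itself be valid JSON, in which case the reader would silently emit a wrong record.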

Re: spark log analyzer sample

2015-05-03 Thread Emrehan Tüzün
On Mon, May 4, 2015 at 9:50 AM, anshu shukla wrote: > Exception in thread "main" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot > communicate with client version 4 > I am not using any hadoop facility (not even hdfs) then why it is giving > this

spark log analyzer sample

2015-05-03 Thread anshu shukla
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4 I am not using any Hadoop facility (not even HDFS), so why is it giving this error? -- Thanks & Regards, Anshu Shukla

Re: Multi-Line JSON in SparkSQL

2015-05-03 Thread Emre Sevinc
You can check out the following library: https://github.com/alexholmes/json-mapreduce -- Emre Sevinç On Sun, May 3, 2015 at 10:04 PM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi everyone, > Is there any way in Spark SQL to load multi-line JSON data efficiently, I > think

Re: Multi-Line JSON in SparkSQL

2015-05-03 Thread Olivier Girardot
I'll try to study that and get back to you. Regards, Olivier. On Mon, May 4, 2015 at 04:05, Reynold Xin wrote: > How does the Pivotal format decide where to split the files? It seems to > me the challenge is to decide that, and off the top of my head the only way > to do this is to scan from t

Question about PageRank with Live Journal

2015-05-03 Thread yunming zhang
Hi, I have a question about running PageRank with LiveJournal data as suggested by the example at org.apache.spark.examples.graphx.LiveJournalPageRank. I ran with the following options: bin/run-example org.apache.spark.examples.graphx.LiveJournalPageRank data/graphx/soc-LiveJournal1.txt --numEPa

Re: createDataFrame allows column names as second param in Python not in Scala

2015-05-03 Thread Reynold Xin
We can't drop the existing createDataFrame one, since it breaks API compatibility, and the existing one also automatically infers the column name for case classes (in that case users most likely won't be declaring names directly). If this is really a problem, we should just create a new function (m

Re: Multi-Line JSON in SparkSQL

2015-05-03 Thread Reynold Xin
How does the Pivotal format decide where to split the files? It seems to me the challenge is to decide that, and off the top of my head the only way to do this is to scan from the beginning and parse the JSON properly, which makes it not possible with large files (doable for whole input with a lot
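
A rough sketch of the "scan from the beginning and parse properly" approach mentioned above, using Python's standard `json.JSONDecoder.raw_decode`. The function name `split_json_objects` and the sample text are assumptions for illustration. This is not how Spark SQL or the Pivotal input format works, and because it is strictly sequential it does not address splitting a large file across workers.

```python
import json


def split_json_objects(text: str):
    """Yield each top-level JSON value, parsing sequentially from offset 0."""
    decoder = json.JSONDecoder()
    pos = 0
    n = len(text)
    while pos < n:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            break
        # Parse one complete value and learn where it ends.
        obj, end = decoder.raw_decode(text, pos)
        yield obj
        pos = end


# Two multi-line records; the first has a '{' inside a string value.
text = '{\n  "a": 1,\n  "s": "has { brace"\n}\n{\n  "a": 2\n}\n'
records = list(split_json_objects(text))
print(records)
```

Because every record boundary comes from a real parse, a brace inside a string cannot be mistaken for the start of a record. The cost is that the whole input has to be read in order by a single reader.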

Re: Speeding up Spark build during development

2015-05-03 Thread Mark Hamstra
https://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn On Sun, May 3, 2015 at 2:54 PM, Pramod Biligiri wrote: > This is great. I didn't know about the mvn script in the build directory. > > Pramod > > On Fri, May 1, 2015 at 9:51 AM, York, Brennon > > wrote: > > > Follow

Re: Submit & Kill Spark Application program programmatically from another application

2015-05-03 Thread Chester Chen
Sounds like you are in Yarn-Cluster mode. I created JIRA SPARK-3913 and PR https://github.com/apache/spark/pull/2786. Is this what you are looking for? Chester On Sat, May 2, 2015 at 10:32 PM, Yijie Shen wrote: > Hi, > > I’ve posted this pro

Re: Speeding up Spark build during development

2015-05-03 Thread Pramod Biligiri
This is great. I didn't know about the mvn script in the build directory. Pramod On Fri, May 1, 2015 at 9:51 AM, York, Brennon wrote: > Following what Ted said, if you leverage the `mvn` from within the > `build/` directory of Spark you'll get zinc for free which should help > speed up build ti

Multi-Line JSON in SparkSQL

2015-05-03 Thread Olivier Girardot
Hi everyone, Is there any way in Spark SQL to load multi-line JSON data efficiently? I think there was a reference on the mailing list to http://pivotal-field-engineering.github.io/pmr-common/ for its JSONInputFormat, but it's rather inaccessible considering the dependency is not available in any p

Re: [discuss] ending support for Java 6?

2015-05-03 Thread shane knapp
that bug predates my time at the amplab... :) anyways, just to restate: jenkins currently only builds w/java 7. if you folks need 6, i can make it happen, but it will be a (smallish) bit of work. shane On Sun, May 3, 2015 at 2:14 AM, Sean Owen wrote: > Should be, but isn't what Jenkins does.

Blockers for 1.4.0

2015-05-03 Thread Sean Owen
I'd like to preemptively post the current list of 35 Blockers for release 1.4.0. (There are 53 Critical too, and a total of 273 JIRAs targeted for 1.4.0. Clearly most of that isn't accurate, so would be good to un-target most of that.) As a matter of process and hygiene, it would be best to either

Re: [discuss] ending support for Java 6?

2015-05-03 Thread Sean Owen
Should be, but isn't what Jenkins does. https://issues.apache.org/jira/browse/SPARK-1437 At this point it might be simpler to just decide that 1.5 will require Java 7 and then the Jenkins setup is correct. (NB: you can also solve this by setting bootclasspath to JDK 6 libs even when using javac 7

Re: createDataFrame allows column names as second param in Python not in Scala

2015-05-03 Thread Olivier Girardot
I have the perfect counterexample, where some of the data scientists prototype in Python and the production material is done in Scala. But I get your point; as a matter of fact, I realised the toDF method took parameters a little while after posting this. However, toDF still needs you to go from

LDA and PageRank Using GraphX

2015-05-03 Thread Praveen Kumar Muthuswamy
Hi All, I am looking to run LDA for topic modeling and the PageRank algorithm that comes with GraphX for some data analysis. Are there any examples (GraphX) that I can take a look at? Thanks, Praveen