Re: [discuss] ending support for Java 6?

2015-04-30 Thread Nick Pentreath
+1 for this, I think it's high time. We should of course do it with enough warning for users. 1.4 may be too early (not for me though!). Perhaps we specify that 1.5 will officially move to JDK7? — Sent from Mailbox On Fri, May 1, 2015 at 12:16 AM, Ram Sriharsha wrote: > +1 for end of

Re: Regarding KryoSerialization in Spark

2015-04-30 Thread twinkle sachdeva
Thanks for the info. On Fri, May 1, 2015 at 12:10 AM, Sandy Ryza wrote: > Hi Twinkle, > > Registering the class makes it so that writeClass only writes out a couple > bytes, instead of a full String of the class name. > > -Sandy > > On Thu, Apr 30, 2015 at 4:13 AM, twinkle sachdeva < > twinkle.

Re: Mima test failure in the master branch?

2015-04-30 Thread Patrick Wendell
I reverted the patch that I think was causing this: SPARK-5213 Thanks On Thu, Apr 30, 2015 at 7:59 PM, zhazhan wrote: > Any PR open for this? > > > > -- > View this message in context: > http://apache-spark-developers-list.1001551.n3.nabble.com/Mima-test-failure-in-the-master-branch-tp11949p119

Re: Mima test failure in the master branch?

2015-04-30 Thread Ted Yu
Looks like this has been taken care of: commit beeafcfd6ee1e460c4d564cd1515d8781989b422 Author: Patrick Wendell Date: Thu Apr 30 20:33:36 2015 -0700 Revert "[SPARK-5213] [SQL] Pluggable SQL Parser Support" On Thu, Apr 30, 2015 at 7:58 PM, zhazhan wrote: > [info] spark-sql: found 1 poten

Re: Mima test failure in the master branch?

2015-04-30 Thread zhazhan
Any PR open for this?

Mima test failure in the master branch?

2015-04-30 Thread zhazhan
[info] spark-sql: found 1 potential binary incompatibilities (filtered 129)
[error] * method sqlParser()org.apache.spark.sql.SparkSQLParser in class org.apache.spark.sql.SQLContext does not have a correspondent in new version
[error] filter with: ProblemFilters.excludeMissingMethodProblem
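
For context, the fix in this case was to revert the offending commit (see the replies above), but had the API change been intentional, the message is pointing at a MiMa exclusion. A minimal sketch, assuming the layout Spark uses for project/MimaExcludes.scala, of what such an entry could look like:

    import com.typesafe.tools.mima.core._

    object MimaExcludesSketch {
      // Tell MiMa to ignore the removed SQLContext.sqlParser method when
      // comparing against the previous release.
      val excludes: Seq[ProblemFilter] = Seq(
        ProblemFilters.exclude[MissingMethodProblem](
          "org.apache.spark.sql.SQLContext.sqlParser")
      )
    }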

Re: Uninitialized session in HiveContext?

2015-04-30 Thread Marcelo Vanzin
Hi Michael, It would be great to see changes to make hive integration less painful, and I can test them in our environment once you have a patch. But I guess my question is a little more geared towards the current code; doesn't the issue I ran into affect 1.4 and potentially earlier versions too?

Re: Uninitialized session in HiveContext?

2015-04-30 Thread Michael Armbrust
Hey Marcelo, Thanks for the heads up! I'm currently in the process of refactoring all of this (to separate the metadata connection from the execution side) and as part of this I'm making the initialization of the session not lazy. It would be great to hear if this also works for your internal in

Re: Issue of running partitioned loading (RDD) in Spark External Datasource on Mesos

2015-04-30 Thread Yang Lei
I finally isolated the issue to be related to the ActorSystem I reuse from SparkEnv.get.actorSystem. This ActorSystem contains the configuration defined in my application jar's reference.conf both in the local cluster case and in the case where I use it directly in an extension to BaseRelation's buildSc
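
For illustration only (not from the original message), a small sketch of how one might check which configuration a reused ActorSystem is actually carrying; the property name my-app.connector.host is hypothetical:

    import org.apache.spark.SparkEnv

    // The ActorSystem Spark created for this JVM, as reused in the message above.
    val system = SparkEnv.get.actorSystem
    // The Typesafe Config the ActorSystem was started with.
    val config = system.settings.config

    // If the application jar's reference.conf was merged in, this lookup succeeds;
    // on an executor where only Spark's defaults are on the classpath, it will not.
    val host =
      if (config.hasPath("my-app.connector.host")) Some(config.getString("my-app.connector.host"))
      else None
    println(s"connector host seen by the ActorSystem: $host")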

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Ram Sriharsha
+1 for end of support for Java 6 On Thursday, April 30, 2015 3:08 PM, Vinod Kumar Vavilapalli wrote: FYI, after enough consideration, we the Hadoop community dropped support for JDK 6 starting release Apache Hadoop 2.7.x. Thanks +Vinod On Apr 30, 2015, at 12:02 PM, Reynold Xin w

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Vinod Kumar Vavilapalli
FYI, after enough consideration, we in the Hadoop community dropped support for JDK 6 starting with the Apache Hadoop 2.7.x release. Thanks +Vinod On Apr 30, 2015, at 12:02 PM, Reynold Xin wrote: > This has been discussed a few times in the past, but now Oracle has ended > support for Java 6 for over a ye

Uninitialized session in HiveContext?

2015-04-30 Thread Marcelo Vanzin
Hey all, We ran into some test failures in our internal branch (which builds against Hive 1.1), and I narrowed it down to the fix below. I'm not super familiar with the Hive integration code, but does this look like a bug for other versions of Hive too? This caused an error where some internal Hi

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Ted Yu
But it is hard to know how long customers stay with their most recent download. Cheers On Thu, Apr 30, 2015 at 2:26 PM, Sree V wrote: > If there is any possibility of getting the download counts, then we can use > it as EOS criteria as well. Say, if download counts are lower than 30% (or > anothe

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Sree V
If there is any possibility of getting the download counts, then we can use it as EOS criteria as well. Say, if download counts are lower than 30% (or another number) of the lifetime high, then it qualifies for EOS. Thanking you. With Regards Sree On Thursday, April 30, 2015 2:22 PM, Sree

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Sree V
Hi Team, Should we take this opportunity to lay out and evangelize a pattern for EOL of dependencies? I propose we follow the official EOL of Java, Python, Scala, etc., and add, say, 6-12-24 months depending on the popularity. Java 6 official EOL: Feb 2013. Add 6-12 months: Aug 2013 - Feb 2014 official En

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Marcelo Vanzin
As for the idea, I'm +1. Spark is the only reason I still have jdk6 around - exactly because I don't want to cause the issue that started this discussion (inadvertently using JDK7 APIs). And as has been pointed out, even J7 is about to go EOL real soon. Even Hadoop is moving away (I think 2.7 will

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Ted Yu
+1 on ending support for Java 6. BTW from https://www.java.com/en/download/faq/java_7.xml : After April 2015, Oracle will no longer post updates of Java SE 7 to its public download sites. On Thu, Apr 30, 2015 at 1:34 PM, Punyashloka Biswal wrote: > I'm in favor of ending support for Java 6. We

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Punyashloka Biswal
I'm in favor of ending support for Java 6. We should also articulate a policy on how long we want to support current and future versions of Java after Oracle declares them EOL (Java 7 will be in that bucket in a matter of days). Punya On Thu, Apr 30, 2015 at 1:18 PM shane knapp wrote: > somethin

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Sean Owen
I'm firmly in favor of this. It would also fix https://issues.apache.org/jira/browse/SPARK-7009 and avoid any more of the long-standing 64K file limit thing that's still a problem for PySpark. As a point of reference, CDH5 has never supported Java 6, and it was released over a year ago. On Thu,

Re: [discuss] ending support for Java 6?

2015-04-30 Thread shane knapp
something to keep in mind: we can easily support java 6 for the build environment, particularly if there's a definite EOL. i'd like to fix our java versioning 'problem', and this could be a big instigator... right now we're hackily setting java_home in test invocation on jenkins, which really is

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Patrick Wendell
I'd also support this. In general, I think it's good that we try to have Spark support different versions of things (Hadoop, Hive, etc). But at some point you need to weigh the costs of doing so against the number of users affected. In the case of Java 6, we are seeing increasing cost from this. S

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Koert Kuipers
nicholas started it! :) for java 6 i would have said the same thing about 1 year ago: it is foolish to drop it. but i think the time is right about now. about half our clients are on java 7 and the other half have active plans to migrate to it within 6 months. On Thu, Apr 30, 2015 at 3:57 PM, Rey

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Reynold Xin
Guys thanks for chiming in, but please focus on Java here. Python is an entirely separate issue. On Thu, Apr 30, 2015 at 12:53 PM, Koert Kuipers wrote: > i am not sure eol means much if it is still actively used. we have a lot > of clients with centos 5 (for which we still support python 2.4 in

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Koert Kuipers
i am not sure eol means much if it is still actively used. we have a lot of clients with centos 5 (for which we still support python 2.4 in some form or another, fun!). most of them are on centos 6, which means python 2.6. by cutting out python 2.6 you would cut out the majority of the actual clust

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Nicholas Chammas
(On that note, I think Python 2.6 should be next on the chopping block sometime later this year, but that’s for another thread.) (To continue the parenthetical, Python 2.6 was in fact EOL-ed in October of 2013.) On Thu, Apr 30, 2015 at 3:18 PM N

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread badgerpants
Cody Koeninger-2 wrote > In fact, you're using the 2 arg form of reduce by key to shrink it down to > 1 partition > > reduceByKey(sumFunc, 1) > > But you started with 4 kafka partitions? So they're definitely no longer > 1:1 True. I added the second arg because we were seeing multiple threads

Re: [discuss] ending support for Java 6?

2015-04-30 Thread Nicholas Chammas
I understand the concern about cutting out users who still use Java 6, and I don't have numbers about how many people are still using Java 6. But I want to say at a high level that I support deprecating older versions of stuff to reduce our maintenance burden and let us use more modern patterns in

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread Cody Koeninger
In fact, you're using the 2-arg form of reduceByKey to shrink it down to 1 partition: reduceByKey(sumFunc, 1). But you started with 4 Kafka partitions? So they're definitely no longer 1:1 On Thu, Apr 30, 2015 at 1:58 PM, Cody Koeninger wrote: > This is what I'm suggesting, in pseudocode > >
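
To make the distinction concrete, a minimal sketch (not from the thread; the RDD and sumFunc here are stand-ins) contrasting the one-argument and two-argument forms of reduceByKey:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("reduceByKey-partitions").setMaster("local[4]"))
    val sumFunc: (Long, Long) => Long = _ + _

    // Stand-in for data that arrived in 4 Kafka partitions.
    val counts = sc.parallelize(Seq(("a", 1L), ("b", 2L), ("a", 3L), ("c", 4L)), 4)

    // One-arg form: the default partitioner typically keeps the parent's partition count.
    val keepsPartitioning = counts.reduceByKey(sumFunc)
    // Two-arg form: an explicit numPartitions of 1, so the result is no longer 1:1 with Kafka.
    val singlePartition = counts.reduceByKey(sumFunc, 1)

    println(s"default: ${keepsPartitioning.partitions.length}, forced: ${singlePartition.partitions.length}")
    sc.stop()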

Re: Pickling error when attempting to add a method in pyspark

2015-04-30 Thread Stephen Boesch
Bumping this. Is anyone familiar with the py4j interface in PySpark? Thanks 2015-04-27 22:09 GMT-07:00 Stephen Boesch : > > My intention is to add pyspark support for certain mllib spark methods. I > have been unable to resolve pickling errors of the form > >Pyspark py4j Pi

[discuss] ending support for Java 6?

2015-04-30 Thread Reynold Xin
This has been discussed a few times in the past, but now that Oracle ended support for Java 6 over a year ago, I wonder if we should just drop Java 6 support. There is one outstanding issue Tom has brought to my attention: PySpark on YARN doesn't work well with Java 7/8, but we have an outstanding

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread Cody Koeninger
This is what I'm suggesting, in pseudocode:

    rdd.mapPartitionsWithIndex { case (i, iter) =>
      offset = offsets(i)
      result = yourReductionFunction(iter)
      transaction {
        save(result)
        save(offset)
      }
    }.foreach { (_: Nothing) => () }

where yourReductionFunction is just normal scala co
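
A more concrete rendering of that pseudocode, offered only as a sketch: it assumes the untransformed (key, message) stream from KafkaUtils.createDirectStream (so partitions are still 1:1 with Kafka), uses plain JDBC, and makes up the table and column names (metric_sums, kafka_offsets):

    import java.sql.DriverManager
    import org.apache.spark.streaming.dstream.DStream
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

    // `stream` is assumed to come straight from KafkaUtils.createDirectStream.
    def saveWithOffsets(stream: DStream[(String, String)], jdbcUrl: String): Unit = {
      stream.foreachRDD { rdd =>
        val offsets: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
        rdd.mapPartitionsWithIndex { (i, iter) =>
          val offset = offsets(i)
          // "yourReductionFunction": here, just sum the numeric message bodies in this partition.
          val result = iter.foldLeft(0L) { case (acc, (_, msg)) => acc + msg.toLong }

          // Save the per-partition result and its ending offset in one transaction.
          val conn = DriverManager.getConnection(jdbcUrl)
          try {
            conn.setAutoCommit(false)
            val saveResult = conn.prepareStatement(
              "insert into metric_sums (kafka_partition, total) values (?, ?)")
            saveResult.setInt(1, offset.partition)
            saveResult.setLong(2, result)
            saveResult.executeUpdate()

            val saveOffset = conn.prepareStatement(
              "update kafka_offsets set off = ? where topic = ? and kafka_partition = ?")
            saveOffset.setLong(1, offset.untilOffset)
            saveOffset.setString(2, offset.topic)
            saveOffset.setInt(3, offset.partition)
            saveOffset.executeUpdate()
            conn.commit()
          } finally {
            conn.close()
          }
          Iterator.empty
        }.foreach((_: Nothing) => ())   // force the job to run, as in the pseudocode
      }
    }

The point is the same as the pseudocode: the per-partition result and its ending offset commit or roll back together.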

Re: Regarding KryoSerialization in Spark

2015-04-30 Thread Sandy Ryza
Hi Twinkle, Registering the class makes it so that writeClass only writes out a couple bytes, instead of a full String of the class name. -Sandy On Thu, Apr 30, 2015 at 4:13 AM, twinkle sachdeva < twinkle.sachd...@gmail.com> wrote: > Hi, > > As per the code, KryoSerialization used writeClassAnd
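
For reference, a minimal sketch (not from this exchange) of what registration looks like on the SparkConf side; the MetricEvent class is made up:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    case class MetricEvent(name: String, value: Double)

    val conf = new SparkConf()
      .setAppName("kryo-registration")
      .setMaster("local[2]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registered classes are written as a small numeric ID instead of the full class name.
      .registerKryoClasses(Array(classOf[MetricEvent]))
      // .set("spark.kryo.registrationRequired", "true")  // optionally fail fast on unregistered classes

    val sc = new SparkContext(conf)
    val events = sc.parallelize(1 to 1000).map(i => MetricEvent(s"m$i", i.toDouble))
    // MEMORY_ONLY_SER forces the elements through the Kryo serializer.
    events.persist(StorageLevel.MEMORY_ONLY_SER)
    println(events.count())
    sc.stop()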

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread badgerpants
Cody Koeninger-2 wrote > What's your schema for the offset table, and what's the definition of > writeOffset? The schema is the same as the one in your post: topic | partition | offset. The writeOffset is nearly identical:

    def writeOffset(osr: OffsetRange)(implicit session: DBSession): Unit = {
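
Since the definition is cut off above, here is a hedged guess at what a writeOffset against that schema could look like with ScalikeJDBC (which the DBSession parameter suggests); the upsert strategy and table name are assumptions, not the poster's code:

    import scalikejdbc._
    import org.apache.spark.streaming.kafka.OffsetRange

    def writeOffset(osr: OffsetRange)(implicit session: DBSession): Unit = {
      // Record the highest offset processed for this topic/partition.
      val updated =
        sql"""update offsets set offset = ${osr.untilOffset}
              where topic = ${osr.topic} and partition = ${osr.partition}""".update.apply()
      if (updated == 0) {
        sql"""insert into offsets (topic, partition, offset)
              values (${osr.topic}, ${osr.partition}, ${osr.untilOffset})""".update.apply()
      }
    }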

Re: Drop column/s in DataFrame

2015-04-30 Thread Rakesh Chalasani
Sure, I will try sending a PR soon. On Thu, Apr 30, 2015 at 1:42 PM Reynold Xin wrote: > I filed a ticket: https://issues.apache.org/jira/browse/SPARK-7280 > > Would you like to give it a shot? > > > On Thu, Apr 30, 2015 at 10:22 AM, rakeshchalasani > wrote: > >> Hi All: >> >> Is there any plan

Re: Drop column/s in DataFrame

2015-04-30 Thread Reynold Xin
I filed a ticket: https://issues.apache.org/jira/browse/SPARK-7280 Would you like to give it a shot? On Thu, Apr 30, 2015 at 10:22 AM, rakeshchalasani wrote: > Hi All: > > Is there any plan to add "drop" column/s functionality in the data frame? > One can you "select" function to do so, but I

Re: practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread Cody Koeninger
What's your schema for the offset table, and what's the definition of writeOffset ? What key are you reducing on? Maybe I'm misreading the code, but it looks like the per-partition offset is part of the key. If that's true then you could just do your reduction on each partition, rather than afte

Drop column/s in DataFrame

2015-04-30 Thread rakeshchalasani
Hi All: Is there any plan to add "drop" column/s functionality in the data frame? One can use the "select" function to do so, but I find that tedious when only one or two columns in a large dataframe are to be dropped. Pandas has this functionality, which I find handy when constructing feature vectors
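
For concreteness, a small sketch (not from the thread) of the select-based workaround being described, which is exactly the boilerplate a built-in drop would hide; the column names are made up:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{DataFrame, SQLContext}

    val sc = new SparkContext(new SparkConf().setAppName("drop-columns").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize(Seq((1, "Bob", 21, 80.5), (2, "Alice", 22, 75.0)))
      .toDF("id", "name", "age", "score")

    // "Dropping" columns today means selecting everything except them.
    def dropColumns(df: DataFrame, toDrop: String*): DataFrame = {
      val remaining = df.columns.filterNot(toDrop.contains).map(df.col)
      df.select(remaining: _*)
    }

    dropColumns(df, "age").printSchema()   // id, name, score
    sc.stop()

SPARK-7280, filed in the reply above, tracks adding this as a first-class method.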

practical usage of the new "exactly-once" supporting DirectKafkaInputDStream

2015-04-30 Thread badgerpants
We're a group of experienced backend developers who are fairly new to Spark Streaming (and Scala) and very interested in using the new (in 1.3) DirectKafkaInputDStream impl as part of the metrics reporting service we're building. Our flow involves reading in metric events, lightly modifying some o

Re: [discuss] DataFrame function namespacing

2015-04-30 Thread Ted Yu
IMHO I would go with choice #1 Cheers On Wed, Apr 29, 2015 at 10:03 PM, Reynold Xin wrote: > We definitely still have the name collision problem in SQL. > > On Wed, Apr 29, 2015 at 10:01 PM, Punyashloka Biswal < > punya.bis...@gmail.com > > wrote: > > > Do we still have to keep the names of the

Re: Is SQLContext thread-safe?

2015-04-30 Thread Michael Armbrust
Unfortunately, I think the SQLParser is not threadsafe. I would recommend using HiveQL. On Thu, Apr 30, 2015 at 4:07 AM, Wangfei (X) wrote: > actually this is a sql parse exception, are you sure your sql is right? > > 发自我的 iPhone > > > 在 2015年4月30日,18:50,"Haopu Wang" 写道: > > > > Hi, in a test
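
As an illustration only (not from the thread), a sketch of the suggested direction: share a HiveContext and issue the concurrent queries through it, so parsing goes through the HiveQL path rather than the 1.3 SQLParser. Whether this fully resolves the reporter's case is not established here:

    import java.util.concurrent.Executors
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("concurrent-sql").setMaster("local[4]"))
    val hiveContext = new HiveContext(sc)   // sql() defaults to the HiveQL dialect here
    hiveContext.sql("create table if not exists t (id int)")

    val pool = Executors.newFixedThreadPool(4)
    (1 to 8).foreach { i =>
      pool.submit(new Runnable {
        override def run(): Unit = {
          // Each thread runs its own query against the shared context.
          val n = hiveContext.sql("select count(*) from t").collect().head.getLong(0)
          println(s"thread $i saw $n rows")
        }
      })
    }
    pool.shutdown()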

Spark standalone cluster mode operation

2015-04-30 Thread Akshat Aranya
Hi, I'm trying to figure out how the Spark standalone cluster mode works. Specifically, I'm looking at the code to see how the user's application jar makes it from the submission node, when it is on the local file system, to the driver and the executors. From what I can see, spark-submit causes

withColumn is very slow with datasets with large number of columns

2015-04-30 Thread alexandre Clement
Hi all, I'm experiencing a serious performance problem when using withColumn on a dataset with a large number of columns. It is very slow: on a dataset with 100 columns it takes a few seconds. The code snippet demonstrates the problem. val custs = Seq( Row(1, "Bob", 21, 80.5), Row(2, "Bobby", 21,
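
Because the snippet above is cut off, here is a separate, hedged sketch of the kind of chained-withColumn loop that shows the slowdown on wide data; the schema and timing harness are made up for illustration:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.functions.lit
    import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

    val sc = new SparkContext(new SparkConf().setAppName("withColumn-wide").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    val schema = StructType(Seq(StructField("id", IntegerType, nullable = false)))
    var df = sqlContext.createDataFrame(sc.parallelize(Seq(Row(1), Row(2))), schema)

    // Each withColumn call layers another projection on top of an ever-wider plan,
    // so building 100 columns one call at a time gets noticeably slow.
    val start = System.nanoTime()
    for (i <- 1 to 100) {
      df = df.withColumn(s"c$i", lit(i))
    }
    df.count()
    println(s"took ${(System.nanoTime() - start) / 1e9} seconds")
    sc.stop()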

Re: withColumn is very slow with datasets with large number of columns

2015-04-30 Thread alexandre Clement
I have reported the issue on JIRA: https://issues.apache.org/jira/browse/SPARK-7276 On Thu, Apr 30, 2015 at 4:36 PM, alexandre Clement wrote: > Hi all, > > > I'm experimenting serious performance problem when using withColumn and > dataset with large number of columns. It is very slow: on a data

Regarding KryoSerialization in Spark

2015-04-30 Thread twinkle sachdeva
Hi, As per the code, KryoSerialization uses the writeClassAndObject method, which internally calls the writeClass method, which writes the class of the object during serialization. The documentation on the Spark tuning page says that registering the class will avoid that. Am I missing someth

Re: Is SQLContext thread-safe?

2015-04-30 Thread Wangfei (X)
Actually this is a SQL parse exception; are you sure your SQL is right? Sent from my iPhone > On Apr 30, 2015, at 18:50, "Haopu Wang" wrote: > > Hi, in a test on SparkSQL 1.3.0, multiple threads are doing select on a > same SQLContext instance, but below exception is thrown, so it looks > like SQLContext is NOT t

RE: Is SQLContext thread-safe?

2015-04-30 Thread Haopu Wang
Hi, in a test on SparkSQL 1.3.0, multiple threads are doing select on the same SQLContext instance, but the exception below is thrown, so it looks like SQLContext is NOT thread-safe? I think this is not the desired behavior. == java.lang.RuntimeException: [1.1] failure: ``insert'' expected but iden