No, that isn't necessarily enough to be considered a blocker. A blocker
would be something that would have large negative effects on a significant
number of people trying to run Spark. Arguably, something that prevents a
minority of Spark developers from running unit tests on one OS does not
qualify.
Here is the fix https://github.com/apache/spark/pull/13868
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Wednesday, June 22, 2016 6:43 PM
To: Ulanov, Alexander
Cc: Mark Hamstra ; Marcelo Vanzin
; dev@spark.apache.org
Subject: Re: [VOTE] Release Apache Spark 2.0.0 (RC1)
Alex - if you have access to a Windows box, can you fix the issue? I'm not
sure how many Spark contributors have Windows boxes.
Hi All,
I have tried Spark SQL on branch-2.0 and encountered an
unexpected problem:
Operation not allowed: ROW FORMAT DELIMITED is only compatible with
'textfile', not 'orc'(line 1, pos 0)
The SQL is like:
CREATE TABLE IF NOT EXISTS test.test_orc
(
...
)
PARTITIONED BY (xxx)
ROW FORMAT DELIMITED ...
STORED AS ORC
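For reference, a minimal sketch of the workaround I'd expect to work, with
made-up columns (id, name) standing in for the elided ones: drop ROW FORMAT
DELIMITED (it describes a text serde) and declare only STORED AS ORC, which
supplies its own serde. Assuming a Hive-enabled SparkSession:

    import org.apache.spark.sql.SparkSession

    // Hive support is required for CREATE TABLE ... STORED AS ORC.
    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

    // ORC tables carry their own serde, so no ROW FORMAT clause is given.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS test.test_orc (
        id INT,
        name STRING
      )
      PARTITIONED BY (xxx STRING)
      STORED AS ORC
    """)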
Thank you Holden, I look forward to watching your talk!
On Wed, Jun 22, 2016 at 7:12 PM Holden Karau wrote:
> PySpark RDDs are (on the Java side) essentially RDDs of pickled objects
> and mostly (but not entirely) opaque to the JVM. It is possible (by using
> some internals) to pass a PySpark
PySpark RDDs are (on the Java side) essentially RDDs of pickled objects
and mostly (but not entirely) opaque to the JVM. It is possible (by using
some internals) to pass a PySpark DataFrame to a Scala library (you may or
may not find the talk I gave at Spark Summit useful:
https://www.youtube.com
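For anyone who wants the gist without the video, here is a minimal sketch of
that internals route; com.example.MyScalaLib is a hypothetical library on the
driver classpath, and _jvm / _jdf are private PySpark internals that can
change between releases:

    package com.example

    import org.apache.spark.sql.DataFrame

    object MyScalaLib {
      // Reachable from PySpark through the py4j gateway. On the Python side:
      //   jdf = df._jdf                                  # JVM object behind df
      //   out = spark.sparkContext._jvm.com.example.MyScalaLib.process(jdf)
      //   result = DataFrame(out, df.sql_ctx)            # rewrap in Python
      def process(df: DataFrame): DataFrame = {
        // Placeholder Scala-side logic; any DataFrame transformation works.
        df.limit(10)
      }
    }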
Hi All,
I've developed a Spark module in Scala that I would like to add a Python
port for. I want to be able to allow users to create a PySpark RDD and send
it to my system. I've been looking into the PySpark source code as well as
Py4J and was wondering if there has been anything like this implemented
Alex - if you have access to a Windows box, can you fix the issue? I'm not
sure how many Spark contributors have Windows boxes.
On Wed, Jun 22, 2016 at 5:56 PM, Ulanov, Alexander wrote:
> Spark Unit tests fail on Windows in Spark 2.0. It can be considered a
> blocker since there are people that develop for Spark on Windows.
Spark Unit tests fail on Windows in Spark 2.0. It can be considered a blocker
since there are people that develop for Spark on Windows. The referenced issue
is indeed Minor and has nothing to do with unit tests.
From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: Wednesday, June 22, 2016
It's also marked as Minor, not Blocker.
On Wed, Jun 22, 2016 at 4:07 PM, Marcelo Vanzin wrote:
> On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander
> wrote:
> > -1
> >
> > Spark Unit tests fail on Windows. Still not resolved, though marked as
> > resolved.
>
> To be pedantic, it's marked as a duplicate
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander
wrote:
> -1
>
> Spark Unit tests fail on Windows. Still not resolved, though marked as
> resolved.
To be pedantic, it's marked as a duplicate
(https://issues.apache.org/jira/browse/SPARK-15899), which doesn't
mean necessarily that it's fixed.
SPARK-15893 is resolved as a duplicate of SPARK-15899. SPARK-15899 is
Unresolved.
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander wrote:
> -1
>
> Spark Unit tests fail on Windows. Still not resolved, though marked as
> resolved.
>
> https://issues.apache.org/jira/browse/SPARK-15893
>
> *From
-1
Spark Unit tests fail on Windows. Still not resolved, though marked as resolved.
https://issues.apache.org/jira/browse/SPARK-15893
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Tuesday, June 21, 2016 6:27 PM
To: dev@spark.apache.org
Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)
Please vote on releasing the following candidate as Apache Spark version 2.0.0.
+1 (non-binding, of course)
1. Compiled on OS X 10.10 (Yosemite) OK. Total time: 37:11 min
mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
2. Tested pyspark, mllib (iPython 4.0)
2.0 Spark version is 1.6.2
2.1. statistics (min,max,mean,Pearson,Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK
2.
You can check out the Spark in Action book. In my (not so humble)
opinion, it's very good for beginners.
Petar (author)
On 21.6.2016. 18:01, tesm...@gmail.com wrote:
Hi,
I am a beginner in Spark development, and it took time to configure
Eclipse + Scala. Is there any tutorial that can help beginners?
+1
On Wed, Jun 22, 2016 at 1:07 PM, Kousuke Saruta
wrote:
> +1 (non-binding)
>
> On 2016/06/23 4:53, Reynold Xin wrote:
>
> +1 myself
>
>
> On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara <
> sean.mcnam...@webtrends.com> wrote:
>
>> +1
>>
>> On Jun 22, 2016, at 1:14 PM, Michael Armbrust
>> wrote:
+1 (non-binding)
On 2016/06/23 4:53, Reynold Xin wrote:
+1 myself
On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara
<sean.mcnam...@webtrends.com> wrote:
+1
On Jun 22, 2016, at 1:14 PM, Michael Armbrust
<mich...@databricks.com> wrote:
+1
On Wed, Jun 22, 2
+1 myself
On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara wrote:
> +1
>
> On Jun 22, 2016, at 1:14 PM, Michael Armbrust
> wrote:
>
> +1
>
> On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly
> wrote:
>
>> +1
>>
>> On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter
>> wrote:
>>
>>> +1 This release passes all tests on the graphframes and tensorframes
>>> packages.
+1
On Jun 22, 2016, at 1:14 PM, Michael Armbrust
<mich...@databricks.com> wrote:
+1
On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly
<jonathaka...@gmail.com> wrote:
+1
On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter
<timhun...@databricks.com> wrote:
+1 This release passes all tests on the graphframes and tensorframes
packages.
+1
On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly
wrote:
> +1
>
> On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter
> wrote:
>
>> +1 This release passes all tests on the graphframes and tensorframes
>> packages.
>>
>> On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger
>> wrote:
>>
>>> If we're considering backporting changes for the 0.8 kafka
>>> integration, I am sure there are people who would like to get
+1
On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter
wrote:
> +1 This release passes all tests on the graphframes and tensorframes
> packages.
>
> On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger
> wrote:
>
>> If we're considering backporting changes for the 0.8 kafka
>> integration, I am sure there are people who would like to get
You should see it at both levels: there is one bloom filter for the ORC data
on disk and one for the data in memory.
It is already a good step towards integrating the storage format with the
in-memory representation for columnar data.
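To make the two levels concrete, a hedged sketch with made-up table and
column names, assuming a Hive-enabled SparkSession named spark. The
TBLPROPERTIES keys are ORC's own bloom filter options, and the in-memory
sketch is the one from SPARK-12818; whether the engine consults either
during query processing is exactly the open question here:

    // On-disk: ask the ORC writer for bloom filters via its table properties.
    spark.sql("""
      CREATE TABLE events_orc (user_id STRING, ts BIGINT)
      STORED AS ORC
      TBLPROPERTIES (
        'orc.bloom.filter.columns' = 'user_id',
        'orc.bloom.filter.fpp' = '0.05'
      )
    """)

    // In-memory: build a bloom filter over a DataFrame column (SPARK-12818).
    val bf = spark.table("events_orc").stat.bloomFilter("user_id", 1000000L, 0.03)
    bf.mightContain("some-user")  // probabilistic membership test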
> On 22 Jun 2016, at 14:01, BaiRan wrote:
>
> After building a bloom filter on existing data, does the Spark engine
> utilise the bloom filter during query processing?
+1 This release passes all tests on the graphframes and tensorframes
packages.
On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger wrote:
> If we're considering backporting changes for the 0.8 kafka
> integration, I am sure there are people who would like to get
>
> https://issues.apache.org/jira/browse/SPARK-10963
Yeah, I am +1 for including Kafka 0.10 integration as well. We had to wait
for Kafka 0.10 because there were incompatibilities between the Kafka 0.9
and 0.10 API. And, yes, the code for 0.8.0 remains unchanged so there
shouldn't be any regression for existing users. It's only new code for 0.10.
Th
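For readers who haven't seen it, a minimal sketch of what the new connector's
API looks like against the spark-streaming-kafka-0-10 artifact; the broker
address, group id, and topic are placeholders, and ssc is an existing
StreamingContext:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    // The 0.10 integration configures the new consumer (including TLS)
    // through ordinary consumer properties, not connector-specific knobs.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Each record is an org.apache.kafka.clients.consumer.ConsumerRecord.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("some-topic"), kafkaParams)
    )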
Of course, on my first day back from vacation, I notice that the
Jenkins process got wedged immediately upon my visiting the page.
One quick jenkins/httpd restart later and we're back up and building.
Sorry for any inconvenience!
shane
+1 for 0.10 support. This is huge.
On Wed, Jun 22, 2016 at 8:17 AM, Cody Koeninger wrote:
> Luciano knows there are publicly available examples of how to use the
> 0.10 connector, including TLS support, because he asked me about it
> and I gave him a link
>
>
> https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/TlsStream.scala
Luciano knows there are publicly available examples of how to use the
0.10 connector, including TLS support, because he asked me about it
and I gave him a link
https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/TlsStream.scala
If any committer at any time had sa
On Wed, Jun 22, 2016 at 7:46 AM, Cody Koeninger wrote:
> As far as I know the only thing blocking it at this point is lack of
> committer review / approval.
>
> It's technically adding a new feature after spark code-freeze, but it
> doesn't change existing code, and the kafka project didn't release
> 0.10 until the end of May.
Hm, I thought that was to be added for 2.0. Imran, I know you may have
been working alongside Mark on it; what do you think?
TD / Reynold, would you object to it for 2.0?
On Wed, Jun 22, 2016 at 3:46 PM, Cody Koeninger wrote:
> As far as I know the only thing blocking it at this point is lack of
>
As far as I know the only thing blocking it at this point is lack of
committer review / approval.
It's technically adding a new feature after spark code-freeze, but it
doesn't change existing code, and the kafka project didn't release
0.10 until the end of May.
On Wed, Jun 22, 2016 at 9:39 AM, S
I profess ignorance again, though I really should know by now: what's
opposing that? I personally thought this was going to be in 2.0
and kind of didn't notice that it wasn't ...
On Wed, Jun 22, 2016 at 3:29 PM, Cody Koeninger wrote:
> I don't have a vote, but I'd just like to reiterate that I think
For the clueless (like me):
https://bahir.apache.org/#home
Apache Bahir provides extensions to distributed analytic platforms such as
Apache Spark.
Initially Apache Bahir will contain streaming connectors that were a part
of Apache Spark prior to version 2.0:
- streaming-akka
- streaming-mqtt
- streaming-twitter
- streaming-zeromq
I don't have a vote, but I'd just like to reiterate that I think kafka
0.10 support should be added to a 2.0 release candidate; if not now,
then well before release.
- it's a completely standalone jar, so shouldn't break anyone who's
using the existing 0.8 support
- it's like the 5th highest voted
If we're considering backporting changes for the 0.8 kafka
integration, I am sure there are people who would like to get
https://issues.apache.org/jira/browse/SPARK-10963
into 1.6.x as well
On Wed, Jun 22, 2016 at 7:41 AM, Sean Owen wrote:
> Good call, probably worth back-porting, I'll try to do that.
Created a JIRA issue https://issues.apache.org/jira/browse/SPARK-16131 and
PR @ https://github.com/apache/spark/pull/13842
On Fri, Jun 17, 2016 at 5:19 AM, Sean Owen wrote:
> I think that's OK to change, yes. I don't see why it's necessary to
> init log_ the way it is now. initializeLogIfNecessary
Good call, probably worth back-porting, I'll try to do that. I don't
think it blocks a release, but would be good to get into a next RC if
any.
On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins wrote:
> This has failed on our 1.6 stream builds regularly.
> (https://issues.apache.org/jira/browse/SPARK-6005) looks fixed in 2.0?
After building a bloom filter on existing data, does the Spark engine
utilise the bloom filter during query processing?
Is there any plan for predicate push-down using bloom filters in ORC /
Parquet?
Thanks
Ran
> On 22 Jun, 2016, at 10:48 am, Reynold Xin wrote:
>
> SPARK-12818 is about building a bloom filter
Hi All,
I am running a Spark application with 1.8 TB of data (which is stored as Hive
tables). I am reading the data using HiveContext and processing it.
The cluster has 5 nodes total, 25 cores per machine and 250 GB per node. I
am launching the application with 25 executors with 5 cores each
This has failed regularly on our 1.6 stream builds (
https://issues.apache.org/jira/browse/SPARK-6005). Looks fixed in 2.0?
On Wed, 22 Jun 2016 at 11:15 Sean Owen wrote:
> Oops, one more in the "does anybody else see this" department:
>
> - offset recovery *** FAILED ***
> recoveredOffsetRange
Oops, one more in the "does anybody else see this" department:
- offset recovery *** FAILED ***
recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.str
I'm fairly convinced this error and others that appear timestamp-related
are an environment problem. This test and method have been
present for several Spark versions, without change. I reviewed the
logic and it seems sound, explicitly setting the time zone correctly.
I am not sure why it behaves differently
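For anyone chasing similar environment-dependent failures, a generic sketch
(not the actual Spark test code) of pinning the JVM default time zone around
time-sensitive assertions:

    import java.util.TimeZone

    // Save, pin, and restore the JVM default so timestamp assertions don't
    // depend on the build machine's locale/zone configuration.
    val original = TimeZone.getDefault
    try {
      TimeZone.setDefault(TimeZone.getTimeZone("UTC"))
      // ... run the time-sensitive test body here ...
    } finally {
      TimeZone.setDefault(original)
    }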