No, that isn't necessarily enough to be considered a blocker. A blocker
would be something that would have large negative effects on a significant
number of people trying to run Spark. Arguably, something that prevents a
minority of Spark developers from running unit tests on one OS does not
qualify.
Here is the fix https://github.com/apache/spark/pull/13868
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Wednesday, June 22, 2016 6:43 PM
To: Ulanov, Alexander
Cc: Mark Hamstra ; Marcelo Vanzin
; dev@spark.apache.org
Subject: Re: [VOTE] Release Apache Spark 2.0.0 (RC1)
Alex - if you have access to a Windows box, can you fix the issue? I'm not
sure how many Spark contributors have Windows boxes.
Hi All,
I have tried Spark SQL on branch-2.0 and encountered an
unexpected problem:
Operation not allowed: ROW FORMAT DELIMITED is only compatible with
'textfile', not 'orc'(line 1, pos 0)
The SQL is like:
CREATE TABLE IF NOT EXISTS test.test_orc
(
...
)
PARTITIONED BY (xxx)
ROW FORMAT DELIMITED ...
STORED AS ORC
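For reference, a minimal sketch of the workaround I'd expect to work, with
made-up columns (id, name) standing in for the elided ones: drop ROW FORMAT
DELIMITED (it describes a text serde) and declare only STORED AS ORC, which
supplies its own serde. Assuming a Hive-enabled SparkSession:

    import org.apache.spark.sql.SparkSession

    // Hive support is required for CREATE TABLE ... STORED AS ORC.
    val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

    // ORC tables carry their own serde, so no ROW FORMAT clause is given.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS test.test_orc (
        id INT,
        name STRING
      )
      PARTITIONED BY (xxx STRING)
      STORED AS ORC
    """)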
Thank you Holden, I look forward to watching your talk!
On Wed, Jun 22, 2016 at 7:12 PM Holden Karau wrote:
> PySpark RDDs are (on the Java side) essentially RDDs of pickled objects
> and mostly (but not entirely) opaque to the JVM. It is possible (by using
> some internals) to pass a PySpark
PySpark RDDs are (on the Java side) essentially RDDs of pickled objects
and mostly (but not entirely) opaque to the JVM. It is possible (by using
some internals) to pass a PySpark DataFrame to a Scala library (you may or
may not find the talk I gave at Spark Summit useful:
https://www.youtube.com
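For anyone who wants the gist without the video, here is a minimal sketch of
that internals route; com.example.MyScalaLib is a hypothetical library on the
driver classpath, and _jvm / _jdf are private PySpark internals that can
change between releases:

    package com.example

    import org.apache.spark.sql.DataFrame

    object MyScalaLib {
      // Reachable from PySpark through the py4j gateway. On the Python side:
      //   jdf = df._jdf                                  # JVM object behind df
      //   out = spark.sparkContext._jvm.com.example.MyScalaLib.process(jdf)
      //   result = DataFrame(out, df.sql_ctx)            # rewrap in Python
      def process(df: DataFrame): DataFrame = {
        // Placeholder Scala-side logic; any DataFrame transformation works.
        df.limit(10)
      }
    }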
Hi All,
I've developed a Spark module in Scala that I would like to add a Python
port for. I want to be able to allow users to create a PySpark RDD and send
it to my system. I've been looking into the PySpark source code as well as
Py4J and was wondering if there has been anything like this implemented
Alex - if you have access to a Windows box, can you fix the issue? I'm not
sure how many Spark contributors have Windows boxes.
On Wed, Jun 22, 2016 at 5:56 PM, Ulanov, Alexander wrote:
> Spark Unit tests fail on Windows in Spark 2.0. It can be considered a
> blocker since there are people that develop for Spark on Windows.
Spark Unit tests fail on Windows in Spark 2.0. It can be considered a blocker
since there are people that develop for Spark on Windows. The referenced issue
is indeed Minor and has nothing to do with unit tests.
From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: Wednesday, June 22, 2016
It's also marked as Minor, not Blocker.
On Wed, Jun 22, 2016 at 4:07 PM, Marcelo Vanzin wrote:
> On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander
> wrote:
> > -1
> >
> > Spark Unit tests fail on Windows. Still not resolved, though marked as
> > resolved.
>
> To be pedantic, it's marked as a duplicate
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander
wrote:
> -1
>
> Spark Unit tests fail on Windows. Still not resolved, though marked as
> resolved.
To be pedantic, it's marked as a duplicate
(https://issues.apache.org/jira/browse/SPARK-15899), which doesn't
mean necessarily that it's fixed.
SPARK-15893 is resolved as a duplicate of SPARK-15899. SPARK-15899 is
Unresolved.
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander wrote:
> -1
>
> Spark Unit tests fail on Windows. Still not resolved, though marked as
> resolved.
>
> https://issues.apache.org/jira/browse/SPARK-15893
>
> *From
-1
Spark Unit tests fail on Windows. Still not resolved, though marked as resolved.
https://issues.apache.org/jira/browse/SPARK-15893
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Tuesday, June 21, 2016 6:27 PM
To: dev@spark.apache.org
Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)
Please vote on releasing the following candidate as Apache Spark version 2.0.0.
+1 (non-binding, of course)
1. Compiled on OS X 10.10 (Yosemite) OK. Total time: 37:11 min
mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
2. Tested pyspark, mllib (iPython 4.0)
2.0 Spark version is 1.6.2
2.1. statistics (min,max,mean,Pearson,Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK
2.
You can check out the Spark in Action book. In my (not so humble)
opinion, it's very good for beginners.
Petar (author)
On 21.6.2016. 18:01, tesm...@gmail.com wrote:
Hi,
I am a beginner in Spark development, and it took time to configure
Eclipse + Scala. Is there any tutorial that can help beginners?
+1
On Wed, Jun 22, 2016 at 1:07 PM, Kousuke Saruta
wrote:
> +1 (non-binding)
>
> On 2016/06/23 4:53, Reynold Xin wrote:
>
> +1 myself
>
>
> On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara <
> sean.mcnam...@webtrends.com> wrote:
>
>> +1
>>
>> On Jun 22, 2016, at 1:14 PM, Michael Armbrust
>> wrote:
+1 (non-binding)
On 2016/06/23 4:53, Reynold Xin wrote:
+1 myself
On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara
<sean.mcnam...@webtrends.com> wrote:
+1
On Jun 22, 2016, at 1:14 PM, Michael Armbrust
<mich...@databricks.com> wrote:
+1
On Wed, Jun 22, 2
+1 myself
On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara wrote:
> +1
>
> On Jun 22, 2016, at 1:14 PM, Michael Armbrust
> wrote:
>
> +1
>
> On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly
> wrote:
>
>> +1
>>
>> On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter
>> wrote:
>>
>>> +1 This release passes all tests on the graphframes and tensorframes
>>> packages.
+1
On Jun 22, 2016, at 1:14 PM, Michael Armbrust
<mich...@databricks.com> wrote:
+1
On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly
<jonathaka...@gmail.com> wrote:
+1
On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter
<timhun...@databricks.com> wrote:
+1 This release passes all tests on the graphframes and tensorframes
packages.
+1
On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly
wrote:
> +1
>
> On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter
> wrote:
>
>> +1 This release passes all tests on the graphframes and tensorframes
>> packages.
>>
>> On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger
>> wrote:
>>
>>> If we're considering backporting changes for the 0.8 kafka
>>> integration, I am sure there are people who would like to get
+1
On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter
wrote:
> +1 This release passes all tests on the graphframes and tensorframes
> packages.
>
> On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger
> wrote:
>
>> If we're considering backporting changes for the 0.8 kafka
>> integration, I am sure there are people who would like to get
You should see it at both levels: there is one bloom filter for the ORC data
on disk and one for the data in memory.
It is already a good step towards integrating the storage format with the
in-memory representation for columnar data.
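To make the two levels concrete, a hedged sketch with made-up table and
column names, assuming a Hive-enabled SparkSession named spark. The
TBLPROPERTIES keys are ORC's own bloom filter options, and the in-memory
sketch is the one from SPARK-12818; whether the engine consults either
during query processing is exactly the open question here:

    // On-disk: ask the ORC writer for bloom filters via its table properties.
    spark.sql("""
      CREATE TABLE events_orc (user_id STRING, ts BIGINT)
      STORED AS ORC
      TBLPROPERTIES (
        'orc.bloom.filter.columns' = 'user_id',
        'orc.bloom.filter.fpp' = '0.05'
      )
    """)

    // In-memory: build a bloom filter over a DataFrame column (SPARK-12818).
    val bf = spark.table("events_orc").stat.bloomFilter("user_id", 1000000L, 0.03)
    bf.mightContain("some-user")  // probabilistic membership test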
> On 22 Jun 2016, at 14:01, BaiRan wrote:
>
> After building a bloom filter on existing data, does the Spark engine
> utilise the bloom filter during query processing?
+1 This release passes all tests on the graphframes and tensorframes
packages.
On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger wrote:
> If we're considering backporting changes for the 0.8 kafka
> integration, I am sure there are people who would like to get
>
> https://issues.apache.org/jira/browse/SPARK-10963
Yeah, I am +1 for including Kafka 0.10 integration as well. We had to wait
for Kafka 0.10 because there were incompatibilities between the Kafka 0.9
and 0.10 API. And, yes, the code for 0.8.0 remains unchanged so there
shouldn't be any regression for existing users. It's only new code for 0.10.
Th
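For readers who haven't seen it, a minimal sketch of what the new connector's
API looks like against the spark-streaming-kafka-0-10 artifact; the broker
address, group id, and topic are placeholders, and ssc is an existing
StreamingContext:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    // The 0.10 integration configures the new consumer (including TLS)
    // through ordinary consumer properties, not connector-specific knobs.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Each record is an org.apache.kafka.clients.consumer.ConsumerRecord.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("some-topic"), kafkaParams)
    )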
Of course, on my first day back from vacation, I notice that the
Jenkins process got wedged immediately upon my visiting the page.
One quick jenkins/httpd restart later and we're back up and building.
Sorry for any inconvenience!
shane
+1 for 0.10 support. This is huge.
On Wed, Jun 22, 2016 at 8:17 AM, Cody Koeninger wrote:
> Luciano knows there are publicly available examples of how to use the
> 0.10 connector, including TLS support, because he asked me about it
> and I gave him a link
>
>
> https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/TlsStream.scala
Luciano knows there are publicly available examples of how to use the
0.10 connector, including TLS support, because he asked me about it
and I gave him a link
https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/TlsStream.scala
If any committer at any time had sa
On Wed, Jun 22, 2016 at 7:46 AM, Cody Koeninger wrote:
> As far as I know the only thing blocking it at this point is lack of
> committer review / approval.
>
> It's technically adding a new feature after spark code-freeze, but it
> doesn't change existing code, and the kafka project didn't release
> 0.10 until the end of May.
Hm, I thought that was to be added for 2.0. Imran, I know you may have
been working alongside Mark on it; what do you think?
TD / Reynold, would you object to it for 2.0?
On Wed, Jun 22, 2016 at 3:46 PM, Cody Koeninger wrote:
> As far as I know the only thing blocking it at this point is lack of
>
As far as I know the only thing blocking it at this point is lack of
committer review / approval.
It's technically adding a new feature after spark code-freeze, but it
doesn't change existing code, and the kafka project didn't release
0.10 until the end of May.
On Wed, Jun 22, 2016 at 9:39 AM, S
I profess ignorance again, though I really should know by now: what's
opposing that? I personally thought this was going to be in 2.0
and kind of didn't notice that it wasn't ...
On Wed, Jun 22, 2016 at 3:29 PM, Cody Koeninger wrote:
> I don't have a vote, but I'd just like to reiterate that I think
For the clueless (like me):
https://bahir.apache.org/#home
Apache Bahir provides extensions to distributed analytic platforms such as
Apache Spark.
Initially Apache Bahir will contain streaming connectors that were a part
of Apache Spark prior to version 2.0:
- streaming-akka
- streaming-mqtt
- streaming-twitter
- streaming-zeromq
I don't have a vote, but I'd just like to reiterate that I think kafka
0.10 support should be added to a 2.0 release candidate; if not now,
then well before release.
- it's a completely standalone jar, so shouldn't break anyone who's
using the existing 0.8 support
- it's like the 5th highest voted
If we're considering backporting changes for the 0.8 kafka
integration, I am sure there are people who would like to get
https://issues.apache.org/jira/browse/SPARK-10963
into 1.6.x as well
On Wed, Jun 22, 2016 at 7:41 AM, Sean Owen wrote:
> Good call, probably worth back-porting, I'll try to do that.
Created a JIRA issue https://issues.apache.org/jira/browse/SPARK-16131 and
PR @ https://github.com/apache/spark/pull/13842
On Fri, Jun 17, 2016 at 5:19 AM, Sean Owen wrote:
> I think that's OK to change, yes. I don't see why it's necessary to
> init log_ the way it is now. initializeLogIfNecessary
Good call, probably worth back-porting, I'll try to do that. I don't
think it blocks a release, but would be good to get into a next RC if
any.
On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins wrote:
> This has failed on our 1.6 stream builds regularly.
> (https://issues.apache.org/jira/browse/SPARK-6005) looks fixed in 2.0?
After building a bloom filter on existing data, does the Spark engine
utilise the bloom filter during query processing?
Is there any plan for predicate push-down using bloom filters in ORC /
Parquet?
Thanks
Ran
> On 22 Jun, 2016, at 10:48 am, Reynold Xin wrote:
>
> SPARK-12818 is about building a bloom filter
Hi All,
I am running a Spark application with 1.8 TB of data (which is stored as Hive
tables). I am reading the data using HiveContext and processing it.
The cluster has 5 nodes total, 25 cores per machine and 250 GB per node. I
am launching the application with 25 executors with 5 cores each
This has failed regularly on our 1.6 stream builds (
https://issues.apache.org/jira/browse/SPARK-6005). Looks fixed in 2.0?
On Wed, 22 Jun 2016 at 11:15 Sean Owen wrote:
> Oops, one more in the "does anybody else see this" department:
>
> - offset recovery *** FAILED ***
> recoveredOffsetRange
Oops, one more in the "does anybody else see this" department:
- offset recovery *** FAILED ***
recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.str
I'm fairly convinced this error and others that appear timestamp-related
are an environment problem. This test and method have been
present for several Spark versions, without change. I reviewed the
logic and it seems sound, explicitly setting the time zone correctly.
I am not sure why it behaves differently
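For anyone chasing similar environment-dependent failures, a generic sketch
(not the actual Spark test code) of pinning the JVM default time zone around
time-sensitive assertions:

    import java.util.TimeZone

    // Save, pin, and restore the JVM default so timestamp assertions don't
    // depend on the build machine's locale/zone configuration.
    val original = TimeZone.getDefault
    try {
      TimeZone.setDefault(TimeZone.getTimeZone("UTC"))
      // ... run the time-sensitive test body here ...
    } finally {
      TimeZone.setDefault(original)
    }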