Re: [ANNOUNCE] Apache Spark 2.1.1

2017-05-03 Thread Ofir Manor
Looking good...
One small thing - the documentation on the web site is still 2.1.0.
Specifically, the home page has a link (under the Documentation menu) labeled
Latest Release (Spark 2.1.1), but when I click it, I get the 2.1.0
documentation.

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Wed, May 3, 2017 at 1:18 AM, Michael Armbrust 
wrote:

> We are happy to announce the availability of Spark 2.1.1!
>
> Apache Spark 2.1.1 is a maintenance release, based on the branch-2.1
> maintenance branch of Spark. We strongly recommend that all 2.1.x users
> upgrade to this stable release.
>
> To download Apache Spark 2.1.1, visit
> http://spark.apache.org/downloads.html
>
> We would like to acknowledge all community members for contributing
> patches to this release.
>


Missing config property in documentation

2017-05-03 Thread Dongjin Lee
Hello. I found that the property 'spark.resultGetter.threads'[^1] is not
listed in the official documentation. I wonder whether this omission is
intentional or just a mistake.

If it is not intentional, it would be better to update the documentation.
What do you think?

Thanks,
Dongjin

[^1]:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala
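
For reference, a minimal sketch of setting the property, assuming it is read
through SparkConf like any other spark.* setting (TaskResultGetter.scala
reads it with a default of 4). Since it is undocumented, treat it as an
internal knob that may change without notice:

from pyspark import SparkConf, SparkContext

# Sketch only: spark.resultGetter.threads sizes the driver-side thread pool
# that fetches and deserializes finished task results.
conf = SparkConf().set("spark.resultGetter.threads", "8")
sc = SparkContext(conf=conf)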

-- 
*Dongjin Lee*



*A hitchhiker in the mathematical world.*

facebook: www.facebook.com/dongjin.lee.kr
linkedin: kr.linkedin.com/in/dongjinleekr
github: github.com/dongjinleekr
twitter: www.twitter.com/dongjinleekr


Re: [ANNOUNCE] Apache Spark 2.1.1

2017-05-03 Thread Michael Armbrust
Thanks for flagging this. There was a bug in the git replication that has
been fixed.

On Wed, May 3, 2017 at 1:29 AM, Ofir Manor  wrote:

> Looking good...
> One small thing - the documentation on the web site is still 2.1.0.
> Specifically, the home page has a link (under the Documentation menu) labeled
> Latest Release (Spark 2.1.1), but when I click it, I get the 2.1.0
> documentation.
>
> Ofir Manor
>
> Co-Founder & CTO | Equalum
>
> Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io
>
> On Wed, May 3, 2017 at 1:18 AM, Michael Armbrust 
> wrote:
>
>> We are happy to announce the availability of Spark 2.1.1!
>>
>> Apache Spark 2.1.1 is a maintenance release, based on the branch-2.1
>> maintenance branch of Spark. We strongly recommend that all 2.1.x users
>> upgrade to this stable release.
>>
>> To download Apache Spark 2.1.1, visit
>> http://spark.apache.org/downloads.html
>>
>> We would like to acknowledge all community members for contributing
>> patches to this release.
>>
>
>



Dataframe replace 'collect()' going in indefinite time loop

2017-05-03 Thread Saurabh Adhikary
final_schema_noise_data = sqlContext.createDataFrame(noise_data_parts,
                                                     noise_data_struct_schema)

for a_name in name_field_names:
    final_schema_noise_data = final_schema_noise_data.withColumn(
        a_name, spaceDeleteUDF(a_name))
    # --- till here final_schema_noise_data.collect() is working ---
    for t in noise_chars:
        final_schema_noise_data = final_schema_noise_data.na.replace(
            t, '', a_name)
        print a_name, t

The above loop completes, but final_schema_noise_data.collect() does not
yield any result; the cursor goes to the next line and some processing runs
for hours with no output.

Before the inner for loop, df.collect() gives output in seconds; after the
loop completes, there is no output for hours.
*Any known issue with the df.na.replace function?*
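
One hedged guess: every na.replace and withColumn call adds another layer to
the DataFrame's logical plan, so a nested loop over many columns and noise
characters can leave the driver analyzing a very deep plan for hours before
any job starts. A sketch that collapses the inner loop into a single
regexp_replace per column, assuming noise_chars holds literal characters to
strip:

import re

from pyspark.sql import functions as F

# Build one character class matching every noise character, e.g. "[#@!]".
# re.escape keeps regex metacharacters literal.
noise_pattern = "[" + re.escape("".join(noise_chars)) + "]"

for a_name in name_field_names:
    # One regexp_replace per column instead of one na.replace per
    # (column, character) pair, which keeps the logical plan shallow.
    final_schema_noise_data = final_schema_noise_data.withColumn(
        a_name, F.regexp_replace(F.col(a_name), noise_pattern, ""))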






Re: [VOTE] Apache Spark 2.2.0 (RC1)

2017-05-03 Thread Michael Armbrust
I'm going to -1 this given the number of small bug fixes that have gone
into the release branch. I'll follow up with another RC shortly.

On Tue, May 2, 2017 at 7:35 AM, Nick Pentreath 
wrote:

> I won't +1, given that it seems certain there will be another RC and
> there are outstanding ML QA blocker issues.
>
> But a clean build and the JVM and Python test suites LGTM on CentOS Linux
> 7.2.1511, OpenJDK 1.8.0_111.
>
>
> On Mon, 1 May 2017 at 22:42 Frank Austin Nothaft 
> wrote:
>
>> Hi Ryan,
>>
>> IMO, the problem is that the Spark Avro version conflicts with the
>> Parquet Avro version. As discussed upthread, I don’t think there’s a way
>> to *reliably* make sure that Avro 1.8 is on the classpath first while
>> using spark-submit. Relocating Avro in our project wouldn’t solve the
>> problem, because the NoSuchMethodError is thrown from the internals of
>> ParquetAvroOutputFormat, not from code in our project.
>>
>> Regards,
>>
>> Frank Austin Nothaft
>> fnoth...@berkeley.edu
>> fnoth...@eecs.berkeley.edu
>> 202-340-0466
>>
>> On May 1, 2017, at 12:33 PM, Ryan Blue  wrote:
>>
>> Michael, I think that the problem is with your classpath.
>>
>> Spark has a dependency on Avro 1.7.7, which can't be changed. Your project is
>> what pulls in parquet-avro and transitively Avro 1.8. Spark has no runtime
>> dependency on Avro 1.8. It is understandably annoying that using the same
>> version of Parquet for your parquet-avro dependency is what causes your
>> project to depend on Avro 1.8, but Spark's dependencies aren't a problem
>> because its Parquet dependency doesn't bring in Avro.
>>
>> There are a few ways around this:
>> 1. Make sure Avro 1.8 is found in the classpath first (see the sketch
>> after this list)
>> 2. Shade Avro 1.8 in your project (assuming Avro classes aren't shared)
>> 3. Use parquet-avro 1.8.1 in your project, which I think should work with
>> Parquet 1.8.2 and avoid the Avro change
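
A minimal sketch of option 1, assuming the experimental userClassPathFirst
switches are the mechanism meant here (replies below suggest they are
unreliable in practice, so this is an illustration rather than a guaranteed
fix):

from pyspark import SparkConf

# Sketch only: ask Spark to prefer classes from the application jar over its
# own copies, so the uberjar's Avro 1.8 would shadow Spark's Avro 1.7.7.
# Both properties are marked experimental.
conf = (SparkConf()
        .set("spark.driver.userClassPathFirst", "true")
        .set("spark.executor.userClassPathFirst", "true"))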
>>
>> The work-around in Spark is for tests, which do use parquet-avro. We can
>> look at a Parquet 1.8.3 that avoids this issue, but I think this is
>> reasonable for the 2.2.0 release.
>>
>> rb
>>
>> On Mon, May 1, 2017 at 12:08 PM, Michael Heuer  wrote:
>>
>>> Please excuse me if I'm misunderstanding -- the problem is not with our
>>> library or our classpath.
>>>
>>> There is a conflict within Spark itself, in that Parquet 1.8.2 expects
>>> to find Avro 1.8.0 on the runtime classpath and sees 1.7.7 instead.  Spark
>>> already has to work around this for unit tests to pass.
>>>
>>>
>>>
>>> On Mon, May 1, 2017 at 2:00 PM, Ryan Blue  wrote:
>>>
 Thanks for the extra context, Frank. I agree that it sounds like your
 problem comes from the conflict between your Jars and what comes with
 Spark. It's the same concern that makes everyone shudder when anything has a
 public dependency on Jackson. :)

 What we usually do to get around situations like this is to relocate
 the problem library inside the shaded Jar. That way, Spark uses its version
 of Avro and your classes use a different version of Avro. This works if you
 don't need to share classes between the two. Would that work for your
 situation?

 rb

 On Mon, May 1, 2017 at 11:55 AM, Koert Kuipers 
 wrote:

> Sounds like you are running into the fact that you cannot really put
> your classes before Spark's on the classpath? Spark's switches to support
> this never really worked for me either.
>
> Inability to control the classpath + inconsistent jars => trouble?
>
> On Mon, May 1, 2017 at 2:36 PM, Frank Austin Nothaft <
> fnoth...@berkeley.edu> wrote:
>
>> Hi Ryan,
>>
>> We do set Avro to 1.8 in our downstream project. We also set Spark as
>> a provided dependency, and build an überjar. We run via spark-submit,
>> which builds the classpath with our überjar and all of the Spark deps.
>> This leads to Avro 1.7.7 getting picked up from the classpath at runtime,
>> which causes the NoSuchMethodError to occur.
>>
>> Regards,
>>
>> Frank Austin Nothaft
>> fnoth...@berkeley.edu
>> fnoth...@eecs.berkeley.edu
>> 202-340-0466
>>
>> On May 1, 2017, at 11:31 AM, Ryan Blue  wrote:
>>
>> Frank,
>>
>> The issue you're running into is caused by using parquet-avro with
>> Avro 1.7. Can't your downstream project set the Avro dependency to 1.8?
>> Spark can't update Avro because it is a breaking change that would force
>> users to rebuild specific Avro classes in some cases. But you should be
>> free to use Avro 1.8 to avoid the problem.
>>
>> On Mon, May 1, 2017 at 11:08 AM, Frank Austin Nothaft <
>> fnoth...@berkeley.edu> wrote:
>>
>>> Hi Ryan et al,
>>>
>>> The issue we’ve seen using a build of the Spark 2.2.0 branch from a
>>> downstream project is that parquet-avro uses one of the new Avro 1.8.0
>>> methods, and you get a NoSuchMethodError.

[CFP] DataWorks Summit/Hadoop Summit Sydney - Call for abstracts

2017-05-03 Thread Yanbo Liang
The Australia/Pacific version of DataWorks Summit is in Sydney this year,
September 20-21. This is a great place to talk about work you are doing in
Apache Spark or how you are using Spark. Information on submitting an
abstract is at
https://dataworkssummit.com/sydney-2017/abstracts/submit-abstract/



Tracks:

Apache Hadoop

Apache Spark and Data Science

Cloud and Applications

Data Processing and Warehousing

Enterprise Adoption

IoT and Streaming

Operations, Governance and Security



Deadline: Friday, May 26th, 2017.