Spark Compatibility with Java 11

2020-07-14 Thread Ankur Mittal
Hi,

I am using Spark 2.x and need to run it on Java 11, but Spark 2.x is not
able to run on Java 11.

Is there any way we can use Java 11 with Spark 2.x?

Has this issue been resolved in Spark 3.0?


-- 
Regards
Ankur Mittal


Re: Spark Compatibility with Java 11

2020-07-14 Thread Prashant Sharma
Hi Ankur,

Java 11 support was added in Spark 3.0.
https://issues.apache.org/jira/browse/SPARK-24417

Thanks,




Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Patrick McCarthy
Please don't advocate for piracy; this book is not freely available.

I own it and it's wonderful. Mr. Géron deserves to benefit from it.

On Mon, Jul 13, 2020 at 9:59 PM Anwar AliKhan 
wrote:

> Link to a free book which may be useful.
>
> Hands-On Machine Learning with Scikit-Learn, Keras, and Tensorflow
> Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien
> Géron
>
> https://bit.ly/2zxueGt
>
>
>
>
>
> On Mon, 13 Jul 2020, 15:18 Sean Owen wrote:
>
>> There is a multilayer perceptron implementation in Spark ML, but
>> that's not what you're looking for.
>> To parallelize model training developed using standard libraries like
>> Keras, use Horovod from Uber.
>> https://horovod.readthedocs.io/en/stable/spark_include.html
>>
>> On Mon, Jul 13, 2020 at 6:59 AM Mukhtaj Khan  wrote:
>> >
>> > Dear Spark User
>> >
>> > I am trying to parallelize a CNN (convolutional neural network) model
>> > using Spark. I have developed the model using Python and the Keras
>> > library. The model works fine on a single machine, but when we try it on
>> > multiple machines, the execution time remains the same as sequential.
>> > Could you please tell me whether there is any built-in library to
>> > parallelize a CNN in the Spark framework? Moreover, MLlib does not have
>> > any support for CNNs.
>> > Best regards
>> > Mukhtaj
>> >
>> >
>> >
>> >
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>

-- 


*Patrick McCarthy  *

Senior Data Scientist, Machine Learning Engineering

Dstillery

470 Park Ave South, 17th Floor, NYC 10016


Re: Spark Compatibility with Java 11

2020-07-14 Thread Ankur Mittal
Thanks a lot.


-- 
Regards
Ankur Mittal
*+91-8447899504*


Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Anwar AliKhan
Please note it is freely available because it is an early, unedited raw
edition.
It is not 100% complete, and it is not entirely the same as yours.
So it is not piracy.
I agree it is a good book.









Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Sean Owen
It is still copyrighted material, no matter its state of editing. Yes,
you should not be sharing this on the internet.





Re: Issue in parallelization of CNN model using spark

2020-07-14 Thread Anwar AliKhan
Ok, thanks.
You can buy it here

https://www.amazon.com/s?k=hands+on+machine+learning+with+scikit-learn+and+tensorflow+2&crid=2U0P9XVIJ790T&sprefix=Hands+on+machine+%2Caps%2C246&ref=nb_sb_ss_i_1_17

This book is like a companion to the Andrew Ng course on Coursera.
It uses the exact same mathematical notation, examples, etc., so it is a
smooth transition from that course.






Mock spark reads and writes

2020-07-14 Thread Dark Crusader
Sorry I wasn't very clear in my last email.

I have a function like this:

def main(read_file):
    df = spark.read.csv(read_file)
    # ... some other code ...
    df.write.csv(path)

I need to write a unit test for it.
Would Python's unittest.mock help me here?

When I googled this, I mostly saw advice that we shouldn't mock these reads
and writes, but that doesn't solve the problem of how to unit test helper
functions or a main method that has to read and write files.

An example of the proper way to do this in Python would be really helpful.

Thanks a lot.
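
One possible way to approach the question above, as a minimal sketch rather than a definitive answer: pass the session into the function so that a test can substitute a `unittest.mock.MagicMock`. The `spark` and `write_path` parameters, the `transform` helper, and `run_main_with_mock` are assumptions added for illustration; they are not part of the original function.

```python
from unittest import mock


def transform(df):
    # Hypothetical stand-in for the "some other code" step.
    return df.dropDuplicates()


def main(spark, read_file, write_path):
    # Same shape as the function in the question, but the session and the
    # output path are passed in (an assumption) so a test can replace them.
    df = spark.read.csv(read_file)
    result = transform(df)
    result.write.csv(write_path)
    return result


def run_main_with_mock():
    # MagicMock records every attribute access and call, so no real
    # SparkSession or file system is touched.
    fake_spark = mock.MagicMock(name="spark")
    main(fake_spark, "in.csv", "out.csv")
    return fake_spark


if __name__ == "__main__":
    fake_spark = run_main_with_mock()
    # Check the wiring: which path was read and which path was written.
    fake_spark.read.csv.assert_called_once_with("in.csv")
    fake_df = fake_spark.read.csv.return_value
    fake_df.dropDuplicates.return_value.write.csv.assert_called_once_with("out.csv")
```

A test like this only verifies the wiring (which paths are read and written). The transformation logic itself is probably better exercised against a small local SparkSession and a temporary directory, which matches the advice not to mock the reads and writes themselves.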


Kotlin Spark API

2020-07-14 Thread Maria Khalusova
Hi folks,

We would love your feedback on the new Kotlin Spark API that we are working
on: https://github.com/JetBrains/kotlin-spark-api.

Why Kotlin Spark API? Kotlin developers can already use Kotlin with the
existing Apache Spark Java API; however, they cannot take full advantage of
Kotlin language features. With Kotlin Spark API, you can use Kotlin data
classes and lambda expressions.

The API also adds some helpful extension functions. For example, you can
use `withCached` to perform arbitrary transformations on a Dataset and not
worry about the Dataset unpersisting at the end.
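
For readers more familiar with Python, the idea behind `withCached` can be sketched roughly as a context manager: cache, run a block, then unpersist. This is an analogy written in Python, not the Kotlin API itself. The `cached` helper and the `FakeDataset` stand-in are hypothetical; only `cache()` and `unpersist()` correspond to real Dataset/DataFrame methods.

```python
from contextlib import contextmanager


@contextmanager
def cached(dataset):
    # Cache on entry; always unpersist on exit, even if the block raises.
    cached_ds = dataset.cache()
    try:
        yield cached_ds
    finally:
        cached_ds.unpersist()


class FakeDataset:
    """Hypothetical stand-in that mimics cache()/unpersist() for the demo."""

    def __init__(self, rows):
        self.rows = rows
        self.is_cached = False

    def cache(self):
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self


def demo():
    ds = FakeDataset([1, 2, 3])
    with cached(ds) as c:
        inside = c.is_cached  # the dataset is cached inside the block
        total = sum(c.rows)   # arbitrary work on the cached dataset
    # After the block, the dataset has been unpersisted again.
    return inside, total, ds.is_cached
```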

If you like Kotlin and would like to try the API, we've prepared a Quick
Start Guide to help you set up all the needed dependencies in no time using
either Maven or Gradle:
https://github.com/JetBrains/kotlin-spark-api/blob/master/docs/quick-start-guide.md

In the repo, you’ll also find a few code examples to get an idea of what
the API looks like:
https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/spark/api/examples

We’d love to see your feedback in the project’s GitHub issues:
https://github.com/JetBrains/kotlin-spark-api/issues.


Thanks!


Re: Kotlin Spark API

2020-07-14 Thread Anwar AliKhan
Is Kotlin another new language?

Grady Booch: The United States Department of Defense (DOD) is perhaps the
largest user of computers in the world. By the mid-1970s, software
development for its systems had reached crisis proportions: projects were
often late and over budget, and they often failed to meet their stated
requirements. It was evident that the problems would only worsen as
software development costs continued to rise exponentially. To help
resolve these problems, which were further compounded by the proliferation
of hundreds of different languages, the DOD sponsored the development of a
single, common high-order programming language. The winning design was
originally called the Green Language (so called because of its team colour
code during the competition), and was renamed Ada.





Re: Kotlin Spark API

2020-07-14 Thread Stephen Boesch
I started with Scala/Spark in 2012 and Scala has been my go-to language for
six years. But I heartily applaud this direction. Kotlin is more like a
simplified Scala - with the benefits that brings - than a simplified Java.
I particularly like the simplified / streamlined collections classes.

Really looking forward to this development.



Re: Kotlin Spark API

2020-07-14 Thread Stephen Boesch
I just looked at the examples.
https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/spark/api/examples
These look very nice! Very concise yet flexible. I like the ability to do
inline *side-effects*, e.g. caching or printing or showDs().

package org.jetbrains.spark.api.examples

import org.apache.spark.sql.Row
import org.jetbrains.spark.api.*

fun main() {
    withSpark {
        val sd = dsOf(1, 2, 3)
        sd.createOrReplaceTempView("ds")
        spark.sql("select * from ds")
            .withCached {
                println("asList: ${toList<Int>()}")
                println("asArray: ${toArray<Int>().contentToString()}")
                this
            }
            .to<Int>()
            .withCached {
                println("typed collect: " + (collect() as Array<Int>).contentToString())
                println("type collectAsList: " + collectAsList())
            }

        dsOf(1, 2, 3)
            .map { c(it, it + 1, it + 2) }
            .to<Row>()
            .select("_1")
            .collectAsList()
            .forEach { println(it) }
    }
}


So that shows some of the niceness of Kotlin: intuitive type conversion
with `to<Int>`/`to<Row>` and `dsOf(list)`, and also the inlining of the
side effects. Overall concise and pleasant to read.




Re: Kotlin Spark API

2020-07-14 Thread Pasha Finkelshteyn
Hi Stephen,

Thank you so much for finding the time to look at our examples! Yes, we've
tried to make the API design as clean as possible and are constantly
looking for ways to make it even more readable, clear and friendly.

And as Maria already stated, we welcome any feedback!


-- 
Regards,
Pasha

Big Data Tools @ JetBrains

