Spark Compatibility with Java 11
Hi,

I am using Spark 2.x and need to run on Java 11, but Spark 2.x is not able to run on Java 11. Is there any way to use Java 11 with Spark 2.x? Has this issue been resolved in Spark 3.0?

--
Regards
Ankur Mittal
Re: Spark Compatibility with Java 11
Hi Ankur,

Java 11 support was added in Spark 3.0: https://issues.apache.org/jira/browse/SPARK-24417

Thanks,

On Tue, Jul 14, 2020 at 6:12 PM Ankur Mittal wrote:
> I am using Spark 2.x and need to run on Java 11, but Spark 2.x is not able to run on Java 11. Is there any way to use Java 11 with Spark 2.x? Has this issue been resolved in Spark 3.0?
Re: Issue in parallelization of CNN model using spark
Please don't advocate for piracy; this book is not freely available. I own it and it's wonderful, and Mr. Géron deserves to benefit from it.

On Mon, Jul 13, 2020 at 9:59 PM Anwar AliKhan wrote:
> Link to a free book which may be useful:
>
> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, by Aurélien Géron
>
> https://bit.ly/2zxueGt
>
> On 13 Jul 2020, 15:18, Sean Owen wrote:
>> There is a multilayer perceptron implementation in Spark ML, but that's not what you're looking for. To parallelize model training developed using standard libraries like Keras, use Horovod from Uber.
>> https://horovod.readthedocs.io/en/stable/spark_include.html
>>
>> On Mon, Jul 13, 2020 at 6:59 AM Mukhtaj Khan wrote:
>>> Dear Spark users,
>>>
>>> I am trying to parallelize a CNN (convolutional neural network) model using Spark. I have developed the model in Python with the Keras library. The model works fine on a single machine, but when we try it on multiple machines the execution time remains the same as the sequential run.
>>> Could you please tell me whether there is any built-in library for parallelizing CNNs in the Spark framework? Moreover, MLlib does not have any support for CNNs.
>>>
>>> Best regards,
>>> Mukhtaj
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org

--
Patrick McCarthy
Senior Data Scientist, Machine Learning Engineering
Dstillery
470 Park Ave South, 17th Floor, NYC 10016
Re: Spark Compatibility with Java 11
Thanks a lot.

On Tue, Jul 14, 2020 at 12:51 PM Prashant Sharma wrote:
> Hi Ankur,
> Java 11 support was added in Spark 3.0.
> https://issues.apache.org/jira/browse/SPARK-24417

--
Regards
Ankur Mittal
+91-8447899504
Re: Issue in parallelization of CNN model using spark
Please note: it is freely available because it is an early, unedited raw edition. It is not 100% complete, and it is not entirely the same as yours, so it is not piracy. I agree it is a good book.

On Tue, 14 Jul 2020, 14:30 Patrick McCarthy wrote:
> Please don't advocate for piracy; this book is not freely available.
> I own it and it's wonderful, and Mr. Géron deserves to benefit from it.
Re: Issue in parallelization of CNN model using spark
It is still copyrighted material, no matter its state of editing. Yes, you should not be sharing this on the internet.

On Tue, Jul 14, 2020 at 9:46 AM Anwar AliKhan wrote:
> Please note: it is freely available because it is an early, unedited raw edition. It is not 100% complete, and it is not entirely the same as yours, so it is not piracy. I agree it is a good book.
Re: Issue in parallelization of CNN model using spark
Ok, thanks. You can buy it here:
https://www.amazon.com/s?k=hands+on+machine+learning+with+scikit-learn+and+tensorflow+2&crid=2U0P9XVIJ790T&sprefix=Hands+on+machine+%2Caps%2C246&ref=nb_sb_ss_i_1_17

This book is like an accompaniment to the Andrew Ng course on Coursera. It uses the exact same mathematical notation, examples, etc., so it is a smooth transition from that course.

On Tue, 14 Jul 2020, 15:52 Sean Owen wrote:
> It is still copyrighted material, no matter its state of editing. Yes, you should not be sharing this on the internet.
Mock spark reads and writes
Sorry I wasn't very clear in my last email. I have a function like this:

    def main(read_file):
        df = spark.read.csv(read_file)
        # ... some other code ...
        df.write.csv(path)

which I need to write a unit test for. Would Python's unittest.mock help me here? When I googled this, I mostly saw advice that we shouldn't mock these reads and writes, but that doesn't solve the problem of how I unit test helper functions or a main method that has to read and write files. An example of the proper way to do this in Python would be really helpful. Thanks a lot.
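One possible approach (a minimal sketch, not the only answer): pass the SparkSession into the function instead of relying on a global, then hand the test a MagicMock in its place. The function name `main`, the file names `input.csv` / `out.csv`, and the injected-session signature are all illustrative assumptions, not from the original post:

```python
from unittest import mock

def main(spark, read_file, output_path):
    # Job logic with the SparkSession passed in as a parameter,
    # which is what makes it easy to substitute a mock in tests.
    df = spark.read.csv(read_file)
    # ... some other code would transform df here ...
    df.write.csv(output_path)

def test_main():
    # Fake DataFrame returned by the fake session's read path.
    fake_df = mock.MagicMock(name="df")
    fake_spark = mock.MagicMock(name="spark")
    fake_spark.read.csv.return_value = fake_df

    main(fake_spark, "input.csv", "out.csv")

    # Assert the function wired up reads and writes as expected,
    # without ever touching a real cluster or the filesystem.
    fake_spark.read.csv.assert_called_once_with("input.csv")
    fake_df.write.csv.assert_called_once_with("out.csv")

test_main()
print("test passed")
```

If the `spark` handle must stay a module-level global, `unittest.mock.patch("your_module.spark", fake_spark)` (with `your_module` being whatever module defines it) achieves the same substitution without changing the function signature. Either way, this only verifies the wiring; testing the transformation logic itself is usually done by extracting it into a helper that takes and returns a DataFrame, exercised against a small local SparkSession.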
Kotlin Spark API
Hi folks,

We would love your feedback on the new Kotlin Spark API that we are working on: https://github.com/JetBrains/kotlin-spark-api.

Why Kotlin Spark API? Kotlin developers can already use Kotlin with the existing Apache Spark Java API; however, they cannot take full advantage of Kotlin language features. With Kotlin Spark API, you can use Kotlin data classes and lambda expressions.

The API also adds some helpful extension functions. For example, you can use `withCached` to perform arbitrary transformations on a Dataset without worrying about unpersisting the Dataset at the end.

If you like Kotlin and would like to try the API, we've prepared a Quick Start Guide to help you set up all the needed dependencies in no time using either Maven or Gradle:
https://github.com/JetBrains/kotlin-spark-api/blob/master/docs/quick-start-guide.md

In the repo, you'll also find a few code examples to get an idea of what the API looks like:
https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/spark/api/examples

We'd love to see your feedback in the project's GitHub issues: https://github.com/JetBrains/kotlin-spark-api/issues.

Thanks!
Re: Kotlin Spark API
Is Kotlin another new language?

GRADY BOOCH: The United States Department of Defense (DoD) is perhaps the largest user of computers in the world. By the mid-1970s, software development for its systems had reached crisis proportions: projects were often late, over budget, and they often failed to meet their stated requirements. It was evident that the problems would only worsen as software development costs continued to rise exponentially. To help resolve these problems, which were further compounded by the proliferation of hundreds of different languages, the DoD sponsored the development of a single, common high-order programming language. The winning design was originally called the Green language (so called because of its team colour code during the competition) and was renamed Ada.

On Tue, 14 Jul 2020, 18:42 Maria Khalusova wrote:
> We would love your feedback on the new Kotlin Spark API that we are working on: https://github.com/JetBrains/kotlin-spark-api.
Re: Kotlin Spark API
I started with Scala/Spark in 2012, and Scala has been my go-to language for six years. But I heartily applaud this direction. Kotlin is more like a simplified Scala (with the benefits that brings) than a simplified Java. I particularly like the simplified, streamlined collections classes.

Really looking forward to this development.

On Tue, 14 Jul 2020 at 10:42, Maria Khalusova wrote:
> We would love your feedback on the new Kotlin Spark API that we are working on: https://github.com/JetBrains/kotlin-spark-api.
Re: Kotlin Spark API
I just looked at the examples:
https://github.com/JetBrains/kotlin-spark-api/tree/master/examples/src/main/kotlin/org/jetbrains/spark/api/examples

These look very nice! Very concise yet flexible. I like the ability to do inline side effects, e.g. caching, printing, or showDs():

    package org.jetbrains.spark.api.examples

    import org.apache.spark.sql.Row
    import org.jetbrains.spark.api.*

    fun main() {
        withSpark {
            val sd = dsOf(1, 2, 3)
            sd.createOrReplaceTempView("ds")
            spark.sql("select * from ds")
                .withCached {
                    println("asList: ${toList()}")
                    println("asArray: ${toArray().contentToString()}")
                    this
                }
                .to()
                .withCached {
                    println("typed collect: " + (collect() as Array).contentToString())
                    println("type collectAsList: " + collectAsList())
                }

            dsOf(1, 2, 3)
                .map { c(it, it + 1, it + 2) }
                .to()
                .select("_1")
                .collectAsList()
                .forEach { println(it) }
        }
    }

So that shows some of the niceness of Kotlin: the intuitive type conversion with `to` and `dsOf(list)`, and also the inlining of side effects. Overall concise and pleasant to read.

On Tue, 14 Jul 2020 at 12:18, Stephen Boesch wrote:
> I started with Scala/Spark in 2012, and Scala has been my go-to language for six years. But I heartily applaud this direction.
Re: Kotlin Spark API
Hi Stephen,

Thank you so much for finding the time to look at our examples! Yes, we've tried to make the API design as clean as possible, and we are constantly looking for ways to make it even more readable, clear, and friendly. And, as Maria already stated, we welcome any feedback!

On 20/07/14 01:55PM, Stephen Boesch wrote:
> I just looked at the examples. These look very nice! Very concise yet flexible. I like the ability to do inline side effects, e.g. caching, printing, or showDs(). Overall concise and pleasant to read.

--
Regards,
Pasha
Big Data Tools @ JetBrains