Super. Thanks Deepak!

On Mon, Dec 9, 2019 at 6:58 PM Deepak Vohra <dvohr...@yahoo.com> wrote:
> Please install Apache Spark on Windows as discussed in "Apache Spark on
> Windows - DZone Open Source":
> https://dzone.com/articles/working-on-apache-spark-on-windows
>
> On Monday, December 9, 2019, 11:27:53 p.m. UTC, Ping Liu <pingpinga...@gmail.com> wrote:
>
> Thanks Deepak! Yes, I want to try it with Docker. But my AWS account ran
> out of its free period. Is there a shared EC2 instance for Spark that we
> can use for free?
>
> Ping
>
> On Monday, December 9, 2019, Deepak Vohra <dvohr...@yahoo.com> wrote:
>
> Haven't tested, but the general procedure is to exclude all Guava
> dependencies that are not needed. The hadoop-common dependency does not
> have a dependency on Guava according to Maven Repository:
> org.apache.hadoop » hadoop-common.
>
> Apache Spark 2.4 has a dependency on Guava 14.
>
> If a Docker image for Cloudera Hadoop is used, Spark may be installed on
> Docker for Windows. For Docker on Windows on EC2, refer to "Getting
> Started with Docker for Windows" on Developer.com.
>
> Conflicting versions are not an issue if Docker is used. "Apache Spark
> applications usually have a complex set of required software
> dependencies. Spark applications may require specific versions of these
> dependencies (such as Pyspark and R) on the Spark executor hosts,
> sometimes with conflicting versions." (Running Spark in Docker Containers
> on YARN)
>
> On Monday, December 9, 2019, 08:37:47 p.m. UTC, Ping Liu <pingpinga...@gmail.com> wrote:
>
> Hi Deepak,
>
> I tried it. Unfortunately, it still doesn't work. 28.1-jre isn't
> downloaded for some reason. I'll try something else. Thank you very much
> for your help!
>
> Ping
>
> On Fri, Dec 6, 2019 at 5:28 PM Deepak Vohra <dvohr...@yahoo.com> wrote:
>
> As multiple Guava versions are found, exclude Guava from all the
> dependencies it could have been downloaded with, and explicitly add a
> recent Guava version:
>
>     <dependency>
>       <groupId>org.apache.hadoop</groupId>
>       <artifactId>hadoop-common</artifactId>
>       <version>3.2.1</version>
>       <exclusions>
>         <exclusion>
>           <groupId>com.google.guava</groupId>
>           <artifactId>guava</artifactId>
>         </exclusion>
>       </exclusions>
>     </dependency>
>     <dependency>
>       <groupId>com.google.guava</groupId>
>       <artifactId>guava</artifactId>
>       <version>28.1-jre</version>
>     </dependency>
>   </dependencies>
> </dependencyManagement>
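
One way to check which Guava version Maven actually resolves after an exclusion like the one above is the dependency plugin's tree goal, filtered to Guava. A minimal sketch, run from the Spark source root (if sibling SNAPSHOT modules have not been built and installed yet, the goal may need to run after a full build):

    mvn dependency:tree -Dincludes=com.google.guava:guava

Each module's subtree then shows which remaining dependencies still pull in an older Guava.
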
> On Friday, December 6, 2019, 10:12:55 p.m. UTC, Ping Liu <pingpinga...@gmail.com> wrote:
>
> Hi Deepak,
>
> Following your suggestion, I put the exclusion of Guava in the topmost POM
> (directly under the Spark home directory) as follows:
>
>       </dependency>
>       <dependency>
>         <groupId>org.apache.hadoop</groupId>
>         <artifactId>hadoop-common</artifactId>
>         <version>3.2.1</version>
>         <exclusions>
>           <exclusion>
>             <groupId>com.google.guava</groupId>
>             <artifactId>guava</artifactId>
>           </exclusion>
>         </exclusions>
>       </dependency>
>     </dependencies>
>   </dependencyManagement>
>
> I also set properties for spark.executor.userClassPathFirst=true and
> spark.driver.userClassPathFirst=true:
>
> D:\apache\spark>mvn -Pyarn -Phadoop-3.2 -Dhadoop-version=3.2.1 -Dspark.executor.userClassPathFirst=true -Dspark.driver.userClassPathFirst=true -DskipTests clean package
>
> and rebuilt Spark. But I got the same error when running spark-shell.
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 25.092 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 22.093 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 19.546 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [ 10.468 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 17.733 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.531 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 25.327 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 27.264 s]
> [INFO] Spark Project Core ................................. SUCCESS [07:59 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:39 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:08 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [02:56 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:55 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:33 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [08:49 min]
> [INFO] Spark Project Tools ................................ SUCCESS [ 16.967 s]
> [INFO] Spark Project Hive ................................. SUCCESS [06:15 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [ 10.219 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 11.952 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 11.171 s]
> [INFO] Spark Project REPL ................................. SUCCESS [ 55.029 s]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:07 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:22 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 21.483 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 56.450 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:21 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:33 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:05 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 30.780 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:43 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 01:08 h
> [INFO] Finished at: 2019-12-06T11:43:08-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>spark-shell
> 'spark-shell' is not recognized as an internal or external command,
> operable program or batch file.
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/1985836631.apply(Unknown Source)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Before building Spark, I went to my local Maven repo and removed Guava
> entirely. But after building, I found the same versions of Guava had been
> downloaded again.
>
> D:\mavenrepo\com\google\guava\guava>ls
> 14.0.1  16.0.1  18.0  19.0
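
The two userClassPathFirst properties mentioned above are Spark runtime configuration settings (both marked experimental in the Spark configuration documentation) rather than Maven build properties, so a sketch of how they would normally be supplied is at launch time rather than on the mvn command line, for example (run from the Spark home directory):

    bin\spark-shell --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true

They give user-added jars precedence over Spark's own jars when classes are loaded, which only matters once spark-shell itself starts successfully.
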
> On Thu, Dec 5, 2019 at 5:12 PM Deepak Vohra <dvohr...@yahoo.com> wrote:
>
> Just to clarify, excluding the Hadoop-provided Guava in pom.xml is an
> alternative to using an uber jar, which is a more involved process.
>
> On Thursday, December 5, 2019, 10:37:39 p.m. UTC, Ping Liu <pingpinga...@gmail.com> wrote:
>
> Hi Sean,
>
> Thanks for your response!
>
> Sorry, I didn't mention that "build/mvn ..." doesn't work. So I did go to
> the Spark home directory and ran mvn from there. Following are my build
> and run results. The source code was just updated yesterday. I guess the
> POM should specify a newer Guava library somehow.
>
> Thanks Sean.
>
> Ping
>
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO]
> [INFO] Spark Project Parent POM ........................... SUCCESS [ 14.794 s]
> [INFO] Spark Project Tags ................................. SUCCESS [ 18.233 s]
> [INFO] Spark Project Sketch ............................... SUCCESS [ 20.077 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [  7.846 s]
> [INFO] Spark Project Networking ........................... SUCCESS [ 14.906 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  6.267 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [ 31.710 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [ 10.227 s]
> [INFO] Spark Project Core ................................. SUCCESS [08:03 min]
> [INFO] Spark Project ML Local Library ..................... SUCCESS [01:51 min]
> [INFO] Spark Project GraphX ............................... SUCCESS [02:20 min]
> [INFO] Spark Project Streaming ............................ SUCCESS [03:16 min]
> [INFO] Spark Project Catalyst ............................. SUCCESS [08:45 min]
> [INFO] Spark Project SQL .................................. SUCCESS [12:12 min]
> [INFO] Spark Project ML Library ........................... SUCCESS [16:28 h]
> [INFO] Spark Project Tools ................................ SUCCESS [ 23.602 s]
> [INFO] Spark Project Hive ................................. SUCCESS [07:50 min]
> [INFO] Spark Project Graph API ............................ SUCCESS [  8.734 s]
> [INFO] Spark Project Cypher ............................... SUCCESS [ 12.420 s]
> [INFO] Spark Project Graph ................................ SUCCESS [ 10.186 s]
> [INFO] Spark Project REPL ................................. SUCCESS [01:03 min]
> [INFO] Spark Project YARN Shuffle Service ................. SUCCESS [01:19 min]
> [INFO] Spark Project YARN ................................. SUCCESS [02:19 min]
> [INFO] Spark Project Assembly ............................. SUCCESS [ 18.912 s]
> [INFO] Kafka 0.10+ Token Provider for Streaming ........... SUCCESS [ 57.925 s]
> [INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [01:20 min]
> [INFO] Kafka 0.10+ Source for Structured Streaming ........ SUCCESS [02:26 min]
> [INFO] Spark Project Examples ............................. SUCCESS [02:00 min]
> [INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 28.354 s]
> [INFO] Spark Avro ......................................... SUCCESS [01:44 min]
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 17:30 h
> [INFO] Finished at: 2019-12-05T12:20:01-08:00
> [INFO] ------------------------------------------------------------------------
>
> D:\apache\spark>cd bin
>
> D:\apache\spark\bin>ls
> beeline               load-spark-env.cmd   run-example        spark-shell       spark-sql2.cmd     sparkR.cmd
> beeline.cmd           load-spark-env.sh    run-example.cmd    spark-shell.cmd   spark-submit       sparkR2.cmd
> docker-image-tool.sh  pyspark              spark-class        spark-shell2.cmd  spark-submit.cmd
> find-spark-home       pyspark.cmd          spark-class.cmd    spark-sql         spark-submit2.cmd
> find-spark-home.cmd   pyspark2.cmd         spark-class2.cmd   spark-sql.cmd     sparkR
>
> D:\apache\spark\bin>spark-shell
> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> D:\apache\spark\bin>
>
> On Thu, Dec 5, 2019 at 1:33 PM Sean Owen <sro...@gmail.com> wrote:
>
> What was the build error? You didn't say. Are you sure it succeeded?
> Try running from the Spark home dir, not bin.
> I know we do run Windows tests and it appears to pass tests, etc.
>
> On Thu, Dec 5, 2019 at 3:28 PM Ping Liu <pingpinga...@gmail.com> wrote:
>>
>> Hello,
>>
>> I understand Spark is preferably built on Linux, but I have a Windows
>> machine with a slow VirtualBox for Linux. So I wish to be able to build
>> and run the Spark code in a Windows environment.
>>
>> Unfortunately,
>>
>> # Apache Hadoop 2.6.X
>> ./build/mvn -Pyarn -DskipTests clean package
>>
>> # Apache Hadoop 2.7.X and later
>> ./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean package
>>
>> Both are listed on
>> http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version-and-enabling-yarn
>> but neither works for me (I stay directly under the Spark root directory
>> and run "mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.3 -DskipTests clean
>> package").
>>
>> Then I tried "mvn -Pyarn -Phadoop-3.2 -Dhadoop.version=3.2.1 -DskipTests
>> clean package". Now the build works, but when I run spark-shell I get the
>> following error.
>>
>> D:\apache\spark\bin>spark-shell
>> Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
>>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
>>         at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
>>         at org.apache.spark.deploy.SparkHadoopUtil$.org$apache$spark$deploy$SparkHadoopUtil$$appendS3AndSparkHadoopHiveConfigurations(SparkHadoopUtil.scala:456)
>>         at org.apache.spark.deploy.SparkHadoopUtil$.newConfiguration(SparkHadoopUtil.scala:427)
>>         at org.apache.spark.deploy.SparkSubmit.$anonfun$prepareSubmitEnvironment$2(SparkSubmit.scala:342)
>>         at org.apache.spark.deploy.SparkSubmit$$Lambda$132/817978763.apply(Unknown Source)
>>         at scala.Option.getOrElse(Option.scala:189)
>>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
>>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> Has anyone experienced building and running the Spark source code
>> successfully on Windows? Could you please share your experience?
>>
>> Thanks a lot!
>>
>> Ping
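
Since the NoSuchMethodError above is thrown by spark-shell at launch rather than by the Maven build, the Guava version that matters is the one on spark-shell's runtime classpath. For a build run directly from the source tree, that classpath is normally populated from the assembly module's jars directory, so a quick check on Windows might look like this (a sketch; the scala-2.12 path segment assumes the default Scala version of the 3.0.0-SNAPSHOT build):

    D:\apache\spark>dir assembly\target\scala-2.12\jars\guava*

If an old jar such as guava-14.0.1.jar shows up there, that is likely the copy org.apache.hadoop.conf.Configuration ends up loading, whatever the POM declares.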