Bizarre. I originally cloned from and have been pulling from https://github.com/apache/spark, and my repo shows the following:

user@host:~/development/spark$ git diff v1.2.0-rc2..v1.2.0 | wc -l
> 1898

If I pull a fresh clone, I get this:

user@host:~$ git clone https://github.com/apache/spark
> Cloning into 'spark'...
> remote: Counting objects: 152765, done.
> remote: Compressing objects: 100% (50/50), done.
> remote: Total 152765 (delta 16), reused 64 (delta 16)
> Receiving objects: 100% (152765/152765), 85.01 MiB | 3.29 MiB/s, done.
> Resolving deltas: 100% (68247/68247), done.
> user@host:~$ cd spark
> user@host:~/spark$ git diff v1.2.0-rc2..v1.2.0 | wc -l
> 0

I will do a build from the fresh clone and report back on whether the behavior persists.

-matt

On Sat, Dec 20, 2014 at 4:16 PM, Mark Hamstra <[email protected]> wrote:

> This makes no sense. There is no difference between v1.2.0-rc2 and
> v1.2.0: https://github.com/apache/spark/compare/v1.2.0-rc2...v1.2.0
>
> On Sat, Dec 20, 2014 at 12:44 PM, Matt Mead <[email protected]> wrote:
>
>> First, thanks for the efforts and contribution to such a useful software
>> stack! Spark is great!
>>
>> I have been using the git tags for v1.2.0-rc1 and v1.2.0-rc2 built as
>> follows:
>>
>> ./make-distribution.sh -Dhadoop.version=2.5.0-cdh5.2.0
>>> -Dyarn.version=2.5.0-cdh5.2.0 -Phadoop-2.4 -Phive -Pyarn -Phive-thriftserver
>>
>> I have been starting the thriftserver as follows:
>>
>> HADOOP_CONF_DIR=/etc/hadoop/conf ./sbin/start-thriftserver.sh --master
>>> yarn --num-executors 16
>>
>> Under v1.2.0-rc1 and v1.2.0-rc2, this has worked properly, where the
>> thriftserver starts up and I am able to interact with it and execute
>> queries as expected using the JDBC driver.
>>
>> I have updated to git tag v1.2.0, built identically and started the
>> thriftserver identically, but am now running into the following issue on
>> startup:
>>
>> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS:
>>> hdfs://myhdfs/user/user/.sparkStaging/application_1416150945509_0055/datanucleus-api-jdo-3.2.6.jar, expected: file:///
>>> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>>> at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
>>> at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:519)
>>> at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
>>> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
>>> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
>>> at org.apache.spark.deploy.yarn.ClientDistributedCacheManager.addResource(ClientDistributedCacheManager.scala:67)
>>> at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$5.apply(ClientBase.scala:257)
>>> at org.apache.spark.deploy.yarn.ClientBase$$anonfun$prepareLocalResources$5.apply(ClientBase.scala:242)
>>> at scala.Option.foreach(Option.scala:236)
>>> at org.apache.spark.deploy.yarn.ClientBase$class.prepareLocalResources(ClientBase.scala:242)
>>> at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:35)
>>> at org.apache.spark.deploy.yarn.ClientBase$class.createContainerLaunchContext(ClientBase.scala:350)
>>> at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:35)
>>> at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:80)
>>> at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
>>> at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:140)
>>> at org.apache.spark.SparkContext.<init>(SparkContext.scala:335)
>>> at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:38)
>>> at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2$.main(HiveThriftServer2.scala:56)
>>> at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.main(HiveThriftServer2.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> Looking at SPARK-4757, it appears others were seeing this behavior in
>> earlier releases and it is fixed in v1.2.0, whereas I did not see the
>> behavior in earlier releases and now am seeing it in v1.2.0.
>>
>> I have tested this with the exact same build/launch commands on two
>> separate CDH5.2.0 clusters with identical results. Both machines where the
>> build and execution take place have a proper HDFS/YARN client configuration
>> in /etc/hadoop/conf and other hadoop tools like MR2 on YARN function as
>> expected.
>>
>> Any ideas on what to do to resolve this issue?
>>
>> Thanks!
>>
>> -matt
