murphp15 opened a new issue #2811: URL: https://github.com/apache/hudi/issues/2811
**Describe the problem you faced**

I want to write to a GCS bucket from Dataproc using Hudi. The Hudi GCS docs (https://hudi.apache.org/docs/gcs_hoodie) say to set the property `fs.defaultFS` to a `gs://` value. However, when I set `fs.defaultFS` on Dataproc to a GCS bucket, the job fails at startup because it can no longer find my jar. It looks under a `gs:/` prefix, presumably because overriding `fs.defaultFS` also changes where YARN resolves the staging directory it was previously using to find the jar. How would I fix this?

```
org.apache.spark.SparkException: Application application_1617963833977_0009 failed 2 times due to AM Container for appattempt_1617963833977_0009_000002 exited with exitCode: -1000 Failing this attempt.Diagnostics: [2021-04-12 15:36:05.142]java.io.FileNotFoundException: File not found : gs:/user/root/.sparkStaging/application_1617963833977_0009/myjar.jar
```

If it is relevant, I am setting `fs.defaultFS` from within the code:

```
sparkConfig.set("spark.hadoop.fs.defaultFS", "gs://defaultFs")
```

Is there any way to use Hudi without requiring the `fs.defaultFS` property?

**To Reproduce**

Run a job on Dataproc with `fs.defaultFS` set to `gs://mybucket`.

**Expected behavior**

Ideally Hudi would not depend on the `fs.defaultFS` property, as it causes issues on Dataproc.

**Environment Description**

Dataproc writing to GCS

* Hudi version : 0.7.0
* Spark version : 2.4.7
* Hive version : Not using Hive
* Hadoop version : 2.7
* Storage (HDFS/S3/GCS..) : GCS
* Running on Docker? (yes/no) : No
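For context, one workaround worth trying (my own sketch, not from the Hudi docs; the table name and bucket path below are placeholders) is to leave `fs.defaultFS` pointing at the cluster's HDFS so YARN can still stage the jar, and instead pass a fully qualified `gs://` path when writing the Hudi table, so only the table data lands on GCS:

```scala
// Hypothetical sketch: do NOT override spark.hadoop.fs.defaultFS.
// Address GCS explicitly via a fully qualified path in the save() call,
// so the YARN staging dir keeps resolving against HDFS.
df.write
  .format("hudi")
  .option("hoodie.table.name", "my_table") // placeholder table name
  .mode("append")
  .save("gs://mybucket/hudi/my_table")     // fully qualified GCS path
```

Whether this avoids the `fs.defaultFS` requirement entirely depends on how Hudi 0.7.0 resolves relative paths internally, so treat it as an untested assumption.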
