Hello, I have a local S3 service that is readable and writable through the AWS SDK APIs. I created the Spark session and then set the Hadoop configuration as follows:
// Create the Spark session
val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("S3Loaders")
  .config("spark.sql.streaming.checkpointLocation", "/Users/atekade/checkpoint-s3-loaders/")
  .getOrCreate()

// Take the Spark context from the Spark session
val sc = spark.sparkContext

// Configure the Spark context with the S3 values
val accessKey = "00cce9eb2c589b1b1b5b"
val secretKey = "flmheKX9Gb1tTlImO6xR++9kvnUByfRKZfI7LJT8"
val endpoint = "http://s3-region1.mycloudianhyperstore.com:80"

spark.sparkContext.hadoopConfiguration.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", endpoint)
// spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", accessKey)
// spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", secretKey)
sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", accessKey)
sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", secretKey)

I am then trying to write to S3 as follows:

val query = rawData
  .writeStream
  .format("csv")
  .option("format", "append")
  .option("path", "s3a://bucket0/")
  .outputMode("append")
  .start()

But nothing actually gets written. Since I am running this from my local machine, I have an entry mapping the S3 endpoint to its IP address in /etc/hosts. As you can see, this is a streaming DataFrame, so it cannot be written without the writeStream API. Can someone tell me what I am missing here? Is there a better way to do this?

Best,
Aniruddha
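P.S. For comparison, here is a minimal sketch of how I understand the s3a connector is meant to be configured, based on the Hadoop documentation. The fs.s3a.access.key / fs.s3a.secret.key names are the documented s3a credential keys (the awsAccessKeyId / awsSecretAccessKey names I use above seem to belong to the older s3/s3n connectors), and fs.s3a.path.style.access is only my assumption for a non-AWS endpoint:

val hadoopConf = spark.sparkContext.hadoopConfiguration

// The "spark.hadoop." prefix is only meaningful in spark-defaults.conf or
// --conf; when setting hadoopConfiguration directly, the key is plain "fs.s3a.impl".
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoopConf.set("fs.s3a.endpoint", endpoint)

// Documented s3a credential keys
hadoopConf.set("fs.s3a.access.key", accessKey)
hadoopConf.set("fs.s3a.secret.key", secretKey)

// Assumption: a custom, non-AWS endpoint typically needs path-style
// requests rather than virtual-host-style bucket addressing.
hadoopConf.set("fs.s3a.path.style.access", "true")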
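P.P.S. I also noticed that I call .start() but never block on the returned query, so in a plain main() the driver could exit before the first micro-batch runs. Below is the shape I think the write should take; Trigger.ProcessingTime and awaitTermination() are standard Structured Streaming APIs, while rawData and the bucket path are just the values from my code above:

import org.apache.spark.sql.streaming.Trigger

val query = rawData
  .writeStream
  .format("csv")
  .option("path", "s3a://bucket0/")
  .outputMode("append")
  .trigger(Trigger.ProcessingTime("10 seconds")) // emit a micro-batch every 10s
  .start()

// Keep the driver alive; without this, a local application can
// terminate before any files reach the bucket.
query.awaitTermination()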