Hello all,
I'm developing a pipeline that reads a stream from a MinIO bucket. I have no
issues setting the Hadoop s3a configuration variables and reading files in
batch, but when I point Spark at a bucket as a readStream location it produces
the following errors:
Example code: initDF = spark.readStream.schema(tempschema).option("path",
"s3://bucketname").load()
These are the variants I have used for the bucket path, with the resulting errors:
s3 -> py4j.protocol.Py4JJavaError: An error occurred while calling o436.load.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme
"s3"
s3a -> pyspark.errors.exceptions.captured.IllegalArgumentException: path must
be absolute
Absolute path ->
pyspark.errors.exceptions.captured.UnsupportedOperationException: None
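For reference, here is a fuller sketch of what I'm attempting. The endpoint,
credentials, schema, and source format are placeholders (my real s3a settings
already work for batch reads), and I've assumed a JSON file source for the
example:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Build a session with s3a pointed at MinIO.
# Endpoint and credentials below are placeholders, not my real values.
spark = (
    SparkSession.builder
    .appName("minio-stream")
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Placeholder schema; the real one matches the files in the bucket.
tempschema = StructType([StructField("value", StringType())])

# The streaming read that fails; "json" is an assumed format here,
# and "s3a://bucketname/" is one of the path variants I tried.
initDF = (
    spark.readStream
    .schema(tempschema)
    .format("json")
    .option("path", "s3a://bucketname/")
    .load()
)
```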
Does readStream support S3 buckets at all? Any help or guidance would be
appreciated; thank you for your time.
Sincerely,
Jade Kleckner