Hello all,
I'm developing a pipeline that reads a stream from a MinIO bucket. I have no
issues setting the Hadoop s3a configuration variables and reading files in
batch, but when I point Spark at a bucket as a readStream location it produces
the following errors:
Example code: initDF = spark.readStream.schema(tempschema).option("path",
"s3://bucketname").load()
These are the variants I have used for the bucket path, with the resulting errors:
s3 -> py4j.protocol.Py4JJavaError: An error occurred while calling o436.load.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme
"s3"
s3a -> pyspark.errors.exceptions.captured.IllegalArgumentException: path must
be absolute
Absolute path ->
pyspark.errors.exceptions.captured.UnsupportedOperationException: None
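For reference, here is a fuller sketch of what I'm attempting. The endpoint,
credentials, schema, and source format are placeholders (my real s3a settings
already work for batch reads), and I've assumed a JSON file source for the
example:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Build a session with s3a pointed at MinIO.
# Endpoint and credentials below are placeholders, not my real values.
spark = (
    SparkSession.builder
    .appName("minio-stream")
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Placeholder schema; the real one matches the files in the bucket.
tempschema = StructType([StructField("value", StringType())])

# The streaming read that fails; "json" is an assumed format here,
# and "s3a://bucketname/" is one of the path variants I tried.
initDF = (
    spark.readStream
    .schema(tempschema)
    .format("json")
    .option("path", "s3a://bucketname/")
    .load()
)
```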
Does readStream support S3 buckets at all? Any help or guidance would be
appreciated; thank you for your time.
Sincerely,
Jade Kleckner