>> option("path", "s3://bucketname")
Shouldn’t the scheme prefix be s3a instead of s3?
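For reference, a minimal sketch of the call with the s3a scheme, assuming the s3a connector (hadoop-aws) is on the classpath, tempschema is defined as in the original snippet, and the bucket holds JSON files (the format is an assumption — match it to your data):

```python
# Hypothetical corrected call; "bucketname" and the json format are placeholders.
initDF = (
    spark.readStream
    .schema(tempschema)
    .format("json")                     # assumed format; use csv/parquet as appropriate
    .option("path", "s3a://bucketname/")  # s3a scheme, trailing slash for a directory-style path
    .load()
)
```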

From: 刘唯 <z920631...@gmail.com>
Sent: Tuesday, August 5, 2025 5:34 PM
To: Kleckner, Jade <jade.kleck...@ipp.mpg.de>
Cc: user@spark.apache.org
Subject: Re: [PySpark] [Beginner] [Debug] Does Spark ReadStream support reading from a MinIO bucket?


This is not necessarily a readStream / read API issue. As long as you have imported the needed dependencies and set up the Spark configuration correctly, you should be able to readStream from an S3 path.

See 
https://stackoverflow.com/questions/46740670/no-filesystem-for-scheme-s3-with-pyspark
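Along the lines of that answer, a minimal configuration sketch for pointing the s3a connector at a MinIO endpoint — the hadoop-aws version, endpoint URL, and credentials below are placeholders you would need to adapt to your deployment:

```python
from pyspark.sql import SparkSession

# Config sketch, assuming a local MinIO endpoint and default credentials;
# the hadoop-aws version must match your Hadoop distribution.
spark = (
    SparkSession.builder
    .appName("minio-readstream-sketch")
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.secret.key", "minioadmin")
    # MinIO typically requires path-style access rather than virtual-hosted buckets.
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)
```

With that in place, a stream can be read with an s3a:// path as in the snippet earlier in the thread.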

Kleckner, Jade <jade.kleck...@ipp.mpg.de<mailto:jade.kleck...@ipp.mpg.de>> 
wrote on Tue, Aug 5, 2025 at 10:21:
Hello all,

I’m developing a pipeline that should read a stream from a MinIO bucket. I have no issues setting the Hadoop s3a variables and reading files, but when I try to use a bucket as a readStream location for Spark, it produces the following errors:

Example code:

initDF = spark.readStream.schema(tempschema).option("path", "s3://bucketname").load()

I have tried the following for the bucket path:

s3 -> py4j.protocol.Py4JJavaError: An error occurred while calling o436.load.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme 
"s3"

s3a -> pyspark.errors.exceptions.captured.IllegalArgumentException: path must 
be absolute

Absolute path -> 
pyspark.errors.exceptions.captured.UnsupportedOperationException: None

I’m curious whether readStream supports S3 buckets at all. Any help/guidance would be appreciated; thank you for your time.

Sincerely,
Jade Kleckner
