>> option("path", "s3://bucketname")

Shouldn't the scheme prefix be s3a instead of s3?
From: 刘唯 <z920631...@gmail.com>
Sent: Tuesday, August 5, 2025 5:34 PM
To: Kleckner, Jade <jade.kleck...@ipp.mpg.de>
Cc: user@spark.apache.org
Subject: Re: [PySpark] [Beginner] [Debug] Does Spark ReadStream support reading from a MinIO bucket?

This is not necessarily about the readStream / read API. As long as you have correctly imported the needed dependencies and set up the Spark config, you should be able to readStream from an S3 path. See https://stackoverflow.com/questions/46740670/no-filesystem-for-scheme-s3-with-pyspark

On Tue, Aug 5, 2025 at 10:21, Kleckner, Jade <jade.kleck...@ipp.mpg.de> wrote:

Hello all,

I'm developing a pipeline to possibly read a stream from a MinIO bucket. I have no issues setting Hadoop s3a variables and reading files, but when I try to use a bucket as a readStream location for Spark, it produces the following errors.

Example code:

initDF = spark.readStream.schema(tempschema).option("path", "s3://bucketname").load()

I have tried the following for the bucket path:

s3 -> py4j.protocol.Py4JJavaError: An error occurred while calling o436.load. : org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
s3a -> pyspark.errors.exceptions.captured.IllegalArgumentException: path must be absolute
Absolute path -> pyspark.errors.exceptions.captured.UnsupportedOperationException: None

I'm curious whether readStream has any support for S3 buckets at all? Any help/guidance would be appreciated, thank you for your time.

Sincerely,
Jade Kleckner
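For reference, here is a minimal sketch of the kind of s3a configuration MinIO deployments usually need, matching the advice in the thread (use the s3a:// scheme, not s3://, and point at a directory rather than the bare bucket root). The endpoint, credentials, hadoop-aws version, file format, and bucket layout below are illustrative placeholders, not values taken from the thread:

```python
# s3a settings MinIO typically needs. All values here are placeholder
# assumptions -- substitute your own endpoint, credentials, and a
# hadoop-aws version matching your Hadoop distribution.
S3A_CONF = {
    "spark.jars.packages": "org.apache.hadoop:hadoop-aws:3.3.4",
    "spark.hadoop.fs.s3a.endpoint": "http://localhost:9000",
    "spark.hadoop.fs.s3a.access.key": "minioadmin",
    "spark.hadoop.fs.s3a.secret.key": "minioadmin",
    # MinIO serves buckets path-style, not via virtual-host DNS names.
    "spark.hadoop.fs.s3a.path.style.access": "true",
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}

def make_stream_reader(spark, schema, path="s3a://bucketname/input/"):
    """Build a streaming DataFrame from an s3a directory.

    Uses the s3a:// scheme (plain s3:// has no FileSystem implementation
    on a stock Hadoop classpath, which matches the "No FileSystem for
    scheme" error above) and a directory path with a trailing slash,
    plus an explicit file format.
    """
    return spark.readStream.schema(schema).format("json").load(path)
```

The session itself would then be assembled with something like `builder = SparkSession.builder`, folding each `S3A_CONF` entry in via `builder.config(key, value)`, before calling `make_stream_reader(spark, tempschema)`.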