Maxim Martynov created HADOOP-18839:
---------------------------------------
Summary: SSLException while accessing S3 bucket is reported only after 15 minutes of waiting
Key: HADOOP-18839
URL: https://issues.apache.org/jira/browse/HADOOP-18839
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Affects Versions: 3.3.4
Reporter: Maxim Martynov
Attachments: host.log, ssl.log

I tried to connect from PySpark to MinIO running in Docker.

Installing PySpark and starting MinIO:
{code:bash}
pip install pyspark==3.4.1

docker run --rm -d --hostname minio --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ACCESS_KEY=access \
  -e MINIO_SECRET_KEY=Eevoh2wo0ui6ech0wu8oy3feiR3eicha \
  -e MINIO_ROOT_USER=admin \
  -e MINIO_ROOT_PASSWORD=iepaegaigi3ofa9TaephieSo1iecaesh \
  bitnami/minio:latest
docker exec minio mc mb test-bucket
{code}
Then create a Spark session:
{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    .config("spark.hadoop.fs.s3a.endpoint", "localhost:9000")
    .config("spark.hadoop.fs.s3a.access.key", "access")
    .config("spark.hadoop.fs.s3a.secret.key", "Eevoh2wo0ui6ech0wu8oy3feiR3eicha")
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
    .getOrCreate()
)
spark.sparkContext.setLogLevel("debug")
{code}
And try to access some object in the bucket:
{code:python}
import time

begin = time.perf_counter()
spark.read.format("csv").load("s3a://test-bucket/fake")
end = time.perf_counter()
{code}
This fails with:
{code:java}
py4j.protocol.Py4JJavaError: An error occurred while calling o40.load.
: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://test-bucket/fake: com.amazonaws.SdkClientException: Unable to execute HTTP request: Unsupported or unrecognized SSL message: Unable to execute HTTP request: Unsupported or unrecognized SSL message
...
{code}
[^ssl.log]
{code:python}
>>> print((end - begin) / 60)
14.72387898775002
{code}
I waited almost *15 minutes* to get the exception from Spark.
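To illustrate where the time goes: a minimal Python sketch, entirely independent of Hadoop/Spark, shows that a TLS handshake against a plaintext HTTP port fails within milliseconds. The throwaway server below is hypothetical and only stands in for a listener like MinIO speaking plain HTTP; the point is that the handshake failure itself is instant, so the 15-minute delay has to come from the client retrying, not from waiting on the connection:

```python
# Illustration only (not Hadoop code): an SSL handshake against a
# plain-HTTP port is rejected as soon as non-TLS bytes arrive.
import socket
import ssl
import threading
import time

# Throwaway plaintext server standing in for MinIO: accept one connection
# and answer with an HTTP response instead of a TLS ServerHello.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

def answer_with_plain_http():
    conn, _ = listener.accept()
    conn.sendall(b"HTTP/1.1 400 Bad Request\r\n\r\n")
    conn.close()

threading.Thread(target=answer_with_plain_http, daemon=True).start()

# TLS client: wrap_socket performs the handshake and raises ssl.SSLError
# when the peer replies with HTTP bytes instead of a TLS record.
error = None
begin = time.perf_counter()
try:
    with socket.create_connection(("127.0.0.1", port)) as sock:
        with ssl.create_default_context().wrap_socket(sock, server_hostname="localhost"):
            pass
except ssl.SSLError as exc:
    error = exc
elapsed = time.perf_counter() - begin
print(f"handshake failed after {elapsed:.3f}s: {error}")
listener.close()
```

The elapsed time here is a fraction of a second, while the S3A client keeps retrying the same instantly-failing handshake for ~15 minutes.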
The reason was that I tried to connect to the S3 instance with {{fs.s3a.connection.ssl.enabled=true}}, but MinIO is configured to listen for plain HTTP only.

Is there any way to raise the exception immediately if an SSL connection cannot be established?

If I pass a wrong endpoint, like {{localhos:9000}}, I get an exception like this in just 5 seconds:
{code:java}
: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://test-bucket/fake: com.amazonaws.SdkClientException: Unable to execute HTTP request: test-bucket.localhos: Unable to execute HTTP request: test-bucket.localhos
...
{code}
[^host.log]
{code:python}
>>> print((end - begin) / 60)
0.09500707178334172
>>> end - begin
5.700424307000503
{code}
I know about options like {{fs.s3a.attempts.maximum}} and {{fs.s3a.retry.limit}}; setting them to 1 causes the exception to be raised almost immediately. But this does not look right.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org