cfis opened a new issue, #1018: URL: https://github.com/apache/datafusion-python/issues/1018
This might be more of an Arrow issue, but I am running into this error:

`Exception: DataFusion error: SchemaError(DuplicateQualifiedField { qualifier: Bare { table: "data" }, name: "year" }, Some(""))`

This happens when querying Parquet files stored on S3 using hive partitioning. The partition fields are year/month/day. Those same fields are also contained in the Parquet files themselves, so each one appears twice in the table schema, which causes the error.

Is there a way to avoid this? I can specify the schema manually, but I'd like to avoid that.

Example code:

```python
import datafusion
from datafusion.object_store import AmazonS3

# Credentials and locations are redacted
s3 = AmazonS3(
    bucket_name="<removed>",
    region="<removed>",
    endpoint="<removed>",
    access_key_id="<removed>",
    secret_access_key="<removed>")

ctx = datafusion.SessionContext()
ctx.register_object_store("s3://<removed>/", s3)

# Register the hive-partitioned Parquet data as an external table
ctx.sql("""
    CREATE EXTERNAL TABLE data
    STORED AS PARQUET
    PARTITIONED BY (year, month, day)
    LOCATION 's3://<removed>/'
""")

sql = """SELECT count(*)
         FROM data
         WHERE year = 2025 AND month = 1 AND day = 1
"""
```
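
To make the collision concrete, here is a minimal standard-library sketch of how the partition columns implied by a hive-style path can overlap with the columns stored in the file. This is only an illustration, not DataFusion's actual logic; the helper names, the object key, and the file column list are hypothetical.

```python
from pathlib import PurePosixPath


def partition_columns(path: str) -> list[str]:
    """Return the key of each hive-style `key=value` directory in a path."""
    directories = PurePosixPath(path).parts[:-1]  # drop the file name
    return [part.split("=", 1)[0] for part in directories if "=" in part]


def duplicated_fields(path: str, file_columns: list[str]) -> list[str]:
    """Return partition columns that also exist as columns inside the file."""
    in_file = set(file_columns)
    return [name for name in partition_columns(path) if name in in_file]


# Hypothetical object key and file schema, mirroring the layout described above
path = "year=2025/month=1/day=1/part-0.parquet"
file_columns = ["id", "value", "year", "month", "day"]

print(duplicated_fields(path, file_columns))
```

Any name returned by `duplicated_fields` would show up twice under the same table qualifier, which matches the shape of the `DuplicateQualifiedField` error above.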