jonded94 opened a new issue, #17880:
URL: https://github.com/apache/datafusion/issues/17880

   Hey 👋 
   
   I'm currently facing an issue with 
[parquet-viewer](https://github.com/XiangpengHao/parquet-viewer) with filenames 
containing more than one dot. `parquet-viewer` uses DataFusion under the hood and 
the error message I'm seeing definitely comes from DataFusion, so I suspect 
the root cause could lie somewhere in this codebase.
   
   For reference, 
[this](https://github.com/XiangpengHao/parquet-viewer/issues/65) is the issue 
I'm seeing: if a file is named `test.[random-strings].parquet`, it will lead to 
this error:
   ```
   Plan(
       "failed to resolve schema: test",
   )
   ```
   
   However, when I try to reproduce the issue with DataFusion from Python, I 
can't reproduce it:
   ```
   >>> from datafusion import SessionContext
   >>> ctx = SessionContext()
   >>> df = ctx.read_parquet("[random-path]/test.ako.parquet")
   >>> df.show()
   DataFrame()
   +----+-----+-----+
   | l1 | bar | foo |
   +----+-----+-----+
   |    |     | 0   |
   |    | 0   |     |
   +----+-----+-----+
   >>> df.limit(2)
   DataFrame()
   +----+-----+-----+
   | l1 | bar | foo |
   +----+-----+-----+
   |    |     | 0   |
   |    | 0   |     |
   +----+-----+-----+
   >>> df.schema()
   l1: string_view
   bar: uint64
   foo: uint64
   ```
   
   I did, however, find 
[this](https://github.com/apache/datafusion/blob/3ee52f85fdb94544da04f6a67f0c7fc03c714843/datafusion/catalog/src/listing_schema.rs#L119)
 line in the DataFusion codebase, which seems fishy to me: couldn't it 
lead to problems with multiple Parquet files named `part.1.parquet`, 
`part.2.parquet`, and so on?
   
   Maybe it is also connected to the issue I'm seeing here?
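
To illustrate the concern, here is a minimal Python sketch. It assumes the linked line derives the table name from the text before the first dot in the file name; that is my reading of it, not something I have verified. `table_name_from_file` is a hypothetical helper written for this example and is not a DataFusion API.

```python
def table_name_from_file(file_name: str) -> str:
    # Hypothetical helper (assumption): keep only the text before the first dot.
    return file_name.split(".")[0]


file_names = ["part.1.parquet", "part.2.parquet", "test.ako.parquet"]

# Group file names by the table name they would be registered under.
tables: dict[str, list[str]] = {}
for name in file_names:
    tables.setdefault(table_name_from_file(name), []).append(name)

# Any table name backed by more than one file is a collision.
collisions = {table: files for table, files in tables.items() if len(files) > 1}
print(collisions)
```

Under that assumption, both `part.*.parquet` files map to the table name `part`, and `test.ako.parquet` maps to `test`, which at least matches the name in the error message.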


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

