phillipleblanc opened a new issue, #31475:
URL: https://github.com/apache/superset/issues/31475

   ### Bug description
   
   Superset currently requires `pyarrow>=14.0.1,<15`, which causes compatibility issues with databases that return StringView types (introduced in PyArrow 16).
   
   I've tested Superset with PyArrow 18.1.0 and verified that it works correctly in my (admittedly bare-bones) setup. This update would:
   1. Fix compatibility with databases returning StringView types
   2. Allow users to work with newer Arrow-based databases and tools
   3. Take advantage of performance improvements in newer PyArrow versions
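As a rough illustration of point 1, a deployment could check at runtime whether the installed pyarrow exposes the StringView type at all. This is a minimal sketch. The helper name `pyarrow_supports_string_view` is hypothetical, and the check assumes that the presence of `pyarrow.string_view` (added to the Python API in PyArrow 16) is a reliable feature marker.

```python
import importlib.util


def pyarrow_supports_string_view() -> bool:
    """Hypothetical helper: True if the installed pyarrow exposes StringView.

    Assumes pyarrow.string_view() exists only in PyArrow 16 and later.
    """
    if importlib.util.find_spec("pyarrow") is None:
        # pyarrow is not installed at all.
        return False
    import pyarrow as pa

    # Feature check instead of version parsing: older releases lack the factory.
    return hasattr(pa, "string_view")


if __name__ == "__main__":
    print(pyarrow_supports_string_view())
```

With the current `<15` pin, this would report `False`, because PyArrow 14 has no `string_view` factory.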
   
   Proposed change:
   Update the pyarrow dependency in pyproject.toml from:
   `"pyarrow>=14.0.1, <15"`
   to:
   `"pyarrow>=14.0.1, <19"`
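The effect of the bound change on the version tested here (18.1.0) can be sanity-checked in a few lines. This is a simplified sketch: `parse_release` and `in_range` are hypothetical helpers that only compare plain dotted release numbers, whereas pip applies the full PEP 440 rules (pre-releases, post-releases, local versions, and so on).

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Turn a plain dotted release such as '18.1.0' into (18, 1, 0)."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper_exclusive: str) -> bool:
    """Hypothetical check for a '>=lower, <upper_exclusive' specifier."""
    v = parse_release(version)
    # Tuple comparison is element-wise, so (18, 1, 0) < (19,) holds.
    return parse_release(lower) <= v < parse_release(upper_exclusive)


# Current pin: "pyarrow>=14.0.1, <15"
print(in_range("18.1.0", "14.0.1", "15"))  # False
# Proposed pin: "pyarrow>=14.0.1, <19"
print(in_range("18.1.0", "14.0.1", "19"))  # True
```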
   
   ### Screenshots/recordings
   
   _No response_
   
   ### Superset version
   
   master / latest-dev
   
   ### Python version
   
   3.10
   
   ### Node version
   
   Not applicable
   
   ### Browser
   
   Not applicable
   
   ### Additional context
   
   I'm using the https://github.com/influxdata/flightsql-dbapi DB-API 2.0 layer to query a database that returns native Arrow arrays. It returns StringView types that pyarrow 14 can't decode. After I force-upgraded to pyarrow 18.1, it started working.
   
   ```console
   2024-12-16 12:48:10,731:ERROR:flask_appbuilder.api:Unrecognized type: 24
   Traceback (most recent call last):
     File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 110, in wraps
       return f(self, *args, **kwargs)
     File "/app/superset/views/base_api.py", line 127, in wraps
       raise ex
     File "/app/superset/views/base_api.py", line 121, in wraps
       duration, response = time_function(f, self, *args, **kwargs)
     File "/app/superset/utils/core.py", line 1470, in time_function
       response = func(*args, **kwargs)
     File "/app/superset/utils/log.py", line 255, in wrapper
       value = f(*args, **kwargs)
     File "/app/superset/databases/api.py", line 742, in table_metadata
       table_info = get_table_metadata(database, table_name, schema_name)
     File "/app/superset/databases/utils.py", line 67, in get_table_metadata
       columns = database.get_columns(table_name, schema_name)
     File "/app/superset/models/core.py", line 839, in get_columns
       return self.db_engine_spec.get_columns(
     File "/app/superset/db_engine_specs/base.py", line 1341, in get_columns
       cast(list[SQLAColumnType], inspector.get_columns(table_name, schema))
     File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 497, in get_columns
       col_defs = self.dialect.get_columns(
     File "<string>", line 2, in get_columns
     File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 55, in cache
       ret = fn(self, con, *args, **kw)
     File "/usr/local/lib/python3.10/site-packages/flightsql/sqlalchemy.py", line 87, in get_columns
       return connection.connection.flightsql_get_columns(table, schema)
     File "/usr/local/lib/python3.10/site-packages/flightsql/util.py", line 8, in g
       return f(self, *args, **kwargs)
     File "/usr/local/lib/python3.10/site-packages/flightsql/dbapi.py", line 173, in flightsql_get_columns
       reader = ipc.open_stream(table_schema)
     File "/usr/local/lib/python3.10/site-packages/pyarrow/ipc.py", line 190, in open_stream
       return RecordBatchStreamReader(source, options=options,
     File "/usr/local/lib/python3.10/site-packages/pyarrow/ipc.py", line 52, in __init__
       self._open(source, options=options, memory_pool=memory_pool)
     File "pyarrow/ipc.pxi", line 929, in pyarrow.lib._RecordBatchStreamReader._open
     File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Unrecognized type: 24
   ```
   
   ### Checklist
   
   - [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
   - [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
   - [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

