kosiew commented on PR #1256:
URL:
https://github.com/apache/datafusion-python/pull/1256#issuecomment-3370324989
> If I have a python object that implements TableProvider via PyCapsule, I
> should be able to pass this object directly to SessionContext.read_table
Source: #1245
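The new Table constructor and the Python SessionContext.read_table path still
fail on raw PyCapsule objects: the capsule is not recognized, so the call falls
through to the PyArrow dataset path and raises a ValueError.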
```python
"""Demonstrate how passing a raw PyCapsule triggers the dataset fallback.

This mirrors integrations that construct a :class:`datafusion.Table` directly
from the FFI PyCapsule returned by ``__datafusion_table_provider__``. After the
refactor that routes all inputs through ``RawTable`` the capsule is no longer
recognized, so the constructor falls back to the PyArrow dataset path and raises
``ValueError: dataset argument must be a pyarrow.dataset.Dataset object``.
"""

from __future__ import annotations

import ctypes

from datafusion import SessionContext, Table

# Keep the backing memory alive for the lifetime of the module so the capsule
# always wraps a valid (non-null) pointer. The capsule content is irrelevant for
# this regression example; we only need a non-null address.
_DUMMY_CAPSULE_BYTES = ctypes.create_string_buffer(b"x")


def make_table_provider_capsule() -> object:
    """Create a dummy PyCapsule with the expected table provider name."""
    pycapsule_new = ctypes.pythonapi.PyCapsule_New
    pycapsule_new.restype = ctypes.py_object
    pycapsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
    dummy_ptr = ctypes.cast(_DUMMY_CAPSULE_BYTES, ctypes.c_void_p)
    return pycapsule_new(dummy_ptr, b"datafusion_table_provider", None)


def main() -> None:
    """Attempt to use the capsule the same way existing callers do."""
    ctx = SessionContext()
    try:
        capsule = make_table_provider_capsule()
    except Exception as err:
        print("Creating the PyCapsule failed:", err)
        return

    ctx.read_table(capsule)


if __name__ == "__main__":
    main()
```
Running this script raises the following traceback:
```
  File "/Users/kosiew/GitHub/datafusion-python/examples/raw_capsule_registration_failure.py", line 49, in <module>
    main()
    ~~~~^^
  File "/Users/kosiew/GitHub/datafusion-python/examples/raw_capsule_registration_failure.py", line 45, in main
    ctx.read_table(capsule)
    ~~~~~~~~~~~~~~^^^^^^^^^
  File "/Users/kosiew/GitHub/datafusion-python/python/datafusion/context.py", line 1184, in read_table
    return DataFrame(self.ctx.read_table(table))
                     ~~~~~~~~~~~~~~~~~~~^^^^^^^
ValueError: dataset argument must be a pyarrow.dataset.Dataset object
```
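For illustration only, here is a minimal sketch of how an input could be classified before it reaches the dataset fallback. It is not the code in this PR. `classify_table_input`, `CapsuleHolder`, and `make_capsule` are hypothetical names; the only real API used is CPython's `PyCapsule_New` / `PyCapsule_IsValid` through `ctypes`. The sketch assumes that a raw capsule can be told apart by its name (`datafusion_table_provider`, as in the script above) and that provider objects expose a zero-argument `__datafusion_table_provider__` method.

```python
from __future__ import annotations

import ctypes

# Capsule name taken from the reproduction script above. The bytes object must
# outlive any capsule created with it, because PyCapsule stores the pointer.
TABLE_PROVIDER_CAPSULE_NAME = b"datafusion_table_provider"

# PyCapsule_IsValid returns non-zero only for a capsule whose name matches and
# whose pointer is non-null; it never raises.
_capsule_is_valid = ctypes.pythonapi.PyCapsule_IsValid
_capsule_is_valid.restype = ctypes.c_int
_capsule_is_valid.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Backing memory for demo capsules; kept alive at module level.
_DEMO_BUFFER = ctypes.create_string_buffer(b"x")


def make_capsule(name: bytes) -> object:
    """Hypothetical helper: build a dummy capsule with the given name."""
    capsule_new = ctypes.pythonapi.PyCapsule_New
    capsule_new.restype = ctypes.py_object
    capsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
    return capsule_new(ctypes.cast(_DEMO_BUFFER, ctypes.c_void_p), name, None)


class CapsuleHolder:
    """Hypothetical wrapper exposing a capsule through the provider method."""

    def __init__(self, capsule: object) -> None:
        self._capsule = capsule

    def __datafusion_table_provider__(self) -> object:
        return self._capsule


def classify_table_input(obj: object) -> str:
    """Hypothetical helper: label how ``obj`` could be routed.

    Returns "provider" for objects exposing ``__datafusion_table_provider__``,
    "capsule" for a raw PyCapsule carrying the table provider name, and
    "other" for everything else (for example a PyArrow dataset).
    """
    if hasattr(obj, "__datafusion_table_provider__"):
        return "provider"
    if _capsule_is_valid(obj, TABLE_PROVIDER_CAPSULE_NAME):
        return "capsule"
    return "other"


if __name__ == "__main__":
    raw = make_capsule(TABLE_PROVIDER_CAPSULE_NAME)
    print(classify_table_input(raw))
    print(classify_table_input(CapsuleHolder(raw)))
    print(classify_table_input(object()))
```

A check like the "capsule" branch would let a raw capsule be handled explicitly instead of falling through to the dataset error. Whether wrapping the capsule in an object like `CapsuleHolder` is an acceptable workaround for callers depends on how the PR ends up routing inputs, which this sketch does not settle.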
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]