kosiew commented on PR #1256:
URL: https://github.com/apache/datafusion-python/pull/1256#issuecomment-3370324989

   > If I have a python object that implements TableProvider via PyCapsule, I should be able to pass this object directly to SessionContext.read_table
   
   Source: #1245 
   
   The new `Table` constructor and the Python `SessionContext.read_table` path still fail on raw PyCapsule objects. The script below reproduces the failure:
   
   ```python
   
   """Demonstrate how passing a raw PyCapsule triggers the dataset fallback.
   
   This mirrors integrations that construct a :class:`datafusion.Table` directly
   from the FFI PyCapsule returned by ``__datafusion_table_provider__``. After the
   refactor that routes all inputs through ``RawTable`` the capsule is no longer
   recognized, so the constructor falls back to the PyArrow dataset path and raises
   ``ValueError: dataset argument must be a pyarrow.dataset.Dataset object``.
   """
   
   from __future__ import annotations
   
   import ctypes
   
   from datafusion import SessionContext, Table
   
   
   # Keep the backing memory alive for the lifetime of the module so the capsule
   # always wraps a valid (non-null) pointer. The capsule content is irrelevant for
   # this regression example—we only need a non-null address.
   _DUMMY_CAPSULE_BYTES = ctypes.create_string_buffer(b"x")
   
   
   def make_table_provider_capsule() -> object:
       """Create a dummy PyCapsule with the expected table provider name."""
   
       pycapsule_new = ctypes.pythonapi.PyCapsule_New
       pycapsule_new.restype = ctypes.py_object
       pycapsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
       dummy_ptr = ctypes.cast(_DUMMY_CAPSULE_BYTES, ctypes.c_void_p)
       return pycapsule_new(dummy_ptr, b"datafusion_table_provider", None)
   
   
   def main() -> None:
       """Attempt to use the capsule the same way existing callers do."""
   
       ctx = SessionContext()
       try:
           capsule = make_table_provider_capsule()
       except Exception as err:
           print("Creating the PyCapsule failed:", err)
           return
   
   
       ctx.read_table(capsule)
       
   
   if __name__ == "__main__":
       main()
   ```
   
   Running the script raises this traceback:
   
   ```
     File "/Users/kosiew/GitHub/datafusion-python/examples/raw_capsule_registration_failure.py", line 49, in <module>
       main()
       ~~~~^^
     File "/Users/kosiew/GitHub/datafusion-python/examples/raw_capsule_registration_failure.py", line 45, in main
       ctx.read_table(capsule)
       ~~~~~~~~~~~~~~^^^^^^^^^
     File "/Users/kosiew/GitHub/datafusion-python/python/datafusion/context.py", line 1184, in read_table
       return DataFrame(self.ctx.read_table(table))
                        ~~~~~~~~~~~~~~~~~~~^^^^^^^
   ValueError: dataset argument must be a pyarrow.dataset.Dataset object
   ```
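
For illustration only, here is a rough sketch of how the two input shapes could be told apart before falling through to the dataset path. It is not the code in this PR: `classify_table_input` and its return labels are hypothetical names, and the only things taken from the script above are the capsule name and the `__datafusion_table_provider__` attribute. `PyCapsule_IsValid` is the standard CPython C API call, reached here through `ctypes`.

```python
import ctypes

# Capsule name used by the reproduction script above.
TABLE_PROVIDER_CAPSULE_NAME = b"datafusion_table_provider"

# PyCapsule_IsValid(obj, name) returns 1 only when obj is a PyCapsule with a
# non-null pointer and a matching name; it returns 0 for any other object.
_capsule_is_valid = ctypes.pythonapi.PyCapsule_IsValid
_capsule_is_valid.restype = ctypes.c_int
_capsule_is_valid.argtypes = [ctypes.py_object, ctypes.c_char_p]


def classify_table_input(obj: object) -> str:
    """Hypothetical helper: report which kind of table input ``obj`` is."""
    # Objects that export the provider through the dunder method.
    if hasattr(obj, "__datafusion_table_provider__"):
        return "provider-object"
    # Raw capsules carrying the table provider name.
    if _capsule_is_valid(obj, TABLE_PROVIDER_CAPSULE_NAME):
        return "raw-capsule"
    # Everything else (for example a PyArrow dataset) is left to later checks.
    return "other"
```

With a check like this in front of the dataset fallback, the capsule built by `make_table_provider_capsule()` would be classified as `"raw-capsule"` instead of reaching the `ValueError` shown in the traceback.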
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

