debajyoti-truefoundry opened a new issue, #1140:
URL: https://github.com/apache/datafusion-python/issues/1140

   **Describe the bug**
   **What happened**:
   There are two ways of querying a Delta Table using DataFusion.
   1. [Using DataFusion directly.](https://github.com/delta-io/delta-rs/blob/python-v1.0.2/docs/integrations/delta-lake-datafusion.md)
   2. [Using the Query Builder from Delta.](https://github.com/delta-io/delta-rs/blob/f8dcef31d878e206c02a343f0f5caace399956b1/python/deltalake/query.py#L13)
   
   ```
   deltalake==1.0.2
   datafusion==47.0.0
   ```
   
   ```python
   import time
   from datafusion import SessionContext
   from deltalake import DeltaTable, QueryBuilder
   
   dt = DeltaTable("./delta_traces_3/otel_traces")
   sql = """
   SELECT
     *
   FROM tbl
   WHERE
     ("MlRepoId" = 1089)
     AND ("TracingProjectId" = '222fde49-1f7a-4752-8ec1-06bcdbf570c5')
     AND ("TraceId" = '8728990bd3d11fa91a688e9d9964bca1')
     AND ("SpanId" = '82c0a65e80000450')
   """
   
   qb = QueryBuilder().register("tbl", dt)
   start = time.monotonic()
   table = qb.execute(sql).read_all()
   print("Delta QueryBuilder: ", time.monotonic() - start)
   
   ctx = SessionContext()
   ctx.register_table_provider("tbl", dt)
   start = time.monotonic()
   arrow_list = ctx.sql(sql).collect()
   print("DataFusion: ", time.monotonic() - start)
   ```
   
   ```shell
   python perf_diff.py 
   Delta QueryBuilder:  1.2023070430004736
   DataFusion:  136.57222191900019
   ```
   
   As the output above shows, the same query takes about 1.2 s through the Delta `QueryBuilder` but about 136.6 s through DataFusion's `SessionContext`, a difference of more than 100x.
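
Each path above is timed once, so a single measurement could include one-off costs such as reading table metadata. A minimal sketch of a repeat-timing helper, assuming only the standard library, is shown here; `time_runs` and `summarize` are hypothetical names introduced for illustration and are not part of `datafusion` or `deltalake`.

```python
import statistics
import time
from typing import Callable, List


def time_runs(fn: Callable[[], object], repeats: int = 3, warmup: int = 1) -> List[float]:
    """Call `fn` `warmup` times untimed, then return the wall-clock
    duration in seconds of each of `repeats` timed calls."""
    for _ in range(warmup):
        fn()  # untimed warm-up call
    durations = []
    for _ in range(repeats):
        start = time.monotonic()
        fn()
        durations.append(time.monotonic() - start)
    return durations


def summarize(label: str, durations: List[float]) -> str:
    """Format the minimum and median of the measured durations."""
    return (
        f"{label}: min={min(durations):.3f}s "
        f"median={statistics.median(durations):.3f}s"
    )
```

With the objects from the script above, this could be used as `print(summarize("Delta QueryBuilder", time_runs(lambda: qb.execute(sql).read_all())))` and `print(summarize("DataFusion", time_runs(lambda: ctx.sql(sql).collect())))`. Given the 136 s run reported here, passing `warmup=0` and a small `repeats` keeps the total runtime manageable.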
   
   **To Reproduce**
   Steps to reproduce the behavior: run the script above (`perf_diff.py`) against a Delta table with the package versions listed.
   
   **Expected behavior**
   I expected near-identical execution times from both paths, since both run the same SQL against the same table.
   
   **Additional context**
   https://github.com/delta-io/delta-rs/issues/3517#issuecomment-2938742237
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

