milenkovicm commented on PR #1338:
URL: 
https://github.com/apache/datafusion-ballista/pull/1338#issuecomment-3476863366

   The long-term plan, and the best one, would be to plug Ballista's physical 
planner directly into datafusion-python.
   Ideally, we do not want to maintain many Python classes in Ballista; we 
should rely on pydf instead.
   
   The main idea of this approach is to extend `DataFrame` and `SessionContext` 
to intercept the methods that create a `DataFrame` and the methods that actually 
execute the plan, such as `show`, `collect`, `write`, and so on.
   When one of these execution methods is invoked, we would create a 
`BallistaSessionContext` and a `BallistaDataFrame` (which internally has a 
Ballista physical planner) and run the method on the Ballista context.
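
   To make the interception idea concrete, here is a minimal sketch. It does 
not use the real datafusion-python or Ballista classes: `LogicalPlan`, 
`SingleNodeDataFrame`, `DistributedContext` and `InterceptingDataFrame` are all 
hypothetical stand-ins, and the real wiring may well look different.

```python
# Hypothetical sketch: every class below is a stand-in, not the real
# datafusion-python or Ballista API.
from dataclasses import dataclass, field


@dataclass
class LogicalPlan:
    """Stand-in for a logical plan: an ordered list of step descriptions."""
    steps: list = field(default_factory=list)


class SingleNodeDataFrame:
    """Stand-in for the pydf DataFrame."""

    def __init__(self, plan):
        self.plan = plan

    def filter(self, predicate):
        # Creation method: returns a new DataFrame with an extended plan.
        new_steps = self.plan.steps + [f"filter({predicate})"]
        return SingleNodeDataFrame(LogicalPlan(new_steps))

    def collect(self):
        # Execution method: would run the plan locally.
        return ("single-node", None, self.plan.steps)


class DistributedContext:
    """Stand-in for the session context created at execution time."""

    def __init__(self, url):
        self.url = url

    def execute(self, plan):
        # A real implementation would plan and run the query on the cluster.
        return ("distributed", self.url, plan.steps)


class InterceptingDataFrame(SingleNodeDataFrame):
    """Re-wraps results of creation methods and redirects execution methods."""

    def __init__(self, plan, url):
        super().__init__(plan)
        self.url = url

    def filter(self, predicate):
        # Intercept creation: reuse the parent logic, keep the wrapper type.
        inner = super().filter(predicate)
        return InterceptingDataFrame(inner.plan, self.url)

    def collect(self):
        # Intercept execution: build the distributed context only now.
        return DistributedContext(self.url).execute(self.plan)
```

   With these stand-ins, 
`InterceptingDataFrame(LogicalPlan(["scan(t)"]), url).filter("a > 1").collect()` 
goes through the distributed path, while the plan itself is still built by the 
single-node code.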
   
   Regarding your concern, @timsaucer, I'm not sure I fully understand it. 
The current goal is for Ballista to expose only `BallistaSessionContext` and 
nothing else; hopefully all other classes can be reused from the "single node" 
work. So full portability of the code is the target (provided we can get UDFs 
serialised).
   
   Current risks I see:
   
   - we have two session contexts (the pydf one and a Ballista session context 
recreated on plan execution)
   - we miss some of the `DataFrame` creation methods, and the plan then 
executes on the single-node context
   - too much code duplicated from `pydf`
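
   The second risk could be guarded against with a small check that lists the 
public methods a subclass has not overridden. This is only a sketch: 
`BaseDataFrame`, `DistributedDataFrame` and `missing_overrides` are hypothetical 
names, not part of pydf or Ballista, and a real check would also need to decide 
which methods actually return a `DataFrame`.

```python
import inspect


class BaseDataFrame:
    """Hypothetical stand-in for the single-node DataFrame."""

    def filter(self, predicate):
        return BaseDataFrame()

    def select(self, *columns):
        return BaseDataFrame()

    def collect(self):
        return []


class DistributedDataFrame(BaseDataFrame):
    """Hypothetical subclass that forgot to override `select`."""

    def filter(self, predicate):
        return DistributedDataFrame()

    def collect(self):
        return []


def missing_overrides(base, sub):
    """Return public methods of `base` that `sub` does not define itself."""
    return sorted(
        name
        for name, _ in inspect.getmembers(base, inspect.isfunction)
        if not name.startswith("_") and name not in vars(sub)
    )
```

   Here `missing_overrides(BaseDataFrame, DistributedDataFrame)` reports 
`select`, the method whose result would silently fall back to the single-node 
class.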
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

