milenkovicm commented on issue #10157:
URL: https://github.com/apache/datafusion/issues/10157#issuecomment-2514694231

   @alamb, @andygrove a quick draft summary for Ballista, feel free to modify it as necessary:
   
   As described in <https://github.com/apache/datafusion-ballista/pull/1066> and announced by @andygrove in <https://lists.apache.org/thread/bkbxx9rbo8dbfolybxw9v0z1638do725>, the focus was on three directions:
   
   1. a lighter codebase that is easier to maintain
   2. changing the focus from "Apache DataFusion Ballista Distributed Query Engine" to "Making Apache DataFusion Applications Distributed"
   3. making it easier to customize each Ballista component
   
   40+ commits later, we have an API that can make DataFusion applications distributed with a single-line change:
   
   ```rust
   use ballista::prelude::*;
   use datafusion::prelude::*;
   
   #[tokio::main]
   async fn main() -> datafusion::error::Result<()> {
     
     // create a DataFusion SessionContext backed by a standalone Ballista cluster
     let ctx = SessionContext::standalone().await?;
   
     ctx.register_csv("example", "tests/data/example.csv", CsvReadOptions::new()).await?;
   
     let df = ctx.sql("SELECT a, MIN(b) FROM example WHERE a <= b GROUP BY a LIMIT 100").await?;
     df.show().await?;
     Ok(())
   }
   ```
   
   Planning for the next release is ongoing in <https://github.com/apache/datafusion-ballista/issues/974>.
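
The "single line change" in the example above works because importing the Ballista prelude brings an extension trait into scope that adds new constructors to DataFusion's `SessionContext`. Here is a minimal, standard-library-only sketch of that pattern. The `SessionContext` struct, its `target` field and the trait bodies are toy stand-ins invented for illustration, not the real Ballista or DataFusion types, and the real constructors are asynchronous:

```rust
// Toy stand-in for DataFusion's SessionContext (hypothetical, not the real type).
#[derive(Debug, PartialEq)]
pub struct SessionContext {
    /// Describes where queries would run.
    pub target: String,
}

impl SessionContext {
    /// Plain single-process context, as an application creates it today.
    pub fn new() -> Self {
        SessionContext { target: "local".to_string() }
    }
}

// Extension trait that a separate "distributed" crate could export from its
// prelude (hypothetical sketch; synchronous here to stay std-only).
pub trait SessionContextExt {
    /// Context backed by an in-process cluster.
    fn standalone() -> Self;
    /// Context backed by a cluster reachable at `url`.
    fn remote(url: &str) -> Self;
}

impl SessionContextExt for SessionContext {
    fn standalone() -> Self {
        SessionContext { target: "standalone cluster".to_string() }
    }

    fn remote(url: &str) -> Self {
        SessionContext { target: format!("remote cluster at {url}") }
    }
}

fn main() {
    // Before: the application runs queries locally.
    let local = SessionContext::new();
    // After: only the constructor call changes; the rest of the code is untouched.
    let distributed = SessionContext::standalone();
    println!("{} -> {}", local.target, distributed.target);
}
```

Because the trait only adds constructors, the rest of an application (table registration, SQL, `show`) keeps using the same context type unchanged.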
   
   The benchmark results have also been updated, showing the large benefit of keeping up with the latest DataFusion:
   
   ![query 
compare](https://github.com/apache/datafusion-ballista/raw/main/docs/source/_static/images/tpch_queries_compare.png)
   
   The short-term focus will be:
   
   - coming up with a proper strategy for Python bindings and datafusion-python integration <https://github.com/apache/datafusion-ballista/issues/1142>
   - bridging the functionality gap between DataFusion and Ballista <https://github.com/apache/datafusion/issues/13616>
   - further documentation and example improvements
   - further API improvements
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

