littleKitchen opened a new pull request, #1430:
URL: https://github.com/apache/datafusion-ballista/pull/1430
## Summary
This PR implements the improvements outlined in #1398 to enhance the Jupyter
notebook experience for Ballista.
## Implementation Checklist
All items from the issue have been implemented:
- ✅ **Add example .ipynb notebooks to python/examples/**
  - `getting_started.ipynb` - Basic connection and queries
  - `dataframe_api.ipynb` - DataFrame transformations
  - `distributed_queries.ipynb` - Multi-stage query examples
- ✅ **Document notebook support in Python README**
  - Added comprehensive Jupyter section with examples for magic commands, HTML rendering, and plan visualization
- ✅ **Create ballista.jupyter module with magic commands**
  - Full implementation with `BallistaMagics` class
  - Graceful fallback when IPython is not available
- ✅ **Add %ballista connect/status/tables/schema line magics**
  - `connect`: Connect to Ballista cluster
  - `status`: Show connection status
  - `tables`: List registered tables
  - `schema`: Show table schema
  - `disconnect`: Disconnect from cluster
  - `history`: Show query history
- ✅ **Add %%sql cell magic**
  - Line magic for single-line queries (`%sql SELECT ...`)
  - Cell magic for multi-line queries (`%%sql`)
  - Variable assignment support (`%%sql my_result`)
  - `--no-display` and `--limit N` options
- ✅ **Add explain_visual() method for query plan rendering**
  - Generates DOT/SVG visualization of execution plans
  - Supports Jupyter `_repr_html_` for inline display
  - Fallback HTML representation when graphviz is not installed
  - `.save()` method for exporting to files
- ✅ **Add progress indicator support for long-running queries**
  - `collect_with_progress()` method on DataFrame
  - Callback support for custom progress handling
  - Jupyter-aware progress display
- ✅ **Consider JupySQL integration**
  - Documented as alternative in README for users who prefer the JupySQL ecosystem
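
As an illustration of the `%%sql` options listed above (an optional result variable, `--no-display`, and `--limit N`), the magic line could be parsed roughly as follows. This is a minimal standard-library sketch, not the code in `ballista.jupyter`: `parse_sql_magic_line` and `SqlMagicArgs` are hypothetical names, and the real module may handle these options differently. It covers only the `%%sql` cell-magic line, where the query itself lives in the cell body.

```python
import argparse
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class SqlMagicArgs:
    """Hypothetical container for the options found on a %%sql line."""
    variable: Optional[str]  # name to bind the result to, if any
    display: bool            # False when --no-display was given
    limit: Optional[int]     # row limit from --limit N, if any


def parse_sql_magic_line(line: str) -> SqlMagicArgs:
    """Parse the text that follows %%sql on the first line of a cell."""
    parser = argparse.ArgumentParser(prog="%%sql", add_help=False)
    parser.add_argument("variable", nargs="?", default=None)
    parser.add_argument("--no-display", action="store_true")
    parser.add_argument("--limit", type=int, default=None)
    ns = parser.parse_args(shlex.split(line))
    # Reject names that could not be assigned in the notebook namespace.
    if ns.variable is not None and not ns.variable.isidentifier():
        raise ValueError(f"not a valid variable name: {ns.variable!r}")
    return SqlMagicArgs(ns.variable, not ns.no_display, ns.limit)


args = parse_sql_magic_line("my_result --limit 5 --no-display")
print(args)
```

Note that `argparse` exits the process on malformed input by default, so a real magic would likely need its own error handling around the parse step.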
## Additional Improvements
- `ExecutionPlanVisualization` class for plan rendering with DOT/SVG
conversion
- `tables()` method on `BallistaSessionContext` for listing registered tables
- Optional `jupyter` dependency group in `pyproject.toml`
- 39 new tests, bringing the suite to 45 passing tests
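
To make the DOT/SVG rendering and its fallback concrete, here is a rough sketch of how a plan visualization object might expose `_repr_html_`. It is not the PR's `ExecutionPlanVisualization` class: `PlanVisualizationSketch` and its method names are hypothetical, and it assumes that "graphviz is not installed" means the Graphviz `dot` binary is missing from `PATH`.

```python
import html
import shutil
import subprocess
from typing import Optional


class PlanVisualizationSketch:
    """Hypothetical stand-in that renders a DOT graph inline in Jupyter."""

    def __init__(self, dot_source: str):
        self.dot_source = dot_source

    def to_svg(self) -> Optional[str]:
        """Return SVG markup, or None if `dot` is missing or fails."""
        dot_binary = shutil.which("dot")
        if dot_binary is None:
            return None
        try:
            result = subprocess.run(
                [dot_binary, "-Tsvg"],
                input=self.dot_source,
                capture_output=True,
                text=True,
                check=True,
                timeout=30,
            )
        except (subprocess.SubprocessError, OSError):
            return None
        # Drop the XML prolog so the markup can be embedded in HTML.
        start = result.stdout.find("<svg")
        return result.stdout[start:] if start != -1 else None

    def fallback_html(self) -> str:
        """Show the escaped DOT source when no SVG can be produced."""
        return "<pre>" + html.escape(self.dot_source) + "</pre>"

    def _repr_html_(self) -> str:
        # Jupyter calls this method to display the object inline.
        svg = self.to_svg()
        return svg if svg is not None else self.fallback_html()


viz = PlanVisualizationSketch("digraph plan { scan -> aggregate; }")
print(viz._repr_html_()[:40])
```

Because the fallback never raises, the object stays displayable in a notebook whether or not Graphviz is available.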
## Usage Examples
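```python
# Each commented step below is meant to run in its own notebook cell.

# Load the extension
%load_ext ballista.jupyter

# Connect to a Ballista cluster
%ballista connect df://localhost:50050

# Execute a single-line SQL query
%sql SELECT COUNT(*) FROM orders

# Multi-line query: the %%sql line must be the first line of its cell,
# and the result is assigned to the variable my_result
%%sql my_result
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10

# Visualize the execution plan (df is an existing Ballista DataFrame)
df.explain_visual()

# Track progress on long queries
batches = df.collect_with_progress()
```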
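
The callback-based progress handling mentioned in the checklist can be pictured with a small standalone sketch. This is not how `collect_with_progress()` is implemented, and the PR description does not say what arguments the callback receives. `collect_with_callback`, the `(done, total)` callback signature, and the list standing in for record batches are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
# Assumed callback shape: (batches collected so far, expected total or None).
ProgressCallback = Callable[[int, Optional[int]], None]


def collect_with_callback(
    batches: Iterable[T],
    on_progress: Optional[ProgressCallback] = None,
    total: Optional[int] = None,
) -> List[T]:
    """Collect every batch, reporting progress after each one arrives."""
    collected: List[T] = []
    for done, batch in enumerate(batches, start=1):
        collected.append(batch)
        if on_progress is not None:
            on_progress(done, total)
    return collected


def print_progress(done: int, total: Optional[int]) -> None:
    """Plain-text progress line; a notebook could update a widget instead."""
    suffix = f"/{total}" if total is not None else ""
    print(f"collected {done}{suffix} batches")


# Three string placeholders stand in for record batches.
result = collect_with_callback(["b1", "b2", "b3"], print_progress, total=3)
```

Keeping the display logic in the callback lets the same collection loop serve both a terminal and a Jupyter front end.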
## Testing
All 45 tests pass:
- Existing tests: 6 passed
- New Jupyter module tests: 20 passed
- New notebook features tests: 19 passed
Closes #1398
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]