This is an automated email from the ASF dual-hosted git repository.

skrawcz pushed a commit to branch stefan/add-eco-system-page
in repository https://gitbox.apache.org/repos/asf/hamilton.git

commit ca2c86fe5b249a922cdeaf14530cfe8e39586032
Author: Stefan Krawczyk <[email protected]>
AuthorDate: Sun Dec 28 22:08:27 2025 +1100

    Adds eco-system page

    This is to help move things to a better spot with all the content under dagworks.io still.
---
 docs/concepts/driver.rst                   |   2 +-
 docs/ecosystem/index.md                    | 193 +++++++++++++++++++++++++++++
 docs/get-started/learning-resources.md     |  19 ++-
 docs/index.md                              |  14 +--
 docs/main.md                               |   2 +-
 docs/reference/dataflows/index.rst         |   4 +-
 docs/reference/result-builders/PyArrow.rst |   8 ++
 docs/reference/result-builders/index.rst   |   1 +
 8 files changed, 217 insertions(+), 26 deletions(-)

diff --git a/docs/concepts/driver.rst b/docs/concepts/driver.rst
index b07e4cc6..26a39832 100644
--- a/docs/concepts/driver.rst
+++ b/docs/concepts/driver.rst
@@ -163,7 +163,7 @@ Next step
 Now, you know the basics of authoring and executing Apache Hamilton dataflows! We encourage you to:

 - Write some code with our `interactive tutorials <https://www.tryhamilton.dev/intro>`_
-- Kickstart your project with `community dataflows <https://hub.dagworks.io/docs/>`_
+- Kickstart your project with `community resources <../ecosystem/index.html>`_

 The next **Concepts** pages cover notions to write more expressive and powerful code. If you feel stuck or constrained with the basics, it's probably a good time to (re)visit them. They include:

diff --git a/docs/ecosystem/index.md b/docs/ecosystem/index.md
new file mode 100644
index 00000000..9e9e16e9
--- /dev/null
+++ b/docs/ecosystem/index.md
@@ -0,0 +1,193 @@
+# Ecosystem
+
+Welcome to the Apache Hamilton Ecosystem page! This page showcases the integrations, plugins, and external resources available for Apache Hamilton users.
+
+## Built-in Integrations
+
+Apache Hamilton provides first-class support for many popular data science and engineering tools through built-in plugins and adapters. These integrations are maintained by the Apache Hamilton community and included in the core project.
+
+### Data Frameworks
+
+Apache Hamilton integrates seamlessly with popular data manipulation libraries:
+
+| Integration | Description | Documentation |
+|------------|-------------|---------------|
+| **pandas** | DataFrame operations and transformations | [Examples](https://github.com/apache/hamilton/tree/main/examples/pandas) \| [ResultBuilder](../reference/result-builders/Pandas.rst) |
+| **Polars** | High-performance DataFrame library | [Examples](https://github.com/apache/hamilton/tree/main/examples/polars) \| [ResultBuilder](../reference/result-builders/Polars.rst) |
+| **PySpark** | Distributed data processing with Spark | [Examples](https://github.com/apache/hamilton/tree/main/examples/spark) \| [GraphAdapter](../reference/graph-adapters/index.rst) |
+| **Dask** | Parallel computing and distributed arrays | [Examples](https://github.com/apache/hamilton/tree/main/examples/dask) \| [GraphAdapter](../reference/graph-adapters/DaskGraphAdapter.rst) |
+| **Ray** | Distributed computing framework | [Examples](https://github.com/apache/hamilton/tree/main/examples/ray) \| [GraphAdapter](../reference/graph-adapters/RayGraphAdapter.rst) |
+| **Ibis** | Portable DataFrame API across backends | [Integration Guide](../integrations/ibis/index.md) |
+| **Vaex** | Out-of-core DataFrame library | [Examples](https://github.com/apache/hamilton/tree/main/examples/vaex) |
+| **Narwhals** | DataFrame-agnostic interface | [Examples](https://github.com/apache/hamilton/tree/main/examples/narwhals) \| [Lifecycle Hook](../reference/lifecycle-hooks/Narwhals.rst) |
+| **NumPy** | Numerical computing arrays | [ResultBuilder](../reference/result-builders/Numpy.rst) |
+| **PyArrow** | Columnar in-memory data | [ResultBuilder](../reference/result-builders/PyArrow.rst) |
+
+### Machine Learning & Data Science
+
+Build and deploy ML workflows with Apache Hamilton:
+
+| Integration | Description | Documentation |
+|------------|-------------|---------------|
+| **MLflow** | Experiment tracking and model registry | [Examples](https://github.com/apache/hamilton/tree/main/examples/mlflow) \| [Lifecycle Hook](../reference/lifecycle-hooks/MLFlowTracker.rst) |
+| **scikit-learn** | Machine learning algorithms | [Examples](https://github.com/apache/hamilton/tree/main/examples/scikit-learn) |
+| **XGBoost** | Gradient boosting framework | [IO Adapters](../reference/io/available-data-adapters.rst) |
+| **LightGBM** | Gradient boosting framework | [IO Adapters](../reference/io/available-data-adapters.rst) |
+| **Hugging Face** | Transformers and NLP models | [IO Adapters](../reference/io/available-data-adapters.rst) |
+| **Pandera** | DataFrame validation | [Examples](https://github.com/apache/hamilton/tree/main/examples/data_quality/pandera) |
+| **Pydantic** | Data validation and settings | [Decorator](../reference/decorators/check_output.rst) |
+
+### Orchestration & Workflow Systems
+
+Use Apache Hamilton within your existing orchestration infrastructure:
+
+| Integration | Description | Documentation |
+|------------|-------------|---------------|
+| **Airflow** | Workflow orchestration platform | [Examples](https://github.com/apache/hamilton/tree/main/examples/airflow) |
+| **Dagster** | Data orchestrator | [Examples](https://github.com/apache/hamilton/tree/main/examples/dagster) |
+| **Prefect** | Workflow orchestration | [Examples](https://github.com/apache/hamilton/tree/main/examples/prefect) |
+| **Kedro** | Data science pipelines | [Examples](https://github.com/apache/hamilton/tree/main/examples/kedro) |
+| **Metaflow** | ML infrastructure | [Integration](https://github.com/outerbounds/hamilton-metaflow) |
+| **dbt** | Data transformation tool | [Integration Guide](../integrations/dbt.rst) |
+
+### Data Engineering & ETL
+
+Tools for building robust data pipelines:
+
+| Integration | Description | Documentation |
+|------------|-------------|---------------|
+| **dlt** | Data loading and transformation | [Integration Guide](../integrations/dlt/index.md) |
+| **Feast** | Feature store | [Examples](https://github.com/apache/hamilton/tree/main/examples/feast) |
+| **FastAPI** | Web service framework | [Integration Guide](../integrations/fastapi.md) |
+| **Streamlit** | Interactive web applications | [Integration Guide](../integrations/streamlit.md) |
+
+### Observability & Monitoring
+
+Track and monitor your Apache Hamilton dataflows:
+
+| Integration | Description | Documentation |
+|------------|-------------|---------------|
+| **Datadog** | Monitoring and analytics | [Lifecycle Hook](../reference/lifecycle-hooks/DDOGTracer.rst) |
+| **OpenTelemetry** | Observability framework | [Examples](https://github.com/apache/hamilton/tree/main/examples/opentelemetry) |
+| **OpenLineage** | Data lineage tracking | [Examples](https://github.com/apache/hamilton/tree/main/examples/openlineage) \| [Lifecycle Hook](../reference/lifecycle-hooks/OpenLineageAdapter.rst) |
+| **Hamilton UI** | Built-in execution tracking | [UI Guide](../hamilton-ui/index.rst) |
+| **Experiment Manager** | Lightweight experiment tracking | [Examples](https://github.com/apache/hamilton/tree/main/examples/experiment_management) |
+
+### Visualization
+
+Create visualizations from your dataflows:
+
+| Integration | Description | Documentation |
+|------------|-------------|---------------|
+| **Plotly** | Interactive plotting | [Examples](https://github.com/apache/hamilton/tree/main/examples/plotly) |
+| **Matplotlib** | Static plotting | [IO Adapters](../reference/io/available-data-adapters.rst) |
+| **Rich** | Terminal formatting and progress | [Lifecycle Hook](../reference/lifecycle-hooks/RichProgressBar.rst) |
+
+### Developer Tools
+
+Improve your development workflow:
+
+| Integration | Description | Documentation |
+|------------|-------------|---------------|
+| **Jupyter** | Notebook magic commands | [Examples](https://github.com/apache/hamilton/tree/main/examples/jupyter_notebook_magic) |
+| **VS Code** | Language server and extension | [VS Code Guide](../hamilton-vscode/index.rst) |
+| **tqdm** | Progress bars | [Lifecycle Hook](../reference/lifecycle-hooks/ProgressBar.rst) |
+
+### Cloud Providers & Infrastructure
+
+Deploy Apache Hamilton to the cloud:
+
+| Integration | Description | Documentation |
+|------------|-------------|---------------|
+| **AWS** | Amazon Web Services | [Examples](https://github.com/apache/hamilton/tree/main/examples/aws) |
+| **Google Cloud** | Google Cloud Platform | [Scale-up Guide](../how-tos/scale-up.rst) |
+| **Modal** | Serverless cloud functions | [Scale-up Guide](../how-tos/scale-up.rst) |
+
+### Storage & Caching
+
+Persist and cache your data:
+
+| Integration | Description | Documentation |
+|------------|-------------|---------------|
+| **DiskCache** | Disk-based caching | [Examples](https://github.com/apache/hamilton/tree/main/examples/caching_nodes/diskcache_adapter) |
+| **File-based caching** | Local file caching | [Caching Guide](../reference/caching/index.rst) |
+
+### Other Utilities
+
+| Integration | Description | Documentation |
+|------------|-------------|---------------|
+| **Slack** | Notifications and integrations | [Examples](https://github.com/apache/hamilton/tree/main/examples/slack) \| [Lifecycle Hook](../reference/lifecycle-hooks/SlackNotifierHook.rst) |
+| **GeoPandas** | Geospatial data analysis | Type extension for GeoDataFrame support |
+| **YAML** | Configuration management | [IO Adapters](../reference/io/available-data-adapters.rst) |
+
+---
+
+## External Resources
+
+The following resources and services are provided by third parties and the broader Apache Hamilton community.
+
+**⚠️ Important Notice:**
+
+These resources and services are **not maintained, nor endorsed** by the Apache Hamilton Community and Apache Hamilton project (maintained by the Committers and the Apache Hamilton PMC). Use them at your sole discretion. The community does not verify the licenses nor validity of these tools, so it's your responsibility to verify them.
+
+### Community Resources
+
+#### 📚 Dataflow Hub
+[hub.dagworks.io](https://hub.dagworks.io/docs/)
+
+A repository of reusable Apache Hamilton dataflows contributed by the community. Browse and download pre-built dataflows for common use cases.
+
+**Note**: Hosted by DAGWorks Inc., a company founded by Apache Hamilton's original creators.
+
+#### 📝 Blog & Tutorials
+[blog.dagworks.io](https://blog.dagworks.io/)
+
+Articles covering Apache Hamilton use cases, design patterns, reference architectures, and best practices.
+
+**Note**: Maintained by DAGWorks Inc.
+
+#### 🎥 Video Content
+[YouTube @DAGWorks-Inc](https://www.youtube.com/@DAGWorks-Inc)
+
+Video tutorials, talks, and meetup recordings about Apache Hamilton.
+
+**Note**: Hosted by DAGWorks Inc.
+
+#### 🚀 Interactive Tutorials
+[tryhamilton.dev](https://www.tryhamilton.dev/)
+
+Learn Apache Hamilton concepts through interactive, browser-based tutorials.
+
+---
+
+## Contributing to the Ecosystem
+
+### Adding a New Integration
+
+If you've created a plugin or integration for Apache Hamilton, we'd love to include it in our ecosystem!
+
+**For Built-in Integrations** (maintained by the Apache Hamilton project):
+1. Create a plugin in the `hamilton/plugins/` directory
+2. Add documentation and examples
+3. Submit a pull request to the [Apache Hamilton repository](https://github.com/apache/hamilton)
+4. Follow the [contribution guidelines](https://github.com/apache/hamilton/blob/main/CONTRIBUTING.md)
+
+**For External Resources** (maintained by third parties):
+1. Submit a pull request to add your resource to this page under "External Resources"
+2. Include a clear description and link
+3. Ensure your resource is relevant to Apache Hamilton users
+4. Your resource must be properly licensed and actively maintained
+
+### Support & Questions
+
+- 💬 [Slack Community](https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g)
+- 🐛 [GitHub Issues](https://github.com/apache/hamilton/issues)
+- 📖 [Documentation](https://hamilton.apache.org)
+
+---
+
+## Stay Updated
+
+- ⭐ Star us on [GitHub](https://github.com/apache/hamilton)
+- 🐦 Follow [@hamilton_os](https://twitter.com/hamilton_os) on Twitter/X
+- 📧 Join the [mailing lists](../asf/index.rst) for announcements
diff --git a/docs/get-started/learning-resources.md b/docs/get-started/learning-resources.md
index 15f20baf..e4423793 100644
--- a/docs/get-started/learning-resources.md
+++ b/docs/get-started/learning-resources.md
@@ -10,23 +10,19 @@ The [user guide](../concepts/index.rst) gives a complete overview of Apache Hami

 The [reference documentation](../reference/dataflows/index.rst) details Apache Hamilton's public API.

-## ✍ tryhamilton.dev
-
-The [tryhamilton.dev](https://tryhamilton.dev) website provides interactive tutorials in-browser to learn specific Apache Hamilton concepts.
-
-## 🛠 Dataflow Hub
+## 🌐 Ecosystem & Integrations

-The [Apache Hamilton Dataflow Hub](https://hub.dagworks.io/docs/) hosts user-created dataflows that are easy to download and reuse in your project.
+The [ecosystem page](../ecosystem/index.md) lists all built-in integrations (pandas, Polars, Spark, etc.) and external community resources. Find reusable dataflows, blog posts, and video tutorials there.

-## 💡 Blog
+## ✍ tryhamilton.dev

-The [DAGWorks Blog](https://blog.dagworks.io/) publishes articles on problems Apache Hamilton helps solve, reference architectures, and new features.
+The [tryhamilton.dev](https://tryhamilton.dev) website provides interactive tutorials in-browser to learn specific Apache Hamilton concepts.

 ## 👋 Slack
 The [Slack channel](https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g) is the ideal place to ask questions, request features, and give feedback.

-## 📣 Talks
-See [our youtube for the most up to date recordings](https://www.youtube.com/@DAGWorks-Inc/playlists) - we are slow to list them here.
+## 📣 Talks & Videos
+See the [ecosystem page](../ecosystem/index.md) for links to video content and conference talks.

 * 2024-02 Apache Hamilton Meet-up for February
     * [Recording](https://www.youtube.com/watch?v=ks672Lm0CJo.)
@@ -105,7 +101,8 @@ See [our youtube for the most up to date recordings](https://www.youtube.com/@DA


 ## 📰 External Blogs
-For the latest blog posts, see the [DAGWorks's Blog](https://blog.dagworks.io/).
+
+For external resources including blogs, see the [ecosystem page](../ecosystem/index.md). Here are some notable blog posts about Apache Hamilton:

 * 2024-03 [RAG: ingestion and chunking using Apache Hamilton and scaling to Ray, Dask, or PySpark](https://blog.dagworks.io/p/rag-ingestion-and-chunking-using)
 * 2024-02 [A command line tool to improve your development workflow](https://blog.dagworks.io/p/a-command-line-tool-to-improve-your)
diff --git a/docs/index.md b/docs/index.md
index b88d3b61..f6c5c7e0 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -24,10 +24,12 @@ PDF <https://hamilton.apache.org/_static/Hamilton.pdf>

 ```{toctree}
 :hidden: True
-:caption: Community
+:caption: COMMUNITY

 community/index
+ecosystem/index
 Slack <https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g>
+GitHub <https://github.com/apache/hamilton>
 ```

 ```{toctree}
@@ -51,13 +53,3 @@ reference/disabling-telemetry.md

 asf/index
 ```
-
-```{toctree}
-:hidden: True
-:caption: EXTERNAL RESOURCES
-
-GitHub <https://github.com/apache/hamilton>
-tryhamilton.dev <https://www.tryhamilton.dev/>
-Dataflow Hub <https://hub.dagworks.io/docs/>
-Blog <https://blog.dagworks.io/>
-```
diff --git a/docs/main.md b/docs/main.md
index ec5275c2..6d5eb5e0 100644
--- a/docs/main.md
+++ b/docs/main.md
@@ -20,7 +20,7 @@ The ABC of Apache Hamilton

 **Facilitate collaboration**. By focusing on functions, Apache Hamilton avoids sprawling code hierarchy and generates flat dataflows. Well-scoped functions make it easier to add features, complete code reviews, debug pipeline failures, and hand-off projects. Visualizations can be generated directly from your code to better understand and document it. Integration with the [Apache Hamilton UI](hamilton-ui/index.rst) allows you to track lineage, catalog code & artifacts, and monitor your dataflows.

-**Reduce development time**. Apache Hamilton dataflows are reusable across projects and context (e.g., pipeline vs. web service). The benefits of developing robust and well-tested solutions are multiplied by reusability. Off-the-shelf dataflows are available on the [Apache Hamilton Hub](https://hub.dagworks.io/).
+**Reduce development time**. Apache Hamilton dataflows are reusable across projects and context (e.g., pipeline vs. web service). The benefits of developing robust and well-tested solutions are multiplied by reusability. Explore community-contributed dataflows in the [ecosystem](ecosystem/index.md).

 **Own your platform**. Apache Hamilton helps you integrate the frameworks and tools of your stack. Apache Hamilton's features are easy to extend and customize to your needs. This flexibility enables self-serve designs and ultimately reduces the risks of vendor lock-in.

diff --git a/docs/reference/dataflows/index.rst b/docs/reference/dataflows/index.rst
index 5dc4871a..3a87f502 100644
--- a/docs/reference/dataflows/index.rst
+++ b/docs/reference/dataflows/index.rst
@@ -3,8 +3,8 @@ Dataflows
 ==============

 Here lies reference documentation for `dataflows` module functions that
-integrate with the `hub.dagworks.io <https://hub.dagworks.io>`_ so you can pull off-the-shelf dataflows
-and get started quickly with Apache Hamilton.
+enable you to discover and use community-contributed dataflows. See the `ecosystem page <../../ecosystem/index.html>`_
+for available dataflow resources.

 Reference
 ---------
diff --git a/docs/reference/result-builders/PyArrow.rst b/docs/reference/result-builders/PyArrow.rst
new file mode 100644
index 00000000..e5878503
--- /dev/null
+++ b/docs/reference/result-builders/PyArrow.rst
@@ -0,0 +1,8 @@
+=====================================
+plugins.h_pyarrow.PyarrowTableResult
+=====================================
+
+.. autoclass:: hamilton.plugins.h_pyarrow.PyarrowTableResult
+   :special-members: __init__
+   :members:
+   :inherited-members:
diff --git a/docs/reference/result-builders/index.rst b/docs/reference/result-builders/index.rst
index a20190c4..f90e1d37 100644
--- a/docs/reference/result-builders/index.rst
+++ b/docs/reference/result-builders/index.rst
@@ -14,4 +14,5 @@ Reference
    Pandas
    Polars
    Dask
+   PyArrow
    Custom