This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 072d3e5b713 [DOCS][site] add hudi-rs page and ecosystem links (#12367)
072d3e5b713 is described below
commit 072d3e5b7134b0ed53f9d4ad229cd70f26f6875a
Author: Shiyan Xu <[email protected]>
AuthorDate: Thu Nov 28 10:51:58 2024 -1000
[DOCS][site] add hudi-rs page and ecosystem links (#12367)
---
website/docs/overview.mdx | 8 ++
website/docs/python-rust-quick-start-guide.md | 119 +++++++++++++++++++++
website/sidebars.js | 1 +
website/src/pages/ecosystem.md | 2 +
website/versioned_docs/version-0.15.0/overview.mdx | 14 ++-
.../python-rust-quick-start-guide.md | 119 +++++++++++++++++++++
.../version-0.15.0-sidebars.json | 1 +
7 files changed, 261 insertions(+), 3 deletions(-)
diff --git a/website/docs/overview.mdx b/website/docs/overview.mdx
index 013ecc6dc4b..009de571fb1 100644
--- a/website/docs/overview.mdx
+++ b/website/docs/overview.mdx
@@ -28,7 +28,11 @@ Apache Hudi can easily be used on any [cloud storage
platform](/docs/cloud).
Hudi’s advanced performance optimizations make analytical workloads faster with any of
the popular query engines, including Apache Spark, Flink, Presto, Trino, Hive,
etc.
+[Hudi-rs](https://github.com/apache/hudi-rs) is the native Rust implementation of Apache Hudi, with Python bindings. It
+expands Apache Hudi to a diverse range of use cases in non-JVM ecosystems.
+
## Core Concepts to Learn
+
If you are relatively new to Apache Hudi, it is important to be familiar with
a few core concepts:
- [Hudi Timeline](/docs/next/timeline) – How Hudi manages transactions and
other table services
- [Hudi File Layout](/docs/next/file_layouts) - How the files are laid out on
storage
@@ -40,11 +44,15 @@ See more in the "Concepts" section of the docs.
Take a look at recent [blog posts](/blog) that go in depth on certain topics
or use cases.
## Getting Started
+
Sometimes the fastest way to learn is by doing. Try out these Quick Start
resources to get up and running in minutes:
+
- [Spark Quick Start Guide](/docs/quick-start-guide) – if you primarily use
Apache Spark
- [Flink Quick Start Guide](/docs/flink-quick-start-guide) – if you primarily
use Apache Flink
+- [Python/Rust Quick Start Guide (Hudi-rs)](/docs/python-rust-quick-start-guide) – if you primarily use Python or Rust
If you want to experience Apache Hudi integrated into an end-to-end demo with
Kafka, Spark, Hive, Presto, etc., try out the Docker Demo:
+
- [Docker Demo](/docs/docker_demo)
## Connect With The Community
diff --git a/website/docs/python-rust-quick-start-guide.md
b/website/docs/python-rust-quick-start-guide.md
new file mode 100644
index 00000000000..73f22a1c673
--- /dev/null
+++ b/website/docs/python-rust-quick-start-guide.md
@@ -0,0 +1,119 @@
+---
+title: "Python/Rust Quick Start (Hudi-rs)"
+toc: true
+last_modified_at: 2024-11-28T12:53:57+08:00
+---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This guide will help you get started with
[hudi-rs](https://github.com/apache/hudi-rs), a native Rust library for Apache
Hudi with Python bindings. Learn how to install, set up, and perform basic
operations using both Python and Rust interfaces.
+
+## Installation
+
+```bash
+# Python
+pip install hudi
+
+# Rust
+cargo add hudi
+```
+
+## Basic Usage
+
+:::note
+Currently, write operations and reading from Merge-on-Read (MOR) tables are not supported.
+
+The examples below expect a Hudi table to exist at `/tmp/trips_table`, created using the [quick start guide](/docs/quick-start-guide).
+:::
+
+### Python Example
+
+```python
+from hudi import HudiTableBuilder
+import pyarrow as pa
+
+hudi_table = (
+ HudiTableBuilder
+ .from_base_uri("/tmp/trips_table")
+ .build()
+)
+
+# Read with partition filters
+records = hudi_table.read_snapshot(filters=[("city", "=", "san_francisco")])
+
+# Convert to PyArrow table
+arrow_table = pa.Table.from_batches(records)
+result = arrow_table.select(["rider", "city", "ts", "fare"])
+```
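The `filters` argument takes (field, operator, value) triples. As a rough illustration of how such predicates read (plain Python for clarity; hudi-rs evaluates these on the Rust side, so none of the names below are library code):

```python
# Illustrative only: evaluate (field, op, value) triples against dict rows,
# mimicking the shape of hudi-rs partition filters. Not the library's code.
OPS = {
    "=": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}

def matches(row, filters):
    """Return True if the row satisfies every (field, op, value) triple."""
    return all(OPS[op](row[field], value) for field, op, value in filters)

rows = [
    {"city": "san_francisco", "fare": 19.1},
    {"city": "sao_paulo", "fare": 43.4},
]
hits = [r for r in rows if matches(r, [("city", "=", "san_francisco")])]
```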
+
+### Rust Example (with DataFusion)
+
+1. Set up your project:
+
+```bash
+cargo new my_project --bin && cd my_project
+cargo add tokio@1 datafusion@42
+cargo add hudi --features datafusion
+```
+
+2. Add code to `src/main.rs`:
+
+```rust
+use std::sync::Arc;
+use datafusion::error::Result;
+use datafusion::prelude::{DataFrame, SessionContext};
+use hudi::HudiDataSource;
+
+#[tokio::main]
+async fn main() -> Result<()> {
+ let ctx = SessionContext::new();
+ let hudi = HudiDataSource::new_with_options("/tmp/trips_table", []).await?;
+ ctx.register_table("trips_table", Arc::new(hudi))?;
+ // Read with partition filters
+ let df: DataFrame = ctx.sql("SELECT * FROM trips_table WHERE city = 'san_francisco'").await?;
+ df.show().await?;
+ Ok(())
+}
+```
+
+## Cloud Storage Integration
+
+### Python
+
+```python
+from hudi import HudiTableBuilder
+
+hudi_table = (
+ HudiTableBuilder
+ .from_base_uri("s3://bucket/trips_table")
+ .with_option("aws_region", "us-west-2")
+ .build()
+)
+```
+
+### Rust
+
+```rust
+use hudi::HudiDataSource;
+
+let hudi = HudiDataSource::new_with_options(
+ "s3://bucket/trips_table",
+ [("aws_region", "us-west-2")]
+).await?;
+```
+
+### Supported Cloud Storage
+
+- AWS S3 (`s3://`)
+- Azure Storage (`az://`)
+- Google Cloud Storage (`gs://`)
+
+Set the appropriate environment variables (`AWS_*`, `AZURE_*`, or `GOOGLE_*`) for authentication, or pass them via the `with_option()` API.
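For S3, for instance, that means exporting the standard AWS credential variables before running your program (a sketch with placeholder values):

```bash
# Hypothetical placeholder values; substitute real credentials.
export AWS_ACCESS_KEY_ID=<your-access-key>
export AWS_SECRET_ACCESS_KEY=<your-secret-key>
export AWS_REGION=us-west-2
```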
+
+## Read with Timestamp
+
+Add the timestamp option for time-travel queries:
+
+```python
+.with_option("hoodie.read.as.of.timestamp", "20241122010827898")
+```
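The value appears to be a Hudi commit instant in `yyyyMMddHHmmssSSS` form (an assumption inferred from the example above, not stated by the hudi-rs docs); such an instant can be decoded with the standard library:

```python
from datetime import datetime

# Assumed instant format: yyyyMMddHHmmssSSS (millisecond precision).
# strptime's %f right-pads the 3 millisecond digits out to microseconds.
instant = "20241122010827898"
dt = datetime.strptime(instant, "%Y%m%d%H%M%S%f")
print(dt)  # 2024-11-22 01:08:27.898000
```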
diff --git a/website/sidebars.js b/website/sidebars.js
index 56d3bbe05fd..75486d5e669 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -15,6 +15,7 @@ module.exports = {
'overview',
'quick-start-guide',
'flink-quick-start-guide',
+ 'python-rust-quick-start-guide',
'docker_demo',
'use_cases',
],
diff --git a/website/src/pages/ecosystem.md b/website/src/pages/ecosystem.md
index dcda0de53ab..52857120b26 100644
--- a/website/src/pages/ecosystem.md
+++ b/website/src/pages/ecosystem.md
@@ -37,3 +37,5 @@ In such cases, you can leverage another tool like Apache
Spark or Apache Flink t
| Apache Doris |
[Read](https://doris.apache.org/docs/ecosystem/external-table/hudi-external-table/)
| |
| Starrocks |
[Read](https://docs.starrocks.io/docs/data_source/catalog/hudi_catalog/)
| [Demo with HMS +
Min.IO](https://github.com/StarRocks/demo/tree/master/documentation-samples/hudi)
|
| Dremio |
| |
+| Daft |
[Read](https://www.getdaft.io/projects/docs/en/stable/user_guide/integrations/hudi.html)
| |
+| Ray Data |
[Read](https://docs.ray.io/en/master/data/api/input_output.html#hudi)
| |
diff --git a/website/versioned_docs/version-0.15.0/overview.mdx
b/website/versioned_docs/version-0.15.0/overview.mdx
index 0abd4219987..009de571fb1 100644
--- a/website/versioned_docs/version-0.15.0/overview.mdx
+++ b/website/versioned_docs/version-0.15.0/overview.mdx
@@ -13,7 +13,7 @@ how to learn more to get started.
## What is Apache Hudi
Apache Hudi (pronounced “hoodie”) is the next generation [streaming data lake
platform](/blog/2021/07/21/streaming-data-lake-platform).
-Apache Hudi brings core warehouse and database functionality directly to a
data lake. Hudi provides [tables](/docs/next/sql_ddl),
+ Hudi brings core warehouse and database functionality directly to a data
lake. Hudi provides [tables](/docs/next/sql_ddl),
[transactions](/docs/next/timeline), [efficient
upserts/deletes](/docs/next/write_operations), [advanced
indexes](/docs/next/indexing),
[ingestion services](/docs/hoodie_streaming_ingestion), data
[clustering](/docs/next/clustering)/[compaction](/docs/next/compaction)
optimizations,
and [concurrency](/docs/next/concurrency_control) all while keeping your data
in open source file formats.
@@ -28,7 +28,11 @@ Apache Hudi can easily be used on any [cloud storage
platform](/docs/cloud).
Hudi’s advanced performance optimizations make analytical workloads faster with any of
the popular query engines, including Apache Spark, Flink, Presto, Trino, Hive,
etc.
+[Hudi-rs](https://github.com/apache/hudi-rs) is the native Rust implementation of Apache Hudi, with Python bindings. It
+expands Apache Hudi to a diverse range of use cases in non-JVM ecosystems.
+
## Core Concepts to Learn
+
If you are relatively new to Apache Hudi, it is important to be familiar with
a few core concepts:
- [Hudi Timeline](/docs/next/timeline) – How Hudi manages transactions and
other table services
- [Hudi File Layout](/docs/next/file_layouts) - How the files are laid out on
storage
@@ -40,11 +44,15 @@ See more in the "Concepts" section of the docs.
Take a look at recent [blog posts](/blog) that go in depth on certain topics
or use cases.
## Getting Started
+
Sometimes the fastest way to learn is by doing. Try out these Quick Start
resources to get up and running in minutes:
+
- [Spark Quick Start Guide](/docs/quick-start-guide) – if you primarily use
Apache Spark
- [Flink Quick Start Guide](/docs/flink-quick-start-guide) – if you primarily
use Apache Flink
+- [Python/Rust Quick Start Guide (Hudi-rs)](/docs/python-rust-quick-start-guide) – if you primarily use Python or Rust
If you want to experience Apache Hudi integrated into an end-to-end demo with
Kafka, Spark, Hive, Presto, etc., try out the Docker Demo:
+
- [Docker Demo](/docs/docker_demo)
## Connect With The Community
@@ -53,7 +61,7 @@ resources to learn more, engage, and get help as you get
started.
### Join in on discussions
See all the ways to [engage with the community here](/community/get-involved).
Two most popular methods include:
-- <SlackCommunity title="Hudi Slack Channel" />
+- <SlackCommunity title="Hudi Slack Channel"/>
- [Hudi mailing list](mailto:[email protected]) - (send any msg
to subscribe)
### Come to Office Hours for help
@@ -67,5 +75,5 @@ Apache Hudi welcomes you to join in on the fun and make a
lasting impact on the
[contributor guide](/contribute/how-to-contribute) to learn more, and don’t
hesitate to directly reach out to any of the
current committers to learn more.
-Have an idea, an ask, or feedback about a pain-point, but don’t have time to
contribute? Join the <SlackCommunity title="Hudi Slack Channel" />
+Have an idea, an ask, or feedback about a pain-point, but don’t have time to
contribute? Join the <SlackCommunity title="Hudi Slack Channel"/>
and share!
diff --git
a/website/versioned_docs/version-0.15.0/python-rust-quick-start-guide.md
b/website/versioned_docs/version-0.15.0/python-rust-quick-start-guide.md
new file mode 100644
index 00000000000..73f22a1c673
--- /dev/null
+++ b/website/versioned_docs/version-0.15.0/python-rust-quick-start-guide.md
@@ -0,0 +1,119 @@
+---
+title: "Python/Rust Quick Start (Hudi-rs)"
+toc: true
+last_modified_at: 2024-11-28T12:53:57+08:00
+---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+This guide will help you get started with
[hudi-rs](https://github.com/apache/hudi-rs), a native Rust library for Apache
Hudi with Python bindings. Learn how to install, set up, and perform basic
operations using both Python and Rust interfaces.
+
+## Installation
+
+```bash
+# Python
+pip install hudi
+
+# Rust
+cargo add hudi
+```
+
+## Basic Usage
+
+:::note
+Currently, write operations and reading from Merge-on-Read (MOR) tables are not supported.
+
+The examples below expect a Hudi table to exist at `/tmp/trips_table`, created using the [quick start guide](/docs/quick-start-guide).
+:::
+
+### Python Example
+
+```python
+from hudi import HudiTableBuilder
+import pyarrow as pa
+
+hudi_table = (
+ HudiTableBuilder
+ .from_base_uri("/tmp/trips_table")
+ .build()
+)
+
+# Read with partition filters
+records = hudi_table.read_snapshot(filters=[("city", "=", "san_francisco")])
+
+# Convert to PyArrow table
+arrow_table = pa.Table.from_batches(records)
+result = arrow_table.select(["rider", "city", "ts", "fare"])
+```
+
+### Rust Example (with DataFusion)
+
+1. Set up your project:
+
+```bash
+cargo new my_project --bin && cd my_project
+cargo add tokio@1 datafusion@42
+cargo add hudi --features datafusion
+```
+
+2. Add code to `src/main.rs`:
+
+```rust
+use std::sync::Arc;
+use datafusion::error::Result;
+use datafusion::prelude::{DataFrame, SessionContext};
+use hudi::HudiDataSource;
+
+#[tokio::main]
+async fn main() -> Result<()> {
+ let ctx = SessionContext::new();
+ let hudi = HudiDataSource::new_with_options("/tmp/trips_table", []).await?;
+ ctx.register_table("trips_table", Arc::new(hudi))?;
+ // Read with partition filters
+ let df: DataFrame = ctx.sql("SELECT * FROM trips_table WHERE city = 'san_francisco'").await?;
+ df.show().await?;
+ Ok(())
+}
+```
+
+## Cloud Storage Integration
+
+### Python
+
+```python
+from hudi import HudiTableBuilder
+
+hudi_table = (
+ HudiTableBuilder
+ .from_base_uri("s3://bucket/trips_table")
+ .with_option("aws_region", "us-west-2")
+ .build()
+)
+```
+
+### Rust
+
+```rust
+use hudi::HudiDataSource;
+
+let hudi = HudiDataSource::new_with_options(
+ "s3://bucket/trips_table",
+ [("aws_region", "us-west-2")]
+).await?;
+```
+
+### Supported Cloud Storage
+
+- AWS S3 (`s3://`)
+- Azure Storage (`az://`)
+- Google Cloud Storage (`gs://`)
+
+Set the appropriate environment variables (`AWS_*`, `AZURE_*`, or `GOOGLE_*`) for authentication, or pass them via the `with_option()` API.
+
+## Read with Timestamp
+
+Add the timestamp option for time-travel queries:
+
+```python
+.with_option("hoodie.read.as.of.timestamp", "20241122010827898")
+```
diff --git a/website/versioned_sidebars/version-0.15.0-sidebars.json
b/website/versioned_sidebars/version-0.15.0-sidebars.json
index d69c2f62e40..b61a09c74ef 100644
--- a/website/versioned_sidebars/version-0.15.0-sidebars.json
+++ b/website/versioned_sidebars/version-0.15.0-sidebars.json
@@ -8,6 +8,7 @@
"overview",
"quick-start-guide",
"flink-quick-start-guide",
+ "python-rust-quick-start-guide",
"docker_demo",
"use_cases"
]