This is an automated email from the ASF dual-hosted git repository.

jmclean pushed a commit to branch justinmclean-patch-1
in repository https://gitbox.apache.org/repos/asf/gravitino-playground.git
commit dd09a18117eace0d97e23e443d3af82ef041fc55
Author: Justin Mclean <jus...@datastrato.com>
AuthorDate: Fri Nov 15 13:07:19 2024 +1100

    Update README.md
---
 README.md | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index 4e7110a..377b5c4 100644
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ Depending on your network and computer, startup time may take 3-5 minutes. Once
 ## Prerequisites
 
 Install Git (optional), Docker, Docker Compose.
-Docker Desktop (or Orbstack) with Kubenetes enabled, and helm CLI are required if you use helm-chart to deploy services.
+Docker Desktop (or Orbstack) with Kubernetes enabled and helm CLI is required if you use helm-chart to deploy services.
 
 ## System Resource Requirements
@@ -34,7 +34,7 @@ Docker Desktop (or Orbstack) with Kubenetes enabled, and helm CLI are required i
 
 ## TCP ports used
 
-The playground runs a number of services. The TCP ports used may clash with existing services you run, such as MySQL or Postgres.
+The playground runs several services. The TCP ports used may clash with existing services you run, such as MySQL or Postgres.
 
 | Docker container      | Ports used             |
 | --------------------- | ---------------------- |
@@ -84,7 +84,7 @@ cd gravitino-playground
 
 Enable Kubernetes in Docker Desktop or Orbstack.
 
-In Project root directory, execute this command:
+In the project root directory, execute this command:
 
 ```
 helm upgrade --install gravitino-playground ./helm-chart/ --create-namespace --namespace gravitino-playground --set projectRoot=$(pwd)
 ```
@@ -103,7 +103,7 @@ helm upgrade --install gravitino-playground ./helm-chart/ --create-namespace --n
 
 ##### Port Forwarding
 
-To access pods or services at `localhost`, you needs to do these steps:
+To access the pods or services at `localhost`, you need to do these steps:
 
 1. Log in to the Gravitino playground Trino pod using the following command:
 
@@ -118,13 +118,13 @@ SPARK_POD=$(kubectl get pods --namespace gravitino-playground -l app=spark -o js
 kubectl exec $SPARK_POD -n gravitino-playground -it -- /bin/bash
 ```
 
-3. Port-forwarding Gravitino Service, so that you can access it at `localhost:8090`.
+3. Port-forward the Gravitino service to access it at `localhost:8090`.
 
 ```
 kubectl port-forward svc/gravitino -n gravitino-playground 8090:8090
 ```
 
-4. Port-forwarding Jupyter Notebook Service, so that you can access it at `localhost:8888`.
+4. Port-forward the Jupyter Notebook service to access it at `localhost:8888`.
 
 ```
 kubectl port-forward svc/jupyternotebook -n gravitino-playground 8888:8888
 ```
@@ -220,7 +220,7 @@ SHOW TABLES from catalog_hive.company;
 
 ### Cross-catalog queries
 
-In a company, there may be different departments using different data stacks. In this example, the HR department uses Apache Hive to store its data and the sales department uses PostgreSQL. You can run some interesting queries by joining the two departments' data together with Gravitino.
+In a company, there may be different departments using different data stacks. In this example, the HR department uses Apache Hive to store its data, and the sales department uses PostgreSQL. You can run some interesting queries by joining the two departments' data together with Gravitino.
 
 To know which employee has the largest sales amount, run this SQL:
 
@@ -259,7 +259,7 @@ GROUP BY e.employee_id, given_name, family_name;
 
 ### Using Spark and Trino
 
-You might consider generating data with SparkSQL and then querying this data using Trino. Give it a try with Gravitino:
+You might also consider generating data with SparkSQL and then querying this data using Trino. Give it a try with Gravitino:
 
 1. Login Spark container and execute the SQLs:
 
@@ -288,13 +288,13 @@ INSERT OVERWRITE TABLE employees PARTITION(department='Marketing') VALUES (3, 'M
 SELECT * FROM catalog_hive.product.employees WHERE department = 'Engineering';
 ```
 
-The demo is located in the `jupyter` folder, you can open the `gravitino-spark-trino-example.ipynb`
+The demo is located in the `jupyter` folder, and you can open the `gravitino-spark-trino-example.ipynb`
 demo via Jupyter Notebook by [http://localhost:18888](http://localhost:18888).
 
 ### Using Apache Iceberg REST service
 
-If you want to migrate your business from Hive to Iceberg. Some tables will use Hive, and the other tables will use Iceberg.
-Gravitino provides an Iceberg REST catalog service, too. You can use Spark to access REST catalog to write the table data.
+Suppose you want to migrate your business from Hive to Iceberg. Some tables will use Hive, and the other tables will use Iceberg.
+Gravitino provides an Iceberg REST catalog service, too. You can use Spark to access the REST catalog to write the table data.
 Then, you can use Trino to read the data from the Hive table joining the Iceberg table.
 
 `spark-defaults.conf` is as follows (It's already configured in the playground):
@@ -307,7 +307,7 @@ spark.sql.catalog.catalog_rest.uri http://gravitino:9001/iceberg/
 spark.locality.wait.node 0
 ```
 
-Please note that `catalog_rest` in SparkSQL and `catalog_iceberg` in Gravitino and Trino share the same Iceberg JDBC backend, which implies that they can access the same dataset.
+Please note that `catalog_rest` in SparkSQL and `catalog_iceberg` in Gravitino and Trino share the same Iceberg JDBC backend, implying they can access the same dataset.
 
 1. Login Spark container and execute the steps.
 
@@ -351,20 +351,20 @@ demo via Jupyter Notebook by [http://localhost:18888](http://localhost:18888).
 
 ### Using Gravitino with LlamaIndex
 
-Gravitino playground also provides a simple RAG demo with LlamaIndex. This demo will show you the
-ability of using Gravitino to manage both tabular and non-tabular dataset, connecting to
+The Gravitino Playground also provides a simple RAG demo with LlamaIndex. This demo will show you
+the ability to use Gravitino to manage both tabular and non-tabular datasets, connecting to
 LlamaIndex as a unified data source, then use LlamaIndex and LLM to query both tabular and
 non-tabular data with one natural language query.
 
-The demo is located in the `jupyter` folder, you can open the `gravitino_llama_index_demo.ipynb`
+The demo is located in the `jupyter` folder, and you can open the `gravitino_llama_index_demo.ipynb`
 demo via Jupyter Notebook by [http://localhost:18888](http://localhost:18888).
 
 The scenario of this demo is that basic structured city statistics data is stored in MySQL, and
 detailed city introductions are stored in PDF files. The user wants to know the answers to the
 cities both in the structured data and the PDF files.
 
-In this demo, you will use Gravitino to manage the MySQL table using relational catalog, pdf
-files using fileset catalog, treated Gravitino as a unified data source for LlamaIndex to build
+In this demo, you will use Gravitino to manage the MySQL table using a relational catalog, pdf
+files using a fileset catalog, treating Gravitino as a unified data source for LlamaIndex to build
 indexes on both tabular and non-tabular data. Then you will use LLM to query the data with
 natural language queries.
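As a side note for reviewers trying the port-forwarding steps this patch documents: the two `kubectl port-forward` commands can be smoke-tested with a small probe script. This sketch is not part of the commit; it assumes the README's default local ports (8090 for Gravitino, 8888 for Jupyter) and uses Gravitino's metalake-listing REST path (`/api/metalakes`) as the probe URL — adjust both if your deployment differs.

```python
# Hypothetical smoke test for the port-forwarded playground services.
# Host/port values mirror the README defaults; not part of the commit itself.
import urllib.error
import urllib.request

SERVICES = {
    "gravitino": "http://localhost:8090/api/metalakes",  # Gravitino REST API
    "jupyter": "http://localhost:8888",                  # Jupyter Notebook UI
}

def service_url(name, services=SERVICES):
    """Return the probe URL for a named service (raises KeyError if unknown)."""
    return services[name]

def probe(url, timeout=3):
    """Return the HTTP status code, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except (urllib.error.URLError, OSError):
        return None

if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = probe(url)
        state = "up" if status == 200 else "not reachable"
        print(f"{name}: {state} ({url})")
```

Run it after starting both `kubectl port-forward` commands; each service should report "up" once its forward is established.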