This is an automated email from the ASF dual-hosted git repository. jshao pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
new 548ec8db6f [#6600] Docs: restructure the getting started docs (#6596)

548ec8db6f is described below

commit 548ec8db6fa1b1f7ca8b1b5fd5bfe4d2fb5b02da
Author: Qiming Teng <ten...@outlook.com>
AuthorDate: Thu Mar 6 21:21:28 2025 +0800

[#6600] Docs: restructure the getting started docs (#6596)

Related: #6600

This PR restructures the getting-started tutorial. Changes include:

- push the getting started tutorial into a separate directory;
- remove duplicated instructions on install and configuration;
- rework the flow of information for clarity and conciseness.

Co-authored-by: Jerry Shao <jerrys...@datastrato.com>
---
docs/getting-started.md                   | 339 ------------------------------
docs/getting-started/aws-remote-access.md |  36 ++++
docs/getting-started/hive.md              |  33 +++
docs/getting-started/index.md             | 250 ++++++++++++++++++++++
docs/getting-started/playground.md        |  23 ++
docs/index.md                             |  62 +++---
docs/trino-connector/installation.md      |   2 +-
docs/webui.md                             |   4 +-
8 files changed, 379 insertions(+), 370 deletions(-)

diff --git a/docs/getting-started.md b/docs/getting-started.md
deleted file mode 100644
index f729d418ac..0000000000
--- a/docs/getting-started.md
+++ /dev/null
@@ -1,339 +0,0 @@
----
-title: "Getting started with Apache Gravitino"
-slug: /getting-started
-license: "This software is licensed under the Apache License version 2."
----
-
-There are several options for getting started with Apache Gravitino. Installing and configuring Hive and Trino can be a little complex, so if you are unfamiliar with the technologies it would be best to use Docker.
- -If you want to download and install Gravitino: - - - on AWS, see [Getting started on Amazon Web Services](#getting-started-on-amazon-web-services) - - Google Cloud Platform, see [Getting started on Google Cloud Platform](#getting-started-on-google-cloud-platform) - - locally, see [Getting started locally](#getting-started-locally) - -If you have your own Apache Gravitino setup and want to use Apache Hive: - - - on AWS or Google Cloud Platform, see [Installing Apache Hive on AWS or Google Cloud Platform](#installing-apache-hive-on-aws-or-google-cloud-platform) - - locally, see [Installing Apache Hive locally](#installing-apache-hive-locally) - -If you prefer to get started quickly and use Docker for Gravitino, Apache Hive, Trino, and others: - - - on AWS or Google Cloud Platform, see [Installing Gravitino playground on AWS or Google Cloud Platform](#installing-apache-gravitino-playground-on-aws-or-google-cloud-platform) - - locally, see [Installing Gravitino playground locally](#installing-apache-gravitino-playground-locally) - -If you are using AWS and want to access the instance remotely, be sure to read [Accessing Gravitino on AWS externally](#accessing-apache-gravitino-on-aws-externally) - -### Index - -1. **Installation methods** - - Explore different installation methods, from using Docker to setting up Gravitino on cloud platforms or locally. - -2. **Java Development Kit (JDK)** - - Ensure you have the required Java Development Kit (JDK) installed to run Gravitino successfully. - -3. **Configuring and starting Gravitino** - - Learn how to configure Gravitino, install it from binary releases or Docker images, and start the Gravitino server. - -4. **Getting started on AWS and GCP** - - Detailed steps for setting up Gravitino on Amazon Web Services (AWS) and Google Cloud Platform (GCP), including instance setup, Java installation, and Gravitino deployment. - -5. 
**Getting started locally** - - Instructions for using Gravitino locally on macOS or Linux, covering JDK installation and Gravitino setup. - -6. **Integrating with Apache Hive** - - Information on installing and configuring Apache Hive on AWS, GCP, and locally. Docker container options for quick setup are also provided. - -7. **Gravitino Playground** - - Explore a bundled Docker image for a Gravitino playground, incorporating tools like Apache Hive, Apache Hadoop, Trino, MySQL, and PostgreSQL. - -8. **Using REST to interact with Gravitino** - - Examples of interacting with Gravitino via REST commands, demonstrating how to create and modify metadata. - -9. **Accessing Gravitino on AWS externally** - - Guidelines for accessing Gravitino externally when deployed on AWS, including necessary configurations and considerations. - -10. **Next steps** - - Concluding thoughts and suggested next steps for users who have completed the setup. - - -## Getting started on Amazon Web Services - -To begin using Gravitino on AWS, follow these steps: - -1. In the AWS console, launch a new instance. Select `Ubuntu` as the operating system and `t2.xlarge` as the instance type. Create a key pair named *Gravitino.pem* for SSH access and download it. Allow HTTP and HTTPS traffic if you want to connect to the instance remotely. Set the Elastic Block Store storage to 20GiB. Leave all other settings at their defaults. Other operating systems and instance types may work, but they have yet to be fully tested. - -2. Start the instance and connect to it via SSH using the downloaded .pem file: - - ```shell - ssh ubuntu@<IP_address> -i ~/Downloads/Gravitino.pem - ``` - - **Note**: you may need to adjust the permissions on your .pem file using `chmod 400` to enable SSH connections. - -3. Update the Ubuntu OS to ensure it's up-to-date: - - ```shell - sudo apt update - sudo apt upgrade - ``` - - You may need to reboot the instance for all changes to take effect. - -4. 
Install the required Java Development Kit. Gravitino supports running on Java 8, - 11 and 17, so you can install any of them: - - ```shell - sudo apt install openjdk-<version>-jdk-headless - ``` - - Verify the Java version with: - - ```shell - java -version - ``` - - You should see information about the OpenJDK version. - -5. Install Gravitino on the instance: - - You can install Gravitino from the binary release package or Docker image. Follow - [how-to-install](./how-to-install.md) to install Gravitino. - - Or you can install Gravitino from scratch. Follow [how-to-build](./how-to-build.md) and [how-to-install](./how-to-install.md). - -6. Start Gravitino using the gravitino.sh script: - - ```shell - <path-to-gravitino>/bin/gravitino.sh start - ``` - -## Getting started on Google Cloud Platform - -To begin using Gravitino on GCP, follow these steps: - -1. In the Google Cloud console, launch a new instance. Select `e2-standard-4` as the instance type and 20 GB for the boot disk size. Allow HTTP and HTTPS traffic if you want to connect to the instance remotely. Leave all other settings as their defaults. Other operating systems and instance types may work, but they have yet to be fully tested. - -2. Start the instance and connect to it via the SSH-in-browser tool. - -3. Update the Debian OS to ensure it's up-to-date: - - ```shell - sudo apt update - sudo apt upgrade - ``` - - You may need to reboot the instance for all changes to take effect. - -4. Install the required Java Development Kit. 
Gravitino supports running on Java 8, - 11 and 17, so you can install any of them: - - ```shell - wget -O - https://apt.corretto.aws/corretto.key | sudo gpg --dearmor -o /usr/share/keyrings/corretto-keyring.gpg && echo "deb [signed-by=/usr/share/keyrings/corretto-keyring.gpg] https://apt.corretto.aws stable main" | sudo tee /etc/apt/sources.list.d/corretto.list - sudo apt-get update - sudo apt-get install -y java-<version>-amazon-corretto-jdk - ``` - - Verify the Java version with: - - ```shell - java -version - ``` - - You should see information about the OpenJDK version. - -5. Install Gravitino on the instance: - - You can install Gravitino from the binary release package or Docker image. Follow - [how-to-install](./how-to-install). - - Or you can install Gravitino from scratch. Follow [how-to-build](./how-to-build.md) and [how-to-install](./how-to-install.md). - -6. Start Gravitino using the gravitino.sh script: - - ```shell - <path-to-gravitino>/bin/gravitino.sh start - ``` - -## Getting started locally - -To use Gravitino locally on macOS or Linux, follow these similar steps: - -1. Install the required Java Development Kit. Gravitino supports running on Java 8, so - 11 and 17, you can install any of them. Using [sdkman](https://sdkman.io/), for example: - - ```shell - sdk install java <version> - ``` - - You can also use different package managers to install JDK, for example, - [Homebrew](https://brew.sh/) on macOS, `apt` on Ubuntu/Debian, and `yum` on CentOS/RedHat. - -2. Install Gravitino: - - You can install Gravitino from the binary release package or Docker image, please follow the - [how-to-install](./how-to-install.md) to install Gravitino. - - Or, you can install Gravitino from scratch, follow [how-to-build](./how-to-build.md) and [how-to-install](./how-to-install.md). - -3. 
Start Gravitino using the gravitino.sh script in the binary release package or Docker image: - - ```shell - ${GRAVITINO_HOME}/bin/gravitino.sh start - ``` - -## Installing Apache Hive on AWS or Google Cloud Platform - -If you already have Apache Hive and Apache Hadoop in your environment, you can ignore this section -and use them with Gravitino. - -To install Apache Hive and Hadoop on AWS or Google Cloud Platform manually, follow [Apache Hive](https://cwiki.apache.org/confluence/display/Hive/) and -[Hadoop](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html). - -Installing and configuring Hive can be a little complex. If you don't already have Hive set up and running you can use the Docker container Datastrato provides to get Gravitino up and running. - -Follow these instructions for setting up [Docker on Ubuntu](https://docs.docker.com/engine/install/ubuntu/). - -```shell -sudo docker run --name gravitino-container -d -p 9000:9000 -p 8088:8088 -p 50010:50010 -p 50070:50070 -p 50075:50075 -p 10000:10000 -p 10002:10002 -p 8888:8888 -p 9083:9083 -p 8022:22 apache/gravitino-playground:hive:2.7.3 -``` - -Once Docker is installed, you can start the container with the command: - -```shell -sudo docker start gravitino-container -``` - -## Installing Apache Hive locally - -The same steps for installing Hive on AWS or Google Cloud Platform apply when installing it locally. Follow [Installing Apache Hive on AWS or Google Cloud Platform](#installing-apache-hive-on-aws-or-google-cloud-platform). - -## Installing Apache Gravitino playground on AWS or Google Cloud Platform - -Gravitino provides a bundle of Docker images to launch a Gravitino playground, which -includes Apache Hive, Apache Hadoop, Trino, MySQL, PostgreSQL, and Gravitino. You can use -Docker Compose to start them all. - -Installing Docker and Docker Compose is a requirement for using the playground. 
- -```shell -sudo apt install docker docker-compose -sudo gpasswd -a $USER docker -newgrp docker -``` - -You can install and run all the programs as Docker containers by using the -[gravitino-playground](https://github.com/apache/gravitino-playground). For details about -how to run the playground, see [how-to-use-the-playground](./how-to-use-the-playground.md) - -## Installing Apache Gravitino playground locally - -The same steps for installing the playground on AWS or Google Cloud Platform apply when installing it locally. Follow [Installing Gravitino playground on AWS or Google Cloud Platform](#installing-apache-gravitino-playground-on-aws-or-google-cloud-platform). - -## Using REST to interact with Apache Gravitino - -After starting the Gravitino distribution, issue REST commands to create and modify metadata. While you are using `localhost` in these examples, run these commands remotely via a hostname or IP address once you establish correct access. - -1. Create a Metalake: - - ```shell - curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ - -H "Content-Type: application/json" \ - -d '{"name":"metalake","comment":"Test metalake"}' http://localhost:8090/api/metalakes - ``` - - Verify the MetaLake's creation: - - ```shell - curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ - -H "Content-Type: application/json" \ - http://localhost:8090/api/metalakes - - curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ - -H "Content-Type: application/json" \ - http://localhost:8090/api/metalakes/metalake - ``` - - Note that if you request a Metalake that doesn't exist, you get a *NoSuchMetalakeException* error. - - ```shell - curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ - -H "Content-Type: application/json" \ - http://localhost:8090/api/metalakes/none - ``` - -2. Create a catalog in Hive: - - First, list the current catalogs to verify that no catalogs exist. 
- - ```shell - curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ - -H "Content-Type: application/json" \ - http://localhost:8090/api/metalakes/metalake/catalogs - ``` - - Create a new Hive catalog. - - ```shell - curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ - -H "Content-Type: application/json" \ - -d '{"name":"test","comment":"Test catalog", "type":"RELATIONAL", "provider":"hive", "properties":{"metastore.uris":"thrift://localhost:9083"}}' \ - http://localhost:8090/api/metalakes/metalake/catalogs - ``` - - Verify creation of the catalog. - - ```shell - curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ - -H "Content-Type: application/json" \ - http://localhost:8090/api/metalakes/metalake/catalogs - ``` - - Note that the metastore.uris property is used for the Hive catalog and needs updating if you change your configuration. - -## Accessing Apache Gravitino on AWS externally - -When you deploy Gravitino on AWS, accessing it externally requires some additional configuration due to how AWS networking works. - -AWS assigns your instance a public IP address, but Gravitino can't bind to that address. To resolve this, you must find the internal IP address assigned to your AWS instance. You can locate the private IP address in the AWS console, or by running the following command: - -```shell -ip a -``` - -Once you have identified the internal address, edit the Gravitino configuration to bind to that -address. Open the file `<path-to-gravitino>/conf/gravitino.conf` and modify the `gravitino.server. -webserver.host` parameter from `127.0.0.1` to your AWS instance's private IP4 address; or you can use '0.0.0.0'. '0.0.0.0' in this context means the host's IP address. Restart the Gravitino server for the change to take effect. - -```shell -<path-to-gravitino>/bin/gravitino.sh restart -``` - -You'll also need to open port 8090 in the security group of your AWS instance to access Gravitino. 
To access Hive you need to open port 10000 in the security group. - -After completing these steps, you should be able to access the Gravitino REST interface from either the command line or a web browser on your local computer. You can also connect to Hive via DBeaver or any other database IDE. - -## Next steps - -1. **Explore documentation:** - - Delve deeper into the Gravitino documentation for advanced features and configuration options. - - Check out https://gravitino.apache.org/docs/latest - -2. **Community engagement:** - - Join the Gravitino community forums to connect with other users, share experiences, and seek assistance if needed. - - Check out our GitHub repository: https://github.com/apache/gravitino - - Check out our Slack channel in ASF Slack: https://the-asf.slack.com - -3. **Read our blogs:** - - Check out: https://gravitino.apache.org/blog - -4. **Continuous updates:** - - Stay informed about Gravitino updates and new releases to benefit from the latest features, optimizations, and security - enhancements. - - Check out our Website: https://gravitino.apache.org - - -This document is just the beginning. You're welcome to customize your Gravitino setup based on your requirements and to explore the vast possibilities this powerful tool offers. If you encounter any issues or have questions, you can always connect with the Gravitino community for assistance. - -<img src="https://analytics.apache.org/matomo.php?idsite=62&rec=1&bots=1&action_name=GettingStarted" alt="" /> - diff --git a/docs/getting-started/aws-remote-access.md b/docs/getting-started/aws-remote-access.md new file mode 100644 index 0000000000..a66289dbc5 --- /dev/null +++ b/docs/getting-started/aws-remote-access.md @@ -0,0 +1,36 @@ +--- +title: "Remote Access for Apache Gravitino on AWS" +slug: /getting-started/aws-remote-access +license: "This software is licensed under the Apache License version 2." 
+--- + +## Accessing Apache Gravitino on AWS externally + +When you deploy Gravitino on AWS, accessing it externally requires +some additional configuration due to how AWS networking works. + +AWS assigns your instance a public IP address, but Gravitino can't bind to that address. +To resolve this, you must find the internal IP address assigned to your AWS instance. +You can locate the private IP address in the AWS console, or by running the following command: + +```shell +ip a +``` + +Once you have identified the internal address, edit the Gravitino configuration to bind to that address. +Open the file `<gravitino-home>/conf/gravitino.conf` and change the `gravitino.server.webserver.host` +parameter from `127.0.0.1` to your AWS instance's private IPv4 address; +alternatively, use `0.0.0.0` to make the server listen on all of the host's network interfaces. +Restart the Gravitino server for the change to take effect. + +```shell +<gravitino-home>/bin/gravitino.sh restart +``` + +You'll also need to open port 8090 in the security group of your AWS instance to access Gravitino. +To access Hive, you also need to open port 10000 in the security group. + +After completing these steps, you should be able to access the Gravitino REST interface +from either the command line or a web browser on your local computer. +You can also connect to Hive via DBeaver or any other database IDE. + diff --git a/docs/getting-started/hive.md b/docs/getting-started/hive.md new file mode 100644 index 0000000000..ceedf92bcf --- /dev/null +++ b/docs/getting-started/hive.md @@ -0,0 +1,33 @@ +--- +title: "Installing Apache Hive" +slug: /getting-started/hive +license: "This software is licensed under the Apache License version 2." +--- + +To install Apache Hive and Hadoop manually, +follow [Apache Hive](https://cwiki.apache.org/confluence/display/Hive/) and +[Hadoop](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html).
+ +Installing and configuring Hive can be a little complex. +If you don't already have Hive set up and running, you can use the Docker container +Datastrato provides to get a Hive environment up and running for Gravitino. + +Set up Docker by following the +[Docker on Ubuntu](https://docs.docker.com/engine/install/ubuntu/) instructions, then create and start the container: + +```shell +sudo docker run --name gravitino-container -d \ + -p 9000:9000 -p 8088:8088 -p 50010:50010 -p 50070:50070 \ + -p 50075:50075 -p 10000:10000 -p 10002:10002 -p 8888:8888 \ + -p 9083:9083 -p 8022:22 \ + apache/gravitino-playground:hive-2.7.3 +``` + +If the container is stopped later, you can start it again with: + +```shell +sudo docker start gravitino-container +``` + +<!--TODO: Add some instructions for non-docker environment--> + diff --git a/docs/getting-started/index.md b/docs/getting-started/index.md new file mode 100644 index 0000000000..934812b959 --- /dev/null +++ b/docs/getting-started/index.md @@ -0,0 +1,250 @@ +--- +title: "Getting started with Apache Gravitino" +slug: /getting-started +license: "This software is licensed under the Apache License version 2." +--- + +There are several options for getting started with Apache Gravitino. + +<!--Docker option--> +Installing and configuring Hive and Trino can be a little complex. +If you are unfamiliar with the technologies, using Docker might be a good choice. +There are pre-packaged containers for Gravitino, Apache Hive, Apache Hadoop, +Trino, MySQL, PostgreSQL, and others. +See [installing Gravitino playground](./playground.md) for more details. + +<!--Build from source--> +This page guides you through the process of downloading and installing Gravitino +from source. + +1. [Prepare environment](#environment-preparation) + - Deploy and run Gravitino on [Amazon Web Services (AWS)](#aws) + - Deploy and run Gravitino on [Google Cloud Platform (GCP)](#gcp) + - Run Gravitino on [your own machine](#local-workstation) +1. [Install Gravitino](#install-gravitino) +1.
[Start Gravitino](#start-gravitino) +1. [Install Apache Hive](#install-apache-hive) +1. [Interact with Apache Gravitino API](#interact-with-apache-gravitino-api) + +:::note +If you deploy Gravitino on AWS and want to access the instance remotely, be sure to read +[Accessing Gravitino on AWS externally](./aws-remote-access.md). +::: + +## Environment preparation + +### AWS + +To work in an AWS environment, follow these steps: + +1. In the AWS console, launch a new instance. + Select `Ubuntu` as the operating system and `t2.xlarge` as the instance type. + Create a key pair named *Gravitino.pem* for SSH access and download it. + Allow HTTP and HTTPS traffic if you want to connect to the instance remotely. + Set the Elastic Block Store storage to 20 GiB. + Leave all other settings at their defaults. + Other operating systems and instance types may work, but have not been fully tested. + +1. Start the instance and connect to it via SSH using the downloaded `.pem` file: + + ```shell + ssh ubuntu@<IP_address> -i ~/Downloads/Gravitino.pem + ``` + + **Note**: you may need to adjust the permissions on your `.pem` file using + `chmod 400` to enable SSH connections. + +1. Update the Ubuntu OS to ensure it's up-to-date: + + ```shell + sudo apt update + sudo apt upgrade + ``` + + <!--TODO: need Red Hat commands?--> + You may need to reboot the instance for all changes to take effect. + +1. Install the Java Development Kit (JDK). Java 8, 11, and 17 are supported. + + ```shell + sudo apt install openjdk-<version>-jdk-headless + ``` + + Verify the Java version with: + + ```shell + java -version + ``` + + You should see information about the OpenJDK version. + +### GCP + +To work in a GCP environment, follow these steps: + +1. In the Google Cloud console, launch a new instance. + Select `e2-standard-4` as the instance type and 20 GB for the boot disk size. + Allow HTTP and HTTPS traffic if you want to connect to the instance remotely. + Leave all other settings at their defaults.
+ Other operating systems and instance types may work, but have not been fully tested. + +1. Start the instance and connect to it via the SSH-in-browser tool. + +1. Update the Debian OS to ensure it's up-to-date: + + ```shell + sudo apt update + sudo apt upgrade + ``` + + You may need to reboot the instance for all changes to take effect. + +1. Install the Java Development Kit (JDK). Java 8, 11, and 17 are supported. + + ```shell + wget -O - https://apt.corretto.aws/corretto.key | sudo gpg --dearmor -o /usr/share/keyrings/corretto-keyring.gpg + echo "deb [signed-by=/usr/share/keyrings/corretto-keyring.gpg] https://apt.corretto.aws stable main" | sudo tee /etc/apt/sources.list.d/corretto.list + sudo apt-get update + sudo apt-get install -y java-<version>-amazon-corretto-jdk + ``` + + Verify the Java version with: + + ```shell + java -version + ``` + + You should see information about the OpenJDK version. + +### Local workstation + +To build and install Gravitino locally on a macOS or a Linux workstation, +follow these steps: + +1. Install the Java Development Kit (JDK). Java 8, 11, and 17 are supported. + For example, using [sdkman](https://sdkman.io/): + + ```shell + sdk install java <version> + ``` + + You can also use other package managers to install the JDK, for example, + [Homebrew](https://brew.sh/) on macOS, `apt` on Ubuntu/Debian, and + `yum` on CentOS/RedHat. + +## Install Gravitino + +You can install Gravitino from the binary release packages or the container images. +Follow [how-to-install](../how-to-install.md). + +Alternatively, you can build and install Gravitino from source. +Follow [how-to-build](../how-to-build.md) and [how-to-install](../how-to-install.md). + +## Start Gravitino + +Start Gravitino using the `gravitino.sh` script: + +```shell +<path-to-gravitino>/bin/gravitino.sh start +``` + +## Install Apache Hive + +If you already have Apache Hive and Apache Hadoop in your environment, +you can skip this step and use the existing services with Gravitino.
+Otherwise, follow the [instructions](./hive.md) to install Apache Hive. + +## Interact with Apache Gravitino API + +After deploying the Gravitino server, you can interact with it using +the RESTful APIs to create and modify metadata. + +:::tip +The following examples use `localhost` as the host name. +You may need to replace it with the host name or IP address +of your Gravitino server, depending on your environment. +::: + +1. Create a metalake: + + ```shell + curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ + -H "Content-Type: application/json" \ + -d '{"name":"my-metalake","comment":"Test metalake"}' \ + http://localhost:8090/api/metalakes + ``` + + Verify that the metalake has been created: + + ```shell + curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ + -H "Content-Type: application/json" \ + http://localhost:8090/api/metalakes + + curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ + -H "Content-Type: application/json" \ + http://localhost:8090/api/metalakes/my-metalake + ``` + + Note that if you request a metalake that doesn't exist, you get a + `NoSuchMetalakeException` error. + + ```shell + curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ + -H "Content-Type: application/json" \ + http://localhost:8090/api/metalakes/none + ``` + +1. Create a Hive catalog: + + First, list the current catalogs to verify that no catalogs exist. + + ```shell + curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ + -H "Content-Type: application/json" \ + http://localhost:8090/api/metalakes/my-metalake/catalogs + ``` + + Create a new Hive catalog.
+ + ```shell + curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ + -H "Content-Type: application/json" \ + -d '{"name":"my-catalog","comment":"Test catalog", "type":"RELATIONAL", "provider":"hive", "properties":{"metastore.uris":"thrift://localhost:9083"}}' \ + http://localhost:8090/api/metalakes/my-metalake/catalogs + ``` + + Verify that the catalog has been created: + + ```shell + curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \ + -H "Content-Type: application/json" \ + http://localhost:8090/api/metalakes/my-metalake/catalogs + ``` + + :::tip + The `metastore.uris` property used for the Hive catalog must + point to the Hive Metastore in your environment. + ::: + +## Next steps + +- Delve deeper into the [documentation](https://gravitino.apache.org/docs/latest) + for advanced features and configuration options. + +- Bookmark the [Gravitino website](https://gravitino.apache.org) for updates, + latest releases, new features, optimizations, and security enhancements. + +- Read our [blog](https://gravitino.apache.org/blog). + +- Join the Gravitino community forums to connect with developers and other users, + to share experiences, and to seek help if needed. + Questions and comments are welcome. + + - Join the [Gravitino Slack channel](https://the-asf.slack.com) + - Explore the GitHub repository for [issues](https://github.com/apache/gravitino/issues) + or [pull requests](https://github.com/apache/gravitino/pulls), + and pick something you are interested in working on. + +<img src="https://analytics.apache.org/matomo.php?idsite=62&rec=1&bots=1&action_name=GettingStarted" alt="" /> + diff --git a/docs/getting-started/playground.md b/docs/getting-started/playground.md new file mode 100644 index 0000000000..ddbf8196cf --- /dev/null +++ b/docs/getting-started/playground.md @@ -0,0 +1,23 @@ +--- +title: "Installing Apache Gravitino Playground" +slug: /getting-started/playground +license: "This software is licensed under the Apache License version 2."
+--- + +Gravitino provides a bundle of Docker images to launch a Gravitino playground, +which includes Apache Hive, Apache Hadoop, Trino, MySQL, PostgreSQL, and Gravitino. +You can use Docker Compose to start them all. + +The playground requires Docker and Docker Compose. On Ubuntu, you can install them as follows: + +```shell +sudo apt install docker.io docker-compose +sudo gpasswd -a $USER docker +newgrp docker +``` + +You can install and run all the programs as Docker containers by using the +[gravitino-playground](https://github.com/apache/gravitino-playground) project. +For details about how to run the playground, see +[how-to-use-the-playground](../how-to-use-the-playground.md). + diff --git a/docs/index.md b/docs/index.md index 4a9c43131d..8e75f09cc5 100644 --- a/docs/index.md +++ b/docs/index.md @@ -6,9 +6,9 @@ license: "This software is licensed under the Apache License version 2." ## What's Apache Gravitino? -Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages the -metadata directly in different sources, types, and regions. It also provides users with unified -metadata access for data and AI assets. +Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. +It manages the metadata directly in different sources, types, and regions. +It also provides users with unified metadata access for data and AI assets. [Learn more](./overview.md)→ @@ -17,9 +17,10 @@ metadata access for data and AI assets. You can get Gravitino from the [download page](https://gravitino.apache.org/downloads), or you can build Gravitino from source code. See [How to build Gravitino](./how-to-build.md). -Gravitino runs on both Linux and macOS platforms, and it requires the installation of Java 8, Java 11, or Java 17. Gravitino trino-connector runs with -Trino, and requires Java 17. This should include JVMs on x86_64 and -ARM64.
It's easy to run locally on one machine, all you need is to have `java` installed on +Gravitino runs on both Linux and macOS platforms, and it requires the installation of +Java 8, Java 11, or Java 17. Gravitino trino-connector runs with Trino and requires Java 17. +This should include JVMs on both x86_64 and ARM64. +It's easy to run locally on one machine: all you need is to have `java` installed on your system `PATH`, or the `JAVA_HOME` environment variable pointing to a Java installation. See [How to install Gravitino](./how-to-install.md) to learn how to install the Gravitino server. @@ -34,19 +35,22 @@ and [How to use the playground](./how-to-use-the-playground.md). ## Getting started -To get started with Gravitino, see [Getting started](./getting-started.md) for the details. +To get started with Gravitino, see [Getting started](./getting-started/index.md) for the details. -* [Getting started locally](./getting-started.md#getting-started-locally): a quick guide to starting +* [Getting started locally](./getting-started/index.md#local-workstation): a quick guide to starting and using Gravitino locally. -* [Running on Amazon Web Services](./getting-started.md#getting-started-on-amazon-web-services): a + +* [Running on Amazon Web Services](./getting-started/index.md#aws): a quick guide to starting and using Gravitino on AWS. -* [Running on Google Cloud Platform](./getting-started.md#getting-started-on-google-cloud-platform): + +* [Running on Google Cloud Platform](./getting-started/index.md#gcp): a quick guide to starting and using Gravitino on GCP. ## How to use Apache Gravitino -Gravitino provides two SDKs to manage metadata from different catalogs in a unified way: the -REST API and the Java SDK. You can use either to manage metadata. See +Gravitino provides two interfaces to manage metadata from different catalogs in a unified way: +the REST API and the Java SDK. +You can use either to manage metadata.
See * [Manage metalake using Gravitino](./manage-metalake-using-gravitino.md) to learn how to manage metalakes. @@ -73,13 +77,13 @@ Gravitino currently supports the following catalogs: **Relational catalogs:** -* [**Iceberg catalog**](./lakehouse-iceberg-catalog.md) -* [**Paimon catalog**](./lakehouse-paimon-catalog.md) +* [**Doris catalog**](./jdbc-doris-catalog.md) * [**Hudi catalog**](./lakehouse-hudi-catalog.md) * [**Hive catalog**](./apache-hive-catalog.md) +* [**Iceberg catalog**](./lakehouse-iceberg-catalog.md) * [**MySQL catalog**](./jdbc-mysql-catalog.md) +* [**Paimon catalog**](./lakehouse-paimon-catalog.md) * [**PostgreSQL catalog**](./jdbc-postgresql-catalog.md) -* [**Doris catalog**](./jdbc-doris-catalog.md) * [**OceanBase catalog**](./jdbc-oceanbase-catalog.md) **Fileset catalogs:** @@ -96,14 +100,15 @@ Gravitino currently supports the following catalogs: ## Apache Gravitino playground -To experience Gravitino with other components easily, Gravitino provides a playground to run. It -integrates Apache Hadoop, Apache Hive, Trino, MySQL, PostgreSQL, and Gravitino together as a +To make it easy to try Gravitino with other components, Gravitino provides a playground. +It integrates Apache Hadoop, Apache Hive, Trino, MySQL, PostgreSQL, and Gravitino together as a complete environment. To experience all the features, see -[Getting started](./getting-started.md) and [How to use the Gravitino playground](./how-to-use-the-playground.md). +[Getting started](./getting-started/index.md) and +[How to use the Gravitino playground](./how-to-use-the-playground.md). -* [Install Gravitino playground on AWS or GCP](./getting-started.md#installing-apache-gravitino-playground-on-aws-or-google-cloud-platform): +* [Install Gravitino playground on AWS or GCP](./getting-started/playground.md): a quick guide to starting and using the Gravitino playground on AWS or GCP.
-* [Install Gravitino playground locally](./getting-started.md#installing-apache-gravitino-playground-locally): +* [Install Gravitino playground locally](./getting-started/playground.md): a quick guide to starting and using the Gravitino playground locally. * [How to use the Gravitino playground](./how-to-use-the-playground.md): provides an example of how to use Gravitino and other components together. @@ -114,18 +119,18 @@ complete environment. To experience all the features, see Gravitino supports different catalogs to manage the metadata in different sources. Please see: -* [Iceberg catalog](./lakehouse-iceberg-catalog.md): a complete guide to using Gravitino to manage Apache Iceberg data. -* [Paimon catalog](./lakehouse-paimon-catalog.md): a complete guide to using Gravitino to manage Apache Paimon data. -* [Hudi catalog](./lakehouse-hudi-catalog.md): a complete guide to using Gravitino to manage Apache Hudi data. -* [Hive catalog](./apache-hive-catalog.md): a complete guide to using Gravitino to manage Apache Hive data. -* [MySQL catalog](./jdbc-mysql-catalog.md): a complete guide to using Gravitino to manage MySQL data. -* [PostgreSQL catalog](./jdbc-postgresql-catalog.md): a complete guide to using Gravitino to manage PostgreSQL data. * [Doris catalog](./jdbc-doris-catalog.md): a complete guide to using Gravitino to manage Doris data. -* [OceanBase catalog](./jdbc-oceanbase-catalog.md): a complete guide to using Gravitino to manage OceanBase data. * [Hadoop catalog](./hadoop-catalog.md): a complete guide to using Gravitino to manage fileset using Hadoop Compatible File System (HCFS). +* [Hive catalog](./apache-hive-catalog.md): a complete guide to using Gravitino to manage Apache Hive data. +* [Hudi catalog](./lakehouse-hudi-catalog.md): a complete guide to using Gravitino to manage Apache Hudi data. +* [Iceberg catalog](./lakehouse-iceberg-catalog.md): a complete guide to using Gravitino to manage Apache Iceberg data. 
* [Kafka catalog](./kafka-catalog.md): a complete guide to using Gravitino to manage Kafka topics metadata. * [Model catalog](./model-catalog.md): a complete guide to using Gravitino to manage model metadata. +* [MySQL catalog](./jdbc-mysql-catalog.md): a complete guide to using Gravitino to manage MySQL data. +* [Paimon catalog](./lakehouse-paimon-catalog.md): a complete guide to using Gravitino to manage Apache Paimon data. +* [PostgreSQL catalog](./jdbc-postgresql-catalog.md): a complete guide to using Gravitino to manage PostgreSQL data. +* [OceanBase catalog](./jdbc-oceanbase-catalog.md): a complete guide to using Gravitino to manage OceanBase data. ### Governance @@ -136,7 +141,8 @@ Gravitino provides governance features to manage metadata in a unified way. See: ### Gravitino Iceberg REST catalog service -* [Iceberg REST catalog service](./iceberg-rest-service.md): a complete guide to using Gravitino as an Apache Iceberg REST catalog service. +* [Iceberg REST catalog service](./iceberg-rest-service.md): a guide to using Gravitino + as an Apache Iceberg REST catalog service. ### Connectors diff --git a/docs/trino-connector/installation.md b/docs/trino-connector/installation.md index 4d51fac2a2..ef2dce574a 100644 --- a/docs/trino-connector/installation.md +++ b/docs/trino-connector/installation.md @@ -88,7 +88,7 @@ catalog.management=dynamic ### Configuring the Apache Gravitino Trino connector -Assuming you have now started the Gravitino server on the host `gravitino-server-host` and already created a metalake named `test`, if those have not been prepared, please refer to the [Gravitino getting started](../getting-started.md). +This section assumes that you have started the Gravitino server on the host `gravitino-server-host` and created a metalake named `test`. If you have not done so yet, see [Gravitino getting started](../getting-started/index.md).
To configure Gravitino Trino connector correctly, you need to put the following configurations to the Trino configuration file `/etc/trino/catalog/gravitino.properties`. diff --git a/docs/webui.md b/docs/webui.md index e8d8694844..00f946f984 100644 --- a/docs/webui.md +++ b/docs/webui.md @@ -12,7 +12,7 @@ This document primarily outlines how users can manage metadata within Apache Gra Currently, you can integrate [OAuth settings](security/security.md) to view, add, modify, and delete metalakes, create catalogs, and view catalogs, schemas, and tables, among other functions. -[Build](./how-to-build.md#quick-start) and [deploy](./getting-started.md#getting-started-locally) the Gravitino Web UI and open it in a browser at `http://<gravitino-host>:<gravitino-port>`, by default is [http://localhost:8090](http://localhost:8090). +[Build](./how-to-build.md#quick-start) and [deploy](./getting-started/index.md#local-workstation) the Gravitino Web UI and open it in a browser at `http://<gravitino-host>:<gravitino-port>` (by default, [http://localhost:8090](http://localhost:8090)). ## Initial page @@ -71,7 +71,7 @@ At the top-right, there is an icon button that takes you to the login page when ### Metalake -#### [Create metalake](./getting-started.md#using-rest-to-interact-with-gravitino) +#### [Create metalake](./getting-started/index.md#interact-with-apache-gravitino-api) On the homepage, clicking on the `CREATE METALAKE` button displays a dialog to create a metalake.