This is an automated email from the ASF dual-hosted git repository.
xxyu pushed a commit to branch doc5.0
in repository https://gitbox.apache.org/repos/asf/kylin.git
The following commit(s) were added to refs/heads/doc5.0 by this push:
new 1954e12c72 KYLIN-5221 add apache hadoop installation
1954e12c72 is described below
commit 1954e12c72dc594a011b5a7d7efc471407a45998
Author: Mukvin <[email protected]>
AuthorDate: Tue Aug 23 19:17:34 2022 +0800
KYLIN-5221 add apache hadoop installation
---
.../platform/install_on_apache_hadoop.md | 52 ++++
.../docs/deployment/installation/platform/intro.md | 18 ++
website/docs/development/how_to_package.md | 24 +-
website/docs/quickstart/expert_mode_tutorial.md | 8 +-
website/docs/quickstart/images/gss_negotiate.png | Bin 0 -> 19292 bytes
.../images/installation_query_result.png | Bin 0 -> 127355 bytes
website/docs/quickstart/images/list.png | Bin 0 -> 153847 bytes
website/docs/quickstart/quick_start.md | 268 +++++++++++++++++++++
website/sidebars.js | 38 ++-
9 files changed, 399 insertions(+), 9 deletions(-)
diff --git
a/website/docs/deployment/installation/platform/install_on_apache_hadoop.md
b/website/docs/deployment/installation/platform/install_on_apache_hadoop.md
new file mode 100644
index 0000000000..2a5bdcc28e
--- /dev/null
+++ b/website/docs/deployment/installation/platform/install_on_apache_hadoop.md
@@ -0,0 +1,52 @@
+---
+title: Install on Apache Hadoop Platform
+language: en
+sidebar_label: Install on Apache Hadoop Platform
+pagination_label: Install on Apache Hadoop Platform
+toc_min_heading_level: 2
+toc_max_heading_level: 6
+pagination_prev: null
+pagination_next: null
+keywords:
+ - install
+ - hadoop
+draft: false
+last_update:
+ date: 08/12/2022
+---
+
+
+### Prepare Environment
+
+First, **make sure you allocate sufficient resources for the environment**.
Please refer to
[Prerequisites](../../../deployment/on-premises/prerequisite.md) for detailed
resource requirements for Kylin. Also ensure that `HDFS`, `YARN`,
`Hive`, `ZooKeeper` and other components are running normally without any
warnings.
+
+
+
+#### Apache Hadoop Supported Version
+
+The following Apache Hadoop versions are supported by Kylin:
+
+- Apache Hadoop 3.2.1
+
+**Note**: An Apache Hadoop 3.2.1 environment with Kerberos is not currently
supported.
+
+#### Additional configuration required for Apache Hadoop version
+
+Add the following two configurations in `$KYLIN_HOME/conf/kylin.properties`:
+
+- `kylin.env.apache-hadoop-conf-dir`: the Hadoop conf directory in the Hadoop
environment
+- `kylin.env.apache-hive-conf-dir`: the Hive conf directory in the Hadoop environment
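+
+As a minimal sketch, the two entries might look like the following. The paths
are hypothetical and assume Hadoop and Hive are installed under `/opt`; replace
them with the configuration directories of your own cluster.
+
+```properties
+# Hypothetical paths, adjust to your environment
+kylin.env.apache-hadoop-conf-dir=/opt/hadoop-3.2.1/etc/hadoop
+kylin.env.apache-hive-conf-dir=/opt/apache-hive/conf
+```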
+
+
+
+#### Jar package required by Apache Hadoop version
+
+In Apache Hadoop 3.2.1, you also need to prepare the MySQL JDBC driver in the
operating environment of Kylin.
+
+Here is a download link for the jar file of the MySQL 5.1 JDBC driver:
https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.41/mysql-connector-java-5.1.41.jar
If you need another version of the driver, you have to obtain it yourself.
Please place the JDBC driver that matches your MySQL version in the
`$KYLIN_HOME/lib/ext` directory.
+
+
+
+### Install Kylin
+
+After setting up the environment, please refer to [Quick
Start](../../../quickstart/quick_start.md) to continue.
diff --git a/website/docs/deployment/installation/platform/intro.md
b/website/docs/deployment/installation/platform/intro.md
new file mode 100644
index 0000000000..fb44cb572c
--- /dev/null
+++ b/website/docs/deployment/installation/platform/intro.md
@@ -0,0 +1,18 @@
+---
+title: Install On Platforms
+language: en
+sidebar_label: Install On Platforms
+pagination_label: Install On Platforms
+toc_min_heading_level: 2
+toc_max_heading_level: 6
+pagination_prev: null
+pagination_next: null
+keywords:
+ - install
+ - platforms
+draft: false
+last_update:
+ date: 08/12/2022
+---
+
+This chapter describes how to install Kylin on different platforms.
diff --git a/website/docs/development/how_to_package.md
b/website/docs/development/how_to_package.md
index 72d0708f45..a52c95a8c0 100644
--- a/website/docs/development/how_to_package.md
+++ b/website/docs/development/how_to_package.md
@@ -1,5 +1,17 @@
---
-sidebar_position: 1
+title: How to package
+language: en
+sidebar_label: How to package
+pagination_label: How to package
+toc_min_heading_level: 2
+toc_max_heading_level: 6
+pagination_prev: null
+pagination_next: null
+keywords:
+ - package
+draft: false
+last_update:
+ date: 08/22/2022
---
# How to package
@@ -24,6 +36,11 @@ sidebar_position: 1
| -skipFront | If add this option, front-end won't be build and
packaging |
| -skipCompile | Add this option will assume java source code no need
be compiled again |
+### Other Options for Packaging Script
+| Option | Comment |
+|-------------------- | ---------------------------------------------------|
+| -P hadoop3 | Build a Kylin 5.0 package for running on the
Hadoop 3.0+ platform. |
+
### Package Content
| Option | Comment |
@@ -46,6 +63,9 @@ For example, an unofficial package could be
`apache-kylin-5.0.0-SNAPSHOT.2022081
## Case 2: Official apache release, kylin binary for deploy on Hadoop3+ and
Hive2.3+,
# and third party cannot be distributed because of apache distribution
policy(size and license)
./build/release/release.sh -noSpark -official
+
+## Case 3: A package for running on Apache Hadoop 3 platform
+./build/release/release.sh -P hadoop3
```
### How to switch to older node.js
@@ -60,4 +80,4 @@ nvm use 12.14.0
## switch to original version
nvm use system
-```
\ No newline at end of file
+```
diff --git a/website/docs/quickstart/expert_mode_tutorial.md
b/website/docs/quickstart/expert_mode_tutorial.md
index 66905f9150..cbafa0496e 100644
--- a/website/docs/quickstart/expert_mode_tutorial.md
+++ b/website/docs/quickstart/expert_mode_tutorial.md
@@ -1,14 +1,14 @@
---
-title: Quick Start
+title: Expert Mode Tutorial
language: en
-sidebar_label: Quick Start
-pagination_label: Quick Start
+sidebar_label: Expert Mode Tutorial
+pagination_label: Expert Mode Tutorial
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
- - quick start
+ - expert mode tutorial
draft: true
last_update:
date: 08/12/2022
diff --git a/website/docs/quickstart/images/gss_negotiate.png
b/website/docs/quickstart/images/gss_negotiate.png
new file mode 100644
index 0000000000..2eca44b918
Binary files /dev/null and b/website/docs/quickstart/images/gss_negotiate.png
differ
diff --git a/website/docs/quickstart/images/installation_query_result.png
b/website/docs/quickstart/images/installation_query_result.png
new file mode 100644
index 0000000000..f2bd43f594
Binary files /dev/null and
b/website/docs/quickstart/images/installation_query_result.png differ
diff --git a/website/docs/quickstart/images/list.png
b/website/docs/quickstart/images/list.png
new file mode 100644
index 0000000000..937e7782c0
Binary files /dev/null and b/website/docs/quickstart/images/list.png differ
diff --git a/website/docs/quickstart/quick_start.md
b/website/docs/quickstart/quick_start.md
new file mode 100644
index 0000000000..69c91df583
--- /dev/null
+++ b/website/docs/quickstart/quick_start.md
@@ -0,0 +1,268 @@
+---
+title: Quick Start
+language: en
+sidebar_label: Quick Start
+pagination_label: Quick Start
+toc_min_heading_level: 2
+toc_max_heading_level: 6
+pagination_prev: null
+pagination_next: null
+keywords:
+ - quick start
+draft: true
+last_update:
+ date: 08/12/2022
+---
+
+In this guide, we will explain how to quickly install and start Kylin 5.
+
+Before proceeding, please make sure the
[Prerequisite](../deployment/on-premises/prerequisite.md) is met.
+
+
+### <span id="install">Download and Install</span>
+
+1. Get Kylin installation package.
+
+ Please refer to [How To Package](../development/how_to_package.md).
+
+2. Decide the installation location and the Linux account to run Kylin. All
the examples below are based on the following assumptions:
+
+ - The installation location is `/usr/local/`
+ - Linux account to run Kylin is `KyAdmin`. It is called the **Linux
account** hereafter.
+ - **For all commands in the rest of the document**, please replace the
above parameters with your real installation location and Linux account.
+
+3. Copy and uncompress Kylin software package to your server or virtual
machine.
+
+ ```shell
+ cd /usr/local
+ tar -zxvf Kylin5.0-Beta-[Version].tar.gz
+ ```
+ The decompressed directory is referred to as **$KYLIN_HOME** or **root
directory**.
+
+4. Prepare RDBMS metastore.
+
+ If PostgreSQL or MySQL has been installed already in your environment, you
can choose one of them as the metastore.
+
+ **Note**:
+
+ + For the production environment, we recommend setting up a dedicated
metastore. You can use PostgreSQL which is shipped with Kylin 5.x.
+ + The database name of metastore **must start with an English character**.
+
+ Please refer to the links below for the complete installation and
configuration steps:
+
+ * [Use PostgreSQL as
Metastore](../deployment/on-premises/rdbms_metastore/postgresql/default_metastore.md).
+ * [Use MySQL as
Metastore](../deployment/on-premises/rdbms_metastore/mysql/mysql_metastore.md).
+
+5. (optional) Install InfluxDB.
+
+ Kylin uses InfluxDB to save various system monitoring information. If you
do not need to view related information, you can skip this step. It is strongly
recommended to complete this step in a production environment and use related
monitoring functions.
+
+ ```sh
+ cd $KYLIN_HOME/influxdb
+
+ # install influxdb
+ rpm -ivh influxdb-1.6.5.x86_64.rpm
+ ```
+
+ For more details, please refer to [Use InfluxDB as Time-Series
Database](../operations/monitoring/influxdb/influxdb.md).
+
+6. Create a working directory on HDFS and grant permissions.
+
+ The default working directory is `/kylin`. Also ensure the Linux account
has access to its home directory on HDFS. Meanwhile, create directory
`/kylin/spark-history` to store the spark log files.
+
+ ```sh
+ hadoop fs -mkdir -p /kylin
+ hadoop fs -chown root /kylin
+ hadoop fs -mkdir -p /kylin/spark-history
+ hadoop fs -chown root /kylin/spark-history
+ ```
+
+ If necessary, you can modify the path of the Kylin working directory in
`$KYLIN_HOME/conf/kylin.properties`.
+
+ **Note**: If you do not have the permission to create
`/kylin/spark-history`, you can configure
`kylin.engine.spark-conf.spark.eventLog.dir` and
`kylin.engine.spark-conf.spark.history.fs.logDirectory` with an available
directory.
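+
+ For instance, a sketch of these two settings, assuming a hypothetical
writable HDFS directory `/tmp/kylin/spark-history` (not a path taken from the
Kylin documentation):
+
+ ```properties
+ # Hypothetical directory; use any HDFS path the Linux account can write to
+ kylin.engine.spark-conf.spark.eventLog.dir=hdfs:///tmp/kylin/spark-history
+ kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs:///tmp/kylin/spark-history
+ ```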
+
+### <span id="configuration">Quick Configuration</span>
+
+In the `conf` directory under the root directory of the installation package,
you should configure the parameters in the file `kylin.properties` as follows:
+
+1. Configure the following metadata parameter according to your PostgreSQL
setup. Replace `{metadata_name}`, `{host}`, `{port}`, `{user}`, and
`{password}` with your actual values. The maximum allowed length of
`metadata_name` is 28.
+
+ ```properties
+
kylin.metadata.url={metadata_name}@jdbc,driverClassName=org.postgresql.Driver,url=jdbc:postgresql://{host}:{port}/kylin,username={user},password={password}
+ ```
+ For more PostgreSQL configuration, please refer to [Use PostgreSQL as
Metastore](../deployment/on-premises/rdbms_metastore/postgresql/default_metastore.md).
For MySQL configuration, please refer to [Use MySQL as
Metastore](../deployment/on-premises/rdbms_metastore/mysql/mysql_metastore.md).
+
+ > **Note**: Use only letters, numbers, or underscores in
`{metadata_name}`. The name cannot start with a number. For example, `1a` is
invalid and `a1` is valid.
+
+2. When executing jobs, Kylin submits the build task to YARN. Replace
`{queue_name}` in the following parameter with the queue you actually use, so
that build tasks are submitted to the specified queue.
+
+ ```properties
+ kylin.engine.spark-conf.spark.yarn.queue={queue_name}
+ ```
+
+
+3. Configure the ZooKeeper service.
+
+ Kylin uses ZooKeeper for service discovery. In a cluster deployment, this
ensures that when an instance starts, stops, or unexpectedly loses its
connection, the other instances in the cluster automatically discover the
change and update their status. For more details, please refer to [Service
Discovery](../deployment/on-premises/deploy_mode/service_discovery.md).
+
+ Please add ZooKeeper's connection configuration
`kylin.env.zookeeper-connect-string=host:port`. You can modify the cluster
address and port according to the following example.
+
+ ```properties
+ kylin.env.zookeeper-connect-string=10.1.2.1:2181,10.1.2.2:2181,10.1.2.3:2181
+ ```
+
+4. (optional) Configure Spark Client node information
+ Since Spark is started in yarn-client mode, if the IP information of Kylin
is not configured in the hosts file of the Hadoop cluster, please add the
following configurations in `kylin.properties`:
+ `kylin.storage.columnar.spark-conf.spark.driver.host={hostIp}`
+ `kylin.engine.spark-conf.spark.driver.host={hostIp}`
+
+ You can modify the {hostIp} according to the following example:
+ ```properties
+ kylin.storage.columnar.spark-conf.spark.driver.host=10.1.3.71
+ kylin.engine.spark-conf.spark.driver.host=10.1.3.71
+ ```
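+
+For steps 1 and 2 above, a filled-in fragment might look like the sketch
below. Every value is hypothetical: the metadata name `kylin_metadata`, the
host `10.1.2.10`, the user and password, and the queue name are placeholders
chosen for illustration. Port `5432` is the PostgreSQL default.
+
+```properties
+# Hypothetical values for illustration only
+kylin.metadata.url=kylin_metadata@jdbc,driverClassName=org.postgresql.Driver,url=jdbc:postgresql://10.1.2.10:5432/kylin,username=kylin_user,password=your_password
+kylin.engine.spark-conf.spark.yarn.queue=default
+```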
+
+
+
+
+### <span id="start">Start Kylin</span>
+
+1. Check the version of `curl`.
+
+ Since `check-env.sh` needs to rely on the support of GSS-Negotiate during
the installation process, it is recommended that you check the relevant
components of your curl first. You can use the following commands in your
environment:
+
+ ```shell
+ curl --version
+ ```
+ If `GSS-Negotiate` appears in the output, your curl build can be used.
If not, you can reinstall curl or add GSS-Negotiate support.
+ 
+
+2. Start Kylin with the startup script.
+ Run the following command to start Kylin. When it is first started, the
system will run a series of scripts to check whether the system environment has
met the requirements. For details, please refer to the [Environment Dependency
Check](../operations/system-operation/cli_tool/environment_dependency_check.md)
chapter.
+
+ ```shell
+ ${KYLIN_HOME}/bin/kylin.sh start
+ ```
+ > **Note**: If you want to observe the detailed startup progress, run:
+ >
+ > ```shell
+ > tail -f $KYLIN_HOME/logs/kylin.log
+ > ```
+
+
+Once the startup is completed, a message is printed to the console.
Run the command below to check the Kylin process at any time.
+
+ ```shell
+ ps -ef | grep kylin
+ ```
+
+3. Get login information.
+
+ After the startup script has finished, the random password of the default
user `ADMIN` will be displayed on the console. You are highly recommended to
save this password. If this password is accidentally lost, please refer to
[ADMIN User Reset Password](../operations/access-control/user_management.md).
+
+### <span id="use">How to Use</span>
+
+After Kylin is started, open web GUI at `http://{host}:7070/kylin`. Please
replace `host` with your host name, IP address, or domain name. The default
port is `7070`.
+
+The default user name is `ADMIN`. The random password generated by default
will be displayed on the console when Kylin is started for the first time.
After the first login, please reset the administrator password according to the
password rules.
+
+- At least 8 characters.
+- Contains at least one number, one letter, and one special character
```(~!@#$%^&*(){}|:"<>?[];',./`)```.
+
+Kylin uses the open source **SSB** (Star Schema Benchmark) dataset for star
schema OLAP scenarios as a test dataset. You can verify whether the
installation is successful by running a script to import the SSB dataset into
Hive. The SSB dataset is from multiple CSV files.
+
+**Import Sample Data**
+
+Run the following command to import the sample data:
+
+```shell
+$KYLIN_HOME/bin/sample.sh
+```
+
+The script creates one database, **SSB**, with 6 Hive tables, and then
imports data into them.
+
+After running successfully, you should be able to see the following
information in the console:
+
+```shell
+Sample hive tables are created successfully
+```
+
+
+We will be using the SSB dataset as the data sample to introduce Kylin in
several sections of this product manual. The SSB dataset simulates transaction
data for an online store. For more details, see [Sample
Dataset](sample_dataset.md). Below is a brief introduction.
+
+
+| Table | Description | Introduction |
+| ----------- | ------------------------------------- | ------------------------------------------------------------ |
+| CUSTOMER | customer information | includes customer name, address, contact information, etc. |
+| DATES | order date | includes an order's specific date, week, month, year, etc. |
+| LINEORDER | order information | includes some basic information like order date, order amount, order revenue, supplier ID, commodity ID, customer ID, etc. |
+| PART | product information | includes some basic information like product name, category, brand, etc. |
+| P_LINEORDER | view based on order information table | includes all content in the order information table and new content in the view |
+| SUPPLIER | supplier information | includes supplier name, address, contact information, etc. |
+
+
+**Validate Product Functions**
+
+You can create a sample project and model according to [Expert Mode
Tutorial](expert_mode_tutorial.md). The project should validate basic features
such as source table loading, model creation, index build etc.
+
+On the **Data Asset -> Model** page, you should see an example model with
storage greater than 0.00 KB, which indicates that data has been loaded for
this model.
+
+
+
+On the **Monitor** page, you can see all jobs have been completed successfully
in **Batch Job** and **Streaming Job** pages.
+
+
+
+**Validate Query Analysis**
+
+Once the metadata is loaded successfully, the 6 sample Hive tables are shown
in the left panel of the **Insight** page. You can run query statements
against these tables. For example, the following SQL statement totals the
revenue of each part for orders within a date range and sorts the result by
total revenue in descending order:
+
+```sql
+SELECT LO_PARTKEY, SUM(LO_REVENUE) AS TOTAL_REVENUE
+FROM SSB.P_LINEORDER
+WHERE LO_ORDERDATE between '19930601' AND '19940601'
+group by LO_PARTKEY
+order by SUM(LO_REVENUE) DESC
+```
+
+
+The query result will be displayed at the **Insight** page, showing that the
query hit the sample model.
+
+
+
+You can also use the same SQL statement to query on Hive to verify the result
and performance.
+
+
+
+### <span id="stop">Stop Kylin</span>
+
+Run the following command to stop Kylin:
+
+```shell
+$KYLIN_HOME/bin/kylin.sh stop
+```
+
+You can run the following command to check if the Kylin process has stopped.
+
+```shell
+ps -ef | grep kylin
+```
+
+### <span id="faq">FAQ</span>
+
+**Q: How do I change the service default port?**
+
+You can modify the following configuration in
`$KYLIN_HOME/conf/kylin.properties`. Here is an example that sets the server
port to 7070.
+
+```properties
+server.port=7070
+```
+
+**Q: Does Kylin support Kerberos integration?**
+
+Yes, if your cluster enables Kerberos authentication protocol, the Spark
embedded in Kylin needs proper configuration to access your cluster resource
securely. For more information, please refer to [Integrate with
Kerberos](#TODO)(Details doc will come soon).
+
+**Q: Is the query pushdown engine turned on by default?**
+
+Yes. If you want to turn it off, please refer to [Pushdown to
SparkSQL](../query/pushdown/pushdown_to_embedded_spark.md).
+
diff --git a/website/sidebars.js b/website/sidebars.js
index fd175df36d..7307aa5b33 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -35,6 +35,10 @@ const sidebars = {
id: 'quickstart/intro',
},
items: [
+ {
+ type: 'doc',
+ id: 'quickstart/quick_start',
+ },
{
type: 'doc',
id: 'quickstart/expert_mode_tutorial',
@@ -214,9 +218,37 @@ const sidebars = {
],
},
{
- type: 'doc',
- id: 'deployment/installation/uninstallation'
- }
+ type: 'category',
+ label: 'Install and Uninstall',
+ link: {
+ type: 'doc',
+ id: 'deployment/installation/intro',
+ },
+ items: [
+ {
+ type: 'category',
+ label: 'Install On Platforms',
+ link: {
+ type: 'doc',
+ id: 'deployment/installation/platform/intro',
+ },
+ items: [
+ {
+ type: 'doc',
+ id:
'deployment/installation/platform/install_on_apache_hadoop',
+ },
+ ],
+ },
+ {
+ type: 'doc',
+ id: 'deployment/installation/uninstallation',
+ },
+ {
+ type: 'doc',
+ id: 'deployment/installation/install_validation',
+ },
+ ],
+ },
],
},
{