This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new 22a20d0c7b [install] add jdk 8 and jdk 17 guidance. fix some missing doc (#855)
22a20d0c7b is described below

commit 22a20d0c7bd02802f676d80ac2a5aa3a0690a0b0
Author: Mingyu Chen <morning...@163.com>
AuthorDate: Sat Jul 13 10:10:56 2024 +0800

    [install] add jdk 8 and jdk 17 guidance. fix some missing doc (#855)
---
 .../install/cluster-deployment/standard-deployment.md | 8 ++++++++
 .../install/source-install/compilation-with-docker.md | 2 +-
 .../source-install/compilation-with-ldb-toolchain.md | 3 +++
 docs/lakehouse/datalake-analytics/hudi.md | 19 +++++++++++++++++++
 docs/lakehouse/lakehouse-overview.md | 16 ++--------------
 .../install/cluster-deployment/standard-deployment.md | 8 ++++++++
 .../install/source-install/compilation-with-docker.md | 2 +-
 .../source-install/compilation-with-ldb-toolchain.md | 2 ++
 .../current/lakehouse/lakehouse-overview.md | 18 ++----------------
 .../install/cluster-deployment/standard-deployment.md | 8 ++++++++
 .../install/source-install/compilation-with-docker.md | 9 +--------
 .../source-install/compilation-with-ldb-toolchain.md | 4 +++-
 .../version-2.0/lakehouse/lakehouse-overview.md | 16 ++--------------
 .../install/cluster-deployment/standard-deployment.md | 8 ++++++++
 .../install/source-install/compilation-with-docker.md | 4 ++--
 .../source-install/compilation-with-ldb-toolchain.md | 4 +++-
 .../version-2.1/lakehouse/lakehouse-overview.md | 16 ++--------------
 .../install/cluster-deployment/standard-deployment.md | 8 ++++++++
 .../install/source-install/compilation-with-docker.md | 13 -------------
 .../source-install/compilation-with-ldb-toolchain.md | 3 +++
 .../version-2.0/lakehouse/lakehouse-overview.md | 11 ++---------
 .../install/cluster-deployment/standard-deployment.md | 8 ++++++++
 .../install/source-install/compilation-with-docker.md | 12 +++++-------
 .../source-install/compilation-with-ldb-toolchain.md | 3 +++
 .../version-2.1/lakehouse/datalake-analytics/hudi.md | 19 +++++++++++++++++++
 .../version-2.1/lakehouse/lakehouse-overview.md | 11 ++---------
 26 files changed, 125 insertions(+), 110 deletions(-)

diff --git a/docs/install/cluster-deployment/standard-deployment.md b/docs/install/cluster-deployment/standard-deployment.md
index aa10c1f064..92a4490a3c 100644
--- a/docs/install/cluster-deployment/standard-deployment.md
+++ b/docs/install/cluster-deployment/standard-deployment.md
@@ -116,6 +116,14 @@ In a Doris cluster, FE is mainly responsible for metadata storage, including met
 | BE | Doris uses LZ4 compression by default, with a compression ratio of 0.3~0.5.Disk space needs to be calculated based on the total data volume * 3 (3 data replicas)There is a need to reserve 40% disk space for background data compaction and temporary data storage. |
 | Broker | If you want to deploy a Broker, you can usually deploy the Broker node on the same machine as the FE /BE nodes. |

+### Java version
+
+All Doris processes depend on Java.
+
+Before version 2.1 (inclusive), please use Java 8, recommended version: `openjdk-8u352-b08-linux-x64`.
+
+After version 3.0 (inclusive), please use Java 17, recommended version: `jdk-17.0.10_linux-x64_bin.tar.gz`.
+
 ## 2. Check operating system

 ### Disable swap partition
diff --git a/docs/install/source-install/compilation-with-docker.md b/docs/install/source-install/compilation-with-docker.md
index 907c50ce72..1dc69fb907 100644
--- a/docs/install/source-install/compilation-with-docker.md
+++ b/docs/install/source-install/compilation-with-docker.md
@@ -67,7 +67,7 @@ apache/doris build-env-for-2.0 f29cf1979dba 3 days ago 3.3GB
 - `apache/doris:build-env-ldb-toolchain-latest` is used for compiling the latest master code and is updated along with the master. You can check the update time in the `docker/README.md` file.
 - Images with "no-avx2" in their names contain third-party libraries that can run on CPUs that do not support AVX2 instructions.
Using these images, you can compile Doris with the "USE_AVX2=0". - For information about changes in the compilation image, please see [ChangeLog](https://github.com/apache/doris/blob/master/thirdparty/CHANGELOG.md). -- The Docker compilation image includes both JDK 8 and JDK 17. You can check the default JDK version by running `java -version`, and switch between versions using the following commands (JDK 8 as the default version is recommended). +- The Docker compilation image includes both JDK 8 and JDK 17. You can check the default JDK version by running `java -version`, and switch between versions using the following commands. For versions earlier than 2.1 (inclusive), please use JDK 8. For versions later than 3.0 (inclusive) or the master branch, please use JDK 17. ```Bash # Switch to JDK 8 diff --git a/docs/install/source-install/compilation-with-ldb-toolchain.md b/docs/install/source-install/compilation-with-ldb-toolchain.md index 64ef261156..da2e3234f8 100644 --- a/docs/install/source-install/compilation-with-ldb-toolchain.md +++ b/docs/install/source-install/compilation-with-ldb-toolchain.md @@ -66,6 +66,9 @@ sh ldb_toolchain_gen.sh /path/to/ldb_toolchain/ 3. **Download and install other compilation components** - Download [Java8](https://doris-thirdparty-1308700295.cos.ap-beijing.myqcloud.com/tools/jdk-8u391-linux-x64.tar.gz) and install it to /path/to/java. + + > For versions later than 3.0 (inclusive), or the master branch, please use [Java 17](https://download.oracle.com/java/17/archive/jdk-17.0.10_linux-x64_bin.tar.gz). + - Download [Apache Maven 3.6.3](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/apache-maven-3.6.3-bin.tar.gz) and install it to /path/to/maven. - Download [Node v12.13.0](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/node-v12.13.0-linux-x64.tar.gz) and install it to /path/to/node. - Different Linux distributions may include different default components. Therefore, you may need to install some additional components. 
The following takes CentOS 6 as an example. Similar steps may apply to other distributions: diff --git a/docs/lakehouse/datalake-analytics/hudi.md b/docs/lakehouse/datalake-analytics/hudi.md index c986710d49..985f1aa363 100644 --- a/docs/lakehouse/datalake-analytics/hudi.md +++ b/docs/lakehouse/datalake-analytics/hudi.md @@ -105,3 +105,22 @@ You can use the `FOR TIME AS OF` statement, based on the time of the snapshot to `SELECT * FROM hudi_tbl FOR TIME AS OF "20221007172037";` Hudi table does not support the `FOR VERSION AS OF` statement. Using this syntax to query the Hudi table will throw an error. + +## Incremental Read + +Incremental Read can query the data changed between startTime and endTime, and the returned result set is the final state of the data at endTime. + +Doris provides `@incr` syntax support Incremental Read: +``` +SELECT * from hudi_table@incr('beginTime'='xxx', ['endTime'='xxx'], ['hoodie.read.timeline.holes.resolution.policy'='FAIL'], ...); +``` +`beginTime` is required, and the time format is consistent with the hudi official website [hudi_table_changes](https://hudi.apache.org/docs/0.14.0/quick-start-guide/#incremental-query), and supports "earliest". `endTime` is optional, and the default is the latest commitTime. Compatible with [Spark Read Options](https://hudi.apache.org/docs/0.14.0/configurations#Read-Options). + +To support Incremental Read, you need to enable the [new optimizer](../../query/nereids/nereids), which is enabled by default. 
By viewing the execution plan through `desc`, we can find that Doris converts `@incr` into `predicates` and pushes it down to `VHUDI_SCAN_NODE`: + +``` +| 0:VHUDI_SCAN_NODE(113) | +| table: lineitem_mor | +| predicates: (_hoodie_commit_time[#0] >= '20240311151019723'), (_hoodie_commit_time[#0] <= '20240311151606605') | +| inputSplitNum=1, totalFileSize=13099711, scanRanges=1 | +``` diff --git a/docs/lakehouse/lakehouse-overview.md b/docs/lakehouse/lakehouse-overview.md index f4c2bea70e..4d89e3424c 100644 --- a/docs/lakehouse/lakehouse-overview.md +++ b/docs/lakehouse/lakehouse-overview.md @@ -145,19 +145,7 @@ Multi-Catalog is designed to facilitate connection to external data catalogs and In older versions of Doris, user data is in a two-tiered structure: database and table. Thus, connections to external catalogs could only be done at the database or table level. For example, users could create a mapping to a table in an external catalog via `create external table`, or to a database via `create external database`. If there are large amounts of databases or tables in the external catalog, users will need to create mappings to them one by one, which could be tedious. -With Multi-Catalog, Doris now has a new three-tiered metadata hierarchy (catalog -> database -> table), which means users can connect to external data at the catalog level directly. Currently it supports external catalogs including: - -- Apache Hive - -- Apache Iceberg - -- Apache Hudi - -- Elasticsearch - -- JDBC - -- Apache Paimon +With Multi-Catalog, Doris now has a new three-tiered metadata hierarchy (catalog -> database -> table), which means users can connect to external data at the catalog level directly. Multi-Catalog works as an additional and enhanced external table connection method. It helps users conduct multi-catalog federated queries quickly. 
@@ -388,7 +376,7 @@ Along with the new Multi-Catalog feature, we also added privilege management at Users can also specify a custom authentication class through `access_controller.class`. For example, if you specify it as -`"access_controller.class"="org.apache.doris.catalog.authorizer.ranger.hive.RangerHiveAccessControllerFactory"`, then you can use Apache Range to perform authentication management on Hive Catalog. For more information see: [Hive](../lakehouse/datalake-analytics/hive) +`"access_controller.class"="org.apache.doris.catalog.authorizer.ranger.hive.RangerHiveAccessControllerFactory"`, then you can use Apache Range to perform authentication management on Hive Catalog. For more information see: [Hive Catalog](../datalake-analytics/hive) ### Database synchronization management diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/cluster-deployment/standard-deployment.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/cluster-deployment/standard-deployment.md index 03856ae12d..10cd7ecb71 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/cluster-deployment/standard-deployment.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/cluster-deployment/standard-deployment.md @@ -128,6 +128,14 @@ Doris 支持运行和部署在 x86-64 架构的服务器平台或 ARM64 架构 | BE | Doris 默认 LZ4 压缩方式进行存储,压缩比在 0.3 - 0.5 左右磁盘空间需要按照总数据量 * 3(3 副本)计算需要预留出 40% 空间用作后台 compaction 以及临时数据的存储 | | Broker | 如需部署 Broker,通常情况下可以将 Broker 节点与 FE / BE 节点部署在同一台机器上 | +### Java 版本 + +Doris 的所有进程都依赖 Java。 + +在 2.1(含)版本之前,请使用 Java 8,推荐版本:`openjdk-8u352-b08-linux-x64`。 + +从 3.0(含)版本之后,请使用 Java 17,推荐版本:`jdk-17.0.10_linux-x64_bin.tar.gz`。 + ## 2 操作系统检查 ### 关闭 swap 分区 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/source-install/compilation-with-docker.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/source-install/compilation-with-docker.md index 8d9b069d33..6e575d14d2 100644 --- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/source-install/compilation-with-docker.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/source-install/compilation-with-docker.md @@ -73,7 +73,7 @@ apache/doris build-env-for-2.0 f29cf1979dba 3 days ago 3.3GB - 编译镜像变更信息可参考 [ChangeLog](https://github.com/apache/doris/blob/master/thirdparty/CHANGELOG.md)。 -- 最新版本的 `apache/doris:build-env-ldb-toolchain-latest` 镜像中同时包含 JDK 8 和 JDK 17。 +- 最新版本的 `apache/doris:build-env-ldb-toolchain-latest` 镜像中同时包含 JDK 8 和 JDK 17。2.1(含)之前的版本,请使用 JDK 8。3.0(含)之后的版本或 master 分支,请使用 JDK 17。 ```Bash # 切换到 JDK 8 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/source-install/compilation-with-ldb-toolchain.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/source-install/compilation-with-ldb-toolchain.md index 9c71754bef..9308550295 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/source-install/compilation-with-ldb-toolchain.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/install/source-install/compilation-with-ldb-toolchain.md @@ -65,6 +65,8 @@ sh ldb_toolchain_gen.sh /path/to/ldb_toolchain/ - 下载 [Java8](https://doris-thirdparty-1308700295.cos.ap-beijing.myqcloud.com/tools/jdk-8u391-linux-x64.tar.gz),安装到 /path/to/java + > 3.0(含)之后的版本,或 master 分支,请使用 [Java 17](https://download.oracle.com/java/17/archive/jdk-17.0.10_linux-x64_bin.tar.gz)。 + - 下载 [Apache Maven 3.6.3](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/apache-maven-3.6.3-bin.tar.gz),安装到 /path/to/maven - 下载 [Node v12.13.0](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/node-v12.13.0-linux-x64.tar.gz),安装到 /path/to/node diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/lakehouse-overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/lakehouse-overview.md index 627e096667..0b6e625008 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/lakehouse-overview.md +++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/lakehouse/lakehouse-overview.md @@ -148,21 +148,7 @@ Doris 通过收集统计信息有助于优化器了解数据分布特性,在 在之前的 Doris 版本中,用户数据只有两个层级:Database 和 Table。当我们需要连接一个外部数据目录时,我们只能在 Database 或 Table 层级进行对接。比如通过 `create external table` 的方式创建一个外部数据目录中的表的映射,或通过 `create external database` 的方式映射一个外部数据目录中的 Database。如果外部数据目录中的 Database 或 Table 非常多,则需要用户手动进行一一映射,使用体验不佳。 -而新的 Multi-Catalog 功能在原有的元数据层级上,新增一层 Catalog,构成 Catalog -> Database -> Table 的三层元数据层级。其中,Catalog 可以直接对应到外部数据目录。目前支持的外部数据目录包括: - -- Apache Hive - -- Apache Iceberg - -- Apache Hudi - -- Elasticsearch - -- JDBC: 对接数据库访问的标准接口 (JDBC) 来访问各式数据库的数据。 - -- Apache Paimon - -- LakeSoul +而新的 Multi-Catalog 功能在原有的元数据层级上,新增一层 Catalog,构成 Catalog -> Database -> Table 的三层元数据层级。 该功能将作为之前外表连接方式(External Table)的补充和增强,帮助用户进行快速的多数据目录联邦查询。 @@ -199,7 +185,7 @@ Doris 通过收集统计信息有助于优化器了解数据分布特性,在 这里我们通过连接一个 Hive 集群说明如何使用 Catalog 功能。 -更多关于 Hive 的说明,请参阅:[Hive Catalog](../lakehouse/datalake-analytics/hive) +更多关于 Hive 的说明,请参阅:[Hive Catalog](./datalake-analytics/hive) **1. 
创建 Catalog** diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/cluster-deployment/standard-deployment.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/cluster-deployment/standard-deployment.md index 657168cc9b..3e014a0d8e 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/cluster-deployment/standard-deployment.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/cluster-deployment/standard-deployment.md @@ -124,6 +124,14 @@ Doris 支持运行和部署在 x86-64 架构的服务器平台或 ARM64 架构 | BE | Doris 默认 LZ4 压缩方式进行存储,压缩比在 0.3 - 0.5 左右磁盘空间需要按照总数据量 * 3(3 副本)计算需要预留出 40% 空间用作后台 compaction 以及临时数据的存储 | | Broker | 如需部署 Broker,通常情况下可以将 Broker 节点与 FE / BE 节点部署在同一台机器上 | +### Java 版本 + +Doris 的所有进程都依赖 Java。 + +在 2.1(含)版本之前,请使用 Java 8,推荐版本:`openjdk-8u352-b08-linux-x64`。 + +从 3.0(含)版本之后,请使用 Java 17,推荐版本:`jdk-17.0.10_linux-x64_bin.tar.gz`。 + ## 2 操作系统检查 ### 关闭 swap 分区 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/source-install/compilation-with-docker.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/source-install/compilation-with-docker.md index 495a215ba2..9143790497 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/source-install/compilation-with-docker.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/source-install/compilation-with-docker.md @@ -69,18 +69,11 @@ apache/doris build-env-for-2.0 f29cf1979dba 3 days ago 3.3GB - 编译镜像变更信息可参考 [ChangeLog](https://github.com/apache/doris/blob/master/thirdparty/CHANGELOG.md)。 -- Docker 编译镜像同时包含了 OpenJDK 8 和 OpenJDK 11,请通过 `java -version` 确认默认 JDK 版本。也可以通过以下方式切换版本(建议默认使用 JDK8) - ```Bash # 切换到 JDK 8 alternatives --set java java-1.8.0-openjdk.x86_64 alternatives --set javac java-1.8.0-openjdk.x86_64 export JAVA_HOME=/usr/lib/jvm/java-1.8.0 - -# 切换到 JDK 11 -alternatives --set java java-11-openjdk.x86_64 -alternatives --set javac java-11-openjdk.x86_64 -export 
JAVA_HOME=/usr/lib/jvm/java-11 ``` ## 编译 Doris @@ -147,4 +140,4 @@ $ cat /proc/cpuinfo | grep avx2 ## 自行编译开发环境镜像 -可以自己创建一个 Doris 开发环境镜像,具体可参阅 `docker/README.md` 文件。 \ No newline at end of file +可以自己创建一个 Doris 开发环境镜像,具体可参阅 `docker/README.md` 文件。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md index fa5044f1f4..186c06f34c 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md @@ -65,6 +65,8 @@ sh ldb_toolchain_gen.sh /path/to/ldb_toolchain/ - 下载 [Java8](https://doris-thirdparty-1308700295.cos.ap-beijing.myqcloud.com/tools/jdk-8u391-linux-x64.tar.gz),安装到 /path/to/java + > 3.0(含)之后的版本,或 master 分支,请使用 [Java 17](https://download.oracle.com/java/17/archive/jdk-17.0.10_linux-x64_bin.tar.gz)。 + - 下载 [Apache Maven 3.6.3](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/apache-maven-3.6.3-bin.tar.gz),安装到 /path/to/maven - 下载 [Node v12.13.0](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/node-v12.13.0-linux-x64.tar.gz),安装到 /path/to/node @@ -145,4 +147,4 @@ https://github.com/apache/doris-thirdparty/releases 这里我们提供了 Linux 和 MacOS 的预编译三方库。如果和你的编译运行环境一致,可以直接下载使用。 -下载好后,解压会得到一个 `installed/` 目录,将这个目录拷贝到 `thirdparty/` 目录下,之后运行 `build.sh` 即可。 \ No newline at end of file +下载好后,解压会得到一个 `installed/` 目录,将这个目录拷贝到 `thirdparty/` 目录下,之后运行 `build.sh` 即可。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/lakehouse/lakehouse-overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/lakehouse/lakehouse-overview.md index ee6202daae..d1b697ef64 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/lakehouse/lakehouse-overview.md +++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/lakehouse/lakehouse-overview.md @@ -147,19 +147,7 @@ Doris 通过收集统计信息有助于优化器了解数据分布特性,在 在之前的 Doris 版本中,用户数据只有两个层级:Database 和 Table。当我们需要连接一个外部数据目录时,我们只能在 Database 或 Table 层级进行对接。比如通过 `create external table` 的方式创建一个外部数据目录中的表的映射,或通过 `create external database` 的方式映射一个外部数据目录中的 Database。如果外部数据目录中的 Database 或 Table 非常多,则需要用户手动进行一一映射,使用体验不佳。 -而新的 Multi-Catalog 功能在原有的元数据层级上,新增一层 Catalog,构成 Catalog -> Database -> Table 的三层元数据层级。其中,Catalog 可以直接对应到外部数据目录。目前支持的外部数据目录包括: - -- Apache Hive - -- Apache Iceberg - -- Apache Hudi - -- Elasticsearch - -- JDBC: 对接数据库访问的标准接口 (JDBC) 来访问各式数据库的数据。 - -- Apache Paimon(Incubating) +而新的 Multi-Catalog 功能在原有的元数据层级上,新增一层 Catalog,构成 Catalog -> Database -> Table 的三层元数据层级。 该功能将作为之前外表连接方式(External Table)的补充和增强,帮助用户进行快速的多数据目录联邦查询。 @@ -196,7 +184,7 @@ Doris 通过收集统计信息有助于优化器了解数据分布特性,在 这里我们通过连接一个 Hive 集群说明如何使用 Catalog 功能。 -更多关于 Hive 的说明,请参阅:[Hive Catalog](../lakehouse/datalake-analytics/hive) +更多关于 Hive 的说明,请参阅:[Hive Catalog](./datalake-analytics/hive) **1. 
创建 Catalog** diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/cluster-deployment/standard-deployment.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/cluster-deployment/standard-deployment.md index 657168cc9b..3e014a0d8e 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/cluster-deployment/standard-deployment.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/cluster-deployment/standard-deployment.md @@ -124,6 +124,14 @@ Doris 支持运行和部署在 x86-64 架构的服务器平台或 ARM64 架构 | BE | Doris 默认 LZ4 压缩方式进行存储,压缩比在 0.3 - 0.5 左右磁盘空间需要按照总数据量 * 3(3 副本)计算需要预留出 40% 空间用作后台 compaction 以及临时数据的存储 | | Broker | 如需部署 Broker,通常情况下可以将 Broker 节点与 FE / BE 节点部署在同一台机器上 | +### Java 版本 + +Doris 的所有进程都依赖 Java。 + +在 2.1(含)版本之前,请使用 Java 8,推荐版本:`openjdk-8u352-b08-linux-x64`。 + +从 3.0(含)版本之后,请使用 Java 17,推荐版本:`jdk-17.0.10_linux-x64_bin.tar.gz`。 + ## 2 操作系统检查 ### 关闭 swap 分区 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/source-install/compilation-with-docker.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/source-install/compilation-with-docker.md index 53d4208ca3..870314cb15 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/source-install/compilation-with-docker.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/source-install/compilation-with-docker.md @@ -69,7 +69,7 @@ apache/doris build-env-for-2.0 f29cf1979dba 3 days ago 3.3GB - 编译镜像变更信息可参考 [ChangeLog](https://github.com/apache/doris/blob/master/thirdparty/CHANGELOG.md)。 -- 最新版本的 `apache/doris:build-env-ldb-toolchain-latest` 镜像中同时包含 JDK 8 和 JDK 17。 +- 最新版本的 `apache/doris:build-env-ldb-toolchain-latest` 镜像中同时包含 JDK 8 和 JDK 17。2.1(含)之前的版本,请使用 JDK 8。3.0(含)之后的版本或 master 分支,请使用 JDK 17。 ```Bash # 切换到 JDK 8 @@ -145,4 +145,4 @@ $ cat /proc/cpuinfo | grep avx2 ## 自行编译开发环境镜像 -可以自己创建一个 Doris 开发环境镜像,具体可参阅 `docker/README.md` 文件。 \ No newline at end of file +可以自己创建一个 Doris 开发环境镜像,具体可参阅 
`docker/README.md` 文件。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/source-install/compilation-with-ldb-toolchain.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/source-install/compilation-with-ldb-toolchain.md index fa5044f1f4..186c06f34c 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/source-install/compilation-with-ldb-toolchain.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/install/source-install/compilation-with-ldb-toolchain.md @@ -65,6 +65,8 @@ sh ldb_toolchain_gen.sh /path/to/ldb_toolchain/ - 下载 [Java8](https://doris-thirdparty-1308700295.cos.ap-beijing.myqcloud.com/tools/jdk-8u391-linux-x64.tar.gz),安装到 /path/to/java + > 3.0(含)之后的版本,或 master 分支,请使用 [Java 17](https://download.oracle.com/java/17/archive/jdk-17.0.10_linux-x64_bin.tar.gz)。 + - 下载 [Apache Maven 3.6.3](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/apache-maven-3.6.3-bin.tar.gz),安装到 /path/to/maven - 下载 [Node v12.13.0](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/node-v12.13.0-linux-x64.tar.gz),安装到 /path/to/node @@ -145,4 +147,4 @@ https://github.com/apache/doris-thirdparty/releases 这里我们提供了 Linux 和 MacOS 的预编译三方库。如果和你的编译运行环境一致,可以直接下载使用。 -下载好后,解压会得到一个 `installed/` 目录,将这个目录拷贝到 `thirdparty/` 目录下,之后运行 `build.sh` 即可。 \ No newline at end of file +下载好后,解压会得到一个 `installed/` 目录,将这个目录拷贝到 `thirdparty/` 目录下,之后运行 `build.sh` 即可。 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/lakehouse-overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/lakehouse-overview.md index d4b2f779da..f70ba45a4a 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/lakehouse-overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/lakehouse/lakehouse-overview.md @@ -148,19 +148,7 @@ Doris 通过收集统计信息有助于优化器了解数据分布特性,在 在之前的 Doris 版本中,用户数据只有两个层级:Database 和 Table。当我们需要连接一个外部数据目录时,我们只能在 Database 或 Table 层级进行对接。比如通过 `create external table` 
的方式创建一个外部数据目录中的表的映射,或通过 `create external database` 的方式映射一个外部数据目录中的 Database。如果外部数据目录中的 Database 或 Table 非常多,则需要用户手动进行一一映射,使用体验不佳。 -而新的 Multi-Catalog 功能在原有的元数据层级上,新增一层 Catalog,构成 Catalog -> Database -> Table 的三层元数据层级。其中,Catalog 可以直接对应到外部数据目录。目前支持的外部数据目录包括: - -- Apache Hive - -- Apache Iceberg - -- Apache Hudi - -- Elasticsearch - -- JDBC: 对接数据库访问的标准接口 (JDBC) 来访问各式数据库的数据。 - -- Apache Paimon(Incubating) +而新的 Multi-Catalog 功能在原有的元数据层级上,新增一层 Catalog,构成 Catalog -> Database -> Table 的三层元数据层级。 该功能将作为之前外表连接方式(External Table)的补充和增强,帮助用户进行快速的多数据目录联邦查询。 @@ -197,7 +185,7 @@ Doris 通过收集统计信息有助于优化器了解数据分布特性,在 这里我们通过连接一个 Hive 集群说明如何使用 Catalog 功能。 -更多关于 Hive 的说明,请参阅:[Hive Catalog](../lakehouse/datalake-analytics/hive) +更多关于 Hive 的说明,请参阅:[Hive Catalog](./datalake-analytics/hive) **1. 创建 Catalog** diff --git a/versioned_docs/version-2.0/install/cluster-deployment/standard-deployment.md b/versioned_docs/version-2.0/install/cluster-deployment/standard-deployment.md index 7c169c150e..fb32f58a39 100644 --- a/versioned_docs/version-2.0/install/cluster-deployment/standard-deployment.md +++ b/versioned_docs/version-2.0/install/cluster-deployment/standard-deployment.md @@ -115,6 +115,14 @@ In a Doris cluster, FE is mainly responsible for metadata storage, including met | BE | Doris uses LZ4 compression by default, with a compression ratio of 0.3~0.5.Disk space needs to be calculated based on the total data volume * 3 (3 data replicas)There is a need to reserve 40% disk space for background data compaction and temporary data storage. | | Broker | If you want to deploy a Broker, you can usually deploy the Broker node on the same machine as the FE /BE nodes. | +### Java version + +All Doris processes depend on Java. + +Before version 2.1 (inclusive), please use Java 8, recommended version: `openjdk-8u352-b08-linux-x64`. + +After version 3.0 (inclusive), please use Java 17, recommended version: `jdk-17.0.10_linux-x64_bin.tar.gz`. + ## 2. 
Check operating system ### Disable swap partition diff --git a/versioned_docs/version-2.0/install/source-install/compilation-with-docker.md b/versioned_docs/version-2.0/install/source-install/compilation-with-docker.md index a2d5771f57..8b1fdc7820 100644 --- a/versioned_docs/version-2.0/install/source-install/compilation-with-docker.md +++ b/versioned_docs/version-2.0/install/source-install/compilation-with-docker.md @@ -67,19 +67,6 @@ apache/doris build-env-for-2.0 f29cf1979dba 3 days ago 3.3GB - `apache/doris:build-env-ldb-toolchain-latest` is used for compiling the latest master code and is updated along with the master. You can check the update time in the `docker/README.md` file. - Images with "no-avx2" in their names contain third-party libraries that can run on CPUs that do not support AVX2 instructions. Using these images, you can compile Doris with the "USE_AVX2=0". - For information about changes in the compilation image, please see [ChangeLog](https://github.com/apache/doris/blob/master/thirdparty/CHANGELOG.md). -- The Docker compilation image includes both OpenJDK 8 and OpenJDK 11. You can check the default JDK version by running `java -version`, and switch between versions using the following commands (JDK 8 as the default version is recommended). 
- -```Bash -# Switch to JDK 8 -alternatives --set java java-1.8.0-openjdk.x86_64 -alternatives --set javac java-1.8.0-openjdk.x86_64 -export JAVA_HOME=/usr/lib/jvm/java-1.8.0 - -# Switch to JDK 11 -alternatives --set java java-11-openjdk.x86_64 -alternatives --set javac java-11-openjdk.x86_64 -export JAVA_HOME=/usr/lib/jvm/java-11 -``` ## Compile Doris diff --git a/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md b/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md index 88215d141a..fecf724365 100644 --- a/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md +++ b/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md @@ -66,6 +66,9 @@ sh ldb_toolchain_gen.sh /path/to/ldb_toolchain/ 3. **Download and install other compilation components** - Download [Java8](https://doris-thirdparty-1308700295.cos.ap-beijing.myqcloud.com/tools/jdk-8u391-linux-x64.tar.gz) and install it to /path/to/java. + + > For versions later than 3.0 (inclusive), or the master branch, please use [Java 17](https://download.oracle.com/java/17/archive/jdk-17.0.10_linux-x64_bin.tar.gz). + - Download [Apache Maven 3.6.3](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/apache-maven-3.6.3-bin.tar.gz) and install it to /path/to/maven. - Download [Node v12.13.0](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/node-v12.13.0-linux-x64.tar.gz) and install it to /path/to/node. - Different Linux distributions may include different default components. Therefore, you may need to install some additional components. The following takes CentOS 6 as an example. 
 - Different Linux distributions may include different default components. Therefore, you may need to install some additional components. The following takes CentOS 6 as an example. Similar steps may apply to other distributions:
diff --git a/versioned_docs/version-2.0/lakehouse/lakehouse-overview.md b/versioned_docs/version-2.0/lakehouse/lakehouse-overview.md
index b11a19bb97..b5636edacc 100644
--- a/versioned_docs/version-2.0/lakehouse/lakehouse-overview.md
+++ b/versioned_docs/version-2.0/lakehouse/lakehouse-overview.md
@@ -131,14 +131,7 @@ Multi-Catalog is designed to facilitate connection to external data catalogs and
 
 In older versions of Doris, user data is in a two-tiered structure: database and table. Thus, connections to external catalogs could only be done at the database or table level. For example, users could create a mapping to a table in an external catalog via `create external table`, or to a database via `create external database`. If there are large amounts of databases or tables in the external catalog, users will need to create mappings to them one by one, which could be tedious.
 
-With Multi-Catalog, Doris now has a new three-tiered metadata hierarchy (catalog -> database -> table), which means users can connect to external data at the catalog level directly. Currently it supports external catalogs including:
-
-- Apache Hive
-- Apache Iceberg
-- Apache Hudi
-- Elasticsearch
-- JDBC
-- Apache Paimon(Incubating)
+With Multi-Catalog, Doris now has a new three-tiered metadata hierarchy (catalog -> database -> table), which means users can connect to external data at the catalog level directly.
 
 Multi-Catalog works as an additional and enhanced external table connection method. It helps users conduct multi-catalog federated queries quickly.
 
@@ -175,7 +168,7 @@ You cand delete an External Catalog via the [DROP CATALOG](../sql-manual/sql-ref
 
 The following is the instruction on how to connect to a Hive catalog using the Catalog feature.
 
-For more information about connecting to Hive, please see [Hive](../lakehouse/datalake/hive).
+For more information about connecting to Hive, please see [Hive Catalog](./datalake/hive).
 
 1. Create Catalog
 
diff --git a/versioned_docs/version-2.1/install/cluster-deployment/standard-deployment.md b/versioned_docs/version-2.1/install/cluster-deployment/standard-deployment.md
index 635d6d3a9a..c8b565a1b7 100644
--- a/versioned_docs/version-2.1/install/cluster-deployment/standard-deployment.md
+++ b/versioned_docs/version-2.1/install/cluster-deployment/standard-deployment.md
@@ -116,6 +116,14 @@ In a Doris cluster, FE is mainly responsible for metadata storage, including met
 | BE | Doris uses LZ4 compression by default, with a compression ratio of 0.3~0.5.Disk space needs to be calculated based on the total data volume * 3 (3 data replicas)There is a need to reserve 40% disk space for background data compaction and temporary data storage. |
 | Broker | If you want to deploy a Broker, you can usually deploy the Broker node on the same machine as the FE /BE nodes. |
 
+### Java version
+
+All Doris processes depend on Java.
+
+Before version 2.1 (inclusive), please use Java 8, recommended version: `openjdk-8u352-b08-linux-x64`.
+
+After version 3.0 (inclusive), please use Java 17, recommended version: `jdk-17.0.10_linux-x64_bin.tar.gz`.
+
 ## 2. Check operating system
 
 ### Disable swap partition
diff --git a/versioned_docs/version-2.1/install/source-install/compilation-with-docker.md b/versioned_docs/version-2.1/install/source-install/compilation-with-docker.md
index a2d5771f57..1dc69fb907 100644
--- a/versioned_docs/version-2.1/install/source-install/compilation-with-docker.md
+++ b/versioned_docs/version-2.1/install/source-install/compilation-with-docker.md
@@ -67,18 +67,16 @@ apache/doris build-env-for-2.0 f29cf1979dba 3 days ago 3.3GB
 - `apache/doris:build-env-ldb-toolchain-latest` is used for compiling the latest master code and is updated along with the master. You can check the update time in the `docker/README.md` file.
 - Images with "no-avx2" in their names contain third-party libraries that can run on CPUs that do not support AVX2 instructions. Using these images, you can compile Doris with the "USE_AVX2=0".
 - For information about changes in the compilation image, please see [ChangeLog](https://github.com/apache/doris/blob/master/thirdparty/CHANGELOG.md).
-- The Docker compilation image includes both OpenJDK 8 and OpenJDK 11. You can check the default JDK version by running `java -version`, and switch between versions using the following commands (JDK 8 as the default version is recommended).
+- The Docker compilation image includes both JDK 8 and JDK 17. You can check the default JDK version by running `java -version`, and switch between versions using the following commands. For versions earlier than 2.1 (inclusive), please use JDK 8. For versions later than 3.0 (inclusive) or the master branch, please use JDK 17.
 
 ```Bash
 # Switch to JDK 8
-alternatives --set java java-1.8.0-openjdk.x86_64
-alternatives --set javac java-1.8.0-openjdk.x86_64
 export JAVA_HOME=/usr/lib/jvm/java-1.8.0
+export PATH=$JAVA_HOME/bin/:$PATH
 
-# Switch to JDK 11
-alternatives --set java java-11-openjdk.x86_64
-alternatives --set javac java-11-openjdk.x86_64
-export JAVA_HOME=/usr/lib/jvm/java-11
+# Switch to JDK 17
+export JAVA_HOME=/usr/lib/jvm/jdk-17.0.2/
+export PATH=$JAVA_HOME/bin/:$PATH
 ```
 
 ## Compile Doris
diff --git a/versioned_docs/version-2.1/install/source-install/compilation-with-ldb-toolchain.md b/versioned_docs/version-2.1/install/source-install/compilation-with-ldb-toolchain.md
index 64ef261156..da2e3234f8 100644
--- a/versioned_docs/version-2.1/install/source-install/compilation-with-ldb-toolchain.md
+++ b/versioned_docs/version-2.1/install/source-install/compilation-with-ldb-toolchain.md
@@ -66,6 +66,9 @@ sh ldb_toolchain_gen.sh /path/to/ldb_toolchain/
 3. **Download and install other compilation components**
 
 - Download [Java8](https://doris-thirdparty-1308700295.cos.ap-beijing.myqcloud.com/tools/jdk-8u391-linux-x64.tar.gz) and install it to /path/to/java.
+
+ > For versions later than 3.0 (inclusive), or the master branch, please use [Java 17](https://download.oracle.com/java/17/archive/jdk-17.0.10_linux-x64_bin.tar.gz).
+
 - Download [Apache Maven 3.6.3](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/apache-maven-3.6.3-bin.tar.gz) and install it to /path/to/maven.
 - Download [Node v12.13.0](https://doris-thirdparty-repo.bj.bcebos.com/thirdparty/node-v12.13.0-linux-x64.tar.gz) and install it to /path/to/node.
 - Different Linux distributions may include different default components. Therefore, you may need to install some additional components. The following takes CentOS 6 as an example. Similar steps may apply to other distributions:
diff --git a/versioned_docs/version-2.1/lakehouse/datalake-analytics/hudi.md b/versioned_docs/version-2.1/lakehouse/datalake-analytics/hudi.md
index c986710d49..985f1aa363 100644
--- a/versioned_docs/version-2.1/lakehouse/datalake-analytics/hudi.md
+++ b/versioned_docs/version-2.1/lakehouse/datalake-analytics/hudi.md
@@ -105,3 +105,22 @@ You can use the `FOR TIME AS OF` statement, based on the time of the snapshot to
 `SELECT * FROM hudi_tbl FOR TIME AS OF "20221007172037";`
 
 Hudi table does not support the `FOR VERSION AS OF` statement. Using this syntax to query the Hudi table will throw an error.
+
+## Incremental Read
+
+Incremental Read can query the data changed between startTime and endTime, and the returned result set is the final state of the data at endTime.
+
+Doris provides `@incr` syntax support Incremental Read:
+```
+SELECT * from hudi_table@incr('beginTime'='xxx', ['endTime'='xxx'], ['hoodie.read.timeline.holes.resolution.policy'='FAIL'], ...);
+```
+`beginTime` is required, and the time format is consistent with the hudi official website [hudi_table_changes](https://hudi.apache.org/docs/0.14.0/quick-start-guide/#incremental-query), and supports "earliest". `endTime` is optional, and the default is the latest commitTime. Compatible with [Spark Read Options](https://hudi.apache.org/docs/0.14.0/configurations#Read-Options).
+
+To support Incremental Read, you need to enable the [new optimizer](../../query/nereids/nereids), which is enabled by default. By viewing the execution plan through `desc`, we can find that Doris converts `@incr` into `predicates` and pushes it down to `VHUDI_SCAN_NODE`:
+
+```
+| 0:VHUDI_SCAN_NODE(113) |
+| table: lineitem_mor |
+| predicates: (_hoodie_commit_time[#0] >= '20240311151019723'), (_hoodie_commit_time[#0] <= '20240311151606605') |
+| inputSplitNum=1, totalFileSize=13099711, scanRanges=1 |
+```
diff --git a/versioned_docs/version-2.1/lakehouse/lakehouse-overview.md b/versioned_docs/version-2.1/lakehouse/lakehouse-overview.md
index 0dfbec5c7a..f80b3ac42d 100644
--- a/versioned_docs/version-2.1/lakehouse/lakehouse-overview.md
+++ b/versioned_docs/version-2.1/lakehouse/lakehouse-overview.md
@@ -131,14 +131,7 @@ Multi-Catalog is designed to facilitate connection to external data catalogs and
 
 In older versions of Doris, user data is in a two-tiered structure: database and table. Thus, connections to external catalogs could only be done at the database or table level. For example, users could create a mapping to a table in an external catalog via `create external table`, or to a database via `create external database`. If there are large amounts of databases or tables in the external catalog, users will need to create mappings to them one by one, which could be tedious.
 
-With Multi-Catalog, Doris now has a new three-tiered metadata hierarchy (catalog -> database -> table), which means users can connect to external data at the catalog level directly. Currently it supports external catalogs including:
-
-- Apache Hive
-- Apache Iceberg
-- Apache Hudi
-- Elasticsearch
-- JDBC
-- Apache Paimon(Incubating)
+With Multi-Catalog, Doris now has a new three-tiered metadata hierarchy (catalog -> database -> table), which means users can connect to external data at the catalog level directly.
 
 Multi-Catalog works as an additional and enhanced external table connection method. It helps users conduct multi-catalog federated queries quickly.
 
@@ -175,7 +168,7 @@ You cand delete an External Catalog via the [DROP CATALOG](../sql-manual/sql-sta
 
 The following is the instruction on how to connect to a Hive catalog using the Catalog feature.
 
-For more information about connecting to Hive, please see [Hive](../lakehouse/datalake-analytics/hive).
+For more information about connecting to Hive, please see [Hive Catalog](./datalake-analytics/hive).
 
 1. Create Catalog
 

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org