This is an automated email from the ASF dual-hosted git repository. liaoxin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new 96c1090cedf [doc](load) optimize migration from olap (#1764)
96c1090cedf is described below

commit 96c1090cedf061df38c42fad161d3febae1826c2
Author: Xin Liao <liao...@selectdb.com>
AuthorDate: Sun Jan 12 22:53:21 2025 +0800

    [doc](load) optimize migration from olap (#1764)
---
 .../data-source/migrate-data-from-other-olap.md    | 50 +++++++++++-----------
 .../data-source/migrate-data-from-other-olap.md    | 11 -----
 .../data-source/migrate-data-from-other-olap.md    | 11 -----
 .../data-source/migrate-data-from-other-olap.md    | 11 -----
 .../data-source/migrate-data-from-other-olap.md    | 50 +++++++++++-----------
 .../data-source/migrate-data-from-other-olap.md    | 50 +++++++++++-----------
 6 files changed, 72 insertions(+), 111 deletions(-)

diff --git a/docs/data-operate/import/data-source/migrate-data-from-other-olap.md b/docs/data-operate/import/data-source/migrate-data-from-other-olap.md
index 5e69bc3502f..9388f9d8449 100644
--- a/docs/data-operate/import/data-source/migrate-data-from-other-olap.md
+++ b/docs/data-operate/import/data-source/migrate-data-from-other-olap.md
@@ -24,54 +24,52 @@ specific language governing permissions and limitations
 under the License.
 -->

-To migrate data from other OLAP systems to Doris, you have a variety of options:
+To migrate data from other OLAP systems to Doris, you have several options:

-- For systems like Hive/Iceberg/Hudi, you can leverage Multi-Catalog to map them as external tables and then use "Insert Into" to import the data into Doris.
+- For systems like Hive/Iceberg/Hudi, you can use Multi-Catalog to map them as external tables and then use "Insert Into" to load the data

-- You can export data from the OLAP system into formats like CSV, and then import the data files into Doris.
+- You can export data from the OLAP system into formats like CSV, and then load these data files into Doris

-- You can also leverage the connectors of the OLAP systems, use tools like Spark / Flink, and then call the corresponding Doris Connector to write data into Doris.
+- You can use systems like Spark/Flink, utilizing the OLAP system's Connector to read data, and then call the Doris Connector to write into Doris

-The following third-party migration tools are also available:
+Additionally, the following third-party migration tools are available:

-- [X2Doris](https://www.velodb.io/download/tools).
+- [X2Doris](https://www.selectdb.com/tools/x2doris)

- X2Doris is a core tool specifically for migrating various offline data to Apache Doris. This tool integrates `automatic Doris table creation` and `data migration`. Currently, it supports the migration of data from Apache Doris/Hive/Kudu, and StarRocks databases to Doris. The entire process is visualized on a platform, making it very simple and easy to use, thereby lowering the threshold for synchronizing data to Doris.
+ X2Doris is a core tool specifically designed for migrating various offline data to Apache Doris. This tool combines `automatic Doris table creation` and `data migration`. Currently, it supports data migration from Apache Doris/Hive/Kudu and StarRocks databases to Doris. The entire process is operated through a visual platform, making it very simple and easy to use, reducing the barrier to synchronizing data to Doris.

 :::info NOTE
-All third-party tools are not maintained or endorsed by the Apache Doris, which is overseen by the Committers and the Doris PMC. Their use is entirely at your discretion, and the community is not responsible for verifying the licenses or validity of these tools.
+If you know of other migration tools that could be added to this list, please contact d...@doris.apache.org
 :::

-:::info NOTE
-If you know of the third-party migration tool for Doris that should be added to this list, please let us know at d...@doris.apache.org
-:::
+## X2Doris Core Features

-## X2Doris
+### Multi-Source Support

-### Support multiple data sources
+As a one-stop data migration tool, X2Doris currently supports Apache Hive, Apache Kudu, StarRocks, and Apache Doris itself as data sources. More data sources such as Greenplum and Druid are under development and will be released subsequently. The Hive version supports both Hive 1.x and 2.x versions, while Doris, StarRocks, Kudu, and other data sources also support multiple different versions.

-As a one-stop data migration tool, X2Doris supports Apache Hive, Apache Kudu, StarRocks, and Apache Doris itself as data source. What's more, there are more data sources such as Greenplum and Druid that are under development and will be released subsequently. Among them, the Hive version already supports Hive 1.x and 2.x, while Doris, StarRocks, Kudu, and other data sources also support multiple different versions.
+Users can build complete database migration pipelines from other OLAP systems to Apache Doris using X2Doris, and achieve data backup and recovery between different Doris clusters.

-With X2Doris, users can build a complete database migration link from other OLAP systems to Apache Doris, and can also achieve data backup and recovery between different Doris clusters.
+### Automatic Table Creation

-
+One of the biggest pain points in data migration is creating corresponding target tables in Apache Doris for the source tables to be migrated. In real business scenarios, with thousands of tables stored in Hive, manually creating target tables and converting corresponding DDL statements is inefficient and impractical.

-### Auto table creation
+X2Doris has been adapted for this scenario. Taking Hive table migration as an example, when migrating Hive tables, X2Doris automatically creates Duplicate Key model tables (which can be manually modified) in Apache Doris and reads the Hive table's metadata information. It automatically identifies partition fields through field names and types, prompts for partition mapping if partitions are detected, and directly generates the corresponding Doris target table DDL.

-One of the biggest challenges in data migration is how to create corresponding target tables in Apache Doris for the source tables that need to be migrated. In real business scenarios, there are often thousands of tables stored in Hive, and it would be extremely inefficient and impractical for users to manually create tables and convert corresponding DDL statements.
+When the upstream data source is Doris/StarRocks, X2Doris automatically parses the table model based on source table information, maps source field types to corresponding target field types, and processes upstream Properties parameters, converting them into target table attribute parameters. Additionally, X2Doris has enhanced support for complex types, enabling migration of Array, Map, and Bitmap type data.

-X2Doris has been adapted for this scenario. Taking Hive table migration as an example, when migrating Hive tables, X2Doris automatically creates Duplicate Key model tables (which can also be manually modified) in Apache Doris and reads the metadata information of Hive tables. It automatically identifies partition fields based on field names and types, and if partitions are detected, it prompts for partition mapping. Finally, it directly generates the corresponding Doris target table DDL.
+### High Speed and Stability

-When the upstream data source is Doris/StarRocks, X2Doris automatically parses the table model based on the source table information, maps the source table field types to the corresponding target field types, and identifies and processes upstream properties parameters, converting them into attribute parameters for the corresponding target table. In addition, X2Doris has also enhanced support for complex types, enabling data migration for Array, Map, and Bitmap types.
+In terms of data writing, X2Doris has specifically optimized the data reading process. By optimizing data batching logic, it further reduces memory usage, while making significant improvements and enhancements to Stream Load write requests, optimizing memory usage and release, further improving data migration speed and stability.

-
+Compared to other similar migration tools, X2Doris performs about 2-10 times faster. For example, when synchronizing 50 million records in full with 1GB memory on a single machine, other tools take about 90 seconds, while X2Doris completes it in less than 50 seconds, achieving nearly 100% performance improvement.

-### High speed & stability
+In a real-world large-scale log data migration scenario, with individual records of 1KB size, a single table containing nearly 100 million records, and total storage space of about 90 GB, X2Doris completed the full table migration in just 2 minutes, with an average write speed of nearly 800 MB/s.

-For data writing, X2Doris has specifically optimized the reading process. By optimizing the data batching logic, it further reduces memory usage. Additionally, significant improvements and enhancements have been made to Stream Load write requests, optimizing memory usage and release, further enhancing the speed and stability of data migration.
+## Using X2Doris

-Compared to other similar migration tools, X2Doris offers a performance advantage of approximately 2-10 times. For example, when using a single machine with 1G of memory, other tools take approximately 90 seconds to synchronize 50 million rows of data in full, while X2Doris completes the task in less than 50 seconds, achieving a nearly 100% performance improvement.
+- Product Introduction: https://www.selectdb.com/tools/x2doris

-In a practical large-scale log data migration scenario, with individual data records averaging 1KB in size, a single table containing nearly 100 million records, and a total storage space of approximately 90 GB, X2Doris can complete the full table migration in just 2 minutes, with an average write speed of nearly 800 MB/s.
+- Download Now: https://www.selectdb.com/download/tools#x2doris

-
+- Documentation: https://docs.selectdb.com/docs/ecosystem/x2doris/x2doris-deployment-guide
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/migrate-data-from-other-olap.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/migrate-data-from-other-olap.md
index 7915c5b42af..25b8a3e8138 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/migrate-data-from-other-olap.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/data-source/migrate-data-from-other-olap.md
@@ -39,10 +39,6 @@ under the License.

 X2Doris 专门用于将各种离线数据迁移到 Apache Doris 中的核心工具,该工具集 `自动建 Doris 表` 和 `数据迁移` 为一体,目前支持了 Apache Doris/Hive/Kudu、StarRocks 数据库往 Doris 迁移的工作,整个过程可视化的平台操作,非常简单易用,减轻数据同步到 Doris 中的门槛。

-:::info NOTE
-这些第三方提供的工具并非由 Apache Doris 维护或认可,这些工作由 Committers 和 Doris PMC 监督。使用这些资源和服务完全由您自行决定,社区不负责验证这些工具的许可或有效性。
-:::
-
 :::info NOTE
 如果有其他迁移工具可以加入此列表,可以联系 d...@doris.apache.org
 :::
@@ -55,21 +51,14 @@ under the License.
 基于 X2Doris 用户可以构建从其他 OLAP 系统到 Apache Doris 的整库迁移链路,并可以实现不同 Doris 集群间的数据备份和恢复。

-
-
 ### 自动建表

 数据迁移中最大的痛点,首当其冲的是如何将待迁移的源表在 Apache Doris 中创建对应的目标表。在实际业务场景中,存储在 Hive 中动辄上千张表,让用户手动创建目标表并转换对应的 DDL 语句效率显得过于低下,不具备实际操作可能性。

 X2Doris 为此场景做了适配,在此以 Hive 表迁移为例。在迁移 Hive 表的时候,X2Doris 会在 Apache Doris 中自动创建 Duplicate Key 模型表(也可手动修改)并读取 Hive 表的元数据信息,通过字段名和字段类型自动识别分区字段,如果识别到分区则会提示进行分区映射,最后会直接生成对应的 Doris 目标表 DDL。

-
-
-
 在上游数据源为 Doris/StarRocks 时,X2Doris 会自动根据源表信息解析出表模型,自动根据源表字段类型映射对应的目标字段类型,针对上游的 Properties 参数也会识别处理,转换成对应目标表的属性参数。除此以外,X2Doris 还对复杂类型进行了增强,实现了对 Array、Map、Bitmap 类型的数据迁移。

-
-
 ### 极速稳定

 在数据写入方面,X2Doris 特别针对读取数据进行了优化。通过优化数据攒批逻辑进一步减小了内存的使用,同时对 Stream Load 写入请求进行了大量改进和增强,对内存使用和释放进行优化,进一步提升数据迁移的速度和稳定性。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/migrate-data-from-other-olap.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/migrate-data-from-other-olap.md
index 7915c5b42af..25b8a3e8138 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/migrate-data-from-other-olap.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/data-source/migrate-data-from-other-olap.md
@@ -39,10 +39,6 @@ under the License.

 X2Doris 专门用于将各种离线数据迁移到 Apache Doris 中的核心工具,该工具集 `自动建 Doris 表` 和 `数据迁移` 为一体,目前支持了 Apache Doris/Hive/Kudu、StarRocks 数据库往 Doris 迁移的工作,整个过程可视化的平台操作,非常简单易用,减轻数据同步到 Doris 中的门槛。

-:::info NOTE
-这些第三方提供的工具并非由 Apache Doris 维护或认可,这些工作由 Committers 和 Doris PMC 监督。使用这些资源和服务完全由您自行决定,社区不负责验证这些工具的许可或有效性。
-:::
-
 :::info NOTE
 如果有其他迁移工具可以加入此列表,可以联系 d...@doris.apache.org
 :::
@@ -55,21 +51,14 @@ under the License.
 基于 X2Doris 用户可以构建从其他 OLAP 系统到 Apache Doris 的整库迁移链路,并可以实现不同 Doris 集群间的数据备份和恢复。

-
-
 ### 自动建表

 数据迁移中最大的痛点,首当其冲的是如何将待迁移的源表在 Apache Doris 中创建对应的目标表。在实际业务场景中,存储在 Hive 中动辄上千张表,让用户手动创建目标表并转换对应的 DDL 语句效率显得过于低下,不具备实际操作可能性。

 X2Doris 为此场景做了适配,在此以 Hive 表迁移为例。在迁移 Hive 表的时候,X2Doris 会在 Apache Doris 中自动创建 Duplicate Key 模型表(也可手动修改)并读取 Hive 表的元数据信息,通过字段名和字段类型自动识别分区字段,如果识别到分区则会提示进行分区映射,最后会直接生成对应的 Doris 目标表 DDL。

-
-
-
 在上游数据源为 Doris/StarRocks 时,X2Doris 会自动根据源表信息解析出表模型,自动根据源表字段类型映射对应的目标字段类型,针对上游的 Properties 参数也会识别处理,转换成对应目标表的属性参数。除此以外,X2Doris 还对复杂类型进行了增强,实现了对 Array、Map、Bitmap 类型的数据迁移。

-
-
 ### 极速稳定

 在数据写入方面,X2Doris 特别针对读取数据进行了优化。通过优化数据攒批逻辑进一步减小了内存的使用,同时对 Stream Load 写入请求进行了大量改进和增强,对内存使用和释放进行优化,进一步提升数据迁移的速度和稳定性。
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/migrate-data-from-other-olap.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/migrate-data-from-other-olap.md
index 7915c5b42af..25b8a3e8138 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/migrate-data-from-other-olap.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/data-source/migrate-data-from-other-olap.md
@@ -39,10 +39,6 @@ under the License.

 X2Doris 专门用于将各种离线数据迁移到 Apache Doris 中的核心工具,该工具集 `自动建 Doris 表` 和 `数据迁移` 为一体,目前支持了 Apache Doris/Hive/Kudu、StarRocks 数据库往 Doris 迁移的工作,整个过程可视化的平台操作,非常简单易用,减轻数据同步到 Doris 中的门槛。

-:::info NOTE
-这些第三方提供的工具并非由 Apache Doris 维护或认可,这些工作由 Committers 和 Doris PMC 监督。使用这些资源和服务完全由您自行决定,社区不负责验证这些工具的许可或有效性。
-:::
-
 :::info NOTE
 如果有其他迁移工具可以加入此列表,可以联系 d...@doris.apache.org
 :::
@@ -55,21 +51,14 @@ under the License.
 基于 X2Doris 用户可以构建从其他 OLAP 系统到 Apache Doris 的整库迁移链路,并可以实现不同 Doris 集群间的数据备份和恢复。

-
-
 ### 自动建表

 数据迁移中最大的痛点,首当其冲的是如何将待迁移的源表在 Apache Doris 中创建对应的目标表。在实际业务场景中,存储在 Hive 中动辄上千张表,让用户手动创建目标表并转换对应的 DDL 语句效率显得过于低下,不具备实际操作可能性。

 X2Doris 为此场景做了适配,在此以 Hive 表迁移为例。在迁移 Hive 表的时候,X2Doris 会在 Apache Doris 中自动创建 Duplicate Key 模型表(也可手动修改)并读取 Hive 表的元数据信息,通过字段名和字段类型自动识别分区字段,如果识别到分区则会提示进行分区映射,最后会直接生成对应的 Doris 目标表 DDL。

-
-
-
 在上游数据源为 Doris/StarRocks 时,X2Doris 会自动根据源表信息解析出表模型,自动根据源表字段类型映射对应的目标字段类型,针对上游的 Properties 参数也会识别处理,转换成对应目标表的属性参数。除此以外,X2Doris 还对复杂类型进行了增强,实现了对 Array、Map、Bitmap 类型的数据迁移。

-
-
 ### 极速稳定

 在数据写入方面,X2Doris 特别针对读取数据进行了优化。通过优化数据攒批逻辑进一步减小了内存的使用,同时对 Stream Load 写入请求进行了大量改进和增强,对内存使用和释放进行优化,进一步提升数据迁移的速度和稳定性。
diff --git a/versioned_docs/version-2.1/data-operate/import/data-source/migrate-data-from-other-olap.md b/versioned_docs/version-2.1/data-operate/import/data-source/migrate-data-from-other-olap.md
index 5e69bc3502f..9388f9d8449 100644
--- a/versioned_docs/version-2.1/data-operate/import/data-source/migrate-data-from-other-olap.md
+++ b/versioned_docs/version-2.1/data-operate/import/data-source/migrate-data-from-other-olap.md
@@ -24,54 +24,52 @@ specific language governing permissions and limitations
 under the License.
 -->

-To migrate data from other OLAP systems to Doris, you have a variety of options:
+To migrate data from other OLAP systems to Doris, you have several options:

-- For systems like Hive/Iceberg/Hudi, you can leverage Multi-Catalog to map them as external tables and then use "Insert Into" to import the data into Doris.
+- For systems like Hive/Iceberg/Hudi, you can use Multi-Catalog to map them as external tables and then use "Insert Into" to load the data

-- You can export data from the OLAP system into formats like CSV, and then import the data files into Doris.
+- You can export data from the OLAP system into formats like CSV, and then load these data files into Doris

-- You can also leverage the connectors of the OLAP systems, use tools like Spark / Flink, and then call the corresponding Doris Connector to write data into Doris.
+- You can use systems like Spark/Flink, utilizing the OLAP system's Connector to read data, and then call the Doris Connector to write into Doris

-The following third-party migration tools are also available:
+Additionally, the following third-party migration tools are available:

-- [X2Doris](https://www.velodb.io/download/tools).
+- [X2Doris](https://www.selectdb.com/tools/x2doris)

- X2Doris is a core tool specifically for migrating various offline data to Apache Doris. This tool integrates `automatic Doris table creation` and `data migration`. Currently, it supports the migration of data from Apache Doris/Hive/Kudu, and StarRocks databases to Doris. The entire process is visualized on a platform, making it very simple and easy to use, thereby lowering the threshold for synchronizing data to Doris.
+ X2Doris is a core tool specifically designed for migrating various offline data to Apache Doris. This tool combines `automatic Doris table creation` and `data migration`. Currently, it supports data migration from Apache Doris/Hive/Kudu and StarRocks databases to Doris. The entire process is operated through a visual platform, making it very simple and easy to use, reducing the barrier to synchronizing data to Doris.

 :::info NOTE
-All third-party tools are not maintained or endorsed by the Apache Doris, which is overseen by the Committers and the Doris PMC. Their use is entirely at your discretion, and the community is not responsible for verifying the licenses or validity of these tools.
+If you know of other migration tools that could be added to this list, please contact d...@doris.apache.org
 :::

-:::info NOTE
-If you know of the third-party migration tool for Doris that should be added to this list, please let us know at d...@doris.apache.org
-:::
+## X2Doris Core Features

-## X2Doris
+### Multi-Source Support

-### Support multiple data sources
+As a one-stop data migration tool, X2Doris currently supports Apache Hive, Apache Kudu, StarRocks, and Apache Doris itself as data sources. More data sources such as Greenplum and Druid are under development and will be released subsequently. The Hive version supports both Hive 1.x and 2.x versions, while Doris, StarRocks, Kudu, and other data sources also support multiple different versions.

-As a one-stop data migration tool, X2Doris supports Apache Hive, Apache Kudu, StarRocks, and Apache Doris itself as data source. What's more, there are more data sources such as Greenplum and Druid that are under development and will be released subsequently. Among them, the Hive version already supports Hive 1.x and 2.x, while Doris, StarRocks, Kudu, and other data sources also support multiple different versions.
+Users can build complete database migration pipelines from other OLAP systems to Apache Doris using X2Doris, and achieve data backup and recovery between different Doris clusters.

-With X2Doris, users can build a complete database migration link from other OLAP systems to Apache Doris, and can also achieve data backup and recovery between different Doris clusters.
+### Automatic Table Creation

-
+One of the biggest pain points in data migration is creating corresponding target tables in Apache Doris for the source tables to be migrated. In real business scenarios, with thousands of tables stored in Hive, manually creating target tables and converting corresponding DDL statements is inefficient and impractical.

-### Auto table creation
+X2Doris has been adapted for this scenario. Taking Hive table migration as an example, when migrating Hive tables, X2Doris automatically creates Duplicate Key model tables (which can be manually modified) in Apache Doris and reads the Hive table's metadata information. It automatically identifies partition fields through field names and types, prompts for partition mapping if partitions are detected, and directly generates the corresponding Doris target table DDL.

-One of the biggest challenges in data migration is how to create corresponding target tables in Apache Doris for the source tables that need to be migrated. In real business scenarios, there are often thousands of tables stored in Hive, and it would be extremely inefficient and impractical for users to manually create tables and convert corresponding DDL statements.
+When the upstream data source is Doris/StarRocks, X2Doris automatically parses the table model based on source table information, maps source field types to corresponding target field types, and processes upstream Properties parameters, converting them into target table attribute parameters. Additionally, X2Doris has enhanced support for complex types, enabling migration of Array, Map, and Bitmap type data.

-X2Doris has been adapted for this scenario. Taking Hive table migration as an example, when migrating Hive tables, X2Doris automatically creates Duplicate Key model tables (which can also be manually modified) in Apache Doris and reads the metadata information of Hive tables. It automatically identifies partition fields based on field names and types, and if partitions are detected, it prompts for partition mapping. Finally, it directly generates the corresponding Doris target table DDL.
+### High Speed and Stability

-When the upstream data source is Doris/StarRocks, X2Doris automatically parses the table model based on the source table information, maps the source table field types to the corresponding target field types, and identifies and processes upstream properties parameters, converting them into attribute parameters for the corresponding target table. In addition, X2Doris has also enhanced support for complex types, enabling data migration for Array, Map, and Bitmap types.
+In terms of data writing, X2Doris has specifically optimized the data reading process. By optimizing data batching logic, it further reduces memory usage, while making significant improvements and enhancements to Stream Load write requests, optimizing memory usage and release, further improving data migration speed and stability.

-
+Compared to other similar migration tools, X2Doris performs about 2-10 times faster. For example, when synchronizing 50 million records in full with 1GB memory on a single machine, other tools take about 90 seconds, while X2Doris completes it in less than 50 seconds, achieving nearly 100% performance improvement.

-### High speed & stability
+In a real-world large-scale log data migration scenario, with individual records of 1KB size, a single table containing nearly 100 million records, and total storage space of about 90 GB, X2Doris completed the full table migration in just 2 minutes, with an average write speed of nearly 800 MB/s.

-For data writing, X2Doris has specifically optimized the reading process. By optimizing the data batching logic, it further reduces memory usage. Additionally, significant improvements and enhancements have been made to Stream Load write requests, optimizing memory usage and release, further enhancing the speed and stability of data migration.
+## Using X2Doris

-Compared to other similar migration tools, X2Doris offers a performance advantage of approximately 2-10 times. For example, when using a single machine with 1G of memory, other tools take approximately 90 seconds to synchronize 50 million rows of data in full, while X2Doris completes the task in less than 50 seconds, achieving a nearly 100% performance improvement.
+- Product Introduction: https://www.selectdb.com/tools/x2doris

-In a practical large-scale log data migration scenario, with individual data records averaging 1KB in size, a single table containing nearly 100 million records, and a total storage space of approximately 90 GB, X2Doris can complete the full table migration in just 2 minutes, with an average write speed of nearly 800 MB/s.
+- Download Now: https://www.selectdb.com/download/tools#x2doris

-
+- Documentation: https://docs.selectdb.com/docs/ecosystem/x2doris/x2doris-deployment-guide
diff --git a/versioned_docs/version-3.0/data-operate/import/data-source/migrate-data-from-other-olap.md b/versioned_docs/version-3.0/data-operate/import/data-source/migrate-data-from-other-olap.md
index 5e69bc3502f..9388f9d8449 100644
--- a/versioned_docs/version-3.0/data-operate/import/data-source/migrate-data-from-other-olap.md
+++ b/versioned_docs/version-3.0/data-operate/import/data-source/migrate-data-from-other-olap.md
@@ -24,54 +24,52 @@ specific language governing permissions and limitations
 under the License.
 -->

-To migrate data from other OLAP systems to Doris, you have a variety of options:
+To migrate data from other OLAP systems to Doris, you have several options:

-- For systems like Hive/Iceberg/Hudi, you can leverage Multi-Catalog to map them as external tables and then use "Insert Into" to import the data into Doris.
+- For systems like Hive/Iceberg/Hudi, you can use Multi-Catalog to map them as external tables and then use "Insert Into" to load the data

-- You can export data from the OLAP system into formats like CSV, and then import the data files into Doris.
+- You can export data from the OLAP system into formats like CSV, and then load these data files into Doris

-- You can also leverage the connectors of the OLAP systems, use tools like Spark / Flink, and then call the corresponding Doris Connector to write data into Doris.
+- You can use systems like Spark/Flink, utilizing the OLAP system's Connector to read data, and then call the Doris Connector to write into Doris

-The following third-party migration tools are also available:
+Additionally, the following third-party migration tools are available:

-- [X2Doris](https://www.velodb.io/download/tools).
+- [X2Doris](https://www.selectdb.com/tools/x2doris)

- X2Doris is a core tool specifically for migrating various offline data to Apache Doris. This tool integrates `automatic Doris table creation` and `data migration`. Currently, it supports the migration of data from Apache Doris/Hive/Kudu, and StarRocks databases to Doris. The entire process is visualized on a platform, making it very simple and easy to use, thereby lowering the threshold for synchronizing data to Doris.
+ X2Doris is a core tool specifically designed for migrating various offline data to Apache Doris. This tool combines `automatic Doris table creation` and `data migration`. Currently, it supports data migration from Apache Doris/Hive/Kudu and StarRocks databases to Doris. The entire process is operated through a visual platform, making it very simple and easy to use, reducing the barrier to synchronizing data to Doris.

 :::info NOTE
-All third-party tools are not maintained or endorsed by the Apache Doris, which is overseen by the Committers and the Doris PMC. Their use is entirely at your discretion, and the community is not responsible for verifying the licenses or validity of these tools.
+If you know of other migration tools that could be added to this list, please contact d...@doris.apache.org
 :::

-:::info NOTE
-If you know of the third-party migration tool for Doris that should be added to this list, please let us know at d...@doris.apache.org
-:::
+## X2Doris Core Features

-## X2Doris
+### Multi-Source Support

-### Support multiple data sources
+As a one-stop data migration tool, X2Doris currently supports Apache Hive, Apache Kudu, StarRocks, and Apache Doris itself as data sources. More data sources such as Greenplum and Druid are under development and will be released subsequently. The Hive version supports both Hive 1.x and 2.x versions, while Doris, StarRocks, Kudu, and other data sources also support multiple different versions.

-As a one-stop data migration tool, X2Doris supports Apache Hive, Apache Kudu, StarRocks, and Apache Doris itself as data source. What's more, there are more data sources such as Greenplum and Druid that are under development and will be released subsequently. Among them, the Hive version already supports Hive 1.x and 2.x, while Doris, StarRocks, Kudu, and other data sources also support multiple different versions.
+Users can build complete database migration pipelines from other OLAP systems to Apache Doris using X2Doris, and achieve data backup and recovery between different Doris clusters.

-With X2Doris, users can build a complete database migration link from other OLAP systems to Apache Doris, and can also achieve data backup and recovery between different Doris clusters.
+### Automatic Table Creation

-
+One of the biggest pain points in data migration is creating corresponding target tables in Apache Doris for the source tables to be migrated. In real business scenarios, with thousands of tables stored in Hive, manually creating target tables and converting corresponding DDL statements is inefficient and impractical.

-### Auto table creation
+X2Doris has been adapted for this scenario. Taking Hive table migration as an example, when migrating Hive tables, X2Doris automatically creates Duplicate Key model tables (which can be manually modified) in Apache Doris and reads the Hive table's metadata information. It automatically identifies partition fields through field names and types, prompts for partition mapping if partitions are detected, and directly generates the corresponding Doris target table DDL.

-One of the biggest challenges in data migration is how to create corresponding target tables in Apache Doris for the source tables that need to be migrated. In real business scenarios, there are often thousands of tables stored in Hive, and it would be extremely inefficient and impractical for users to manually create tables and convert corresponding DDL statements.
+When the upstream data source is Doris/StarRocks, X2Doris automatically parses the table model based on source table information, maps source field types to corresponding target field types, and processes upstream Properties parameters, converting them into target table attribute parameters. Additionally, X2Doris has enhanced support for complex types, enabling migration of Array, Map, and Bitmap type data.

-X2Doris has been adapted for this scenario. Taking Hive table migration as an example, when migrating Hive tables, X2Doris automatically creates Duplicate Key model tables (which can also be manually modified) in Apache Doris and reads the metadata information of Hive tables. It automatically identifies partition fields based on field names and types, and if partitions are detected, it prompts for partition mapping. Finally, it directly generates the corresponding Doris target table DDL.
+### High Speed and Stability

-When the upstream data source is Doris/StarRocks, X2Doris automatically parses the table model based on the source table information, maps the source table field types to the corresponding target field types, and identifies and processes upstream properties parameters, converting them into attribute parameters for the corresponding target table. In addition, X2Doris has also enhanced support for complex types, enabling data migration for Array, Map, and Bitmap types.
+In terms of data writing, X2Doris has specifically optimized the data reading process. By optimizing data batching logic, it further reduces memory usage, while making significant improvements and enhancements to Stream Load write requests, optimizing memory usage and release, further improving data migration speed and stability.

-
+Compared to other similar migration tools, X2Doris performs about 2-10 times faster. For example, when synchronizing 50 million records in full with 1GB memory on a single machine, other tools take about 90 seconds, while X2Doris completes it in less than 50 seconds, achieving nearly 100% performance improvement.

-### High speed & stability
+In a real-world large-scale log data migration scenario, with individual records of 1KB size, a single table containing nearly 100 million records, and total storage space of about 90 GB, X2Doris completed the full table migration in just 2 minutes, with an average write speed of nearly 800 MB/s.

-For data writing, X2Doris has specifically optimized the reading process. By optimizing the data batching logic, it further reduces memory usage. Additionally, significant improvements and enhancements have been made to Stream Load write requests, optimizing memory usage and release, further enhancing the speed and stability of data migration.
+## Using X2Doris

-Compared to other similar migration tools, X2Doris offers a performance advantage of approximately 2-10 times. For example, when using a single machine with 1G of memory, other tools take approximately 90 seconds to synchronize 50 million rows of data in full, while X2Doris completes the task in less than 50 seconds, achieving a nearly 100% performance improvement.
+- Product Introduction: https://www.selectdb.com/tools/x2doris

-In a practical large-scale log data migration scenario, with individual data records averaging 1KB in size, a single table containing nearly 100 million records, and a total storage space of approximately 90 GB, X2Doris can complete the full table migration in just 2 minutes, with an average write speed of nearly 800 MB/s.
+- Download Now: https://www.selectdb.com/download/tools#x2doris

-
+- Documentation: https://docs.selectdb.com/docs/ecosystem/x2doris/x2doris-deployment-guide
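
The performance figures quoted in the changed text (about 90 GB migrated in 2 minutes at nearly 800 MB/s, and a 90-second baseline against a run of under 50 seconds) can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and treating 1 GB as 1024 MB is an assumption, since the text does not say which unit convention it uses.

```python
def average_throughput_mb_per_s(total_gb: float, seconds: float, mb_per_gb: int = 1024) -> float:
    """Average write speed in MB/s for a migration of total_gb that took `seconds`.

    mb_per_gb=1024 is an assumption (binary units); pass 1000 for decimal units.
    """
    return total_gb * mb_per_gb / seconds


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is compared with the baseline run."""
    return baseline_seconds / new_seconds


# Log-migration scenario from the text: about 90 GB in 2 minutes.
log_throughput = average_throughput_mb_per_s(total_gb=90, seconds=2 * 60)

# Single-machine scenario from the text: 90 s baseline, X2Doris at 50 s (upper bound).
ratio_at_50s = speedup(baseline_seconds=90, new_seconds=50)

print(f"average write speed: {log_throughput:.0f} MB/s")
print(f"speedup at 50 s: {ratio_at_50s:.1f}x")
```

Under these assumptions, 90 GB in 2 minutes works out to 768 MB/s, in line with the quoted "nearly 800 MB/s". A run of exactly 50 seconds against a 90-second baseline is a 1.8x speedup, so the "nearly 100%" figure corresponds to a run somewhat under 50 seconds (45 seconds would be exactly 2x).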