Hi, Yongqiang YANG, thank you for your response! I appended the following
lines to fe.conf (320 is the total number of cores in my configuration):
async_load_task_pool_size = 320
async_loading_load_task_pool_size = 320
async_pending_load_task_pool_size = 320
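(In case it is useful for debugging: I believe the effective values can be
double-checked after the restart with an ADMIN SHOW FRONTEND CONFIG query;
the LIKE pattern below is just an example.)

-- confirm the FE picked up the new pool sizes after the restart
ADMIN SHOW FRONTEND CONFIG LIKE '%load_task_pool_size%';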
I restarted all the machines in the cluster (including the FE) and tried
again. Unfortunately, it didn't help: the total number of instances for Hive
querying/importing is still 5, as before. Do you have any advice?

Kind regards,
Nikita

Fri, Mar 10, 2023 at 17:32, Yongqiang YANG <dataroar...@gmail.com>:

> You can adjust async_pending_load_task_pool_size = 250 and
> async_loading_load_task_pool_size = 250 in fe.conf and have a try.
>
> On Fri, Mar 10, 2023 at 8:49 PM Nikita Sakharin <nikitasa1...@gmail.com>
> wrote:
>
> > Hi, Apache Doris Community!
> >
> > I'm a beginner user of Doris and I'm experiencing performance issues.
> >
> > I have *4 hosts for BE: Intel Xeon 80 cores, 768 GB RAM each.* The
> > frontend and broker were deployed on a different machine with the same
> > configuration. All backends have "*mix*" as *Node Role*.
> > The Doris version is doris-1.2.2-rc01.
> > The Apache Hive cluster has an HDFS throughput of about 1 GB/sec. The
> > replication factor is 3.
> > The TPC-DS benchmark <https://github.com/databricks/spark-sql-perf> was
> > used: 330 GB of Parquet data with Snappy compression. The total size
> > (with replication) is 1 TB.
> >
> > The default configuration of Apache Doris was used, except for these
> > parameters:
> > ADMIN SET FRONTEND CONFIG('broker_load_default_timeout_second' = '21600');
> > ADMIN SET FRONTEND CONFIG('default_db_max_running_txn_num' = '320');
> > ADMIN SET FRONTEND CONFIG('default_load_parallelism' = '80');
> > ADMIN SET FRONTEND CONFIG('default_max_query_instances' = '640');
> > ADMIN SET FRONTEND CONFIG('disable_balance' = 'false');
> > ADMIN SET FRONTEND CONFIG('disable_disk_balance' = 'false');
> > ADMIN SET FRONTEND CONFIG('max_balancing_tablets' = '320');
> > ADMIN SET FRONTEND CONFIG('max_broker_concurrency' = '80');
> > ADMIN SET FRONTEND CONFIG('max_bytes_per_broker_scanner' = '68719476736');
> > ADMIN SET FRONTEND CONFIG('max_routine_load_job_num' = '320');
> > ADMIN SET FRONTEND CONFIG('max_routine_load_task_num_per_be' = '80');
> > ADMIN SET FRONTEND CONFIG('max_routine_load_task_concurrent_num' = '320');
> > ADMIN SET FRONTEND CONFIG('max_running_txn_num_per_db' = '320');
> > ADMIN SET FRONTEND CONFIG('max_scheduling_tablets' = '20480');
> > ADMIN SET FRONTEND CONFIG('min_bytes_per_broker_scanner' = '1073741824');
> > ADMIN SET FRONTEND CONFIG('remote_fragment_exec_timeout_ms' = '60000');
> > ADMIN SET FRONTEND CONFIG('tablet_create_timeout_second' = '60');
> > ADMIN SET FRONTEND CONFIG('tablet_delete_timeout_second' = '60');
> >
> > SET GLOBAL cpu_resource_limit = 80;
> > SET GLOBAL enable_exchange_node_parallel_merge = true;
> > SET GLOBAL enable_profile = true;
> > SET GLOBAL enable_vectorized_engine = true;
> > SET GLOBAL exec_mem_limit = 768G;
> > SET GLOBAL parallel_exchange_instance_num = 80;
> > SET GLOBAL parallel_fragment_exec_instance_num = 80;
> > SET GLOBAL query_timeout = 86400;
> > SET GLOBAL send_batch_parallelism = 80;
> >
> > SET PROPERTY 'max_query_instances' = '640';
> > SET PROPERTY 'max_user_connections' = '1024';
> >
> > External tables were created in Apache Doris using commands like:
> > CREATE EXTERNAL TABLE store_returns(
> >     sr_return_time_sk INT,
> >     sr_item_sk INT,
> >     ...
> >     sr_net_loss DECIMAL(7,2)
> > ) ENGINE=HIVE
> > COMMENT 'HIVE'
> > PROPERTIES(
> >     'hive.metastore.uris' = 'thrift://xyz.com:9085',
> >     'database' = 'tpcds_1000',
> >     'table' = 'store_returns'
> > );
> > The table in Apache Hive is partitioned by the sr_returned_date_sk field.
> >
> > Importing directly from HDFS utilizes *259 instances on four BEs*:
> > 66 on the first BE,
> > 65 on the second BE,
> > 64 and 64 on the third and fourth BEs.
> > Query:
> > INSERT INTO tpcds.store_returns
> > SELECT NULL, *
> > FROM HDFS(
> >     'uri' = 'hdfs://lang33.ca.sbrf.ru:8020/tmp/tpcds_1000/store_returns/sr_returned_date_sk=__HIVE_DEFAULT_PARTITION__/*',
> >     'fs.defaultFS' = 'hdfs://lang33.ca.sbrf.ru:8020',
> >     'hadoop.username' = 'hdfs',
> >     'format' = 'parquet'
> > );
> >
> > Importing from/querying Apache Hive utilizes *only 5 instances on four
> > BEs*:
> > 2 on the first BE,
> > 1 each on the second, third, and fourth BEs.
> > Query:
> > INSERT INTO tpcds.store_returns
> > SELECT
> >     sr_returned_date_sk,
> >     sr_customer_sk,
> >     sr_store_sk,
> >     sr_return_time_sk,
> >     sr_item_sk,
> >     sr_cdemo_sk,
> >     sr_hdemo_sk,
> >     sr_addr_sk,
> >     sr_reason_sk,
> >     sr_ticket_number,
> >     sr_return_quantity,
> >     sr_return_amt,
> >     sr_return_tax,
> >     sr_return_amt_inc_tax,
> >     sr_fee,
> >     sr_return_ship_cost,
> >     sr_refunded_cash,
> >     sr_reversed_charge,
> >     sr_store_credit,
> >     sr_net_loss
> > FROM tpcds_10000.store_returns;
> >
> > How can I make Apache Doris utilize more instances for importing from and
> > querying external Hive tables? I kindly ask for your advice on parameter
> > and setting values.
> >
> > Kind regards,
> > Nikita
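P.S. One more thought: since doris-1.2.x also ships the Multi-Catalog
feature, would the scan parallelism be any different when the same Hive
table is accessed through an HMS catalog instead of an ENGINE=HIVE external
table? A minimal sketch of what I mean, reusing my metastore URI (the
catalog name hive_hms is just a placeholder):

CREATE CATALOG hive_hms PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://xyz.com:9085'
);
-- query the same Hive table through the catalog, fully qualified:
SELECT COUNT(*) FROM hive_hms.tpcds_1000.store_returns;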