Hi, Yongqiang YANG, thank you for your response!

I appended the following lines to fe.conf:
async_load_task_pool_size = 320
async_loading_load_task_pool_size = 320
async_pending_load_task_pool_size = 320

320 is the total number of cores in my cluster.
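
In case it matters, I believe the effective values can be double-checked
after the restart with the ADMIN SHOW statement (the LIKE pattern below is
just illustrative):

ADMIN SHOW FRONTEND CONFIG LIKE '%load_task_pool_size%';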

I restarted all the machines in the cluster (including the FE) and tried
again. Unfortunately, it didn't help: the total number of instances for
Hive querying/importing is still 5.
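
As a side note, I wonder whether the Multi Catalog feature introduced in
1.2 would behave differently from ENGINE=HIVE tables for this workload; a
minimal sketch, assuming the same metastore URI as in my table definition
(the catalog name is arbitrary):

CREATE CATALOG hive_catalog PROPERTIES(
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://xyz.com:9085'
);
SWITCH hive_catalog;

I have not tried this path yet.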

Do you have any advice?

Kind regards,
Nikita

On Fri, Mar 10, 2023 at 17:32, Yongqiang YANG <dataroar...@gmail.com> wrote:

> You can adjust async_pending_load_task_pool_size = 250
> and async_loading_load_task_pool_size = 250 in fe.conf and have a try.
>
> On Fri, Mar 10, 2023 at 8:49 PM Nikita Sakharin <nikitasa1...@gmail.com>
> wrote:
>
> > Hi, Apache Doris Community!
> >
> > I'm a beginner user of Doris and I'm experiencing performance issues.
> >
> > I have *4 hosts for BE: Intel Xeon 80 cores, 768GB RAM each.* The frontend
> > and broker were deployed on a different machine with the same
> > configuration. All backends have "*mix*" as their *Node Role*.
> > The Doris version is doris-1.2.2-rc01.
> > The Apache Hive cluster has an HDFS throughput of about 1 GB/s; the
> > replication factor is 3.
> > The TPC-DS benchmark <https://github.com/databricks/spark-sql-perf> was
> > used: 330 GB of Parquet data with Snappy compression. The total size
> > (with replication) is 1 TB.
> >
> > The default configuration of Apache Doris was used, except for these
> > parameters:
> > ADMIN SET FRONTEND CONFIG('broker_load_default_timeout_second' = '21600');
> > ADMIN SET FRONTEND CONFIG('default_db_max_running_txn_num' = '320');
> > ADMIN SET FRONTEND CONFIG('default_load_parallelism' = '80');
> > ADMIN SET FRONTEND CONFIG('default_max_query_instances' = '640');
> > ADMIN SET FRONTEND CONFIG('disable_balance' = 'false');
> > ADMIN SET FRONTEND CONFIG('disable_disk_balance' = 'false');
> > ADMIN SET FRONTEND CONFIG('max_balancing_tablets' = '320');
> > ADMIN SET FRONTEND CONFIG('max_broker_concurrency' = '80');
> > ADMIN SET FRONTEND CONFIG('max_bytes_per_broker_scanner' = '68719476736');
> > ADMIN SET FRONTEND CONFIG('max_routine_load_job_num' = '320');
> > ADMIN SET FRONTEND CONFIG('max_routine_load_task_num_per_be' = '80');
> > ADMIN SET FRONTEND CONFIG('max_routine_load_task_concurrent_num' = '320');
> > ADMIN SET FRONTEND CONFIG('max_running_txn_num_per_db' = '320');
> > ADMIN SET FRONTEND CONFIG('max_scheduling_tablets' = '20480');
> > ADMIN SET FRONTEND CONFIG('min_bytes_per_broker_scanner' = '1073741824');
> > ADMIN SET FRONTEND CONFIG('remote_fragment_exec_timeout_ms' = '60000');
> > ADMIN SET FRONTEND CONFIG('tablet_create_timeout_second' = '60');
> > ADMIN SET FRONTEND CONFIG('tablet_delete_timeout_second' = '60');
> >
> > SET GLOBAL cpu_resource_limit = 80;
> > SET GLOBAL enable_exchange_node_parallel_merge = true;
> > SET GLOBAL enable_profile = true;
> > SET GLOBAL enable_vectorized_engine = true;
> > SET GLOBAL exec_mem_limit = 768G;
> > SET GLOBAL parallel_exchange_instance_num = 80;
> > SET GLOBAL parallel_fragment_exec_instance_num = 80;
> > SET GLOBAL query_timeout = 86400;
> > SET GLOBAL send_batch_parallelism = 80;
> >
> > SET PROPERTY 'max_query_instances' = '640';
> > SET PROPERTY 'max_user_connections' = '1024';
> >
> > External tables were created in Apache Doris using commands:
> > CREATE EXTERNAL TABLE store_returns(
> >     sr_return_time_sk INT,
> >     sr_item_sk INT,
> >     ...
> >     sr_net_loss DECIMAL(7,2)
> > ) ENGINE=HIVE
> > COMMENT 'HIVE'
> > PROPERTIES(
> >     'hive.metastore.uris' = 'thrift://xyz.com:9085',
> >     'database' = 'tpcds_1000',
> >     'table' = 'store_returns'
> > );
> > The table in Apache Hive is partitioned by the sr_returned_date_sk field.
> >
> > Importing directly from HDFS utilizes *259 instances on four BEs*:
> > 66 on the first BE,
> > 65 on the second,
> > 64 each on the third and fourth.
> > Query:
> > INSERT INTO tpcds.store_returns
> >     SELECT NULL, *
> >     FROM HDFS(
> >         'uri' = 'hdfs://lang33.ca.sbrf.ru:8020/tmp/tpcds_1000/store_returns/sr_returned_date_sk=__HIVE_DEFAULT_PARTITION__/*',
> >         'fs.defaultFS' = 'hdfs://lang33.ca.sbrf.ru:8020',
> >         'hadoop.username' = 'hdfs',
> >         'format' = 'parquet'
> >     );
> >
> > Importing from/querying Apache Hive utilizes *only 5 instances on four
> > BEs*:
> > 2 on the first BE,
> > 1 each on the second, third, and fourth.
> > Query:
> > INSERT INTO tpcds.store_returns
> >     SELECT
> >         sr_returned_date_sk,
> >         sr_customer_sk,
> >         sr_store_sk,
> >         sr_return_time_sk,
> >         sr_item_sk,
> >         sr_cdemo_sk,
> >         sr_hdemo_sk,
> >         sr_addr_sk,
> >         sr_reason_sk,
> >         sr_ticket_number,
> >         sr_return_quantity,
> >         sr_return_amt,
> >         sr_return_tax,
> >         sr_return_amt_inc_tax,
> >         sr_fee,
> >         sr_return_ship_cost,
> >         sr_refunded_cash,
> >         sr_reversed_charge,
> >         sr_store_credit,
> >         sr_net_loss
> >     FROM tpcds_10000.store_returns;
> >
> > How can I make Apache Doris utilize more instances when importing from
> > or querying external Hive tables? I would kindly ask for your advice on
> > parameter/setting values.
> >
> > Kind regards,
> > Nikita
> >
>
