This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git
The following commit(s) were added to refs/heads/master by this push:
     new 5cbb4a2317  [Improvement](docs) Update EN doc (#9228)
5cbb4a2317 is described below

commit 5cbb4a2317d90a74fa010c31f8a84c07ae860686
Author:     Gabriel <gabrielleeb...@gmail.com>
AuthorDate: Wed Apr 27 23:22:38 2022 +0800

    [Improvement](docs) Update EN doc (#9228)
---
 .../en/administrator-guide/block-rule/sql-block.md |   8 +-
 docs/en/administrator-guide/bucket-shuffle-join.md |  10 +-
 docs/en/administrator-guide/config/be_config.md    | 256 +++---
 docs/en/administrator-guide/config/fe_config.md    | 904 ++++++++++-----------
 docs/en/administrator-guide/dynamic-partition.md   |   2 +-
 docs/en/administrator-guide/ldap.md                |   4 +-
 .../load-data/binlog-load-manual.md                |   2 +-
 .../load-data/routine-load-manual.md               |   2 +-
 .../load-data/stream-load-manual.md                |  10 +-
 .../administrator-guide/operation/disk-capacity.md |   8 +-
 docs/en/administrator-guide/running-profile.md     |   4 +-
 .../commit-format-specification.md                 |   6 +-
 .../release-and-verify/release-complete.md         |   6 +-
 docs/en/developer-guide/be-vscode-dev.md           |   4 +-
 docs/en/developer-guide/benchmark-tool.md          |   8 +-
 docs/en/developer-guide/cpp-diagnostic-code.md     |   2 +-
 docs/en/developer-guide/fe-idea-dev.md             |  10 +-
 docs/en/developer-guide/fe-vscode-dev.md           |   2 +-
 docs/en/extending-doris/doris-on-es.md             |   4 +-
 docs/en/extending-doris/flink-doris-connector.md   |   6 +-
 docs/en/extending-doris/hive-bitmap-udf.md         |   2 +-
 docs/en/extending-doris/spark-doris-connector.md   |   6 +-
 .../udf/java-user-defined-function.md              |   2 +-
 .../udf/remote-user-defined-function.md            |  30 +-
 docs/en/installing/install-deploy.md               |   4 +-
 .../bitmap-functions/bitmap_subset_limit.md        |   6 +-
 .../sql-functions/string-functions/bit_length.md   |   2 +-
 .../sql-reference/sql-functions/window-function.md |  26 +-
 .../Account Management/SET PROPERTY.md             |   2 +-
 .../sql-statements/Administration/ALTER SYSTEM.md  |   2 +-
 .../sql-statements/Data Definition/ALTER TABLE.md  |   2 +-
 .../sql-statements/Data Definition/CANCEL ALTER.md |   4 +-
 .../Data Definition/create-function.md             |   2 +-
 .../Data Manipulation/BROKER LOAD.md               |  18 +-
 .../sql-statements/Data Manipulation/EXPORT.md     |   6 +-
 .../sql-statements/Data Manipulation/LOAD.md       |   8 +-
 .../sql-statements/Data Manipulation/OUTFILE.md    |   6 +-
 .../Data Manipulation/SHOW CREATE ROUTINE LOAD.md  |   6 +-
 38 files changed, 696 insertions(+), 696 deletions(-)

diff --git a/docs/en/administrator-guide/block-rule/sql-block.md b/docs/en/administrator-guide/block-rule/sql-block.md
index bef442f279..0b167ae17c 100644
--- a/docs/en/administrator-guide/block-rule/sql-block.md
+++ b/docs/en/administrator-guide/block-rule/sql-block.md
@@ -38,13 +38,13 @@ Support SQL block rule by user level:

SQL block rule CRUD
- create SQL block rule
- - sql:Regex pattern,Special characters need to be translated, "NULL" by default
+ - sql: Regex pattern, special characters need to be escaped, "NULL" by default
 - sqlHash: SQL hash value, used to match exactly. We print it in fe.audit.log. This parameter is the only choice between sql and sqlHash, "NULL" by default
 - partition_num: Max number of partitions that will be scanned by a scan node, 0L by default
 - tablet_num: Max number of tablets that will be scanned by a scan node, 0L by default
 - cardinality: An inaccurate number of scan rows of a scan node, 0L by default
 - global: Whether global (all users) is in effect, false by default
- - enable:Whether to enable block rule,true by default
+ - enable: Whether to enable block rule, true by default

```sql
CREATE SQL_BLOCK_RULE test_rule PROPERTIES(
@@ -70,7 +70,7 @@ CREATE SQL_BLOCK_RULE test_rule2 PROPERTIES("partition_num" = "30", "cardinality

```sql
SHOW SQL_BLOCK_RULE [FOR RULE_NAME]
```
-- alter SQL block rule,Allows changes sql/sqlHash/global/enable/partition_num/tablet_num/cardinality anyone
+- alter SQL block rule, allows changing any of sql/sqlHash/global/enable/partition_num/tablet_num/cardinality
  - sql and sqlHash cannot both be set. It means that if sql or sqlHash is set in a rule, the other property can never be altered
  - sql/sqlHash and partition_num/tablet_num/cardinality cannot be set together. For example, if partition_num is set in a rule, then sql or sqlHash can never be altered.
```sql
@@ -81,7 +81,7 @@ ALTER SQL_BLOCK_RULE test_rule PROPERTIES("sql"="select \\* from test_table","en
ALTER SQL_BLOCK_RULE test_rule2 PROPERTIES("partition_num" = "10","tablet_num"="300","enable"="true")
```

-- drop SQL block rule,Support multiple rules, separated by `,`
+- drop SQL block rule, supports multiple rules, separated by `,`
```sql
DROP SQL_BLOCK_RULE test_rule1,test_rule2
```

diff --git a/docs/en/administrator-guide/bucket-shuffle-join.md b/docs/en/administrator-guide/bucket-shuffle-join.md
index 2ac58a22f4..a2edaef4fc 100644
--- a/docs/en/administrator-guide/bucket-shuffle-join.md
+++ b/docs/en/administrator-guide/bucket-shuffle-join.md
@@ -28,7 +28,7 @@ under the License.

Bucket Shuffle Join is a new function officially added in Doris 0.14. The purpose is to provide local optimization for some join queries, to reduce the time spent on data transmission between nodes, and to speed up queries.

-It's design, implementation can be referred to [ISSUE 4394](https://github.com/apache/incubator-doris/issues/4394)。
+Its design and implementation can be found in [ISSUE 4394](https://github.com/apache/incubator-doris/issues/4394).

## Noun Interpretation

@@ -40,7 +40,7 @@ It's design, implementation can be referred to [ISSUE 4394](https://github.com/a

## Principle

The conventional distributed join methods supported by Doris are: `Shuffle Join, Broadcast Join`. Both of these joins will lead to some network overhead.
-For example, there are join queries for table A and table B. the join method is hashjoin. The cost of different join types is as follows:
+For example, suppose there is a join query between table A and table B, and the join method is hash join. The cost of different join types is as follows:

* **Broadcast Join**: If, according to the data distribution, table A has three executing HashJoinNodes, table B needs to be sent to all three HashJoinNodes. Its network overhead is `3B`, and its memory overhead is `3B`.
* **Shuffle Join**: Shuffle join will distribute the data of tables A and B to the nodes of the cluster according to hash calculation, so its network overhead is `A + B` and memory overhead is `B`.

@@ -50,9 +50,9 @@ The data distribution information of each Doris table is saved in FE. If the joi

The picture above shows how the Bucket Shuffle Join works. The SQL query is table A join table B. The equality expression of the join hits the data distribution column of A. According to the data distribution information of table A, Bucket Shuffle Join sends the data of table B to the corresponding data storage and computation nodes of table A.

The cost of Bucket Shuffle Join is as follows:

-* network cost: ``` B < min(3B, A + B) ``` 
+* network cost: ``` B < min(3B, A + B) ```

-* memory cost: ``` B <= min(3B, B) ``` 
+* memory cost: ``` B <= min(3B, B) ```

Therefore, compared with Broadcast Join and Shuffle Join, Bucket Shuffle Join has obvious performance advantages. It reduces the time spent on data transmission between nodes and the memory cost of the join. Compared with Doris's original join methods, it has the following advantages

@@ -91,7 +91,7 @@ You can use the `explain` command to check whether the join is a Bucket Shuffle

|   |  equal join conjunct: `test`.`k1` = `baseall`.`k1`
```

-The join type indicates that the join method to be used is:`BUCKET_SHUFFLE`。
+The join type indicates that the join method to be used is: `BUCKET_SHUFFLE`.

## Planning rules of Bucket Shuffle Join

diff --git a/docs/en/administrator-guide/config/be_config.md b/docs/en/administrator-guide/config/be_config.md
index 56d5e58f45..aa5a4dce50 100644
--- a/docs/en/administrator-guide/config/be_config.md
+++ b/docs/en/administrator-guide/config/be_config.md
@@ -101,25 +101,25 @@ There are two ways to configure BE configuration items:

### `alter_tablet_worker_count`

-Default:3
+Default: 3

The number of threads making schema changes

### `base_compaction_check_interval_seconds`

-Default:60 (s)
+Default: 60 (s)

BaseCompaction thread polling interval

### `base_compaction_interval_seconds_since_last_operation`

-Default:86400
+Default: 86400

One of the triggering conditions of BaseCompaction: the interval since the last BaseCompaction

### `base_compaction_num_cumulative_deltas`

-Default:5
+Default: 5

One of the triggering conditions of BaseCompaction: the limit on the number of Cumulative files. After reaching this limit, BaseCompaction will be triggered

@@ -150,13 +150,13 @@ Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"in

### `base_compaction_write_mbytes_per_sec`

-Default:5(MB)
+Default: 5(MB)

Maximum disk write speed per second of a BaseCompaction task

### `base_cumulative_delta_ratio`

-Default:0.3 (30%)
+Default: 0.3 (30%)

One of the trigger conditions of BaseCompaction: Cumulative file size reaches the proportion of the Base file

@@ -206,7 +206,7 @@ User can set this configuration to a larger value to get better QPS performance.

### `buffer_pool_clean_pages_limit`

-默认值:20G
+Default: 20G

Clean up pages that may be saved by the buffer pool

@@ -226,25 +226,25 @@ The maximum amount of memory available in the BE buffer pool. The buffer pool is

### `check_consistency_worker_count`

-Default:1
+Default: 1

The number of worker threads to calculate the checksum of the tablet

### `chunk_reserved_bytes_limit`

-Default:2147483648
+Default: 2147483648

The reserved bytes limit of Chunk Allocator, 2GB by default. Increasing this variable can improve performance, but it will get more free memory that other modules cannot use.
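For orientation while reading this patch: BE configuration items such as those above are set in `be.conf` as plain `key = value` lines. A minimal sketch, using the defaults documented above as illustrative values only (not tuning advice):

```
# be.conf (sketch) -- values are the documented defaults, not recommendations
alter_tablet_worker_count = 3
base_compaction_check_interval_seconds = 60
chunk_reserved_bytes_limit = 2147483648
```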
### `clear_transaction_task_worker_count`

-Default:1
+Default: 1

Number of threads used to clean up transactions

### `clone_worker_count`

-Default:3
+Default: 3

Number of threads used to perform cloning tasks

@@ -258,13 +258,13 @@ This value is usually delivered by the FE to the BE by the heartbeat, no need to

### `column_dictionary_key_ratio_threshold`

-Default:0
+Default: 0

The value ratio of string type; below this ratio, the dictionary compression algorithm is used

### `column_dictionary_key_size_threshold`

-Default:0
+Default: 0

Dictionary compression column size; below this value, the dictionary compression algorithm is used

@@ -305,7 +305,7 @@ tablet_score = compaction_tablet_scan_frequency_factor * tablet_scan_frequency +

### `create_tablet_worker_count`

-Default:3
+Default: 3

Number of worker threads for BE to create a tablet

@@ -325,19 +325,19 @@ Generally it needs to be turned off. When you want to manually operate the compa

### `cumulative_compaction_budgeted_bytes`

-Default:104857600
+Default: 104857600

One of the trigger conditions of BaseCompaction: Singleton file size limit, 100MB

### `cumulative_compaction_check_interval_seconds`

-Default:10 (s)
+Default: 10 (s)

CumulativeCompaction thread polling interval

### `cumulative_compaction_skip_window_seconds`

-Default:30(s)
+Default: 30(s)

CumulativeCompaction skips the most recently released increments to prevent compacting versions that may be queried (in case the query planning phase takes some time). This parameter sets the size of the skipped window.

@@ -419,13 +419,13 @@ In some deployment environments, the `conf/` directory may be overwritten due to

### `delete_worker_count`

-Default:3
+Default: 3

Number of threads performing data deletion tasks

### `disable_mem_pools`

-Default:false
+Default: false

Whether to disable the memory cache pool; it is not disabled by default

@@ -437,13 +437,13 @@ Whether to disable the memory cache pool, it is not disabled by default

### `disk_stat_monitor_interval`

-Default:5(s)
+Default: 5(s)

Disk status check interval

### `doris_cgroups`

-Default:empty
+Default: empty

Cgroups assigned to doris

@@ -475,7 +475,7 @@ When the concurrency cannot be improved in high concurrency scenarios, try to re

### `doris_scanner_row_num`

-Default:16384
+Default: 16384

The maximum number of data rows returned by each scanning thread in a single execution

@@ -493,31 +493,31 @@ The maximum number of data rows returned by each scanning thread in a single exe

### `download_low_speed_limit_kbps`

-Default:50 (KB/s)
+Default: 50 (KB/s)

Minimum download speed

### `download_low_speed_time`

-Default:300(s)
+Default: 300(s)

Download time limit, 300 seconds by default

### `download_worker_count`

-Default:1
+Default: 1

The number of download threads, the default is 1

### `drop_tablet_worker_count`

-Default:3
+Default: 3

Number of threads to delete tablets

### `enable_metric_calculator`

-Default:true
+Default: true

If set to true, the metric calculator will run to collect BE-related indicator information; if set to false, it will not run

@@ -540,31 +540,31 @@ If set to true, the metric calculator will run to collect BE-related indicator i

### `enable_system_metrics`

-Default:true
+Default: true

User control to turn system metrics on and off.

### `enable_token_check`

-Default:true
+Default: true

Used for forward compatibility, will be removed later.

### `es_http_timeout_ms`

-Default:5000 (ms)
+Default: 5000 (ms)

The timeout period for connecting to ES via HTTP, the default is 5 seconds.

### `es_scroll_keepalive`

-Default:5m
+Default: 5m

ES scroll keepalive hold time, the default is 5 minutes

### `etl_thread_pool_queue_size`

-Default:256
+Default: 256

The size of the ETL thread pool

@@ -578,20 +578,20 @@ The size of the ETL thread pool

### `file_descriptor_cache_capacity`

-Default:32768
+Default: 32768

File handle cache capacity, 32768 file handles are cached by default.

### `cache_clean_interval`

-Default:1800(s)
+Default: 1800(s)

File handle cache cleaning interval, used to clean up file handles that have not been used for a long time.

Also the clean interval of Segment Cache.

### `flush_thread_num_per_store`

-Default:2
+Default: 2

The number of threads used to flush the memtable per store

@@ -599,17 +599,17 @@ The number of threads used to refresh the memory table per store

### `fragment_pool_queue_size`

-Default:2048
+Default: 2048

The upper limit of query requests that can be processed on a single node

### `fragment_pool_thread_num_min`

-Default:64
+Default: 64

### `fragment_pool_thread_num_max`

-Default:256
+Default: 256

The above two parameters set the number of query threads. By default, a minimum of 64 threads will be started; subsequent query requests will dynamically create threads, up to a maximum of 256 threads.

@@ -626,7 +626,7 @@ The above two parameters are to set the number of query threads. By default, a m

### `ignore_broken_disk`

-Default:false
+Default: false

When BE starts, if there is a broken disk, the BE process will exit by default. Otherwise, the broken disk will be ignored.

@@ -662,37 +662,37 @@ When configured as true, the program will run normally and ignore this error. In

### inc_rowset_expired_sec

-Default:1800 (s)
+Default: 1800 (s)

Retention time in the storage engine for imported and activated data, used for incremental cloning

### `index_stream_cache_capacity`

-Default:10737418240
+Default: 10737418240

BloomFilter/Min/Max and other statistical information cache capacity

### `kafka_broker_version_fallback`

-Default:0.10.0
+Default: 0.10.0

If the dependent Kafka version is lower than the Kafka client version that routine load depends on, the value set by the fallback version kafka_broker_version_fallback will be used. Valid values are: 0.9.0, 0.8.2, 0.8.1, 0.8.0.

### `load_data_reserve_hours`

-Default:4(hour)
+Default: 4(hour)

Used for mini load. The mini load data file will be deleted after this time

### `load_error_log_reserve_hours`

-Default:48 (hour)
+Default: 48 (hour)

The load error log will be deleted after this time

### `load_process_max_memory_limit_bytes`

-Default:107374182400
+Default: 107374182400

The upper limit of memory occupied by all import threads on a single node, default value: 100GB

@@ -700,7 +700,7 @@ Set these default values very large, because we don't want to affect load perfor

### `load_process_max_memory_limit_percent`

-Default:80 (%)
+Default: 80 (%)

The percentage of the memory upper limit occupied by all import threads on a single node, the default is 80%

@@ -708,25 +708,25 @@ Set these default values very large, because we don't want to affect load perfor

### `log_buffer_level`

-Default:empty
+Default: empty

The log flushing strategy; logs are kept in memory by default

### `madvise_huge_pages`

-Default:false
+Default: false

Whether to use Linux memory huge pages, not enabled by default

### `make_snapshot_worker_count`

-Default:5
+Default: 5

Number of threads making snapshots

### `max_client_cache_size_per_host`

-Default:10
+Default: 10

The maximum number of client caches per host.

There are multiple client caches in BE, but currently we use the same cache size configuration. If necessary, use different configurations to set up different client-side caches.

@@ -738,43 +738,43 @@ The maximum number of client caches per host. There are multiple client caches i

### `max_consumer_num_per_group`

-Default:3
+Default: 3

The maximum number of consumers in a data consumer group, used for routine load

### `min_cumulative_compaction_num_singleton_deltas`

-Default:5
+Default: 5

Cumulative compaction strategy: the minimum number of incremental files

### `max_cumulative_compaction_num_singleton_deltas`

-Default:1000
+Default: 1000

Cumulative compaction strategy: the maximum number of incremental files

### `max_download_speed_kbps`

-Default:50000 (KB/s)
+Default: 50000 (KB/s)

Maximum download speed limit

### `max_free_io_buffers`

-Default:128
+Default: 128

For each io buffer size, the maximum number of buffers that IoMgr will reserve ranges from 1024B to 8MB buffers, up to about 2GB of buffers.

### `max_garbage_sweep_interval`

-Default:3600
+Default: 3600

The maximum interval for disk garbage cleaning, the default is one hour

### `max_memory_sink_batch_count`

-Default:20
+Default: 20

The maximum external scan cache batch count, which means that the cache holds max_memory_cache_batch_count * batch_size rows; the default is 20, and the default value of batch_size is 1024, which means 20 * 1024 rows will be cached.

@@ -800,7 +800,7 @@ The maximum external scan cache batch count, which means that the cache max_memo

### `max_runnings_transactions_per_txn_map`

-Default:100
+Default: 100

Max number of txns for every txn_partition_map in the txn manager; this is a self-protection to avoid too many txns being saved in the manager

@@ -812,7 +812,7 @@ Max number of txns for every txn_partition_map in txn manager, this is a self pr

### `max_tablet_num_per_shard`

-Default:1024
+Default: 1024

The number of tablets per shard, used to plan the tablet layout and avoid too many tablet subdirectories in a single directory

@@ -830,31 +830,31 @@ The number of sliced tablets, plan the layout of the tablet, and avoid too many

### `memory_limitation_per_thread_for_schema_change`

-Default:2 (G)
+Default: 2 (G)

Maximum memory allowed for a single schema change task

### `memory_maintenance_sleep_time_s`

-Default:10
+Default: 10

Sleep time (in seconds) between memory maintenance iterations

### `memory_max_alignment`

-Default:16
+Default: 16

Maximum alignment memory

### `read_size`

-Default:8388608
+Default: 8388608

The read size is the read size sent to the OS. There is a trade-off between latency and throughput: this keeps the disk busy without introducing seeks. For 8 MB reads, random IO and sequential IO have similar performance.

### `min_buffer_size`

-Default:1024
+Default: 1024

Minimum read buffer size (in bytes)

@@ -873,19 +873,19 @@ Minimum read buffer size (in bytes)

### `min_file_descriptor_number`

-Default:60000
+Default: 60000

The lower limit required by the file handle limit of the BE process

### `min_garbage_sweep_interval`

-Default:180
+Default: 180

The minimum interval between disk garbage cleaning, in seconds

### `mmap_buffers`

-Default:false
+Default: false

Whether to use mmap to allocate memory, not used by default

@@ -897,67 +897,67 @@ Whether to use mmap to allocate memory, not used by default

### `num_disks`

-Defalut:0
+Default: 0

Control the number of disks on the machine. If it is 0, it comes from the system settings.

### `num_threads_per_core`

-Default:3
+Default: 3

Control the number of threads that each core runs. Usually choose 2 or 3 times the number of cores. This keeps the core busy without causing excessive jitter

### `num_threads_per_disk`

-Default:0
+Default: 0

The maximum number of threads per disk, which is also the maximum queue depth of each disk

### `number_tablet_writer_threads`

-Default:16
+Default: 16

Number of tablet write threads

### `path_gc_check`

-Default:true
+Default: true

Whether to enable the recycle scan data thread check, enabled by default

### `path_gc_check_interval_second`

-Default:86400
+Default: 86400

Recycle scan data thread check interval, in seconds

### `path_gc_check_step`

-Default:1000
+Default: 1000

### `path_gc_check_step_interval_ms`

-Default:10 (ms)
+Default: 10 (ms)

### `path_scan_interval_second`

-Default:86400
+Default: 86400

### `pending_data_expire_time_sec`

-Default:1800
+Default: 1800

The maximum duration of unvalidated data retained by the storage engine, default unit: seconds

### `periodic_counter_update_period_ms`

-Default:500
+Default: 500

Update rate counter and sampling counter cycle, default unit: milliseconds

### `plugin_path`

-Default:${DORIS_HOME}/plugin
+Default: ${DORIS_HOME}/plugin

Plugin path

@@ -969,43 +969,43 @@ pliugin path

### `pprof_profile_dir`

-Default :${DORIS_HOME}/log
+Default: ${DORIS_HOME}/log

pprof profile save directory

### `priority_networks`

-Default:empty
+Default: empty

Declares a selection strategy for servers with many IPs. Note that at most one IP should match this list. This is a semicolon-separated list in CIDR notation, such as 10.10.10.0/24. If there is no IP matching this rule, one will be randomly selected.

### `priority_queue_remaining_tasks_increased_frequency`

-Default:512
+Default: 512

The increased frequency of priority for remaining tasks in BlockingPriorityQueue

### `publish_version_worker_count`

-Default:8
+Default: 8

The count of threads to publish version

### `pull_load_task_dir`

-Default:${DORIS_HOME}/var/pull_load
+Default: ${DORIS_HOME}/var/pull_load

Directory of the pull load task

### `push_worker_count_high_priority`

-Default:3
+Default: 3

Number of threads for processing HIGH priority import tasks

### `push_worker_count_normal_priority`

-Default:3
+Default: 3

Number of threads for processing NORMAL priority import tasks

@@ -1024,43 +1024,43 @@ Import the number of threads for processing NORMAL priority tasks

### `release_snapshot_worker_count`

-Default:5
+Default: 5

Number of threads releasing snapshots

### `report_disk_state_interval_seconds`

-Default:60
+Default: 60

The interval for the agent to report the disk status to FE, unit (seconds)

### `report_tablet_interval_seconds`

-Default:60
+Default: 60

The interval for the agent to report the olap table to the FE, in seconds

### `report_task_interval_seconds`

-Default:10
+Default: 10

The interval for the agent to report the task signature to FE, unit (seconds)

### `result_buffer_cancelled_interval_time`

-Default:300
+Default: 300

Result buffer cancellation time (unit: second)

### `routine_load_thread_pool_size`

-Default:10
+Default: 10

The thread pool size of the routine load task. This should be greater than the FE configuration 'max_concurrent_task_num_per_be' (default 5)

### `row_nums_check`

-Default:true
+Default: true

Check row nums for BE/CE and schema change. True means enabled, false means disabled.

@@ -1073,7 +1073,7 @@ Check row nums for BE/CE and schema change. true is open, false is closed

### `scan_context_gc_interval_min`

-Default:5
+Default: 5

This configuration is used for the context gc thread scheduling cycle. Note: the unit is minutes, and the default is 5 minutes

@@ -1096,43 +1096,43 @@ This configuration is used for the context gc thread scheduling cycle. Note: The

### `small_file_dir`

-Default:${DORIS_HOME}/lib/small_file/
+Default: ${DORIS_HOME}/lib/small_file/

Directory for saving files downloaded by SmallFileMgr

### `snapshot_expire_time_sec`

-Default:172800
+Default: 172800

Snapshot file cleaning interval, default value: 48 hours

### `status_report_interval`

-Default:5
+Default: 5

Interval between profile reports; unit: seconds

### `storage_flood_stage_left_capacity_bytes`

-Default:1073741824
+Default: 1073741824

-The min bytes that should be left of a data dir,default value:1G
+The min bytes that should be left free on a data dir, default value: 1G

### `storage_flood_stage_usage_percent`

-Default:95 (95%)
+Default: 95 (95%)

The storage_flood_stage_usage_percent and storage_flood_stage_left_capacity_bytes configurations limit the maximum usage of the capacity of the data directory.

### `storage_medium_migrate_count`

-Default:1
+Default: 1

The count of threads to clone

### `storage_page_cache_limit`

-Default:20%
+Default: 20%

Cache for storage page size

@@ -1155,8 +1155,8 @@ Cache for storage page size

eg.2: `storage_root_path=/home/disk1/doris,medium:hdd,capacity:50;/home/disk2/doris,medium:ssd,capacity:50`

- * 1./home/disk1/doris,medium:hdd,capacity:10,capacity limit is 10GB, HDD;
- * 2./home/disk2/doris,medium:ssd,capacity:50,capacity limit is 50GB, SSD;
+ * 1. /home/disk1/doris,medium:hdd,capacity:10, capacity limit is 10GB, HDD;
+ * 2. /home/disk2/doris,medium:ssd,capacity:50, capacity limit is 50GB, SSD;

* Default: ${DORIS_HOME}

@@ -1189,13 +1189,13 @@ Some data formats, such as JSON, cannot be split. Doris must read all the data i

### `streaming_load_rpc_max_alive_time_sec`

-Default:1200
+Default: 1200

The lifetime of TabletsChannel. If the channel does not receive any data within this time, the channel will be deleted, unit: second

### `sync_tablet_meta`

-Default:false
+Default: false

Whether the storage engine enables sync and keeps it to the disk

@@ -1213,37 +1213,37 @@ Log Level: INFO < WARNING < ERROR < FATAL

### `sys_log_roll_mode`

-Default:SIZE-MB-1024
+Default: SIZE-MB-1024

The size at which logs are split; one log file is split every 1GB

### `sys_log_roll_num`

-Default:10
+Default: 10

Number of log files kept

### `sys_log_verbose_level`

-Defaultl:10
+Default: 10

Log display level, used to control the log output of VLOG statements in the code

### `sys_log_verbose_modules`

-Default:empty
+Default: empty

Log printing module; writing "olap" will only print the logs under the olap module

### `tablet_map_shard_size`

-Default:1
+Default: 1

tablet_map_lock shard size, the value is 2^n, n=0,1,2,3,4; this is for better tablet management

### `tablet_meta_checkpoint_min_interval_secs`

-Default:600(s)
+Default: 600(s)

The polling interval of the TabletMeta Checkpoint thread

@@ -1257,7 +1257,7 @@ The polling interval of the TabletMeta Checkpoint thread

### `tablet_stat_cache_update_interval_second`

-默认值:10
+Default: 10

Update interval of the tablet stat cache, unit: seconds

@@ -1271,7 +1271,7 @@ When writing is too frequent and the disk time is insufficient, you can configur

### `tablet_writer_open_rpc_timeout_sec`

-Default:300
+Default: 300

Timeout of the rpc that opens a tablet writer, unit: seconds

@@ -1285,7 +1285,7 @@ When meet '[E1011]The server is overcrowded' error, you can tune the configurati

### `tc_free_memory_rate`

-Default:20 (%)
+Default: 20 (%)

Available memory, value range: [0-100]

@@ -1299,7 +1299,7 @@ If the system is found to be in a high-stress scenario and a large number of thr

### `tc_use_memory_min`

-Default:10737418240
+Default: 10737418240

The minimum memory of TCMalloc; when the memory used is less than this, it is not returned to the operating system

@@ -1311,13 +1311,13 @@ The minimum memory of TCmalloc, when the memory used is less than this, it is no

### `thrift_connect_timeout_seconds`

-Default:3
+Default: 3

The default thrift client connection timeout (unit: seconds)

### `thrift_rpc_timeout_ms`

-Default:5000
+Default: 5000

Thrift default timeout, default: 5 seconds

@@ -1338,43 +1338,43 @@ If the parameter is `THREAD_POOL`, the model is a blocking I/O model.

### `trash_file_expire_time_sec`

-Default:259200
+Default: 259200

The interval for cleaning the recycle bin is 72 hours. When the disk space is insufficient, the file retention period under trash may not comply with this parameter

### `txn_commit_rpc_timeout_ms`

-Default:10000
+Default: 10000

txn commit rpc timeout, the default is 10 seconds

### `txn_map_shard_size`

-Default:128
+Default: 128

txn_map_lock shard size, the value is 2^n, n=0,1,2,3,4. This is an enhancement to improve the performance of managing txn

### `txn_shard_size`

-Default:1024
+Default: 1024

txn_lock shard size, the value is 2^n, n=0,1,2,3,4; this is an enhancement that can improve the performance of committing and publishing txn

### `unused_rowset_monitor_interval`

-Default:30
+Default: 30

Time interval for clearing expired Rowsets, unit: seconds

### `upload_worker_count`

-Default:1
+Default: 1

Maximum number of threads for uploading files

### `use_mmap_allocate_chunk`

-Default:false
+Default: false

Whether to use mmap to allocate blocks. If you enable this feature, it is best to increase the value of vm.max_map_count; its default value is 65530. You can run "sysctl -w vm.max_map_count=262144" or "echo 262144 > /proc/sys/vm/max_map_count" as root to change max_map_count. When this setting is true, you must set chunk_reserved_bytes_limit to a relatively large number, otherwise the performance will be very bad.

@@ -1386,7 +1386,7 @@ udf function directory

### `webserver_num_workers`

-Default:48
+Default: 48

Webserver default number of worker threads

@@ -1398,7 +1398,7 @@ Webserver default number of worker threads

### `write_buffer_size`

-Default:104857600
+Default: 104857600

The size of the buffer before flushing

@@ -1486,7 +1486,7 @@ The default value is currently only an empirical value, and may need to be modif

### `auto_refresh_brpc_channel`

* Type: bool
-* Description: When obtaining a brpc connection, judge the availability of the connection through hand_shake rpc, and re-establish the connection if it is not available 。
+* Description: When obtaining a brpc connection, judge the availability of the connection through hand_shake rpc, and re-establish the connection if it is not available.
* Default value: false

### `high_priority_flush_thread_num_per_store`

diff --git a/docs/en/administrator-guide/config/fe_config.md b/docs/en/administrator-guide/config/fe_config.md
index 0d6382a6c2..f34658653c 100644
--- a/docs/en/administrator-guide/config/fe_config.md
+++ b/docs/en/administrator-guide/config/fe_config.md
@@ -124,17 +124,17 @@ There are two ways to configure FE configuration items:

### max_dynamic_partition_num

-Default:500
+Default: 500

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Used to limit the maximum number of partitions that can be created when creating a dynamic partition table, to avoid creating too many partitions at one time. The number is determined by "start" and "end" in the dynamic partition parameters.

### grpc_max_message_size_bytes

-Default:1G
+Default: 1G

Used to set the initial flow window size of the GRPC client channel, and also the max message size. When the result set is large, you may need to increase this value.

@@ -152,49 +152,49 @@ Used to set maximal number of replication per tablet.

### enable_outfile_to_local

-Default:false
+Default: false

Whether to allow the outfile function to export the results to the local disk.

### enable_access_file_without_broker

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

This config is used to try to skip the broker when accessing bos or other cloud storage via broker

### enable_bdbje_debug_mode

-Default:false
+Default: false

If set to true, FE will be started in BDBJE debug mode

### enable_alpha_rowset

-Default:false
+Default: false

Whether to support the creation of alpha rowset tables. The default is false and it should only be used in emergency situations; this config should be removed in some future version

### enable_http_server_v2

-Default:The default is true after the official 0.14.0 version is released, and the default is false before
+Default: The default is true after the official 0.14.0 version is released, and the default is false before

HTTP Server V2 is implemented by SpringBoot. It uses an architecture that separates the front and back ends. Only when httpv2 is enabled can users use the new front-end UI interface.
### jetty_server_acceptors

-Default:2
+Default: 2

### jetty_server_selectors

-Default:4
+Default: 4

### jetty_server_workers

-Default:0
+Default: 0

With the above three parameters, Jetty's thread architecture model is very simple: it is divided into three thread pools, acceptors, selectors and workers. Acceptors are responsible for accepting new connections and then handing them over to selectors to process the unpacking of the HTTP message protocol; finally, workers process the request. The first two thread pools adopt a non-blocking model, and one thread can handle the reads and writes of many sockets, so the number of threads in these pools is small.

@@ -212,13 +212,13 @@ The maximum number of threads in the Jetty thread pool, the default is 400

### jetty_server_max_http_post_size

-Default:100 * 1024 * 1024 (100MB)
+Default: 100 * 1024 * 1024 (100MB)

This is the maximum number of bytes of a file uploaded by the PUT or POST method, default value: 100MB

### **`disable_mini_load`**

-Whether to disable the mini load data import method, the default:true (Disabled)
+Whether to disable the mini load data import method, the default: true (disabled)

### frontend_address

Status: Deprecated, not recommended for use. This parameter may be deleted later. Type: string.

### default_max_filter_ratio

-Default:0
+Default: 0

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Maximum percentage of data that can be filtered (due to reasons such as irregular data). The default value is 0.

### default_db_data_quota_bytes

-Default:1PB
+Default: 1PB

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Used to set the default database data quota size. To set the quota size of a single database, you can use:

```
Set the database data quota, the unit is: B/K/KB/M/MB/G/GB/T/TB/P/PB
ALTER DATABASE db_name SET DATA QUOTA quota;
View configuration
show data (Detail: HELP SHOW DATA)
```

### default_db_replica_quota_size

Default: 1073741824

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Used to set the default database replica quota. To set the quota size of a single database, you can use:

```
Set the database replica quota
ALTER DATABASE db_name SET REPLICA QUOTA quota;
View configuration
show data (Detail: HELP SHOW DATA)
```

### enable_batch_delete_by_default

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Whether to add a delete sign column when creating a unique table

### recover_with_empty_tablet

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

In some very special circumstances, such as code bugs or human misoperation, all replicas of some tablets may be lost. In this case, the data has been substantially lost. However, in some scenarios, the business still hopes to ensure that queries will not report errors even if there is data loss, and to reduce the perception at the user layer. At this point, we can use blank Tablets to fill the missing replicas to ensure that queries can be executed normally.

@@ -292,41 +292,41 @@ Set to true so that Doris will automatically use blank replicas to fill tablets

### max_allowed_in_element_num_of_delete

-Default:1024
+Default: 1024

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

This configuration is used to limit the element num of InPredicate in a delete statement.

### cache_result_max_row_count

-Default:3000
+Default: 3000

-IsMutable:true
+IsMutable: true

-MasterOnly:false
+MasterOnly: false

In order to avoid occupying too much memory, the maximum number of rows that can be cached is 3000 by default. If this threshold is exceeded, the cache cannot be set

### cache_last_version_interval_second

-Default:900
+Default: 900

-IsMutable:true
+IsMutable: true

-MasterOnly:false
+MasterOnly: false

The time interval of the latest partitioned version of the table refers to the time interval between the data update and the current version. It is generally set to 900 seconds, which distinguishes offline from real-time import

### cache_enable_partition_mode

-Default:true
+Default: true

-IsMutable:true
+IsMutable: true

-MasterOnly:false
+MasterOnly: false

When this switch is turned on, the query result set will be cached according to partitions. If the interval between the query table partition time and the query time is less than cache_last_version_interval_second, the result set will be cached according to partitions.

@@ -334,11 +334,11 @@ Part of the data will be obtained from the cache and some data from the disk whe

### cache_enable_sql_mode

-Default:true
+Default: true

-IsMutable:true
+IsMutable: true

-MasterOnly:false
+MasterOnly: false

If this switch is turned on, the SQL query result set will be cached. If the interval between the last visit version time in all partitions of all tables in the query is greater than cache_last_version_interval_second, and the result set is less than cache_result_max_row_count, the result set will be cached, and the next identical SQL will hit the cache

@@ -351,11 +351,11 @@ If set to true, fe will enable sql result caching. This option is suitable for o

### min_clone_task_timeout_sec and max_clone_task_timeout_sec

-Default:Minimum 3 minutes, maximum two hours
+Default: Minimum 3 minutes, maximum two hours

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Type: long Description: Used to control the maximum timeout of a clone task. The unit is seconds. Default value: 7200 Dynamic modification: yes
Can cooperate with `min_clone_task_timeout_sec` to control the maximum and minimum timeout of a clone task.

### agent_task_resend_wait_time_ms

-Default:5000
+Default: 5000

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

This configuration decides whether to resend an agent task when create_time for the agent_task is set; only when current_time - create_time > agent_task_resend_wait_time_ms can ReportHandler resend the agent task.

@@ -379,41 +379,41 @@ But at the same time, it will cause the submission of failed or failed execution

### enable_odbc_table

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Whether to enable the ODBC table; it is not enabled by default. You need to manually configure it when you use it. This parameter can be set by: ADMIN SET FRONTEND CONFIG("key"="value")

### enable_spark_load

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Whether to enable spark load temporarily; it is not enabled by default

### disable_storage_medium_check

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

If disable_storage_medium_check is true, ReportHandler will not check the tablet's storage medium and will disable the storage cooldown function; the default value is false. You can set the value to true when you don't care what the storage medium of the tablet is.

### drop_backend_after_decommission

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

1. This configuration is used to control whether the system drops the BE after successfully decommissioning the BE. If true, the BE node will be deleted after the BE is successfully offline. If false, after the BE successfully goes offline, the BE will remain in the DECOMMISSION state, but will not be dropped.

@@ -426,31 +426,31 @@

### period_of_auto_resume_min

-Default:5 (s)
+Default: 5 (s)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Automatically restore the cycle of Routine load

### max_tolerable_backend_down_num

-Default:0
+Default: 0

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

As long as one BE is down, Routine Load cannot be automatically restored

### enable_materialized_view

-Default:true
+Default: true

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

This configuration is used to turn the creation of materialized views on and off. If set to true, the function to create materialized views is enabled. Users can create materialized views through the `CREATE MATERIALIZED VIEW` command. If set to false, materialized views cannot be created.

@@ -460,47 +460,47 @@ This variable is a dynamic configuration, and users can modify the configuration

### check_java_version

-Default:true
+Default: true

Doris will check whether the compiled and runtime Java versions are compatible; if not, it will throw a Java version mismatch exception message and terminate the startup

### max_running_rollup_job_num_per_table

-Default:1
+Default: 1

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Control the concurrency limit of Rollup jobs

### dynamic_partition_enable

-Default:true
+Default: true

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Whether to enable dynamic partitioning, enabled by default

### dynamic_partition_check_interval_seconds

-Default:600 (s)
+Default: 600 (s)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Decide how often to check dynamic partitions

### disable_cluster_feature

-Default:true
+Default: true

-IsMutable:true
+IsMutable: true

-The multi cluster feature will be deprecated in version 0.12 ,set this config to true will disable all operations related to cluster feature, include:
+The multi cluster feature will be deprecated in version 0.12. Setting this config to true will disable all operations related to the cluster feature, including:

create/drop cluster
add free backend/add backend to cluster/decommission cluster balance
change the backends num of cluster

@@ -508,31 +508,31 @@

### force_do_metadata_checkpoint

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

If set to true, the checkpoint thread will make the checkpoint regardless of the jvm memory used percent

### metadata_checkpoint_memory_threshold

-Default:60 (60%)
+Default: 60 (60%)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

If the jvm memory used percent (heap or old mem pool) exceeds this threshold, the checkpoint thread will not work, to avoid OOM.

### max_distribution_pruner_recursion_depth

-Default:100
+Default: 100

-IsMutable:true
+IsMutable: true

-MasterOnly:false
+MasterOnly: false

This limits the max recursion depth of the hash distribution pruner. eg: where a in (5 elements) and b in (4 elements) and c in (3 elements) and d in (2 elements).

@@ -548,73 +548,73 @@ This configuration is mainly used to control the number of backup/restore tasks

### using_old_load_usage_pattern

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

If set to true, an insert stmt that encounters a processing error will still return a label to the user, and the user can use this label to check the load job's status. The default value is false, which means that if the insert operation encounters an error, an exception will be thrown to the user client directly, without a load label.

### small_file_dir

-Default:DORIS_HOME_DIR/small_files
+Default: DORIS_HOME_DIR/small_files

Directory for saving small files

### max_small_file_size_bytes

-Default:1M
+Default: 1M

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

The max size of a single file stored in SmallFileMgr

### max_small_file_number

-Default:100
+Default: 100

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

The max number of files stored in SmallFileMgr

### max_routine_load_task_num_per_be

-Default:5
+Default: 5

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

The max concurrent routine load task num per BE. This limits the number of routine load tasks sent to a BE, and it should also be less than the BE config 'routine_load_thread_pool_size' (default 10), which is the routine load task thread pool size on the BE.

### max_routine_load_task_concurrent_num

-Default:5
+Default: 5

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

The max concurrent routine load task num of a single routine load job

### max_routine_load_job_num

-Default:100
+Default: 100

The max routine load job num, including NEED_SCHEDULED, RUNNING, PAUSE

### max_running_txn_num_per_db

-Default:100
+Default: 100

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

This configuration is mainly used to control the number of concurrent load jobs in the same database.

@@ -630,17 +630,17 @@ Generally it is not recommended to increase this configuration value. An excessi

### enable_metric_calculator

-Default:true
+Default: true

If set to true, the metric collector will run as a daemon timer to collect metrics at a fixed interval

### report_queue_size

Default: 100

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

This threshold is to avoid piling up too many report tasks in FE, which may cause OOM exceptions. In some large Doris clusters, eg: 100 Backends with ten million replicas, a tablet report may cost several seconds after some modification of metadata (drop partition, etc.). And each Backend will report tablet info every 1 min, so unlimited receiving of reports is unacceptable. We will optimize the processing speed of tablet reports in the future, but for now, the report is just discarded if the queue size exce [...]

Some online time cost:

@@ -651,85 +651,85 @@ MasterOnly:true

### partition_rebalance_max_moves_num_per_selection

-Default:10
+Default: 10

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

-Valid only if use PartitionRebalancer,
+Valid only if PartitionRebalancer is used.

### partition_rebalance_move_expire_after_access

-Default:600 (s)
+Default: 600 (s)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Valid only if PartitionRebalancer is used. If this changes, cached moves will be cleared

### tablet_rebalancer_type

-Default:BeLoad
+Default: BeLoad

-MasterOnly:true
+MasterOnly: true

Rebalancer type (ignore case): BeLoad, Partition. If the type fails to parse, BeLoad is used as default

### max_balancing_tablets

-Default:100
+Default: 100

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

If the number of balancing tablets in TabletScheduler exceeds max_balancing_tablets, no more balance checks are done

### max_scheduling_tablets

-Default:2000
+Default: 2000

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

If the number of scheduled tablets in TabletScheduler exceeds max_scheduling_tablets, checking is skipped.

### disable_balance

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

If set to true, TabletScheduler will not do balance.
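Since disable_balance is marked IsMutable: true, it can also be changed at runtime on the Master FE using the statement quoted earlier in this document (`ADMIN SET FRONTEND CONFIG("key"="value")`); a hedged example:

```sql
-- Illustrative only: temporarily stop TabletScheduler balancing
ADMIN SET FRONTEND CONFIG ("disable_balance" = "true");
```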
### balance_load_score_threshold -Default:0.1 (10%) +Default: 0.1 (10%) -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true the threshold of cluster balance score, if a backend's load score is 10% lower than average score, this backend will be marked as LOW load, if load score is 10% higher than average score, HIGH load will be marked ### schedule_slot_num_per_path -Default:2 +Default: 2 the default slot number per path in tablet scheduler , remove this config and dynamically adjust it by clone task statistic ### tablet_repair_delay_factor_second -Default:60 (s) +Default: 60 (s) -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true the factor of delay time before deciding to repair tablet. if priority is VERY_HIGH, repair it immediately. @@ -739,27 +739,27 @@ the factor of delay time before deciding to repair tablet. if priority is VERY_ ### es_state_sync_interval_second -Default:10 +Default: 10 fe will call es api to get es index shard info every es_state_sync_interval_secs ### disable_hadoop_load -Default:false +Default: false -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true Load using hadoop cluster will be deprecated in future. Set to true to disable this kind of load. ### db_used_data_quota_update_interval_secs -Default:300 (s) +Default: 300 (s) -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true For better data load performance, in the check of whether the amount of data used by the database before data load exceeds the quota, we do not calculate the amount of data already used by the database in real time, but obtain the periodically updated value of the daemon thread. @@ -767,11 +767,11 @@ This configuration is used to set the time interval for updating the value of th ### disable_load_job -Default:false +Default: false -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true if this is set to true @@ -781,33 +781,33 @@ if this is set to true ### catalog_try_lock_timeout_ms -Default:5000 (ms) +Default: 5000 (ms) -IsMutable:true +IsMutable: true The tryLock timeout configuration of catalog lock. Normally it does not need to change, unless you need to test something. ### max_query_retry_time -Default:1 +Default: 1 -IsMutable:true +IsMutable: true The number of query retries. A query may retry if we encounter RPC exception and no result has been sent to user. You may reduce this number to avoid Avalanche disaster ### remote_fragment_exec_timeout_ms -Default:5000 (ms) +Default: 5000 (ms) -IsMutable:true +IsMutable: true -The timeout of executing async remote fragment. In normal case, the async remote fragment will be executed in a short time. If system are under high load condition,try to set this timeout longer. +The timeout of executing async remote fragment. In normal case, the async remote fragment will be executed in a short time. If system are under high load condition,try to set this timeout longer. ### enable_local_replica_selection -Default:false +Default: false -IsMutable:true +IsMutable: true If set to true, Planner will try to select replica of tablet on same host as this Frontend. This may reduce network transmission in following case: @@ -818,63 +818,63 @@ If set to true, Planner will try to select replica of tablet on same host as thi ### enable_local_replica_selection_fallback -Default:false +Default: false -IsMutable:true +IsMutable: true Used with enable_local_replica_selection. If the local replicas is not available, fallback to the nonlocal replicas. 
### max_unfinished_load_job -Default:1000 +Default: 1000 -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true Max number of load jobs, include PENDING、ETL、LOADING、QUORUM_FINISHED. If exceed this number, load job is not allowed to be submitted ### max_bytes_per_broker_scanner -Default:3 * 1024 * 1024 * 1024L (3G) +Default: 3 * 1024 * 1024 * 1024L (3G) -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true Max bytes a broker scanner can process in one broker load job. Commonly, each Backends has one broker scanner. ### enable_auth_check -Default:true +Default: true if set to false, auth check will be disable, in case some goes wrong with the new privilege system. ### tablet_stat_update_interval_second -Default:300,(5min) +Default: 300,(5min) update interval of tablet stat , All frontends will get tablet stat from all backends at each interval ### storage_flood_stage_usage_percent -Default:95 (95%) +Default: 95 (95%) -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true ### storage_flood_stage_left_capacity_bytes -Default: +Default: storage_flood_stage_usage_percent : 95 (95%) storage_flood_stage_left_capacity_bytes : 1 * 1024 * 1024 * 1024 (1GB) -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true If capacity of disk reach the 'storage_flood_stage_usage_percent' and 'storage_flood_stage_left_capacity_bytes', the following operation will be rejected: @@ -883,59 +883,59 @@ If capacity of disk reach the 'storage_flood_stage_usage_percent' and 'storage_ ### storage_high_watermark_usage_percent -Default:85 (85%) +Default: 85 (85%) -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true ### storage_min_left_capacity_bytes -Default: 2 * 1024 * 1024 * 1024 (2GB) +Default: 2 * 1024 * 1024 * 1024 (2GB) -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true 'storage_high_watermark_usage_percent' limit the max capacity usage percent of a Backend storage path. 'storage_min_left_capacity_bytes' limit the minimum left capacity of a Backend storage path. If both limitations are reached, this storage path can not be chose as tablet balance destination. But for tablet recovery, we may exceed these limit for keeping data integrity as much as possible. 
### backup_job_default_timeout_ms -Default:86400 * 1000 (1day) +Default: 86400 * 1000 (1day) -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true default timeout of backup job ### with_k8s_certs -Default:false +Default: false If use k8s deploy manager locally, set this to true and prepare the certs files ### dpp_hadoop_client_path -Default:/lib/hadoop-client/hadoop/bin/hadoop +Default: /lib/hadoop-client/hadoop/bin/hadoop ### dpp_bytes_per_reduce -Default:100 * 1024 * 1024L; // 100M +Default: 100 * 1024 * 1024L; // 100M ### dpp_default_cluster -Default:palo-dpp +Default: palo-dpp ### dpp_default_config_str -Default:{ +Default: { hadoop_configs : 'mapred.job.priority=NORMAL;mapred.job.map.capacity=50;mapred.job.reduce.capacity=50;mapred.hce.replace.streaming=false;abaci.long.stored.job=true;dce.shuffle.enable=false;dfs.client.authserver.force_stop=true;dfs.client.auth.method=0' } ### dpp_config_str -Default:{ +Default: { palo-dpp : { hadoop_palo_path : '/dir', hadoop_configs : 'fs.default.name=hdfs://host:port;mapred.job.tracker=host:port;hadoop.job.ugi=user,password' @@ -944,7 +944,7 @@ Default:{ ### enable_deploy_manager -Default:disable +Default: disable Set to true if you deploy Palo using thirdparty deploy manager Valid options are: @@ -955,47 +955,47 @@ Default:disable ### enable_token_check -Default:true +Default: true For forward compatibility, will be removed later. check token when download image file. ### expr_depth_limit -Default:3000 +Default: 3000 -IsMutable:true +IsMutable: true Limit on the depth of an expr tree. Exceed this limit may cause long analysis time while holding db read lock. Do not set this if you know what you are doing ### expr_children_limit -Default:10000 +Default: 10000 -IsMutable:true +IsMutable: true Limit on the number of expr children of an expr tree. Exceed this limit may cause long analysis time while holding database read lock. ### proxy_auth_magic_prefix -Default:x@8 +Default: x@8 ### proxy_auth_enable -Default:false +Default: false ### meta_publish_timeout_ms -Default:1000 (ms) +Default: 1000 (ms) The default user resource publishing timeout ### disable_colocate_balance -Default:false +Default: false -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true This configs can set to true to disable the automatic colocate tables's relocate and balance. If 'disable_colocate_balance' is set to true, ColocateTableBalancer will not relocate and balance colocate tables. **Attention**: @@ -1006,87 +1006,87 @@ This configs can set to true to disable the automatic colocate tables's relocate ### query_colocate_join_memory_limit_penalty_factor -Default:1 +Default: 1 -IsMutable:true +IsMutable: true colocote join PlanFragment instance的memory_limit = exec_mem_limit / min (query_colocate_join_memory_limit_penalty_factor, instance_num) ### max_connection_scheduler_threads_num -Default:4096 +Default: 4096 Maximal number of thread in connection-scheduler-pool. ### qe_max_connection -Default:1024 +Default: 1024 Maximal number of connections per FE. ### check_consistency_default_timeout_second -Default:600 (10分钟) +Default: 600 (10分钟) -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true Default timeout of a single consistency check task. Set long enough to fit your tablet size ### consistency_check_start_time -Default:23 +Default: 23 -IsMutable:true +IsMutable: true -MasterOnly:true +MasterOnly: true Consistency checker will run from *consistency_check_start_time* to *consistency_check_end_time*. 
Default is from 23:00 to 04:00.

### consistency_check_end_time

-Default:04
+Default: 04

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Consistency checker will run from *consistency_check_start_time* to *consistency_check_end_time*. Default is from 23:00 to 04:00.

### export_tablet_num_per_task

-Default:5
+Default: 5

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Number of tablets per export query plan.

### export_task_default_timeout_second

-Default:2 * 3600 (2 hour)
+Default: 2 * 3600 (2 hour)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Default timeout of export jobs.

### export_running_job_num_limit

-Default:5
+Default: 5

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Limitation of the concurrency of running export jobs. Default is 5. 0 is unlimited.

### export_checker_interval_second

-Default:5
+Default: 5

Export checker's running interval.

@@ -1094,102 +1094,102 @@ Export checker's running interval.

Default: 1

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Default parallelism of the broker load execution plan on a single node. If the user sets the parallelism when the broker load is submitted, this parameter will be ignored.

### max_broker_concurrency

-Default:10
+Default: 10

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Maximal concurrency of broker scanners.

### min_bytes_per_broker_scanner

-Default:67108864L (64M)
+Default: 67108864L (64M)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Minimum bytes that a single broker scanner will read.

### catalog_trash_expire_second

-Default:86400L (1day)
+Default: 86400L (1day)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

After dropping a database (table/partition), you can recover it by using the RECOVER stmt. This specifies the maximal data retention time. After that time, the data will be deleted permanently.

### storage_cooldown_second

-Default:30 * 24 * 3600L (30day)
+Default: 30 * 24 * 3600L (30day)

When creating a table (or partition), you can specify its storage medium (HDD or SSD). If set to SSD, this specifies the default duration that tablets will stay on SSD. After that, tablets will be moved to HDD automatically. You can set the storage cooldown time in the CREATE TABLE stmt (see the sketch below).

### default_storage_medium

-Default:HDD
+Default: HDD

When creating a table (or partition), you can specify its storage medium (HDD or SSD). If not set, this specifies the default medium at creation.

### max_backend_down_time_second

-Default:3600 (1hour)
+Default: 3600 (1hour)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

If a backend is down for *max_backend_down_time_second*, a BACKEND_DOWN event will be triggered.

### alter_table_timeout_second

-Default:86400 (1day)
+Default: 86400 (1day)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Maximal timeout of ALTER TABLE request. Set long enough to fit your table data size.

### capacity_used_percent_high_water

-Default:0.75 (75%)
+Default: 0.75 (75%)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

The high watermark of disk capacity used percent. This is used for calculating the load score of a backend.

### clone_distribution_balance_threshold

-Default:0.2
+Default: 0.2

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Balance threshold of the number of replicas in Backends.
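As a hedged illustration of `storage_cooldown_second` and `default_storage_medium` above, a table can pin its own medium and cooldown time at creation; a minimal sketch (database, table and column names are made up for illustration):

```sql
-- Tablets stay on SSD until the cooldown time, then migrate to HDD automatically.
CREATE TABLE example_db.cooldown_demo (
    k1 INT,
    v1 INT
)
DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 10
PROPERTIES (
    "storage_medium" = "SSD",
    "storage_cooldown_time" = "2022-06-01 00:00:00"
);
```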
### clone_capacity_balance_threshold

-Default:0.2
+Default: 0.2

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Balance threshold of data size in BE. The balance algorithm is:

@@ -1201,88 +1201,88 @@ Balance threshold of data size in BE.

### replica_delay_recovery_second

-Default:0
+Default: 0

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

The minimal delay in seconds between a replica failing and FE trying to recover it using clone.

### clone_high_priority_delay_second

-Default:0
+Default: 0

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

HIGH priority clone job's delay trigger time.

### clone_normal_priority_delay_second

-Default:300 (5min)
+Default: 300 (5min)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

NORMAL priority clone job's delay trigger time.

### clone_low_priority_delay_second

-Default:600 (10min)
+Default: 600 (10min)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

LOW priority clone job's delay trigger time. A clone job contains a tablet which needs to be cloned (recovery or migration). If the priority is LOW, it will be delayed *clone_low_priority_delay_second* after the job creation and then be executed. This is to avoid a large number of clone jobs running at the same time only because a host is down for a short time.

**NOTICE** that this config (and *clone_normal_priority_delay_second* as well) will not work if it is smaller than *clone_checker_interval_second*.

### clone_max_job_num

-Default:100
+Default: 100

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Concurrency of LOW priority clone jobs. Concurrency of HIGH priority clone jobs is currently unlimited.

### clone_job_timeout_second

-Default:7200 (2小时)
+Default: 7200 (2 hours)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Default timeout of a single clone job. Set long enough to fit your replica size. The larger the replica data size is, the more time it will cost to finish the clone.

### clone_checker_interval_second

-Default:300 (5min)
+Default: 300 (5min)

Clone checker's running interval.

### tablet_delete_timeout_second

-Default:2
+Default: 2

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Same meaning as *tablet_create_timeout_second*, but used when deleting a tablet.

### async_loading_load_task_pool_size

-Default:10
+Default: 10

-IsMutable:false
+IsMutable: false

-MasterOnly:true
+MasterOnly: true

The loading_load task executor pool size. This pool size limits the max running loading_load tasks.

@@ -1290,11 +1290,11 @@ Currently, it only limits the loading_load task of broker load

### async_pending_load_task_pool_size

-Default:10
+Default: 10

-IsMutable:false
+IsMutable: false

-MasterOnly:true
+MasterOnly: true

The pending_load task executor pool size. This pool size limits the max running pending_load tasks.

It should be less than 'max_running_txn_num_per_db'.

### async_load_task_pool_size

-Default:10
+Default: 10

-IsMutable:false
+IsMutable: false

-MasterOnly:true
+MasterOnly: true

This configuration is just for compatibility with old versions. This config has been replaced by async_loading_load_task_pool_size and will be removed in the future.

### disable_show_stream_load

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Whether to disable show stream load and clear stream load records in memory.
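A hedged sketch of the switch just described (whether the `SHOW STREAM LOAD` statement is available depends on the Doris version, and `example_db` is a placeholder):

```sql
-- Keep in-memory stream load records queryable (false is the default).
ADMIN SET FRONTEND CONFIG ("disable_show_stream_load" = "false");

-- Recent stream load jobs can then be listed from FE memory.
SHOW STREAM LOAD FROM example_db;
```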
### max_stream_load_record_size

-Default:5000
+Default: 5000

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Default max number of recent stream load records that can be stored in memory.

### fetch_stream_load_record_interval_second

-Default:120
+Default: 120

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Fetch stream load record interval.

### desired_max_waiting_jobs

-Default:100
+Default: 100

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

-Default number of waiting jobs for routine load and version 2 of load , This is a desired number. In some situation, such as switch the master, the current number is maybe more than desired_max_waiting_jobs.
+Default number of waiting jobs for routine load and version 2 of load. This is a desired number. In some situations, such as switching the master, the current number may be more than desired_max_waiting_jobs.

### yarn_config_dir

-Default:PaloFe.DORIS_HOME_DIR + "/lib/yarn-config"
+Default: PaloFe.DORIS_HOME_DIR + "/lib/yarn-config"

-Default yarn config file directory ,Each time before running the yarn command, we need to check that the config file exists under this path, and if not, create them.
+Default yarn config file directory. Each time before running the yarn command, we need to check that the config file exists under this path, and if not, create it.

### yarn_client_path

-Default:DORIS_HOME_DIR + "/lib/yarn-client/hadoop/bin/yarn"
+Default: DORIS_HOME_DIR + "/lib/yarn-client/hadoop/bin/yarn"

Default yarn client path.

### spark_launcher_log_dir

-Default: sys_log_dir + "/spark_launcher_log"
+Default: sys_log_dir + "/spark_launcher_log"

The specified spark launcher log dir.

### spark_resource_path

-Default:none
+Default: none

Default spark dependencies path.

### spark_home_default_dir

-Default:DORIS_HOME_DIR + "/lib/spark2x"
+Default: DORIS_HOME_DIR + "/lib/spark2x"

Default spark home dir.

### spark_load_default_timeout_second

-Default:86400 (1天)
+Default: 86400 (1 day)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Default spark load timeout.

### spark_dpp_version

-Default:1.0.0
+Default: 1.0.0

Default spark dpp version.

### hadoop_load_default_timeout_second

-Default:86400 * 3 (3天)
+Default: 86400 * 3 (3 days)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Default hadoop load timeout.

### min_load_timeout_second

-Default:1 (1s)
+Default: 1 (1s)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Min stream load timeout applicable to all types of load.

### max_stream_load_timeout_second

-Default:259200 (3天)
+Default: 259200 (3 days)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

This configuration is specifically used to limit timeout setting for stream load.
It is to prevent that failed stream load transactions cannot be canceled within a short time because of the user's large timeout setting.

### max_load_timeout_second

-Default:259200 (3天)
+Default: 259200 (3 days)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Max load timeout applicable to all types of load except for stream load.

### stream_load_default_timeout_second

-Default:600 (s)
+Default: 600 (s)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Default stream load and streaming mini load timeout.

### insert_load_default_timeout_second

-Default:3600 (1 hour)
+Default: 3600 (1 hour)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Default insert load timeout.

### mini_load_default_timeout_second

-Default:3600 (1 hour)
+Default: 3600 (1 hour)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Default non-streaming mini load timeout.

### broker_load_default_timeout_second

-Default:14400 (4 hour)
+Default: 14400 (4 hour)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Default broker load timeout.

### load_running_job_num_limit

-Default:0
+Default: 0

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

The number of loading tasks is limited; the default is 0, meaning no limit.

### load_input_size_limit_gb

-Default:0
+Default: 0

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

The size of the data input by the Load job; the default is 0, meaning unlimited.

### delete_thread_num

-Default:10
+Default: 10

Concurrency of delete jobs.

### load_etl_thread_num_normal_priority

-Default:10
+Default: 10

Concurrency of NORMAL priority etl load jobs. Do not change this unless you know what you are doing.

### load_etl_thread_num_high_priority

-Default:3
+Default: 3

Concurrency of HIGH priority etl load jobs. Do not change this unless you know what you are doing.

### load_pending_thread_num_normal_priority

-Default:10
+Default: 10

Concurrency of NORMAL priority pending load jobs. Do not change this unless you know what you are doing.

### load_pending_thread_num_high_priority

-Default:3
+Default: 3

Concurrency of HIGH priority pending load jobs. Load job priority is defined as HIGH or NORMAL. All mini batch load jobs are HIGH priority; other types of load jobs are NORMAL priority. Priority is set to avoid a slow load job occupying a thread for a long time. This is just an internal optimized scheduling policy. Currently, you cannot specify the job priority manually, and do not change this unless you know what you are doing.

### load_checker_interval_second

-Default:5 (s)
+Default: 5 (s)

The load scheduler running interval. A load job will transfer its state from PENDING to LOADING to FINISHED. The load scheduler will transfer a load job from PENDING to LOADING, while the txn callback will transfer the load job from LOADING to FINISHED. So a load job will cost at most one interval to finish when the concurrency has not reached the upper limit.

### max_layout_length_per_row

-Default:100000
+Default: 100000

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Maximal memory layout length of a row. Default is 100 KB. In BE, the maximal size of a RowBlock is 100MB (configured as max_unpacked_row_block_size in be.conf). And each RowBlock contains 1024 rows. So the maximal size of a row is approximately 100 KB.

eg.

@@ -1553,11 +1553,11 @@ Maximal memory layout length of a row.
In BE, the maximal siz

### load_straggler_wait_second

-Default:300
+Default: 300

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Maximal wait seconds for a straggler node in load.

eg.

@@ -1572,43 +1572,43 @@ Maximal wait seconds for straggler node in load

### thrift_server_max_worker_threads

-Default:4096
+Default: 4096

The thrift server max worker threads.

### publish_version_interval_ms

-Default:10 (ms)
+Default: 10 (ms)

Minimal interval between two publish version actions.

### publish_version_timeout_second

-Default:30 (s)
+Default: 30 (s)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Maximal waiting time for all publish version tasks of one transaction to be finished.

### max_create_table_timeout_second

-Default:60 (s)
+Default: 60 (s)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

In order not to wait too long for create table (index), set a max timeout.

### tablet_create_timeout_second

-Default:1(s)
+Default: 1 (s)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Maximal waiting time for creating a single replica.

eg.

@@ -1617,7 +1617,7 @@ Maximal waiting time for creating a single replica.

### max_mysql_service_task_threads_num

-Default:4096
+Default: 4096

When FE starts the MySQL server based on the NIO model, the number of threads responsible for Task events. Takes effect only when `mysql_service_nio_enabled` is true.

@@ -1632,43 +1632,43 @@ This variable is a session variable, and the session level takes effect.

### cluster_id

-Default:-1
+Default: -1

Nodes (FE or BE) will be considered as belonging to the same Palo cluster if they have the same cluster id. Cluster id is usually a random integer generated when the master FE starts for the first time. You can also specify one.

### auth_token

-Default:空
+Default: empty

Cluster token used for internal authentication.

### cluster_name

-Default: Apache doris
+Default: Apache doris

Cluster name will be shown as the title of the web page.

### mysql_service_io_threads_num

-Default:4
+Default: 4

When FE starts the MySQL server based on the NIO model, the number of threads responsible for IO events. Takes effect only when `mysql_service_nio_enabled` is true.

### mysql_service_nio_enabled

-Default:true
+Default: true

Whether FE starts the MySQL server based on the NIO model. It is recommended to turn off this option when the query connection is less than 1000 or the concurrency scenario is not high.

### query_port

-Default:9030
+Default: 9030

FE MySQL server port.

### rpc_port

-Default:9020
+Default: 9020

FE Thrift Server port.

@@ -1684,13 +1684,13 @@ If this parameter is `THREAD_POOL`, then the `TThreadPoolServer` model is used,

### thrift_backlog_num

-Default:1024
+Default: 1024

-The backlog_num for thrift server , When you enlarge this backlog_num, you should ensure it's value larger than the linux /proc/sys/net/core/somaxconn config
+The backlog_num for thrift server. When you enlarge this backlog_num, you should ensure its value is larger than the linux /proc/sys/net/core/somaxconn config.

### thrift_client_timeout_ms

-Default:0
+Default: 0

The connection timeout and socket timeout config for thrift server.
@@ -1698,35 +1698,35 @@ The value for thrift_client_timeout_ms is set to be larger than zero to prevent

### mysql_nio_backlog_num

-Default:1024
+Default: 1024

The backlog_num for mysql nio server. When you enlarge this backlog_num, you should enlarge the value in the linux /proc/sys/net/core/somaxconn file at the same time.

### http_backlog_num

-Default:1024
+Default: 1024

The backlog_num for netty http server. When you enlarge this backlog_num, you should enlarge the value in the linux /proc/sys/net/core/somaxconn file at the same time.

### http_max_line_length

-Default:4096
+Default: 4096

The max length of an HTTP URL. The unit of this configuration is BYTE. Defaults to 4096.

### http_max_header_size

-Default:8192
+Default: 8192

The max size of allowed HTTP headers. The unit of this configuration is BYTE. Defaults to 8192.

### http_max_chunk_size

-Default:8192
+Default: 8192

### http_port

-Default:8030
+Default: 8030

HTTP bind port. Defaults to 8030.

@@ -1738,128 +1738,128 @@ The default is empty, that is, not set

### max_bdbje_clock_delta_ms

-Default:5000 (5s)
+Default: 5000 (5s)

Set the maximum acceptable clock skew between a non-master FE host and the Master FE host. This value is checked whenever a non-master FE establishes a connection to the master FE via BDBJE. The connection is abandoned if the clock skew is larger than this value.

### ignore_meta_check

-Default:false
+Default: false

-IsMutable:true
+IsMutable: true

If true, a non-master FE will ignore the metadata delay gap between the Master FE and itself, even if the metadata delay gap exceeds *meta_delay_toleration_second*. The non-master FE will still offer read service. This is helpful when you try to stop the Master FE for a relatively long time for some reason, but still wish the non-master FEs to offer read service.

### metadata_failure_recovery

-Default:false
+Default: false

If true, FE will reset the bdbje replication group (that is, remove all electable nodes info) and is supposed to start as Master. If all the electable nodes cannot start, we can copy the metadata to another node and set this config to true to try to restart the FE.

### priority_networks

-Default:none
+Default: none

-Declare a selection strategy for those servers have many ips. Note that there should at most one ip match this list. this is a list in semicolon-delimited format, in CIDR notation, e.g. 10.10.10.0/24 , If no ip match this rule, will choose one randomly..
+Declare a selection strategy for servers that have many IPs. Note that at most one IP should match this list. This is a list in semicolon-delimited format, in CIDR notation, e.g. 10.10.10.0/24. If no IP matches this rule, one will be chosen randomly.

### txn_rollback_limit

-Default:100
+Default: 100

The max txn number which bdbje can roll back when trying to rejoin the group.

### max_agent_task_threads_num

-Default:4096
+Default: 4096

-MasterOnly:true
+MasterOnly: true

Max number of threads to handle agent tasks in the agent task thread-pool.

### heartbeat_mgr_blocking_queue_size

-Default:1024
+Default: 1024

-MasterOnly:true
+MasterOnly: true

Blocking queue size to store heartbeat tasks in heartbeat_mgr.

### heartbeat_mgr_threads_num

-Default:8
+Default: 8

-MasterOnly:true
+MasterOnly: true

Number of threads to handle heartbeat events in heartbeat_mgr.

### bdbje_replica_ack_timeout_second

-Default:10 (s)
+Default: 10 (s)

-The replica ack timeout when writing to bdbje , When writing some relatively large logs, the ack time may time out, resulting in log writing failure.
At this time, you can increase this value appropriately.
+The replica ack timeout when writing to bdbje. When writing some relatively large logs, the ack time may time out, resulting in log writing failure. At this time, you can increase this value appropriately.

### bdbje_lock_timeout_second

-Default:1
+Default: 1

-The lock timeout of bdbje operation, If there are many LockTimeoutException in FE WARN log, you can try to increase this value
+The lock timeout of bdbje operations. If there are many LockTimeoutException in the FE WARN log, you can try to increase this value.

### bdbje_heartbeat_timeout_second

-Default:30
+Default: 30

The heartbeat timeout of bdbje between master and follower. The default is 30 seconds, which is the same as the default value in bdbje. If the network is experiencing transient problems, or some unexpected long Java GC is annoying you, you can try to increase this value to decrease the chance of false timeouts.

### replica_ack_policy

-Default:SIMPLE_MAJORITY
+Default: SIMPLE_MAJORITY

-OPTION:ALL, NONE, SIMPLE_MAJORITY
+OPTION: ALL, NONE, SIMPLE_MAJORITY

Replica ack policy of bdbje. For more info, see: http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Durability.ReplicaAckPolicy.html

### replica_sync_policy

-Default:SYNC
+Default: SYNC

-选项:SYNC, NO_SYNC, WRITE_NO_SYNC
+Options: SYNC, NO_SYNC, WRITE_NO_SYNC

Follower FE sync policy of bdbje.

### master_sync_policy

-Default:SYNC
+Default: SYNC

-选项:SYNC, NO_SYNC, WRITE_NO_SYNC
+Options: SYNC, NO_SYNC, WRITE_NO_SYNC

Master FE sync policy of bdbje. If you only deploy one Follower FE, set this to 'SYNC'. If you deploy more than 3 Follower FEs, you can set this and the following 'replica_sync_policy' to WRITE_NO_SYNC. For more info, see: http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Durability.SyncPolicy.html

### meta_delay_toleration_second

-Default:300 (5分钟)
+Default: 300 (5 min)

A non-master FE will stop offering service if the metadata delay gap exceeds *meta_delay_toleration_second*.

### edit_log_roll_num

-Default:50000
+Default: 50000

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Master FE will save an image every *edit_log_roll_num* meta journals.

### edit_log_port

-Default:9010
+Default: 9010

bdbje port

### edit_log_type

-Default:BDB
+Default: BDB

Edit log type.
BDB: write log to bdbje

@@ -1867,13 +1867,13 @@ Edit log type.

### tmp_dir

-Default:PaloFe.DORIS_HOME_DIR + "/temp_dir"
+Default: PaloFe.DORIS_HOME_DIR + "/temp_dir"

The temp dir is used to save intermediate results of some processes, such as the backup and restore process. Files in this dir will be cleaned after these processes finish.

### meta_dir

-Default:DORIS_HOME_DIR + "/doris-meta"
+Default: DORIS_HOME_DIR + "/doris-meta"

Type: string. Description: Doris meta data will be saved here. The storage of this dir is highly recommended to be:

@@ -1882,7 +1882,7 @@ Type: string Description: Doris meta data will be saved here.The storage of this

### custom_config_dir

-Default:PaloFe.DORIS_HOME_DIR + "/conf"
+Default: PaloFe.DORIS_HOME_DIR + "/conf"

Configure the location of the `fe_custom.conf` file. The default is in the `conf/` directory.

@@ -1890,13 +1890,13 @@ In some deployment environments, the `conf/` directory may be overwritten due to

### log_roll_size_mb

-Default:1024 (1G)
+Default: 1024 (1G)

The max size of one sys log and audit log.

### sys_log_dir

-Default:PaloFe.DORIS_HOME_DIR + "/log"
+Default: PaloFe.DORIS_HOME_DIR + "/log"

sys_log_dir: This specifies the FE log dir.
FE will produce 2 log files:

@@ -1905,29 +1905,29 @@ sys_log_dir:

### sys_log_level

-Default:INFO
+Default: INFO

-log level:INFO, WARNING, ERROR, FATAL
+log level: INFO, WARNING, ERROR, FATAL

### sys_log_roll_num

-Default:10
+Default: 10

Maximal FE log files to be kept within a sys_log_roll_interval. Default is 10, which means there will be at most 10 log files in a day.

### sys_log_verbose_modules

-Default:{}
+Default: {}

Verbose modules. VERBOSE level is implemented by log4j DEBUG level.

-eg:
+eg:

sys_log_verbose_modules = org.apache.doris.catalog
This will only print the debug log of files in package org.apache.doris.catalog and all its sub packages.

### sys_log_roll_interval

-Default:DAY
+Default: DAY

sys_log_roll_interval:

@@ -1936,7 +1936,7 @@ sys_log_roll_interval:

### sys_log_delete_age

-Default:7d
+Default: 7d

sys_log_delete_age: Default is 7 days; if a log's last modify time is 7 days ago, it will be deleted.

@@ -1950,40 +1950,40 @@ sys_log_delete_age:

### audit_log_dir

-Default:DORIS_HOME_DIR + "/log"
+Default: DORIS_HOME_DIR + "/log"

-audit_log_dir:
+audit_log_dir:

This specifies the FE audit log dir. The audit log fe.audit.log contains all requests with related info such as user, host, cost, status, etc.

### audit_log_roll_num

-Default:90
+Default: 90

Maximal FE audit log files to be kept within an audit_log_roll_interval.

### audit_log_modules

-Default:{"slow_query", "query", "load", "stream_load"}
+Default: {"slow_query", "query", "load", "stream_load"}

Slow query contains all queries whose cost exceeds *qe_slow_log_ms*.

### qe_slow_log_ms

-Default:5000 (5秒)
+Default: 5000 (5 seconds)

If the response time of a query exceeds this threshold, it will be recorded in the audit log as slow_query.

### audit_log_roll_interval

-Default:DAY
+Default: DAY

-DAY: logsuffix is :yyyyMMdd
-HOUR: logsuffix is :yyyyMMddHH
+DAY: log suffix is yyyyMMdd
+HOUR: log suffix is yyyyMMddHH

### audit_log_delete_age

-Default:30d
+Default: 30d

Default is 30 days; if a log's last modify time is 30 days ago, it will be deleted.

@@ -1995,7 +1995,7 @@ default is 30 days, if log's last modify time is 30 days ago, it will be deleted

### plugin_dir

-Default:DORIS_HOME + "/plugins
+Default: DORIS_HOME + "/plugins"

plugin install directory

@@ -2003,63 +2003,63 @@ plugin install directory

Default:true

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

Whether the plug-in is enabled; enabled by default.

### label_keep_max_second

-Default:3 * 24 * 3600 (3day)
+Default: 3 * 24 * 3600 (3 days)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

-labels of finished or cancelled load jobs will be removed after *label_keep_max_second* , The removed labels can be reused. Set a short time will lower the FE memory usage. (Because all load jobs' info is kept in memory before being removed)
+Labels of finished or cancelled load jobs will be removed after *label_keep_max_second*. The removed labels can be reused. Setting a short time will lower the FE memory usage. (Because all load jobs' info is kept in memory before being removed)

In the case of highly concurrent writes, if there is a large backlog of jobs and calls to the frontend service fail, check the log. If the metadata write is locked for too long, you can adjust this value to 12 hours, or an even shorter 6 hours.

### streaming_label_keep_max_second

-Default:43200 (12 hour)
+Default: 43200 (12 hour)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

For some high-frequency load work, such as: INSERT, STREAMING LOAD, ROUTINE_LOAD_TASK.
If it expires, delete the completed job or task.

### history_job_keep_max_second

-Default:7 * 24 * 3600 (7 day)
+Default: 7 * 24 * 3600 (7 day)

-IsMutable:true
+IsMutable: true

-MasterOnly:true
+MasterOnly: true

The max keep time of some kinds of jobs, like schema change jobs and rollup jobs.

### label_clean_interval_second

-Default:4 * 3600 (4 hour)
+Default: 4 * 3600 (4 hour)

The load label cleaner will run every *label_clean_interval_second* to clean the outdated jobs.

### delete_info_keep_max_second

-Default:3 * 24 * 3600 (3day)
+Default: 3 * 24 * 3600 (3 days)

-IsMutable:true
+IsMutable: true

-MasterOnly:false
+MasterOnly: false

Delete all deleteInfo older than *delete_info_keep_max_second*. Setting a shorter time will reduce FE memory usage and image file size. (Because all deleteInfo is stored in memory and image files before being deleted)

### transaction_clean_interval_second

-Default:30
+Default: 30

The transaction will be cleaned after transaction_clean_interval_second seconds if the transaction is visible or aborted. We should make this interval as short as possible, and each clean cycle as fast as possible.

@@ -2092,7 +2092,7 @@ When there are a large number of replicas waiting to be balanced or repaired in

Default: false

-IsMutable:true
+IsMutable: true

MasterOnly: true

diff --git a/docs/en/administrator-guide/dynamic-partition.md b/docs/en/administrator-guide/dynamic-partition.md
index 04471b38b5..bf4c1d56ef 100644
--- a/docs/en/administrator-guide/dynamic-partition.md
+++ b/docs/en/administrator-guide/dynamic-partition.md
@@ -159,7 +159,7 @@ The rules of dynamic partition are prefixed with `dynamic_partition.`:

The range of reserved history periods. It should be in the form of `[yyyy-MM-dd,yyyy-MM-dd],[...,...]` while the `dynamic_partition.time_unit` is "DAY, WEEK, and MONTH". And it should be in the form of `[yyyy-MM-dd HH:mm:ss,yyyy-MM-dd HH:mm:ss],[...,...]` while the `dynamic_partition.time_unit` is "HOUR". And no more spaces expected. The default value is `"NULL"`, which means it is not set.

-   Let us give an example. Suppose today is 2021-09-06,partitioned by day, and the properties of dynamic partition are set to:
+   Let us give an example. Suppose today is 2021-09-06, partitioned by day, and the properties of dynamic partition are set to:

```time_unit="DAY/WEEK/MONTH", end=3, start=-3, reserved_history_periods="[2020-06-01,2020-06-20],[2020-10-31,2020-11-15]"```.

diff --git a/docs/en/administrator-guide/ldap.md b/docs/en/administrator-guide/ldap.md
index cb3cb25b99..ceaebb7c05 100644
--- a/docs/en/administrator-guide/ldap.md
+++ b/docs/en/administrator-guide/ldap.md
@@ -43,7 +43,7 @@ LDAP group authorization, is to map the group in LDAP to the Role in Doris, if t

You need to configure the LDAP basic information in the fe/conf/ldap.conf file, and the LDAP administrator password needs to be set using sql statements (see the SQL sketch below).

-#### Configure the fe/conf/ldap.conf file:
+#### Configure the fe/conf/ldap.conf file:

* ldap_authentication_enabled = false
  Set the value to "true" to enable LDAP authentication; when the value is "false", LDAP authentication is not enabled and all other configuration items of this profile are invalid.
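As mentioned above, the LDAP admin password is set through SQL rather than in ldap.conf; a minimal sketch (the password literal is a placeholder):

```sql
-- Requires ADMIN privilege; Doris stores only a digest of the password.
SET LDAP_ADMIN_PASSWORD = PASSWORD('your_ldap_admin_password');
```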
@@ -66,7 +66,7 @@ You need to configure the LDAP basic information in the fe/conf/ldap.conf file,

For example, if you use the LDAP user node uid attribute as the username to log into Doris, you can configure it as:
ldap_user_filter = (&(uid={login}));
This item can be configured using the LDAP user mailbox prefix as the user name:

-   ldap_user_filter = (&(mail={login}@baidu.com))。
+   ldap_user_filter = (&(mail={login}@baidu.com)).

* ldap_group_basedn = ou=group,dc=domain,dc=com
  The base dn when Doris searches for group information in LDAP. If this item is not configured, LDAP group authorization will not be enabled.

diff --git a/docs/en/administrator-guide/load-data/binlog-load-manual.md b/docs/en/administrator-guide/load-data/binlog-load-manual.md
index a13e52000d..772162d2c2 100644
--- a/docs/en/administrator-guide/load-data/binlog-load-manual.md
+++ b/docs/en/administrator-guide/load-data/binlog-load-manual.md
@@ -497,7 +497,7 @@ The following configuration belongs to the system level configuration of SyncJob

* `max_bytes_sync_commit`

-  The maximum size of the data when the transaction is committed. If the data size received by Fe is larger than it, it will immediately commit the transaction and send the accumulated data. The default value is 64MB. If you want to modify this configuration, please ensure that this value is greater than the product of `canal.instance.memory.buffer.size` and `canal.instance.memory.buffer.mmemunit` on the canal side (16MB by default) and `min_bytes_sync_commit`。
+  The maximum size of the data when the transaction is committed. If the data size received by FE is larger than it, it will immediately commit the transaction and send the accumulated data. The default value is 64MB. If you want to modify this configuration, please ensure that this value is greater than the product of `canal.instance.memory.buffer.size` and `canal.instance.memory.buffer.mmemunit` on the canal side (16MB by default) and `min_bytes_sync_commit`.

* `max_sync_task_threads_num`

diff --git a/docs/en/administrator-guide/load-data/routine-load-manual.md b/docs/en/administrator-guide/load-data/routine-load-manual.md
index d52f9eb68e..8d54bc0480 100644
--- a/docs/en/administrator-guide/load-data/routine-load-manual.md
+++ b/docs/en/administrator-guide/load-data/routine-load-manual.md
@@ -301,7 +301,7 @@ The user can control the stop, pause and restart of the job by the three command

7. The difference between STOP and PAUSE

-    the FE will automatically clean up stopped ROUTINE LOAD,while paused ROUTINE LOAD can be resumed
+    The FE will automatically clean up stopped ROUTINE LOAD, while paused ROUTINE LOAD can be resumed.

## Related parameters

diff --git a/docs/en/administrator-guide/load-data/stream-load-manual.md b/docs/en/administrator-guide/load-data/stream-load-manual.md
index edfcb8c96c..83303c6f35 100644
--- a/docs/en/administrator-guide/load-data/stream-load-manual.md
+++ b/docs/en/administrator-guide/load-data/stream-load-manual.md
@@ -171,10 +171,10 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`

+ two\_phase\_commit

-  Stream load supports the two-phase commit mode。The mode could be enabled by declaring ```two_phase_commit=true``` in http header. This mode is disabled by default.
-  the two-phase commit mode means:During Stream load, after data is written, the message will be returned to the client, the data is invisible at this point and the transaction status is PRECOMMITTED.
The data will be visible only after COMMIT is triggered by client。
+  Stream load supports the two-phase commit mode. The mode could be enabled by declaring ```two_phase_commit=true``` in the http header. This mode is disabled by default.
+  The two-phase commit mode means: during Stream load, after data is written, the message will be returned to the client; the data is invisible at this point and the transaction status is PRECOMMITTED. The data will be visible only after COMMIT is triggered by the client.

-  1. User can invoke the following interface to trigger commit operations for transaction:
+  1. User can invoke the following interface to trigger commit operations for transaction:

```
curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" http://fe_host:http_port/api/{db}/_stream_load_2pc
```

@@ -183,7 +183,7 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`

```
curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" http://be_host:webserver_port/api/{db}/_stream_load_2pc
```

-  2. User can invoke the following interface to trigger abort operations for transaction:
+  2. User can invoke the following interface to trigger abort operations for transaction:

```
curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" http://fe_host:http_port/api/{db}/_stream_load_2pc
```

@@ -360,7 +360,7 @@ Cluster situation: The concurrency of Stream load is not affected by cluster siz

In the community version 0.14.0 and earlier versions, the connection reset exception occurred after Http V2 was enabled, because the built-in web container is Tomcat, and Tomcat has issues with 307 (Temporary Redirect). There is a problem with the implementation of this protocol. When using Stream load to import a large amount of data, a connect reset exception will occur. This is because Tomcat started data transmission before the 307 redirect, which resulted in the lack of aut [...]

-  After the upgrade, also upgrade the http client version of your program to `4.5.13`,Introduce the following dependencies in your pom.xml file
+  After the upgrade, also upgrade the http client version of your program to `4.5.13`, and introduce the following dependencies in your pom.xml file:

```xml
<dependency>

diff --git a/docs/en/administrator-guide/operation/disk-capacity.md b/docs/en/administrator-guide/operation/disk-capacity.md
index 71027d9f6e..77473cf775 100644
--- a/docs/en/administrator-guide/operation/disk-capacity.md
+++ b/docs/en/administrator-guide/operation/disk-capacity.md
@@ -32,9 +32,9 @@ If Doris' data disk capacity is not controlled, the process will hang because th

## Glossary

-* FE:Doris Frontend Node. Responsible for metadata management and request access.
-* BE:Doris Backend Node. Responsible for query execution and data storage.
-* Data Dir:Data directory, each data directory specified in the `storage_root_path` of the BE configuration file `be.conf`. Usually a data directory corresponds to a disk, so the following **disk** also refers to a data directory.
+* FE: Doris Frontend Node. Responsible for metadata management and request access.
+* BE: Doris Backend Node. Responsible for query execution and data storage.
+* Data Dir: Data directory, each data directory specified in the `storage_root_path` of the BE configuration file `be.conf`. Usually a data directory corresponds to a disk, so the following **disk** also refers to a data directory.
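Given the glossary above, a hedged way to check each BE's current disk usage from a MySQL client before reading the principles below (column names may vary across Doris versions):

```sql
-- Lists every BE with its TotalCapacity, DataUsedCapacity, AvailCapacity and UsedPct.
SHOW PROC '/backends';
```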
## Basic Principles

@@ -125,7 +125,7 @@ When the disk capacity is higher than High Watermark or even Flood Stage, many o

When the BE has crashed because the disk is full and cannot be started (this phenomenon may occur due to untimely detection of FE or BE), you need to delete some temporary files in the data directory to ensure that the BE process can start. Files in the following directories can be deleted directly:

-    * log/:Log files in the log directory.
+    * log/: Log files in the log directory.
    * snapshot/: Snapshot files in the snapshot directory.
    * trash/: Trash files in the trash directory.

diff --git a/docs/en/administrator-guide/running-profile.md b/docs/en/administrator-guide/running-profile.md
index 8dff6f4c4f..17ce052f09 100644
--- a/docs/en/administrator-guide/running-profile.md
+++ b/docs/en/administrator-guide/running-profile.md
@@ -124,9 +124,9 @@ There are many statistical information collected at BE. so we list the correspo

- BytesReceived: Size of bytes received by network
- DataArrivalWaitTime: Total waiting time of sender to push data
- MergeGetNext: When there is a sort in the lower level node, exchange node will perform a unified merge sort and output an ordered result. This indicator records the total time consumption of merge sorting, including the time consumption of MergeGetNextBatch.
-   - MergeGetNextBatch:It takes time for merge node to get data. If it is single-layer merge sort, the object to get data is network queue. For multi-level merge sorting, the data object is child merger.
+   - MergeGetNextBatch: It takes time for merge node to get data. If it is single-layer merge sort, the object to get data is network queue. For multi-level merge sorting, the data object is child merger.
- ChildMergeGetNext: When there are too many senders in the lower layer to send data, single thread merge will become a performance bottleneck. Doris will start multiple child merge threads to do merge sort in parallel. The sorting time of child merge is recorded, which is the cumulative value of multiple threads.
-   - ChildMergeGetNextBatch: It takes time for child merge to get data,If the time consumption is too large, the bottleneck may be the lower level data sending node.
+   - ChildMergeGetNextBatch: It takes time for child merge to get data. If the time consumption is too large, the bottleneck may be the lower-level data sending node.
- FirstBatchArrivalWaitTime: The time waiting for the first batch come from sender
- DeserializeRowBatchTimer: Time consuming to receive data deserialization
- SendersBlockedTotalTimer(*): When the DataStreamRecv's queue buffer is full, wait time of sender

diff --git a/docs/en/community/how-to-contribute/commit-format-specification.md b/docs/en/community/how-to-contribute/commit-format-specification.md
index da4fb59203..3b9034f106 100644
--- a/docs/en/community/how-to-contribute/commit-format-specification.md
+++ b/docs/en/community/how-to-contribute/commit-format-specification.md
@@ -53,7 +53,7 @@ Commit is divided into ‘ title ’ and ‘ content ’ , the title should be l

    * deps: Modification of third-party dependency Library
    * community: Such as modification of Github issue template.

-    Some tips:
+    Some tips:

    1. If there are multiple types in one commit, multiple types need to be added
    2. If code refactoring brings performance improvement, [refactor][optimize] can be added at the same time

@@ -80,7 +80,7 @@ Commit is divided into ‘ title ’ and ‘ content ’ , the title should be l

    * config
    * docs

-    Some tips:
+    Some tips:

    1.
Try to use options that already exist in the list. If you need to add, please update this document in time.

@@ -93,7 +93,7 @@ Commit is divided into ‘ title ’ and ‘ content ’ , the title should be l

The commit message should follow the following format:

```
-issue:#7777
+issue: #7777

your message
```

diff --git a/docs/en/community/release-and-verify/release-complete.md b/docs/en/community/release-and-verify/release-complete.md
index e9db21ce84..7abfec5727 100644
--- a/docs/en/community/release-and-verify/release-complete.md
+++ b/docs/en/community/release-and-verify/release-complete.md
@@ -44,10 +44,10 @@ https://dist.apache.org/repos/dist/release/incubator/doris/

For the first release, you need to copy the KEYS file as well. Then add it to the svn release.

```
-add 成功后就可以在下面网址上看到你发布的文件
+After the add succeeds, you can see the files you published at the following website
https://dist.apache.org/repos/dist/release/incubator/doris/0.xx.0-incubating/

-稍等一段时间后,能在 apache 官网看到:
+After a while, you can see them on the official Apache website:
http://www.apache.org/dist/incubator/doris/0.9.0-incubating/
```

@@ -150,7 +150,7 @@ Title: [ANNOUNCE] Apache Doris (incubating) 0.9.0 Release

```

-To mail:
+To mail:

```
d...@doris.apache.org

diff --git a/docs/en/developer-guide/be-vscode-dev.md b/docs/en/developer-guide/be-vscode-dev.md
index 612f6f8710..86c3c7f452 100644
--- a/docs/en/developer-guide/be-vscode-dev.md
+++ b/docs/en/developer-guide/be-vscode-dev.md
@@ -32,7 +32,7 @@ under the License.

1. Download the doris source code

-   URL:[apache/incubator-doris: Apache Doris (Incubating) (github.com)](https://github.com/apache/incubator-doris)
+   URL: [apache/incubator-doris: Apache Doris (Incubating) (github.com)](https://github.com/apache/incubator-doris)

2. Install GCC 8.3.1+, Oracle JDK 1.8+, Python 2.7+, confirm that the gcc, java, python commands point to the correct version, and set the JAVA_HOME environment variable

@@ -132,7 +132,7 @@ Need to create this folder, this is where the be data is stored

mkdir -p /soft/be/storage
```

-3. Open vscode, and open the directory where the be source code is located. In this case, open the directory as **/home/workspace/incubator-doris/**,For details on how to vscode, refer to the online tutorial
+3. Open vscode, and open the directory where the be source code is located. In this case, open the directory **/home/workspace/incubator-doris/**. For details on how to use vscode, refer to the online tutorial.

4. Install the vscode ms c++ debugging plug-in, the plug-in identified by the red box in the figure below

diff --git a/docs/en/developer-guide/benchmark-tool.md b/docs/en/developer-guide/benchmark-tool.md
index 536881d7d4..74b1ce3da1 100644
--- a/docs/en/developer-guide/benchmark-tool.md
+++ b/docs/en/developer-guide/benchmark-tool.md
@@ -33,7 +33,7 @@ It can be used to test the performance of some parts of the BE storage layer (fo

## Compilation

-1. To ensure that the environment has been able to successfully compile the Doris ontology, you can refer to [Installation and deployment] (https://doris.apache.org/master/en/installing/compilation.html)。
+1. To ensure that the environment is able to successfully compile Doris itself, you can refer to [Installation and deployment](https://doris.apache.org/master/en/installing/compilation.html).

2. Execute `run-be-ut.sh`

@@ -53,9 +53,9 @@ The data set is generated according to the following rules.

>int: Random in [1,1000000].
The data character set of string type is uppercase and lowercase English letters, and the length varies according to the type.

-> char: Length random in [1,8]。
-> varchar: Length random in [1,128]。
-> string: Length random in [1,100000]。
+> char: Length random in [1,8].
+> varchar: Length random in [1,128].
+> string: Length random in [1,100000].

`rows_number` indicates the number of rows of data; the default value is `10000`.

diff --git a/docs/en/developer-guide/cpp-diagnostic-code.md b/docs/en/developer-guide/cpp-diagnostic-code.md
index dd172d8206..642ce2595c 100644
--- a/docs/en/developer-guide/cpp-diagnostic-code.md
+++ b/docs/en/developer-guide/cpp-diagnostic-code.md
@@ -26,7 +26,7 @@ under the License.

# C++ Code Diagnostic

-Doris support to use [Clangd](https://clangd.llvm.org/) and [Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/) to diagnostic code. Clangd and Clang-Tidy already has in [LDB-toolchain](https://doris.apache.org/zh-CN/installing/compilation-with-ldb-toolchain),also can install by self.
+Doris supports using [Clangd](https://clangd.llvm.org/) and [Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/) to diagnose code. Clangd and Clang-Tidy are already included in [LDB-toolchain](https://doris.apache.org/zh-CN/installing/compilation-with-ldb-toolchain), and can also be installed by yourself.

### Clang-Tidy

Clang-Tidy can do some diagnostic config; the config file `.clang-tidy` is in the Doris root path. Compared with vscode-cpptools, clangd can provide more powerful and accurate code jumping for vscode, and integrates the analysis and quick-fix functions of clang-tidy.

diff --git a/docs/en/developer-guide/fe-idea-dev.md b/docs/en/developer-guide/fe-idea-dev.md
index 4146046a4b..afc90a0635 100644
--- a/docs/en/developer-guide/fe-idea-dev.md
+++ b/docs/en/developer-guide/fe-idea-dev.md
@@ -46,16 +46,16 @@ under the License.

Doris builds against `thrift` 0.13.0 (note: `Doris` 0.15 and later versions build against `thrift` 0.13.0; previous versions still use `thrift` 0.9.3).

Windows:

-    1. Download:`http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe`
-    2. Copy:copy the file to `./thirdparty/installed/bin`
+    1. Download: `http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe`
+    2. Copy: copy the file to `./thirdparty/installed/bin`

MacOS:

-    1. Download:`brew install thrift@0.13.0`
-    2. Establish soft connection:
+    1. Download: `brew install thrift@0.13.0`
+    2. Establish soft connection:

    `mkdir -p ./thirdparty/installed/bin`
    `ln -s /opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift ./thirdparty/installed/bin/thrift`

-    Note:The error that the version cannot be found may be reported when MacOS execute `brew install thrift@0.13.0`. The solution is execute at the terminal as follows:
+    Note: The error that the version cannot be found may be reported when MacOS executes `brew install thrift@0.13.0`. The solution is to execute the following at the terminal:

    1. `brew tap-new $USER/local-tap`
    2. `brew extract --version='0.13.0' thrift $USER/local-tap`
    3.
`brew install thrift@0.13.0`

diff --git a/docs/en/developer-guide/fe-vscode-dev.md b/docs/en/developer-guide/fe-vscode-dev.md
index e90fc05269..e839449a7f 100644
--- a/docs/en/developer-guide/fe-vscode-dev.md
+++ b/docs/en/developer-guide/fe-vscode-dev.md
@@ -47,7 +47,7 @@ Create `settings.json` in `.vscode/` , and set settings:

* `"java.configuration.runtimes"`
* `"java.jdt.ls.java.home"` -- must set it to the directory of JDK11+, used for vscode-java plugin

-* `"maven.executable.path"` -- maven path,for maven-language-server plugin
+* `"maven.executable.path"` -- maven path, for the maven-language-server plugin

example:

diff --git a/docs/en/extending-doris/doris-on-es.md b/docs/en/extending-doris/doris-on-es.md
index a653986c35..79aa207109 100644
--- a/docs/en/extending-doris/doris-on-es.md
+++ b/docs/en/extending-doris/doris-on-es.md
@@ -349,7 +349,7 @@ PROPERTIES (
);
```

-Parameter Description:
+Parameter Description:

Parameter | Description
---|---

@@ -378,7 +378,7 @@ PROPERTIES (
);
```

-Parameter Description:
+Parameter Description:

Parameter | Description
---|---

diff --git a/docs/en/extending-doris/flink-doris-connector.md b/docs/en/extending-doris/flink-doris-connector.md
index 08972d014a..acd7c016f8 100644
--- a/docs/en/extending-doris/flink-doris-connector.md
+++ b/docs/en/extending-doris/flink-doris-connector.md
@@ -81,14 +81,14 @@ Note: Executing `brew install thrift@0.13.0` on MacOS may report an error that t

Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`

Linux:

-    1.Download source package:`wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
-    2.Install dependencies:`yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
+    1.Download source package: `wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
+    2.Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
    3.`tar zxvf thrift-0.13.0.tar.gz`
    4.`cd thrift-0.13.0`
    5.`./configure --without-tests`
    6.`make`
    7.`make install`

-    Check the version after installation is complete:thrift --version
+    Check the version after installation is complete: thrift --version

    Note: If you have compiled Doris, you do not need to install thrift, you can directly use $DORIS_HOME/thirdparty/installed/bin/thrift
```

diff --git a/docs/en/extending-doris/hive-bitmap-udf.md b/docs/en/extending-doris/hive-bitmap-udf.md
index 76c5fd2b85..40cb13cf3f 100644
--- a/docs/en/extending-doris/hive-bitmap-udf.md
+++ b/docs/en/extending-doris/hive-bitmap-udf.md
@@ -49,7 +49,7 @@ CREATE TABLE IF NOT EXISTS `hive_bitmap_table`(

```

-### Hive Bitmap UDF Usage:
+### Hive Bitmap UDF Usage:

Hive Bitmap UDF is used in Hive/Spark.

diff --git a/docs/en/extending-doris/spark-doris-connector.md b/docs/en/extending-doris/spark-doris-connector.md
index 8a565f8509..b7145654c0 100644
--- a/docs/en/extending-doris/spark-doris-connector.md
+++ b/docs/en/extending-doris/spark-doris-connector.md
@@ -77,14 +77,14 @@ Note: Executing `brew install thrift@0.13.0` on MacOS may report an error that t

Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`

Linux:

-    1.Download source package:`wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
-    2.Install dependencies:`yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
+    1.Download source package: `wget
https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
+    2.Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
    3.`tar zxvf thrift-0.13.0.tar.gz`
    4.`cd thrift-0.13.0`
    5.`./configure --without-tests`
    6.`make`
    7.`make install`

-    Check the version after installation is complete:thrift --version
+    Check the version after installation is complete: thrift --version

    Note: If you have compiled Doris, you do not need to install thrift, you can directly use $DORIS_HOME/thirdparty/installed/bin/thrift
```

diff --git a/docs/en/extending-doris/udf/java-user-defined-function.md b/docs/en/extending-doris/udf/java-user-defined-function.md
index e77c1ce465..efbd293f78 100644
--- a/docs/en/extending-doris/udf/java-user-defined-function.md
+++ b/docs/en/extending-doris/udf/java-user-defined-function.md
@@ -61,7 +61,7 @@ Instructions:

3. The UDF call type represented by `type` in properties is native by default. When using java UDF, it is transferred to `Java_UDF`.
4. `name`: A function belongs to a DB and name is of the form `dbName`.`funcName`. When `dbName` is not explicitly specified, the db of the current session is used as `dbName`.

-Sample:
+Sample:

```sql
CREATE FUNCTION java_udf_add_one(int) RETURNS int PROPERTIES (
"file"="file:///path/to/java-udf-demo-jar-with-dependencies.jar",

diff --git a/docs/en/extending-doris/udf/remote-user-defined-function.md b/docs/en/extending-doris/udf/remote-user-defined-function.md
index 1774a2c671..aa8cc3a3c9 100644
--- a/docs/en/extending-doris/udf/remote-user-defined-function.md
+++ b/docs/en/extending-doris/udf/remote-user-defined-function.md
@@ -46,17 +46,17 @@ Copy gensrc/proto/function_service.proto and gensrc/proto/types.proto to Rpc ser

- function_service.proto
  - PFunctionCallRequest
    - function_name: The function name, corresponding to the symbol specified when the function was created
    - args: The parameters passed by the method
    - context: Querying context information
  - PFunctionCallResponse
    - result: Return result
    - status: Return status, 0 indicates normal
  - PCheckFunctionRequest
    - function: Function related information
    - match_type: Matching type
  - PCheckFunctionResponse
    - status: Return status, 0 indicates normal

### Generated interface

Use protoc to generate code; view the specific parameters using protoc -h.

### Implementing an interface

The following three methods need to be implemented:

- fnCall: Used to write computational logic
- checkFn: Used to verify function names, parameters, and return values when creating UDFs
- handShake: Used for interface probing

## Create UDF

```sql
CREATE FUNCTION
name ([,...])
[RETURNS] rettype
PROPERTIES (["key"="value"][,...])
```

Instructions:

-1. PROPERTIES中`symbol`Represents the name of the method passed by the RPC call, which must be set。
-2. PROPERTIES中`object_file`Represents the RPC service address. Currently, a single address and a cluster address in BRPC-compatible format are supported.
Refer to the cluster connection mode[Format specification](https://github.com/apache/incubator-brpc/blob/master/docs/cn/client.md#%E8%BF%9E%E6%8E%A5%E6%9C%8D%E5%8A%A1%E9%9B%86%E7%BE%A4)。
-3. PROPERTIES中`type`Indicates the UDF call type, which is Native by default. Rpc is transmitted when Rpc UDF is used。
-4. name: A function belongs to a DB and name is of the form`dbName`.`funcName`. When `dbName` is not explicitly specified, the db of the current session is used`dbName`。
+1. `symbol` in PROPERTIES represents the name of the method passed by the RPC call, which must be set.
+2. `object_file` in PROPERTIES represents the RPC service address. Currently, a single address and a cluster address in BRPC-compatible format are supported. For the cluster connection mode, refer to the [Format specification](https://github.com/apache/incubator-brpc/blob/master/docs/cn/client.md#%E8%BF%9E%E6%8E%A5%E6%9C%8D%E5%8A%A1%E9%9B%86%E7%BE%A4).
+3. `type` in PROPERTIES indicates the UDF call type, which is Native by default. Rpc is transmitted when Rpc UDF is used.
+4. name: A function belongs to a DB and name is of the form `dbName`.`funcName`. When `dbName` is not explicitly specified, the db of the current session is used as `dbName`.

Sample:
```sql

diff --git a/docs/en/installing/install-deploy.md b/docs/en/installing/install-deploy.md
index 77f9e657a7..bf572044dd 100644
--- a/docs/en/installing/install-deploy.md
+++ b/docs/en/installing/install-deploy.md
@@ -215,8 +215,8 @@ See the section on `lower_case_table_names` variables in [Variables](../administ

**instructions**

-   * 1./home/disk1/doris,medium:hdd,capacity:10,capacity limit is 10GB, HDD;
-   * 2./home/disk2/doris,medium:ssd,capacity:50,capacity limit is 50GB, SSD;
+   * 1. /home/disk1/doris,medium:hdd,capacity:10 : capacity limit is 10GB, HDD;
+   * 2. /home/disk2/doris,medium:ssd,capacity:50 : capacity limit is 50GB, SSD;

* BE webserver_port configuration

diff --git a/docs/en/sql-reference/sql-functions/bitmap-functions/bitmap_subset_limit.md b/docs/en/sql-reference/sql-functions/bitmap-functions/bitmap_subset_limit.md
index 0e4948ade7..5c641b4afc 100644
--- a/docs/en/sql-reference/sql-functions/bitmap-functions/bitmap_subset_limit.md
+++ b/docs/en/sql-reference/sql-functions/bitmap-functions/bitmap_subset_limit.md
@@ -33,8 +33,8 @@ under the License.

`BITMAP BITMAP_SUBSET_LIMIT(BITMAP src, BIGINT range_start, BIGINT cardinality_limit)`

Creates a subset of the BITMAP, beginning from range_start, limited by cardinality_limit.

-range_start:start value for the range
-cardinality_limit:subset upper limit
+range_start: start value for the range
+cardinality_limit: subset upper limit

## example

mysql> select bitmap_to_string(bitmap_subset_limit(bitmap_from_string('1,2,3,4,5

+-------+
| value |
+-------+
-| 4,5 |
+| 4,5 |
+-------+
```

diff --git a/docs/en/sql-reference/sql-functions/string-functions/bit_length.md b/docs/en/sql-reference/sql-functions/string-functions/bit_length.md
index e4a9ca29bf..bb4d8f7436 100644
--- a/docs/en/sql-reference/sql-functions/string-functions/bit_length.md
+++ b/docs/en/sql-reference/sql-functions/string-functions/bit_length.md
@@ -31,7 +31,7 @@ under the License.

`INT bit_length (VARCHAR str)`

-Return length of argument in bits。
+Return length of argument in bits.
diff --git a/docs/en/sql-reference/sql-functions/window-function.md b/docs/en/sql-reference/sql-functions/window-function.md
index 8b9113b439..c2abeb9337 100644
--- a/docs/en/sql-reference/sql-functions/window-function.md
+++ b/docs/en/sql-reference/sql-functions/window-function.md
@@ -101,7 +101,7 @@ This section introduces the methods that can be used as analysis functions in Do
 
 ### AVG()
 
-grammar:
+grammar:
 
 ```sql
 AVG([DISTINCT | ALL] *expression*) [OVER (*analytic_clause*)]
@@ -136,7 +136,7 @@ from int_t where property in ('odd','even');
 
 ### COUNT()
 
-grammar:
+grammar:
 
 ```sql
 COUNT([DISTINCT | ALL] expression) [OVER (analytic_clause)]
@@ -173,7 +173,7 @@ from int_t where property in ('odd','even');
 
 The DENSE_RANK() function is used to indicate the ranking. Unlike RANK(), DENSE_RANK() does not have vacant numbers. For example, if there are two parallel ones, the third number of DENSE_RANK() is still 2, and the third number of RANK() is 3.
 
-grammar:
+grammar:
 
 ```sql
 DENSE_RANK() OVER(partition_by_clause order_by_clause)
@@ -202,7 +202,7 @@ The following example shows the ranking of the x column grouped by the property
 
 FIRST_VALUE() returns the first value in the window range.
 
-grammar:
+grammar:
 
 ```sql
 FIRST_VALUE(expr) OVER(partition_by_clause order_by_clause [window_clause])
@@ -224,7 +224,7 @@ We have the following data
 | Mats | Sweden | Tja |
 ```
 
-Use FIRST_VALUE() to group by country and return the value of the first greeting in each group:
+Use FIRST_VALUE() to group by country and return the value of the first greeting in each group:
 
 ```sql
 select country, name,
@@ -244,7 +244,7 @@ over (partition by country order by name, greeting) as greeting from mail_merge;
 
 The LAG() method is used to calculate the value of several lines forward from the current line.
 
-grammar:
+grammar:
 
 ```sql
 LAG (expr, offset, default) OVER (partition_by_clause order_by_clause)
@@ -274,7 +274,7 @@ order by closing_date;
 
 LAST_VALUE() returns the last value in the window range. Contrary to FIRST_VALUE().
 
-grammar:
+grammar:
 
 ```sql
 LAST_VALUE(expr) OVER(partition_by_clause order_by_clause [window_clause])
@@ -301,7 +301,7 @@ from mail_merge;
 
 The LEAD() method is used to calculate the value of several rows from the current row.
 
-grammar:
+grammar:
 
 ```sql
 LEAD (expr, offset, default]) OVER (partition_by_clause order_by_clause)
@@ -334,7 +334,7 @@ order by closing_date;
 
 ### MAX()
 
-grammar:
+grammar:
 
 ```sql
 MAX([DISTINCT | ALL] expression) [OVER (analytic_clause)]
@@ -365,7 +365,7 @@ from int_t where property in ('prime','square');
 
 ### MIN()
 
-grammar:
+grammar:
 
 ```sql
 MIN([DISTINCT | ALL] expression) [OVER (analytic_clause)]
@@ -398,7 +398,7 @@ The RANK() function is used to indicate ranking. Unlike DENSE_RANK(), RANK() will have vacant numbers. For example, if there are two parallel 1s, the third number in RANK() is 3, not 2.
 
-grammar:
+grammar:
 
 ```sql
 RANK() OVER(partition_by_clause order_by_clause)
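-- The three ranking functions differ only in how ties are numbered; a quick
-- side-by-side sketch, reusing the int_t sample table from the examples above:
select x, y,
       rank()       over(partition by x order by y) as rnk,
       dense_rank() over(partition by x order by y) as dense_rnk,
       row_number() over(partition by x order by y) as row_num
from int_t;
```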
@@ -427,7 +427,7 @@ select x, y, rank() over(partition by x order by y) as rank from int_t;
 
 For each row of each Partition, an integer that starts from 1 and increases continuously is returned. Unlike RANK() and DENSE_RANK(), the value returned by ROW_NUMBER() will not be repeated or vacant, and is continuously increasing.
 
-grammar:
+grammar:
 
 ```sql
 ROW_NUMBER() OVER(partition_by_clause order_by_clause)
@@ -452,7 +452,7 @@ select x, y, row_number() over(partition by x order by y) as rank from int_t;
 
 ### SUM()
 
-grammar:
+grammar:
 
 ```sql
 SUM([DISTINCT | ALL] expression) [OVER (analytic_clause)]
diff --git a/docs/en/sql-reference/sql-statements/Account Management/SET PROPERTY.md b/docs/en/sql-reference/sql-statements/Account Management/SET PROPERTY.md
index d0ac580b43..3439e9b137 100644
--- a/docs/en/sql-reference/sql-statements/Account Management/SET PROPERTY.md
+++ b/docs/en/sql-reference/sql-statements/Account Management/SET PROPERTY.md
@@ -40,7 +40,7 @@ key:
 Super user rights:
     max_user_connections: Maximum number of connections.
     max_query_instances: Maximum number of query instance user can use when query.
-sql_block_rules: set sql block rules。After setting, if the query user execute match the rules, it will be rejected.
+sql_block_rules: Set SQL block rules. After setting, if a query executed by the user matches a rule, it will be rejected.
 cpu_resource_limit: limit the cpu resource usage of a query. See session variable `cpu_resource_limit`.
 exec_mem_limit: Limit the memory usage of the query. See the description of the session variable `exec_mem_limit` for details. -1 means not set.
 load_mem_limit: Limit memory usage for imports. See the introduction of the session variable `load_mem_limit` for details. -1 means not set.
diff --git a/docs/en/sql-reference/sql-statements/Administration/ALTER SYSTEM.md b/docs/en/sql-reference/sql-statements/Administration/ALTER SYSTEM.md
index f62ce1ecf5..7c3d4902a4 100644
--- a/docs/en/sql-reference/sql-statements/Administration/ALTER SYSTEM.md
+++ b/docs/en/sql-reference/sql-statements/Administration/ALTER SYSTEM.md
@@ -78,7 +78,7 @@ under the License.
         Other properties: Other information necessary to access remote storage, such as authentication information.
 
     7) Modify BE node attributes currently supports the following attributes:
-       1. tag.location:Resource tag
+       1. tag.location: Resource tag
        2. disable_query: Query disabled attribute
        3. disable_load: Load disabled attribute
diff --git a/docs/en/sql-reference/sql-statements/Data Definition/ALTER TABLE.md b/docs/en/sql-reference/sql-statements/Data Definition/ALTER TABLE.md
index ad99e6d5a9..0d4a1f6f04 100644
--- a/docs/en/sql-reference/sql-statements/Data Definition/ALTER TABLE.md
+++ b/docs/en/sql-reference/sql-statements/Data Definition/ALTER TABLE.md
@@ -199,7 +199,7 @@ under the License.
     9. Modify default buckets number of partition
         grammer:
             MODIFY DISTRIBUTION DISTRIBUTED BY HASH (k1[,k2 ...]) BUCKETS num
-        note:
+        note:
            1)Only support non colocate table with RANGE partition and HASH distribution
 
     10. Modify table comment
diff --git a/docs/en/sql-reference/sql-statements/Data Definition/CANCEL ALTER.md b/docs/en/sql-reference/sql-statements/Data Definition/CANCEL ALTER.md
index b3c1782020..bb9339be38 100644
--- a/docs/en/sql-reference/sql-statements/Data Definition/CANCEL ALTER.md
+++ b/docs/en/sql-reference/sql-statements/Data Definition/CANCEL ALTER.md
@@ -51,12 +51,12 @@ Grammar:
 ## example
 
 [CANCEL ALTER TABLE COLUMN]
-1. 撤销针对 my_table 的 ALTER COLUMN 操作。
+1. Cancel the ALTER COLUMN operation on my_table.
     CANCEL ALTER TABLE COLUMN
     FROM example_db.my_table;
 
 [CANCEL ALTER TABLE ROLLUP]
-1. 撤销 my_table 下的 ADD ROLLUP 操作。
+1. Cancel the ADD ROLLUP operation on my_table.
     CANCEL ALTER TABLE ROLLUP
     FROM example_db.my_table;
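The bucket-count change described in item 9 of the ALTER TABLE excerpt above takes roughly this shape in practice (table and column names are hypothetical):

```sql
ALTER TABLE example_db.my_table
MODIFY DISTRIBUTION DISTRIBUTED BY HASH(k1) BUCKETS 32;
```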
diff --git a/docs/en/sql-reference/sql-statements/Data Definition/create-function.md b/docs/en/sql-reference/sql-statements/Data Definition/create-function.md
index 11f6bf9681..7e29591fe7 100644
--- a/docs/en/sql-reference/sql-statements/Data Definition/create-function.md
+++ b/docs/en/sql-reference/sql-statements/Data Definition/create-function.md
@@ -79,7 +79,7 @@ CREATE [AGGREGATE] [ALIAS] FUNCTION function_name
 > "prepare_fn": Function signature of the prepare function for finding the entry from the dynamic library. This option is optional for custom functions
 >
 > "close_fn": Function signature of the close function for finding the entry from the dynamic library. This option is optional for custom functions
-> "type": Function type, RPC for remote udf, NATIVE for c++ native udf
+> "type": Function type, RPC for remote udf, NATIVE for c++ native udf
diff --git a/docs/en/sql-reference/sql-statements/Data Manipulation/BROKER LOAD.md b/docs/en/sql-reference/sql-statements/Data Manipulation/BROKER LOAD.md
index 8a311a1df3..312901e324 100644
--- a/docs/en/sql-reference/sql-statements/Data Manipulation/BROKER LOAD.md
+++ b/docs/en/sql-reference/sql-statements/Data Manipulation/BROKER LOAD.md
@@ -36,7 +36,7 @@ under the License.
     2. Baidu AFS: afs for Baidu. Only be used inside Baidu.
     3. Baidu Object Storage(BOS): BOS on Baidu Cloud.
     4. Apache HDFS.
-    5. Amazon S3:Amazon S3。
+    5. Amazon S3: Amazon S3 object storage.
 
 ### Syntax:
 
@@ -137,14 +137,14 @@ under the License.
     read_properties:
 
         Used to specify some special parameters.
-        Syntax:
+        Syntax:
         [PROPERTIES ("key"="value", ...)]
 
         You can specify the following parameters:
 
-        line_delimiter: Used to specify the line delimiter in the load file. The default is `\n`. You can use a combination of multiple characters as the column separator.
+        line_delimiter: Used to specify the line delimiter in the load file. The default is `\n`. You can use a combination of multiple characters as the line delimiter.
 
-        fuzzy_parse: Boolean type, true to indicate that parse json schema as the first line, this can make import more faster,but need all key keep the order of first line, default value is false. Only use for json format.
+        fuzzy_parse: Boolean type. If true, the json schema is parsed from the first row, which can speed up the import, but requires all keys to keep the same order as in the first row. Default value is false. Only used for the json format.
 
         jsonpaths: There are two ways to import json: simple mode and matched mode.
             simple mode: it is simple mode without setting the jsonpaths parameter. In this mode, the json data is required to be the object type. For example:
@@ -152,7 +152,7 @@ under the License.
             matched mode: the json data is relatively complex, and the corresponding value needs to be matched through the jsonpaths parameter.
 
-        strip_outer_array: Boolean type, true to indicate that json data starts with an array object and flattens objects in the array object, default value is false. For example:
+        strip_outer_array: Boolean type, true to indicate that the json data starts with an array and each object in the array is flattened into a row. Default value is false. For example:
            [
             {"k1" : 1, "v1" : 2},
             {"k1" : 3, "v1" : 4}
           ]
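Taken together, the json-related read properties above slot into a load statement roughly as follows (label, path, table, broker name, and credentials are hypothetical placeholders):

```sql
LOAD LABEL example_db.label_json_1
(
    DATA INFILE("hdfs://host:port/user/data/input/file.json")
    INTO TABLE my_table
    FORMAT AS "json"
    PROPERTIES ("strip_outer_array" = "true", "fuzzy_parse" = "true")
)
WITH BROKER my_hdfs_broker
("username" = "hdfs_user", "password" = "hdfs_passwd");
```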
@@ -207,9 +207,9 @@ under the License.
         dfs.client.failover.proxy.provider: Specify the provider that client connects to namenode by default: org. apache. hadoop. hdfs. server. namenode. ha. Configured Failover ProxyProvider.
 
     4.4. Amazon S3
 
-        fs.s3a.access.key:AmazonS3的access key
-        fs.s3a.secret.key:AmazonS3的secret key
-        fs.s3a.endpoint:AmazonS3的endpoint
+        fs.s3a.access.key: Amazon S3 access key
+        fs.s3a.secret.key: Amazon S3 secret key
+        fs.s3a.endpoint: Amazon S3 endpoint
 
     4.5. If using the S3 protocol to directly connect to the remote storage, you need to specify the following attributes
 
         (
@@ -230,7 +230,7 @@ under the License.
         )
 
         fs.defaultFS: defaultFS
         hdfs_user: hdfs user
-        namenode HA:
+        namenode HA:
         By configuring namenode HA, new namenode can be automatically identified when the namenode is switched
         dfs.nameservices: hdfs service name, customize, eg: "dfs.nameservices" = "my_ha"
         dfs.ha.namenodes.xxx: Customize the name of a namenode, separated by commas. XXX is a custom name in dfs. name services, such as "dfs. ha. namenodes. my_ha" = "my_nn"
diff --git a/docs/en/sql-reference/sql-statements/Data Manipulation/EXPORT.md b/docs/en/sql-reference/sql-statements/Data Manipulation/EXPORT.md
index 7ca3c37153..b1646c7972 100644
--- a/docs/en/sql-reference/sql-statements/Data Manipulation/EXPORT.md
+++ b/docs/en/sql-reference/sql-statements/Data Manipulation/EXPORT.md
@@ -76,12 +76,12 @@ under the License.
 
     7. hdfs
       Specify to use libhdfs export to hdfs
-      Grammar:
+      Grammar:
       WITH HDFS ("key"="value"[,...])
 
       The following parameters can be specified:
-        fs.defaultFS: Set the fs such as:hdfs://ip:port
-        hdfs_user:Specify hdfs user name
+        fs.defaultFS: Set the fs, such as: hdfs://ip:port
+        hdfs_user: Specify the hdfs user name
 
 ## example
diff --git a/docs/en/sql-reference/sql-statements/Data Manipulation/LOAD.md b/docs/en/sql-reference/sql-statements/Data Manipulation/LOAD.md
index 001706c9ea..114af8331e 100644
--- a/docs/en/sql-reference/sql-statements/Data Manipulation/LOAD.md
+++ b/docs/en/sql-reference/sql-statements/Data Manipulation/LOAD.md
@@ -162,10 +162,10 @@ Date class (DATE/DATETIME): 2017-10-03, 2017-06-13 12:34:03. NULL value: N
 
     6. S3 Storage
 
-        fs.s3a.access.key  user AK,required
-        fs.s3a.secret.key  user SK,required
-        fs.s3a.endpoint  user endpoint,required
-        fs.s3a.impl.disable.cache  whether disable cache,default true,optional
+        fs.s3a.access.key  user AK, required
+        fs.s3a.secret.key  user SK, required
+        fs.s3a.endpoint  user endpoint, required
+        fs.s3a.impl.disable.cache  whether to disable the cache, default true, optional
 
 ## example
diff --git a/docs/en/sql-reference/sql-statements/Data Manipulation/OUTFILE.md b/docs/en/sql-reference/sql-statements/Data Manipulation/OUTFILE.md
index a8fce9e200..9a97dffbf6 100644
--- a/docs/en/sql-reference/sql-statements/Data Manipulation/OUTFILE.md
+++ b/docs/en/sql-reference/sql-statements/Data Manipulation/OUTFILE.md
@@ -29,7 +29,7 @@ under the License.
 
     The `SELECT INTO OUTFILE` statement can export the query results to a file. Currently supports export to remote storage through Broker process, or directly through S3, HDFS protocol such as HDFS, S3, BOS and COS(Tencent Cloud) through the Broker process. The syntax is as follows:
 
-    Grammar:
+    Grammar:
         query_stmt
         INTO OUTFILE "file_path"
         [format_as]
@@ -50,7 +50,7 @@ under the License.
 
     3. properties
         Specify the relevant attributes. Currently it supports exporting through the Broker process, or through the S3, HDFS protocol.
-        Grammar:
+        Grammar:
         [PROPERTIES ("key"="value", ...)]
         The following parameters can be specified:
         column_separator: Specifies the exported column separator, defaulting to t. Supports invisible characters, such as'\x07'.
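For reference, the clauses above combine into a statement of the following shape; a minimal sketch with a hypothetical table, export path, and broker name:

```sql
SELECT * FROM tbl1
INTO OUTFILE "hdfs://host:port/path/to/result_"
FORMAT AS CSV
PROPERTIES (
    "broker.name" = "my_broker",
    "column_separator" = ",",
    "line_delimiter" = "\n"
);
```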
"AWS_SECRET_KEY" = "xxx", "AWS_REGION" = "bd" ) - The final generated file prefix is `my_file_{fragment_instance_id}_`。 + The final generated file prefix is `my_file_{fragment_instance_id}_`. 7. Use the s3 protocol to export to bos, and enable concurrent export of session variables. set enable_parallel_outfile = true; diff --git a/docs/en/sql-reference/sql-statements/Data Manipulation/SHOW CREATE ROUTINE LOAD.md b/docs/en/sql-reference/sql-statements/Data Manipulation/SHOW CREATE ROUTINE LOAD.md index 499b82cfe8..80df442139 100644 --- a/docs/en/sql-reference/sql-statements/Data Manipulation/SHOW CREATE ROUTINE LOAD.md +++ b/docs/en/sql-reference/sql-statements/Data Manipulation/SHOW CREATE ROUTINE LOAD.md @@ -30,11 +30,11 @@ under the License. The kafka partition and offset in the result show the currently consumed partition and the corresponding offset to be consumed. - grammar: + grammar: SHOW [ALL] CREATE ROUTINE LOAD for load_name; - Description: - `ALL`: optional,Is for getting all jobs, including history jobs + Description: + `ALL`: optional,Is for getting all jobs, including history jobs `load_name`: routine load name ## example --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org