Re: [DISCUSS] Ready to release version 0.14.0

ling miao Mon, 08 Feb 2021 19:22:55 -0800

Hi zhengguo,

I look forward to the release of the new version.
I think there are some bug fixes in our PR list. Do you want to check which
ones need to be incorporated into this release?


Ling Miao

寒江雪 <yangz...@gmail.com> wrote on Tue, Feb 9, 2021 at 10:00 AM:

> Hi all:
>      Since the release of 0.13, Apache Doris (incubating) has gained around
> 390 new features, bug fixes, performance enhancements, documentation
> improvements, and code refactors from 60+ contributors.
>      We are now ready to release Apache Doris (incubating) 0.14.0. I will
> be the release manager for this version. This release is expected to
> include the following content:
>
> # New Feature
>
> ### Import and delete
>
> Support deleting multiple pieces of data at one time through the import
> method, to avoid the performance degradation caused by multiple deletions.
> For tables of the UniqueKey model, support specifying a Sequence column when
> importing. Doris will judge the order of the data according to the value
> of the Sequence column to ensure that the data is imported in time order
>  [#4310] [#4256]
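
To illustrate the Sequence column and batch-delete behaviour described above, here is a minimal Python sketch. It models a unique-key table as a dict and is not Doris code. The function name `apply_batch`, the row layout, the `delete_flag` field, and the rule that an equal sequence value replaces the stored row are all assumptions made for the example.

```python
def apply_batch(table, batch):
    """Apply one load batch to a unique-key table modelled as a dict.

    table: dict mapping key -> (sequence, value)
    batch: iterable of (key, sequence, value, delete_flag) rows
    (hypothetical layout, chosen only for this illustration)
    """
    for key, seq, value, delete_flag in batch:
        current = table.get(key)
        # A row with a smaller sequence value than the stored one is stale.
        if current is not None and seq < current[0]:
            continue
        if delete_flag:
            # Delete marker carried inside the load itself.
            table.pop(key, None)
        else:
            table[key] = (seq, value)
    return table


table = {}
apply_batch(table, [
    (1, 10, "a", False),
    (1, 5, "stale", False),  # older sequence value, ignored
    (2, 1, "b", False),
    (2, 2, None, True),      # delete arriving in the same load
])
```

After this load only key 1 remains, holding the row with sequence value 10, regardless of the order in which the two rows for key 1 arrived.
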
>
> ### Support database backup
>
> Support specifying the backup content (metadata and data) in the backup
> statement.
> Support excluding some tables in the backup and restore statements. When
> backing up the entire database, you can exclude some very large and
> unimportant tables.
> Support backing up and restoring the entire database without declaring
> each table name in the backup and restore statement.
>
>  [#5314]
>
> ### ODBC external table support
>
> Support access to external tables such as MySQL, PostgreSQL, Oracle, etc.
> through the ODBC protocol
>
>  [#4798] [#4438] [#4559] [#4699]
>
> ### Support SQL level and Partition level result Cache
>
> Support caching query results to improve the efficiency of repeated
> queries, at both the SQL level and the Partition level [#4330]
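
As a rough illustration of why a Partition-level cache helps with repeated queries, the following Python sketch caches one partial result per (SQL text, partition) pair and recomputes only the partitions it has not seen. The class `PartitionCache` and its methods are hypothetical and do not reflect how Doris implements the feature. Invalidation when a partition changes is deliberately left out.

```python
class PartitionCache:
    """Toy partition-level result cache (illustrative only)."""

    def __init__(self):
        self._store = {}  # (sql, partition) -> cached partial result
        self.hits = 0
        self.misses = 0

    def query(self, sql, partitions, compute):
        """Return one partial result per partition, computing only misses."""
        results = []
        for partition in partitions:
            key = (sql, partition)
            if key in self._store:
                self.hits += 1
            else:
                self.misses += 1
                self._store[key] = compute(partition)
            results.append(self._store[key])
        return results


data = {"p1": [1, 2], "p2": [3], "p3": [4, 5]}
cache = PartitionCache()
first = cache.query("select sum(v)", ["p1", "p2"], lambda p: sum(data[p]))
second = cache.query("select sum(v)", ["p2", "p3"], lambda p: sum(data[p]))
```

The second query overlaps the first on `p2`, so only `p3` has to be computed. A SQL-level cache would instead key on the whole statement and return the complete result.
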
>
> ### Built-in functions
>
> - Support bitmap_xor function [#5098]
> - Add replace() function [#4347]
> - Add the time_round function to support time alignment according to
> multiple time granularities [#4640]
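
The time alignment mentioned for time_round can be pictured with a small Python sketch that floors a timestamp to a multiple of a period. This is an illustration of the idea only: the names `floor_time`, `period`, `unit` and `origin` are hypothetical and do not reflect the Doris function signatures.

```python
from datetime import datetime, timedelta


def floor_time(ts, period, unit, origin=datetime(1970, 1, 1)):
    """Floor ts to the nearest earlier multiple of `period` units.

    unit must be a timedelta keyword such as "seconds", "minutes",
    "hours" or "days"; alignment is counted from `origin`.
    """
    step = timedelta(**{unit: period})
    elapsed = ts - origin
    # timedelta // timedelta yields the whole number of steps elapsed.
    return origin + (elapsed // step) * step


ts = datetime(2021, 2, 9, 10, 7, 30)
five_min = floor_time(ts, 5, "minutes")  # 2021-02-09 10:05:00
one_hour = floor_time(ts, 1, "hours")    # 2021-02-09 10:00:00
```
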
>
> ### FE interface and HTTP interface
>
> - The new FE UI interface can be enabled by setting the FE configuration
> item enable_http_server_v2 [#4684]
>
> - BE adds an HTTP interface to show the distribution of all tablets in a
> partition among different disks in a BE [#5096]
> - BE adds an HTTP interface to manually migrate a tablet to other disks on
> the same node [#5101]
> - Support modifying the configuration items of FE and BE through HTTP, and
> persisting these modifications [#4704]
>
> ### Compatibility with MySQL
>
> - Added support for the views table in the information_schema database
> [#4778]
> - Added table_privileges, schema_privileges and user_privileges to the
> information_schema database for compatibility with certain MySQL
> applications [#4899]
> - A new statistic table is added to the information_schema meta-database
> for compatibility with some MySQL tools [#4991]
>
> ### Monitoring
>
> - BE added tablet-level monitoring indicators, including scanned data
> volume and row number, written data volume and row number, to help locate
> hot tablets [#4428]
>
> - BE added metrics to view the usage of various LRU caches [#4688]
>
> ### Table building related
>
> - Added CREATE TABLE LIKE statement to facilitate the creation of a table
> metadata copy [#4705]
> - Support atomic replacement of two tables through replace statement
> [#4669]
>
> ### Other
>
> - Support adding Optimizer Hints of type SET_VAR in the Select statement to
> set session variables [#4504]
>
> - Support to repair damaged tablets by filling in empty tablets [#4255]
> - Support Bucket Shuffle Join function (when the Join condition column is a
> subset of the table bucket column, the right table will be shuffled to the
> node where the data in the left table is located, which can significantly
> reduce the network overhead caused by Shuffle Join and improve query speed)
> [#4677]
> - Support canceling import tasks in batches through the cancel load
> statement [#4515]
> - Add a Session variable to set whether to allow the partition column to be
> NULL [#5013]
> - Support TopN aggregation function [#4803]
> - Support a new data balancing logic based on the number of partitions and
> buckets [#5010]
> - Support creating indexes on the value column of unique table [#5305]
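
The Bucket Shuffle Join item in the list above can be sketched as follows. This is a simplified Python model, not the Doris implementation: it assumes integer join keys, uses `key % num_buckets` as a stand-in for the table's real bucketing hash, and treats each bucket as living on one node. All function names are hypothetical.

```python
from collections import defaultdict


def bucket_of(key, num_buckets):
    # Stand-in for the left table's bucketing hash (assumption: int keys).
    return key % num_buckets


def bucket_shuffle_join(left_buckets, right_rows, num_buckets):
    """Join without moving the left table.

    left_buckets: dict bucket_id -> list of (key, left_value), already placed
    right_rows:   list of (key, right_value) in arbitrary order
    """
    # Route each right row to the bucket (node) holding matching left rows.
    routed = defaultdict(list)
    for key, rval in right_rows:
        routed[bucket_of(key, num_buckets)].append((key, rval))

    joined = []
    for bucket_id, lrows in left_buckets.items():
        # Local hash join inside one bucket.
        lookup = defaultdict(list)
        for key, rval in routed.get(bucket_id, []):
            lookup[key].append(rval)
        for key, lval in lrows:
            for rval in lookup.get(key, []):
                joined.append((key, lval, rval))
    return joined


left = {0: [(2, "l2"), (4, "l4")], 1: [(1, "l1")]}
right = [(1, "r1"), (2, "r2"), (3, "r3"), (2, "r2b")]
result = sorted(bucket_shuffle_join(left, right, num_buckets=2))
```

Only the right-hand rows cross the network in this model, which is the source of the reduced shuffle overhead described in the item.
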
>
> # Enhancement
>
> ### Performance improvement
>
> - Implemented a new compaction selection algorithm, providing lower write
> amplification and a more reasonable compaction strategy [#4212]
> - Optimize bit operation efficiency in variable length coding [#4366]
> - Improve the execution efficiency of money_format function [#4672]
> - Optimize query execution plan: When the bucket column of the table is a
> subset of the GroupBy column in SQL, reduce the data shuffle step [#4482]
> - Improve the efficiency of column name search on BE [#4779]
> - Improve the performance of the BE side LRU Cache [#4781]
> - Optimized the tablet selection strategy of Compaction, reducing the
> number of invalid selections [#4964]
> - Optimized the reading efficiency of Unique Key table [#4958]
> - Optimized the memory usage of LoadJob on the FE side and reduced the
> memory overhead on the FE side [#4993]
> - Reduce the lock granularity in FE metadata from Database level to Table
> level to support more fine-grained concurrent access to metadata [#3775]
> - Avoid unnecessary memory copy when creating hash table [#5301]
> - Remove the path check when BE starts to speed up BE startup [#5268]
> - Optimize the import performance of Json data [#5114]
>
> ### Functional improvements
>
> - SQL supports collate utf8_general_ci syntax to improve MySQL syntax
> compatibility [#4365]
> - Improve the Batch delete function, and optimize the related
> compaction process [#4425]
> - Enhance the parse_url() function: support lowercase, support
> parsing the port [#4429]
> - When SQL execution specifies the execution mode of join (Join Hint), the
> Colocation Join function will be disabled by default [#4497]
> - Dynamic partition supports the hour level [#4514]
> - HTTP interface on BE side supports gzip compression [#4533]
> - Optimized the use of threads on the BE side [#4440]
> - Optimize the checking process and error message of the rand() function in
> the query analysis stage [#4439]
> - Optimize the compaction triggering and execution logic to better limit
> the resource overhead (mainly memory overhead) of the compaction operation,
> and trigger the compaction operation more reasonably [#4670]
> - Support pushing Limit conditions to ODBC/MySQL external tables [#4707]
> - Increase the limit on the number of tablet versions on the BE side to
> prevent excessive data versions from causing abnormal cluster load [#4687]
> - When an RPC error occurs in a query, it can quickly return specific error
> information to prevent the query from being stuck [#4702]
> - Support automatic mapping of count(distinct if(bool, bitmap, null)) to
> bitmap_union_count function [#4201]
> - Support set sql_mode = concat(@@sql_mode, "STRICT_TRANS_TABLES")
> statement [#4359]
> - Support all stream load features in multiload [#4717]
> - Optimize BE’s strategy for selecting disks when creating tablets, and use
> the "two random choices" algorithm to distribute tablet replicas more
> evenly [#4373]
> - When creating a materialized view, the bitmap_union aggregation method
> only supports integer columns, and hll_union does not support decimal
> columns [#4432]
> - Optimize the log level of some FE logs to avoid log writing becoming a
> bottleneck [#4766]
> - In the describe table statement, display the definition expression of the
> aggregate column of the materialized view [#4446]
> - Support convert() function [#4364]
>     - Support cast(expr as signed/unsigned int) syntax for compatibility
> with the MySQL ecosystem
>     - Add more columns to the information_schema.columns table for
> compatibility with the MySQL ecosystem
> - In Spark Load function, use yarn command line instead of yarn-client API
> to kill job or get job status [#4383]
> - Persist stale rowset meta-information to ensure that this
> information will not be lost after BE restarts [#4454]
> - Return an error code in the schema change result to more clearly inform
> the user of the specific error [#4388]
> - Optimize the rowset selection logic of some compactions to make the
> selection strategy more accurate [#5152]
> - Optimize the Page Cache on the BE side, divide Page into data cache and
> index cache [#5008]
> - Optimized the accuracy of functions such as variance and standard
> deviation on Decimal type [#4959]
> - Optimized the processing logic of predicates pushed down to ScanNode to
> avoid repeated filtering of predicate conditions at the query layer and
> improve query efficiency [#4999]
> - Optimized the predicate push-down logic of Unique Key table, and support
> pushing down conditions on non-primary key columns [#5022]
> - Support pushing down "not in" and "!=" to the storage layer to improve
> query efficiency [#5207]
> - Support writing multiple memtables of a tablet in parallel during import
> to improve import efficiency [#5163]
> - Optimize the creation logic of ZoneMap. When the number of rows on a page
> is too small, ZoneMap will not be created anymore [#5260]
> - Added histogram monitoring indicator class on BE [#5148]
> - When importing Parquet files, if there is a parsing error, the specific
> file name will be displayed in the error message [#4954]
> - Optimize the creation logic of dynamic partitions: creating a table now
> directly triggers the creation of its dynamic partitions [#5209]
> - In the result of the SHOW BACKENDS command, display the real start time
> of BE [#4872]
> - Support column names starting with the @ symbol, mainly used to support
> mapping ES tables [#5006]
> - Optimize the logic of the mapping and conversion relationship of the
> declared columns in the import statement to make the use more clear [#5140]
> - Optimize the execution logic of colocation join to make the query plan
> more evenly executed on multiple BE nodes [#5104]
> - Optimize the predicate pushdown logic, and support pushdown of is null
> and is not null to the storage engine [#5092]
> - Optimize the BE node selection logic in bucket join [#5133]
> - Support UDF in import operation [#4863]
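
The "two random choices" disk selection mentioned in the list above [#4373] follows a well-known load-balancing idea: sample two candidates at random and take the less loaded one. The Python sketch below shows the idea only. It assumes load is measured as a tablet count per disk and that there are at least two disks. The names `pick_disk` and `place_tablets` are hypothetical, not taken from the BE code.

```python
import random


def pick_disk(loads, rng=random):
    """Sample two distinct disks and return the index of the less loaded one."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b


def place_tablets(num_disks, num_tablets, seed=0):
    """Place tablets one by one and return the resulting per-disk counts."""
    rng = random.Random(seed)
    loads = [0] * num_disks
    for _ in range(num_tablets):
        loads[pick_disk(loads, rng)] += 1
    return loads


loads = place_tablets(num_disks=8, num_tablets=800)
```

Compared with picking a single disk uniformly at random, comparing two candidates is known to keep the most loaded disk much closer to the average, at the cost of one extra load lookup per placement.
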
>
> ### Other
>
> - Added support for IN Predicate in delete statement [#4404]
> - Update the Dockerfile of the development image and add some new
> dependencies [#4474]
> - Fix various spelling errors in the code and documentation [#4714] [#4712]
> [#4722] [#4723] [#4724] [#4725] [#4726] [#4727]
> - Added two segment-related indicators in the OlapScanNode of the query
> profile to display the total number of segments and the number of filtered
> segments [#4348]
> - Add batch delete function description document [#4435]
> - Added Spark Load syntax manual [#4463]
> - Added the display of cumulative compaction strategy name and rowset data
> size in BE's /api/compaction/show API [#4466]
> - Redirect the Spark Launcher log in Spark Load to a separate log file for
> easy viewing [#4470]
> - The BE configuration item streaming_load_max_batch_size_mb was renamed
> streaming_load_json_max_mb to make its meaning more clear [#4791]
> - Adjust the default value of the FE configuration item
> thrift_client_timeout_ms to solve the problem of overly slow access to the
> information_schema database [#4808]
> - CPU or memory sampling of BE process is supported on BE web page to
> facilitate performance debugging [#4632]
> - Extend the data slicing balance class on the FE side, so that more
> balance logic can be added [#4771]
> - Reorganized the OLAP_SCAN_NODE profile information to make the profile
> clearer and easier to read [#4825]
> - Added monitoring indicators on the BE side to monitor cancelled Query
> Fragment [#4862]
> - Reorganized the profile information of HASH_JOIN_NODE, CROSS_JOIN_NODE,
> UNION_NODE, ANALYTIC_EVAL_NODE to make the Profile clearer and easier to
> read [#4878]
> - Modify the default value of
> query_colocate_join_memory_limit_penalty_factor to 1 to ensure that the
> default memory limit of the execution plan fragment is consistent with the
> user setting during the colocation join operation [#4895]
> - Added consideration of tablet scanning frequency in the selection of
> compaction strategy on the BE side [#4837]
> - Optimize the strategy of sending Query Fragments and reduce how often
> common attributes are sent, to improve query plan scheduling performance
> [#4904]
> - Optimized the accuracy of load statistics for unavailable nodes when the
> query scheduler is scheduling query plans [#4914]
> - Add the code version information of the FE node in the result of the SHOW
> FRONTENDS statement [#4943]
> - Support more column type conversions, such as conversion from CHAR
> to numeric types [#4938]
> - The import function can identify complex types in Parquet files [#4968]
> - In the BE monitoring indicators, increase the monitoring of used permits
> and waiting permits in the compaction logic [#4893]
> - Optimize the execution time of BE unit tests [#5131]
> - Added more JVM-related monitoring items on the FE side [#5112]
> - Add a session variable to control the timeout period for the transaction
> to take effect in the insert operation [#5170]
> - Optimize the logic of selecting scan nodes for query execution plans, and
> consider all ScanNode nodes in a query [#4984]
> - Add more system monitoring indicators for FE nodes [#5149]
> - Unify the use of VLOG in BE code [#5264]
>
> # Other
>
> - Add license declarations for some non-Apache-licensed code to the NOTICE
> file [#4831]
>
> - Reformatted the code of BE using clang-format [#4965]
>
> - Added clang-format checking and formatting scripts to unify the C++ code
> style of BE before submission [#4934]
>
> - The third-party library adds the AWS S3 SDK, which can be used to
> directly read the data in the object storage through the SDK [#5234]
>
> - Fixed some issues related to License: [#4371]
>
>     1. The dependencies on two third-party libraries, MySQL client and
> LZO, are no longer enabled in the default compilation options. Users who
> need the MySQL external table function must turn it on explicitly
>
>     2. Removed the JS and CSS code from the code repository and introduced it
> in the form of a third-party library dependency
>
> - Updated the Docker development environment image build-env-1.2
>
> - Updated the compilation method of the UnixODBC third-party library, so
> that the BE process no longer depends on the libltdl.so dynamic library of
> the system when it is running
>
> - Added third-party UDF to support more efficient set calculation of
> orthogonal bitmap data [#4198]
>
> - Added UnixODBC third-party library dependency to support ODBC external
> table function [#4377]
>
> # API Change
>
> - Prohibit the creation of segment v1 tables [#4913]
> - Rename the configuration item `streaming_load_max_batch_size_mb` to
> `streaming_load_json_max_mb` [#4791]
> - Support column reference passing in column definition of load statement
> [#5140]
> - Support creating indexes on the value column of unique table [#5305]
> - Support atomic replacement of two tables through replace statement
> [#4669]
> - Support CREATE TABLE LIKE statement
>
>      For more details, please refer to the issue
> https://github.com/apache/incubator-doris/issues/5374
>      If you have any important features that are in progress or not yet
> merged into master and are related to version 0.14, please reply to me by
> email.
>
