Re: Re: [DISCUSS] Ready to release version 0.14.0

寒江雪 Wed, 24 Feb 2021 01:15:58 -0800

Hi all,
I have merged all PRs into branch 0.14.0; it is ready to start a vote.

寒江雪 <yangz...@gmail.com> wrote on Wed, Feb 24, 2021 at 1:13 PM:


> Hi all,
> I found a bug where compaction may fail after deletion. The fix is
> https://github.com/apache/incubator-doris/pull/5413, and I will merge this PR
> into version 0.14.0.
>
> 寒江雪 <yangz...@gmail.com> wrote on Fri, Feb 19, 2021 at 10:18 AM:
>
>> Hi Mingyu Chen,
>> I will merge those PRs into this release.
>>
>> 陈明雨 <morning...@163.com> wrote on Sun, Feb 14, 2021 at 4:40 PM:
>>
>>> I think the following PRs need to be merged into branch-0.14
>>> before releasing:
>>>
>>>
>>> #5388 [Docs] Reorder docs index in sidebar
>>> #5378 [Bug] Fix NPE when replaying modify table property
>>> #5377 [FE] Fix overflow in RuntimeProfile.sortChildren.
>>> #5363 [Doris on ES] Fix bug when ES field value is null
>>> #5365 [Doc]: correct wrong num in create table help doc
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Best Regards
>>> 陈明雨 Mingyu Chen
>>>
>>> Email:
>>> chenmin...@apache.org
>>>
>>>
>>>
>>>
>>>
>>> At 2021-02-09 16:11:00, "ling miao" <lingm...@apache.org> wrote:
>>> >Hi Zhengguo,
>>> >
>>> >I have no problem here ~
>>> >
>>> >Ling Miao
>>> >
>>> >寒江雪 <yangz...@gmail.com> wrote on Tue, Feb 9, 2021 at 11:41 AM:
>>> >
>>> >> Hi Ling,
>>> >> I checked the PR list this morning. There is no critical bug in
>>> >> master, and I have created branch-0.14 synced with master.
>>> >>
>>> >> ling miao <lingm...@apache.org> wrote on Tue, Feb 9, 2021 at 11:23 AM:
>>> >>
>>> >> > Hi zhengguo,
>>> >> >
>>> >> > I look forward to the release of the new version.
>>> >> > I think there are some bug fixes in our pr list. Do you want to
>>> check
>>> >> which
>>> >> > ones need to be incorporated during this release?
>>> >> >
>>> >> > Ling Miao
>>> >> >
>>> >> > 寒江雪 <yangz...@gmail.com> wrote on Tue, Feb 9, 2021 at 10:00 AM:
>>> >> >
>>> >> > > Hi all:
>>> >> > >      Since the release of 0.13, Apache Doris (incubating) has gained
>>> >> > > around 390 new features, bug fixes, performance enhancements,
>>> >> > > documentation improvements, and code refactors from 60+ contributors.
>>> >> > >      Now we are ready to release Apache Doris (incubating) 0.14.0. I
>>> >> > > will be the release manager of this version. This release is expected
>>> >> > > to include the following content:
>>> >> > >
>>> >> > > # New Feature
>>> >> > >
>>> >> > > ### Import and delete
>>> >> > >
>>> >> > > Support deleting multiple pieces of data at one time through the
>>> >> > > import method to avoid the performance degradation caused by multiple
>>> >> > > deletions. For tables of the UniqueKey model, support specifying the
>>> >> > > Sequence column when importing. Doris determines the order of the data
>>> >> > > according to the value of the Sequence column to ensure that the data
>>> >> > > is imported in order.
>>> >> > >  [#4310] [#4256]
>>> >> > >
>>> >> > > ### Support database backup
>>> >> > >
>>> >> > > Support specifying the backup content (metadata and data) in the
>>> >> > > backup statement.
>>> >> > > Support excluding some tables in the backup and restore statements.
>>> >> > > When backing up the entire database, you can exclude some very large
>>> >> > > and unimportant tables.
>>> >> > > Support backing up and restoring the entire database instead of
>>> >> > > declaring each table name in the backup and restore statement.
>>> >> > >
>>> >> > >  [#5314]
>>> >> > >
>>> >> > > ### ODBC external table support
>>> >> > >
>>> >> > > Support access to external tables such as MySQL, PostgreSQL, Oracle,
>>> >> > > etc. through the ODBC protocol
>>> >> > >
>>> >> > >  [#4798] [#4438] [#4559] [#4699]
>>> >> > >
>>> >> > > ### Support SQL level and Partition level result Cache
>>> >> > >
>>> >> > > Support caching query results to improve the efficiency of repeated
>>> >> > > queries; both SQL-level and Partition-level result caches are
>>> >> > > supported [#4330]
>>> >> > >
>>> >> > > ### Built-in functions
>>> >> > >
>>> >> > > - Support bitmap_xor function [#5098]
>>> >> > > - Add replace() function [#4347]
>>> >> > > - Add the time_round function to support time alignment according
>>> to
>>> >> > > multiple time granularities [#4640]
>>> >> > >
>>> >> > > ### FE interface and HTTP interface
>>> >> > >
>>> >> > > - The new FE UI interface can be enabled by setting the FE
>>> >> configuration
>>> >> > > item enable_http_server_v2 [#4684]
>>> >> > >
>>> >> > > - BE adds an http interface to show the distribution of all
>>> tablets in
>>> >> a
>>> >> > > partition among different disks in a BE [#5096]
>>> >> > > - BE adds an http interface to manually migrate a tablet to other
>>> disks
>>> >> > on
>>> >> > > the same node [#5101]
>>> >> > > - Support to modify the configuration items of FE and BE through
>>> http,
>>> >> > and
>>> >> > > persist these modifications [#4704]
>>> >> > >
>>> >> > > ### Compatibility with MySQL
>>> >> > >
>>> >> > > - Added support for views table in the information_schema database
>>> >> > [#4778]
>>> >> > > - Added table_privileges, schema_privileges and user_privileges
>>> to the
>>> >> > > information_schema library for compatibility with certain MySQL
>>> >> > > applications [#4899]
>>> >> > > - A new statistic table is added to the information_schema
>>> >> meta-database
>>> >> > > for compatibility with some MySQL tools [#4991]
>>> >> > >
>>> >> > > ### Monitoring
>>> >> > >
>>> >> > > - BE added tablet-level monitoring indicators, including scanned
>>> data
>>> >> > > volume and row number, written data volume and row number, to help
>>> >> locate
>>> >> > > hot tablets [#4428]
>>> >> > >
>>> >> > > - BE added metrics to view the usage of various LRU caches [#4688]
>>> >> > >
>>> >> > > ### Table building related
>>> >> > >
>>> >> > > - Added CREATE TABLE LIKE statement to facilitate the creation of
>>> a
>>> >> table
>>> >> > > metadata copy [#4705]
>>> >> > > - Support atomic replacement of two tables through replace
>>> statement
>>> >> > > [#4669]
>>> >> > >
>>> >> > > ### Other
>>> >> > >
>>> >> > > - Support adding Optimizer Hints of type SET_VAR in the Select
>>> >> statement
>>> >> > to
>>> >> > > set session variables [#4504]
>>> >> > >
>>> >> > > - Support to repair damaged tablets by filling in empty tablets
>>> [#4255]
>>> >> > > - Support Bucket Shuffle Join function (when the Join condition
>>> column
>>> >> > is a
>>> >> > > subset of the table bucket column, the right table will be
>>> shuffled to
>>> >> > the
>>> >> > > node where the data in the left table is located, which can
>>> >> significantly
>>> >> > > reduce the network overhead caused by Shuffle Join and improve
>>> query
>>> >> > speed)
>>> >> > > [#4677]
>>> >> > > - Support batch cancel import tasks through cancel load statement
>>> >> [#4515]
>>> >> > > - Add a Session variable to set whether to allow the partition
>>> column
>>> >> to
>>> >> > be
>>> >> > > NULL [#5013]
>>> >> > > - Support TopN aggregation function [#4803]
>>> >> > > - Support a new data balancing logic based on the number of
>>> partitions
>>> >> > and
>>> >> > > buckets [#5010]
>>> >> > > - Support creating indexes on the value column of unique table
>>> [#5305]
>>> >> > >
>>> >> > > # Enhancement
>>> >> > >
>>> >> > > ### Performance improvement
>>> >> > >
>>> >> > > - Implemented a new compaction selection algorithm, providing
>>> lower
>>> >> write
>>> >> > > amplification and a more reasonable compaction strategy [#4212]
>>> >> > > - Optimize bit operation efficiency in variable length coding
>>> [#4366]
>>> >> > > - Improve the execution efficiency of money_format function [#4672]
>>> >> > > - Optimize query execution plan: When the bucket column of the
>>> table
>>> >> is a
>>> >> > > subset of the GroupBy column in SQL, reduce the data shuffle step
>>> >> [#4482]
>>> >> > > - Improve the efficiency of column name search on BE [#4779]
>>> >> > > - Improve the performance of the BE side LRU Cache [#4781]
>>> >> > > - Optimized the tablet selection strategy of Compaction, reducing
>>> the
>>> >> > > number of invalid selections [#4964]
>>> >> > > - Optimized the reading efficiency of Unique Key table [#4958]
>>> >> > > - Optimized the memory usage of LoadJob on the FE side and
>>> reduced the
>>> >> > > memory overhead on the FE side [#4993]
>>> >> > > - Reduce the lock granularity in FE metadata from Database level
>>> to
>>> >> Table
>>> >> > > level to support more fine-grained concurrent access to metadata
>>> >> [#3775]
>>> >> > > - Avoid unnecessary memory copy when creating hash table [#5301]
>>> >> > > - Remove the path check when BE starts to speed up BE startup
>>> >> > > [#5268]
>>> >> > > - Optimize the import performance of Json data [#5114]
>>> >> > >
>>> >> > > ### Functional improvements
>>> >> > >
>>> >> > > - SQL supports collate utf8_general_ci syntax to improve MySQL
>>> syntax
>>> >> > > compatibility [#4365]
>>> >> > > - Improve the function of Batch delete, improve and optimize the
>>> >> related
>>> >> > > compaction process [#4425]
>>> >> > > - Enhance the function of parse_url() function, support lowercase,
>>> >> > support
>>> >> > > parsing port [#4429]
>>> >> > > - When SQL execution specifies the execution mode of join (Join
>>> Hint),
>>> >> > the
>>> >> > > Colocation Join function will be disabled by default [#4497]
>>> >> > > - Dynamic partition support hour level [#4514]
>>> >> > > - HTTP interface on BE side supports gzip compression [#4533]
>>> >> > > - Optimized the use of threads on the BE side [#4440]
>>> >> > > - Optimize the checking process and error message of the rand()
>>> >> function
>>> >> > in
>>> >> > > the query analysis stage [#4439]
>>> >> > > - Optimize the compaction triggering and execution logic to better
>>> >> limit
>>> >> > > the resource overhead (mainly memory overhead) of the compaction
>>> >> > operation,
>>> >> > > and trigger the compaction operation more reasonably [#4670]
>>> >> > > - Support pushing Limit conditions to ODBC/MySQL external tables
>>> >> [#4707]
>>> >> > > - Increase the limit on the number of tablet versions on the BE
>>> side to
>>> >> > > prevent excessive data versions from causing abnormal cluster load
>>> >> > [#4687]
>>> >> > > - When an RPC error occurs in a query, it can quickly return
>>> specific
>>> >> > error
>>> >> > > information to prevent the query from being stuck [#4702]
>>> >> > > - Support automatic mapping of count(distinct if(bool, bitmap,
>>> null))
>>> >> to
>>> >> > > bitmap_union_count function [#4201]
>>> >> > > - Support set sql_mode = concat(@@sql_mode, "STRICT_TRANS_TABLES")
>>> >> > > statement [#4359]
>>> >> > > - Support all stream load features in multiload [#4717]
>>> >> > > - Optimize BE’s strategy for selecting disks when creating
>>> tablets, and
>>> >> > use
>>> >> > > the "two random choices" algorithm to ensure tablet copies are
>>> more
>>> >> even
>>> >> > > [#4373]
>>> >> > > - When creating a materialized view, the bitmap_union aggregation
>>> >> method
>>> >> > > only supports integer columns, and hll_union does not support
>>> decimal
>>> >> > > columns [#4432]
>>> >> > > - Optimize the log level of some FEs to avoid log writing
>>> becoming a
>>> >> > > bottleneck [#4766]
>>> >> > > - In the describe table statement, display the definition
>>> expression of
>>> >> > the
>>> >> > > aggregate column of the materialized view [#4446]
>>> >> > > - Support convert() function [#4364]
>>> >> > >     - Support cast (expr as signed/unsigned int) syntax to be
>>> >> > > compatible with the MySQL ecosystem
>>> >> > >     - Add more columns to the information_schema.columns table to be
>>> >> > > compatible with the MySQL ecosystem
>>> >> > > - In Spark Load function, use yarn command line instead of
>>> yarn-client
>>> >> > API
>>> >> > > to kill job or get job status [#4383]
>>> >> > > - Persist stale rowset meta-information to ensure that this
>>> >> > > information will not be lost after BE restarts [#4454]
>>> >> > > - Return an error code in the schema change result to more clearly
>>> >> inform
>>> >> > > the user of the specific error [#4388]
>>> >> > > - Optimize the rowset selection logic of some compactions to make
>>> the
>>> >> > > selection strategy more accurate [#5152]
>>> >> > > - Optimize the Page Cache on the BE side, divide Page into data
>>> cache
>>> >> and
>>> >> > > index cache [#5008]
>>> >> > > - Optimized the accuracy of functions such as variance and
>>> standard
>>> >> > > deviation on Decimal type [#4959]
>>> >> > > - Optimized the processing logic of predicates pushed down to
>>> ScanNode
>>> >> to
>>> >> > > avoid repeated filtering of predicate conditions at the query
>>> layer and
>>> >> > > improve query efficiency [#4999]
>>> >> > > - Optimized the predicate push-down logic of Unique Key table, and
>>> >> > supports
>>> >> > > push-down the conditions of non-primary key columns [#5022]
>>> >> > > - Support pushing down "not in" and "!=" to the storage layer to
>>> >> improve
>>> >> > > query efficiency [#5207]
>>> >> > > - Support writing multiple memtables of a tablet in parallel
>>> during
>>> >> > import.
>>> >> > > Improve import efficiency [#5163]
>>> >> > > - Optimize the creation logic of ZoneMap. When the number of rows
>>> on a
>>> >> > page
>>> >> > > is too small, ZoneMap will not be created anymore [#5260]
>>> >> > > - Added histogram monitoring indicator class on BE [#5148]
>>> >> > > - When importing Parquet files, if there is a parsing error, the
>>> >> specific
>>> >> > > file name will be displayed in the error message [#4954]
>>> >> > > - Optimize the creation logic of dynamic partitions, the table
>>> under
>>> >> > > construction directly triggers the creation of dynamic partitions
>>> >> [#5209]
>>> >> > > - In the result of the SHOW BACKENDS command, display the real
>>> start
>>> >> time
>>> >> > > of BE [#4872]
>>> >> > > - Support column names start with @ symbol, mainly used to support
>>> >> > mapping
>>> >> > > ES tables [#5006]
>>> >> > > - Optimize the logic of the mapping and conversion relationship
>>> of the
>>> >> > > declared columns in the import statement to make the use more
>>> clear
>>> >> > [#5140]
>>> >> > > - Optimize the execution logic of colocation join to make the
>>> query
>>> >> plan
>>> >> > > more evenly executed on multiple BE nodes [#5104]
>>> >> > > - Optimize the predicate pushdown logic, and support pushdown of
>>> is
>>> >> null
>>> >> > > and is not null to the storage engine [#5092]
>>> >> > > - Optimize the BE node selection logic in bucket join [#5133]
>>> >> > > - Support UDF in import operation [#4863]
>>> >> > >
>>> >> > > ### Other
>>> >> > >
>>> >> > > - Added support for IN Predicate in delete statement [#4404]
>>> >> > > - Update the Dockerfile of the development image and add some new
>>> >> > > dependencies [#4474]
>>> >> > > - Fix various spelling errors in the code and documentation
>>> [#4714]
>>> >> > [#4712]
>>> >> > > [#4722] [#4723] [#4724] [#4725] [#4726] [#4727]
>>> >> > > - Added two segment-related indicators in the OlapScanNode of the
>>> query
>>> >> > > profile to display the total number of segments and the number of
>>> >> > filtered
>>> >> > > segments [#4348]
>>> >> > > - Add batch delete function description document [#4435]
>>> >> > > - Added Spark Load syntax manual [#4463]
>>> >> > > - Added the display of cumulative compaction strategy name and
>>> rowset
>>> >> > data
>>> >> > > size in BE's /api/compaction/show API [#4466]
>>> >> > > - Redirect the Spark Launcher log in Spark Load to a separate log
>>> file
>>> >> > for
>>> >> > > easy viewing [#4470]
>>> >> > > - The BE configuration item streaming_load_max_batch_size_mb was
>>> >> renamed
>>> >> > > streaming_load_json_max_mb to make its meaning more clear [#4791]
>>> >> > > - Adjust the default value of the FE configuration item
>>> >> > > thrift_client_timeout_ms to solve the problem of too long access
>>> to the
>>> >> > > information_schema library [#4808]
>>> >> > > - CPU or memory sampling of BE process is supported on BE web
>>> page to
>>> >> > > facilitate performance debugging [#4632]
>>> >> > > - Extend the data slicing balance class on the FE side, so that
>>> it can
>>> >> > > extend more balance logic [#4771]
>>> >> > > - Reorganized the OLAP_SCAN_NODE profile information to make the
>>> >> > > profile clearer and easier to read [#4825]
>>> >> > > - Added monitoring indicators on the BE side to monitor cancelled
>>> Query
>>> >> > > Fragment [#4862]
>>> >> > > - Reorganized the profile information of HASH_JOIN_NODE,
>>> >> CROSS_JOIN_NODE,
>>> >> > > UNION_NODE, ANALYTIC_EVAL_NODE to make the Profile more clear and
>>> easy
>>> >> to
>>> >> > > read [#4878]
>>> >> > > - Modify the default value of
>>> >> > > query_colocate_join_memory_limit_penalty_factor to 1 to ensure
>>> that the
>>> >> > > default memory limit of the execution plan fragment is consistent
>>> with
>>> >> > the
>>> >> > > user setting during the colocation join operation [#4895]
>>> >> > > - Added consideration of tablet scanning frequency in the
>>> selection of
>>> >> > > compaction strategy on the BE side [#4837]
>>> >> > > - Optimize the strategy of sending Query Fragments and reduce the
>>> >> number
>>> >> > of
>>> >> > > sending public attributes to improve query plan scheduling
>>> performance
>>> >> > > [#4904]
>>> >> > > - Optimized the accuracy of load statistics for unavailable nodes
>>> when
>>> >> > the
>>> >> > > query scheduler is scheduling query plans [#4914]
>>> >> > > - Add the code version information of the FE node in the result
>>> of the
>>> >> > SHOW
>>> >> > > FRONTENDS statement [#4943]
>>> >> > > - Support more column type conversion, such as support conversion
>>> from
>>> >> > CHAR
>>> >> > > to numeric type, etc. [#4938]
>>> >> > > - Import function to identify complex types in Parquet files
>>> [#4968]
>>> >> > > - In the BE monitoring indicators, increase the monitoring of used
>>> >> > permits
>>> >> > > and waiting permits in the compaction logic [#4893]
>>> >> > > - Optimize the execution time of BE unit tests [#5131]
>>> >> > > - Added more JVM-related monitoring items on the FE side [#5112]
>>> >> > > - Add a session variable to control the timeout period for the
>>> >> > transaction
>>> >> > > to take effect in the insert operation [#5170]
>>> >> > > - Optimize the logic of selecting scan nodes for query execution
>>> plans,
>>> >> > and
>>> >> > > consider all ScanNode nodes in a query [#4984]
>>> >> > > - Add more system monitoring indicators for FE nodes [#5149]
>>> >> > > - Unify the use of VLOG in BE code [#5264]
>>> >> > >
>>> >> > > # Other
>>> >> > >
>>> >> > > - Add license declarations for some non-Apache-licensed code to the
>>> >> > > NOTICE file [#4831]
>>> >> > >
>>> >> > > - Reformatted the code of BE using clang-format [#4965]
>>> >> > >
>>> >> > > - Added clang-format checking and formatting scripts to unify the
>>> C++
>>> >> > code
>>> >> > > style of BE before submission [#4934]
>>> >> > >
>>> >> > > - The third-party library adds the AWS S3 SDK, which can be used
>>> to
>>> >> > > directly read the data in the object storage through the SDK
>>> [#5234]
>>> >> > >
>>> >> > > - Fixed some issues related to License: [#4371]
>>> >> > >
>>> >> > >     1. The dependencies of the two third-party libraries, MySQL
>>> client
>>> >> > and
>>> >> > > LZO, will no longer be enabled in the default compilation
>>> options. If
>>> >> > users
>>> >> > > need MySQL external table function, they need to turn it on
>>> >> > >
>>> >> > >     2. Removed the js and css code in the code library and
>>> introduced
>>> >> it
>>> >> > in
>>> >> > > the form of a third-party library dependency
>>> >> > >
>>> >> > > - Updated the Docker development environment image build-env-1.2
>>> >> > >
>>> >> > > - Updated the compilation method of the UnixODBC third-party library,
>>> >> > > so that the BE process no longer depends on the system's libltdl.so
>>> >> > > dynamic library when it is running
>>> >> > >
>>> >> > > - Added third-party UDF to support more efficient set calculation
>>> of
>>> >> > > orthogonal bitmap data [#4198]
>>> >> > >
>>> >> > > - Added UnixODBC third-party library dependency to support ODBC
>>> >> external
>>> >> > > table function [#4377]
>>> >> > >
>>> >> > > # API Change
>>> >> > >
>>> >> > > - Prohibit the creation of segment v1 tables [#4913]
>>> >> > > - Rename the configuration item
>>> `streaming_load_max_batch_size_mb` to
>>> >> > > `streaming_load_json_max_mb` [#4791]
>>> >> > > - Support column reference passing in column definition of load
>>> >> statement
>>> >> > > [#5140]
>>> >> > > - Support creating indexes on the value column of unique table
>>> [#5305]
>>> >> > > - Support atomic replacement of two tables through replace
>>> statement
>>> >> > > [#4669]
>>> >> > > - Support CREATE TABLE LIKE statement
>>> >> > >
>>> >> > >      For more details, please refer to issue
>>> >> > > https://github.com/apache/incubator-doris/issues/5374
>>> >> > >      If you have any important features that are in progress or not
>>> >> > > yet merged into master and are related to version 0.14, please reply
>>> >> > > to me by email.
>>> >> > >
>>> >> >
>>> >>
>>>
>>
