sijie commented on a change in pull request #402: Issue 356: Release notes 4.5.0
URL: https://github.com/apache/bookkeeper/pull/402#discussion_r131840106
##########
File path: site/docs/latest/releaseNotes.md
##########
@@ -0,0 +1,483 @@
+---
+title: Apache BookKeeper 4.5.0 Release Notes
+layout: community
+---
+
+This is the fifth release of BookKeeper as an Apache Top Level Project!
+
+The 4.5.0 release incorporates hundreds of new fixes, improvements, and features since the previous major release, 4.4.0,
+which was released over a year ago. It is a big milestone for the Apache BookKeeper community, converging three
+main branches (Salesforce, Twitter and Yahoo).
+
+Apache BookKeeper users are encouraged to upgrade to `4.5.0`. The technical details of this release are summarized
+below.
+
+## Highlights
+
+The main features in 4.5.0 cover four areas:
+
+- Security
+- Public API
+- Performance
+- Operations
+
+### Security
+
+Prior to this release, Apache BookKeeper only supported simple `DIGEST-MD5` type authentication.
+
+With this release of Apache BookKeeper, a number of features are introduced that can be used, together or separately,
+to secure a BookKeeper cluster.
+
+The following security features are currently supported.
+
+- Authentication of connections to bookies from clients, using either `TLS` or `SASL` (Kerberos).
+- Authentication of connections from clients, bookies, autorecovery daemons to `ZooKeeper`, when using zookeeper
+  based ledger managers.
+- Encryption of data transferred between bookies and clients, between bookies and autorecovery daemons using `TLS`.
+
+It's worth noting that those security features are optional - non-secured clusters are supported, as well as a mix
+of authenticated, unauthenticated, encrypted and non-encrypted clients.
+
+For more details, have a look at [BookKeeper Security](../security).
+
+### Public API
+
+There are multiple new client features introduced in 4.5.0.
+
+#### LedgerHandleAdv
+
+The [Ledger API] is the low level API provided by BookKeeper for interacting with `ledgers` in a bookkeeper cluster.
+It is simple but not flexible on ledger id or entry id generation. Apache BookKeeper introduces `LedgerHandleAdv`
+as an extension of the existing `LedgerHandle` for advanced usage. The new `LedgerHandleAdv` allows applications to provide
+their own `ledger-id` and assign the `entry-id` when adding entries.
+
+See [Ledger Advanced API](../api/ledger-adv-api) for more details.
+
+#### Long Poll
+
+`Long Poll` is a main feature that [DistributedLog](https://distributedlog.io) uses to achieve low-latency tailing.
+This big feature has been merged back in 4.5.0 and is available to BookKeeper users. It allows tailing reads without
+polling `LastAddConfirmed` every time the readers exhaust known entries.
+
+Although `Long Poll` brings great latency improvements on tailing reads, it is still a very low-level primitive.
+It is still recommended to use a high level API (e.g. [DistributedLog API](../api/distributedlog-api)) for tailing and streaming use cases.
+
+See [Streaming Reads](https://distributedlog.incubator.apache.org/docs/latest/user_guide/design/main.html#streaming-reads) for more details.
+
+#### Explicit LAC
+
+Prior to 4.5.0, the `LAC` is only advanced when subsequent entries are added. If no subsequent entries are added,
+the last entry written will not be visible to readers until the ledger is closed. High-level clients (e.g. DistributedLog) or applications
+have to work around this by writing some sort of `control records` to advance `LAC`.
+
+In 4.5.0, a new `explicit lac` feature is introduced to periodically advance `LAC` if no subsequent entries are added. This feature
+can be enabled by setting `explicitLacInterval` to a positive value.
+
+### Performance
+
+There are a lot of performance related bug fixes and improvements in 4.5.0. The major performance improvement introduced in 4.5.0 is
+upgrading netty from 3.x to [4.x](http://netty.io/wiki/new-and-noteworthy-in-4.0.html).
+
+For more details, please read the [upgrade guide](../upgrade) about the netty related tips when upgrading bookkeeper from 4.4.0 to 4.5.0.
+
+Besides the netty 4 upgrade, other performance related changes are highlighted below:
+
+#### Delay Ensemble Change
+
+`Ensemble Change` is a feature that Apache BookKeeper uses to achieve high availability. However it is an expensive metadata operation.
+Especially when Apache BookKeeper is deployed in a multiple data-center environment, losing a data center will cause a churn of metadata
+operations due to ensemble changes. `Delay Ensemble Change` is introduced in 4.5.0 to overcome this problem. Enabling this feature means
+an `Ensemble Change` will only occur when clients can't receive enough valid responses to satisfy the `ack-quorum` constraint. This feature
+improves the tail latency.
+
+To enable this feature, please set `delayEnsembleChange` to `true` on your clients.
+
+#### Parallel Ledger Recovery
+
+BookKeeper clients recover entries one-by-one during ledger recovery. If a ledger has a very large volume of traffic, it will have
+a large number of entries to recover when client failures occur. BookKeeper introduces `parallel ledger recovery` in 4.5.0 to allow
+batch recovery to improve ledger recovery speed.
+
+To enable this feature, please set `enableParallelRecoveryRead` to `true` on your clients. You can also set `recoveryReadBatchSize`
+to control the batch size of recovery read.
+
+#### Multiple Journals
+
+Prior to 4.5.0, bookies are only allowed to configure one journal device. If you want to have high write bandwidth, you can raid multiple
+disks into one device and mount that device for the journal directory. However because there is only one journal thread, this approach doesn't
+actually improve the write bandwidth.
+
+BookKeeper introduces multiple journal directories support in 4.5.0. Users can configure multiple devices for journal directories.
+
+To enable this feature, please use `journalDirectories` rather than `journalDirectory`.
+
+### Operations
+
+#### LongHierarchicalLedgerManager
+
+Apache BookKeeper supports pluggable metadata stores. By default, it uses Apache ZooKeeper as its metadata store. Among the zookeeper-based
+ledger manager implementations, `HierarchicalLedgerManager` is the most popular and widely adopted ledger manager. However it has a major
+limitation: it assumes `ledger-id` is a 32-bit integer, which limits the number of ledgers to `2^32`.
+
+`LongHierarchicalLedgerManager` is introduced to overcome this limitation.
+
+See [Ledger Manager](../develop/ledger-manager) for more details and learn how to migrate `HierarchicalLedgerManager` to `LongHierarchicalLedgerManager`.
+
+#### Weight-based placement policy
+
+`Rack-Aware` and `Region-Aware` placement policies are the two available placement policies in the BookKeeper client. They place ensembles based
+on users' configured network topology. However they both assume that all nodes are equal. `weight-based` placement is introduced in 4.5.0 to
+improve the existing placement policies. `weight-based` placement was not built as a separate policy. It is built into the existing placement policies.
+If you are using `Rack-Aware` or `Region-Aware`, you can simply enable `weight-based` placement by setting `diskWeightBasedPlacementEnabled` to `true`.
+
+#### Customized Ledger Metadata
+
+A `Map<String, byte[]>` is introduced in ledger metadata in 4.5.0. Clients are now allowed to pass in a key/value map when creating ledgers.
+This customized ledger metadata can later be used by a user defined placement policy. This extends the flexibility of the bookkeeper API.
+
+#### Add Prometheus stats provider
+
+A new [Prometheus](https://prometheus.io/) [stats provider](https://github.com/apache/bookkeeper/tree/master/bookkeeper-stats-providers/prometheus-metrics-provider)
+is introduced in 4.5.0.
+It simplifies metric collection when running bookkeeper on [kubernetes](https://kubernetes.io/).

Review comment:
I don't think test improvements need to be in `highlights`. I will include the tool improvements.
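
For reference, the settings named in the quoted release notes could be collected as in the sketch below. Only the key names come from the notes; every value is an illustrative assumption, the unit of `explicitLacInterval` is not stated in the notes, and the comma-separated form of `journalDirectories` as well as the example paths are assumed rather than taken from the PR.

```properties
# Client-side settings (key names from the release notes; values are illustrative)
# Any positive value enables explicit LAC; the unit is not given in the notes.
explicitLacInterval=1000
# Only change the ensemble when the ack-quorum can no longer be satisfied.
delayEnsembleChange=true
# Recover entries in batches instead of one-by-one.
enableParallelRecoveryRead=true
# Hypothetical batch size for recovery reads.
recoveryReadBatchSize=10
# Applies on top of the Rack-Aware or Region-Aware placement policy.
diskWeightBasedPlacementEnabled=true

# Bookie-side setting (replaces journalDirectory)
# Hypothetical paths; the comma-separated list format is an assumption.
journalDirectories=/mnt/journal-1,/mnt/journal-2
```

The client keys and the bookie key are kept apart because the notes describe the first five as client options and `journalDirectories` as a bookie option.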