Dear BookKeeper devs,

I was wondering if anyone would be willing to review Apache Heron's use of BookKeeper as a storage mechanism. The choice to use BookKeeper predated my joining the team, and I worry that we are not using it properly.

Apache Heron is a streaming analytics framework. It uses BK in two main ways:

1. Uploader/Downloader. When analytics are submitted to a Heron cluster, the binary artifacts are uploaded to BookKeeper. Then, when the analytic StatefulSet is created, each pod downloads the binary artifact from BookKeeper.
   a. Uploader: https://github.com/apache/incubator-heron/tree/master/heron/uploaders/src/java/org/apache/heron/uploader/dlog
   b. Downloader: https://github.com/apache/incubator-heron/blob/master/heron/downloaders/src/java/org/apache/heron/downloader/DLDownloader.java

2. Stateful Storage. BK is used to store checkpoint data, which can be retrieved for recovery.
   https://github.com/apache/incubator-heron/blob/master/heron/statefulstorages/src/java/org/apache/heron/statefulstorage/dlog/DlogStorage.java

In addition to the code, in which various size-based and time-based rolling is disabled, we have a few interesting config items that were added to address a bug in which BookKeeper was filling up. I suspect these settings are incorrect:
https://github.com/apache/incubator-heron/blob/ebd7ceaeb7cb4aeddf21e8a51a233d53e2afca0d/deploy/kubernetes/helm/values.yaml.template#L93-L95

Any assistance in reviewing our use of the distributedlog API would be greatly appreciated.

Nick