nsivabalan commented on code in PR #11514: URL: https://github.com/apache/hudi/pull/11514#discussion_r1687094555
##########
rfc/rfc-78/rfc-78.md:
##########
@@ -0,0 +1,339 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# RFC-78: [Bridge release for 1.x]
+
+## Proposers
+
+- @nsivabalan
+- @vbalaji
+
+## Approvers
+ - @yihua
+ - @codope
+
+## Status
+
+JIRA: https://issues.apache.org/jira/browse/HUDI-7882
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+[Hudi 1.x](https://github.com/apache/hudi/blob/ae1ee05ab8c2bd732e57bee11c8748926b05ec4b/rfc/rfc-69/rfc-69.md) is a powerful
+re-imagination of the transactional database layer in Hudi to power continued innovation across the community in the coming
+years. It introduces a lot of differentiating features for Apache Hudi. Feel free to check out the
+[release page](https://hudi.apache.org/releases/release-1.0.0-beta1) for more info. We had beta1 and beta2 releases, which were meant for
+interested developers/users to try out some of the advanced features. But as we work towards 1.0 GA, we are proposing
+a bridge release (0.16.0) for smoother migration for existing Hudi users.
+
+## Objectives
+The goal is to have a smooth migration experience for users moving from 0.x to 1.0.
+We plan to have a 0.16.0 bridge release and ask everyone to first migrate to 0.16.0 before they upgrade to 1.x.
+
+A typical organization might have a medallion architecture deployed to run 1000s of Hudi pipelines, i.e. bronze, silver and gold layers.
+For this layout of pipelines, here is how a typical migration might look (without a bridge release):
+
+a. Existing pipelines are in 0.15.x. (bronze, silver, gold)
+b. Migrate gold pipelines to 1.x.
+- We need to strictly migrate only gold to 1.x first, because a 0.15.0 reader may not be able to read 1.x Hudi tables. So, if we migrate any of the silver pipelines to 1.x before migrating the entire gold layer, we might end up in a situation
+where a 0.15.0 reader (gold) ends up reading a 1.x table (silver). This might lead to failures. So, we have to follow a certain order in which we migrate pipelines.
+c. Once all of gold is migrated to 1.x, we can move all of silver to 1.x.
+d. Once all of the gold and silver pipelines are migrated to 1.x, we can finally move all of bronze to 1.x.
+
+In the end, we would have migrated all of the existing Hudi pipelines from 0.15.0 to 1.x.
+But as you can see, the migration has to be coordinated. And in a very large organization, we may not always have good control over downstream consumers.
+Hence, coordinating and orchestrating the entire migration workflow might be challenging.
+
+Hence, to ease the migration workflow for 1.x, we are introducing 0.16.0 as a bridge release.
+
+Here are the objectives with this bridge release:
+
+- A 1.x reader should be able to read 0.14.x to 0.16.x tables without any loss in functionality and with no data inconsistencies.
+- 0.16.x should have read capability for 1.x tables, with some limitations. For features ported over from 0.x, we should guarantee no loss in functionality.
+But for new features that were introduced in 1.x, we may not be able to support all of them. We will call out which new features may not work with a 0.16.x reader.
+- In this case, we explicitly request users not to turn on these features until all readers are completely migrated to 1.x, so as not to break any readers.
+
+Connecting back to our example above, let's see how the migration might look for an existing user.
+
+a. Existing pipelines are in 0.15.x. (bronze, silver, gold)
+b. Migrate pipelines to 0.16.0 (in any order; there are no constraints around which pipeline should be migrated first).
+c. Ensure all pipelines are in 0.16.0 (both readers and writers).
+d. Start migrating pipelines in a rolling fashion to 1.x. At this juncture, we could have a few pipelines in 1.x and a few pipelines in 0.16.0, but since 0.16.x
+can read 1.x tables, we should be OK here. Just do not enable new features like non-blocking concurrency control (NBCC) yet.
+e. Migrate all of the remaining 0.16.0 pipelines to 1.x.
+f. Once all readers and writers are in 1.x, we are good to enable any new features (like NBCC) with 1.x tables.
+
+As you can see, the company/org-wide coordination needed to migrate gold before migrating silver or bronze is relaxed with the bridge release. The only requirement to keep track of
+is to migrate all pipelines completely to 0.16.x before starting to migrate to 1.x.
+
+So, here are the objectives of this RFC with the bridge release.

Review Comment:
   Above, I have given the justification for why a bridge release is needed. Here, I am listing the deliverables from this RFC. I am not sure if we can combine them. I have taken a stab.
