Thanks Pifta,

1. A solution where SCM validates the containers reported by a DN in the ICR will be added; that will resolve the issue for both secure and non-secure environments.
2. *Agree* that for a secure env, pipeline validation *will not add much value* (with the above point handled) and the *impact will be very low, as:*
   - primary write access is already validated using the block token, which carries the container and block info
   - it is very unlikely that a client with valid access will maliciously write to a different Datanode, and the impact is bounded by the 2 HB time limit.

Considering this, I think we do not need to add extra pipeline authorization, as the impact is very low.

Regards
Sumit.

On Wed, Dec 7, 2022 at 4:52 AM István Fajth <fapi...@gmail.com> wrote:

> Hi Sumit,
>
> sorry for getting back somewhat late on this, let me share my opinion here
> as well, as I will do in the JIRA ticket shortly.
>
> As we discussed, the problem is that currently a rogue client can write
> blocks to DataNodes that are different from the Pipeline information that
> is provided to the client by Ozone Manager. This is true in secure and
> non-secure environments.
> As Neil mentioned, this might compromise a container when SCM checks the
> replicas and figures out which containers are over-replicated and, if
> there are excess replicas, which ones to delete: if a rogue client writes
> a container to 3 nodes (even via the STANDALONE replication type) and
> properly syncs these writes, the bcsid associated with the container might
> go above the one in the good containers, and with that take precedence and
> potentially cause the old valid data to be removed.
>
> As this can happen in a non-secure environment, I strongly believe we
> should not touch the tokens, as that does not solve the problem at all:
> tokens are present only in a secured environment.
>
> I think the solution is within SCM: if a DN does not have the container
> yet (it does not have a valid replica of the container), then at container
> creation an ICR is triggered, and while that ICR is processed, the
> container should be marked as an invalid replica and SCM should issue a
> delete container command to the DataNode that reported the invalid
> container. (We should be able to determine that the container is invalid
> during ICR processing, as SCM should know which container belongs to which
> Pipeline, and if the DN is not part of the Pipeline it should not report
> creation of a container with the specific container ID.)
> If possible, Ozone Manager should also refuse the write and the metadata
> update, based on information provided by SCM (either by caching the
> in-flight write Pipelines and then comparing them with the Pipelines
> reported by the client at the end of the write, or by directly checking
> the write location with SCM to validate the write).
>
> I believe we should not include this information in the tokens, as we
> don't gain anything with that after implementing proper measures to deal
> with such rogue clients. Here is why: if SCM instructs the DN within 2
> heartbeats to remove the rogue container, then rogue clients will have 2 HB
> of time (1 min by default if no container creation happens in between the
> 2 HB, but it happens... so less than 1 min) to occupy space in the cluster
> with garbage data. In order to do that they need access permission in the
> first place, and if they have access permission, they can write garbage to
> valid locations anyway. So the only thing we need to prevent is messing up
> the container space and the OM metadata, and that is done with the
> proposed check in the ICR and with the check when the client commits the
> write to OM.
>
> Regards,
> Pifta
>
> Sumit Agrawal <sumitagra...@cloudera.com.invalid> wrote (on Tue, Nov 29,
> 2022, 7:20):
>
> > Hi Devs,
> >
> > Related to HDDS-7454 <https://issues.apache.org/jira/browse/HDDS-7454>,
> > need opinions on whether this requires handling or not, based on impact
> > and complexity. A brief summary is given below; the same is present in
> > the Jira.
> >
> > Please share your opinion ...
> >
> > *For non-secure env* with a raw/malicious client, below are the cases:
> >
> > 1) Writing to a new DN will cause the addition of a container, which can
> > cause data loss
> >     - Raised JIRA: HDDS-7552
> >     <https://issues.apache.org/jira/browse/HDDS-7552>
> >
> >     Will avoid writing / delete the container on the DN.
> >
> > 2) Writing a new block to a DN that has the container causes additional
> > blocks and consumes space
> >
> >     Impact: additional space consumption
> >
> >     Note: there is no way to control this in the current design, as OM
> >     and DN do not have any sync; a future solution may be needed,
> >     including Recon, which can have OM, SCM and DN information and
> >     mapping.
> >
> > 3) Writing with an unknown container to a DN, causing the addition of a
> > container
> >     - Already handled using HDDS-3241
> >     <https://issues.apache.org/jira/browse/HDDS-3241>
> >
> > *For Secure env* as the current bug, need opinions on whether this needs
> > to be handled, based on impact:
> >
> > 1. Authorization of pipeline / DNs: currently it is not present as part
> > of this bug. It is suggested to add it as part of the block token.
> >
> > Pros:
> >
> >    - Avoids writing to a DN for which the write is not intended, and
> >    avoids the malicious impact of data loss and space consumption shown
> >    above for the non-secure env.
> >
> > Cons:
> >
> >    - Needs code for adding the pipeline in token generation, passing, and
> >    validation at the DNs
> >    - The code will be complex; EC has a different way of syncing, which
> >    introduces complexity and failure points
> >
> > *Security Impact if this is not handled:*
> >
> >    - SCM needs to validate the new container using the ICR, which is
> >    async, and needs at least 2 heartbeats to notify the DN to stop
> >    writing (30+ seconds).
> >    - During this time, the client can add a lot of block data
> >    - Exploitation is easy, but the client must be authorized to get block
> >    write permission
> >
> > --
> > *Sumit Agrawal* | Senior Staff Engineer
> > cloudera.com <https://www.cloudera.com>
>
> --
> Pifta

--
*Sumit Agrawal* | Senior Staff Engineer
cloudera.com <https://www.cloudera.com>
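
For reference, the two checks proposed in this thread (SCM rejecting a container reported through an ICR by a DN outside the container's Pipeline, and OM refusing a commit whose write locations do not match the Pipeline) could be sketched roughly as below. This is only an illustration under stated assumptions: `IcrValidationSketch`, `allocate`, `onContainerCreated`, and `commitAllowed` are hypothetical names, not Ozone classes or methods, and datanodes are plain strings. The sketch also ignores replicas that SCM itself schedules on other datanodes through replication, and it treats an unknown container the same as a foreign one for simplicity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; none of these names are real Ozone classes or methods. */
final class IcrValidationSketch {

    /** Outcome SCM would pick for a newly reported replica. */
    enum Action { ACCEPT, DELETE_REPLICA }

    // Stand-in for SCM state: container ID -> datanodes of its write pipeline.
    private final Map<Long, Set<String>> pipelineMembers = new HashMap<>();

    /** Records the pipeline SCM allocated for a container. */
    void allocate(long containerId, Set<String> datanodes) {
        pipelineMembers.put(containerId, new HashSet<>(datanodes));
    }

    /** ICR-side check: a DN outside the pipeline must not report the container. */
    Action onContainerCreated(long containerId, String reportingDatanode) {
        Set<String> members = pipelineMembers.get(containerId);
        if (members == null || !members.contains(reportingDatanode)) {
            // In the proposal, SCM would send a delete-container command to that DN.
            return Action.DELETE_REPLICA;
        }
        return Action.ACCEPT;
    }

    /** Commit-side check: every location the client reports must be in the pipeline. */
    boolean commitAllowed(long containerId, Set<String> reportedDatanodes) {
        Set<String> members = pipelineMembers.get(containerId);
        return members != null && members.containsAll(reportedDatanodes);
    }

    public static void main(String[] args) {
        IcrValidationSketch scm = new IcrValidationSketch();
        scm.allocate(1L, new HashSet<>(Arrays.asList("dn1", "dn2", "dn3")));

        System.out.println(scm.onContainerCreated(1L, "dn2")); // pipeline member
        System.out.println(scm.onContainerCreated(1L, "dn9")); // rogue write target
        System.out.println(scm.commitAllowed(1L,
            new HashSet<>(Arrays.asList("dn1", "dn9"))));      // commit with a foreign DN
    }
}
```

Neither check depends on block tokens, which is why the same logic would apply in both secure and non-secure environments.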