Thanks Pifta,

1. A solution in which SCM validates the containers reported by a DN in its
ICR will be added; that will resolve the issue in both secure and non-secure
environments.

2. *Agree* that for a secure env, pipeline validation *will not add
much value* (with the above point handled) and the *impact will be very low, as:*
- primary write access is already validated using the block token, which
carries container and block info
- it's very unlikely that a client with valid access will maliciously write
to a different Datanode, and the impact is bounded by the 2HB time limit.

Considering this, I think we do not need to add extra pipeline authorization,
as the impact is very low.
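
For illustration, the ICR check from point 1 could look roughly like the
sketch below. This is Python written for this mail, not Ozone code:
`ScmState`, `validate_icr` and the returned action strings are hypothetical
names, and the sketch deliberately ignores replicas that SCM itself places on
other nodes through replication.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ScmState:
    """Hypothetical, simplified view of what SCM knows (assumption)."""
    # container ID -> ID of the pipeline the container was allocated on
    container_pipeline: Dict[int, str] = field(default_factory=dict)
    # pipeline ID -> IDs of the datanodes that are members of the pipeline
    pipeline_members: Dict[str, Set[str]] = field(default_factory=dict)


def validate_icr(state: ScmState, datanode_id: str, container_id: int) -> str:
    """Decide what to do with a container that a DN reports as newly created.

    Returns "ACCEPT" when the reporting DN is a member of the container's
    pipeline, otherwise "DELETE_CONTAINER" (the replica is treated as invalid).
    """
    pipeline_id = state.container_pipeline.get(container_id)
    if pipeline_id is None:
        # SCM never allocated this container ID.
        return "DELETE_CONTAINER"
    members = state.pipeline_members.get(pipeline_id, set())
    if datanode_id not in members:
        # The DN is not part of the pipeline, so it should not have
        # created a container with this ID.
        return "DELETE_CONTAINER"
    return "ACCEPT"
```

With a container 1 allocated on a pipeline of dn1, dn2 and dn3, a report
from dn1 is accepted, while a report for container 1 from any other DN
results in a delete command for that replica.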

Regards
Sumit.


On Wed, Dec 7, 2022 at 4:52 AM István Fajth <fapi...@gmail.com> wrote:

> Hi Sumit,
>
> sorry for getting back somewhat late on this. Let me share my opinion here;
> I will also add it to the JIRA ticket shortly.
>
> As we discussed, the problem is that currently a rogue client can write
> blocks to DataNodes that are different from the Pipeline information that
> is provided for the client from Ozone Manager. This is true in secure and
> non-secure environments.
> As Neil mentioned, this might compromise a container when SCM checks the
> replicas, figures out which containers are over-replicated, and decides
> which excess replicas to delete. If a rogue client writes a container to 3
> nodes (even via the STANDALONE replication type) and properly syncs these
> writes, the BCSID associated with the container might go above the one in
> the good containers, take precedence over them, and potentially cause the
> old valid data to be removed.
>
> As this can happen in a non-secure environment, I strongly believe we
> should not touch the tokens, as that does not solve the problem at all:
> tokens are present only in a secured environment.
>
> I think the solution is within SCM. If a DN does not have the container
> yet (it does not have a valid replica of the container), then an ICR is
> triggered at container creation. While that ICR is processed, the
> container should be marked as an invalid replica, and SCM should issue a
> delete container command to the DataNode that reported it. (We should
> be able to determine that the container is invalid during ICR processing,
> as SCM should know which container belongs to which Pipeline and if the DN
> is not part of the Pipeline it should not report creation of a container
> with the specific container ID.)
> If possible, Ozone Manager should also refuse the write and metadata update,
> based on information provided by SCM (either by caching the in flight write
> Pipelines and then the Pipelines reported by the client at the end of the
> write, or by directly checking the write location with SCM to validate the
> write).
>
> I believe we should not include this information in the tokens, as we gain
> nothing from it once proper measures are in place to deal with such rogue
> clients. Here is why: if SCM instructs the DN within 2 heartbeats to remove
> the rogue container, then rogue clients have 2HB of time (1 min by default
> if no container creation happens between the 2 HBs, but it does happen, so
> less than 1 min) to occupy cluster space with garbage data. To do that they
> need access permission in the first place, and if they have access
> permission, they can write garbage to valid locations anyway. So the only
> thing we need to prevent is messing up the container space and the OM
> metadata, and that is done by the proposed check in ICR and the check when
> the client commits the write to OM.
>
> Regards,
> Pifta
>
> Sumit Agrawal <sumitagra...@cloudera.com.invalid> wrote (on Tue, Nov 29,
> 2022, 7:20):
>
> > Hi Devs,
> >
> >
> > Related to HDDS-7454 <https://issues.apache.org/jira/browse/HDDS-7454>,
> > I need opinions on whether this requires handling, based on impact and
> > complexity. A brief summary is given below; the same is present in the Jira.
> >
> > Please share your opinion ...
> >
> > *For non-secure env* with a raw/malicious client, the cases are:
> >
> > 1) Writing to a new DN causes a container to be added, which can cause data
> > loss - Raised JIRA: HDDS-7552
> > <https://issues.apache.org/jira/browse/HDDS-7552>
> >
> >     That fix will prevent the write or delete the container on the DN.
> >
> > 2) Writing a new block to a DN that already has the container creates
> > additional blocks and consumes space
> >
> >     Impact: additional space consumption
> >
> >     Note: there is no way to control this in the current design, as OM and
> > DN do not have any sync; a future solution may be needed, possibly
> > involving Recon, which can have OM, SCM and DN information and mapping.
> >
> > 3) Writing with an unknown container to a DN, causing a container to be
> > added - already handled by HDDS-3241
> > <https://issues.apache.org/jira/browse/HDDS-3241>
> >
> >
> >
> > *For Secure env*, as in the current bug, I need opinions on whether this
> > must be handled, based on impact:
> >
> >    1. Authorization of pipeline / DNs: currently it's not present (part of
> >    this bug). It's suggested that it be added as part of the block token.
> >
> >
> >
> > Pros:
> >
> >    - Avoids writing to a DN for which the write is not intended, and avoids
> >    the malicious impact of data loss and space consumption shown above for
> >    the non-secure env.
> >
> > Cons:
> >
> >    - Needs code for adding the pipeline during token generation, passing
> >    it, and validating it at the DNs
> >    - The code will be complex; EC has a different way of syncing, which
> >    adds complexity and failure points
> >
> > *Security Impact if this is not handled:*
> >
> >    - SCM needs to validate a new container using ICR, which is async, and
> >    needs at least 2 heartbeats to notify the DN to stop the writing (30+
> >    seconds).
> >    - During this time, the client can add a lot of block data
> >    - Exploitation is easy, but the client must be authorized to get block
> >    write permission
> >
> >
> >
> > --
> > *Sumit Agrawal* | Senior Staff Engineer
> > cloudera.com <https://www.cloudera.com>
> > ------------------------------
> >
>
>
> --
> Pifta
>


