Jack,

I might be incorrect here, but I'll at least throw out some thoughts. If I
understand correctly, the attacker requires access to modify some
serialized object so that deserialization leads to arbitrary code
execution. I think that the best way to protect against that is to avoid
making it possible for an attacker to modify serialized bytes.
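
One way to at least detect modified bytes before they reach readObject is to attach a keyed MAC when serializing and check it before deserializing. The following is only a minimal sketch of that idea, not existing Iceberg code: the SignedBytes class and its methods are hypothetical names, and it assumes both sides already share a secret key, which is a separate problem not addressed here.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: prefixes serialized bytes with an HMAC-SHA256 tag.
final class SignedBytes {
  private static final String ALGORITHM = "HmacSHA256";
  private static final int TAG_LENGTH = 32; // HMAC-SHA256 output size in bytes

  private SignedBytes() {}

  // Computes the HMAC-SHA256 tag of the payload under the shared key.
  private static byte[] tag(byte[] payload, byte[] key) {
    try {
      Mac mac = Mac.getInstance(ALGORITHM);
      mac.init(new SecretKeySpec(key, ALGORITHM));
      return mac.doFinal(payload);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("HMAC-SHA256 unavailable or key invalid", e);
    }
  }

  // Returns tag || payload.
  static byte[] sign(byte[] payload, byte[] key) {
    byte[] tag = tag(payload, key);
    byte[] out = new byte[TAG_LENGTH + payload.length];
    System.arraycopy(tag, 0, out, 0, TAG_LENGTH);
    System.arraycopy(payload, 0, out, TAG_LENGTH, payload.length);
    return out;
  }

  // Returns the payload only if the tag matches; otherwise throws.
  static byte[] verify(byte[] signed, byte[] key) {
    if (signed.length < TAG_LENGTH) {
      throw new SecurityException("Input is too short to contain a tag");
    }
    byte[] tag = Arrays.copyOfRange(signed, 0, TAG_LENGTH);
    byte[] payload = Arrays.copyOfRange(signed, TAG_LENGTH, signed.length);
    // MessageDigest.isEqual compares in constant time.
    if (!MessageDigest.isEqual(tag(payload, key), tag)) {
      throw new SecurityException("Serialized bytes failed the integrity check");
    }
    return payload;
  }
}
```

The caller would pass the result of verify to the deserializer, so tampered input is rejected before any readObject method runs. This only helps if the attacker cannot read the key.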

To my knowledge, Java serialization is used in two places: first, to
serialize objects between nodes, like sending a task to a Spark executor,
and second, to serialize some persistent state in Flink. Iceberg does not
use Java serialization for anything in the format or long-term storage. For
the first case, I think that it is up to the distributed system passing
objects between nodes to secure the content, like using TLS for connections
between nodes. Since Java serialization is used by the processing engine,
there isn't much Iceberg could do to change this and we have to rely on
Spark or Flink.

For the second case, I think our use of Java serialization to store state
is very limited, but we should take a look to make sure. I think this is
one area where Iceberg made the choice to use Java serialization, so we
should look into it and fix it if possible, although I'm not entirely
sure how to prevent an attacker from swapping out the state that gets loaded.
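
If the stored bytes can't be fully protected, restricting which classes may be deserialized limits what a swapped payload can do, similar in spirit to the ValidatingObjectInputStream that Jack mentions. A rough sketch using only the JDK is below. AllowListObjectInputStream is a hypothetical name, not an Iceberg or Commons class, and the allow-list contents would depend on what the state actually contains.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;
import java.util.Set;

// Hypothetical sketch: rejects any class descriptor not on an explicit allow-list.
final class AllowListObjectInputStream extends ObjectInputStream {
  private final Set<String> allowedClassNames;

  AllowListObjectInputStream(InputStream in, Set<String> allowedClassNames)
      throws IOException {
    super(in);
    this.allowedClassNames = allowedClassNames;
  }

  // Called for each non-proxy class descriptor in the stream, including
  // serializable superclasses, before any instance of that class is created.
  // Dynamic proxy classes go through resolveProxyClass and are not covered here.
  @Override
  protected Class<?> resolveClass(ObjectStreamClass desc)
      throws IOException, ClassNotFoundException {
    if (!allowedClassNames.contains(desc.getName())) {
      throw new InvalidClassException(
          desc.getName(), "class is not on the deserialization allow-list");
    }
    return super.resolveClass(desc);
  }

  // Convenience wrapper for deserializing a byte array with an allow-list.
  static Object deserialize(byte[] bytes, Set<String> allowedClassNames)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream in =
        new AllowListObjectInputStream(new ByteArrayInputStream(bytes), allowedClassNames)) {
      return in.readObject();
    }
  }
}
```

Superclasses have to be listed too. For example, reading an Integer requires both java.lang.Integer and java.lang.Number on the list. This narrows the attack surface but does not make untrusted bytes safe if an allowed class is itself exploitable.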

Ryan

On Sat, Jul 17, 2021 at 2:02 AM Jack Ye <yezhao...@gmail.com> wrote:

> Hi everyone,
>
> We use Java serialization and deserialization a lot in Iceberg. I wonder
> if we have considered the potential for a Java deserialization attack, where
> an attacker can replace serialized bytes to execute arbitrary code through
> the readObject method.
>
> Currently our SerializationUtil.deserializeFromBytes directly converts
> bytes to an ObjectInputStream. I know Apache Commons IO has
> ValidatingObjectInputStream, which can prevent the issue to some extent.
>
> Have we thought about this issue in the past? Are there any other
> suggestions?
>
> Best,
> Jack Ye
>


-- 
Ryan Blue
Tabular
