Hi, thanks for reporting this. We updated Jackson to 2.9.10 earlier today because of a vulnerability in the previous Jackson version. See https://github.com/apache/incubator-iceberg/commit/cbefc10b5b3fd4f8e4f92251b439eab3b64ea14d
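For reference, the runtime Jar relocates Jackson with the Gradle Shadow plugin, roughly like this (a sketch, not the exact Iceberg build file; the relocated package prefix is an assumption):

```groovy
// build.gradle (Shadow plugin config sketch)
shadowJar {
    // Move Jackson classes under an Iceberg-specific package so they
    // cannot collide with the Jackson version on Spark's classpath
    relocate 'com.fasterxml.jackson', 'org.apache.iceberg.shaded.com.fasterxml.jackson'
}
```

With the relocation applied, Iceberg's bytecode references the shaded package, so both Jackson copies can coexist at runtime.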
How are you using Iceberg with Spark 2.4? I thought the change would be safe because the recommended way to integrate with Spark is the iceberg-spark-runtime Jar, which shades and relocates Jackson so that it doesn't conflict with the versions used by Spark. Are you using the iceberg-spark-runtime build?

rb

On Thu, Oct 10, 2019 at 2:24 PM suds <sudssf2...@gmail.com> wrote:

> I am also seeing issues when using the master branch with Spark v2.4.0+:
>
> Caused by: com.fasterxml.jackson.databind.JsonMappingException: Scala
> module 2.8.8 requires Jackson Databind version >= 2.8.0 and < 2.9.0
>   at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:66)
>   at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:18)
>   at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:730)
>   at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
>   at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
>   ... 24 more
>
> or:
>
> Caused by: com.fasterxml.jackson.databind.JsonMappingException:
> Incompatible Jackson version: 2.7.9
>   at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
>   at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
>   at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:730)
>   at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
>   at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
>   ... 24 more
>
> From another thread there was a discussion about shaded jars:
>
> Iceberg should provide shaded Jars to make it easy to get started with
> Spark. We also want to shade Parquet, Avro, and others to ensure that
> Iceberg's dependencies can be updated without conflicting with what Spark
> uses.
> Libraries like slf4j-api should be fine to exclude because they change
> rarely, though.
>
> What are the best practices when deploying Iceberg in EMR? Do we need to
> create a shaded jar with all of the Avro, Parquet, and Jackson
> dependencies?
>
> On Sat, Jun 22, 2019 at 10:23 PM RD <rdsr...@gmail.com> wrote:
>
>> Hi Iceberg devs,
>>
>> I see that guava and slf4j-api are compileOnly dependencies. This implies
>> that they are not required at runtime and will not be resolved when
>> resolving Iceberg artifacts. So it might very well be the case that, say,
>> for iceberg-spark, the guava dependency actually used would come from
>> Spark itself, which could be different from what we intended.
>>
>> I think these should be changed to compile, since they are required
>> dependencies. Thoughts?
>>
>> Today, the iceberg-runtime and iceberg-presto-runtime artifacts do not
>> include these dependencies because they are declared as compileOnly and
>> we have configured the shadow tasks to pick dependencies from the
>> "shadow" configuration.
>>
>> I think slf4j and guava should be part of these Iceberg runtime
>> artifacts, no?
>>
>> Also, iceberg-[presto]-runtime reconfigures/recreates the "shadow"
>> configuration:
>> https://imperceptiblethoughts.com/shadow/configuration/#configuring-the-runtime-classpath.
>> That configuration is reserved by the Shadow plugin for transitive
>> dependencies that are not to be bundled in the fat jar.
>>
>> I think that we should not recreate the "shadow" configuration and should
>> instead let the shadow task use the standard runtime/compile
>> configurations.
>>
>> My last question is: what is the expected/recommended way to use Iceberg
>> artifacts in a runtime such as Spark? Should thin jars with transitive
>> dependencies be used, or an Iceberg runtime jar with shaded dependencies
>> [the most common dependencies that could conflict, e.g. guava and avro]?
>>
>> -R

--
Ryan Blue
Software Engineer
Netflix
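To illustrate the compileOnly-vs-compile distinction being discussed, here is a minimal Gradle sketch (the coordinates and versions are placeholders, not Iceberg's actual declarations):

```groovy
// build.gradle (dependency configuration sketch)
dependencies {
    // compileOnly: visible at compile time only; NOT resolved transitively
    // by consumers and NOT picked up by the shadow jar by default, so the
    // runtime copy comes from whatever the host (e.g. Spark) provides
    compileOnly 'com.google.guava:guava:28.0-jre'

    // compile: part of the published runtime dependency graph, so
    // consumers resolve it and the shadow task can bundle (and relocate) it
    compile 'org.slf4j:slf4j-api:1.7.25'
}
```

This is why a compileOnly guava can silently bind to Spark's guava at runtime: the Iceberg artifact's POM carries no record of the dependency at all.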