I have moved back to +1 (non-binding)

As you said, Ryan, the error message is bad and hides the real error. While
I was testing misconfigured catalogs, I kept getting the error 'Cannot
initialize Catalog, missing no-arg constructor:
org.apache.iceberg.hive.HiveCatalog' when the real error was (in this case)
a misconfigured Hive. I have raised #2145
<https://github.com/apache/iceberg/issues/2145> to track this, since it
isn't critical to the release.
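
For reference, a misconfigured catalog of the kind I was testing looks
roughly like this (the catalog name and URI are illustrative; the
properties follow Ryan's command below):

bin/spark-shell \
    --conf spark.sql.catalog.badhive=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.badhive.type=hive \
    --conf spark.sql.catalog.badhive.uri=thrift://localhost:9999

With a setup like this, the real Hive failure surfaced only as the
missing-constructor message above.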

Best,
Ryan

On Mon, Jan 25, 2021 at 12:05 AM Ryan Blue <rb...@netflix.com.invalid>
wrote:

> +1 (binding)
>
> Downloaded, validated checksum and signature, ran RAT checks, built
> binaries and tested.
>
> Also checked Spark 2, Spark 3, and Hive 2:
>
>    - Created a new table in Spark 3.1.1 release candidate without the
>    USING clause
>    - Created a table in Spark 3.0.1 with CTAS and a USING clause
>    - Created a new database in Spark 3.0.1 and validated the warehouse
>    location for new tables
>    - Used Spark 3 extensions in 3.0.1 to add bucketing to a table
>    - Deleted data from a table in Spark 3.0.1
>    - Ran merge statements in Spark 3.0.1 and validated join type
>    optimizations (sketch after this list)
>    - Used multi-catalog support in Spark 2.4.5 to read from testhive and
>    prodhive catalogs using the same config as Spark 3
>    - Tested multi-catalog metadata tables in Spark 2.4.5
>    - Tested input_file_name() in Spark 2.4.5
>    - Read from a Hive catalog table in Hive 2
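>
> As a sketch, the CTAS and merge checks were along these lines (table and
> column names are hypothetical, not the exact statements I ran):
>
> CREATE TABLE prodhive.db.ctas_test USING iceberg AS
> SELECT id, data FROM prodhive.db.source;
>
> MERGE INTO prodhive.db.target t USING prodhive.db.updates s
> ON t.id = s.id
> WHEN MATCHED THEN UPDATE SET data = s.data
> WHEN NOT MATCHED THEN INSERT (id, data) VALUES (s.id, s.data);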
>
> Here’s my command to start Spark 3:
>
> /home/blue/Apps/spark-3.0.1-bin-hadoop2.7/bin/spark-shell \
>     --driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 \
>     --conf spark.jars.repositories=https://repository.apache.org/content/repositories/orgapacheiceberg-1015/ \
>     --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.0 \
>     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
>     --conf spark.hadoop.hive.metastore.uris=thrift://localhost:32917 \
>     --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
>     --conf spark.sql.catalog.local.type=hadoop \
>     --conf spark.sql.catalog.local.warehouse=/home/blue/tmp/hadoop-warehouse \
>     --conf spark.sql.catalog.prodhive=org.apache.iceberg.spark.SparkCatalog \
>     --conf spark.sql.catalog.prodhive.type=hive \
>     --conf spark.sql.catalog.prodhive.warehouse=/home/blue/tmp/prod-warehouse \
>     --conf spark.sql.catalog.prodhive.default-namespace=default \
>     --conf spark.sql.catalog.testhive=org.apache.iceberg.spark.SparkCatalog \
>     --conf spark.sql.catalog.testhive.type=hive \
>     --conf spark.sql.catalog.testhive.uri=thrift://localhost:34847 \
>     --conf spark.sql.catalog.testhive.warehouse=/home/blue/tmp/test-warehouse \
>     --conf spark.sql.catalog.testhive.default-namespace=default \
>     --conf spark.sql.defaultCatalog=prodhive
>
> And here’s a script to start Hive:
>
> /home/blue/Apps/apache-hive-2.3.7-bin/bin/hive --hiveconf hive.metastore.uris=thrift://localhost:32917
> hive> SET iceberg.mr.catalog=hive;
> hive> ADD JAR /home/blue/Downloads/iceberg-hive-runtime-0.11.0.jar;
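>
> With iceberg.mr.catalog set, Iceberg tables can then be read directly
> from the Hive CLI, e.g. (table name hypothetical):
>
> hive> SELECT count(*) FROM default.iceberg_table;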
>
> The only issue I found is that the Spark 3.1.1 release candidate can’t use
> the extensions module because an internal variable substitution class
> changed in 3.1.x. I don’t think that should fail this release; we can do
> more thorough testing with 3.1.1 once it is released and fix any problems
> in a point release.
>
> On Fri, Jan 22, 2021 at 3:26 PM Jack Ye <yezhao...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I propose the following RC to be released as the official Apache Iceberg
>> 0.11.0 release. The RC has also been reviewed and signed by Ryan Blue.
>>
>> The commit id is ad78cc6cf259b7a0c66ab5de6675cc005febd939
>>
>> This corresponds to the tag: apache-iceberg-0.11.0-rc0
>> * https://github.com/apache/iceberg/commits/apache-iceberg-0.11.0-rc0
>> * https://github.com/apache/iceberg/tree/apache-iceberg-0.11.0-rc0
>>
>> The release tarball, signature, and checksums are here:
>> *
>> https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-0.11.0-rc0
>>
>> You can find the KEYS file here:
>> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>>
>> Convenience binary artifacts are staged in Nexus. The Maven repository
>> URL is:
>> *
>> https://repository.apache.org/content/repositories/orgapacheiceberg-1015
>>
>> This release includes the following changes:
>>
>> *High-level features*
>>
>>    - Core API now supports partition spec and sort order evolution
>>    - Spark 3 now supports the following SQL extensions (illustrative
>>    statements after this list):
>>       - MERGE INTO
>>       - DELETE FROM
>>       - ALTER TABLE ... ADD/DROP PARTITION FIELD
>>       - ALTER TABLE ... WRITE ORDERED BY
>>       - invoke stored procedures using CALL
>>    - Flink now supports streaming reads, CDC writes (experimental), and
>>    filter pushdown
>>    - An AWS module is added for better integration with AWS, including
>>    AWS Glue catalog <https://aws.amazon.com/glue> support and a dedicated
>>    S3 FileIO implementation
>>    - A Nessie module is added to support integration with Project Nessie
>>    <https://projectnessie.org>
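>>
>> As an illustration of the new Spark 3 SQL extensions (table and column
>> names are hypothetical):
>>
>> DELETE FROM prodhive.db.events WHERE ts < DATE '2020-01-01';
>> ALTER TABLE prodhive.db.events ADD PARTITION FIELD bucket(16, id);
>> ALTER TABLE prodhive.db.events WRITE ORDERED BY category, id;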
>>
>> *Important bug fixes*
>>
>>    - #1981 fixes date and timestamp transforms
>>    - #2091 fixes Parquet vectorized reads when column types are promoted
>>    - #1962 fixes Parquet vectorized position reader
>>    - #1991 fixes Avro schema conversions to preserve field docs
>>    - #1811 makes refreshing Spark cache optional
>>    - #1798 fixes read failure when encountering duplicate entries of
>>    data files
>>    - #1785 fixes invalidation of metadata tables in CachingCatalog
>>    - #1784 fixes resolving of SparkSession table's metadata tables
>>
>> *Other notable changes*
>>
>>    - NaN counter is added to format v2 metrics
>>    - Shared catalog properties are added in the core library to
>>    standardize catalog-level configuration
>>    - Spark and Flink now support dynamically loading custom `Catalog`
>>    and `FileIO` implementations (see the sketch after this list)
>>    - Spark now supports loading tables with file paths via HadoopTables
>>    - Spark 2 now supports loading tables from other catalogs, as Spark 3
>>    does
>>    - Spark 3 now supports catalog names in DataFrameReader when using
>>    Iceberg as a format
>>    - Hive now supports INSERT INTO, case-insensitive queries, projection
>>    pushdown, CREATE TABLE DDL with a schema, and automatic type conversion
>>    - ORC now supports reading tinyint, smallint, char, varchar types
>>    - Hadoop catalog now supports role-based access to table listings
>>
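>> For example, dynamic `Catalog` and `FileIO` loading is configured through
>> the `catalog-impl` and `io-impl` catalog properties; a minimal sketch,
>> assuming a hypothetical com.example.CustomCatalog that provides the
>> no-arg constructor Iceberg requires:
>>
>> --conf spark.sql.catalog.custom=org.apache.iceberg.spark.SparkCatalog \
>> --conf spark.sql.catalog.custom.catalog-impl=com.example.CustomCatalog \
>> --conf spark.sql.catalog.custom.io-impl=org.apache.iceberg.aws.s3.S3FileIO
>>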
>> Please download, verify, and test.
>>
>> Please vote in the next 72 hours.
>>
>> [ ] +1 Release this as Apache Iceberg 0.11.0
>> [ ] +0
>> [ ] -1 Do not release this because...
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
