I verified the hashes, signatures, build, and unit tests, and did manual
testing with Apache Spark 2.4.6 (hadoop-2.7) and 3.0.0 (hadoop-3.2) against
an Apache Hive 2.3.7 metastore.
(BTW, for spark-3.0.0-bin-hadoop3.2, I used Ryan's example command with
`spark.sql.warehouse.dir` instead of `spark.warehouse.path`.)
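
In case it helps anyone repeating the hash check, here is a minimal sketch of
a by-hand SHA-512 comparison. The helper names `sha512_of` and `verify_sha512`
are made up for illustration, and the sketch assumes the checksum file carries
the hex digest as its first whitespace-separated field, which may not match
the actual release files.

```shell
# Hypothetical helpers, not part of the Iceberg release tooling.

# Print the SHA-512 hex digest of a file, using whichever tool is installed.
sha512_of() {
  if command -v sha512sum >/dev/null 2>&1; then
    sha512sum "$1" | awk '{print $1}'
  else
    shasum -a 512 "$1" | awk '{print $1}'
  fi
}

# Succeed only if the file's digest matches the first field of the checksum file.
# Assumption: the checksum file looks like "<hex digest>  <file name>".
verify_sha512() {
  expected=$(awk 'NR == 1 {print $1}' "$2")
  actual=$(sha512_of "$1")
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

Usage would be something like `verify_sha512 <tarball> <tarball>.sha512 && echo OK`,
with the real file names taken from the dist directory. The signature check is
a separate step: `gpg --verify <signature> <tarball>` after importing the KEYS
file.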

1. Iceberg 0.9 + Spark 2.4.6 works as expected.
2. Iceberg 0.9 + Spark 3.0.0 mostly works as expected, but views behave
differently than with Iceberg 0.9 + Spark 2.4.6.

The following example shows the difference.


scala> sql("select * from t2").show
|  a|  b|

scala> Seq(("a", 1), ("b", 2)).toDF("a",

scala> sql("select * from t2").show // Iceberg 0.9 with Spark 2.4.6 shows the updated result correctly here.
|  a|  b|

scala> spark.read.format("iceberg").load("/tmp/t1").show
|  a|  b|
|  a|  1|
|  b|  2|

So far, I'm not sure which side (Iceberg or Spark) introduces this
difference, but if it is by design, it would be nice to have some
documentation on how the behavior differs between versions.


On Sun, Jul 12, 2020 at 12:32 PM Daniel Weeks <dwe...@apache.org> wrote:

> +1 (binding)
> Verified sigs/sums/license/build/test
> I did have an issue with the test metastore for the spark3 tests on the
> first run, but couldn't replicate it in subsequent tests.
> -Dan
> On Fri, Jul 10, 2020 at 10:42 AM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>> +1 (binding)
>> Verified checksums, ran tests, staged convenience binaries.
>> I also ran a few tests using Spark 3.0.0 and Spark 2.4.5 and the runtime
>> Jars. For anyone that would like to use spark-sql or spark-shell, here are
>> the commands that I used:
>> ~/Apps/spark-3.0.0-bin-hadoop2.7/bin/spark-sql \
>>     --repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1008 \
>>     --packages org.apache.iceberg:iceberg-spark3-runtime:0.9.0 \
>>     --conf spark.warehouse.path=$PWD/spark-warehouse \
>>     --conf spark.hadoop.hive.metastore.uris=thrift://localhost:42745 \
>>     --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
>>     --conf spark.sql.catalog.spark_catalog.type=hive \
>>     --conf spark.sql.catalog.hive_prod=org.apache.iceberg.spark.SparkCatalog \
>>     --conf spark.sql.catalog.hive_prod.type=hive \
>>     --conf spark.sql.catalog.hadoop_prod=org.apache.iceberg.spark.SparkCatalog \
>>     --conf spark.sql.catalog.hadoop_prod.type=hadoop \
>>     --conf spark.sql.catalog.hadoop_prod.warehouse=$PWD/hadoop-warehouse
>> ~/Apps/spark-2.4.5-bin-hadoop2.7/bin/spark-shell \
>>     --repositories https://repository.apache.org/content/repositories/orgapacheiceberg-1008 \
>>     --packages org.apache.iceberg:iceberg-spark-runtime:0.9.0 \
>>     --conf spark.hadoop.hive.metastore.uris=thrift://localhost:42745 \
>>     --conf spark.warehouse.path=$PWD/spark-warehouse
>> The Spark 3 command sets up 3 catalogs:
>>    - A wrapper around the built-in catalog, spark_catalog, that adds
>>    support for Iceberg tables
>>    - A hive_prod Iceberg catalog that uses the same metastore as the
>>    session catalog
>>    - A hadoop_prod Iceberg catalog that stores tables in a
>>    hadoop-warehouse folder
>> Everything worked great, except for a minor issue with CTAS, #1194
>> <https://github.com/apache/iceberg/pull/1194>. I’m okay to release with
>> that issue, but we can always build a new RC if anyone thinks it is a
>> blocker.
>> If you’d like to run tests in a downstream project, you can use the
>> staged binary artifacts by adding this to your gradle build:
>>   repositories {
>>     maven {
>>       name 'stagedIceberg'
>>       url 'https://repository.apache.org/content/repositories/orgapacheiceberg-1008/'
>>     }
>>   }
>>   ext {
>>     icebergVersion = '0.9.0'
>>   }
>> On Fri, Jul 10, 2020 at 9:20 AM Ryan Murray <rym...@dremio.com> wrote:
>>> 1. Verify the signature: OK
>>> 2. Verify the checksum: OK
>>> 3. Untar the archive tarball: OK
>>> 4. Run RAT checks to validate license headers: RAT checks passed
>>> 5. Build and test the project: all unit tests passed.
>>> +1 (non-binding)
>>> I did see that my build took >12 minutes and used 100% of all 8
>>> cores & 32GB of memory (openjdk-8 ubuntu 18.04), which I haven't noticed
>>> before.
>>> On Fri, Jul 10, 2020 at 4:37 AM OpenInx <open...@gmail.com> wrote:
>>>> I followed the verify guide here (
>>>> https://lists.apache.org/thread.html/rd5e6b1656ac80252a9a7d473b36b6227da91d07d86d4ba4bee10df66%40%3Cdev.iceberg.apache.org%3E)
>>>> :
>>>> 1. Verify the signature: OK
>>>> 2. Verify the checksum: OK
>>>> 3. Untar the archive tarball: OK
>>>> 4. Run RAT checks to validate license headers: RAT checks passed
>>>> 5. Build and test the project: all unit tests passed.
>>>> +1 (non-binding).
>>>> On Fri, Jul 10, 2020 at 9:46 AM Ryan Blue <rb...@netflix.com.invalid>
>>>> wrote:
>>>>> Hi everyone,
>>>>> I propose the following RC to be released as the official Apache
>>>>> Iceberg 0.9.0 release.
>>>>> The commit id is 4e66b4c10603e762129bc398146e02d21689e6dd
>>>>> * This corresponds to the tag: apache-iceberg-0.9.0-rc5
>>>>> * https://github.com/apache/iceberg/commits/apache-iceberg-0.9.0-rc5
>>>>> * https://github.com/apache/iceberg/tree/4e66b4c1
>>>>> The release tarball, signature, and checksums are here:
>>>>> * https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-0.9.0-rc5/
>>>>> You can find the KEYS file here:
>>>>> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>>>>> Convenience binary artifacts are staged in Nexus. The Maven repository
>>>>> URL is:
>>>>> * https://repository.apache.org/content/repositories/orgapacheiceberg-1008/
>>>>> This release includes support for Spark 3 and vectorized reads for
>>>>> flat schemas in Spark.
>>>>> Please download, verify, and test.
>>>>> Please vote in the next 72 hours.
>>>>> [ ] +1 Release this as Apache Iceberg 0.9.0
>>>>> [ ] +0
>>>>> [ ] -1 Do not release this because...
>>>>> --
>>>>> Ryan Blue
>>>>> Software Engineer
>>>>> Netflix
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
