Adding dev@iceberg.apache.org
On Thu, Feb 14, 2019 at 3:00 PM sudsport s <sudssf2...@gmail.com> wrote:

> Hi, I am doing some testing with schema evolution. I looked at the
> testSchemaUpdate method and the SchemaUpdate class for reference.
>
> Here are the steps I am following to test schema evolution validation.
>
> Initially, data is created with the following schema, using "key" as the
> partition key:
>
> root
>  |-- id: string (nullable = true)
>  |-- value: string (nullable = true)
>  |-- key: integer (nullable = false)
>  |-- value1: string (nullable = true)
>  |-- value2: string (nullable = true)
>
> Schema update to rename value1 -> v1:
>
> root
>  |-- id: string (nullable = true)
>  |-- value: string (nullable = true)
>  |-- key: integer (nullable = false)
>  |-- v1: string (nullable = true)
>  |-- value2: string (nullable = true)
>
> Schema update to rename key -> newKey (I know changing the partition key
> is not a good idea, but this is a test :) ):
>
> root
>  |-- id: string (nullable = true)
>  |-- value: string (nullable = true)
>  |-- newKey: integer (nullable = false)
>  |-- v1: string (nullable = true)
>  |-- value2: string (nullable = true)
>
> When I read the data frame using Spark, I get the following schema:
>
> root
>  |-- id: string (nullable = true)
>  |-- value: string (nullable = true)
>  |-- newKey: integer (nullable = false)
>  |-- v1: string (nullable = true)
>  |-- value2: string (nullable = true)
>
> But when I try to run a query or scan using a changed column in the where
> clause, I get the following exception:
>
> INFO TableScan: Scanning table /tmp/schema-evolution snapshot
> 1550184572006 created at 2019-02-14 14:49:32.189 with filter
> not_null(ref(name="v1"))
> Exception in thread "main"
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Exchange SinglePartition
> +- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#77L])
>    +- *(1) Project
>       +- *(1) Filter (isnotnull(v1#60) && (cast(v1#60 as int) = 0))
>          +- *(1) DataSourceV2Scan [v1#60], IcebergScan(table=/tmp/schema-evolution, type=struct<4: v1: optional string>, filters=[not_null(ref(name="v1"))])
>
>   at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
>   at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>
> Caused by: com.netflix.iceberg.exceptions.ValidationException: Cannot find
> field 'v1' in struct: struct<1: id: optional string, 2: value: optional
> string, 3: key: required int, 4: value1: optional string, 5: value2:
> optional string>
>   at com.netflix.iceberg.exceptions.ValidationException.check(ValidationException.java:39)
>   at com.netflix.iceberg.expressions.UnboundPredicate.bind(UnboundPredicate.java:46)
>
> I ran the same query with various where clauses: "v1 = 0", "value1 = 0",
> "key = 0" and "newKey = 0".
>
> What is the best way to query data in an Iceberg table when the schema has
> changed?
>
> The following is the diff output from the metadata JSON:
>
> < "name" : "key",
> ---
> > "name" : "newKey",
> 25c25
> < "name" : "value1",
> ---
> > "name" : "v1",
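
The ValidationException in the quoted message lists a struct in which field 4 is still named value1, while the scan's projected type is struct<4: v1: optional string>. The field IDs agree and only the names differ. One possible reading is that the filter was bound by name against a schema that predates the rename. The sketch below is plain Python, not Iceberg code, and does not claim to show how Iceberg resolves columns internally. Field, bind_by_name and bind_by_id are hypothetical names, and the two schemas are copied from the message. It only illustrates why a name lookup against the older struct fails while a lookup that goes through the field ID succeeds.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a schema field: a stable ID plus a name."""
    id: int
    name: str


# Struct reported in the ValidationException (names before the renames).
FILE_SCHEMA = [
    Field(1, "id"), Field(2, "value"), Field(3, "key"),
    Field(4, "value1"), Field(5, "value2"),
]

# Schema Spark reports after both renames (same IDs, new names).
TABLE_SCHEMA = [
    Field(1, "id"), Field(2, "value"), Field(3, "newKey"),
    Field(4, "v1"), Field(5, "value2"),
]


def bind_by_name(column: str, schema: List[Field]) -> Field:
    """Look a column up by name only; fails if the schema uses another name."""
    for field in schema:
        if field.name == column:
            return field
    raise ValueError(f"Cannot find field '{column}' in struct")


def bind_by_id(column: str, table_schema: List[Field],
               file_schema: List[Field]) -> Field:
    """Resolve the name in the current schema, then match the older schema by ID."""
    field_id = bind_by_name(column, table_schema).id
    for field in file_schema:
        if field.id == field_id:
            return field
    raise ValueError(f"Field id {field_id} for '{column}' is not in the file schema")
```

With these definitions, bind_by_name("v1", FILE_SCHEMA) raises a ValueError similar to the one in the trace, whereas bind_by_id("v1", TABLE_SCHEMA, FILE_SCHEMA) returns the field with ID 4, which the older struct calls value1.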