Thanks for offering! I was already working on it though. Here's the PR: https://github.com/apache/incubator-iceberg/pull/203
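
For anyone reading the trace quoted below: it shows PartitionSpec.lazyFieldsBySourceId building an ImmutableMap keyed by source field id, which cannot hold two partition fields for the same column. Here is a rough, self-contained sketch of that failure and of one way around it. It is not taken from the PR. The Field class and both index methods are hypothetical stand-ins rather than Iceberg classes, and a plain java.util map is used in place of Guava.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionFieldIndex {
  /** Hypothetical stand-in for a partition field: a source column id plus a transform name. */
  static final class Field {
    final int sourceId;
    final String transform;

    Field(int sourceId, String transform) {
      this.sourceId = sourceId;
      this.transform = transform;
    }
  }

  /** Assumes one field per source id, so a second field on the same column is rejected. */
  static Map<Integer, Field> indexUnique(List<Field> fields) {
    Map<Integer, Field> bySourceId = new LinkedHashMap<>();
    for (Field field : fields) {
      Field previous = bySourceId.putIfAbsent(field.sourceId, field);
      if (previous != null) {
        throw new IllegalArgumentException(
            "Multiple entries with same key: " + field.sourceId);
      }
    }
    return bySourceId;
  }

  /** Keeps every field for a source id, so several transforms per column are allowed. */
  static Map<Integer, List<Field>> indexAll(List<Field> fields) {
    Map<Integer, List<Field>> bySourceId = new LinkedHashMap<>();
    for (Field field : fields) {
      bySourceId.computeIfAbsent(field.sourceId, id -> new ArrayList<>()).add(field);
    }
    return bySourceId;
  }
}
```

With year and hour fields that both use source id 1, indexUnique throws while indexAll returns a single entry holding both fields. A caller projecting a predicate would then loop over every field for that column instead of expecting exactly one.
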

On Mon, Jun 3, 2019 at 1:09 AM Filip <filip....@gmail.com> wrote:

> I could try to take a stab at fixing this given that you've pointed out
> very clearly the expected behavior in your previous explanation.
>
> On Thu, May 30, 2019 at 10:31 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> Yeah, this is a bug. You should be able to define multiple partition
>> functions on the same field. But we do want to check that multiple time
>> partitions are not used because they are redundant. I'll open a PR. Thanks
>> for pointing this out!
>>
>> On Tue, May 28, 2019 at 4:15 AM Anton Okolnychyi
>> <aokolnyc...@apple.com.invalid> wrote:
>>
>>> Hm, this is actually a good question.
>>>
>>> My understanding is that we shouldn't explicitly define partitioning by
>>> year/month/day/hour on the same column. Instead, we should be fine with
>>> hour only. Iceberg produces ordinals for time-based partition functions. As
>>> far as I remember, Ryan was planning to submit a PR in order to prohibit
>>> multiple partition functions.
>>>
>>> I believe in the above case you are trying to create one partition spec
>>> with multiple partition functions on the same field.
>>>
>>> Keep in mind that if you partition by hour only, the directory structure
>>> won’t contain year/month/day folders. If you are to have that directory
>>> structure, you need to have actual columns for year/month/day in your
>>> dataset and use identity partition function.
>>>
>>> Thanks,
>>> Anton
>>>
>>>
>>> > On 28 May 2019, at 09:27, filip <filip....@gmail.com> wrote:
>>> >
>>> >
>>> > A while back I bumped into an issue with what seems to be an inconsistency in the partition spec API or maybe it's just an implementation bug.
>>> > Attempting to have multiple partitions specs on the same schema field I bumped into an issue regarding the fact that while the API allows for multiple partitions spec defined for same field, internally this conflicts with the assumption that there is only one partition spec per field.
>>> >
>>> > Given this partition spec:
>>> >
>>> > PartitionSpec spec = PartitionSpec.builderFor(schema)
>>> >     .withSpecId(0)
>>> >     .year("timestamp")
>>> >     .month("timestamp")
>>> >     .day("timestamp")
>>> >     .hour("timestamp")
>>> >     .build();
>>> >
>>> > Trying to validate partition pruning with similar code to:
>>> >
>>> > UnboundPredicate<Object> match = Expressions.equal("timestamp",
>>> >     Literal.of("2019-01-11T00:00:00.000000").to(TimestampType.withoutZone()).value());
>>> > Assert.assertTrue(
>>> >     new InclusiveManifestEvaluator(spec, match).eval(table.currentSnapshot().manifests().get(0)));
>>> >
>>> > I get an unexpected google collection exception:
>>> >
>>> > java.lang.IllegalArgumentException: Multiple entries with same key: 1=org.apache.iceberg.PartitionField@da8cdda7 and 1=org.apache.iceberg.PartitionField@e5c6fddb
>>> >
>>> > at com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:215)
>>> > at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:209)
>>> > at com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:147)
>>> > at com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:110)
>>> > at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:393)
>>> > at org.apache.iceberg.PartitionSpec.lazyFieldsBySourceId(PartitionSpec.java:232)
>>> > at org.apache.iceberg.PartitionSpec.getFieldBySourceId(PartitionSpec.java:95)
>>> > at org.apache.iceberg.expressions.Projections$InclusiveProjection.predicate(Projections.java:208)
>>> > at org.apache.iceberg.expressions.Projections$InclusiveProjection.predicate(Projections.java:200)
>>> > at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.predicate(Projections.java:185)
>>> > at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.predicate(Projections.java:136)
>>> > at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:152)
>>> > at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.project(Projections.java:152)
>>> > at org.apache.iceberg.expressions.InclusiveManifestEvaluator.<init>(InclusiveManifestEvaluator.java:63)
>>> > at org.apache.iceberg.expressions.InclusiveManifestEvaluator.<init>(InclusiveManifestEvaluator.java:56)
>>> > at org.apache.iceberg.TestScansAndSchemaEvolution.testMultiPartitionPerFieldTransform(TestScansAndSchemaEvolution.java:177)
>>> >
>>> >
>>> > I was wondering if this issue is tracked so maybe I could help out.
>>> >
>>> > Thanks,
>>> > /Filip
>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>
> --
> Filip Bocse
>

--
Ryan Blue
Software Engineer
Netflix