Thanks for offering! I was already working on it though. Here's the PR:
https://github.com/apache/incubator-iceberg/pull/203
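
For anyone reading this in the archive, here is a rough sketch of the failure mode shown in the stack trace quoted below: an index keyed by source column id cannot hold two partition fields for the same column. This is not the code in the PR. It uses a plain HashMap rather than Guava, and Field, indexUnique, and indexMulti are hypothetical names made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SourceIdIndexSketch {
    // Hypothetical stand-in for a partition field: source column id plus transform name.
    static final class Field {
        final int sourceId;
        final String transform;

        Field(int sourceId, String transform) {
            this.sourceId = sourceId;
            this.transform = transform;
        }
    }

    // One field per source id: a second field on the same column is rejected,
    // which mirrors the "Multiple entries with same key" failure.
    static Map<Integer, Field> indexUnique(List<Field> fields) {
        Map<Integer, Field> bySource = new HashMap<>();
        for (Field f : fields) {
            Field previous = bySource.putIfAbsent(f.sourceId, f);
            if (previous != null) {
                throw new IllegalArgumentException(
                    "Multiple entries with same key: " + f.sourceId);
            }
        }
        return bySource;
    }

    // Many fields per source id: every transform on a column is kept.
    static Map<Integer, List<Field>> indexMulti(List<Field> fields) {
        Map<Integer, List<Field>> bySource = new HashMap<>();
        for (Field f : fields) {
            bySource.computeIfAbsent(f.sourceId, id -> new ArrayList<>()).add(f);
        }
        return bySource;
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(
            new Field(1, "year"), new Field(1, "month"),
            new Field(1, "day"), new Field(1, "hour"));

        System.out.println("fields on column 1: " + indexMulti(fields).get(1).size());
        try {
            indexUnique(fields);
        } catch (IllegalArgumentException e) {
            System.out.println("unique index failed: " + e.getMessage());
        }
    }
}
```

Grouping by source id is only one way to permit several partition functions on a column; the check for redundant time partitions would be a separate validation step.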

On Mon, Jun 3, 2019 at 1:09 AM Filip <filip....@gmail.com> wrote:

> I could try to take a stab at fixing this given that you've pointed out
> very clearly the expected behavior in your previous explanation.
>
> On Thu, May 30, 2019 at 10:31 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> Yeah, this is a bug. You should be able to define multiple partition
>> functions on the same field. But we do want to check that multiple time
>> partitions are not used because they are redundant. I'll open a PR. Thanks
>> for pointing this out!
>>
>> On Tue, May 28, 2019 at 4:15 AM Anton Okolnychyi
>> <aokolnyc...@apple.com.invalid> wrote:
>>
>>> Hm, this is actually a good question.
>>>
>>> My understanding is that we shouldn't explicitly define partitioning by
>>> year/month/day/hour on the same column. Instead, we should be fine with
>>> hour only. Iceberg produces ordinals for time-based partition functions. As
>>> far as I remember, Ryan was planning to submit a PR in order to prohibit
>>> multiple partition functions.
>>>
>>> I believe in the above case you are trying to create one partition spec
>>> with multiple partition functions on the same field.
>>>
>>> Keep in mind that if you partition by hour only, the directory structure
>>> won’t contain year/month/day folders. If you want that directory
>>> structure, you need actual columns for year/month/day in your
>>> dataset and must use the identity partition function on them.
>>>
>>> Thanks,
>>> Anton
>>>
>>>
>>> > On 28 May 2019, at 09:27, filip <filip....@gmail.com> wrote:
>>> >
>>> >
>>> > A while back I ran into what seems to be an inconsistency in the
>>> > partition spec API, or maybe just an implementation bug.
>>> > When I tried to define multiple partition specs on the same schema
>>> > field, I found that while the API allows multiple partition specs for
>>> > the same field, internally this conflicts with the assumption that
>>> > there is only one partition spec per field.
>>> >
>>> > Given this partition spec:
>>> >
>>> > PartitionSpec spec = PartitionSpec.builderFor(schema)
>>> >             .withSpecId(0)
>>> >             .year("timestamp")
>>> >             .month("timestamp")
>>> >             .day("timestamp")
>>> >             .hour("timestamp")
>>> >             .build();
>>> >
>>> > Trying to validate partition pruning with similar code to:
>>> >
>>> > UnboundPredicate<Object> match = Expressions.equal("timestamp",
>>> >     Literal.of("2019-01-11T00:00:00.000000").to(TimestampType.withoutZone()).value());
>>> > Assert.assertTrue(
>>> >     new InclusiveManifestEvaluator(spec, match)
>>> >         .eval(table.currentSnapshot().manifests().get(0)));
>>> >
>>> > I get an unexpected google collection exception:
>>> >
>>> > java.lang.IllegalArgumentException: Multiple entries with same key: 1=org.apache.iceberg.PartitionField@da8cdda7 and 1=org.apache.iceberg.PartitionField@e5c6fddb
>>> >     at com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:215)
>>> >     at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:209)
>>> >     at com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:147)
>>> >     at com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:110)
>>> >     at com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:393)
>>> >     at org.apache.iceberg.PartitionSpec.lazyFieldsBySourceId(PartitionSpec.java:232)
>>> >     at org.apache.iceberg.PartitionSpec.getFieldBySourceId(PartitionSpec.java:95)
>>> >     at org.apache.iceberg.expressions.Projections$InclusiveProjection.predicate(Projections.java:208)
>>> >     at org.apache.iceberg.expressions.Projections$InclusiveProjection.predicate(Projections.java:200)
>>> >     at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.predicate(Projections.java:185)
>>> >     at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.predicate(Projections.java:136)
>>> >     at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:152)
>>> >     at org.apache.iceberg.expressions.Projections$BaseProjectionEvaluator.project(Projections.java:152)
>>> >     at org.apache.iceberg.expressions.InclusiveManifestEvaluator.<init>(InclusiveManifestEvaluator.java:63)
>>> >     at org.apache.iceberg.expressions.InclusiveManifestEvaluator.<init>(InclusiveManifestEvaluator.java:56)
>>> >     at org.apache.iceberg.TestScansAndSchemaEvolution.testMultiPartitionPerFieldTransform(TestScansAndSchemaEvolution.java:177)
>>> >
>>> >
>>> > I was wondering if this issue is tracked so maybe I could help out.
>>> >
>>> > Thanks,
>>> > /Filip
>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>
> --
> Filip Bocse
>
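
On the redundancy point in the quoted messages, a small sketch of why year/month/day partitions add nothing once hour is present: the hour ordinal already determines the coarser ones. This assumes the ordinals count whole units since 1970-01-01T00:00 and only considers timestamps at or after that instant. The class and method names are hypothetical and this is not Iceberg's transform code.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

class TimeOrdinalSketch {
    // Assumed origin for all ordinals in this sketch.
    static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

    // Ordinals computed directly from a timestamp (whole units since EPOCH).
    static long hourOrdinal(LocalDateTime ts) {
        return ChronoUnit.HOURS.between(EPOCH, ts);
    }

    static long dayOrdinal(LocalDateTime ts) {
        return ChronoUnit.DAYS.between(EPOCH, ts);
    }

    static long monthOrdinal(LocalDateTime ts) {
        return ChronoUnit.MONTHS.between(EPOCH, ts);
    }

    static long yearOrdinal(LocalDateTime ts) {
        return ChronoUnit.YEARS.between(EPOCH, ts);
    }

    // Coarser ordinals recovered from the hour ordinal alone: rebuild the
    // timestamp truncated to the hour, then count days, months, or years.
    static long dayFromHour(long hour) {
        return dayOrdinal(EPOCH.plusHours(hour));
    }

    static long monthFromHour(long hour) {
        return monthOrdinal(EPOCH.plusHours(hour));
    }

    static long yearFromHour(long hour) {
        return yearOrdinal(EPOCH.plusHours(hour));
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2019, 1, 11, 0, 0);
        long hour = hourOrdinal(ts);
        System.out.println("hour=" + hour
            + " day=" + dayFromHour(hour)
            + " month=" + monthFromHour(hour)
            + " year=" + yearFromHour(hour));
    }
}
```

Since each coarser value is a function of the hour value, a filter on the source column can be projected onto the hour partition alone without losing pruning power.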


-- 
Ryan Blue
Software Engineer
Netflix
