Sebastian Klemke created FLINK-7002:
---------------------------------------

             Summary: Partitioning broken if enum is used in compound key specified using field expression
                 Key: FLINK-7002
                 URL: https://issues.apache.org/jira/browse/FLINK-7002
             Project: Flink
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.3.1, 1.2.0
            Reporter: Sebastian Klemke


When groupBy() or keyBy() is used with multiple field expressions, and at least
one of them refers to an enum type serialized using EnumTypeInfo, partitioning
appears random. The result is incorrectly grouped/keyed output
DataSets/DataStreams.
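One plausible explanation, offered as an assumption rather than a confirmed diagnosis: hash partitioning only works if every sending task computes the same hash for equal keys. In the JDK, `Enum.hashCode()` is final and delegates to `Object.hashCode()`, so it is an identity hash that can differ between JVM processes (for example, between TaskManagers), while `ordinal()` and `name()` depend only on the enum constant. The pure-Java sketch below contrasts the two. `EventType`, `partitionFor` and `unstablePartitionFor` are hypothetical names, not Flink APIs.

```java
public class EnumPartitionSketch {
    // Hypothetical enum standing in for the enum key used in the attached jobs.
    enum EventType { CLICK, VIEW, PURCHASE }

    // Hypothetical hash partitioner for a compound (id, type) key.
    // Long.hashCode() and ordinal() depend only on the key's value,
    // so every JVM maps equal keys to the same partition.
    static int partitionFor(long id, EventType type, int parallelism) {
        int h = 31 * Long.hashCode(id) + type.ordinal();
        return Math.floorMod(h, parallelism);
    }

    // Risky variant: Enum.hashCode() is an identity hash, which is consistent
    // within one JVM but not guaranteed to match across JVM processes, so
    // senders on different machines may disagree on the target partition.
    static int unstablePartitionFor(long id, EventType type, int parallelism) {
        int h = 31 * Long.hashCode(id) + type.hashCode();
        return Math.floorMod(h, parallelism);
    }

    public static void main(String[] args) {
        for (EventType t : EventType.values()) {
            System.out.println(t + " -> partition " + partitionFor(42L, t, 5));
        }
    }
}
```

Within a single JVM both variants look deterministic, which would be consistent with the report's observation that results are correct in a LocalEnvironment but wrong on a multi-TaskManager cluster.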

The attached Flink DataSet API jobs and test dataset demonstrate the issue. Both
jobs count (id, type) occurrences: TestJob groups using field expressions, while
WorkingTestJob uses a KeySelector function.
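The KeySelector approach can be illustrated without a Flink dependency. The sketch below assumes the working job encodes the compound key as a single String (the report mentions a "String key" but does not show the code). `String.hashCode()` is specified by the JDK documentation, so its value is the same in every JVM. `Event`, `KEY` and `countByKey` are hypothetical names; `KEY` plays the role a `KeySelector.getKey` implementation would, and the local count mirrors the (id, type) frequency count both jobs perform.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class StringKeySketch {
    enum EventType { CLICK, VIEW }

    // Hypothetical record type standing in for the test dataset's (id, type) tuples.
    static final class Event {
        final String id;
        final EventType type;
        Event(String id, EventType type) { this.id = id; this.type = type; }
    }

    // Encodes the compound key as one String using the enum's name(), so the
    // resulting hash depends only on the key's contents, not on object identity.
    static final Function<Event, String> KEY = e -> e.id + "|" + e.type.name();

    // Counts occurrences per (id, type) key, like the frequency count in both jobs.
    static Map<String, Long> countByKey(List<Event> events) {
        Map<String, Long> counts = new HashMap<>();
        for (Event e : events) {
            counts.merge(KEY.apply(e), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("a", EventType.CLICK),
            new Event("a", EventType.CLICK),
            new Event("a", EventType.VIEW),
            new Event("b", EventType.CLICK));
        System.out.println(countByKey(events));
    }
}
```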

The expected output of both jobs is 6 records, each with a frequency of 100_000.
When run in a LocalEnvironment, the two jobs do produce identical results. When
run on a cluster with 5 TaskManagers, however, only the KeySelector function
with a String key produces correct results; the field-expression job produces
wrong results that are random and not repeatable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)