Hi everybody,

while reviewing PR #2094 [1] I noticed that the field reference syntax for
FieldAccessors is not compatible with the syntax supported for key
definitions (ExpressionKeys) used in groupBy(), keyBy(),
join().where().equalTo(), etc.

FieldAccessors are only used for the built-in aggregations in the DataStream
API (sum(), min(), max(), ...).

In particular I identified the following inconsistencies:

- FieldAccessors allow addressing array cells. ExpressionKeys treat arrays
as AtomicTypes (array TypeInfos do not extend CompositeType), so array
cells cannot be addressed.
- ExpressionKeys only support integer keys for tuples; an atomic type can
only be addressed with "*". FieldAccessors additionally allow addressing
AtomicTypes with 0.
- ExpressionKeys support addressing Java tuple fields with "f2" and Scala
tuple fields with "_3". FieldAccessors support neither the "f" nor the "_"
prefix.

I propose to align the syntax of both mechanisms (ideally, both should use
the same validation code). IMO, the ExpressionKey syntax is much more
widely used and well designed, so I would adopt it for FieldAccessors as
well. However, that would mean restricting the FieldAccessor syntax, which
might break existing code.
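To make the proposal a bit more concrete, here is a minimal, self-contained
Java sketch of a single validator that both mechanisms could share,
following the ExpressionKey rules listed above. It is not Flink code: the
class FieldExpressionValidator, the Kind enum and the arity parameter are
hypothetical, and the sketch only models flat, single-field references (no
nested expressions such as "f0.name" and no wildcards on composite types).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Flink's API: one shared validation routine
// applying the ExpressionKey-style rules to flat field references.
public class FieldExpressionValidator {

    // Hypothetical classification of the type being addressed.
    enum Kind { JAVA_TUPLE, SCALA_TUPLE, ATOMIC, ARRAY }

    // Digit count is capped so Integer.parseInt cannot overflow.
    private static final Pattern POSITION = Pattern.compile("\\d{1,9}");
    private static final Pattern JAVA_FIELD = Pattern.compile("f(\\d{1,9})");
    private static final Pattern SCALA_FIELD = Pattern.compile("_(\\d{1,9})");

    static boolean isValid(String expr, Kind kind, int arity) {
        switch (kind) {
            case ATOMIC:
            case ARRAY:
                // Arrays are not composite types, so like atomic types they are
                // addressed as a whole: "*" only, no 0 and no cell index.
                return expr.equals("*");
            case JAVA_TUPLE:
                // Integer position or "f<n>", both 0-based.
                return inRange(position(expr, JAVA_FIELD, 0), arity);
            case SCALA_TUPLE:
                // Integer position (0-based) or "_<n>" (Scala accessors are 1-based).
                return inRange(position(expr, SCALA_FIELD, 1), arity);
            default:
                return false;
        }
    }

    // Returns the 0-based position for "<n>" or "<prefix><n>", or -1 if no match.
    private static int position(String expr, Pattern prefixed, int firstIndex) {
        if (POSITION.matcher(expr).matches()) {
            return Integer.parseInt(expr);
        }
        Matcher m = prefixed.matcher(expr);
        if (m.matches()) {
            return Integer.parseInt(m.group(1)) - firstIndex;
        }
        return -1;
    }

    private static boolean inRange(int pos, int arity) {
        return pos >= 0 && pos < arity;
    }

    public static void main(String[] args) {
        System.out.println(isValid("f2", Kind.JAVA_TUPLE, 3));  // true
        System.out.println(isValid("_3", Kind.SCALA_TUPLE, 3)); // true
        System.out.println(isValid("0", Kind.ATOMIC, 1));       // false
        System.out.println(isValid("1", Kind.ARRAY, 4));        // false
    }
}
```

Under these rules, the FieldAccessor-only forms from the list (0 on an
atomic type, a cell index on an array) are rejected, which is exactly the
restriction that might break existing code.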

What do others think?

[1] https://github.com/apache/flink/pull/2094
