Access Row fields by attribute name rather than by index in PyFlink TableFunction

Sumeet Malhotra Wed, 19 May 2021 01:45:16 -0700

Hi,

According to the documentation for PyFlink Table row based operations [1],
typical usage is as follows:


@udtf(result_types=[DataTypes.INT(), DataTypes.STRING()])
def split(x: Row) -> Row:
    for s in x[1].split(","):
        yield x[0], s

table.flat_map(split)

Is there any way that row fields inside the UDTF can be accessed by
their attribute names instead of array index? In my use case, I'm doing the
following:

raw_data = t_env.from_path('MySource')
raw_data \
    .join_lateral(my_table_tranform_fn(raw_data).alias('a', 'b', 'c') \
    .flat_map(my_flat_map_fn).alias('x', 'y', 'z') \
    .execute_insert("MySink")

In the table function `my_flat_map_fn` I'm unable to access the fields of
the row by their attribute names i.e., assuming the input argument to the
table function is x, I cannot access fields as x.a, x.b or x.c, instead I
have use use x[0], x[1] and x[2]. The error I get is the _fields is not
populated.

In my use case, the number of columns is very high and working with indexes
is so much error prone and unmaintainable.

Any suggestions?

Thanks,
Sumeet

Access Row fields by attribute name rather than by index in PyFlink TableFunction

Reply via email to