Hi Steve,
Unfortunately, the information you posted still does not explain how you
ended up with *RAW('java.util.Map', ?)* for your input type. It would be
best if you could share an example that I could use to reproduce it.
In the meantime, I have written down some potential approaches:
I tested it with a class generated from an avsc:
{"namespace": "com.ververica.avro.generated",
 "type": "record",
 "name": "Address",
 "fields": [
     {"name": "num", "type": "int"},
     {"name": "street", "type": {
         "type": "map",
         "values" : "string",
         "default": {}
     }}
 ]
}
which has two fields:
@Deprecated public int num;
@Deprecated public java.util.Map<java.lang.String,java.lang.String> street;
1) From the description you posted, the UrlParameters field (street in
my case) should have the *LEGACY('RAW', 'ANY<java.util.Map>')* type.
root
|-- num: INT
|-- street: LEGACY('RAW', 'ANY<java.util.Map>')
2) Using the new type system
Seamless DataStream <> Table integration is still under development;
you can check FLIP-136[1] for it. Until then, you'd need to adjust the
types in the input DataStream. Bear in mind that this approach changes
the way the type is serialized, from Avro-based serialization to Flink's
custom POJO serialization.
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();

    Map<String, TypeInformation<?>> fieldTypes = new HashMap<>();
    fieldTypes.put("num", BasicTypeInfo.INT_TYPE_INFO);
    fieldTypes.put("street", Types.MAP(Types.STRING, Types.STRING));

    SingleOutputStreamOperator<Address> elements = env.fromElements(
            Address.newBuilder()
                .setNum(1)
                .setStreet(new HashMap<>())
                .build()
        )
        .returns(
            Types.POJO(
                Address.class,
                fieldTypes
            )
        );

    StreamTableEnvironment tEnv = StreamTableEnvironment.create(
        env,
        EnvironmentSettings.newInstance().useBlinkPlanner().build());
    tEnv.createTemporaryView("test", elements);
    tEnv.from("test").select(call(Func.class, $("street"))).execute().print();
}
public static class Func extends ScalarFunction {
    @FunctionHint(
        input = {@DataTypeHint(value = "MAP<STRING, STRING>")},
        output = @DataTypeHint("STRING")
    )
    public String eval(final Map<String, String> map) {
        // business logic
        return "ABC";
    }
}
3) Using the legacy types approach you can query that field like this:
public static class LegacyFunc extends ScalarFunction {
    public String eval(final Map<String, String> map) {
        // business logic
        return "ABC";
    }
}

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
SingleOutputStreamOperator<Address> elements = env.fromElements(
    Address.newBuilder()
        .setNum(1)
        .setStreet(new HashMap<>())
        .build()
);
StreamTableEnvironment tEnv = StreamTableEnvironment.create(
    env,
    EnvironmentSettings.newInstance().useBlinkPlanner().build());
tEnv.createTemporaryView("test", elements);
tEnv.registerFunction("legacyFunc", new LegacyFunc());
tEnv.from("test").select(call("legacyFunc", $("street"))).execute().print();
Best,
Dawid
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-136%3A++Improve+interoperability+between+DataStream+and+Table+API
On 09/11/2020 02:30, Steve Whelan wrote:
Hi Dawid,
Just wanted to bump this thread in case you had any thoughts.
Thanks,
Steve
On Thu, Oct 29, 2020 at 2:42 PM Steve Whelan <swhe...@jwplayer.com> wrote:
For some background, I am upgrading from Flink v1.9 to v1.11. So
what I am about to describe is our implementation on v1.9, which
worked. I am trying to achieve the same functionality on v1.11.
I have a DataStream whose type is an avro generated POJO, which
contains a field *UrlParameters* that is of type *Map<String,
String>*. I register this stream as a view so I can perform SQL
queries on it. One of the queries contains the UDF I have
previously posted. It appears that in the conversion to a view,
the type of *UrlParameters* is being converted into
*RAW('java.util.Map', ?)*.
*Code on v1.9*
DataStream pings = // a Kafka stream source deserialized into an avro generated POJO
tableEnvironment.registerDataStream("myTable", pings);
table = tableEnvironment.sqlQuery(
    "SELECT MAP_VALUE(UrlParameters, 'some_key') FROM myTable");
// tablesinks...
/The produced type of my deserializer is:/
@Override
public TypeInformation<Ping> getProducedType() {
    // Ping.class is an avro generated POJO
    return TypeInformation.of(Ping.class);
}
/Scalar UDF MAP_VALUE:/
public static String eval(final Map<String, String> map, final String key) {
    return map.get(key);
}
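A minimal sketch of the lookup logic on its own, outside Flink, assuming
the map or the key can be null for some records (an assumption; the UDF
above does not guard against it). The class name MapValueLookup is
hypothetical and nothing here depends on the Table API:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical standalone version of the MAP_VALUE lookup; plain JDK only.
public class MapValueLookup {

    // Returns the value stored under key, or null when the map or the key
    // is null or the key is absent, instead of throwing a NullPointerException.
    public static String eval(final Map<String, String> map, final String key) {
        if (map == null || key == null) {
            return null;
        }
        return map.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> urlParameters =
            Collections.singletonMap("some_key", "some_value");
        System.out.println(eval(urlParameters, "some_key")); // prints "some_value"
        System.out.println(eval(null, "some_key"));          // prints "null"
    }
}
```

Returning null for a missing map mirrors what map.get already does for a
missing key, so the SQL side would see NULL in both cases.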
I am using a UDF to access fields in the *UrlParameters* map
because if I try to access them directly in the SQL (i.e.
`*UrlParameters['some_key']*`), I get the below exception. This
stackoverflow[1] had suggested the UDF as a workaround.
Caused by: org.apache.flink.table.api.TableException: Type is not supported: ANY
    at org.apache.flink.table.planner.calcite.FlinkTypeFactory$.toLogicalType(FlinkTypeFactory.scala:551)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:478)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator.visitCall(ExprCodeGenerator.scala:53)
    at org.apache.calcite.rex.RexCall.accept(RexCall.java:288)
    at org.apache.flink.table.planner.codegen.ExprCodeGenerator.$anonfun$visitCall$1(ExprCodeGenerator.scala:490)
The above implementation worked successfully on v1.9. We use a
stream source instead of a table source because we do other non-SQL
things with the stream.
*Code on v1.11*
The following is the implementation on v1.11 which does not work.
I was using the Old Planner on v1.9 but have switched to the Blink
Planner on v1.11, in case that has any relevance here.
DataStream pings = // a Kafka stream source deserialized into an avro generated POJO object
tableEnvironment.createTemporaryView("myTable", pings);
table = tableEnvironment.sqlQuery(
    "SELECT MAP_VALUE(UrlParameters, 'some_key') FROM myTable");
// tablesinks...
The UDF referenced above produced the below error. So I assumed
adding DataTypeHints was the way to solve it but I was unable to
get that to work. That is what prompted the initial email to the ML.
Caused by: org.apache.flink.table.api.ValidationException: Invalid input arguments. Expected signatures are:
MAP_VALUE(map => MAP<STRING, STRING>, key => STRING)
    at org.apache.flink.table.types.inference.TypeInferenceUtil.createInvalidInputException(TypeInferenceUtil.java:190)
    at org.apache.flink.table.planner.functions.inference.TypeInferenceOperandChecker.checkOperandTypesOrError(TypeInferenceOperandChecker.java:131)
    at org.apache.flink.table.planner.functions.inference.TypeInferenceOperandChecker.checkOperandTypes(TypeInferenceOperandChecker.java:89)
    ... 50 more
Caused by: org.apache.flink.table.api.ValidationException: Invalid argument type at position 0. Data type MAP<STRING, STRING> expected but RAW('java.util.Map', ?) passed.
    at org.apache.flink.table.types.inference.TypeInferenceUtil.adaptArguments(TypeInferenceUtil.java:137)
    at org.apache.flink.table.types.inference.TypeInferenceUtil.adaptArguments(TypeInferenceUtil.java:102)
    at org.apache.flink.table.planner.functions.inference.TypeInferenceOperandChecker.checkOperandTypesOrError(TypeInferenceOperandChecker.java:126)
    ... 51 more
I can try creating a concrete reproducible example if this
explanation isn't enough, though it's quite a bit of work with the
Avro POJO and custom deserializer.
Thanks,
Steve
[1] https://stackoverflow.com/questions/45621542/does-flink-sql-support-java-map-types