Hi,

did you try a different order: the core module first and then the Hive module?
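For reference, a sketch of how the module order can be set. The SQL statements below exist from Flink 1.13 on; on 1.12 the order is instead configured in the SQL client's sql-client-defaults.yaml or via TableEnvironment#loadModule, and the Hive version shown is only a placeholder:

```sql
-- Load the Hive module in addition to the always-present core module
-- ('hive-version' must match your Hive installation):
LOAD MODULE hive WITH ('hive-version' = '2.3.4');

-- Make function resolution consult the core module first,
-- so built-ins like SUM resolve to Flink's own implementations:
USE MODULES core, hive;
```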

The compatibility layer should work sufficiently for regular Hive UDFs that don't aggregate data. Hive aggregation functions should work well in batch scenarios. However, for streaming pipelines the aggregate functions need to be able to consume updates (such as retractions, in your case).

In summary: ideally, for simple functions such as SUM or COUNT, you should use the core functions instead of the Hive ones. Using Hive aggregate functions in streaming can lead to issues if the input operator is not insert-only.
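To illustrate the advice above, a hedged sketch: with both modules loaded and core resolved first, the built-in SUM comes from the core module, while a Hive scalar UDF remains usable in the same query (my_hive_udf is a hypothetical Hive UDF; xxx is the table name from the thread):

```sql
-- SUM resolves to the core module's implementation (streaming-safe),
-- while the Hive scalar UDF is still served by the Hive module:
SELECT my_hive_udf(col) AS k, SUM(1) AS cnt
FROM xxx
GROUP BY my_hive_udf(col);
```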

Regards,
Timo

On 08.09.21 06:47, vtygoss wrote:

Hi, Flink Community!


I ran into a problem using a Flink 1.12.0 standalone cluster with a Hive catalog.


scene 1:

- module: hive module

- execute sql: select sum(1) from xxx

- exception: *org.apache.flink.table.api.TableException: Required built-in function [plus] could not be found in any catalog.*


scene 2:

- module: hive module and core module

- execute sql: select sum(1)

- exception: *org.apache.flink.table.api.ValidationException: Could not find an implementation method 'retract' in class 'class org.apache.flink.table.functions.hive.HiveGenericUDAF' for function 'sum' that matches the following signature:*

*void retract(org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.AggregationBuffer, java.lang.Integer)*


scene 3:

- module: core module

- execute sql: select sum(1)

- no exception, but Hive UDFs cannot be used.



So is there a way to use Hive UDFs while avoiding these exceptions?


Thank you for any suggestions.


Best Regards!
