[ https://issues.apache.org/jira/browse/FLINK-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867437#comment-15867437 ]

Fabian Hueske commented on FLINK-5802:
--------------------------------------

Thanks for opening this issue [~clarkyzl].
I agree, supporting Hive UDFs would be a great feature for the Table API.

Since there are several ways to achieve this, we should discuss the design 
first.
Off the top of my head, I can think of two approaches:

1. Native support by extending the internals of the Table API. This would mean 
that we have Hive-specific code to register functions and integrate them with 
code generation. Depending on the interface of the Hive UDFs, it might even mean 
that we have to generate a different physical execution plan. It also means 
that we would have a dependency on Hive in the Table API, which I personally 
would like to avoid.
2. Support by wrapping. For this, we would implement Table API UDFs that 
internally wrap Hive UDFs. Since the wrappers are treated like regular Table 
API UDFs, we do not need to change the internals of the Table API. On the other 
hand, this only works if the interfaces of Table API UDFs and Hive UDFs are 
compatible (if they are not, we would probably need different execution plans). 
Another advantage is that the wrappers could live in a separate Maven module, 
which also avoids a hard Hive dependency in flink-table.

I would opt for the second approach because it has fewer implications for the 
internals of the Table API. If it turns out that this approach does not work 
for some Hive UDFs, we will have to decide whether those are worth supporting 
or not.
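
To illustrate the second approach, here is a rough, untested sketch of what such 
a wrapper could look like. The Hive UDF ToUpperUDF is a made-up stand-in for a 
user's existing UDF; the sketch only assumes the usual "simple" Hive UDF contract 
(a class extending org.apache.hadoop.hive.ql.exec.UDF with a reflectively 
resolved evaluate method) and the ScalarFunction contract of flink-table:

import org.apache.flink.table.functions.ScalarFunction
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Stand-in for an existing Hive UDF (made up for this example).
class ToUpperUDF extends UDF {
  def evaluate(input: Text): Text =
    if (input == null) null else new Text(input.toString.toUpperCase)
}

// Wrapper: flink-table only sees a regular ScalarFunction,
// the Hive UDF is called inside eval().
class HiveToUpperWrapper extends ScalarFunction {
  // Hive UDF instances are generally not serializable,
  // so instantiate lazily on the task side.
  @transient private lazy val hiveUdf = new ToUpperUDF

  def eval(s: String): String = {
    val result = hiveUdf.evaluate(if (s == null) null else new Text(s))
    if (result == null) null else result.toString
  }
}

The wrapper would then be registered like any other scalar function, e.g. 
tableEnv.registerFunction("to_upper", new HiveToUpperWrapper), and could be used 
from both the Table API and SQL. Wrapping arbitrary UDF/UDTF/UDAF classes 
generically would of course need more reflection and type inference work; the 
sketch only shows the basic idea.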

> Flink SQL calling Hive User-Defined Functions
> ---------------------------------------------
>
>                 Key: FLINK-5802
>                 URL: https://issues.apache.org/jira/browse/FLINK-5802
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Zhuoluo Yang
>              Labels: features
>
> It is important to be able to call Hive UDFs from Flink SQL. A great many UDFs 
> have been written in Hive over the last ten years.
> Reusing these Hive UDFs would reduce the cost of migration and bring more 
> users to Flink.
> Spark SQL already supports this feature:
> https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_spark-guide/content/calling-udfs.html
> The Hive UDFs here include both built-in UDFs and customized UDFs. Since much 
> business logic has been written as custom UDFs, the customized UDFs are more 
> important than the built-in ones.
> Generally, there are three kinds of UDFs in Hive: UDF, UDTF and UDAF.
> Here is the relevant section of the Spark SQL documentation: 
> http://spark.apache.org/docs/latest/sql-programming-guide.html#compatibility-with-apache-hive
>  
> Spark code:
> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
