[ 
https://issues.apache.org/jira/browse/HIVE-19040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416600#comment-16416600
 ] 

Vihang Karajgaonkar commented on HIVE-19040:
--------------------------------------------

Based on my previous discussions with [~alangates], I think the assumption 
behind this particular API was that only Hive currently uses it, and hence 
that the right Hive jars are in the classpath of HMS. From the point of view 
of standalone-metastore, though, this API is very fragile. Technically, HMS 
APIs are backwards compatible, which means older clients should be able to 
talk to a newer HMS. That guarantee is broken in this case. In my humble 
opinion, sending an object as a byte array over the wire and deserializing it 
on the server side is reinventing what Thrift does for us. I am not even sure 
how Hive UDFs are used to create filter strings on MySQL (I will have to look 
more to understand how the expressionTree is getting generated). What happens 
if the client sends a UDF which is specific to Hive?

Also, is it true that HS2 and HMS will always be at the same version?

The second meta-point to think about: when the standalone-metastore is 
deployed separately, should it have hive-exec jars in its classpath? I think 
what we are indirectly saying now is that if Hive is one of the users of this 
standalone HMS (which it most certainly always will be), then we should add 
the "right" version of the hive-exec jars to the metastore's classpath. How 
does that make the metastore standalone? Aren't we back to square one?

I think the right way ahead may be to deprecate this API and reimplement it 
without depending on hive-exec. The {{Filter.g}} grammar is already part of 
HMS, and we should try to build on and enhance it to provide the expressions 
which can be used to filter partitions.
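
To make the string-based direction concrete, here is a minimal sketch in Java. 
It is only an illustration under stated assumptions: it is not the 
{{Filter.g}} grammar and not an existing HMS API. The class name 
PartitionFilterSketch, the method matches, and the tiny clause syntax 
(key = "value" or key != "value", joined by "and") are all hypothetical. The 
point it tries to show is that a plain-text filter can be evaluated on the 
server with JDK classes only, so no client-side expression or UDF class has to 
be deserialized by HMS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: evaluates a plain-string partition filter against a
 * partition's key/value spec. Not the HMS Filter.g grammar.
 */
public class PartitionFilterSketch {

    // One clause: <partition key> (= | !=) "<value>"
    private static final Pattern CLAUSE =
        Pattern.compile("\\s*(\\w+)\\s*(=|!=)\\s*\"([^\"]*)\"\\s*");

    // Case-insensitive "and" connective between clauses.
    private static final Pattern AND = Pattern.compile("(?i)and\\b");

    /** Returns true if partSpec satisfies every clause of the filter. */
    public static boolean matches(String filter, Map<String, String> partSpec) {
        Matcher clause = CLAUSE.matcher(filter);
        Matcher and = AND.matcher(filter);
        boolean result = true;
        int pos = 0;
        while (true) {
            clause.region(pos, filter.length());
            if (!clause.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unsupported filter near offset " + pos + ": " + filter);
            }
            String key = clause.group(1);
            String actual = partSpec.get(key);
            if (actual == null) {
                throw new IllegalArgumentException("Unknown partition key: " + key);
            }
            boolean equal = actual.equals(clause.group(3));
            // "=" requires equality, "!=" requires inequality.
            result &= clause.group(2).equals("=") ? equal : !equal;

            pos = clause.end();
            if (pos == filter.length()) {
                return result; // whole filter consumed
            }
            and.region(pos, filter.length());
            if (!and.lookingAt()) {
                throw new IllegalArgumentException(
                    "Expected 'and' at offset " + pos + ": " + filter);
            }
            pos = and.end();
        }
    }

    public static void main(String[] args) {
        Map<String, String> part = new HashMap<>();
        part.put("ds", "2018-03-27");
        part.put("region", "us");
        System.out.println(matches("ds = \"2018-03-27\" and region != \"eu\"", part));
    }
}
```

Because the filter travels as text, an older client and a newer server only 
need to agree on the filter syntax, not on the field layout of any Java class. 
A real implementation would of course need ranges, "or", typed comparisons, 
and so on, which is where enhancing the existing grammar comes in.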

> get_partitions_by_expr() implementation in HiveMetaStore causes backward 
> incompatibility easily
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19040
>                 URL: https://issues.apache.org/jira/browse/HIVE-19040
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Priority: Major
>
> In the HiveMetaStore implementation of {{public PartitionsByExprResult 
> get_partitions_by_expr(PartitionsByExprRequest req) throws TException}}, an 
> expression is serialized into a byte array on the client side and passed 
> through PartitionsByExprRequest. HMS then deserializes it back into the 
> expression and filters the partitions by it.
> Such a partition filtering expression can contain various UDFs. If one of 
> the UDFs changes between Hive versions, HS2 on the older version will 
> serialize the expression in the old format, which HMS on the newer version 
> won't be able to deserialize. One example of that is the GenericUDFIn 
> class adding {{transient}} to the field constantInSet, which causes such an 
> incompatibility.
> One approach I'm thinking of is, instead of converting the expression object 
> to a byte array, we can pass the expression string directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
