[ 
https://issues.apache.org/jira/browse/HIVE-19040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417987#comment-16417987
 ] 

Sergey Shelukhin commented on HIVE-19040:
-----------------------------------------

[~vihangk1] as per above... HMS version doesn't matter, only Hive jars and HS2.

The reason this API exists is also described above, although I guess it's not 
very clear.
See the impl.
What happens is that we send the actual Hive expression to MS.
On the main path, where it can be pushed down, it gets deserialized, converted 
to a string and pushed down, so that part is indeed redundant and can be replaced.
However, the reason we send it as bytes (and the reason the whole API was added 
on top of the old string filter API) is to allow HMS to evaluate it when 
pushdown is not available (which is actually most cases: Filter.g and SQL 
pushdown support only basic expressions). This way MS can get (or potentially 
stream) all the partitions and actually evaluate a full Hive expression on them 
with any standard UDFs.
The alternative without this API is to send all partitions back to the client 
(which, unlike the SQL DB, can be far away) and evaluate the Hive expression on 
the client, which can be very expensive if there are many partitions.
That's why bytes and a proxy class into the ql jar are used (if we had had a 
metastore client and metastore server module separation long ago, it would 
have used QL classes directly). Also, Hive expressions are already 
serializable into bytes (because the plan has to be serialized).
So with just Filter.g this functionality would be lost.
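
To make the two paths concrete, here is a minimal sketch of the control flow described above. All of the types are hypothetical stand-ins (PartitionExpr, InMemoryStore and the exact-match "filter" are invented for illustration and are not Hive's real classes); it only shows the shape of "push down if possible, otherwise evaluate next to the store".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: these types are illustrative stand-ins, not Hive's real classes.
final class PartitionPruningSketch {

    /** Stand-in for the Hive expression after MS has deserialized it. */
    interface PartitionExpr {
        /** A filter string if the expression fits the simple filter grammar, otherwise empty. */
        Optional<String> toPushdownFilter();

        /** Full evaluation against one partition name such as "ds=2018-03-01". */
        boolean evaluate(String partitionName);
    }

    /** Stand-in for the backing DB; the toy "filter" is just an exact partition name. */
    static final class InMemoryStore {
        private final List<String> partitionNames;
        int fullScans = 0; // counts how often the fallback path had to list everything

        InMemoryStore(List<String> partitionNames) {
            this.partitionNames = new ArrayList<>(partitionNames);
        }

        List<String> listByFilter(String filter) {
            List<String> out = new ArrayList<>();
            for (String name : partitionNames) {
                if (name.equals(filter)) {
                    out.add(name);
                }
            }
            return out;
        }

        List<String> listAll() {
            fullScans++;
            return new ArrayList<>(partitionNames);
        }
    }

    /** Main path: push the filter down. Fallback: list all partitions and evaluate server-side. */
    static List<String> getPartitionsByExpr(PartitionExpr expr, InMemoryStore store) {
        Optional<String> filter = expr.toPushdownFilter();
        if (filter.isPresent()) {
            return store.listByFilter(filter.get());
        }
        List<String> out = new ArrayList<>();
        for (String name : store.listAll()) {
            if (expr.evaluate(name)) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore(
                Arrays.asList("ds=2018-03-01", "ds=2018-03-02", "ds=2017-12-31"));
        // An expression the toy filter grammar cannot express: "ds starts with 2018".
        PartitionExpr startsWith2018 = new PartitionExpr() {
            @Override public Optional<String> toPushdownFilter() { return Optional.empty(); }
            @Override public boolean evaluate(String name) { return name.startsWith("ds=2018"); }
        };
        System.out.println(getPartitionsByExpr(startsWith2018, store));
    }
}
```

In this sketch only the matching partition names leave the server on the fallback path; without server-side evaluation the whole list would have to go to the client first.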

If we want to keep the ability of MS to evaluate expressions locally, it's 
possible to beef up this API (and the proxy class config). The request could 
name an "expression evaluator", and the config would map that name to a 
particular class. A version can also be included to handle compatibility.
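
One way the named, versioned evaluator idea could look is sketched below. This is not an existing Hive API: EvaluatorRegistrySketch, ExpressionEvaluator, Request and the "name@version" key are all assumptions made up for illustration, and the expression bytes are treated as an opaque payload.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Hive; they only illustrate the proposal.
final class EvaluatorRegistrySketch {

    /** Evaluates an opaque serialized expression against one partition name. */
    interface ExpressionEvaluator {
        boolean matches(byte[] serializedExpr, String partitionName);
    }

    /** What a beefed-up request might carry alongside the expression bytes. */
    static final class Request {
        final String evaluatorName;
        final int evaluatorVersion;
        final byte[] serializedExpr;

        Request(String evaluatorName, int evaluatorVersion, byte[] serializedExpr) {
            this.evaluatorName = evaluatorName;
            this.evaluatorVersion = evaluatorVersion;
            this.serializedExpr = serializedExpr;
        }
    }

    // In a real server this map would come from configuration (name -> class).
    private final Map<String, ExpressionEvaluator> evaluators = new HashMap<>();

    private static String key(String name, int version) {
        return name + "@" + version;
    }

    void register(String name, int version, ExpressionEvaluator evaluator) {
        evaluators.put(key(name, version), evaluator);
    }

    /** Looks up the evaluator named in the request; unknown name/version pairs fail fast. */
    boolean evaluate(Request req, String partitionName) {
        String k = key(req.evaluatorName, req.evaluatorVersion);
        ExpressionEvaluator evaluator = evaluators.get(k);
        if (evaluator == null) {
            throw new IllegalArgumentException("No evaluator registered for " + k);
        }
        return evaluator.matches(req.serializedExpr, partitionName);
    }

    public static void main(String[] args) {
        EvaluatorRegistrySketch registry = new EvaluatorRegistrySketch();
        // Toy evaluator: treats the bytes as a UTF-8 prefix to match against the partition name.
        registry.register("prefix", 1,
                (bytes, name) -> name.startsWith(new String(bytes, StandardCharsets.UTF_8)));
        Request req = new Request("prefix", 1, "ds=2018".getBytes(StandardCharsets.UTF_8));
        System.out.println(registry.evaluate(req, "ds=2018-03-01"));
    }
}
```

The point of rejecting an unknown name or version up front is that a version mismatch surfaces as a clear error instead of a deserialization failure deep inside the evaluator.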

It's also possible to add support for native expressions in the metastore that 
would handle most Hive cases, i.e. basically replace Filter.g with Hive 
expression parsing and include common UDFs like IN, etc.



> get_partitions_by_expr() implementation in HiveMetaStore causes backward 
> incompatibility easily
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19040
>                 URL: https://issues.apache.org/jira/browse/HIVE-19040
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Priority: Major
>
> In the HiveMetaStore implementation of {{public PartitionsByExprResult 
> get_partitions_by_expr(PartitionsByExprRequest req) throws TException}}, an 
> expression is serialized into a byte array on the client side and passed 
> through PartitionsByExprRequest. HMS then deserializes it back into the 
> expression and filters the partitions by it.
> Such a partition filtering expression can contain various UDFs. If one of 
> the UDFs changes between Hive versions, HS2 on the older version will 
> serialize the expression in the old format, which HMS on the newer version 
> cannot deserialize. One example is the GenericUDFIn class adding 
> {{transient}} to the field constantInSet, which causes such an 
> incompatibility.
> One approach I'm thinking of is, instead of converting the expression object 
> to a byte array, to pass the expression string directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
