[ https://issues.apache.org/jira/browse/HIVE-19040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417987#comment-16417987 ]
Sergey Shelukhin commented on HIVE-19040:
-----------------------------------------

[~vihangk1] as per above... HMS version doesn't matter, only Hive jars and HS2.

The reason this API exists is actually also described above, although it's not very clear I guess. See the impl. What happens is we send the actual Hive expression to MS. On the main path, where it can be pushed down, it gets deserialized, converted to a string and pushed, so that part is indeed redundant and can be replaced.

However, the reason we send it as bytes (and the reason the whole API was added on top of the old string filter API) is to allow HMS to evaluate it when pushdown is not available, which is actually most cases: Filter.g and SQL pushdown support only basic stuff. This way MS can get (or potentially stream) all the partitions and actually evaluate a full Hive expression on them with any standard UDFs.

The alternative without this API is to send all partitions back to the client (which, unlike the SQL db, can be far away) and evaluate the Hive expression on the client, which can be very expensive if there are many partitions. That's why bytes and a proxy class into the ql jar are used (if long ago we had had metastore client and metastore server module separation, it would have used QL classes directly). Also, Hive stuff is already serializable into bytes (due to needing to serialize the plan).

So with just Filter.g this functionality would be lost. If we want to keep the ability of MS to evaluate expressions locally, it's possible to beef up this API (and the proxy class config). Requests can refer to a named "expression evaluator", supplied in the request and configured on the server to refer to a particular class. A version can also be included to handle compat...

It's also possible to create support for native expressions in metastore that would handle most Hive cases, i.e. basically replace Filter.g with Hive expression parsing and include common UDFs like IN, etc.

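A minimal sketch of the named "expression evaluator" idea, to make the shape of the proposal concrete. Everything here is hypothetical and not part of Hive or the metastore Thrift API: `ExprRequest`, `ExprEvaluator`, `EvaluatorRegistry` and `PrefixEvaluator` are invented names, and the toy evaluator treats the payload as a UTF-8 partition-name prefix rather than a serialized Hive expression.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExprEvaluatorSketch {

    /** Hypothetical request: evaluator name and format version travel with the opaque bytes. */
    static final class ExprRequest {
        final String evaluatorName;
        final int formatVersion;
        final byte[] payload;

        ExprRequest(String evaluatorName, int formatVersion, byte[] payload) {
            this.evaluatorName = evaluatorName;
            this.formatVersion = formatVersion;
            this.payload = payload;
        }
    }

    /** Hypothetical server-side plug-in; a real one would deserialize and evaluate an expression. */
    interface ExprEvaluator {
        boolean supports(int formatVersion);
        List<String> filter(byte[] payload, List<String> partitionNames);
    }

    /** Toy stand-in: keeps partition names that start with the UTF-8 decoded payload. */
    static final class PrefixEvaluator implements ExprEvaluator {
        @Override
        public boolean supports(int formatVersion) {
            return formatVersion == 1;
        }

        @Override
        public List<String> filter(byte[] payload, List<String> partitionNames) {
            String prefix = new String(payload, StandardCharsets.UTF_8);
            List<String> kept = new ArrayList<>();
            for (String name : partitionNames) {
                if (name.startsWith(prefix)) {
                    kept.add(name);
                }
            }
            return kept;
        }
    }

    /** Maps configured evaluator names to implementations and rejects unknown names or versions. */
    static final class EvaluatorRegistry {
        private final Map<String, ExprEvaluator> byName = new HashMap<>();

        void register(String name, ExprEvaluator evaluator) {
            byName.put(name, evaluator);
        }

        List<String> apply(ExprRequest req, List<String> partitionNames) {
            ExprEvaluator evaluator = byName.get(req.evaluatorName);
            if (evaluator == null) {
                throw new IllegalArgumentException("unknown evaluator: " + req.evaluatorName);
            }
            if (!evaluator.supports(req.formatVersion)) {
                // Fail with a clear error instead of a deserialization failure deep in the payload.
                throw new IllegalArgumentException("unsupported format version: " + req.formatVersion);
            }
            return evaluator.filter(req.payload, partitionNames);
        }
    }

    public static void main(String[] args) {
        EvaluatorRegistry registry = new EvaluatorRegistry();
        registry.register("prefix", new PrefixEvaluator());

        List<String> partitions = Arrays.asList("ds=2018-03-01", "ds=2018-03-02", "ds=2018-04-01");
        byte[] payload = "ds=2018-03".getBytes(StandardCharsets.UTF_8);

        System.out.println(registry.apply(new ExprRequest("prefix", 1, payload), partitions));
    }
}
```

The point of the version check is that a mismatch between client and server is detected up front, so the server could fall back or report a compat error rather than fail while deserializing.
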
> get_partitions_by_expr() implementation in HiveMetaStore causes backward incompatibility easily
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-19040
>                 URL: https://issues.apache.org/jira/browse/HIVE-19040
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 2.0.0
>            Reporter: Aihua Xu
>            Priority: Major
>
> In the HiveMetaStore implementation of {{public PartitionsByExprResult
> get_partitions_by_expr(PartitionsByExprRequest req) throws TException}}, an
> expression is serialized into byte array from the client side and passed
> through PartitionsByExprRequest. Then HMS will deserialize back into the
> expression and filter the partitions by it.
> Such partition filtering expression can contain various UDFs. If there are
> some changes to one of the UDFs between different Hive versions, HS2 on the
> older version will serialize the expression in old format which won't be able
> to be deserialized by HMS on the newer version. One example of that is,
> GenericUDFIn class adds {{transient}} to the field constantInSet which will
> cause such incompatibility.
> One approach I'm thinking of is, instead of converting the expression object
> to byte array, we can pass the expression string directly.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)