[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028545#comment-14028545 ]
Mithun Radhakrishnan commented on HIVE-7195:
--------------------------------------------

[~sershe]: listPartitions(), etc. do have a max_parts parameter. I'm exploring the possibility of reducing the Thrift traffic for partition operations, for a given number of partitions. That would free us up to transfer metadata for more partitions without fear of the metastore keeling over from heap fragmentation, etc. One way of doing that is to reduce redundancy when specifying multiple partitions. Abstracting how partitions are specified makes it possible to vary and extend this.

> Improve Metastore performance
> -----------------------------
>
>                 Key: HIVE-7195
>                 URL: https://issues.apache.org/jira/browse/HIVE-7195
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Priority: Critical
>
> Even with direct SQL, which significantly improves metastore (MS) performance,
> some operations take a considerable amount of time when there are many
> partitions on a table. Specifically, I believe the issues are:
> * When a client gets all partitions, we do not send it an iterator; we
> create a collection of all the data and then pass the object over the network
> in its entirety.
> * Operations which require looking up data on the NameNode (NN) can still be
> slow, since there is no cache of that information and the lookups are done
> serially.
> * Perhaps a tangent, but our client timeout is quite dumb. The client will
> time out and the server has no idea the client is gone. We should use
> deadlines, i.e. pass the timeout to the server so it can calculate that the
> client has expired.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
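A minimal sketch of the redundancy-reduction idea in the comment, assuming (hypothetically) that every partition in a response shares one storage descriptor and lives under the table's root location. The types `StorageDescriptor`, `Partition`, `PartitionWithoutSd` and `SharedSdPartitionSpec` are simplified illustrations invented here, not Hive's actual metastore Thrift structures. The compact form sends the shared descriptor and root path once, each partition carries only its values and relative path, and the receiver expands it back into full partitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT Hive's metastore Thrift types.
record StorageDescriptor(String inputFormat, String outputFormat, String serde, List<String> columns) {}

// Fully expanded partition: every instance carries its own copy of the descriptor.
record Partition(List<String> values, String location, StorageDescriptor sd) {}

// Compact per-partition entry: only what actually differs between partitions.
record PartitionWithoutSd(List<String> values, String relativePath) {}

// Shared descriptor and root location are sent once for the whole batch.
record SharedSdPartitionSpec(String tableLocation, StorageDescriptor sharedSd,
                             List<PartitionWithoutSd> partitions) {}

final class PartitionSpecSketch {
    private PartitionSpecSketch() {}

    /** Builds the compact form; rejects partitions that do not share the descriptor or root. */
    static SharedSdPartitionSpec compact(String tableLocation, List<Partition> parts) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("need at least one partition");
        }
        StorageDescriptor sd = parts.get(0).sd();
        String prefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
        List<PartitionWithoutSd> slim = new ArrayList<>();
        for (Partition p : parts) {
            if (!p.sd().equals(sd) || !p.location().startsWith(prefix)) {
                throw new IllegalArgumentException("partition does not share descriptor/root: " + p.values());
            }
            slim.add(new PartitionWithoutSd(p.values(), p.location().substring(prefix.length())));
        }
        return new SharedSdPartitionSpec(prefix, sd, slim);
    }

    /** Reconstructs the full partitions on the receiving side. */
    static List<Partition> expand(SharedSdPartitionSpec spec) {
        List<Partition> out = new ArrayList<>();
        for (PartitionWithoutSd p : spec.partitions()) {
            out.add(new Partition(p.values(), spec.tableLocation() + p.relativePath(), spec.sharedSd()));
        }
        return out;
    }

    public static void main(String[] args) {
        StorageDescriptor sd = new StorageDescriptor(
            "TextInputFormat", "HiveIgnoreKeyTextOutputFormat", "LazySimpleSerDe", List.of("id", "name"));
        List<Partition> parts = new ArrayList<>();
        for (String dt : List.of("2014-06-01", "2014-06-02", "2014-06-03")) {
            parts.add(new Partition(List.of(dt), "hdfs://nn/warehouse/t/dt=" + dt, sd));
        }
        SharedSdPartitionSpec spec = compact("hdfs://nn/warehouse/t", parts);
        System.out.println("partitions sent: " + spec.partitions().size()); // partitions sent: 3
        System.out.println("round trip ok: " + expand(spec).equals(parts)); // round trip ok: true
    }
}
```

The saving grows with the number of partitions, since the descriptor (formats, SerDe, column list) is serialized once rather than per partition. Keeping the compact type separate from the expanded one, behind a common abstraction, is what would let other encodings be added later without changing callers.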