[
https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297060#comment-16297060
]
Xiang Li commented on HBASE-15482:
----------------------------------
Hi [~tedyu], [~jerryhe], thanks for your comments and guidance!
Patch 003 is uploaded. It mainly makes the following changes:
* Simplify the logic in light of 15482.v3.txt. In addition, add logic to
** Check if numTopsAtMost < 1 (which is invalid)
** Check if top is 1. When it is 1, return top host directly.
* Change the conf key string from
{{hbase.TableSnapshotInputFormat.locality.enable}} to
{{hbase.TableSnapshotInputFormat.locality.enabled}}, using "enabled" instead
of "enable", since most of the existing conf key strings use "enabled".
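
For readers following along, here is a minimal sketch of the checks listed above. It is not the patch itself. It assumes the hosts are already sorted by descending block weight, reads "top is 1" as "only one host is to be returned", and assumes that an invalid numTopsAtMost is rejected with an exception (the patch may handle it differently). The class name, method names, the use of java.util.Properties in place of an HBase Configuration, and the default of "true" are all hypothetical; only the key string and the name numTopsAtMost come from the comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration class; not part of HBase.
final class LocalitySketch {
    // Key string as renamed in patch 003.
    static final String LOCALITY_ENABLED_KEY =
        "hbase.TableSnapshotInputFormat.locality.enabled";

    // Returns at most numTopsAtMost hosts.
    // Assumption: hostsByWeight is sorted by descending block weight.
    static List<String> topHosts(List<String> hostsByWeight, int numTopsAtMost) {
        // numTopsAtMost < 1 is invalid (rejecting it is an assumption).
        if (numTopsAtMost < 1) {
            throw new IllegalArgumentException(
                "numTopsAtMost must be >= 1, got " + numTopsAtMost);
        }
        if (hostsByWeight.isEmpty()) {
            return Collections.emptyList();
        }
        int top = Math.min(numTopsAtMost, hostsByWeight.size());
        // Shortcut: when only one host is wanted, return the top host directly.
        if (top == 1) {
            return Collections.singletonList(hostsByWeight.get(0));
        }
        return new ArrayList<>(hostsByWeight.subList(0, top));
    }

    // Skips the locality result entirely when the flag is set to false.
    // Assumption: locality is enabled when the key is absent.
    static List<String> locationsFor(Properties conf,
                                     List<String> hostsByWeight,
                                     int numTopsAtMost) {
        boolean enabled =
            Boolean.parseBoolean(conf.getProperty(LOCALITY_ENABLED_KEY, "true"));
        if (!enabled) {
            return Collections.emptyList();
        }
        return topHosts(hostsByWeight, numTopsAtMost);
    }
}
```

In this sketch, a caller running outside the HBase cluster would set the key to "false" and get an empty location list back, so no top-host selection is done.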
> Provide an option to skip calculating block locations for SnapshotInputFormat
> -----------------------------------------------------------------------------
>
> Key: HBASE-15482
> URL: https://issues.apache.org/jira/browse/HBASE-15482
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Liyin Tang
> Assignee: Xiang Li
> Priority: Minor
> Fix For: 2.1.0
>
> Attachments: 15482.v3.txt, HBASE-15482.master.000.patch,
> HBASE-15482.master.001.patch, HBASE-15482.master.002.patch,
> HBASE-15482.master.003.patch
>
>
> When an MR job is reading from SnapshotInputFormat, it needs to calculate the
> splits based on the block locations in order to get the best locality. However,
> this process may take a long time for large snapshots.
> In some setups, the computing layer (Spark, Hive or Presto) could run outside
> of the HBase cluster. In these scenarios, the block locality doesn't matter.
> Therefore, it would be great to have an option to skip calculating the block
> locations for every job. That would be super useful for the Hive/Presto/Spark
> connectors.