[ 
https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022271#comment-13022271
 ] 

Siying Dong commented on HIVE-2121:
-----------------------------------

Updated test outputs. The outputs of the sample tests changed because I changed 
the token name for sampling using buckets from TOK_TABLESAMPLE to 
TOK_TABLEBUCKETSAMPLE (as I am adding a TOK_TABLESPLITSAMPLE). Other than that, 
there should be no change.

This patch has a limitation: even if only one split is sampled, execution is 
not switched to local mode where that would otherwise be possible, because 
getSplits() is called during job submission, which happens after the step that 
chooses local mode.
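
As a rough illustration of what sampling by splits means (this is not the code 
in the patch), the sketch below keeps leading splits until about the requested 
percentage of the total input bytes is covered. The Split type and the 
sample_splits() function are hypothetical names made up for this example. 
Whole splits are kept, so the result is not strict sampling and may overshoot 
the target. The subset is only known once the full split list exists, which is 
why a decision made before the splits are computed cannot take it into account.

```python
from typing import List, NamedTuple


class Split(NamedTuple):
    """Hypothetical stand-in for an input split: a path and its length in bytes."""
    path: str
    length: int


def sample_splits(splits: List[Split], percent: float) -> List[Split]:
    """Keep leading splits until roughly `percent` of the total bytes is covered.

    Whole splits are kept, so the covered size may exceed the target.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Target number of bytes to cover, derived from the full split list.
    target = sum(s.length for s in splits) * percent / 100.0
    chosen: List[Split] = []
    covered = 0
    for split in splits:
        if covered >= target:
            break
        chosen.append(split)
        covered += split.length
    return chosen
```

For example, with four splits of equal size, asking for 30 percent keeps the 
first two splits, since one split alone covers only 25 percent.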

> Input Sampling By Splits
> ------------------------
>
>                 Key: HIVE-2121
>                 URL: https://issues.apache.org/jira/browse/HIVE-2121
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch
>
>
> We need better input sampling to serve at least two purposes:
> 1. letting users test their queries against a smaller data set
> 2. understanding more about what the data looks like without scanning the 
> whole table.
> A simple function that returns a subset of the splits will help in those 
> cases. It doesn't have to be strict sampling.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
