[ https://issues.apache.org/jira/browse/HIVE-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022271#comment-13022271 ]
Siying Dong commented on HIVE-2121:
-----------------------------------

Changed test outputs. The outputs of the sampling tests changed because I changed the token name for bucket-based sampling from TOK_TABLESAMPLE to TOK_TABLEBUCKETSAMPLE (I am adding a TOK_TABLESPLITSAMPLE). Other than that, there should be no change.

This patch has a limitation: even if only one split is sampled, execution is not switched to local mode where that would be possible, because getSplits() is called during job submission, which comes after the step that chooses local mode.

> Input Sampling By Splits
> ------------------------
>
>                 Key: HIVE-2121
>                 URL: https://issues.apache.org/jira/browse/HIVE-2121
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2121.1.patch, HIVE-2121.2.patch
>
>
> We need better input sampling to let users do at least two things:
> 1. test their queries against a smaller data set
> 2. understand more about what the data looks like without scanning the whole
> table.
> A simple function that returns a subset of the splits will help in those
> cases. It doesn't have to be strict sampling.
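
To illustrate the idea of "a subset of the splits" described in the issue, here is a minimal sketch in plain Java. It is not the code from the attached patches and does not use Hive or Hadoop classes: `SplitSampleSketch`, `Split`, and `sampleByPercent` are hypothetical names, and selecting by a percentage of total input size is an assumption made for the example. It keeps leading splits until their combined length reaches the requested share of the input, which is deliberately not strict sampling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the HIVE-2121 patches.
final class SplitSampleSketch {

    /** Hypothetical stand-in for an input split: a path and a length in bytes. */
    static final class Split {
        final String path;
        final long length;

        Split(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /**
     * Keeps leading splits until their combined length reaches the requested
     * percentage of the total input size. At least one split is kept whenever
     * the input is non-empty, so a tiny percentage still yields some data.
     */
    static List<Split> sampleByPercent(List<Split> splits, double percent) {
        if (percent <= 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in (0, 100]");
        }
        long total = 0;
        for (Split s : splits) {
            total += s.length;
        }
        double target = total * percent / 100.0;

        List<Split> kept = new ArrayList<>();
        long size = 0;
        for (Split s : splits) {
            // Stop once the target size is covered, but never return an empty sample.
            if (!kept.isEmpty() && size >= target) {
                break;
            }
            kept.add(s);
            size += s.length;
        }
        return kept;
    }
}
```

Because whole splits are kept or dropped, the sample size is only approximate, and the rows come from the first splits rather than from a random selection. That matches the "doesn't have to be strict sampling" goal, but it also means the result can be skewed if the data is ordered.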