The Skewed Tables
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-SkewedTables>
section in the DDL wikidoc has more information which might be helpful.

HIVE-3649 was just one of several JIRA issues that added list bucketing in
releases 0.10 and 0.11.  See HIVE-3026
<https://issues.apache.org/jira/browse/HIVE-3026> for links to the rest of
them.  (The one that added DML support hasn't been documented yet:
HIVE-3073 <https://issues.apache.org/jira/browse/HIVE-3073>.)

I'm revising the JIRA links in the wiki now.

-- Lefty


On Wed, Jul 2, 2014 at 1:25 AM, Lefty Leverenz <leftylever...@gmail.com>
wrote:

> Does anyone have time to answer this?  It would be good to clarify things
> in the wiki.
>
> HIVE-3649 <https://issues.apache.org/jira/browse/HIVE-3649> added the
> list bucketing feature in release 0.10.0.  The description says:
>
> We need to differ normal skewed table from list bucketing table. we use an
>> optional parameter "store as DIRECTORIES"
>
>
> So I think your understanding is correct, but let's hear from the experts.
>
> -- Lefty
>
>
> On Fri, Jun 27, 2014 at 1:25 PM, Steven Willis <swil...@compete.com>
> wrote:
>
>> I'm having trouble understanding the difference between a skewed table
>> and a list bucketed table:
>>
>> https://cwiki.apache.org/confluence/display/Hive/ListBucketing
>>
>> Is the only difference that ListBucketing stores the data as directories
>> and a "plain" skewed table stores them as files? I think that's what the
>> wiki page is saying, but it's very confusing. For one, the title of the
>> page is ListBucketing and in many places it seems to use the phrase "List
>> Bucketing" as the general feature of partitioning a table by skewed columns
>> (whether in directories or files).
>>
>> There's a section, "Skewed Table vs. List Bucketing Table"
>> (https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing),
>> that I expected would spell out the differences between the two, but it says:
>>
>>  - Skewed Table is a table which has skewed information.
>>  - List Bucketing Table is a skewed table. In addition, it tells Hive to
>> use the list bucketing feature on the skewed table: create sub-directories
>> for skewed values.
>>
>> That makes it seem like "the list bucketing feature" is just using
>> sub-directories for the data. If that's the case, why is the whole article
>> titled ListBucketing, and why is the section describing the basic idea
>> (which apparently both skewed tables and list bucketed tables have in
>> common) titled just "List Bucketing"
>> (https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing)?
>>
>> The article also says, "Mainly due to its sub-directory nature, list
>> bucketing can't coexist with some features." So does that mean only list
>> bucketing (the sub-directory option that skewed tables can use) is
>> incompatible with the features mentioned, or that any skewed table is
>> incompatible with those features?
>>
>> -Steve
>>
>
>
