The Skewed Tables <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-SkewedTables> section in the DDL wikidoc has more information which might be helpful.
HIVE-3649 was just one of several jiras that added list bucketing in releases 0.10 and 0.11. See HIVE-3026 <https://issues.apache.org/jira/browse/HIVE-3026> for links to the rest of them. (The one that added DML support hasn't been documented yet: HIVE-3073 <https://issues.apache.org/jira/browse/HIVE-3073>.)

I'm revising the jira links in the wiki now.

-- Lefty

On Wed, Jul 2, 2014 at 1:25 AM, Lefty Leverenz <leftylever...@gmail.com> wrote:

> Does anyone have time to answer this? It would be good to clarify things
> in the wiki.
>
> HIVE-3649 <https://issues.apache.org/jira/browse/HIVE-3649> added the
> list bucketing feature in release 0.10.0. The description says:
>
> > We need to differ normal skewed table from list bucketing table. we use an
> > optional parameter "store as DIRECTORIES"
>
> So I think your understanding is correct, but let's hear from the experts.
>
> -- Lefty
>
> On Fri, Jun 27, 2014 at 1:25 PM, Steven Willis <swil...@compete.com>
> wrote:
>
>> I'm having trouble understanding the difference between a skewed table
>> and a list bucketed table:
>>
>> https://cwiki.apache.org/confluence/display/Hive/ListBucketing
>>
>> Is the only difference that ListBucketing stores the data as directories
>> and a "plain" skewed table stores them as files? I think that's what the
>> wiki page is saying, but it's very confusing. For one, the title of the
>> page is ListBucketing and in many places it seems to use the phrase "List
>> Bucketing" as the general feature of partitioning a table by skewed columns
>> (whether in directories or files).
>>
>> There's a section "Skewed Table vs. List Bucketing Table" (
>> https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing)
>> that I would assume would spell out the differences between the two, but
>> it says:
>>
>> - Skewed Table is a table which has skewed information.
>> - List Bucketing Table is a skewed table. In addition, it tells Hive to
>>   use the list bucketing feature on the skewed table: create sub-directories
>>   for skewed values.
>>
>> That makes it seem like "the list bucketing feature" is just using
>> sub-directories for the data. If that's the case, why is the whole article
>> titled ListBucketing, and why is the section describing the basic idea
>> (that apparently both skewed tables and list bucketed tables have in
>> common) titled just "List Bucketing" (
>> https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing
>> )?
>>
>> The article also says, "Mainly due to its sub-directory nature, list
>> bucketing can't coexist with some features." So does that mean just list
>> bucketing (the subdirectory feature that skewed tables can have as an
>> option) is incompatible with the features mentioned, or does it mean that
>> any skewed table is incompatible with said features?
>>
>> -Steve
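To make the distinction discussed in this thread concrete, here is a minimal Python sketch. It assumes a single skewed column and models only which directory a row would land in for a table declared as `SKEWED BY (key) ON (1, 5, 6)`, with and without `STORED AS DIRECTORIES`. The function name `list_bucketing_layout` is hypothetical, and the `key=value` subdirectory naming and the catch-all directory name `HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME` are assumptions modeled on Hive's layout, so check them against your Hive version.

```python
from collections import defaultdict

# Assumed name of the catch-all directory for non-skewed values (modeled on
# Hive's default; treat it as an assumption, not a verified constant).
DEFAULT_DIR = "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME"


def list_bucketing_layout(rows, skewed_col, skewed_values, stored_as_directories):
    """Hypothetical helper: group rows by the relative directory they land in.

    Simplified model of the HiveQL DDL:
      CREATE TABLE t (key STRING, value STRING)
        SKEWED BY (key) ON (1, 5, 6) [STORED AS DIRECTORIES];

    Without STORED AS DIRECTORIES the skew list is treated as metadata only,
    so every row stays in the table's own directory ("" here).
    """
    dirs = defaultdict(list)
    for row in rows:
        if not stored_as_directories:
            # Plain skewed table: no sub-directories are created.
            dirs[""].append(row)
        elif row[skewed_col] in skewed_values:
            # List bucketing: each listed skewed value gets its own sub-directory.
            dirs[f"{skewed_col}={row[skewed_col]}"].append(row)
        else:
            # All remaining values share one default sub-directory.
            dirs[DEFAULT_DIR].append(row)
    return dict(dirs)


rows = [
    {"key": "1", "value": "a"},
    {"key": "2", "value": "b"},
    {"key": "5", "value": "c"},
    {"key": "1", "value": "d"},
]
skewed = {"1", "5", "6"}

print(sorted(list_bucketing_layout(rows, "key", skewed, stored_as_directories=False)))
# prints ['']
print(sorted(list_bucketing_layout(rows, "key", skewed, stored_as_directories=True)))
# prints ['HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME', 'key=1', 'key=5']
```

In this model the only thing `STORED AS DIRECTORIES` changes is the physical layout: the skewed-value list is the same in both calls, but only the list-bucketed variant turns it into sub-directories (and a skewed value with no matching rows, such as `6`, produces no directory here).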