Does anyone have time to answer this? It would be good to clarify things in the wiki.
HIVE-3649 <https://issues.apache.org/jira/browse/HIVE-3649> added the list
bucketing feature in release 0.10.0. The description says:

> We need to differ normal skewed table from list bucketing table. we use an
> optional parameter "store as DIRECTORIES"

So I think your understanding is correct, but let's hear from the experts.

-- Lefty

On Fri, Jun 27, 2014 at 1:25 PM, Steven Willis <swil...@compete.com> wrote:

> I'm having trouble understanding the difference between a skewed table and
> a list bucketed table:
>
> https://cwiki.apache.org/confluence/display/Hive/ListBucketing
>
> Is the only difference that ListBucketing stores the data as directories
> and a "plain" skewed table stores them as files? I think that's what the
> wiki page is saying, but it's very confusing. For one, the title of the
> page is ListBucketing and in many places it seems to use the phrase "List
> Bucketing" as the general feature of partitioning a table by skewed columns
> (whether in directories or files).
>
> There's a section "Skewed Table vs. List Bucketing Table"
> (https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing)
> that I would assume would spell out the differences between the two, but
> it says:
>
> - Skewed Table is a table which has skewed information.
> - List Bucketing Table is a skewed table. In addition, it tells Hive to
>   use the list bucketing feature on the skewed table: create
>   sub-directories for skewed values.
>
> That makes it seem like "the list bucketing feature" is just using
> sub-directories for the data. If that's the case, why is the whole article
> titled ListBucketing, and why is the section describing the basic idea
> (that apparently both skewed tables and list bucketed tables have in
> common) titled just "List Bucketing"
> (https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-ListBucketing)?
>
> The article also says, "Mainly due to its sub-directory nature, list
> bucketing can't coexist with some features." So does that mean just list
> bucketing (the subdirectory feature that skewed tables can have as an
> option) is incompatible with the features mentioned, or does it mean that
> any skewed table is incompatible with said features?
>
> -Steve
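
For anyone following the thread, the distinction being asked about can be sketched in HiveQL. This is only a minimal illustration, assuming the SKEWED BY ... [STORED AS DIRECTORIES] syntax as given in the Hive DDL manual (the clause is spelled STORED AS DIRECTORIES there, not "store as DIRECTORIES" as in the JIRA description). The table and column names are hypothetical.

```sql
-- Hypothetical tables, for illustration only.

-- Skewed table: the metadata records which values of `key` are skewed.
-- No STORED AS DIRECTORIES clause is given.
CREATE TABLE skewed_example (key STRING, value STRING)
  SKEWED BY (key) ON ('1', '5', '6');

-- List bucketing table: the same skew declaration plus the optional
-- STORED AS DIRECTORIES clause, which asks Hive to create
-- sub-directories for the skewed values.
CREATE TABLE list_bucketing_example (key STRING, value STRING)
  SKEWED BY (key) ON ('1', '5', '6')
  STORED AS DIRECTORIES;
```

If this reading is right, the two statements differ only in the final clause. Whether the incompatibilities listed on the wiki apply only to the second form is still the open question here, and this sketch does not settle it.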