Lefty, I have updated the Hive wiki in a few places to say we should use "STORED AS DIRECTORIES" for the list bucketing feature. There are two different optimizations that use the "SKEWED BY" keyword: one is skewed join optimization and the other is list bucketing optimization. I think we need to mention this somewhere so that users are aware of the difference between the two. "STORED AS DIRECTORIES" is used by only one of them, i.e., list bucketing.
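To make the distinction concrete, here is a minimal HiveQL sketch, not a tested recipe. The table names (skew_meta_t, list_bucket_t, src_t) are hypothetical; the two settings and the column names are the ones used further down in this thread.

```sql
-- Hypothetical table names; columns mirror the example in this thread.

-- SKEWED BY alone: the skewed column and values are recorded in the table
-- metadata (which skewed join optimization can use), but the data is still
-- written as ordinary files with no per-value directories.
CREATE TABLE skew_meta_t (r1 STRING, r2 STRING)
  SKEWED BY (r2) ON ('a');

-- SKEWED BY ... STORED AS DIRECTORIES: list bucketing.
-- These two settings are the ones quoted in the reply below.
SET hive.mapred.supports.subdirectories=true;
SET mapred.input.dir.recursive=true;

CREATE TABLE list_bucket_t (r1 STRING, r2 STRING)
  SKEWED BY (r2) ON ('a')
  STORED AS DIRECTORIES;

-- CTAS may not be supported for list bucketing, so the table is created
-- first and populated with a separate INSERT (src_t is an assumed source).
INSERT INTO TABLE list_bucket_t SELECT r1, r2 FROM src_t;
```

If list bucketing takes effect, rows with r2 = 'a' should land under an r2=a subdirectory of the table location and everything else under HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME, matching the layout described in the reply below.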
Following are the design docs for both:
https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization
https://cwiki.apache.org/confluence/display/Hive/ListBucketing

Thanks
Prasanth Jayachandran

On Apr 27, 2014, at 11:28 PM, Lefty Leverenz <leftylever...@gmail.com> wrote:

> Prasanth, Hive's user docs are wiki-only at this point so there's no version
> control. We just add notes about which release introduced or changed
> something. For an example see the beginning of the Skewed Tables section.
> Sometimes the version information isn't called out like that, though, it's
> just part of the text. And in the CREATE TABLE syntax it's a comment
> alongside a clause such as TBLPROPERTIES.
>
> The procedure for getting wiki access is described in About This Wiki:
> How to get permission to edit
>   1. Create a Confluence account
>   2. Sign up for the user mailing list by sending a message to
>      user-subscr...@hive.apache.org
>   3. Send a message to user@hive.apache.org requesting write access
>
> Ashutosh has been granting wiki edit privileges lately (Carl Steinbach used
> to do it). I don't know how it's done or I'd gladly give you access.
>
> I hope you'll be able to take care of this doc because you understand skewed
> tables and I only know what I've read in the wiki, so I think you'll do a
> better job. But of course I'll review it and tinker with it a bit.
>
>
> -- Lefty
>
>
> On Mon, Apr 28, 2014 at 1:40 AM, Prasanth Jayachandran
> <pjayachand...@hortonworks.com> wrote:
> @Mayur.. I don’t think the initial design considered CTAS for skewed tables.
> So it might not be supported at all.
>
> @Lefty.. I am not sure where/how the docs are maintained. Is it version
> controlled? Or is it only maintained in the Confluence wiki? If it is the
> latter, can you please provide me access to edit the wiki? Or alternatively,
> if you can update the docs by adding “stored as directories” to the examples,
> that would be great. Also update the docs with “CTAS not supported for list
> bucketing”.
>
> Thanks
> Prasanth Jayachandran
>
> On Apr 26, 2014, at 8:03 AM, Mayur Gupta <mayur.gupt...@gmail.com> wrote:
>
>> Hey Prasanth,
>>
>> The CTAS for skewed table doesn't work, is it a bug?
>>
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
>> directories select r1, r2 from t2;
>>
>>
>> On Thu, Apr 24, 2014 at 3:03 PM, Mayur Gupta <mayur.gupt...@gmail.com> wrote:
>> Thanks a lot Prasanth for the reply. I would have never figured that out as
>> the documentation on the Hive wiki DDL page and design page doesn't list
>> this.
>>
>> One additional point: it seems the skewed table doesn't work when the table
>> is created with CTAS. The statement below doesn't create separate files. Is
>> it a bug or is it by intent?
>>
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
>> directories select r1, r2 from t2;
>>
>>
>> On Thu, Apr 24, 2014 at 6:12 AM, Prasanth Jayachandran
>> <pjayachand...@hortonworks.com> wrote:
>> Hi Mayur,
>>
>> The reason you see a single file is that you have not enabled storing
>> skewed columns/values as directories.
>> You can do the following to enable storing the skewed columns and values as
>> directories:
>>
>> set hive.mapred.supports.subdirectories=true;
>> set mapred.input.dir.recursive=true;
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
>> directories;
>>
>> This will store the skewed values in directories as shown below:
>>
>> /user/hive/warehouse/t1/r2=a/000000_0 (skewed values go here)
>> /user/hive/warehouse/t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/000000_0 (all
>> other values go here)
>>
>> With respect to your desc extended question where skewedColValueLocationMaps
>> is empty, it's a bug in the implementation. I just verified that it shows
>> empty for unpartitioned tables. But it shows correctly for partitioned
>> tables.
>> I have created a bug for unpartitioned tables here which you can track for
>> progress on this issue https://issues.apache.org/jira/browse/HIVE-6968
>>
>>
>> Thanks
>> Prasanth Jayachandran
>>
>> On Apr 23, 2014, at 6:52 AM, Mayur Gupta <mayur.gupt...@gmail.com> wrote:
>>
>>> Below is my skewedInfo
>>>
>>> skewedInfo:SkewedInfo(skewedColNames:[r2], skewedColValues:[[a]],
>>> skewedColValueLocationMaps:{})
>>>
>>> Any idea why is the skewedColValueLocationMaps empty?
>>>
>>>
>>> On Mon, Apr 21, 2014 at 11:19 AM, Mayur Gupta <mayur.gupt...@gmail.com>
>>> wrote:
>>> Hey There,
>>>
>>> I was trying to use Skewed tables but I am facing the issue that it is not
>>> creating separate files for the skewed data. Even with a simple example I
>>> am having the same issue. The hive version is 0.11.
>>>
>>> create table t(col1 string, col2 string);
>>> load data local inpath '/home/hadoop/a.txt' into table t;
>>>
>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a');
>>> insert into table t1 select * from t;
>>>
>>> The contents of a.txt are :
>>> 1 ^Aa
>>> 2^A b
>>> 3 ^Ac
>>> 4 ^Aa
>>> 5 ^Ab
>>> 6 ^Aa
>>>
>>> I see only single file.
>>>
>>> /user/hive/warehouse/t1/000000_0
>>>
>>> Any pointers on what I am doing wrong?
>>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader of
>> this message is not the intended recipient, you are hereby notified that any
>> printing, copying, dissemination, distribution, disclosure or forwarding of
>> this communication is strictly prohibited. If you have received this
>> communication in error, please contact the sender immediately and delete it
>> from your system. Thank You.