I can point to possible locations but I'm not sure where this belongs. For starters, STORED AS DIRECTORIES needs to be added to the storage format section <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe> in the DDL doc, and several config params need to be added to the Configuration Properties <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties> doc. (I'll take care of the config params.)
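
For reference, a minimal sketch of the example the docs could carry, assembled from the statements in Prasanth's reply quoted below. It assumes the tables t and t1 from Mayur's original message and the default warehouse path; the directory layout in the comments is as reported in the thread, not independently verified.

```sql
-- Settings quoted from Prasanth's reply (assumed to be set per session).
set hive.mapred.supports.subdirectories=true;
set mapred.input.dir.recursive=true;

-- Skewed table whose skewed value 'a' gets its own directory.
create table t1 (r1 string, r2 string)
  skewed by (r2) on ('a')
  stored as directories;

-- Populate from the source table t used in Mayur's example.
insert into table t1 select * from t;

-- Layout reported in the thread (paths depend on the warehouse location):
--   /user/hive/warehouse/t1/r2=a/000000_0                                  (skewed values)
--   /user/hive/warehouse/t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/000000_0  (all other values)
```

Without the STORED AS DIRECTORIES clause, the thread reports a single file at /user/hive/warehouse/t1/000000_0.
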

As Mayur pointed out, we have the DDL doc and a design doc. There's another design doc too, so take your pick among these locations:

- DDL doc
  - Create Table -- Row Format, Storage Format, and SerDe <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe>
  - Create Table -- Skewed Tables <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-SkewedTables> -- *might be the best place*
  - CTAS <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableAsSelect(CTAS)> -- if skewed tables don't work with CTAS, say so here and in Skewed Tables
  - Alter Table Storage Properties <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTableStorageProperties> -- add STORED AS DIRECTORIES here or in a separate section for skewed tables
  - Alter Table or Partition <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterEitherTableorPartition> -- *can this be done at the partition level?*
- List Bucketing (design doc)
  - Hive Enhancements: Create Table <https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-CreateTable> and Alter Table <https://cwiki.apache.org/confluence/display/Hive/ListBucketing#ListBucketing-AlterTable>
  - *could have new configuration section in or after Hive Enhancements*
- Skewed Join Optimization <https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization> (design doc)
  - *doesn't seem to belong here*
- *Configuration Properties <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties>*
  - *definitely doesn't belong here, but we need the parameters*

Wherever you put it, I'll add links from some other locations.

By the way, is STORED AS DIRECTORIES used for anything other than skewed tables?

Thanks.

-- Lefty

On Fri, Apr 25, 2014 at 6:23 PM, Prasanth Jayachandran <pjayachand...@hortonworks.com> wrote:

> Lefty,
>
> I can add this information. Can you please point me to the location to add
> this? Perhaps, you can help reviewing it.
>
> Thanks
> Prasanth Jayachandran
>
> On Apr 24, 2014, at 1:13 PM, Lefty Leverenz <leftylever...@gmail.com>
> wrote:
>
> I'm looking at the docs and thinking of ways to include this information.
> But Prasanth, if you want to do it yourself that would be great.
>
> -- Lefty
>
>
> On Thu, Apr 24, 2014 at 5:33 AM, Mayur Gupta <mayur.gupt...@gmail.com> wrote:
>
>> Thanks a lot Prasanth for the reply. I would have never figured that out
>> as the documentation at Hive Wiki DDL
>> page <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-SkewedTables> and
>> design
>> page <https://cwiki.apache.org/confluence/display/Hive/ListBucketing> doesn't
>> list this.
>>
>> One additional point it seems the Skewed table doesn't work when the
>> table is created as CTAS. The below statement doesn't create separate
>> files. Is it a bug or is it by intent?
>>
>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
>> directories select r1, r2 from t2;
>>
>>
>> On Thu, Apr 24, 2014 at 6:12 AM, Prasanth Jayachandran <
>> pjayachand...@hortonworks.com> wrote:
>>
>>> Hi Mayur,
>>>
>>> The reason why you see single file is, you have not enabled storing
>>> skewed columns/values as directories.
>>> You can do the following to enable storing the skewed columns and values
>>> as directories
>>>
>>> set hive.mapred.supports.subdirectories=true;
>>> set mapred.input.dir.recursive=true;
>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a') stored as
>>> directories;
>>>
>>> This will enable you to store the skewed columns as directories below
>>>
>>> /user/hive/warehouse/t1/r2=a/000000_0 (skewed values go here)
>>> /user/hive/warehouse/t1/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME/000000_0
>>> (all other values go here)
>>>
>>> With respect to your desc extended question where
>>> skewedColValueLocationMaps is empty, it's a bug in implementation. I just
>>> verified that it shows empty for unpartitioned tables. But it shows
>>> correctly for partitioned tables.
>>> I have created a bug for unpartitioned tables here which you can track
>>> for progress on this issue
>>> https://issues.apache.org/jira/browse/HIVE-6968
>>>
>>>
>>> Thanks
>>> Prasanth Jayachandran
>>>
>>> On Apr 23, 2014, at 6:52 AM, Mayur Gupta <mayur.gupt...@gmail.com>
>>> wrote:
>>>
>>> Below is my skewedInfo
>>>
>>> skewedInfo:SkewedInfo(skewedColNames:[r2], skewedColValues:[[a]],
>>> skewedColValueLocationMaps:{})
>>>
>>> Any idea why is the skewedColValueLocationMaps empty?
>>>
>>>
>>> On Mon, Apr 21, 2014 at 11:19 AM, Mayur Gupta
>>> <mayur.gupt...@gmail.com> wrote:
>>>
>>>> Hey There,
>>>>
>>>> I was trying to use Skewed tables but I am facing the issue that it is
>>>> not creating separate files for the skewed data. Even with a simple example
>>>> I am having the same issue. The hive version is 0.11.
>>>>
>>>> create table t(col1 string, col2 string);
>>>> load data local inpath '/home/hadoop/a.txt' into table t;
>>>>
>>>> create table t1(r1 string, r2 string) skewed by (r2) on ('a');
>>>> insert into table t1 select * from t;
>>>>
>>>> The contents of a.txt are :
>>>> 1 ^Aa
>>>> 2^A b
>>>> 3 ^Ac
>>>> 4 ^Aa
>>>> 5 ^Ab
>>>> 6 ^Aa
>>>>
>>>> I see only single file.
>>>>
>>>> /user/hive/warehouse/t1/000000_0
>>>>
>>>> Any pointers on what I am doing wrong?
>>>>
>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>
>