Thanks Nitin. I have only one partition in this table for testing. I thought within the partition it will scan only certain files based on skewed fields. However it is scanning the entire data within the partition. On Nov 14, 2013 9:38 AM, "Nitin Pawar" <nitinpawar...@gmail.com> wrote:
> In my understanding, > when you are saying scanning entire dataset it is looking at all your > partitions because your data has been partitioned by the date column. > > A skewed table is a table where there will be different files created for > all your skewed keys in all the partitions. > So for your query it will look at all partitions. > > The setting you have kept is only applicable to join queries as it clearly > says skewjoin. Non join queries it does not have an affect. > > > Thanks, > Nitin > > > > > On Thu, Nov 14, 2013 at 6:35 AM, Rajesh Balamohan < > rajesh.balamo...@gmail.com> wrote: > >> Hi All, >> >> I have the following skewed table "addresses_1" >> >> select id, count(*) c from addresses_1 group by id order by c desc limit >> 10; >> 142624653 1554806 >> 198477395 958492 >> 102641838 220181 >> 138947865 211331 >> 156483436 193429 >> 96411677 179771 >> 210082076 168033 >> 800174765 152421 >> 139116901 141207 >> 704352025 137263 >> >> I was able to create the following table with the skew information. And >> I was able to load the data into the table as well. >> >> CREATE TABLE skew_addresses_1( >> id bigint, >> address_id bigint, >> address_lines string, >> city string, >> state string, >> postal_code string, >> country string, >> latitude string, >> longitude string, >> ) PARTITIONED BY (dateTS string) SKEWED BY (id) ON (142624653, >> 198477395, 102641838, 138947865, 156483436, 96411677, 210082076, 800174765, >> 139116901, 704352025) >> stored as rcfile; >> >> select id,count(*) c from skew_addresses_1 where id=142624653 group by id >> order by c limit 10; >> >> *However, at the time of running select query, entire dataset is >> scanned. * I thought only the relevant dataset (with skew information >> will be scanned). Am I missing anything here? Any help will be >> appreciated. I am using Hive 10.x >> >> I have enabled hive.optimize.skewjoin.compiletime=true and I can see the >> skew information populated in SKEWED_COL_NAMES in metadata. But there is >> no information in SKEWED_COL_VALUE_LOC_MAP table. >> >> >> -- >> ~Rajesh.B >> > > > > -- > Nitin Pawar >