Would you please elaborate on it? Thanks
Tim

On 6/14/12 10:30 AM, "Edward Capriolo" <edlinuxg...@gmail.com> wrote:

>We have had a ticket open for quite some time for combine input format
>to work across partitions. Not sure if that can help with what you are
>seeing as well. It could help us a lot.
>
>Edward
>
>On Thu, Jun 14, 2012 at 1:25 PM, Gang Liu <g...@fb.com> wrote:
>> Hey Edward,
>>
>> Thank you very much for providing comments.
>>
>> This feature is designed for use cases described in wiki. We do see them
>> in the real life so that we come up with the feature.
>>
>> In this first release, in order to use the feature:
>> 1. Hive table users need to know the skewed key in advance
>> 2. Hive table users need to know the skewed key is the same each
>> partition.
>> 3. If Hive table users know skewed key change, they can "alter" skewed
>> key via "alter" statement.
>>
>> 4. If #3 happens, old partitions have old skewed key and new partition
>> have new. It's expected.
>>
>> We may consider the following in the future release:
>> 1. Hive instruments skewed key and displays them to user
>>
>> Thanks
>>
>> Tim
>>
>>
>> On 6/14/12 9:34 AM, "Edward Capriolo" <edlinuxg...@gmail.com> wrote:
>>
>>>I am of the opinion this feature is too specialized to be generally
>>>helpful.
>>>
>>>-------------------------------
>>>The cardinality of 'x' is in 1000's per partition of T. Moreover,
>>>there is a skew for the values of 'x'. In general, there are ~10
>>>values of 'x' which have a very large skew, and the remaining
>>>values of 'x' have a small cardinality. Also, note that this mapping
>>>(values of 'x' with a high cardinality can change daily).
>>>--------------------------
>>>
>>>In these cases you should use clustering/bucketing. This will prevent
>>>the skew you are talking about. If you want more efficiency in certain
>>>query types build an index on top of the original table.
>>>
>>>I understand someone wanting to do this because mysql partition can do
>>>this, but this sounds like a management problem.
>>>Who is to say the
>>>skew is the same each partition?
>>>
>>>-----------------------------------------
>>>hive compiler to do input pruning. The list of skewed keys is stored
>>>at the table level (note that, this list can be initially supplied by
>>>the client periodically, and can be eventually updated when a new
>>>partition is being loaded).
>>>-----------------------------------------
>>>
>>>Imagine you have a table partitioned by hour and two datacenters China
>>>and NY. At some hours the skew will be different. Skews change over
>>>time. Since this property is table level I do not understand how this
>>>would be changed.
>>>
>>>
>>>On Thu, Jun 14, 2012 at 4:14 AM, Carl Steinbach <c...@cloudera.com>
>>>wrote:
>>>> Hi Tim,
>>>>
>>>> I added some comments to the wiki a couple days ago. I just wanted to
>>>> make sure you saw them since it doesn't look like you're registered as
>>>> a watcher for that page.
>>>>
>>>> Thanks.
>>>>
>>>> Carl
>>>>
>>>> On Mon, Jun 11, 2012 at 12:22 PM, Gang Liu <g...@fb.com> wrote:
>>>>
>>>>> Hi Carl, thanks Tim
>>>>>
>>>>> On 6/11/12 12:14 PM, "Carl Steinbach" <c...@cloudera.com> wrote:
>>>>>
>>>>> >+ hcatalog-dev
>>>>> >
>>>>> >On Mon, Jun 11, 2012 at 12:09 PM, Carl Steinbach <c...@cloudera.com>
>>>>> >wrote:
>>>>> >
>>>>> >> This link may work better for some people:
>>>>> >>
>>>>> >> https://cwiki.apache.org/confluence/display/Hive/ListBucketing
>>>>> >>
>>>>> >> Thanks.
>>>>> >>
>>>>> >> Carl
>>>>> >>
>>>>> >>
>>>>> >> On Mon, Jun 11, 2012 at 12:03 PM, Gang Liu <g...@fb.com> wrote:
>>>>> >>
>>>>> >>> Dear all hive developers,
>>>>> >>>
>>>>> >>> We are making good progress of implementing the list bucketing
>>>>> >>> feature. It should be available soon in weeks.
>>>>> >>>
>>>>> >>> We'd like to call feature review again and please provide your
>>>>> >>> comments.
>>>>> >>>
>>>>> >>> Thanks
>>>>> >>>
>>>>> >>> Tim
>>>>> >>>
>>>>> >>> On 6/1/12 10:13 AM, "Gang Liu" <g...@fb.com> wrote:
>>>>> >>>
>>>>> >>> >Dear all,
>>>>> >>> >
>>>>> >>> >Please review the proposal and provide your comments:
>>>>> >>> >
>>>>> >>> >https://cwiki.apache.org/Hive/listbucketing.html
>>>>> >>> >
>>>>> >>> >Thanks
>>>>> >>> >
>>>>> >>> >Tim
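
For readers following the thread, the idea under discussion can be illustrated with a minimal Python sketch. It is not Hive's implementation. It only models what the messages describe: each heavily skewed value of 'x' gets its own directory, all other values share one, and a lookup on 'x' reads a single directory. The names `directory_for`, `load_partition`, `prune`, and `DEFAULT_DIR` are hypothetical. The sketch also assumes each partition remembers the skewed-key list it was loaded with, which is one reading of point 4 in Gang Liu's reply (old partitions keep the old keys, new partitions get the new ones).

```python
from collections import defaultdict

DEFAULT_DIR = "default"  # hypothetical name for the catch-all directory


def directory_for(value, skewed_keys):
    """Skewed values get a directory of their own; the rest share one."""
    return f"x={value}" if value in skewed_keys else DEFAULT_DIR


def load_partition(rows, skewed_keys):
    """Group rows by directory and record the key list used at load time."""
    dirs = defaultdict(list)
    for row in rows:
        dirs[directory_for(row["x"], skewed_keys)].append(row)
    return {"skewed_keys": frozenset(skewed_keys), "dirs": dict(dirs)}


def prune(partition, wanted):
    """Directories a query `WHERE x = wanted` must scan in this partition."""
    target = directory_for(wanted, partition["skewed_keys"])
    return [target] if target in partition["dirs"] else []


# The skewed key changes between two partitions (6 first, 20 later).
old = load_partition([{"x": 6}, {"x": 6}, {"x": 7}], skewed_keys={6})
new = load_partition([{"x": 6}, {"x": 20}, {"x": 20}], skewed_keys={20})

print(prune(old, 6))   # directories scanned in the old partition
print(prune(new, 6))   # 6 is no longer skewed, so it falls into the default
print(prune(new, 20))
```

In this toy model the concern raised about skew changing over time shows up directly: a value that was skewed in an older partition is looked up in the shared directory of a newer one, so pruning has to consult the key list that was in effect when each partition was loaded.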