Mostly wanted to tell hive it's sorted so it could use more efficient joins like a map side join. No other reason On Aug 3, 2015 10:47 AM, "Ryan Harris" <ryan.har...@zionsbancorp.com> wrote:
> Unless you are using bucketing and sampling, there is no benefit (that I > can think of) to informing hive that the data **is** in fact sorted... > > > > If there is something specific you are trying to accomplish by specifying > the sort order of that column, perhaps you can elaborate on that. > Otherwise, leave out the 'sorted by' statement and you should be fine. > > > > *From:* David Capwell [mailto:dcapw...@gmail.com] > *Sent:* Monday, August 03, 2015 10:50 AM > *To:* user@hive.apache.org > *Subject:* Re: External sorted tables > > > > Based off the ddl it is required to have buckets, I was wondering if there > was a way to get around it? > > Thinking as a hack I could try bucket=1, but if there is a better way > would love to know > > On Aug 2, 2015 6:18 PM, "Takahiko Saito" <tysa...@gmail.com> wrote: > > Hi, > > > > Is it possible that 'create table sorted by' must have buckets? > > > > I found the below statements in > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL: > > > > "The CLUSTERED BY and SORTED BY creation commands do not affect how data > is inserted into a table – only how it is read. This means that users must > be careful to insert data correctly by specifying the number of reducers to > be equal to the number of buckets, and using CLUSTER BY and SORT BY > commands in their query." > > > > On Thu, Jul 30, 2015 at 7:22 PM, David Capwell <dcapw...@gmail.com> wrote: > > We are trying to create a external table in hive. This data is sorted, > so wanted to tell hive about this. When I do, it complains about > parsing the create. > > > CREATE EXTERNAL TABLE IF NOT EXISTS store.testing ( > ... > . . . . . . . . . . . . . . . . . . .> timestamp bigint, > ...) > . . . . . . . . . . . . . . . . . . .> SORTED BY (timestamp) > ... > . . . . . . . . . . . . . . . . . . .> LOCATION '/project/db/table' > . . . . . . . . . . . . . . . . . . .> ; > Error: Error while compiling statement: FAILED: ParseException line > 1:507 missing EOF at 'SORTED' near ')' (state=42000,code=40000) > 2: jdbc:hive2://localhost:10000/store> > > What can I do to let hive know that my data is sorted? Every example > online of sorted by is grouped with buckets, but we really don't want > to add bucketing. > > > Hive version: 0.14.0 > > Thanks for your help! > > > > > > -- > > Takahiko Saito > ------------------------------ > THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS > CONFIDENTIAL and may contain information that is privileged and exempt from > disclosure under applicable law. If you are neither the intended recipient > nor responsible for delivering the message to the intended recipient, > please note that any dissemination, distribution, copying or the taking of > any action in reliance upon the message is strictly prohibited. If you have > received this communication in error, please notify the sender immediately. > Thank you. >