Problem with that approach is, with LIMIT 10, If I am putting after desc, then it will get only 10 rows irrespective of BUYER_ID. But I need specifically for each BUYER_ID 10 latest rows.
*Raihan Jamal* On Tue, Jul 10, 2012 at 12:03 AM, Abhishek Tiwari < abhishektiwari.bt...@gmail.com> wrote: > Raihan, > > Andes suggests you use 'limit' after 'desc' ie Hive with sort your query > results in descending order and then chop off any more than first 10 > records. > However, Hive will still run the full scan on data since it has to find > the sorted list of records for you. > > > On Tue, Jul 10, 2012 at 8:36 AM, Raihan Jamal <jamalrai...@gmail.com>wrote: > >> This is my below requirement. I need- *Find `TOP 10` data for each >> `BUYER_ID, *So I cannot use LIMIT 10 here in this case. >> >> This is the below table >> >> CREATE TABLE IF NOT EXISTS TestingTable1 >> ( >> BUYER_ID BIGINT, >> ITEM_ID BIGINT, >> CREATED_TIME STRING >> ) >> >> And this is the below data in the above table- >> >> BUYER_ID | ITEM_ID | CREATED_TIME >> ------------+------------------+----------------------- >> 1015826235 220003038067 2012-07-09 19:40:21, >> 1015826235 300003861266 2012-07-09 18:19:59, >> 1015826235 140002997245 2012-07-09 09:23:17, >> 1015826235 210002448035 2012-07-09 22:21:11, >> 1015826235 260003553381 2012-07-09 07:09:56, >> 1015826235 260003553382 2012-07-09 19:40:39, >> 1015826235 260003553383 2012-07-09 06:58:47, >> 1015826235 260003553384 2012-07-09 07:28:47, >> 1015826235 260003553385 2012-07-09 08:48:47, >> 1015826235 260003553386 2012-07-09 06:38:47, >> 1015826235 260003553387 2012-07-09 05:38:47, >> 1015826235 260003553388 2012-07-09 04:55:47, >> 1015826235 260003553389 2012-07-09 06:54:37, >> 34512201 597245693 2012-07-09 16:20:21, >> 34512201 8071787728 2012-07-09 15:19:59, >> 34512201 5868222883 2012-07-09 08:23:17, >> 34512201 2412180494 2012-07-09 22:21:11, >> 34512201 2422054205 2012-07-09 06:09:56, >> 34512201 1875744030 2012-07-09 19:40:39, >> 34512201 5639158173 2012-07-09 06:58:47, >> 34512201 5656232360 2012-07-09 07:28:47, >> 34512201 959188449 2012-07-09 08:48:47, >> 34512201 4645350592 2012-07-09 06:38:47, >> 34512201 5657320532 2012-07-09 05:38:47, >> 34512201 290419656539 2012-07-09 04:55:47, >> >> If you see the above data in the table, there are only two UNIQUE >> `BUYER_ID` and corresponding to those I have `ITEM_ID` AND `CREATED_TIME`. >> I need only 10 latest record basis on the time for that given day for each >> `BUYER_ID`. >> >> So for this `BUYER_ID` - `34512201` I need 10 latest record basis on >> `CREATED_TIME` for that given day only, it means for today's date I need 10 >> latest record for each `BUYER_ID`. >> >> And each `BUYER_ID` can have any day's data. But I am specifically >> interested for day before today's data(means yesterday's date always) by >> checking at the `CREATED_TIME` >> >> **Find `TOP 10` data for each `BUYER_ID`. Below is the sample output.** >> >> Sample Output. >> >> BUYER_ID | ITEM_ID | CREATED_TIME >> ------------+------------------+----------------------- >> 34512201 2412180494 2012-07-09 22:21:11 >> 34512201 1875744030 2012-07-09 19:40:39 >> 34512201 597245693 2012-07-09 16:20:21 >> 34512201 8071787728 2012-07-09 15:19:59 >> 34512201 959188449 2012-07-09 08:48:47 >> 34512201 5868222883 2012-07-09 08:23:17 >> 34512201 5656232360 2012-07-09 07:28:47 >> 34512201 5639158173 2012-07-09 06:58:47 >> 34512201 4645350592 2012-07-09 06:38:47 >> 34512201 2422054205 2012-07-09 06:09:56 >> 1015826235 210002448035 2012-07-09 22:21:11 >> 1015826235 260003553382 2012-07-09 19:40:39 >> 1015826235 220003038067 2012-07-09 19:40:21 >> 1015826235 300003861266 2012-07-09 18:19:59 >> 1015826235 140002997245 2012-07-09 09:23:17 >> 1015826235 260003553385 2012-07-09 08:48:47 >> 1015826235 260003553384 2012-07-09 07:28:47 >> 1015826235 260003553381 2012-07-09 07:09:56 >> 1015826235 260003553383 2012-07-09 06:58:47 >> 1015826235 260003553389 2012-07-09 06:54:37 >> >> >> *Raihan Jamal* >> >> >> >> On Mon, Jul 9, 2012 at 7:56 PM, Andes <ylyy-1...@163.com> wrote: >> >>> ** >>> hello, you can use "desc" and "limit 10" to filter the top 10. >>> >>> 2012-07-10 >>> ------------------------------ >>> ** >>> Best Regards >>> Andes >>> >>> ** >>> ------------------------------ >>> *发件人:*Raihan Jamal >>> *发送时间:*2012-07-10 10:31 >>> *主题:*Find TOP 10 using HiveQL >>> *收件人:*"user"<user@hive.apache.org> >>> *抄送:* >>> >>> When I run this query, >>> >>> SELECT TOP 10 FROM TestingTable1 WHERE ORDER BY buyer_id, created_time >>> DESC; >>> >>> >>> I always get error as- >>> >>> *FAILED: Parse Error: line 1:7 cannot recognize input 'TOP' in select >>> expression* >>> >>> Is there any way around to use TOP 10 or something similar that will >>> work in HiveQL? >>> >>> >>> >>> *Raihan Jamal* >>> >>> >> > > > > >