When you say
LATERAL VIEW explode(purchased_item) exploded_table as prod_and_ts
The last alias is an alias for a column, not a table. explode() produces
only one column per row; it cannot produce multiple columns. I hope
that helps.
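For example, the struct fields can be pulled out of that single column after the fact (a minimal sketch; the table name TestingTable2 is assumed, and the struct fields are taken from elsewhere in the thread):

```sql
-- explode() turns the array into one row per element; prod_and_ts is
-- a single struct-typed column whose fields are then dotted into.
SELECT user_id,
       prod_and_ts.product_id AS product_id,
       prod_and_ts.timestamps AS timestamps
FROM TestingTable2
LATERAL VIEW explode(purchased_item) exploded_table AS prod_and_ts;
```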
On Tue, Jul 10, 2012 at 8:31 PM, Raihan Jamal wrote:
> I have t
Yup this works. Thanks for the help.
*Raihan Jamal*
On Tue, Jul 10, 2012 at 4:37 PM, Vijay wrote:
> In that case, wouldn't this work:
>
> SELECT buyer_id, item_id, rank(buyer_id), created_time
> FROM (
> SELECT buyer_id, item_id, created_time
> FROM testingtable1
> DISTRIBUTE BY
I have the below query; whenever I try to execute it, I always get an
error that I am not able to understand.
SELECT
t1.buyer_id,
t1.item_id,
t1.created_time,
t2.product_id,
t2.timestamps
FROM
TestingTable1 t1
JOIN
(
SELECT
user_id,
prod_and_ts.product_id as
Hi,
We use Hive to store user click logs and dig out interesting trends, doing
a lot of big-table joins.
We mostly partition by date, and inside each partition the data is bucketed
by some hashed id.
We are wondering how many buckets to use.
If I understand correctly, Hive DB stores each bucket in on
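A minimal sketch of the layout being described, with invented table and column names (note the bucket count is fixed in the DDL at CREATE time):

```sql
-- Partitioned by date; rows within each partition hashed into 32 buckets.
CREATE TABLE click_logs (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (dt STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;

-- In Hive 0.x, set this before inserting so data actually lands bucketed:
SET hive.enforce.bucketing = true;
```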
Hello. I have a problem with the filesystem closing.
The filesystem was closed while a Hive query was running.
It is a 'select' query and the data size is about 1TB.
I'm using hadoop-0.20.2 and hive-0.7.1.
The error log says that a tmp file was not deleted, or that a tmp path
exception occurred.
In that case, wouldn't this work:
SELECT buyer_id, item_id, rank(buyer_id), created_time
FROM (
SELECT buyer_id, item_id, created_time
FROM testingtable1
DISTRIBUTE BY buyer_id
SORT BY buyer_id, created_time desc
) a
WHERE rank(buyer_id) < 10;
On Tue, Jul 10, 2012 at 4:21 PM, Rai
This is a little tricky but this is how it works:
SELECT buyer_id, item_id, rank(item_id), created_time
FROM (
SELECT buyer_id, item_id, created_time
FROM testingtable1
DISTRIBUTE BY buyer_id, item_id
SORT BY buyer_id, item_id, created_time desc
) a
WHERE rank(item_id) < 10;
On T
Thanks Vijay, yes it worked. Can you also take a look at another of my
posts, with the subject title *TOP 10*?
*Raihan Jamal*
On Tue, Jul 10, 2012 at 1:41 PM, Vijay wrote:
> to_date(from_unixtime(cast(timestamps as int)))
>
> On Tue, Jul 10, 2012 at 1:33 PM, Raihan Jamal
> wrote:
> > I need only
Thanks Kulkarni - I think I have the right data types, but here are the DDL
definitions and the query.
Query:
select registration
from studenttab10k s right outer join votertab10k v
on (s.name = v.name);
Table definitions:
create external table IF NOT EXISTS
to_date(from_unixtime(cast(timestamps as int)))
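In context, that conversion might be used like this (the column name is from the thread; the table name is assumed):

```sql
-- timestamps is a STRING holding a unix epoch in seconds, so cast it
-- to int for from_unixtime(), then truncate to a date with to_date().
SELECT timestamps,
       to_date(from_unixtime(cast(timestamps AS int))) AS created_date
FROM TestingTable2;
```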
On Tue, Jul 10, 2012 at 1:33 PM, Raihan Jamal wrote:
> I need only the date, not the hours and seconds; that is the reason I was
> using to_date(), and from_unixtime() takes an int as a parameter while
> timestamps is a string in this case.
>
> Raihan J
I need only the date, not the hours and seconds; that is the reason I was
using to_date(), and from_unixtime() takes an int as a parameter while
timestamps is a string in this case.
*Raihan Jamal*
On Tue, Jul 10, 2012 at 1:28 PM, Vijay wrote:
> You need to use from_unixtime()
>
> On Tue, Jul 10, 201
You need to use from_unixtime()
On Tue, Jul 10, 2012 at 1:01 PM, Raihan Jamal wrote:
> Here timestamps is a string, and I always get NULL in the second column when
> I try to get the date out of the timestamp. Is there anything wrong in what I am doing?
>
>
> select A.timestamps, to_date(A.timestamps) from
> (select u
Hi Kanna,
This might just mean that your query uses a STRING type for a field
which is actually a DOUBLE.
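If that is the case, one way out is to fix the type in the DDL, or to cast explicitly in the query (a hedged sketch; the table and field names are invented):

```sql
-- If the underlying data is DOUBLE, declare it as DOUBLE in the DDL,
-- or cast explicitly wherever a STRING is needed:
SELECT cast(amount AS string) AS amount_str FROM some_table;
```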
On Tue, Jul 10, 2012 at 3:05 PM, Kanna Karanam wrote:
> Has anyone seen this error before? Am I missing anything here?
>
>
> 2012-07-10 11:11:02,203 INFO org.apache
Has anyone seen this error before? Am I missing anything here?
2012-07-10 11:11:02,203 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from attempt_201207091248_0107_m_00_0: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing
Here timestamps is a string, and I always get NULL in the second column when
I try to get the date out of the timestamp. Is there anything wrong in what I am doing?
select A.timestamps, to_date(A.timestamps) from
(select user_id, prod_and_ts.product_id as product_id,
prod_and_ts.timestamps as timestamps from tes
It is still not working with my rank UDF. Below is the query I am
using.
Can anyone help me, what changes I need to make in my below sql query?
CREATE TABLE IF NOT EXISTS TestingTable1
(
BUYER_ID BIGINT,
ITEM_ID BIGINT,
CREATED_TIME STRING
)
*Find TOP 10 latest data (ITEM_ID, CREATE
This appears to be a Hive issue (something probably called FS.close()
too early?). Redirecting to the Hive user lists as they can help
better with this.
On Tue, Jul 10, 2012 at 9:59 PM, 안의건 wrote:
> Hello. I have a problem with the filesystem closing.
>
> The filesystem was closed when the hive q
You could use TRANSFORM with a simple awk script:
TRANSFORM(a, b, c, d)
USING "/usr/bin/awk '
{if($1!=c){c=$1; a=0}; if(a<20){print $0; a++}}'"
This will create a top 20 for each group.
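A fuller sketch of how that fragment might be wired into a query (table and column names borrowed from the other thread; the input must arrive grouped and sorted by the key, hence the DISTRIBUTE BY / SORT BY subquery):

```sql
FROM (
  SELECT buyer_id, item_id, created_time
  FROM testingtable1
  DISTRIBUTE BY buyer_id
  SORT BY buyer_id, created_time DESC
) sorted
SELECT TRANSFORM(buyer_id, item_id, created_time)
  -- awk resets the counter 'a' whenever the first field (the group key)
  -- changes, and prints at most 20 rows per group.
  USING "/usr/bin/awk '{if($1!=c){c=$1; a=0}; if(a<20){print $0; a++}}'"
  AS (buyer_id, item_id, created_time);
```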
--Wouter de Bie
Team Lead Analytics Infrastructure, Spotify
wou...@spotify.com (mailto:wou...@spotify.c
Hi Raihan,
You should use 'rank(buyer_id)' in the ORDER BY clause on line 9 instead
of the alias 'rk'. I had the same problem; strangely, the alias is not
resolved when it is in the ORDER BY clause.
One other thing: I had some issues when I used this exact same setup for
ranking results, that the r
I am trying that solution. Currently I am running my query to see what
result I get back from the UDF.
*Raihan Jamal*
On Tue, Jul 10, 2012 at 12:13 AM, Nitin Pawar wrote:
> i thought you managed to solve this with rank??
>
>
> On Tue, Jul 10, 2012 at 12:38 PM, Raihan Jamal wrote:
>
>> Pr
i thought you managed to solve this with rank??
On Tue, Jul 10, 2012 at 12:38 PM, Raihan Jamal wrote:
> The problem with that approach is that if I put LIMIT 10 after desc, it
> will get only 10 rows overall, irrespective of BUYER_ID. But I need the
> 10 latest rows for each BUYER_ID specifically.
>
The problem with that approach is that if I put LIMIT 10 after desc, it
will get only 10 rows overall, irrespective of BUYER_ID. But I need the
10 latest rows for each BUYER_ID specifically.
*Raihan Jamal*
On Tue, Jul 10, 2012 at 12:03 AM, Abhishek Tiwari <
abhishektiwari.bt...@gmail.com> wrote
Raihan,
Andes suggests you use 'limit' after 'desc', i.e. Hive will sort your query
results in descending order and then chop off anything beyond the first 10
records.
However, Hive will still run a full scan on the data since it has to find
the sorted list of records for you.
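For contrast, the limit-after-sort form under discussion looks like this (a sketch; note it returns 10 rows total, not 10 per buyer):

```sql
SELECT buyer_id, item_id, created_time
FROM testingtable1
ORDER BY created_time DESC
LIMIT 10;
```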
On Tue, Jul 10, 2012 at 8:36 A
change this " SELECT buyer_id, item_id, created_time" to " SELECT buyer_id,
item_id, created_time, rk"
On Tue, Jul 10, 2012 at 12:30 PM, Raihan Jamal wrote:
> I was not able to understand. This is the below query that I am using
> currently-
>
> SELECT buyer_id, item_id, created_time
> FROM (
>
I was not able to understand. This is the below query that I am using
currently-
SELECT buyer_id, item_id, created_time
FROM (
SELECT buyer_id, item_id, rank(buyer_id) as rank, created_time
FROM testingtable1
DISTRIBUTE BY buyer_id, item_id
SORT BY buyer_id, item_id, created_time d