Re: changing field delimiter for an existing table?

2012-05-11 Thread David Kulp
the
> expressions I will be "inserting".
>
>
> On Fri, May 11, 2012 at 5:07 PM, David Kulp wrote:
> Here is the default textfile. Substitute delimiters as necessary.
>
> CREATE TABLE ...
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\001' COLLECT

Re: changing field delimiter for an existing table?

2012-05-11 Thread David Kulp
Here is the default textfile. Substitute delimiters as necessary.

CREATE TABLE ...
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

On May 11, 2012, at 5:58 PM, Igor Tatarinov wrot
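For readers unfamiliar with the control-character delimiters in the statement above, here is a rough Python sketch of how one line of such a text file breaks apart. The three-column layout (a string, an array, a map) and the sample row are made up for illustration; they are not from the thread.

```python
# Delimiters named in the CREATE TABLE statement above.
FIELD_SEP = "\x01"  # FIELDS TERMINATED BY '\001'
ITEM_SEP = "\x02"   # COLLECTION ITEMS TERMINATED BY '\002'
KEY_SEP = "\x03"    # MAP KEYS TERMINATED BY '\003'
LINE_SEP = "\n"     # LINES TERMINATED BY '\n'


def parse_row(line):
    """Split one text line into (string, array, map).

    The column layout is a hypothetical example, not a real table schema.
    """
    name, items, mapping = line.rstrip(LINE_SEP).split(FIELD_SEP)
    array = items.split(ITEM_SEP)
    # Map entries are separated by the collection-item delimiter,
    # and each key is separated from its value by the map-key delimiter.
    pairs = dict(entry.split(KEY_SEP, 1) for entry in mapping.split(ITEM_SEP))
    return name, array, pairs


# Hypothetical sample row: name, array<string>, map<string,string>.
row = "alice\x01a\x02b\x01k1\x03v1\x02k2\x03v2\n"
parsed = parse_row(row)
print(parsed)
```

Substituting other delimiters in the table definition would only change the three separator constants in this sketch.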

Re: Managed vs external tables in hive

2012-05-10 Thread David Kulp
It's simpler than this. All files look the same -- and are often very simple delimited text -- whether managed or external. The only difference is that the files associated with a managed table are dropped when the table is dropped and files that are loaded into a managed table are moved into

Re: using the key from a SequenceFile

2012-04-19 Thread David Kulp
"STORED AS SEQUENCEFILE" and you should be golden. You can presumably use one of the alternative serializers in your MR program, but I haven't tried it, yet.

-d

On Apr 19, 2012, at 8:52 AM, David Kulp wrote:
> But I'm not clear on how to write a single row of multiple va

Re: using the key from a SequenceFile

2012-04-19 Thread David Kulp
ses the value part of it, other than that you won’t
> notice the difference between sequence or plain text file
>
> From: David Kulp [mailto:dk...@fiksu.com]
> Sent: Thursday, April 19, 2012 2:13 PM
> To: user@hive.apache.org
> Subject: Re: using the key from a SequenceFile
>

Re: using the key from a SequenceFile

2012-04-19 Thread David Kulp
I'm trying to achieve something very similar. I want to write an MR program that writes results in a record-based sequencefile that would be directly readable from hive as though it were created using "STORED AS SEQUENCEFILE" with, say, BinarySortableSerDe. From this discussion it seems that H

Re: Lag function in Hive

2012-04-10 Thread David Kulp
FROM mytable t1
> JOIN mytable t2 ON (t1.rownum = t2.rownum + 1 AND t2.partition=bar)
> WHERE t1.partition=foo;
>
> This should be faster as partition selection will happen earlier.
>
> This is still going to involve an awful lot of I/O, and not going to be fast.
>
> Phil.
>

Re: Lag function in Hive

2012-04-10 Thread David Kulp
CRIBE FORMATTED tablename".

On Apr 10, 2012, at 10:51 AM, wrote:
> Thanks - I will check this out.
>
> Meanwhile, would default clustering happen using rownum? How can I check on
> how is clustering happening in our environment?
>
> Rgds
>
> ----- Original

Re: Lag function in Hive

2012-04-10 Thread David Kulp
New here. Hello all. Could you try a self-join, possibly also restricted to partitions? E.g.

SELECT t2.value - t1.value
FROM mytable t1, mytable t2
WHERE t1.rownum = t2.rownum+1
AND t1.partition=foo AND t2.partition=bar

If your data is clustered by rownum, then this join should, in theory, be
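As a small illustration of the self-join idea in this post, the sketch below runs the same join shape against an in-memory SQLite database from Python. It is not Hive: the sample rows are invented, the partition filters are left out, and SQLite merely stands in to show which rows get paired.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (an assumption
# made for illustration; the thread itself is about Hive).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (rownum INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 10), (2, 15), (3, 12), (4, 20)],  # made-up sample rows
)

# Same join condition as in the post: each t1 row is paired with the row
# whose rownum is one less (t2), which emulates a lag of one.
rows = conn.execute(
    """
    SELECT t2.value - t1.value
    FROM mytable t1
    JOIN mytable t2 ON t1.rownum = t2.rownum + 1
    ORDER BY t1.rownum
    """
).fetchall()

diffs = [r[0] for r in rows]
print(diffs)
conn.close()
```

The first row has no predecessor, so the inner join drops it and four input rows yield three differences.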