Re: Identifying and Marking records as duplicates

2012-08-17 Thread Bob Gause
We use com.facebook.hive.udf.UDFNumberRows to do a ranking by time in some of our queries. You could do that, and then do another select where the row number/rank is 1 to get all the "unique" rows. There are probably a bunch of other ways to do this, but this is the one that first came to mind
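The rank-then-filter idea above can be sketched in HiveQL. This is only an illustration under stated assumptions: it uses the built-in ROW_NUMBER() windowing function (available from Hive 0.11 onward) rather than UDFNumberRows, whose exact call signature is not shown in this thread, and the table `events` with columns `record_key`, `event_time` and `payload` is hypothetical.

```sql
-- Hypothetical table: events(record_key STRING, event_time STRING, payload STRING)
-- Assumption: rows sharing a record_key are duplicates; the earliest one is kept.
SELECT record_key, event_time, payload
FROM (
  SELECT
    record_key,
    event_time,
    payload,
    -- Number the rows within each key, oldest first.
    ROW_NUMBER() OVER (PARTITION BY record_key ORDER BY event_time) AS rn
  FROM events
) ranked
WHERE rn = 1;  -- rank 1 is the "unique" row for each key
```

To mark duplicates instead of dropping them, the outer query could keep every row and treat any row with rn greater than 1 as a duplicate.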

Re: Hive append support

2012-08-09 Thread Bob Gause
Hive has no update & delete statements. You can drop a table, and that is as close to a delete as you get. The only "update" you get is to append more data to a table. There is INSERT OVERWRITE & INSERT INTO. The first replaces whatever is in the table with a new set of rows, the second will append more data
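A minimal sketch of the two statements, assuming Hive 0.8 or later (where INSERT INTO is available). The table names `page_views` and `staging_page_views` and the `load_date` column are hypothetical and only there for illustration.

```sql
-- Hypothetical tables: page_views (target) and staging_page_views (source),
-- both with the same columns, including load_date STRING.

-- Load the first batch; this replaces anything already in page_views.
INSERT OVERWRITE TABLE page_views
SELECT * FROM staging_page_views WHERE load_date = '2012-08-08';

-- Append a later batch; existing rows in page_views are left alone.
INSERT INTO TABLE page_views
SELECT * FROM staging_page_views WHERE load_date = '2012-08-09';
```

Running the INSERT OVERWRITE a second time would discard the appended rows, so it is only suitable for the initial load or a full rebuild.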

Re: Find the files which contains a particular String

2012-07-31 Thread Bob Gause
We do a similar process with our log files in Hive. We only handle 30 to 60 files (similar structure) at a time, but it sounds like it would fit your model. We create an external table, then do hdfs puts to add the files to the table: CREATE EXTERNAL TABLE log_import( date STRING, time ST