How to load lines into Hive while breaking them by words?

2011-09-26 Thread Mark Kerzner
Hi, a simple question - if I have a book as a text, and I want to load it into a Hive table, with one word forming one entry, how should I do it? Thank you, Mark
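One way this could be approached, offered as a sketch and not as an answer from the thread: split the text into one word per line before loading, so that each line becomes one row of a single-column table. The function names, the word pattern, and the lowercasing below are assumptions made for illustration.

```python
import re

# Assumption: a "word" is a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def words(lines):
    """Yield each word found in an iterable of text lines, lowercased."""
    for line in lines:
        for word in WORD_RE.findall(line):
            yield word.lower()


def write_words(src_path, dst_path):
    """Write the words of src_path to dst_path, one word per line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for word in words(src):
            dst.write(word + "\n")
```

The resulting file could then be loaded with LOAD DATA LOCAL INPATH into a table that has a single STRING column. Splitting inside Hive itself with split() and explode() may also work, depending on the Hive version in use.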

Re: "Path Is Not Legal" when loading HDFS->S3

2011-09-26 Thread Miguel Cabero
Hi Bradford, For tables stored on S3, you have to specify: create EXTERNAL table events … Regards, Miguel On 27 Sep 2011, at 00:28, Jonathan Seidman wrote: > Hey Bradford - from my experience that error occurs when there's a conflict > between the "fs.default.name" setting and the value in t

Re: Skip first line of CSV loading

2011-09-26 Thread Bradford Stephens
Any thoughts on this? On Wed, Apr 13, 2011 at 1:55 PM, Daniel Jue wrote: > Is there a way to have hive skip the first line of CSV loading (say, > to skip column headers)? > > Or will this require a second stage with a transform, and > a) a hard coded knowledge of what a header row might contain,
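A minimal sketch of one workaround, assuming the header can be removed from the file before it is loaded; this is not a method proposed in the thread, and the function names and the default of one header row are hypothetical.

```python
import itertools


def drop_header(lines, header_rows=1):
    """Return an iterator over lines with the first header_rows lines skipped."""
    return itertools.islice(lines, header_rows, None)


def strip_header_file(src_path, dst_path, header_rows=1):
    """Copy src_path to dst_path without its leading header rows."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        dst.writelines(drop_header(src, header_rows))
```

This needs no knowledge of what the header contains, only how many rows it spans. If several CSV files are loaded, each file would need the same treatment. Later Hive releases also added a skip.header.line.count table property, which may make the preprocessing step unnecessary.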

Re: how to filter some special logs

2011-09-26 Thread Shouguo Li
I found it easier to use scripts for special parsing needs. Say you want to import Apache access logs for analysis: first load the log file into a staging table, say tmp_log_file_staging_parts, then write a query like this: INSERT OVERWRITE TABLE access_logs PARTITION(date_str='2011-0
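The snippet is cut off before the script appears, so the following is only a guess at what such a parsing script might look like. It assumes the query streams each staging row through an external script (for example with Hive's TRANSFORM ... USING clause), that rows hold raw Apache common log format lines, and that output columns are tab-separated. The regular expression and the function names are hypothetical.

```python
import re

# Assumed input: Apache common log format,
#   host ident user [time] "request" status size
LOG_RE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)'
)


def parse_line(line):
    """Return the list of log fields, or None if the line does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    return list(match.groups())


def transform(lines, out):
    """Write one tab-separated row per matching line; drop the rest."""
    for line in lines:
        fields = parse_line(line.rstrip("\n"))
        if fields is not None:
            out.write("\t".join(fields) + "\n")
```

To run it as a streaming script, add `import sys` and end the file with `transform(sys.stdin, sys.stdout)`. Lines that do not match are dropped silently, which is where filtering of unwanted log entries would happen.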

Re: "Path Is Not Legal" when loading HDFS->S3

2011-09-26 Thread Jonathan Seidman
Hey Bradford - from my experience that error occurs when there's a conflict between the "fs.default.name" setting and the value in the metastore.SDS.location column in the Hive metadata. For us this has occurred when either migrating to a new cluster or changing the NN hostname. Not sure how all th

"Path Is Not Legal" when loading HDFS->S3

2011-09-26 Thread Bradford Stephens
Hey amigos, I'm doing an EMR load of data from HDFS to S3. My example looks correct, but I'm getting an odd error. Since all the EMR data is in one directory, I'm copying the file to HDFS, then doing 'LOAD DATA INPATH' to put it back into S3. CREATE TABLE events( ..blahblah... ) ROW FORMAT DELIMITED F

mapjoin and distinct.

2011-09-26 Thread jun li
Hi, it seems that MAPJOIN and DISTINCT cannot be used together; either ordering throws a parse error: select distinct /*+ MAPJOIN(a) */ or select /*+ MAPJOIN(a) */ distinct -- Li Jun