Hi,

I am loading a CSV file that has 177692 records. However, when I count the
rows after loading the file into Pig, I get 177700, which is 8 more than
the number of records in the original file. I am not doing any processing,
just loading the file and displaying the record count.

src_data = LOAD '/user/src_data.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(
        ',', 'YES_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (col1:chararray, col2:chararray, col3:chararray, col4:chararray);

alias_for_count = GROUP src_data ALL;
alias_for_join_count = FOREACH alias_for_count
    GENERATE COUNT_STAR(src_data) AS num_rows;

DUMP alias_for_join_count;
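
As a cross-check outside Pig, the expected count could be verified along
these lines. This is only a minimal Python sketch, not part of the Pig job:
it assumes a comma-separated file with double-quoted fields and a single
header row, and the helper names count_csv_records and count_physical_lines
are hypothetical. It illustrates how a quoted field containing a line break
makes the number of physical lines differ from the number of logical records.

```python
import csv
import io


def count_csv_records(stream, has_header=True):
    """Count logical CSV records; a quoted field may span several lines."""
    reader = csv.reader(stream)
    if has_header:
        next(reader, None)  # skip the header row, if there is one
    return sum(1 for _ in reader)


def count_physical_lines(stream, has_header=True):
    """Count raw lines, ignoring CSV quoting entirely."""
    total = sum(1 for _ in stream)
    return total - 1 if has_header and total > 0 else total


# Small in-memory sample: the first data record has a line break inside quotes.
sample = 'col1,col2\na,"x\ny"\nb,z\n'

records = count_csv_records(io.StringIO(sample))
lines = count_physical_lines(io.StringIO(sample))
print(records, lines)
```

For a real file, the same functions could be called on a handle opened with
open(path, newline=''), which is how the csv module expects files to be
opened so that embedded line breaks are preserved.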

May I know what could be the reason for this behavior?


Thanks & Regards
Vijay
