Hi,
I am loading a CSV file that contains 177692 records. However, a row count
taken right after loading the file into Pig returns 177700, which is 8 more
than the number of records in the original file. I am not doing any
processing, only loading the file and displaying the record count.

src_data = LOAD '/user/src_data.csv'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
    AS (col1:chararray, col2:chararray, col3:chararray, col4:chararray);
alias_for_count = GROUP src_data ALL;
alias_for_join_count = FOREACH alias_for_count GENERATE COUNT_STAR(src_data) AS num_rows;
DUMP alias_for_join_count;

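As a cross-check outside Pig, the sketch below compares the number of physical lines in a CSV with the number of logical records, where a quoted field containing a line break counts as one record. It is only a diagnostic sketch, not part of the job above: the function name count_lines_and_records and the sample data are hypothetical, and it assumes the header occupies a single physical line. If the two numbers differ for the real file, quoted line breaks would be one candidate explanation for a count mismatch, though this does not confirm what Pig is doing.

```python
import csv
import io


def count_lines_and_records(text, has_header=True):
    """Return (physical_lines, logical_records) for CSV text.

    physical_lines counts newline-terminated lines; logical_records counts
    rows as parsed by the csv module, so a quoted field containing a line
    break still counts as a single record. Blank lines are not counted as
    records. Assumes the header, if present, is one physical line.
    """
    physical = text.count("\n")
    if text and not text.endswith("\n"):
        physical += 1  # last line has no trailing newline

    # newline="" leaves embedded line breaks untouched for the csv parser
    reader = csv.reader(io.StringIO(text, newline=""))
    logical = sum(1 for row in reader if row)

    if has_header:
        physical = max(physical - 1, 0)
        logical = max(logical - 1, 0)
    return physical, logical


# Hypothetical sample: the second data record spans two physical lines.
sample = (
    "col1,col2,col3,col4\n"
    "a,b,c,d\n"
    'e,"multi\nline",g,h\n'
    "i,j,k,l\n"
)
physical_lines, logical_records = count_lines_and_records(sample)
print(physical_lines, logical_records, physical_lines - logical_records)
```

To apply it to the real data, a local copy of the file could be read with open(path, newline="") and its contents passed to the function; the local path is left to the reader.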
May I know what could be the reason for this behavior?
Thanks & Regards
Vijay