I tried to process a large number of small files with Pig and ran into a
strange problem.

2011-02-27 00:00:58,746 [Thread-15] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : *43458*
2011-02-27 00:00:58,755 [Thread-15] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : *43458*
2011-02-27 00:01:14,173 [Thread-15] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : *329*

When the script finishes, the result covers only a subset of the input
files. The input is logs from a whole month, but the results contain only
data from day 21.
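
One check I could run to narrow this down, assuming the "combined" count in
the log comes from Pig's split combination (controlled by the
pig.splitCombination property, which I believe can be set from a script in
recent Pig versions): rerun with combination turned off and count records
per day. This is only a sketch; the input path, the tab delimiter, the field
layout and the alias names are all hypothetical.

```pig
-- Hypothetical diagnostic script: path, delimiter and schema are assumptions.
-- Turn off split combination so small files are not merged into combined splits.
SET pig.splitCombination false;

-- Assumes each record starts with a tab-separated day field.
logs   = LOAD '/path/to/month/logs/*' USING PigStorage('\t')
         AS (day:chararray, line:chararray);

-- Count records per day to see which days actually reach the output.
by_day = GROUP logs BY day;
counts = FOREACH by_day GENERATE group AS day, COUNT(logs) AS n;

DUMP counts;
```

If every day of the month shows up with combination off but only day 21
shows up with it on, that would point at the combined splits rather than at
the script logic.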


Maybe I'm missing something.
Any ideas?

-- 
*Charles Ferreira Gonçalves *
http://homepages.dcc.ufmg.br/~charles/
UFMG - ICEx - Dcc
Cel.: 55 31 87741485
Tel.:  55 31 34741485
Lab.: 55 31 34095840
