I think I found a bug in versions 0.12.0 through 0.14.0. I've been trying
to get passed it all day. Can someone please confirm?
The below is the bare minimum I was able to extract from my original
problem to in order to demonstrate the bug. So, don't expect the following
code to serve any practical purpose. :)
My input file (test_in) is two columns with a tab delimiter:
1 F
2 F
My streaming function (sf.py) ignores the actual input and simply generates
2 records:
#!/usr/bin/python
if __name__ == '__main__':
print 'x'
print 'y'
(But I should mention that in my original problem the input to output was
one-to-one. I just ignored the input here to get to the bare minimum
effect.)
My pig script:
MY_INPUT = load 'test_in' as ( f1, f2);
split MY_INPUT into T if (f2 == 'T'), F otherwise;
T2 = group T by f1;
store T2 into 'test_out/T2';
F2 = group F by f1;
store F2 into 'test_out/F2'; -- (this line is actually optional to demo
the bug)
F3 = stream F2 through `sf.py`;
store F3 into 'test_out/F3';
My expected output for test/out/F3 is two records that come directly from
sf.py:
x
y
However, I only get:
x
I've tried all of the following to get the expected behavior:
- upgraded Pig from 0.12.0 to 0.14.0
- local vs. distributed mode
- flush sys.stdout in the streaming function
- replace sf.py with sf.sh which is a bash script that used "echo x;
echo y" to do the same thing. In this case, the final contents of
test_out/F# would vary - sometimes I would get both x and y, and sometimes
I would just get x.
Aside from removing the one Pig line that I've marked optional, any other
attempts to simplify the Pig script or input file causes the bug to not
manifest.
I've attached the logs from running "pig -x local test.pig"
Thanks,
Steve
$ pig -x local test.pig
15/04/30 21:47:43 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
15/04/30 21:47:43 INFO pig.ExecTypeProvider: Picked LOCAL as the ExecType
2015-04-30 21:47:44,006 [main] INFO org.apache.pig.Main - Apache Pig version
0.14.0 (r1640057) compiled Nov 16 2014, 18:02:05
2015-04-30 21:47:44,013 [main] INFO org.apache.pig.Main - Logging error
messages to: /home/hadoop/steve/pig_1430430464004.log
2015-04-30 21:47:44,051 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - user.name is deprecated.
Instead, use mapreduce.job.user.name
2015-04-30 21:47:45,141 [main] INFO org.apache.pig.impl.util.Utils - Default
bootup file /home/hadoop/.pigbootup not found
2015-04-30 21:47:45,411 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-04-30 21:47:45,417 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is
deprecated. Instead, use mapreduce.jobtracker.address
2015-04-30 21:47:45,420 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
hadoop file system at: file:///
2015-04-30 21:47:46,869 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-04-30 21:47:47,477 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-04-30 21:47:47,702 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-04-30 21:47:48,131 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-04-30 21:47:48,193 [main] WARN org.apache.pig.newplan.BaseOperatorPlan -
Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).
2015-04-30 21:47:48,228 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation -
mapred.textoutputformat.separator is deprecated. Instead, use
mapreduce.output.textoutputformat.separator
2015-04-30 21:47:48,301 [main] INFO org.apache.pig.tools.pigstats.ScriptState
- Pig features used in the script: GROUP_BY,STREAMING
2015-04-30 21:47:48,431 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-04-30 21:47:48,561 [main] INFO
org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer -
{RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator,
GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter,
MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer,
PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2015-04-30 21:47:48,891 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File
concatenation threshold: 100 optimistic? false
2015-04-30 21:47:48,928 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-04-30 21:47:49,039 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 5
2015-04-30 21:47:49,045 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 2 map-only splittees.
2015-04-30 21:47:49,049 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 2 out of total 3 MR operators.
2015-04-30 21:47:49,052 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged MR job 33 into MR job 30
2015-04-30 21:47:49,060 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged MR job 35 into MR job 30
2015-04-30 21:47:49,060 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Requested parallelism of splitter: -1
2015-04-30 21:47:49,060 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 2 map-reduce splittees.
2015-04-30 21:47:49,060 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- Merged 2 out of total 3 MR operators.
2015-04-30 21:47:49,061 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2015-04-30 21:47:49,156 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-04-30 21:47:49,206 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - session.id is deprecated.
Instead, use dfs.metrics.session-id
2015-04-30 21:47:49,212 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics -
Initializing JVM Metrics with processName=JobTracker, sessionId=
2015-04-30 21:47:49,716 [main] INFO
org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are
added to the job
2015-04-30 21:47:49,730 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation -
mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use
mapreduce.reduce.markreset.buffer.percent
2015-04-30 21:47:49,730 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-04-30 21:47:49,731 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is
deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-04-30 21:47:49,734 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Reduce phase detected, estimating # of required reducers.
2015-04-30 21:47:49,735 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Using reducer estimator:
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2015-04-30 21:47:49,754 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
- BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=8
2015-04-30 21:47:49,754 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting Parallelism to 1
2015-04-30 21:47:49,754 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is
deprecated. Instead, use mapreduce.job.reduces
2015-04-30 21:47:49,876 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up multi store job
2015-04-30 21:47:50,164 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2015-04-30 21:47:50,171 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation -
mapred.job.tracker.http.address is deprecated. Instead, use
mapreduce.jobtracker.http.address
2015-04-30 21:47:50,208 [JobControl] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with
processName=JobTracker, sessionId= - already initialized
2015-04-30 21:47:50,762 [JobControl] WARN
org.apache.hadoop.mapreduce.JobSubmitter - No job jar file set. User classes
may not be found. See Job or Job#setJar(String).
2015-04-30 21:47:50,906 [JobControl] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to
process : 1
2015-04-30 21:47:50,906 [JobControl] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2015-04-30 21:47:50,927 [JobControl] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2015-04-30 21:47:50,989 [JobControl] INFO
org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-04-30 21:47:51,548 [JobControl] INFO
org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job:
job_local2069324582_0001
2015-04-30 21:47:51,664 [JobControl] WARN org.apache.hadoop.conf.Configuration
-
file:/tmp/hadoop-hadoop/mapred/staging/hadoop2069324582/.staging/job_local2069324582_0001/job.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval; Ignoring.
2015-04-30 21:47:51,687 [JobControl] WARN org.apache.hadoop.conf.Configuration
-
file:/tmp/hadoop-hadoop/mapred/staging/hadoop2069324582/.staging/job_local2069324582_0001/job.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.attempts; Ignoring.
2015-04-30 21:47:52,331 [JobControl] WARN org.apache.hadoop.conf.Configuration
-
file:/tmp/hadoop-hadoop/mapred/local/localRunner/hadoop/job_local2069324582_0001/job_local2069324582_0001.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval; Ignoring.
2015-04-30 21:47:52,350 [JobControl] WARN org.apache.hadoop.conf.Configuration
-
file:/tmp/hadoop-hadoop/mapred/local/localRunner/hadoop/job_local2069324582_0001/job_local2069324582_0001.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.attempts; Ignoring.
2015-04-30 21:47:52,376 [JobControl] INFO org.apache.hadoop.mapreduce.Job -
The url to track the job: http://localhost:8080/
2015-04-30 21:47:52,378 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_local2069324582_0001
2015-04-30 21:47:52,378 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Processing aliases F,F2,F3,MY_INPUT,T,T2
2015-04-30 21:47:52,378 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- detailed locations: M: MY_INPUT[1,11],T[2,20],T2[3,5],F[2,38],F2[5,5] C: R:
F3[7,5]
2015-04-30 21:47:52,403 [Thread-7] INFO
org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2015-04-30 21:47:52,421 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2015-04-30 21:47:52,421 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Running jobs are [job_local2069324582_0001]
2015-04-30 21:47:52,505 [Thread-7] INFO
org.apache.hadoop.conf.Configuration.deprecation -
mapred.textoutputformat.separator is deprecated. Instead, use
mapreduce.output.textoutputformat.separator
2015-04-30 21:47:52,513 [Thread-7] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-04-30 21:47:52,515 [Thread-7] INFO
org.apache.hadoop.conf.Configuration.deprecation -
mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use
mapreduce.reduce.markreset.buffer.percent
2015-04-30 21:47:52,516 [Thread-7] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is
deprecated. Instead, use mapreduce.job.reduces
2015-04-30 21:47:52,516 [Thread-7] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is
deprecated. Instead, use mapreduce.jobtracker.address
2015-04-30 21:47:52,546 [Thread-7] INFO
org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2015-04-30 21:47:52,844 [Thread-7] INFO
org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2015-04-30 21:47:52,845 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.LocalJobRunner - Starting task:
attempt_local2069324582_0001_m_000000_0
2015-04-30 21:47:52,998 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2015-04-30 21:47:53,007 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 8
Input split[0]:
Length = 8
ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
Locations:
-----------------------
2015-04-30 21:47:53,035 [LocalJobRunner Map Task Executor #0] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader -
Current split being processed file:/home/hadoop/steve/test_in:0+8
2015-04-30 21:47:53,047 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - Map output collector class =
org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2015-04-30 21:47:53,258 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - (EQUATOR) 0 kvi 26214396(104857584)
2015-04-30 21:47:53,258 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - mapreduce.task.io.sort.mb: 100
2015-04-30 21:47:53,259 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - soft limit at 83886080
2015-04-30 21:47:53,259 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - bufstart = 0; bufvoid = 104857600
2015-04-30 21:47:53,259 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - kvstart = 26214396; length = 6553600
2015-04-30 21:47:53,280 [LocalJobRunner Map Task Executor #0] INFO
com.hadoop.compression.lzo.GPLNativeCodeLoader - Loaded native gpl library from
the embedded binaries
2015-04-30 21:47:53,283 [LocalJobRunner Map Task Executor #0] INFO
com.hadoop.compression.lzo.LzoCodec - Successfully loaded & initialized
native-lzo library [hadoop-lzo rev 77cfa96225d62546008ca339b7c2076a3da91578]
2015-04-30 21:47:53,324 [LocalJobRunner Map Task Executor #0] WARN
org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been
initialized
2015-04-30 21:47:53,362 [LocalJobRunner Map Task Executor #0] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map
- Aliases being processed per job phase (AliasName[line,offset]): M:
MY_INPUT[1,11],T[2,20],T2[3,5],F[2,38],F2[5,5] C: R: F3[7,5]
2015-04-30 21:47:54,075 [LocalJobRunner Map Task Executor #0] INFO
amazon.emr.metrics.MetricsSaver - MetricsConfigRecord disabledInCluster: false
instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine:
false
2015-04-30 21:47:54,081 [LocalJobRunner Map Task Executor #0] INFO
amazon.emr.metrics.MetricsSaver - Created MetricsSaver
j-2027WAZUSMYG8:i-3891d1c5:RunJar:05647 period:60
/mnt/var/em/raw/i-3891d1c5_20150430_null_00000_raw.bin
2015-04-30 21:47:54,103 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.LocalJobRunner -
2015-04-30 21:47:54,104 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - Starting flush of map output
2015-04-30 21:47:54,104 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - Spilling map output
2015-04-30 21:47:54,104 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - bufstart = 0; bufend = 24; bufvoid =
104857600
2015-04-30 21:47:54,104 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - kvstart = 26214396(104857584); kvend =
26214392(104857568); length = 5/6553600
2015-04-30 21:47:54,122 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.MapTask - Finished spill 0
2015-04-30 21:47:54,125 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.Task - Task:attempt_local2069324582_0001_m_000000_0 is
done. And is in the process of committing
2015-04-30 21:47:54,175 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.LocalJobRunner - map
2015-04-30 21:47:54,175 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.Task - Task 'attempt_local2069324582_0001_m_000000_0'
done.
2015-04-30 21:47:54,175 [LocalJobRunner Map Task Executor #0] INFO
org.apache.hadoop.mapred.LocalJobRunner - Finishing task:
attempt_local2069324582_0001_m_000000_0
2015-04-30 21:47:54,176 [Thread-7] INFO
org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2015-04-30 21:47:54,179 [Thread-7] INFO
org.apache.hadoop.mapred.LocalJobRunner - Waiting for reduce tasks
2015-04-30 21:47:54,180 [pool-3-thread-1] INFO
org.apache.hadoop.mapred.LocalJobRunner - Starting task:
attempt_local2069324582_0001_r_000000_0
2015-04-30 21:47:54,247 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Task -
Using ResourceCalculatorProcessTree : [ ]
2015-04-30 21:47:54,271 [pool-3-thread-1] INFO
org.apache.hadoop.mapred.ReduceTask - Using ShuffleConsumerPlugin:
org.apache.hadoop.mapreduce.task.reduce.Shuffle@1f3176b9
2015-04-30 21:47:54,337 [pool-3-thread-1] INFO
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - MergerManager:
memoryLimit=709551680, maxSingleShuffleLimit=177387920,
mergeThreshold=468304128, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2015-04-30 21:47:54,351 [EventFetcher for fetching Map Completion Events] INFO
org.apache.hadoop.mapreduce.task.reduce.EventFetcher -
attempt_local2069324582_0001_r_000000_0 Thread started: EventFetcher for
fetching Map Completion Events
2015-04-30 21:47:54,424 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 50% complete
2015-04-30 21:47:54,424 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Running jobs are [job_local2069324582_0001]
2015-04-30 21:47:54,512 [localfetcher#1] INFO
org.apache.hadoop.mapreduce.task.reduce.LocalFetcher - localfetcher#1 about to
shuffle output of map attempt_local2069324582_0001_m_000000_0 decomp: 30 len:
34 to MEMORY
2015-04-30 21:47:54,533 [localfetcher#1] INFO
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput - Read 30 bytes from
map-output for attempt_local2069324582_0001_m_000000_0
2015-04-30 21:47:54,534 [localfetcher#1] INFO
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - closeInMemoryFile ->
map-output of size: 30, inMemoryMapOutputs.size() -> 1, commitMemory -> 0,
usedMemory ->30
2015-04-30 21:47:54,538 [Readahead Thread #0] WARN
org.apache.hadoop.io.ReadaheadPool - Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native
Method)
at
org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:263)
at
org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:142)
at
org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-04-30 21:47:54,539 [EventFetcher for fetching Map Completion Events] INFO
org.apache.hadoop.mapreduce.task.reduce.EventFetcher - EventFetcher is
interrupted.. Returning
2015-04-30 21:47:54,558 [pool-3-thread-1] INFO
org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
2015-04-30 21:47:54,559 [pool-3-thread-1] INFO
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - finalMerge called
with 1 in-memory map-outputs and 0 on-disk map-outputs
2015-04-30 21:47:54,591 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Merger
- Merging 1 sorted segments
2015-04-30 21:47:54,595 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Merger
- Down to the last merge-pass, with 1 segments left of total size: 22 bytes
2015-04-30 21:47:54,598 [pool-3-thread-1] INFO
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merged 1 segments,
30 bytes to disk to satisfy reduce memory limit
2015-04-30 21:47:54,598 [pool-3-thread-1] INFO
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 1 files, 34
bytes from disk
2015-04-30 21:47:54,600 [pool-3-thread-1] INFO
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl - Merging 0 segments,
0 bytes from memory into reduce
2015-04-30 21:47:54,600 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Merger
- Merging 1 sorted segments
2015-04-30 21:47:54,601 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Merger
- Down to the last merge-pass, with 1 segments left of total size: 22 bytes
2015-04-30 21:47:54,602 [pool-3-thread-1] INFO
org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
2015-04-30 21:47:54,610 [pool-3-thread-1] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.skip.on is
deprecated. Instead, use mapreduce.job.skiprecords
2015-04-30 21:47:54,635 [pool-3-thread-1] WARN
org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been
initialized
2015-04-30 21:47:54,725 [pool-3-thread-1] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce
- Aliases being processed per job phase (AliasName[line,offset]): M:
MY_INPUT[1,11],T[2,20],T2[3,5],F[2,38],F2[5,5] C: R: F3[7,5]
2015-04-30 21:47:54,847 [pool-3-thread-1] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.task.id is
deprecated. Instead, use mapreduce.task.attempt.id
===== Task Information Header =====
Command: python -u sf.py
(stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)
Start time: Thu Apr 30 21:47:54 UTC 20152015-04-30 21:47:54,858
[pool-3-thread-1] INFO org.apache.hadoop.conf.Configuration.deprecation -
mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
===== * * * =====
2015-04-30 21:47:54,899 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Task -
Task:attempt_local2069324582_0001_r_000000_0 is done. And is in the process of
committing
2015-04-30 21:47:54,914 [pool-3-thread-1] INFO
org.apache.hadoop.mapred.LocalJobRunner - 1 / 1 copied.
2015-04-30 21:47:54,914 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Task -
Task attempt_local2069324582_0001_r_000000_0 is allowed to commit now
2015-04-30 21:47:54,917 [pool-3-thread-1] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of
task 'attempt_local2069324582_0001_r_000000_0' to
file:/home/hadoop/steve/test_out/T2/_temporary/0/task_local2069324582_0001_r_000000
2015-04-30 21:47:54,920 [pool-3-thread-1] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of
task 'attempt_local2069324582_0001_r_000000_0' to
file:/home/hadoop/steve/test_out/F2/_temporary/0/task_local2069324582_0001_r_000000
2015-04-30 21:47:54,923 [pool-3-thread-1] INFO
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of
task 'attempt_local2069324582_0001_r_000000_0' to
file:/home/hadoop/steve/test_out/F3/_temporary/0/task_local2069324582_0001_r_000000
2015-04-30 21:47:54,926 [pool-3-thread-1] INFO
org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2015-04-30 21:47:54,926 [pool-3-thread-1] INFO org.apache.hadoop.mapred.Task -
Task 'attempt_local2069324582_0001_r_000000_0' done.
2015-04-30 21:47:54,926 [pool-3-thread-1] INFO
org.apache.hadoop.mapred.LocalJobRunner - Finishing task:
attempt_local2069324582_0001_r_000000_0
2015-04-30 21:47:54,926 [Thread-7] INFO
org.apache.hadoop.mapred.LocalJobRunner - reduce task executor complete.
2015-04-30 21:47:55,152 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics -
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already
initialized
2015-04-30 21:47:55,162 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics -
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already
initialized
2015-04-30 21:47:55,164 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
2015-04-30 21:47:55,165 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics -
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already
initialized
2015-04-30 21:47:55,263 [main] INFO org.apache.pig.tools.pigstats.JobStats -
using output size reader:
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader
2015-04-30 21:47:55,267 [main] INFO org.apache.pig.tools.pigstats.JobStats -
using output size reader:
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader
2015-04-30 21:47:55,269 [main] INFO org.apache.pig.tools.pigstats.JobStats -
using output size reader:
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader
2015-04-30 21:47:55,273 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2015-04-30 21:47:55,276 [main] INFO
org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.4.0-amzn-4 0.14.0 hadoop 2015-04-30 21:47:49 2015-04-30 21:47:55
GROUP_BY,STREAMING
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime
MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime
MedianReducetime Alias Feature Outputs
job_local2069324582_0001 1 1 n/a n/a n/a n/a
n/a n/a n/a n/a F,F2,F3,MY_INPUT,T,T2 MULTI_QUERY
file:///home/hadoop/steve/test_out/T2,file:///home/hadoop/steve/test_out/F2,file:///home/hadoop/steve/test_out/F3,
Input(s):
Successfully read 2 records from: "file:///home/hadoop/steve/test_in"
Output(s):
Successfully stored 0 records in: "file:///home/hadoop/steve/test_out/T2"
Successfully stored 2 records (20 bytes) in:
"file:///home/hadoop/steve/test_out/F2"
Successfully stored 1 records (2 bytes) in:
"file:///home/hadoop/steve/test_out/F3"
Counters:
Total records written : 3
Total bytes written : 22
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local2069324582_0001
2015-04-30 21:47:55,278 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics -
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already
initialized
2015-04-30 21:47:55,280 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics -
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already
initialized
2015-04-30 21:47:55,282 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics -
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already
initialized
2015-04-30 21:47:55,298 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
2015-04-30 21:47:55,342 [main] INFO org.apache.pig.Main - Pig script completed
in 12 seconds and 27 milliseconds (12027 ms)
2015-04-30 21:47:55,345 [Thread-12] INFO amazon.emr.metrics.MetricsSaver -
MetricsSaver j-2027WAZUSMYG8:i-3891d1c5:RunJar:05647 metricFile
/mnt/var/em/raw/i-3891d1c5_20150430_null_00000_raw.bin
2015-04-30 21:47:55,354 [Thread-12] INFO amazon.emr.metrics.MetricsSaver -
Saved 2:4 records to /mnt/var/em/raw/i-3891d1c5_20150430_RunJar_05647_raw.bin