This was with Pig 0.8.1. The following is part of the hprof output:

          percent          live          alloc'ed  stack class
 rank   self  accum     bytes objs     bytes  objs trace name
    1  4.99%  4.99%  33572480 419656 100460640 1255758 307176 java.util.HashMap$Entry[]
    2  4.92%  9.91%  33120000 414000  66240000 828000 308200 java.util.HashMap$Entry[]
    3  4.92% 14.83%  33114400 413930  99186000 1239825 308147 java.util.HashMap$Entry[]
    4  4.91% 19.74%  33009600 412620  66019200 825240 308202 java.util.HashMap$Entry[]
    5  4.69% 24.43%  31554048 986064 2720274496 85008578 309807 org.apache.pig.newplan.logical.relational.LogicalSchema$LogicalFieldSchema
    6  3.95% 28.39%  26604720 1108530 2104577160 87690715 309427 java.util.HashMap$Entry
    7  3.74% 32.13%  25187136 1049464  52832592 2201358 307177 java.util.HashMap$Entry
    8  3.65% 35.78%  24577328 852060 127045504 4220197 306673 char[]
    9  3.56% 39.35%  23979536 428206  71194592 1271332 306680 java.lang.Object[]
   10  3.45% 42.80%  23214264 967261 240166320 10006930 300253 java.util.ArrayList
   11  3.45% 46.24%  23179632 413922  69429528 1239813 308150 java.lang.Object[]
   12  3.21% 49.45%  21583080 899295  78141408 3255892 306516 java.util.HashMap$Entry
   13  3.04% 52.49%  20452128 852172 116126520 4838605 306674 java.lang.String
   14  2.58% 55.07%  17385728 1086608 1359995520 84999720 309812 org.apache.pig.impl.util.Pair
   15  2.50% 57.57%  16786640 419666  50231120 1255778 307175 java.util.HashMap
   16  2.50% 60.06%  16786240 419656  50230320 1255758 307172 java.util.HashMap
   17  2.49% 62.55%  16732960 418324  58320080 1458002 307121 java.util.HashMap
   18  2.46% 65.01%  16560000 414000  33120000 828000 308201 java.util.HashMap
   19  1.96% 66.98%  13209056 412783  52836224 1651132 309652 org.apache.pig.newplan.logical.relational.LogicalSchema$LogicalFieldSchema
   20  1.96% 68.94%  13207872 412746  39617568 1238049 308146 org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema
   21  1.96% 70.90%  13203840 412620  26407680 825240 308193 org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema
   22  1.49% 72.39%  10010592 625662 785665632 49104102 309774 java.lang.Integer
   23  1.48% 73.87%   9936648 414027  19872648 828027 308203 java.util.HashMap$Entry
   24  1.48% 75.34%   9934128 413922  29755512 1239813 308149 java.util.HashMap$Entry
   25  1.47% 76.82%   9907872 412828  19811160 825465 307947 java.util.ArrayList
   26  1.47% 78.29%   9907872 412828  19811160 825465 307948 java.util.HashMap$Entry
   27  1.47% 79.76%   9902880 412620  19805760 825240 308206 java.util.HashMap$Entry
   28  1.17% 80.93%   7885080   24   7885080    24 311867 char[]
   29  1.16% 82.09%   7776216    2   9749800     6 313313 char[]
   30  1.13% 83.22%   7595616 3686 1054669088 926722 309644 java.util.HashMap$Entry[]
   31  0.98% 84.20%   6622880 413930  19837200 1239825 308148 org.apache.pig.impl.util.MultiMap
   32  0.98% 85.18%   6601920 412620  13203840 825240 308199 org.apache.pig.impl.util.MultiMap
   33  0.68% 85.86%   4584576  568  13732448  5942 308075 java.util.HashMap$Entry[]
   34  0.44% 86.30%   2956656 123194 234198048 9758252 309678 java.lang.String
   35  0.44% 86.74%   2956656 123194 234198048 9758252 309680 java.util.HashMap$Entry
   36  0.44% 87.18%   2929776 1535  17382848 22404 306766 java.util.HashMap$Entry[]
   37  0.43% 87.61%   2895536 22872  13513088 107528 311646 char[]
   38  0.42% 88.03%   2856864 1386  16783440 20733 308151 java.util.HashMap$Entry[]
   39  0.42% 88.46%   2848496 1386  11172656 13806 308208 java.util.HashMap$Entry[]
   40  0.42% 88.88%   2848320 1380  11172480 13800 308207 java.util.HashMap$Entry[]

All the java.util.HashMap$Entry and java.util.HashMap entries trace back to
schema-related calls such as the following:
TRACE 307171:
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:156)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
        org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
        org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:851)
TRACE 307172:
        java.util.AbstractMap.<init>(AbstractMap.java:56)
        java.util.HashMap.<init>(HashMap.java:206)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:161)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
TRACE 307173:
        java.util.HashMap.<init>(HashMap.java:209)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:161)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
        org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
TRACE 307174:
        org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:46)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
        org.apache.pig.impl.logicalLayer.LOForEach.getSchema(LOForEach.java:245)
TRACE 307175:
        java.util.AbstractMap.<init>(AbstractMap.java:56)
        java.util.HashMap.<init>(HashMap.java:206)
        org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:47)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
TRACE 307176:
        java.util.HashMap.<init>(HashMap.java:209)
        org.apache.pig.impl.util.MultiMap.<init>(MultiMap.java:47)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:162)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.copyAndLink(Schema.java:242)
TRACE 307177:
        java.util.HashMap$Entry.<init>(HashMap.java:683)
        java.util.HashMap.addEntry(HashMap.java:753)
        java.util.HashMap.put(HashMap.java:385)
        org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.setParent(Schema.java:251)
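
To make the pattern in these traces concrete: they suggest that each
FieldSchema constructor creates one HashMap directly (Schema.java:161) and one
MultiMap (Schema.java:162) that creates a second HashMap (MultiMap.java:47),
and that each HashMap constructor allocates its Entry[] table up front
(HashMap.java:209). Rank 1 in the table works out to 80 bytes per such table
(33572480 bytes / 419656 objects). Below is a rough sketch of how that adds up
when a wide schema is copied repeatedly. It is not Pig's code: the class names,
field names, and copy() method are all hypothetical, and the 300 columns and
25 copies only echo the rough dimensions of my script, not a measured copy
count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these classes are Pig's real implementation.
class SchemaAllocationSketch {

    // Counts every HashMap created by the sketch classes below.
    static long mapsCreated = 0;

    // Stand-in for a multimap that eagerly creates its backing HashMap.
    static class SketchMultiMap {
        final Map<String, List<String>> backing;

        SketchMultiMap() {
            backing = new HashMap<>();
            mapsCreated++;
        }
    }

    // Stand-in for a field schema that creates two maps per instance.
    static class SketchFieldSchema {
        final String alias;
        final Map<String, String> firstMap;   // hypothetical field name
        final SketchMultiMap multiMap;        // hypothetical field name

        SketchFieldSchema(String alias) {
            this.alias = alias;
            this.firstMap = new HashMap<>();
            mapsCreated++;
            this.multiMap = new SketchMultiMap();
        }

        // Copying builds a whole new instance, so it creates two more maps.
        SketchFieldSchema copy() {
            return new SketchFieldSchema(alias);
        }
    }

    // Builds a schema with `columns` fields, copies it `copies` times,
    // and returns how many HashMaps were created in total.
    static long mapsAfterCopies(int columns, int copies) {
        mapsCreated = 0;
        List<SketchFieldSchema> schema = new ArrayList<>();
        for (int i = 0; i < columns; i++) {
            schema.add(new SketchFieldSchema("col" + i));
        }
        for (int c = 0; c < copies; c++) {
            List<SketchFieldSchema> next = new ArrayList<>();
            for (SketchFieldSchema field : schema) {
                next.add(field.copy());
            }
            schema = next;
        }
        return mapsCreated;
    }

    public static void main(String[] args) {
        // 300 columns, 25 copies: 2 maps * 300 fields * (1 + 25) schemas.
        System.out.println(mapsAfterCopies(300, 25)); // prints 15600
    }
}
```

In the sketch the old schema copies become garbage right away. The live counts
in the hprof table (around 414000 objects per allocation site) suggest that in
my run the copies stay reachable, which this sketch does not attempt to model.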


~Shubham.

On Wed, Jun 15, 2011 at 6:17 PM, Daniel Dai <[email protected]> wrote:

>  That would be surprising. Which version of Pig are you using?
>
> Daniel
>
>
> On 06/15/2011 03:10 PM, Shubham Chopra wrote:
>
> Hi Daniel,
>
> Thanks for the reply. I did try that and ran into this issue again when I
> increased the number of operators. I found out, with hprof, that most sites
> with high memory usage are schema-related. Is that a bug in the schema
> implementation? Are schema-related data structures expected to consume so
> much memory?
>
> ~Shubham.
>
> On Wed, Jun 15, 2011 at 2:32 PM, Daniel Dai <[email protected]> wrote:
>
>> Try to increase heap size. If you are running through bin/pig, set
>> PIG_HEAPSIZE (in MB, default is 1000). You can use "pig -secretDebugCmd"
>> option to see what the command line looks like.
>>
>> Daniel
>>
>>
>> On 06/15/2011 10:09 AM, Shubham Chopra wrote:
>>
>>> Hi,
>>>
>>> I am using Pig for number crunching on data that has a large number of
>>> columns (~300 or so). The script has around 25 operators and all I am
>>> doing in the script is group bys and SUMs. The script fails with the
>>> following exception:
>>> <code>
>>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>         at java.util.HashMap.<init>(HashMap.java:209)
>>>         at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
>>>         at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
>>>         at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
>>>         at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
>>>         at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
>>>         at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
>>>         at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
>>>         at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
>>>         at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
>>>         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
>>>         at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
>>>         at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
>>>         at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
>>>         at org.apache.pig.PigServer.storeEx(PigServer.java:850)
>>>         at org.apache.pig.PigServer.store(PigServer.java:816)
>>>         at org.apache.pig.PigServer.store(PigServer.java:784)
>>> </code>
>>> The complete output I see is the following:
>>> <code>
>>> $run-script
>>> 11/06/15 09:19:27 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://abcd:9000
>>> 11/06/15 09:19:28 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: abcd:9001
>>>
>>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>         at java.util.HashMap.<init>(HashMap.java:209)
>>>         at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.<init>(Schema.java:190)
>>>         at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
>>>         at org.apache.pig.impl.logicalLayer.schema.Schema.clone(Schema.java:1005)
>>>         at org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.clone(Schema.java:450)
>>>         at org.apache.pig.impl.logicalLayer.ExpressionOperator.clone(ExpressionOperator.java:144)
>>>         at org.apache.pig.impl.logicalLayer.LOProject.clone(LOProject.java:447)
>>>         at org.apache.pig.impl.logicalLayer.LogicalPlan.clone(LogicalPlan.java:116)
>>>         at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.<init>(LogicalPlanCloneHelper.java:63)
>>>         at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:45)
>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:3504)
>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1464)
>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1013)
>>>         at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:800)
>>>         at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
>>>         at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1601)
>>>         at org.apache.pig.PigServer$Graph.clone(PigServer.java:1645)
>>>         at org.apache.pig.PigServer.getClonedGraph(PigServer.java:527)
>>>         at org.apache.pig.PigServer.storeEx(PigServer.java:850)
>>>         at org.apache.pig.PigServer.store(PigServer.java:816)
>>>         at org.apache.pig.PigServer.store(PigServer.java:784)
>>> </code>
>>> The process uses around 1.2 gigs of RAM before crapping out with the
>>> exception above. Has anyone else faced a similar situation? Any way out
>>> of this?
>>>
>>> Thanks,
>>> Shubham.
>>>
>>
>>
>
>
