Rajat Khandelwal created HADOOP-16278:
-----------------------------------------

             Summary: With S3 filesystem, long-running services end up doing a 
lot of GC and eventually die
                 Key: HADOOP-16278
                 URL: https://issues.apache.org/jira/browse/HADOOP-16278
             Project: Hadoop Common
          Issue Type: Bug
          Components: common, hadoop-aws, metrics
    Affects Versions: 3.1.2, 3.1.1, 3.1.0
            Reporter: Rajat Khandelwal
             Fix For: 3.1.3
         Attachments: Screenshot 2019-04-30 at 12.52.42 PM.png, Screenshot 
2019-04-30 at 2.33.59 PM.png

I'll start with the symptoms and eventually come to the cause. 

 

We are using HDP 3.1 and noticed that every couple of days the Hive Metastore 
starts doing GC, sometimes with 30-minute-long pauses, yet nothing is 
collected and the heap remains fully used. 

 

Next, we looked at the heap dump and found that 99% of the memory is taken up 
by a single ExecutorService, in its task queue. 

 

!Screenshot 2019-04-30 at 12.52.42 PM.png!

The instance is created like this:

{code:java}
private static final ScheduledExecutorService scheduler = Executors
    .newScheduledThreadPool(1, new ThreadFactoryBuilder().setDaemon(true)
        .setNameFormat("MutableQuantiles-%d").build());
{code}

 

So all instances of MutableQuantiles share a single-threaded ExecutorService.
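
Because that scheduler is a static field, every MutableQuantiles constructed anywhere in the JVM parks its periodic task on the same executor, and the executor holds a strong reference to each task. A minimal, Hadoop-free sketch of how that queue grows (plain JDK, illustrative numbers):

{code:java}
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SharedSchedulerQueueGrowth {
  public static void main(String[] args) {
    // Same shape as the MutableQuantiles scheduler: one shared single-threaded pool.
    ScheduledThreadPoolExecutor scheduler = new ScheduledThreadPoolExecutor(1);

    // Each "quantile" schedules one periodic task that is never cancelled.
    for (int i = 0; i < 10_000; i++) {
      scheduler.scheduleAtFixedRate(() -> { /* rollover sample */ }, 1, 1, TimeUnit.SECONDS);
    }

    // The work queue now holds one ScheduledFutureTask per caller, all strongly
    // reachable from the executor, so none of them can ever be garbage collected.
    System.out.println("queued tasks: " + scheduler.getQueue().size()); // ~10000
    scheduler.shutdownNow();
  }
}
{code}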

The second thing to notice is this block of code in the constructor of 
MutableQuantiles:

{code:java}
this.scheduledTask = scheduler.scheduleAtFixedRate(
    new MutableQuantiles.RolloverSample(this),
    (long) interval, (long) interval, TimeUnit.SECONDS);
{code}

So as soon as a MutableQuantiles instance is created, one task is scheduled at 
a fixed rate. Instead, it could be scheduled with a fixed delay (refer to 
HADOOP-16248). 
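
For reference, the change proposed in HADOOP-16248 amounts to swapping the scheduling call; roughly (a sketch, not the actual patch):

{code:java}
// scheduleAtFixedRate computes the next run from the previous *start* time, so a
// backed-up single-threaded scheduler will run overdue executions back to back.
// scheduleWithFixedDelay waits for the previous run to finish before scheduling
// the next one, which degrades more gracefully under load.
this.scheduledTask = scheduler.scheduleWithFixedDelay(
    new MutableQuantiles.RolloverSample(this),
    (long) interval, (long) interval, TimeUnit.SECONDS);
{code}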

Now coming to why it's related to S3. 

 

S3AFileSystem creates an instance of S3AInstrumentation, which creates two 
quantiles (related to S3Guard) with a hardcoded 1-second interval and leaves 
them hanging, i.e. perpetually scheduled. As and when new instances of 
S3AFileSystem are created, two new quantiles are created, which in turn 
schedule two new tasks and never cancel them. This way the number of scheduled 
tasks keeps growing without ever getting cleaned up, leading to GC pressure, 
OOM and eventually a crash. 
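
Schematically, the pattern looks like this (a simplified sketch of what I read in the S3A code; the class name, helper calls and metric names here are illustrative, not the literal source):

{code:java}
import java.io.Closeable;
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableQuantiles;

// Illustrative sketch of the per-filesystem instrumentation pattern.
class PerFilesystemInstrumentation implements Closeable {
  private final MutableQuantiles throttleRate;
  private final MutableQuantiles throttleLatency;

  PerFilesystemInstrumentation(MetricsRegistry registry) {
    // Hardcoded 1-second interval: each newQuantiles() call constructs a
    // MutableQuantiles, which schedules a RolloverSample on the shared scheduler.
    throttleRate = registry.newQuantiles("s3guard_metadatastore_throttle_rate",
        "throttle rate", "events", "frequency", 1);
    throttleLatency = registry.newQuantiles("s3guard_metadatastore_throttled_latency",
        "throttled latency", "ops", "latency", 1);
  }

  @Override
  public void close() {
    // Nothing here cancels the two scheduled rollover tasks, so they stay on the
    // shared scheduler's queue long after this instrumentation is closed.
  }
}
{code}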

 

MutableQuantiles has a numInfo field which holds, among other things, the name 
of the metric. From the heap dump, I found one numInfo and traced all objects 
referring to it.

 

!Screenshot 2019-04-30 at 2.33.59 PM.png!

 

There seem to be ~300K objects for the same metric 
(S3Guard_metadatastore_throttle_rate). 

As expected, there are another ~300K objects for the other MutableQuantiles 
created by the S3AInstrumentation class, even though there are only 4 
instances of S3AInstrumentation. 

Clearly, there is a leak. Each S3AInstrumentation instance creates two 
scheduled tasks that run every second. These tasks are left scheduled and are 
not cancelled when S3AInstrumentation.close() is called, so they are never 
cleaned up. GC cannot collect them either, since they are still referenced by 
the scheduler. 
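
A fix along these lines needs the instrumentation to keep track of the quantiles it creates and cancel their scheduled tasks on close(). Assuming MutableQuantiles exposed (or were given) a stop() method that cancels its scheduledTask, the cleanup could look roughly like the following fragment inside the instrumentation class (sketch only; the stop() hook is an assumption, not something the affected versions necessarily provide):

{code:java}
// Inside the instrumentation class (sketch): track every quantile we create.
private final List<MutableQuantiles> quantiles = new ArrayList<>();

private MutableQuantiles track(MutableQuantiles q) {
  quantiles.add(q);
  return q;
}

@Override
public void close() {
  for (MutableQuantiles q : quantiles) {
    // Assumed stop() hook: cancels this quantile's RolloverSample task
    // on the shared scheduler so it becomes collectable.
    q.stop();
  }
  quantiles.clear();
}
{code}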

Who creates S3AInstrumentation instances? S3AFileSystem.initialize(), which is 
called from FileSystem.get(URI, Configuration). Since the Hive Metastore is a 
service that deals with a lot of Path objects and hence makes many calls to 
FileSystem.get, it is the first to show these symptoms. 
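
A rough way to reproduce the growth outside the metastore, assuming fresh S3AFileSystem instances really are being constructed each time (e.g. via FileSystem.newInstance, or with fs.s3a.impl.disable.cache=true so FileSystem.get does not return a cached instance; bucket name below is a placeholder):

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class S3AQuantilesLeakRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI bucket = URI.create("s3a://some-bucket/"); // placeholder bucket

    for (int i = 0; i < 10_000; i++) {
      // newInstance() bypasses the FileSystem cache, so every iteration builds a
      // fresh S3AFileSystem and, through initialize(), a new S3AInstrumentation.
      FileSystem fs = FileSystem.newInstance(bucket, conf);
      fs.close(); // close() does not cancel the quantiles' scheduled tasks
    }

    // A heap dump taken here should show the MutableQuantiles scheduler's queue
    // holding tens of thousands of RolloverSample tasks, mirroring the metastore.
  }
}
{code}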

We're seeing similar symptoms in the AMs of long-running jobs (both Tez AM and 
MR AM). 

 

 


