[
https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Miller reassigned SOLR-9389:
---------------------------------
Assignee: Mark Miller
> HDFS Transaction logs stay open for writes which leaks Xceivers
> ---------------------------------------------------------------
>
> Key: SOLR-9389
> URL: https://issues.apache.org/jira/browse/SOLR-9389
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Hadoop Integration, hdfs
> Affects Versions: 6.1, master (7.0)
> Reporter: Tim Owen
> Assignee: Mark Miller
> Attachments: SOLR-9389.patch
>
>
> The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open
> for its whole lifetime, which consumes two threads on the HDFS data node
> server (DataXceiver and PacketResponder) even after the Solr tlog has
> finished being written to.
> This means that for a cluster with many indexes on HDFS, the number of
> Xceivers can keep growing and eventually hit the limit of 4096 on the data
> nodes. This is especially likely for indexes with low write rates, because
> Solr keeps enough tlogs around to contain 100 documents (up to a limit of 10
> tlogs). In addition, attempting to write to a finished tlog would be a major
> bug, so closing it for writes helps catch that.
> Our test cluster had 100+ collections with 100 shards each, spread across 8
> boxes (each running 4 Solr nodes and 1 HDFS data node), with 3x replication
> for the tlog files. As a result, we hit the Xceiver limit fairly easily and
> had to use the attached patch to ensure tlogs were closed for writes once
> finished.
> The patch introduces an extra lifecycle state for the tlog, so that it can be
> closed for writes, freeing the HDFS resources, while still being available
> for reading. I've tried to make it as unobtrusive as I could, but there's
> probably a better way. I have not changed the behaviour of the local disk
> tlog implementation, because it only consumes a file descriptor whether it is
> open for reading or writing.
> N.B. We have since decided not to use Solr-on-HDFS and are using local disk
> instead (for various reasons), so I don't have an HDFS cluster to do further
> testing on this. I'm just contributing the patch that worked for us.
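
The extra lifecycle state described above can be illustrated with a minimal Java sketch. This is not the attached patch: the class, enum, and method names (TlogLifecycleSketch, SketchTlog, State, closeForWrites) are hypothetical, and a local file with a BufferedWriter stands in for the HDFS FSDataOutputStream so the example runs without Hadoop. It only shows the idea of releasing the write-side resource early while keeping the log readable, and of rejecting writes to a finished tlog.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class TlogLifecycleSketch {

    /** Hypothetical lifecycle states; the names are illustrative, not taken from the patch. */
    enum State { OPEN_FOR_WRITES, CLOSED_FOR_WRITES, CLOSED }

    /** Stand-in for a transaction log; a local file replaces the HDFS output stream. */
    static final class SketchTlog {
        private final Path path;
        private BufferedWriter out; // write-side resource that should be released early
        private State state = State.OPEN_FOR_WRITES;

        SketchTlog(Path path) {
            this.path = path;
            try {
                this.out = Files.newBufferedWriter(path, StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Convenience factory (assumption for the demo): backs the log with a temp file. */
        static SketchTlog createTemp() {
            try {
                Path p = Files.createTempFile("tlog-sketch", ".log");
                p.toFile().deleteOnExit();
                return new SketchTlog(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Appends one record; fails loudly if the log is no longer open for writes. */
        synchronized void append(String record) {
            if (state != State.OPEN_FOR_WRITES) {
                throw new IllegalStateException("tlog is " + state + "; writes are not allowed");
            }
            try {
                out.write(record);
                out.newLine();
                out.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Releases the write-side resource but keeps the log available for reading. */
        synchronized void closeForWrites() {
            if (state != State.OPEN_FOR_WRITES) {
                return; // already closed for writes, or fully closed
            }
            try {
                out.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            out = null;
            state = State.CLOSED_FOR_WRITES;
        }

        /** Reads every record; allowed in both the open and closed-for-writes states. */
        synchronized List<String> readAll() {
            if (state == State.CLOSED) {
                throw new IllegalStateException("tlog is closed");
            }
            try {
                return Files.readAllLines(path, StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Final state: no further reads or writes. */
        synchronized void close() {
            closeForWrites();
            state = State.CLOSED;
        }

        synchronized State state() {
            return state;
        }
    }

    public static void main(String[] args) {
        SketchTlog log = SketchTlog.createTemp();
        log.append("doc1");
        log.closeForWrites();               // write handle released here
        System.out.println(log.readAll());  // reads still work
        try {
            log.append("doc2");             // a write to a finished tlog is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        log.close();
    }
}
```

In this sketch the write handle is closed as soon as the log is finished, which corresponds to the point where the HDFS data node threads would be released. Readers reopen the file on demand instead of sharing the writer's stream.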