[ https://issues.apache.org/jira/browse/SOLR-9389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller reassigned SOLR-9389:
---------------------------------

    Assignee: Mark Miller

> HDFS Transaction logs stay open for writes which leaks Xceivers
> ---------------------------------------------------------------
>
>                 Key: SOLR-9389
>                 URL: https://issues.apache.org/jira/browse/SOLR-9389
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Hadoop Integration, hdfs
>    Affects Versions: 6.1, master (7.0)
>            Reporter: Tim Owen
>            Assignee: Mark Miller
>         Attachments: SOLR-9389.patch
>
>
> The HdfsTransactionLog implementation keeps a Hadoop FSDataOutputStream open 
> for its whole lifetime, which consumes two threads on the HDFS data node 
> server (dataXceiver and packetresponder) even once the Solr tlog has finished 
> being written to.
> This means for a cluster with many indexes on HDFS, the number of Xceivers 
> can keep growing and eventually hit the limit of 4096 on the data nodes. It's 
> especially likely for indexes that have low write rates, because Solr keeps 
> enough tlogs around to contain 100 documents (up to a limit of 10 tlogs). 
> There's also the issue that attempting to write to a finished tlog would be a 
> major bug, so closing it for writes helps catch that.
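Why low-write-rate indexes accumulate tlogs can be shown with a simplified, hypothetical Java model of the retention rule described above: keep the newest tlogs until together they hold at least 100 records, capped at 10 files. The parameter names mirror Solr's numRecordsToKeep and maxNumLogsToKeep updateLog settings, but the loop is only an illustration and is not Solr's actual UpdateLog code.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of tlog retention; not Solr's UpdateLog logic.
// Keeps the newest tlogs until they hold >= numRecordsToKeep records,
// but never more than maxNumLogsToKeep files.
final class TlogRetentionSketch {
    static int tlogsRetained(List<Integer> recordsPerTlogNewestFirst,
                             int numRecordsToKeep, int maxNumLogsToKeep) {
        int kept = 0;
        int records = 0;
        for (int n : recordsPerTlogNewestFirst) {
            // Stop once enough records are covered or the file cap is reached.
            if (kept >= maxNumLogsToKeep || records >= numRecordsToKeep) {
                break;
            }
            kept++;
            records += n;
        }
        return kept;
    }

    public static void main(String[] args) {
        // Busy index: the newest tlog alone holds 500 records, so 1 is kept.
        System.out.println(tlogsRetained(List.of(500, 500, 500), 100, 10));
        // Low write rate: 3 records per tlog, so the cap of 10 is reached.
        System.out.println(tlogsRetained(Collections.nCopies(20, 3), 100, 10));
    }
}
```

Under the unpatched HdfsTransactionLog each retained tlog also keeps its output stream open, so the low-write-rate case holds ten streams per shard instead of one.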
> During testing, our cluster had 100+ collections with 100 shards each,
> spread across 8 boxes (each running 4 Solr nodes and 1 HDFS data node).
> With 3x replication for the tlog files, we hit the Xceiver limit fairly
> easily and had to use the attached patch to ensure tlogs were closed for
> writes once finished.
> The patch introduces an extra lifecycle state for the tlog, so it can be 
> closed for writes and free up the HDFS resources, while still being available 
> for reading. I've tried to make it as unobtrusive as I could, but there's 
> probably a better way. I have not changed the behaviour of the local disk 
> tlog implementation, because it only consumes a file descriptor regardless of 
> read or write.
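A minimal, hypothetical sketch of the extra lifecycle state described above: the log starts writable, is then closed for writes (releasing the output stream, which on HDFS is what holds the data node threads) while remaining readable, and is finally closed. A ByteArrayOutputStream stands in for Hadoop's FSDataOutputStream so the example runs without Hadoop; the class, enum, and method names are illustrative and are not taken from the attached patch or from Solr's TransactionLog API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical tlog with a "closed for writes" state; not the SOLR-9389 patch.
final class TlogLifecycleSketch {
    enum State { OPEN, CLOSED_FOR_WRITES, CLOSED }

    // Stand-in for the persisted log contents (on HDFS, the file itself).
    private final ByteArrayOutputStream storage = new ByteArrayOutputStream();
    // Stand-in for the FSDataOutputStream that pins data node threads while open.
    private OutputStream out = storage;
    private State state = State.OPEN;

    synchronized void write(byte[] record) {
        if (state != State.OPEN) {
            // Writing to a finished tlog is a bug, so fail loudly.
            throw new IllegalStateException("cannot write: tlog is " + state);
        }
        try {
            out.write(record);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Release write-side resources; the log stays readable afterwards.
    synchronized void closeOutput() {
        if (state != State.OPEN) {
            return;
        }
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        out = null;
        state = State.CLOSED_FOR_WRITES;
    }

    synchronized byte[] read() {
        if (state == State.CLOSED) {
            throw new IllegalStateException("cannot read: tlog is closed");
        }
        return storage.toByteArray();
    }

    synchronized void close() {
        closeOutput();
        state = State.CLOSED;
    }

    synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        TlogLifecycleSketch tlog = new TlogLifecycleSketch();
        tlog.write("add doc1\n".getBytes(StandardCharsets.UTF_8));
        tlog.closeOutput(); // tlog finished: free the write stream
        System.out.print(new String(tlog.read(), StandardCharsets.UTF_8));
        try {
            tlog.write("add doc2\n".getBytes(StandardCharsets.UTF_8));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // the late write is rejected
        }
        tlog.close();
    }
}
```

Separating closeOutput() from close() is what lets replay or real-time get keep reading a finished tlog without holding write resources, and the state check turns an accidental late write into an immediate error.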
> NB: We have now decided not to use Solr-on-HDFS and are using local disk
> instead (for various reasons), so I don't have an HDFS cluster to do further
> testing on this. I'm just contributing the patch that worked for us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
