[
https://issues.apache.org/jira/browse/SOLR-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156672#comment-14156672
]
Renaud Delbru edited comment on SOLR-6460 at 10/2/14 3:47 PM:
--------------------------------------------------------------
Here is the latest patch, which includes an optimisation to reduce the number
of open files and some code cleanup. To summarise, the current patch provides
the following:
h4. Cleaning of Old Transaction Logs
The CdcrUpdateLog removes old tlogs based on pointers instead of a fixed size
limit.
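As a rough illustration of pointer-based cleaning (not the code in the patch), the sketch below keeps old tlogs in a queue and drops only those that are older than the oldest tlog still referenced by a pointer. The class name {{PointerBasedCleaner}}, the use of plain numeric tlog ids, and the choice to keep everything when no pointer exists are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch; not a class from the patch. */
final class PointerBasedCleaner {
    /** Ids of the old tlogs, oldest first (assumed to be increasing). */
    private final Deque<Long> oldTlogs = new ArrayDeque<>();

    void addOldTlog(long id) {
        oldTlogs.addLast(id);
    }

    int size() {
        return oldTlogs.size();
    }

    /**
     * Removes every old tlog that is older than the oldest tlog still
     * referenced by a pointer, and returns the ids that were removed.
     */
    List<Long> clean(Collection<Long> pointedTlogIds) {
        List<Long> removed = new ArrayList<>();
        if (pointedTlogIds.isEmpty()) {
            return removed; // assumption: with no pointers, keep everything
        }
        long oldestNeeded = Collections.min(pointedTlogIds);
        while (!oldTlogs.isEmpty() && oldTlogs.peekFirst() < oldestNeeded) {
            removed.add(oldTlogs.pollFirst());
        }
        return removed;
    }
}
```

Unlike a fixed size limit, the number of retained tlogs here depends only on how far the slowest pointer lags behind.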
h4. Log Reader
The CdcrUpdateLog provides a log reader with scan and seek operations. A log
reader is associated with a log pointer and manages the life cycle of that
pointer.
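To make the scan/seek contract more concrete, here is a minimal in-memory sketch. It is not the reader from the patch: the names {{SketchLog}}, {{Reader}}, {{next}} and {{seek}}, and the exact seek semantics (position on the first entry whose version is greater than or equal to the target), are assumptions. The reader registers its pointer when it is created and releases it on close, which is one way a reader could own the pointer's life cycle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; an in-memory stand-in for a transaction log. */
final class SketchLog {
    /** Versions of the logged updates, in ascending order. */
    final List<Long> versions = new ArrayList<>();
    /** Pointers currently held by open readers. */
    final Set<Reader> openPointers = new HashSet<>();

    Reader newReader() {
        Reader reader = new Reader();
        openPointers.add(reader); // the pointer is registered when the reader opens
        return reader;
    }

    final class Reader implements AutoCloseable {
        private int position = 0;

        /** Scan: returns the next version, or null at the end of the log. */
        Long next() {
            return position < versions.size() ? versions.get(position++) : null;
        }

        /**
         * Seek: positions the reader so that next() returns the first entry
         * with a version >= target. Returns false if no such entry exists.
         */
        boolean seek(long target) {
            int idx = Collections.binarySearch(versions, target);
            position = idx >= 0 ? idx : -idx - 1;
            return position < versions.size();
        }

        @Override
        public void close() {
            openPointers.remove(this); // the pointer is released with the reader
        }
    }
}
```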
h4. Log Index
To improve the efficiency of the seek operation of the log reader, an index of
transaction log files has been added. This index makes it possible to quickly
look up a tlog file based on a version number. It is implemented by adding a
version number to the tlog filename and by leveraging the file system index.
This solution was chosen as it was simpler and more robust than managing a
separate disk-based index.
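The lookup can be sketched as follows, under the assumption that each filename ends with the version of the first update it contains (for example {{tlog.2.200}}) and that versions are positive and increasing. This naming scheme and the class {{TlogNameIndex}} are hypothetical; the sketch works on a list of names, as would be returned by a directory listing, rather than on real files.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; the filename layout is assumed, not taken from the patch. */
final class TlogNameIndex {
    /** Maps the start version parsed from each filename to that filename. */
    private final TreeMap<Long, String> byStartVersion = new TreeMap<>();

    TlogNameIndex(List<String> fileNames) {
        for (String name : fileNames) {
            // Assumed layout: "<prefix>.<id>.<startVersion>"
            int lastDot = name.lastIndexOf('.');
            long startVersion = Long.parseLong(name.substring(lastDot + 1));
            byStartVersion.put(startVersion, name);
        }
    }

    /**
     * Returns the tlog file that may contain the given version: the file with
     * the greatest start version <= version, or null if the version is older
     * than every known tlog.
     */
    String lookup(long version) {
        Map.Entry<Long, String> entry = byStartVersion.floorEntry(version);
        return entry == null ? null : entry.getValue();
    }
}
```

Because the index is rebuilt from the filenames alone, there is no separate index file that could become corrupt or fall out of sync with the tlogs.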
h4. Number of Open Files
TransactionLog has been extended to automatically (1) close the output stream
when its reference count reaches 0, and (2) reopen the output stream on
demand.
The new tlog (the current tlog being written) is kept open at all times. When a
transaction log is pushed to the old tlog list, its reference count is
decremented, which might trigger the closing of the output stream.
The output stream is reopened in two cases:
* during recovery, to write a commit to the end of an uncapped tlog file;
* when a log reader is accessing it.
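The close-at-zero and reopen-on-demand behaviour described above can be sketched as below. This is not the TransactionLog code: the class {{RefCountedLog}}, its method names, and the decision to reopen inside {{incref}} are assumptions made for the example. The stream is reopened in append mode so that later writes go to the end of the existing file.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch; not a class from the patch. */
final class RefCountedLog {
    private final Path path;
    private OutputStream out; // null while the stream is closed
    private int refCount = 1; // the creator holds the first reference

    RefCountedLog(Path path) {
        this.path = path;
        this.out = openForAppend();
    }

    private OutputStream openForAppend() {
        try {
            return Files.newOutputStream(path,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Takes a reference, reopening the output stream if it was closed. */
    synchronized void incref() {
        refCount++;
        if (out == null) {
            out = openForAppend();
        }
    }

    /** Releases a reference; the stream is closed when the count reaches 0. */
    synchronized void decref() {
        if (refCount == 0) {
            throw new IllegalStateException("no reference to release");
        }
        if (--refCount == 0 && out != null) {
            try {
                out.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                out = null;
            }
        }
    }

    synchronized void write(byte[] data) {
        if (out == null) {
            throw new IllegalStateException("stream is closed; call incref() first");
        }
        try {
            out.write(data);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized boolean isOpen() {
        return out != null;
    }
}
```

With this scheme, the number of open file handles is bounded by the number of tlogs that are actively referenced rather than by the number of tlogs kept on disk.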
At the moment, the logic is split between two classes (TransactionLog and
CdcrTransactionLog). We should probably merge the two in the final version.
h4. Integration within the UpdateHandler
There is a nocommit in the UpdateHandler to force the instantiation of the
CdcrUpdateLog instead of the UpdateLog. We need to decide how users will
configure this, and modify the UpdateHandler accordingly.
> Keep transaction logs around longer
> -----------------------------------
>
> Key: SOLR-6460
> URL: https://issues.apache.org/jira/browse/SOLR-6460
> Project: Solr
> Issue Type: Sub-task
> Reporter: Yonik Seeley
> Attachments: SOLR-6460.patch, SOLR-6460.patch, SOLR-6460.patch
>
>
> Transaction logs are currently deleted relatively quickly... but we need to
> keep them around much longer to be used as a source for cross-datacenter
> recovery. This will also be useful in the future for enabling peer-sync to
> use more historical updates before falling back to replication.