On Jun 10, 2008, at 07:17, Sanne Grinovero wrote:

Hello Emmanuel,
as you asked how to use the SnapshotDeletionPolicy:
when you create the IndexWriter do:

  IndexDeletionPolicy policy = new KeepOnlyLastCommitDeletionPolicy();
  SnapshotDeletionPolicy snapshotter = new SnapshotDeletionPolicy(policy);
  IndexWriter writer = new IndexWriter(dir, autoCommit, analyzer, snapshotter);

then, when you want to make a copy, you "freeze" the current set of index segments, asking the IndexWriter to avoid removing or otherwise changing the files you are going to copy; the snapshot also tells you which files you should copy to get a clone of the index as it was at snapshot time:

try {
  IndexCommitPoint commit = snapshotter.snapshot();
  Collection fileNames = commit.getFileNames();
  <iterate over & copy files from fileNames>
} finally {
  snapshotter.release();
}
(credits to Lucene in Action, 2nd edition MEAP)
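The "<iterate over & copy files from fileNames>" step can be sketched as plain file copying, independent of Lucene. A minimal sketch, assuming `indexDir` and `backupDir` are paths you choose and `fileNames` is the collection returned by `IndexCommitPoint.getFileNames()`; it uses the modern java.nio API for brevity (the Java of the time would have used streams):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collection;

// Hypothetical helper name; not Hibernate Search or Lucene API.
final class IndexBackup {
    static void copySnapshot(Path indexDir, Path backupDir,
                             Collection<String> fileNames) throws IOException {
        Files.createDirectories(backupDir);
        for (String name : fileNames) {
            // the snapshot guarantees these files will not be deleted or
            // changed while we hold it, so a plain copy is safe
            Files.copy(indexDir.resolve(name), backupDir.resolve(name),
                       StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Call it inside the try block above, before `snapshotter.release()`, so the files stay pinned for the whole copy.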
Should I add this improvement to JIRA? I'll add the code examples. Also, you already opened HSEARCH-152:
is that meant for this same purpose?

Yes, that's the goal for HSEARCH-152.



>> Does it somehow involve not having cluster changes (ie intra VM policy rather than inter VM?)

I don't really understand what you're asking; I hope the previous example contains an answer. It just means you don't need to lock the index to make a "hot copy" to anywhere.

My question is: does it somehow involve interacting with the IndexWriter so that it does not do things it would otherwise do? I.e. if I run indexing on VM1 and VM2, will copying files "from VM1" be affected by IndexWriter operations from VM2?



>> The explanation is correct but a bit cataclysmic. We open the reader *only* if there is an actual deletion for the given Directory provider. So in some systems we might very well almost never open a rw reader.

Well yes, it doesn't happen always, but not only for deletions: all updates are also split into delete+insert work AFAIK. If you combine this with the open issue that the index is updated on every dirty entity (not checking whether only the indexed fields changed), this translates to lots of re-opening, twice for every transaction updating some entity... (still correct?)

Yes correct.
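The delete+insert split discussed above can be modeled without Lucene at all. A hedged illustration (all names here are made up, not Hibernate Search code): each entity update reaches the index as two separate operations, which is why a transaction touching one entity costs two index mutations.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an index: update = delete + insert, as described above.
final class MiniIndex {
    private final Map<String, String> docs = new HashMap<>();
    int deletions = 0;
    int insertions = 0;

    void delete(String id) { docs.remove(id); deletions++; }

    void insert(String id, String doc) { docs.put(id, doc); insertions++; }

    // Lucene of the time had no in-place update, so an entity update is
    // split into a deletion followed by a fresh insertion.
    void update(String id, String doc) { delete(id); insert(id, doc); }

    String get(String id) { return docs.get(id); }
}
```

One `update` call performs both a deletion and an insertion, even when the indexed fields did not actually change; that is the re-opening cost the paragraph above refers to.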



>> We need to check how efficiently it has been implemented. If it is a mechanism similar to IndexModifier, it's not worth it.

Agreed, I'll try to find out.

>> This is what happens by default already, one index directory by entity type. We could compute a flag at init time to know that (it's already in place actually) and use it to use a Term query
>> rather than the full query if only one entity is present in the index. Let's open a JIRA issue.

Well, my purpose is to avoid the need to switch from reader to writer; this would also simplify the reordering code where you split all work into two sequences (it would be unneeded). For deletion and eviction we could check whether the workspace has an IndexWriter or an IndexReader available and use whichever we have, but that would complicate the code and I don't like it, so I really hope we can avoid using an IndexReader for modification operations altogether. As we already found a good solution for mass eviction, and the flag tells us when it's ok to delete by id, there's just one case left we should think about. Should I open the JIRA for the partial solution as far as we got currently?

Yep



>> Not if it's updated in a cluster, right?
>> Plus seeing the contention lock we have experienced (on IndexReader) in the recent test case, I want to be sure it's actually faster than opening every time.
Why not? Don't you update by using delete+insert in a cluster too?

In a cluster, you open the IW, do what you have to do protected by the global lock, then close the IW (and release the global lock), which means the IW on another machine of the cluster can then have it. Does it work the same if you keep the IW open? Does it release the global lock? Does it cope with other IWs updating the file system?


I'll write performance tests just to be sure, but from past experience I really expect it to be much faster. Also note that all document analysis is done inside the IndexWriter, so not using it concurrently is a bottleneck when the analysis is expensive (think about PDFs in Blobs...); during analysis the files are locked to a single thread/transaction, but the time spent actually changing files is just a fraction, and Lucene has its own proper locks at those points.

cheers,
Sanne


_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
