On Sep 7, 2008, at 05:41, Sanne Grinovero wrote:

The short question:
        may I add some methods to the implementations of LuceneWork?
        I'm refactoring the backends and it would help, but there
        is a warning there in the javadoc about not changing it freely.

        Sanne

The short answer is no, I don't think it should be needed. LuceneWork should be the minimal contract needed when sending info across the wire. What additional info do you need to forward?



The same question, a bit more verbose:

Hi,
I've been puzzling over several optimizations in Search that I would like
to implement, but I need to do some refactoring in the
org.hibernate.search.backend package
(mostly done already, actually, but I need your ideas).

Most changes affect the "lucene" implementation, but the code would be
greatly simplified, more readable, and IMHO better performing if I'm
permitted to change the current implementations of LuceneWork; however,
there's a big warning there about a requirement to stay backwards
compatible with the serialized form.
(btw, OptimizeLuceneWork is missing the "magic serialization number",
i.e. a serialVersionUID)

optimize does not cross the wire



I would like to add some methods to them, and a single field, which could
be made transient so I could attempt to maintain the serialized-form
compatibility.
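
To illustrate the mechanics (just a sketch: the field and accessor are
hypothetical, only the serialVersionUID/transient interplay matters):

  import java.io.Serializable;

  public class AddLuceneWork implements Serializable {
      // keeping the original value preserves the serialized-form compatibility
      private static final long serialVersionUID = 1L; // illustrative value

      private final Serializable id;
      private final String entityClassName;

      // transient: excluded from serialization, so the wire format is
      // unchanged; recomputed lazily on the receiving side
      private transient String idInString;

      public AddLuceneWork(Serializable id, String entityClassName) {
          this.id = id;
          this.entityClassName = entityClassName;
      }

      public String getIdInString() {
          if (idInString == null) {
              idInString = String.valueOf(id);
          }
          return idInString;
      }
  }
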
Additionally, I've been thinking that if you'd like to keep LuceneWork as
a very simple transport and prefer not to add methods, it would be nicer
to have just one class and have AddLuceneWork/DeleteLuceneWork/...
differentiated by a field (using org.hibernate.search.backend.WorkType?)

I am open to this approach. I initially created subclasses because the necessary data differed between the work types.


to mark the different types of work; that way I could add the methods I
need to the enum.
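
A rough sketch of what I mean (WorkType exists, but the constants and
methods shown here are purely illustrative):

  public enum WorkType {
      ADD {
          public boolean needsIndexWriter() { return true; }
      },
      DELETE {
          public boolean needsIndexWriter() { return false; }
      },
      OPTIMIZE {
          public boolean needsIndexWriter() { return true; }
      };

      // each work type describes its own execution requirements, so a
      // single LuceneWork class can just carry a WorkType field
      public abstract boolean needsIndexWriter();
  }
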
Also, I could see some use in having an UpdateLuceneWork too, so that it
becomes the backend implementation's business to decide whether it wants
to split it into a delete+insert or do something more clever:
the order in which messages are received would be less critical, and some
clever optimizations could be applied by the backend by reordering the
received Works or repackaging several queues into one.
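
For example (all types here are stand-ins, just sketching the
decomposition):

  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;

  class UpdateSplitExample {
      enum Type { ADD, DELETE, UPDATE }

      static class Work {
          final Type type;
          final String entityKey; // entity class + id
          Work(Type type, String entityKey) { this.type = type; this.entityKey = entityKey; }
      }

      // a naive backend decomposes UPDATE into the two classic operations;
      // a smarter one could instead reorder or merge updates first
      List<Work> lower(Work work) {
          if (work.type == Type.UPDATE) {
              return Arrays.asList(
                  new Work(Type.DELETE, work.entityKey),
                  new Work(Type.ADD, work.entityKey));
          }
          return Collections.singletonList(work);
      }
  }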

Why would the order of messages be less critical? Not sure what you mean by critical, as it's contained in a given workload.



What I've done already:
a) early division into different queues, based on the affected
DirectoryProviders
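
The splitting itself is trivial; a generic sketch (providerFor() stands
in for however a work is mapped to its DirectoryProvider):

  import java.util.ArrayList;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  // DP and W stand in for DirectoryProvider and LuceneWork
  abstract class QueueSplitter<DP, W> {
      protected abstract DP providerFor(W work);

      // one sub-queue per DirectoryProvider, preserving the original
      // order of the works inside each sub-queue
      Map<DP, List<W>> split(List<W> queue) {
          Map<DP, List<W>> perProvider = new LinkedHashMap<DP, List<W>>();
          for (W work : queue) {
              DP dp = providerFor(work);
              List<W> subQueue = perProvider.get(dp);
              if (subQueue == null) {
                  subQueue = new ArrayList<W>();
                  perProvider.put(dp, subQueue);
              }
              subQueue.add(work);
          }
          return perProvider;
      }
  }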

b) refactoring/simplification of Workspace: it no longer needs to keep
track of state for different DPs, as there is only one in its context.

c) shorter lock times: no thread ever needs more than one lock;
work is sorted by DP, and each lock is released before the next one is
acquired (deadlockFreeQueue is removed, as it is not needed anymore).
Before, if we needed locks on DPs A, B, and C, the acquisition timeline
looked like:
A lock *********
B lock    ******
C lock       ***
Now it is more like:
A lock ***
B lock    ***
C lock       ***
And my goal is to make this possible, using separate threads when async:
A lock ***
B lock ***
C lock ***
(not implemented yet: it will need a new backend, but I'm preparing the
common stuff to make this possible)
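
The per-DP execution then looks roughly like this (a sketch only; String
keys stand in for DirectoryProviders):

  import java.util.List;
  import java.util.Map;
  import java.util.SortedMap;
  import java.util.concurrent.locks.Lock;

  // Iterating a SortedMap gives every thread the same acquisition order,
  // and since a lock is always released before the next one is taken, no
  // thread ever holds two locks at once: no deadlock, hence no
  // deadlockFreeQueue.
  class SequentialLockApplier {
      void apply(SortedMap<String, List<Runnable>> workByProvider,
                 Map<String, Lock> lockByProvider) {
          for (Map.Entry<String, List<Runnable>> entry : workByProvider.entrySet()) {
              Lock lock = lockByProvider.get(entry.getKey());
              lock.lock();
              try {
                  for (Runnable work : entry.getValue()) {
                      work.run(); // all work for this DP under one short-lived lock
                  }
              } finally {
                  lock.unlock(); // released before moving to the next DP
              }
          }
      }
  }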

d) the QueueProcessor can ask each Work whether it needs an IndexWriter
or an IndexReader, or whether it merely has a preference for one of them,
for when there is the possibility to make a choice (i.e. when we open
both a reader and a writer anyway because of the strict requirements of
other Work in the same queue).
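
The contract I have in mind is small; a hypothetical sketch (none of
these names exist today):

  interface ResourceRequirement {
      enum Resource { INDEX_WRITER, INDEX_READER, EITHER }

      // hard requirement: what the work cannot run without
      Resource required();

      // soft preference, honored only when both resources end up open
      // because of other works in the same queue
      Resource preferred();
  }

The QueueProcessor would scan the queue once, open whatever the
required() answers demand, and consult preferred() only to route the
works that answered EITHER.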

I partly follow you (a delete can be done by a writer in some situations) but I don't quite understand why the work should describe that. What do you gain?



e) building on d), DeleteLuceneWork is able to run on either a reader or
a writer, when it's possible to do so, which depends on (the number of
different classes using the same DP) == 1. In that case the work can
state that it "prefers" to be executed on an IndexWriter, but will be
able to do its task with an IndexReader too (or the opposite?).
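
To make the two paths concrete, this is roughly what they look like
against the Lucene 2.x API (treat the details as a sketch; the
_hibernate_class field name is how the example tells entity classes
apart):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;

  class DeleteEitherWay {
      // writer path: cheap delete-by-term, but it removes *every*
      // document matching the id term, so it is only safe when a single
      // entity class uses this DirectoryProvider
      void deleteWithWriter(IndexWriter writer, Term idTerm) throws IOException {
          writer.deleteDocuments(idTerm);
      }

      // reader path: can inspect each candidate document and check its
      // entity class before deleting, needed when classes share an index
      void deleteWithReader(IndexReader reader, Term idTerm, String className)
              throws IOException {
          TermDocs termDocs = reader.termDocs(idTerm);
          try {
              while (termDocs.next()) {
                  int docId = termDocs.doc();
                  if (className.equals(reader.document(docId).get("_hibernate_class"))) {
                      reader.deleteDocument(docId);
                  }
              }
          } finally {
              termDocs.close();
          }
      }
  }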

when would you still need to use the IndexReader approach in that case?



f)"batch mode" is currently set on all DP if only one Work is of type batch,
the division of Workspace per DP does not need this any more and batch
mode can be set independently.
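
A sketch of the per-DP decision (Work.isBatch() is a stand-in for
checking the work's type):

  import java.util.List;

  class BatchModePerProvider {
      interface Work { boolean isBatch(); }

      boolean batchModeFor(List<Work> subQueueForOneDp) {
          for (Work work : subQueueForOneDp) {
              if (work.isBatch()) {
                  return true; // only this DP's workspace switches to batch settings
              }
          }
          return false;
      }
  }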

good to have the flexibility but I am not sure we will ever need that. This case should not happen unless you merge queues from different transactions.



Another goal I have with this design is the possibility to aggregate
different committed queues into one, making it possible to optimize away
work (insert then delete => noop) while respecting the original order,

hmm, total ordering is hard (across multiple VMs) and this case (insert then delete) is probably very uncommon (though it could happen if you execute the work of a whole day at once; but then you face memory issues when ordering the queues).

but also to call the strategy optimization again
to reorder the newly created work for best efficiency.
The final effect would be to obtain the same behavior as
my custom batch indexer, but optimizing not only indexing from scratch
but any type of load.
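
A sketch of the aggregation step (stand-in types; it assumes, as in the
insert-then-delete example, that the ADD came from the insert of a new
entity, so there is no pre-existing document the DELETE would still need
to remove):

  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;

  class QueueCoalescer {
      enum Kind { ADD, DELETE }

      static class Work {
          final Kind kind;
          final String key; // entity class + id
          Work(Kind kind, String key) { this.kind = kind; this.key = key; }
      }

      // concatenate the committed queues in their original order, then
      // cancel an ADD against a later DELETE for the same entity+id
      List<Work> coalesce(List<List<Work>> committedQueues) {
          List<Work> result = new ArrayList<Work>();
          for (List<Work> queue : committedQueues) {
              for (Work work : queue) {
                  if (work.kind == Kind.DELETE && cancelAdd(result, work.key)) {
                      continue; // insert then delete => noop: drop both
                  }
                  result.add(work);
              }
          }
          return result; // could now be re-sorted by the strategy optimization
      }

      private boolean cancelAdd(List<Work> result, String key) {
          for (Iterator<Work> it = result.iterator(); it.hasNext();) {
              Work prev = it.next();
              if (prev.kind == Kind.ADD && prev.key.equals(key)) {
                  it.remove();
                  return true;
              }
          }
          return false;
      }
  }
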
I hope this doesn't scare you: the resulting code is quite simple, and I
think there are actually fewer LOC than the current trunk has;
I haven't prepared any special-case tests, I just ran all the existing
ones.

let's try to chat on IM about that.



kind regards,
Sanne

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
