On Sep 7, 2008, at 05:41, Sanne Grinovero wrote:

The short question:
        may I add some methods to the implementations of LuceneWork?
        I'm refactoring the backends and it would help, but there
        is a warning there in the javadoc about not changing it freely.

        Sanne

The short answer is no, I don't think it should be needed. LuceneWork should be the minimal contract needed when sending info across the wire. What additional info do you need to forward?



The same question, a bit more verbose:

Hi,
I've been puzzling over several optimizations in Search that I would like
to implement, but I need to do some refactoring in the
org.hibernate.search.backend package
(mostly done already, actually, but I need your ideas).

Most changes affect the "lucene" implementation, but the code would be
greatly simplified, more readable, and IMHO better performing if I'm
permitted to change the current implementations of LuceneWork; however,
there's a big warning there about a requirement to stay backwards
compatible with the serialized form.
(btw, OptimizeLuceneWork is missing the "magic serialization number",
i.e. a serialVersionUID)

optimize does not cross the wire



I would like to add some methods to them, and a single field, which could
be made transient so I could attempt to maintain the serialized-form
compatibility.
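
To illustrate the mechanics (just a sketch: the field and accessor are
hypothetical, only the serialVersionUID/transient interplay matters):

  import java.io.Serializable;

  public class AddLuceneWork implements Serializable {
      // keeping the original value preserves the serialized-form compatibility
      private static final long serialVersionUID = 1L; // illustrative value

      private final Serializable id;
      private final String entityClassName;

      // transient: excluded from serialization, so the wire format is
      // unchanged; recomputed lazily on the receiving side
      private transient String idInString;

      public AddLuceneWork(Serializable id, String entityClassName) {
          this.id = id;
          this.entityClassName = entityClassName;
      }

      public String getIdInString() {
          if (idInString == null) {
              idInString = String.valueOf(id);
          }
          return idInString;
      }
  }
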
Additionally, I've been thinking that if you'd like to keep LuceneWork as
a very simple transport and prefer not to add methods, it would be nicer
to have just one class and have AddLuceneWork/DeleteLuceneWork/...
differentiated by a field (using org.hibernate.search.backend.WorkType?)

I am open to this approach. I initially created subclasses because the necessary data differed between the work types.


to mark the different types of work; that way I could add the methods I
need to the enum.
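
A rough sketch of what I mean (WorkType exists, but the constants and
methods shown here are purely illustrative):

  public enum WorkType {
      ADD {
          public boolean needsIndexWriter() { return true; }
      },
      DELETE {
          public boolean needsIndexWriter() { return false; }
      },
      OPTIMIZE {
          public boolean needsIndexWriter() { return true; }
      };

      // each work type describes its own execution requirements, so a
      // single LuceneWork class can just carry a WorkType field
      public abstract boolean needsIndexWriter();
  }
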
Also, I could see some use in having an UpdateLuceneWork too, so that it
becomes the backend implementation's business to decide whether it wants
to split it into a delete+insert or do something more clever:
the order in which messages are received would be less critical, and some
clever optimizations could be applied by the backend by reordering the
received Works or repackaging several queues into one.
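
For example (all types here are stand-ins, just sketching the
decomposition):

  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;

  class UpdateSplitExample {
      enum Type { ADD, DELETE, UPDATE }

      static class Work {
          final Type type;
          final String entityKey; // entity class + id
          Work(Type type, String entityKey) { this.type = type; this.entityKey = entityKey; }
      }

      // a naive backend decomposes UPDATE into the two classic operations;
      // a smarter one could instead reorder or merge updates first
      List<Work> lower(Work work) {
          if (work.type == Type.UPDATE) {
              return Arrays.asList(
                  new Work(Type.DELETE, work.entityKey),
                  new Work(Type.ADD, work.entityKey));
          }
          return Collections.singletonList(work);
      }
  }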

Why would the order of messages be less critical? Not sure what you mean by critical, as it's contained in a given workload.



What I've done already:
a) early division into different queues, based on the affected
DirectoryProviders
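
The splitting itself is trivial; a generic sketch (providerFor() stands
in for however a work is mapped to its DirectoryProvider):

  import java.util.ArrayList;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  // DP and W stand in for DirectoryProvider and LuceneWork
  abstract class QueueSplitter<DP, W> {
      protected abstract DP providerFor(W work);

      // one sub-queue per DirectoryProvider, preserving the original
      // order of the works inside each sub-queue
      Map<DP, List<W>> split(List<W> queue) {
          Map<DP, List<W>> perProvider = new LinkedHashMap<DP, List<W>>();
          for (W work : queue) {
              DP dp = providerFor(work);
              List<W> subQueue = perProvider.get(dp);
              if (subQueue == null) {
                  subQueue = new ArrayList<W>();
                  perProvider.put(dp, subQueue);
              }
              subQueue.add(work);
          }
          return perProvider;
      }
  }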

b) refactoring/simplification of Workspace: it no longer needs to keep
track of state for different DPs, as there is only one in its context.

c) shorter lock times: no thread ever needs more than one lock;
work is sorted by DP, and each lock is released before the next one is
acquired (deadlockFreeQueue is removed, as it is not needed anymore).
Before, if we needed locks on DPs A, B, and C, the acquisition timeline
looked like:
A lock *********
B lock    ******
C lock       ***
Now it is more like:
A lock ***
B lock    ***
C lock       ***
And my goal is to make this possible, using separate threads when async:
A lock ***
B lock ***
C lock ***
(not implemented yet: it will need a new backend, but I'm preparing the
common stuff to make this possible)
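
The per-DP execution then looks roughly like this (a sketch only; String
keys stand in for DirectoryProviders):

  import java.util.List;
  import java.util.Map;
  import java.util.SortedMap;
  import java.util.concurrent.locks.Lock;

  // Iterating a SortedMap gives every thread the same acquisition order,
  // and since a lock is always released before the next one is taken, no
  // thread ever holds two locks at once: no deadlock, hence no
  // deadlockFreeQueue.
  class SequentialLockApplier {
      void apply(SortedMap<String, List<Runnable>> workByProvider,
                 Map<String, Lock> lockByProvider) {
          for (Map.Entry<String, List<Runnable>> entry : workByProvider.entrySet()) {
              Lock lock = lockByProvider.get(entry.getKey());
              lock.lock();
              try {
                  for (Runnable work : entry.getValue()) {
                      work.run(); // all work for this DP under one short-lived lock
                  }
              } finally {
                  lock.unlock(); // released before moving to the next DP
              }
          }
      }
  }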

d) the QueueProcessor can ask each Work whether it needs an IndexWriter
or an IndexReader, or whether it merely has a preference for one of them,
for when there is the possibility to make a choice (i.e. when we open
both a reader and a writer anyway because of the strict requirements of
other Work in the same queue).
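
The contract I have in mind is small; a hypothetical sketch (none of
these names exist today):

  interface ResourceRequirement {
      enum Resource { INDEX_WRITER, INDEX_READER, EITHER }

      // hard requirement: what the work cannot run without
      Resource required();

      // soft preference, honored only when both resources end up open
      // because of other works in the same queue
      Resource preferred();
  }

The QueueProcessor would scan the queue once, open whatever the
required() answers demand, and consult preferred() only to route the
works that answered EITHER.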

I partly follow you (a delete can be done by a writer in some situations) but I don't quite understand why the work should describe that. What do you gain?



e) building on d), DeleteLuceneWork is able to run on either a reader or
a writer, when it's possible to do so, which depends on (the number of
different classes using the same DP) == 1. In that case the work can
state that it "prefers" to be executed on an IndexWriter, but will be
able to do its task with an IndexReader too (or the opposite?).
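
To make the two paths concrete, this is roughly what they look like
against the Lucene 2.x API (treat the details as a sketch; the
_hibernate_class field name is how the example tells entity classes
apart):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermDocs;

  class DeleteEitherWay {
      // writer path: cheap delete-by-term, but it removes *every*
      // document matching the id term, so it is only safe when a single
      // entity class uses this DirectoryProvider
      void deleteWithWriter(IndexWriter writer, Term idTerm) throws IOException {
          writer.deleteDocuments(idTerm);
      }

      // reader path: can inspect each candidate document and check its
      // entity class before deleting, needed when classes share an index
      void deleteWithReader(IndexReader reader, Term idTerm, String className)
              throws IOException {
          TermDocs termDocs = reader.termDocs(idTerm);
          try {
              while (termDocs.next()) {
                  int docId = termDocs.doc();
                  if (className.equals(reader.document(docId).get("_hibernate_class"))) {
                      reader.deleteDocument(docId);
                  }
              }
          } finally {
              termDocs.close();
          }
      }
  }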

when would you still need to use the IndexReader approach in that case?



f)"batch mode" is currently set on all DP if only one Work is of type batch,
the division of Workspace per DP does not need this any more and batch
mode can be set independently.
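
A sketch of the per-DP decision (Work.isBatch() is a stand-in for
checking the work's type):

  import java.util.List;

  class BatchModePerProvider {
      interface Work { boolean isBatch(); }

      boolean batchModeFor(List<Work> subQueueForOneDp) {
          for (Work work : subQueueForOneDp) {
              if (work.isBatch()) {
                  return true; // only this DP's workspace switches to batch settings
              }
          }
          return false;
      }
  }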

good to have the flexibility but I am not sure we will ever need that. This case should not happen unless you merge queues from different transactions.



Another goal I have with this design is the possibility to aggregate
different committed queues into one, making it possible to optimize away
work (insert then delete => noop) while respecting the original order,

hmm, total ordering is hard (across multiple VMs) and this case (insert then delete) is probably very uncommon (though it could happen if you execute the work of a whole day at once; but then you face memory issues when ordering the queues).

but also to call the strategy optimization again
to reorder the newly created work for best efficiency.
The final effect would be to obtain the same behavior as
my custom batch indexer, but optimizing not only indexing from scratch
but any type of load.
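
A sketch of the aggregation step (stand-in types; it assumes, as in the
insert-then-delete example, that the ADD came from the insert of a new
entity, so there is no pre-existing document the DELETE would still need
to remove):

  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;

  class QueueCoalescer {
      enum Kind { ADD, DELETE }

      static class Work {
          final Kind kind;
          final String key; // entity class + id
          Work(Kind kind, String key) { this.kind = kind; this.key = key; }
      }

      // concatenate the committed queues in their original order, then
      // cancel an ADD against a later DELETE for the same entity+id
      List<Work> coalesce(List<List<Work>> committedQueues) {
          List<Work> result = new ArrayList<Work>();
          for (List<Work> queue : committedQueues) {
              for (Work work : queue) {
                  if (work.kind == Kind.DELETE && cancelAdd(result, work.key)) {
                      continue; // insert then delete => noop: drop both
                  }
                  result.add(work);
              }
          }
          return result; // could now be re-sorted by the strategy optimization
      }

      private boolean cancelAdd(List<Work> result, String key) {
          for (Iterator<Work> it = result.iterator(); it.hasNext();) {
              Work prev = it.next();
              if (prev.kind == Kind.ADD && prev.key.equals(key)) {
                  it.remove();
                  return true;
              }
          }
          return false;
      }
  }
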
I hope this doesn't scare you: the resulting code is quite simple, and I
think there are actually fewer LOC than the current trunk has;
I haven't prepared any special-case tests, I just ran all the existing
ones.

let's try to chat on IM about that.



kind regards,
Sanne

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
