During some tests for the refactoring I was doing, I found HSEARCH-263, which is currently blocking me; could you give me some directions on how best to solve it?
Sanne

2008/9/8 Sanne Grinovero <[EMAIL PROTECTED]>:
> 2008/9/8 Emmanuel Bernard <[EMAIL PROTECTED]>:
>>
>> On Sep 7, 2008, at 05:41, Sanne Grinovero wrote:
>>
>>> The short question:
>>> may I add some methods to the implementations of LuceneWork?
>>> I'm refactoring the backends and it would help, but there
>>> is a warning in the javadoc about not changing it freely.
>>>
>>> Sanne
>>
>> The short answer is no, I don't think it should be needed. LuceneWork
>> should be the minimal contract needed when sending info across the wire.
>> What additional info do you need to forward?
>
> I'll try to keep the serialized fields the same; I don't currently need
> more information across the wire, but the code is helped a lot if I may
> add some methods. There is no state change, so I suppose there should be
> no problem.
>
>>> The same question, a bit more verbose:
>>>
>>> Hi,
>>> I've been puzzling over several optimizations in Search that I would
>>> like to implement, but I need to do some refactoring in the
>>> org.hibernate.search.backend package.
>>> (mostly done actually, but I need your ideas)
>>>
>>> Most changes affect the "lucene" implementation, but the code would be
>>> greatly simplified, more readable and (IMHO) better performing too if
>>> I'm permitted to change the current implementations of LuceneWork;
>>> however, there's a big warning there about a requirement to stay
>>> backwards compatible with the serialized form.
>>> (btw OptimizeLuceneWork is missing the "magic serialization number")
>>
>> optimize does not cross the wire
>
> I didn't see that, thanks. Good idea actually.
>
>>> I would like to add some methods to them, and a single field which
>>> could actually be transient, so I could attempt to maintain the
>>> compatibility.
>>> Additionally, I've been thinking that if you would like to keep
>>> LuceneWork as a very simple transport and prefer not to add methods,
>>> it would be nicer to have just one class and have
>>> AddLuceneWork/DeleteLuceneWork/... differentiated by a field
>>> (using org.hibernate.search.backend.WorkType?)
>>
>> I am open to this approach. I initially created subclasses because the
>> necessary data was different between works.
>
> Nice, but I will keep that option as a last resort.
> I'm currently happy adding polymorphic methods, without state changes.
>
>>> to mark the different types of work; that way I could add the methods
>>> I'm needing to the enum.
>>> Also, I could see some use in having an UpdateLuceneWork too, so that
>>> it is the backend implementation's business to decide whether it wants
>>> to split it into a delete+insert or do something more clever:
>>> the receive order of messages would be less critical, and some clever
>>> optimizations could be applied by the backend by reordering received
>>> Work(s) or repackaging several queues into one.
>>
>> Why would the order of messages be less critical? Not sure what you mean
>> by critical, as it's contained in a given workload.
>
> I mean the order inside the same workload is less critical: currently the
> updates are split in two, and the removes are done first.. that part.
> I would like to use a "reorder engine" to tweak some stuff (it's working
> very nicely in alpha code locally); if I know it's an update, the reorder
> engine is simpler; otherwise it needs to detect operations on the same
> entity.
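A minimal sketch of the single-class alternative discussed above, with hypothetical names rather than the actual Hibernate Search code: one work class carries a WorkType discriminator, per-type behaviour is attached to the enum constants, and helper state stays transient so the serialized form is unchanged.

    import java.io.Serializable;

    // stand-in for org.hibernate.search.backend.WorkType: per-type behaviour
    // can hang off the enum constants instead of off LuceneWork subclasses
    enum WorkType {
        ADD {
            boolean acceptsIndexReader() { return false; }
        },
        DELETE {
            boolean acceptsIndexReader() { return true; }  // a delete could also run on a reader
        },
        UPDATE {
            boolean acceptsIndexReader() { return false; }
        };
        abstract boolean acceptsIndexReader();
    }

    // one class instead of AddLuceneWork/DeleteLuceneWork/...
    final class GenericLuceneWork implements Serializable {
        private static final long serialVersionUID = 1L;   // the "magic serialization number"

        private final WorkType type;                       // the differentiating field
        private final Serializable id;
        private final Class<?> entityClass;

        // helper state can be transient: it never reaches the serialized form
        private transient boolean batch;

        GenericLuceneWork(WorkType type, Serializable id, Class<?> entityClass) {
            this.type = type;
            this.id = id;
            this.entityClass = entityClass;
        }

        WorkType getType() { return type; }
        Serializable getId() { return id; }
        Class<?> getEntityClass() { return entityClass; }
    }

The trade-off, as Emmanuel notes above, is that the different work types originally carried different data, so a single class has to tolerate fields that are meaningless for some types.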
>
>>> What I've done already:
>>>
>>> a) early division into different queues, based on the affected
>>> DirectoryProviders
>>>
>>> b) refactoring/simplification of Workspace: it no longer needs to keep
>>> track of state for different DPs, as there is only one in the context.
>>>
>>> c) shorter Lock times: no thread ever needs more than one Lock;
>>> work is sorted by DP, and each lock is released before acquiring the
>>> next one (deadlockFreeQueue is removed, as it is not needed anymore).
>>> Before, if we needed locks on DPs A, B, C, the acquisition times looked
>>> like:
>>> Alock *********
>>> Block    ******
>>> Clock       ***
>>> now it is more like:
>>> Alock ***
>>> Block    ***
>>> Clock       ***
>>> And my goal is to make this possible, in separate threads when async:
>>> Alock ***
>>> Block ***
>>> Clock ***
>>> (not implemented yet: it will need a new backend, but I'm preparing the
>>> common stuff to make this possible)
>>>
>>> d) the QueueProcessor can ask each Work whether it needs an IndexWriter
>>> or an IndexReader, or has any preference for one, for when there is the
>>> possibility to make a choice (when we open both a reader and a writer
>>> anyway because of strict requirements of other Work in the same queue).
>>
>> I partly follow you (a delete can be done by a writer in some situations)
>> but I don't quite understand why the work should describe that. What do
>> you gain?
>
> Just abstraction: the delete work has a method with some logic to give
> the correct answer, while the add work just returns a constant value
> meaning "I need the IndexWriter". The engine checks all the work, then
> has to accommodate all needs, and could avoid opening an IR or an IW.
>
>>> e) based on d), DeleteLuceneWork is able to run either on a reader or a
>>> writer (when it's possible to do so, depending on whether (the number
>>> of different classes using the same DP) == 1); in this last case the
>>> work is able to tell that it "prefers" to be executed on an
>>> IndexWriter, but will be able to do its task with an IndexReader too
>>> (or the opposite?)
>>
>> when would you need to still use the IR approach in that case?
>
> For example, if I have more work to do on an IR but no other work needing
> an IW; in this case I prefer to run the deletion by using an IR.
> I am thinking of adding overloaded methods to perform the work:
>   doit( IR, other needed stuff );
>   doit( IW, other.. );
> where the AddWork could throw an UnsupportedOperationException on the
> wrong one, while the DeleteWork is able to accomplish both (in some
> situations), and in the IR case it can do it (internal choice) in two
> different ways (depending on configuration), selecting the fastest
> non-broken option.
>
>>> f) "batch mode" is currently set on all DPs if even one Work is of type
>>> batch; the division of Workspace per DP does not need this any more,
>>> and batch mode can be set independently.
>>
>> good to have the flexibility but I am not sure we will ever need that.
>> This case should not happen unless you merge queues from different
>> transactions.
>
> Agreed, not needed. It is more like a side-effect of the context
> separation; I was thinking of changing the code to keep the current
> behavior, but thought it could be a welcome side-effect, so it stayed
> there.
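A minimal sketch of the doit(...) idea from (d) and (e) above. The names are hypothetical, not the actual Hibernate Search API, and it assumes the Lucene 2.x-era API in which an IndexReader can still delete documents. Each work exposes a hint that the QueueProcessor can aggregate over the whole queue to decide whether it must open an IndexWriter, an IndexReader, or both, plus overloaded doit(...) methods for whichever resource ends up available:

    import java.io.IOException;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    // what a work can tell the queue processor about the resource it needs
    enum IndexInteraction { NEEDS_WRITER, PREFERS_WRITER, PREFERS_READER }

    interface BackendWork {
        IndexInteraction getIndexInteraction();
        void doit(IndexWriter writer) throws IOException;
        void doit(IndexReader reader) throws IOException;
    }

    final class AddWork implements BackendWork {
        private final Document document;
        AddWork(Document document) { this.document = document; }

        // an add strictly needs an IndexWriter: a constant answer...
        public IndexInteraction getIndexInteraction() { return IndexInteraction.NEEDS_WRITER; }
        public void doit(IndexWriter writer) throws IOException {
            writer.addDocument(document);
        }
        // ...so it refuses the wrong resource
        public void doit(IndexReader reader) {
            throw new UnsupportedOperationException("adds need an IndexWriter");
        }
    }

    final class DeleteWork implements BackendWork {
        private final Term idTerm;
        DeleteWork(Term idTerm) { this.idTerm = idTerm; }

        // a delete merely states a preference: it can run on either resource
        public IndexInteraction getIndexInteraction() { return IndexInteraction.PREFERS_WRITER; }
        public void doit(IndexWriter writer) throws IOException {
            writer.deleteDocuments(idTerm);
        }
        public void doit(IndexReader reader) throws IOException {
            reader.deleteDocuments(idTerm);  // possible on a Lucene 2.x IndexReader
        }
    }

The point made in the thread then falls out naturally: if a queue contains only delete work plus reader-based work, the processor can satisfy everything with a single IndexReader and skip opening the IndexWriter entirely.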
>
>>> Another goal I have with this design is the possibility to aggregate
>>> different committed queues into one, with the possibility to optimize
>>> away work (insert then delete => noop) considering the original order,
>>
>> hum, total ordering is hard (on multiple VMs) and this case (insert then
>> delete) is probably very uncommon. (though it could happen if you
>> execute the work of a whole day at once; but then you face memory issues
>> ordering the queues).
>
> Well, I didn't implement it; it's just that after this work it would be
> easier to do this type of optimization, and there are a lot more you can
> think of (such as detecting "update, update" on the same entity, which I
> suppose actually happens more frequently).
>
>>> but also to call the optimization strategy again
>>> to reorder the newly created work for best efficiency.
>>> The final effect would be to obtain the same behavior as my custom
>>> batch indexer, but optimizing not only indexing from scratch but any
>>> type of load.
>>> I hope I don't scare you; the resulting code is quite simple and I
>>> think there are actually fewer LOC than the current trunk has.
>>> I've not prepared any special-case Test, I just ran all the existing
>>> ones.
>>
>> let's try and chat on IM around that.
>
> Nice; I'm trying already, and have organized my work this way:
> * some refactoring of the existing engine to clean up the code
> * test that it all still works
> * build an experimental QueueProcessor in a separate package
>
>>> kind regards,
>>> Sanne

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
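A minimal sketch of the queue compaction discussed in the last messages, with hypothetical stand-in types rather than the real LuceneWork hierarchy. It assumes works on different entities are independent, so grouping operations per entity (while preserving each entity's own order) is a legal reordering; it then applies the two rewrites mentioned: an insert followed by a delete of the same entity becomes a no-op, and consecutive updates collapse into the last one.

    import java.io.Serializable;
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Deque;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    enum Op { ADD, DELETE, UPDATE }

    // stand-in for a queued LuceneWork: the operation plus the entity identity
    final class QueuedWork {
        final Op op;
        final Class<?> entityClass;
        final Serializable id;
        QueuedWork(Op op, Class<?> entityClass, Serializable id) {
            this.op = op;
            this.entityClass = entityClass;
            this.id = id;
        }
        String entityKey() {
            return entityClass.getName() + "#" + id;
        }
    }

    final class QueueCompactor {
        static List<QueuedWork> compact(List<QueuedWork> queue) {
            // group per entity, preserving first-seen entity order
            Map<String, Deque<QueuedWork>> perEntity =
                    new LinkedHashMap<String, Deque<QueuedWork>>();
            for (QueuedWork work : queue) {
                Deque<QueuedWork> ops = perEntity.get(work.entityKey());
                if (ops == null) {
                    ops = new ArrayDeque<QueuedWork>();
                    perEntity.put(work.entityKey(), ops);
                }
                QueuedWork last = ops.peekLast();
                if (last != null && last.op == Op.ADD && work.op == Op.DELETE) {
                    ops.pollLast();          // insert then delete => noop
                } else if (last != null && last.op == Op.UPDATE && work.op == Op.UPDATE) {
                    ops.pollLast();          // "update, update" => only the last matters
                    ops.addLast(work);
                } else {
                    ops.addLast(work);       // no rewrite applies: keep it
                }
            }
            List<QueuedWork> compacted = new ArrayList<QueuedWork>();
            for (Deque<QueuedWork> ops : perEntity.values()) {
                compacted.addAll(ops);
            }
            return compacted;
        }
    }

For example, a queue of [ADD(a), UPDATE(b), DELETE(a), UPDATE(b)] compacts to just [UPDATE(b)]: the insert and delete of entity a cancel out, and only the last update of entity b survives.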