I do think the idea of an abstract class (or interface) SegmentWriter is compelling.
Each DWPT would be a [single-threaded] SegmentWriter. And then we'd make a
MultiThreadedSegmentWriterWrapper (manages a collection of SegmentWriters,
pushing deletes to them, aggregating RAM used across all of them, picking
which ones to flush, etc.). Then, a SlicedSegmentWriter (say) would write to
separate slices, single threaded, and you could make it multi-threaded by
wrapping it with the above class.

Though SegmentWriter isn't a great name, since in general it would write to
multiple segments. Indexer is a little too broad though :) Something like
that maybe?

Also, allowing an app to directly control the underlying SegmentWriters
inside IndexWriter (instead of letting the multi-threaded wrapper decide for
you) is compelling for way advanced apps, I think. E.g. your app may know
it's done indexing from source A for a while, so you should flush that
writer right now (whereas the default "flush the one using the most RAM"
could leave that source unflushed for quite a while, tying up RAM, unless we
add some kind of LRU flushing policy or something).

Mike

On Wed, Apr 21, 2010 at 2:27 AM, Shai Erera <[email protected]> wrote:
> I'm not sure that a Parallel DW would work for PI, because DW is too
> internal to IW. Currently, the approach I've been thinking about for PI is
> to tackle it from a high level, e.g. allow the application to pass a
> Directory, or even an IW instance, and PI will play the coordinator role,
> ensuring that segment merges happen across all the slices in lockstep,
> implementing two-phase operations, etc. A Parallel DW then does not fit
> nicely w/ that approach (unless we want to refactor how IW works
> completely), because DW is not aware of the Directory, and if PI indeed
> works over IW instances, then each will have its own DW.
>
> So there are two basic approaches we can take for PI (following the
> current architecture) - either let PI manage IW, or have PI be a sort of IW
> itself, which handles events at a much lower level. While the latter is
> more robust (and, based on current limitations I'm running into, might even
> be easier to do), it lacks the flexibility of allowing the app to plug in
> any IW it wants. That requirement is also important if the application
> wants to use PI in scenarios where it keeps some slices in RAM and some on
> disk, or it wants to control more closely which fields go to which slice,
> so that it can at some point in time "rebuild" a certain slice outside PI
> and replace the existing slice in PI w/ the new one ...
>
> We should probably continue the discussion on PI, so I suggest we either
> move it to another thread or to the issue directly.
>
> Mike - I agree w/ you that we should keep the life of application
> developers easy, and that having IW itself support concurrency is
> beneficial. Like I said ... it was just a thought which was aimed at
> keeping our life (Lucene developers) easier, but that probably comes second
> compared to app-devs' life :). I'm also not at all sure that it would have
> made our life easier ...
>
> So I'm good if you want to drop the discussion.
>
> Shai
>
> On Tue, Apr 20, 2010 at 8:16 PM, Michael Busch <[email protected]> wrote:
>>
>> On 4/19/10 10:25 PM, Shai Erera wrote:
>>>
>>> It will definitely simplify multi-threaded handling for IW extensions
>>> like Parallel Index …
>>>
>>
>> I'm keeping parallel indexing in mind. After we have separate DWPTs I'd
>> like to introduce parallel DWPTs that write different slices.
>> Synchronization should not be a big worry then, because writing is
>> single-threaded.
>>
>> We could introduce a new abstract class SegmentWriter, which DWPT would
>> implement. An extension would be ParallelSegmentWriter, which would manage
>> multiple SegmentWriters. Or maybe SegmentSliceWriter would be a better
>> name.
>>
>> Michael
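
To make the shape of that hierarchy concrete, here is a rough Java sketch of
the classes discussed in this thread. The class names (SegmentWriter,
MultiThreadedSegmentWriterWrapper, SlicedSegmentWriter) come from the
messages above; everything else (the method signatures, the RAM-budget
check, and the plain Object stand-ins for Lucene's Document and Term types)
is assumed purely for illustration and is not existing Lucene API.

// Hypothetical sketch only: none of these classes exist in Lucene. The names
// come from this thread; every method signature here is an assumption made
// for illustration, and plain Object stands in for Document/Term.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Single-threaded writer of a segment; a DWPT would implement this. */
abstract class SegmentWriter {
    /** Adds one document; assumed to be called by a single thread only. */
    abstract void addDocument(Object doc) throws Exception;

    /** Buffers a delete; the wrapper below pushes deletes to every writer. */
    abstract void delete(Object term) throws Exception;

    /** RAM currently buffered by this writer. */
    abstract long ramBytesUsed();

    /** Flushes buffered documents to a new segment (or slice files). */
    abstract void flush() throws Exception;
}

/**
 * Makes a collection of single-threaded SegmentWriters usable from many
 * threads: aggregates RAM across all of them, pushes deletes to all of them,
 * and picks which one to flush when the RAM budget is exceeded.
 */
class MultiThreadedSegmentWriterWrapper {
    private final List<SegmentWriter> writers;
    private final long ramBudgetBytes;

    MultiThreadedSegmentWriterWrapper(List<SegmentWriter> writers, long ramBudgetBytes) {
        this.writers = new ArrayList<>(writers);
        this.ramBudgetBytes = ramBudgetBytes;
    }

    /** Total RAM used across all underlying writers. */
    synchronized long ramBytesUsed() {
        return writers.stream().mapToLong(SegmentWriter::ramBytesUsed).sum();
    }

    /** Deletes are pushed to every writer, since any of them may hold the doc. */
    synchronized void delete(Object term) throws Exception {
        for (SegmentWriter w : writers) {
            w.delete(term);
        }
    }

    /** Default policy from the thread: flush the writer using the most RAM. */
    synchronized void maybeFlush() throws Exception {
        if (writers.isEmpty() || ramBytesUsed() <= ramBudgetBytes) {
            return;
        }
        SegmentWriter biggest = writers.stream()
            .max(Comparator.comparingLong(SegmentWriter::ramBytesUsed))
            .get();
        biggest.flush();
    }

    /**
     * "Way advanced" hook: the app flushes a specific writer directly, e.g.
     * the one bound to a source that it knows is done indexing for a while.
     */
    synchronized void flush(SegmentWriter writer) throws Exception {
        writer.flush();
    }
}

/**
 * Single-threaded writer that fans fields out to one SegmentWriter per slice;
 * wrapping it with the class above would make it multi-threaded, as suggested
 * in the thread. (Left abstract here since slice routing is not sketched.)
 */
abstract class SlicedSegmentWriter extends SegmentWriter {
    protected final List<SegmentWriter> slices = new ArrayList<>();
}

With something like this, the default policy stays "flush the one using the
most RAM" (maybeFlush), while flush(SegmentWriter) leaves the per-source
decision (the "source A" example above) to the application.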

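Shai's "PI plays the coordinator role" approach above can also be sketched
without touching DW at all: the application passes one IndexWriter per slice,
and a coordinator runs the two-phase operations across them. Only
prepareCommit() and commit() are real IndexWriter API here; the coordinator
class itself, its name, and its (elided) failure handling are hypothetical.

// Hypothetical coordinator over application-supplied per-slice IndexWriters.
// prepareCommit()/commit() are real Lucene IndexWriter methods; everything
// else is assumed for illustration.

import java.io.IOException;
import java.util.List;
import org.apache.lucene.index.IndexWriter;

class ParallelIndexCoordinator {
    private final List<IndexWriter> sliceWriters;

    ParallelIndexCoordinator(List<IndexWriter> sliceWriters) {
        this.sliceWriters = sliceWriters;
    }

    /** Two-phase commit across all slices: prepare everywhere, then commit. */
    void commitAllSlices() throws IOException {
        // Phase 1: every slice prepares. If any slice fails here, none has
        // committed yet, so searchers on every slice still see the previous
        // commit point (real code would also resolve the pending prepares).
        for (IndexWriter w : sliceWriters) {
            w.prepareCommit();
        }
        // Phase 2: all slices commit the prepared state, keeping them aligned.
        for (IndexWriter w : sliceWriters) {
            w.commit();
        }
    }
}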