I agree that the Infinispan case is not much different from
RAMDirectory. The major difference is that in RD (and also
FileDirectory) changes are not batched as they are in ID. If I do not
wrap changes in InfinispanDirectory in a transaction (simply remove
tx.begin() from the obtain() method and tx.commit() from release() in
InfinispanLock), and immediately commit every change made by the IW,
it works well.
However, it makes indexing much slower, because of frequent
replication to other nodes.
Sanne, it's a good remark that IW commit is a kind of flush.
I've attached a patch with InfinispanDirectory; the failing test is
testDirectoryWithMultipleThreads in the InfinispanDirectoryTest class.
It fails randomly. I think the problem is that the Infinispan commit on
lockRelease() in org.apache.lucene.index.IndexWriter (line 1658)
happens after the IW commit() (line 1654).
> Is it because the IndexWriter only cleans files if no IndexReaders
> are reading them (how would that be detected)?
It can happen if the IndexWriter cleans a file and an IndexReader then
tries to access that cleaned file.
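
To make the interleaving concrete, here is a minimal sketch in plain Java. It is a toy model, not the attached patch and not Lucene or Infinispan API: the class BatchedDirectorySketch and its methods writeLater(), deleteLater(), commit() and open() are hypothetical names, and a plain HashMap stands in for the cache. It only shows how a delete that is batched until a late commit can remove a file a reader has already decided to read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: a map-backed "directory" whose changes are batched until commit(). */
class BatchedDirectorySketch {
    private final Map<String, byte[]> files = new HashMap<>();
    private final List<Runnable> pending = new ArrayList<>();

    // Hypothetical helpers: changes are queued and become visible only on commit().
    void writeLater(String name, byte[] data) { pending.add(() -> files.put(name, data)); }
    void deleteLater(String name) { pending.add(() -> files.remove(name)); }
    void commit() { pending.forEach(Runnable::run); pending.clear(); }
    byte[] open(String name) { return files.get(name); }

    /** Replays the interleaving; returns true if the reader finds file "A" missing. */
    static boolean readerLosesFile() {
        BatchedDirectorySketch dir = new BatchedDirectorySketch();
        dir.writeLater("A", new byte[] {1});
        dir.commit();                         // initial index containing "A"

        dir.writeLater("B", new byte[] {2});  // writer replaces "A" with "B" ...
        dir.deleteLater("A");                 // ... and schedules the deletion of "A"

        // Reader checks the directory before the batch is applied and picks "A".
        String wanted = dir.open("A") != null ? "A" : null;

        dir.commit();                         // batched commit, as on lock release

        // Reader now tries to open the file it picked, but it is gone.
        return wanted != null && dir.open(wanted) == null;
    }
}
```

In this model readerLosesFile() reports that the reader lost its file, because the reader's choice of "A" and the batched deletion of "A" are separated by the late commit.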
2009/9/23 Sanne Grinovero <sanne.grinov...@gmail.com>
I agree it should work the same way; the IndexWriter cleans files
whenever it likes to, it doesn't try to detect readers, and this
shouldn't have any effect on the working of readers.
The IndexReader opens the "SegmentsInfo" first, and immediately
after** gets a reference to the segments listed in this SegmentsInfo.
No IndexWriter will ever change an existing segment, only add new
files or eventually delete old ones (segments merge, optimize).
The deletion of segments is the interesting subject: when using Files
it uses "delete at last close", which works because the IRs needing
them have them opened already**; when using the RAMDirectory they have
a reference preventing garbage collection.
(the two "**" assume the same event occurred correctly,
otherwise an exception is thrown at opening)
When using Infinispan it shouldn't be much different from the
RAMDirectory, so even if the needed segment is deleted, the IR
holds a reference to the Java object locally since it was opened.
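
The "reference held locally" argument can be illustrated with a small plain-Java sketch. It assumes the stored value is an ordinary local Java object, which is an assumption about the setup and may not hold for a replicated cache. HeldReferenceSketch and the key "segment_1" are made-up names, and a ConcurrentHashMap stands in for the directory.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Toy model only: a reader keeps a reference to data that is later removed from the map. */
class HeldReferenceSketch {
    static byte readAfterDelete() {
        ConcurrentMap<String, byte[]> dir = new ConcurrentHashMap<>();
        dir.put("segment_1", new byte[] {42});

        byte[] opened = dir.get("segment_1"); // reader "opens" the segment and keeps the reference
        dir.remove("segment_1");              // writer deletes the segment afterwards

        // The map entry is gone, but the reader's reference keeps the array reachable.
        return opened[0];
    }
}
```

The reader only runs into trouble if it has to look the entry up again after the removal, which is the window the "**" notes above refer to.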
Łukasz, do you have a failing test?
Sanne
2009/9/23 Emmanuel Bernard <emman...@hibernate.org>:
> Conceptually I don't understand why it does work in a pure file system
> directory (i.e. IndexReader can go and process queries while the
> IndexWriter goes about its business) and not when using Infinispan.
> Is it because the IndexWriter only cleans files if no IndexReaders are
> reading them (how would that be detected)?
> On 22 sept. 09, at 20:46, Łukasz Moreń wrote:
>
> I need to provide the same lifecycle for the IndexWriter as for the
> Infinispan tx: when the IW is created, the tx is started; when the IW is
> committed, the tx is committed. This ensures that an IndexReader doesn't
> read old data from the directory.
> The Infinispan transaction can be started when the IW acquires the lock,
> but committing it on IW lock release, as is done so far, causes a problem:
>
> index writer close {
>   index writer commit(); // changes are visible to IndexReaders
>
>   // IndexReader starts reading here, i.e. tries to access file "A"
>
>   index writer lockRelease(); // changes in the Infinispan directory are
>                               // committed, file "A" was removed,
>                               // IndexReader cannot find it and crashes
> }
>
> I think the Infinispan tx has to be committed just before the IW commit,
> and the problem is where to put that in the code.
>
> On 22 September 2009 18:24, Emmanuel Bernard
> <emman...@hibernate.org> wrote:
>>
>> Can you explain in more detail what is going on?
>> Aside from that, Workspace has been Sanne's baby lately, so he will be
>> the best to see what design will work in HSearch. That being said, I
>> don't like the idea of subclassing / overriding very much. In my
>> experience, it has led to more bad and unmaintainable code than
>> anything else.
>> On 22 sept. 09, at 02:16, Łukasz Moreń wrote:
>>
>> Hi,
>>
>> Thanks for the explanation.
>> Maybe it is better if I concentrate on the first release and postpone
>> distributed writing.
>>
>> There is already a LockStrategy that uses Infinispan. Using it, I was
>> wrapping the changes made by the IndexWriter in an Infinispan
>> transaction, for performance reasons:
>> on obtaining the lock the transaction was started, on lock release the
>> transaction was committed. However, committing the Ispn transaction on
>> lock release is not a good idea, since IndexWriter calls index commit
>> before the lock is released (and the Ispn transaction is committed).
>> I was thinking of overriding the Workspace class and its getIndexWriter
>> (start Infinispan tx) and commitIndexWriter (commit tx) methods to wrap
>> the IndexWriter lifecycle, but this needs a few other changes. Any other
>> ideas?
>>
>> Cheers,
>> Lukasz
>>
>> 2009/9/21 Sanne Grinovero <sanne.grinov...@gmail.com>
>>>
>>> Hi Łukasz,
>>> you have rightful concerns: the way the IndexWriter tries to acquire
>>> the lock will bring some trouble. As far as I remember, we decided in
>>> this first release to avoid multiple writer nodes for this reason
>>> (is that written in your docs?)
>>>
>>> Actually it shouldn't be very hard to do, as the LockStrategy is
>>> pluggable (see changes from HSEARCH-345)
>>> and you could implement one delegating to an Infinispan eager lock on
>>> some key, like the default LockStrategy takes a file lock in the index
>>> directory.
>>>
>>> Maybe it's simpler to support this distributed writing instead of
>>> sending the queue to some single (elected) node? Would be cool, as the
>>> Document Analysis effort would be distributed, but I have no idea if
>>> this would be more or less efficient than a single node writing; it
>>> could bring some huge data transfers along the wire during segment
>>> merging (basically fetching the whole index data at each node
>>> performing a segment merge); maybe you'll need to play with the
>>> IndexWriter settings
>>> ( http://docs.jboss.org/hibernate/stable/search/reference/en/html_single/#lucene-indexing-performance )
>>> and probably need to find the sweet spot for "merge_factor".
>>> I just saw that MergePolicy is now re-implementable, but I hope
>>> that won't be needed.
>>>
>>> Sanne
>>>
>>> 2009/9/21 Łukasz Moreń <lukasz.mo...@gmail.com>:
>>> > Hi,
>>> >
>>> > I'm wondering if it is reasonable to have multiple threads/nodes that
>>> > modify indexes in a Lucene Directory based on Infinispan. Let's
>>> > assume that two nodes try to update the index at the same time. The
>>> > first one creates an IndexWriter and obtains the write lock. There is
>>> > a high probability that the second node throws
>>> > LockObtainFailedException (as only one IndexWriter is allowed on a
>>> > single index) and the index is not modified. How should that be
>>> > handled? Should there always be only one node that makes changes in
>>> > the index?
>>> >
>>> > Cheers,
>>> > Lukasz
>>> >
>>> > On 15 September 2009 01:39, Łukasz Moreń
>>> > <lukasz.mo...@gmail.com> wrote:
>>> >>
>>> >> Hi,
>>> >>
>>> >> Using JMeter, I wanted to check that the Infinispan dir does not
>>> >> crash under heavy load in "real" use, and to compare its performance
>>> >> with none/other directories.
>>> >> However, a problem appeared when multiple IndexWriters try to modify
>>> >> the index (test InfinispanDirectoryTest): random deadlocks and Lucene
>>> >> exceptions.
>>> >> IndexWriter tries to access files in the index that were removed
>>> >> before. I'm looking into it, but I don't have a good idea yet.
>>> >>
>>> >> Concerning the last part, I think a similar thing is done in
>>> >> InfinispanDirectoryProviderTest. Many threads are making changes and
>>> >> searching (not checking if the db is in sync with the index).
>>> >> When the threads finish their work, I check with a Lucene query
>>> >> whether the index contains as many results as expected. Maybe you
>>> >> meant something else?
>>> >> It would be good to run each node in a different VM.
>>> >>
>>> >>> Great! Looking forward to it. What state are things in at the
>>> >>> moment if I want to play around with it?
>>> >>
>>> >> It should work with one master (updates the index) and one or many
>>> >> slave nodes (which send changes to the master). I tried with one
>>> >> master and one slave (both with the jms and jgroups backends) and it
>>> >> worked ok. It still fails if multiple nodes want to modify the index.
>>> >>
>>> >> I've attached patch with current version.
>>> >>
>>> >> Cheers,
>>> >> Łukasz
>>> >>
>>> >> 2009/9/13 Michael Neale <michael.ne...@gmail.com>
>>> >>>
>>> >>> Great! Looking forward to it. What state are things in at the
>>> >>> moment if I want to play around with it?
>>> >>>
>>> >>> Sent from my phone.
>>> >>>
>>> >>> On 13/09/2009, at 7:26 PM, Sanne Grinovero
>>> >>> <sanne.grinov...@gmail.com>
>>> >>> wrote:
>>> >>>
>>> >>> > 2009/9/12 Michael Neale <michael.ne...@gmail.com>:
>>> >>> >> That does sound pretty cool. Would be nice if the Lucene indexes
>>> >>> >> could scale along with how people will want to use Infinispan.
>>> >>> >> Probably worth playing with.
>>> >>> >
>>> >>> > Sure, this is the goal of Łukasz's work. We know Compass has
>>> >>> > some good Directories, but we're building our own as one based
>>> >>> > on Infinispan is not yet available.
>>> >>> >
>>> >>> >>
>>> >>> >> Sent from my phone.
>>> >>> >>
>>> >>> >> On 13/09/2009, at 8:37 AM, Jeff Ramsdale <jeff.ramsd...@gmail.com>
>>> >>> >> wrote:
>>> >>> >>
>>> >>> >>> I'm afraid I haven't followed the Infinispan-Lucene
>>> >>> >>> implementation closely, but have you looked at the Compass
>>> >>> >>> Project? (http://www.compass-project.org/overview.html) It
>>> >>> >>> provides a simplified interface to Lucene (optional) as well as
>>> >>> >>> Directory implementations built on Terracotta, Gigaspaces and
>>> >>> >>> Coherence. The latter, in particular, might be a useful guide
>>> >>> >>> for the Infinispan implementation. I believe it's mature enough
>>> >>> >>> to have solved many of the most difficult problems of
>>> >>> >>> implementing Directory on a distributed Map.
>>> >>> >>>
>>> >>> >>> If someone has any experience with Compass (particularly its
>>> >>> >>> Directory implementations) I'd be interested in hearing about
>>> >>> >>> it... It's Apache 2.0 licensed, btw.
>>> >>> >>>
>>> >>> >>> -jeff
>>> >>> >>> _______________________________________________
>>> >>> >>> infinispan-dev mailing list
>>> >>> >>> infinispan-...@lists.jboss.org
>>> >>> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> >>> >>
>>> >>> >
>>> >>>
>>> >
>>> >
>>
>>
>
>
>
<InfinispanDirectoryProvider_22_09_2009.patch>