Re: Using Lucene to search live, being-edited documents

Umesh Prasad Fri, 21 Jan 2011 18:12:15 -0800

Hi,
   One work around would be to version the documents and store the
version as well as the timestamp of indexed document into the index.


Reading between lines I assume that
Document is
a) stored in some DB/File :
b) indexed in lucene index

User Search  On on b)
Document ids
but documents are displayed to user after retrieving from a).

Now I do not know a way in which I can keep a) and b) completely in
sync in realtime. As there will be some time taken in indexing
operation itself. a) --> b) .

Instead we can do following.
a) stored : Document ID + Document Text + Document Version +
Modification Time Stamp (T1)
b) Indexed : Document ID + Document Text + Document Version +
Modification Time Stamp (T2) (when indexed) (broken into date + hour +
mins + sec for minimizing number of terms)

User Searches b)
Search System gets Document ID + Modification Time Stamp (T2) and gives to
Presentation layer which compares the  T1 & T2.
If T2 < T1, Skip the result.

Assumption : Stored document is always in sync. Documents are
persisted somewhere and not served from memory.

Thanks & Regards
Umesh Prasad



On Sat, Jan 22, 2011 at 1:29 AM, software visualization
<softwarevisualizat...@gmail.com> wrote:
> Hi sorry for the long delay.
>
> The idea is that a single user is editing a single document. As they edit,
> any indexes built against the document become stale, actually wrong.
> Example:  references to specific localities within this document are all
> instantly wrong the first time a user types a new beginning  character-
> they're all off by one. Deleting  words is of course disastrous etc. etc.
>  So our story is- we used to have this document nicely indexed and now we
> have nothing useful.
>
> Considering what Lucene does prior to indexing, stemming for instance,  I am
> not sure no, I am quite sure I can't  recreate the same powerful indexing
> functionality.
>
> But it seems wrong  to lure our users into opening this document with
> promises that this that and the other thing is has been located for them
> only to invalidate all that just because they began to edit the document. I
> understand why that happens , but my users are perhaps not as tech savvy and
> I think it will just feel "wrong" to them.
>
> So I am looking for a way around this.
>
>
>
> On Tue, Jan 4, 2011 at 1:25 PM, adasal <adam.salt...@gmail.com> wrote:
>
>> I would think this is more like it.
>> But the essential thing, so it seems to me, is whether there is a
>> requirement for a serialised index, i.e. a more permanent record, aside
>> from
>> the saved document.
>> Then, if there is a penalty to creating the index compared to regex,
>> stringsearch or so, it is justified on other grounds.
>> I think it is an interesting q. when does that requirement emerge?
>> There is size of document.
>> But there would also be field types. I think I have this right. This is
>> really a classification system, so more than bare regex.
>> There must be other criteria that apply to this use case, too?
>>
>> Adam
>>
>> p.s. we (in my work project) are just beginning to use Lucene for geometry
>> objects and I am looking forward to understanding its use better,
>> including,
>> possibly, expanding it to other use cases apart from geo objects.
>>
>> On 3 January 2011 15:31, Robert Muir <rcm...@gmail.com> wrote:
>>
>> > On Mon, Jan 3, 2011 at 10:16 AM, Grant Ingersoll <gsing...@apache.org>
>> > wrote:
>> > > There is also the MemoryIndex, which is in contrib and is designed for
>> > one document at a time.  That being said, basic grep/regex is probably
>> fast
>> > enough.
>> > >
>> >
>> > In cases where you are doing a 'find' in a document similar to what a
>> > wordprocessor would do (especially if you want to iterate
>> > forwards/backwards through matches etc), you might want to consider
>> > something like
>> > http://icu-project.org/apiref/icu4j/com/ibm/icu/text/StringSearch.html
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> >
>> >
>>
>



-- 
---
Thanks & Regards
Umesh Prasad

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Using Lucene to search live, being-edited documents

Reply via email to