Re: Re-creating IndexSearcher after update
Luc, I tried adding your DelayCloseIndexSearcher to my project (a Tomcat app where the index is repeatedly searched and frequently updated), and as soon as an index modification occurs (in a separate thread) and I call closeWhenDone() in the main thread, I get IllegalStateException("closeWhenDone() already called"). The exception is then thrown for every subsequent search attempt. Any ideas?

Thanks, Nick.

Vanlerberghe, Luc wrote:
> Yep,
>
> I created DelayCloseIndexSearcher just for this scenario and it's been
> running in production for about half a year now...
>
> There's a usage example in the javadoc, but it can be optimised even
> more (without touching the code that does the searches, handles the
> hits, etc.).
>
> In my production environment, isCurrent() is called in a separate
> thread. If it returns false, a new DelayCloseIndexSearcher instance is
> created, some warming up is done, and only then is the existing one
> replaced and closeWhenDone() called on it.
>
> Luc
>
> -----Original Message-----
> From: Koji Sekiguchi [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 21 March 2006 9:24
> To: java-user@lucene.apache.org
> Subject: RE: Re-creating IndexSearcher after update
>
> Hi Steve,
>
> DelayCloseIndexSearcher may suit your requirement.
>
> Please check:
> http://issues.apache.org/jira/browse/LUCENE-445
>
> Hope this helps.
>
> Koji
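[Editor's note: for reference, a minimal sketch of the warm-then-swap pattern Luc describes, written against the plain Lucene 1.9-era IndexSearcher API since DelayCloseIndexSearcher only exists in the LUCENE-445 patch; the class name, the warming query, and the field name are assumptions, not code from the patch.]

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// Runs in a background thread: reopen only when the index has changed,
// warm the new searcher, then swap it in and retire the old one.
public class SearcherSwapper {
  private final String indexDir;
  private volatile IndexSearcher current;

  public SearcherSwapper(String indexDir) throws IOException {
    this.indexDir = indexDir;
    this.current = new IndexSearcher(indexDir);
  }

  public IndexSearcher getCurrent() {
    return current;
  }

  public void maybeReopen() throws IOException {
    if (current.getIndexReader().isCurrent()) {
      return;                                   // nothing changed since the last open
    }
    IndexSearcher fresh = new IndexSearcher(indexDir);
    // "Warming up": run a representative query so caches are populated
    // before real traffic hits the new searcher (field and term are hypothetical).
    fresh.search(new TermQuery(new Term("contents", "lucene")));
    IndexSearcher old = current;
    current = fresh;                            // searches started after this line use the new searcher
    // With the LUCENE-445 patch, this is where old.closeWhenDone() would go; with a
    // plain IndexSearcher you must not close until all in-flight searches have finished.
  }
}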
PhraseQuery with synonyms or having n tokens at the same token position.
Hi,

PhraseQuery is not working the way I want when I index with synonyms. For example, I have indexed the name "sony dsc-d cybershot" as the following tokens, with the given token positions:

1: [sony:0->4]
2: [dsc:5->10]
3: [dscd:5->10]
4: [d:5->10]
5: [cybershot:11->20]

So "dsc-d" is tokenized into three tokens, "dsc", "dscd" and "d", at the same token location. The indexing part is OK, but the problem is with searching: the PhraseQuery "dsc cybershot" is not returning any results, because "dsc" and "cybershot" are not adjacent (although I intended them to be). I could increase the slop at search time, but that does not fit our needs well: it is hard to decide on a maximum slop in our case, and the returned results are different, which I don't want.

Thanks in advance,
Jelda

"Impossible is Nothing"
add word filtering?
Hi all,

I'm really new to Lucene. In fact I just found it when I googled a few days ago. I never thought Java had this kind of excellent library for free.

I have a few questions: where do I hook in if I want to filter certain text from being searched, filter certain results from being displayed, or display an alternative result in place of a filtered one when using Lucene? Is there a better way than just editing the results .jsp page (from the demo)?

Any information is greatly appreciated.

Thanks in advance
RE: add word filtering?
Are you asking that common words not be searched? For this, you can use StopFilter to prevent words from being indexed and searched. Alternatively, you can use StandardAnalyzer, which in addition to removing stop words also does more sophisticated tokenizing.

Venu

-----Original Message-----
From: abdul muhaimin [mailto:[EMAIL PROTECTED]
Sent: Monday, March 27, 2006 3:13 PM
To: java-user@lucene.apache.org
Subject: add word filtering?

> Hi all,
>
> I'm really new to Lucene. In fact I just found it when I googled a few
> days ago. I never thought Java had this kind of excellent library for free.
>
> I have a few questions: where do I hook in if I want to filter certain
> text from being searched, filter certain results from being displayed,
> or display an alternative result in place of a filtered one when using
> Lucene? Is there a better way than just editing the results .jsp page
> (from the demo)?
>
> Any information is greatly appreciated.
>
> Thanks in advance
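[Editor's note: a minimal sketch of Venu's two suggestions, assuming the Lucene 1.9-era analysis API; the stop-word list is made up for the example.]

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class FilteringExample {
  // Words that should never make it into the index or a query (hypothetical list).
  private static final String[] STOP_WORDS = {"the", "a", "an", "of"};

  // Option 1: StandardAnalyzer already removes stop words; you can pass your own list.
  public static Analyzer standard() {
    return new StandardAnalyzer(STOP_WORDS);
  }

  // Option 2: build your own analyzer and wrap its tokenizer in a StopFilter.
  public static Analyzer custom() {
    return new Analyzer() {
      public TokenStream tokenStream(String fieldName, Reader reader) {
        return new StopFilter(new LowerCaseTokenizer(reader), STOP_WORDS);
      }
    };
  }
}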
Get All Entries
Hello everyone,

I have 6000 entries in my Lucene index, and if I search for entries with "00*" in the number field it works fine. But in addition I need all entries, no matter which number they have. A term like "*" doesn't work. How can I get all entries?

The code of my search is:

IndexSearcher is = new IndexSearcher( INDEX_DIR );
QueryParser parser = new QueryParser( "number", analyzer );
Query query = parser.parse( "00*" );
Hits hits = is.search( query, new Sort( "number" ) );

Thanks for your help,
Stefan H
RE: Get All Entries
I believe there's a MatchAllDocsQuery class from Lucene 1.9 onwards. You can run this query to get all documents.

If you are not using 1.9, to my knowledge you would have to add a redundant field that is true for all documents and query on that field. Something like Field.Keyword("AllDocsTrue", "true") added to each doc; you can then run the query AllDocsTrue:true to get all your docs.

Venu

-----Original Message-----
From: StefanH [mailto:[EMAIL PROTECTED]
Sent: Monday, March 27, 2006 3:24 PM
To: java-user@lucene.apache.org
Subject: Get All Entries

> Hello everyone,
>
> I have 6000 entries in my Lucene index, and if I search for entries with
> "00*" in the number field it works fine. But in addition I need all
> entries, no matter which number they have. A term like "*" doesn't work.
> How can I get all entries?
>
> The code of my search is:
>
> IndexSearcher is = new IndexSearcher( INDEX_DIR );
> QueryParser parser = new QueryParser( "number", analyzer );
> Query query = parser.parse( "00*" );
> Hits hits = is.search( query, new Sort( "number" ) );
>
> Thanks for your help,
> Stefan H
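[Editor's note: a minimal sketch of both approaches, assuming the Lucene 1.9 MatchAllDocsQuery and the 1.4-style Field.Keyword helper; the "AllDocsTrue" field name is the one Venu suggests.]

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TermQuery;

public class AllEntriesExample {

  // Lucene 1.9+: one query that matches every document, sorted by the number field.
  public static Hits allDocs(IndexSearcher is) throws IOException {
    return is.search(new MatchAllDocsQuery(), new Sort("number"));
  }

  // Pre-1.9 fallback, part 1: at index time, add a constant keyword field to every document.
  public static void addMarker(Document doc) {
    doc.add(Field.Keyword("AllDocsTrue", "true"));
  }

  // Pre-1.9 fallback, part 2: at search time, query that constant field.
  public static Hits allDocsViaMarker(IndexSearcher is) throws IOException {
    return is.search(new TermQuery(new Term("AllDocsTrue", "true")), new Sort("number"));
  }
}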
RE: Get All Entries
It works perfectly. After installing 1.9 I have MatchAllDocsQuery. Thanks!
Phrase Query query
Hi,

I'm using PhraseQuery in conjunction with WhitespaceAnalyzer, but it's giving me slightly unusual results. If I have a text file containing the text (quotes are just for clarity):

"Hello this is some text"

I don't find any results when I search. But if I put spaces before and after the phrase:

" Hello this is some text "

then it does work. I'm breaking the phrase down into Terms and setting the slop to 0, by the way.

I can kind of see that this makes sense, given the name WhitespaceAnalyzer, but aren't newlines, carriage returns etc. also treated as whitespace?

Thanks for your help!

Regards,
Richard Gundersen
Honda UK - ISD
Tel: +44 (0)1753 590681
Re: span query scoring vs boolean query scoring
Vincent Le Maout wrote:
> Am I missing something? Is it intended or is it a bug?

Looks like a bug. Can you submit a patch?

Doug
Re: span query scoring vs boolean query scoring
Vincent Le Maout wrote:
> Am I missing something? Is it intended or is it a bug?

Looks like a bug. Can you please submit a bug report and, ideally, attach a patch?

Thanks,
Doug
Re: Lucene indexing on Hadoop distributed file system
Igor Bolotin wrote:
> If somebody is interested, I can post our changes in TermInfosWriter and
> SegmentTermEnum code, although they are pretty trivial.

Please submit this as a patch attached to a bug report.

I contemplated making this change to Lucene myself when writing Nutch's FsDirectory, but thought that no one else would ever be interested in using it. Now that's been proven wrong!

Note that any change to the file format must be back-compatible.

Doug
Re: Does Optimize preserve index order?
On 3/24/06, chan kang <[EMAIL PROTECTED]> wrote:
> What I want to do is to show the results in
> chronological order. (By the way, the index contains the time field.)
> One solution I have thought up is:
> 1. index the whole set
> 2. read in all the time field values
> 3. re-index the whole set according to time
>    (I heard that the index order is the same as the insertion order)
> 4. optimize
>
> However, although I think step 3 would result
> in a sorted index, isn't there a possibility that
> step 4 might ruin the sortedness? Wouldn't optimizing
> break the order in which the documents were indexed?

Index order is retained, so your plan should work fine.

How long is sorting actually taking? FYI, the first time you sort on a field will take much longer because a FieldCache entry must be populated.

-Yonik
http://incubator.apache.org/solr
Solr, The Open Source Lucene Search Server
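[Editor's note: for comparison, sorting at search time instead of re-indexing is a one-liner; a minimal sketch assuming the Lucene 1.9 Sort API and a hypothetical "time" field indexed as an un-tokenized value.]

import java.io.IOException;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class TimeSortedSearch {
  // Sort results by the "time" field, newest first; the first call per reader
  // pays the FieldCache warm-up cost Yonik mentions.
  public static Hits search(IndexSearcher searcher, Query query) throws IOException {
    Sort byTime = new Sort(new SortField("time", true));   // true = reverse (descending)
    return searcher.search(query, byTime);
  }
}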
Re: Phrase Query query
Richard,

WhitespaceTokenizer (the tokenizer that WhitespaceAnalyzer uses) really just tokenizes on whitespace characters:

/** Collects only characters which do not satisfy
 *  {@link Character#isWhitespace(char)}. */
protected boolean isTokenChar(char c) {
  return !Character.isWhitespace(c);
}

Otis

----- Original Message -----
From: Richard Gunderson <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, March 27, 2006 10:56:18 AM
Subject: Phrase Query query

> I'm using PhraseQuery in conjunction with WhitespaceAnalyzer, but it's giving
> me slightly unusual results. If I have a text file containing the text
> (quotes are just for clarity):
>
> "Hello this is some text"
>
> I don't find any results when I search. But if I put spaces before and after
> the phrase:
>
> " Hello this is some text "
>
> then it does work. I'm breaking the phrase down into Terms and setting the
> slop to 0, by the way.
>
> I can kind of see that this makes sense, given the name WhitespaceAnalyzer,
> but aren't newlines, carriage returns etc. also treated as whitespace?
>
> Thanks for your help!
>
> Regards,
> Richard Gundersen
> Honda UK - ISD
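[Editor's note: if it helps to verify what the analyzer actually emits, here is a small sketch that prints the tokens, assuming the Lucene 1.9-era TokenStream API; it should show that newlines and carriage returns split tokens just like spaces do.]

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

public class PrintTokens {
  public static void main(String[] args) throws IOException {
    String text = "Hello this is\nsome text\r\n";   // newlines included on purpose
    TokenStream ts = new WhitespaceAnalyzer().tokenStream("f", new StringReader(text));
    for (Token t = ts.next(); t != null; t = ts.next()) {
      System.out.println(t.termText() + " [" + t.startOffset() + "," + t.endOffset() + "]");
    }
  }
}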
Re: Lucene indexing on Hadoop distributed file system
Doug Cutting wrote:
> Igor Bolotin wrote:
>> If somebody is interested, I can post our changes in TermInfosWriter and
>> SegmentTermEnum code, although they are pretty trivial.
>
> Please submit this as a patch attached to a bug report.
>
> I contemplated making this change to Lucene myself when writing Nutch's
> FsDirectory, but thought that no one else would ever be interested in
> using it. Now that's been proven wrong!
>
> Note that any change to the file format must be back-compatible.

This could be solved by putting a marker value in the first 8 bytes (== -1L), which would indicate that the real length is at the end. This way the new implementation will be able to read old indexes.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
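[Editor's note: to make Andrzej's suggestion concrete, an illustrative read-side sketch assuming the Lucene 1.9 IndexInput API; the method name, constant name, and exact file layout are hypothetical, not taken from the actual patch.]

import java.io.IOException;
import org.apache.lucene.store.IndexInput;

public class TermCountReader {
  // Hypothetical marker: -1 in the first 8 bytes means "the real count is stored at the end".
  private static final long SIZE_AT_END_MARKER = -1L;

  public static long readTermCount(IndexInput in) throws IOException {
    long first = in.readLong();
    if (first != SIZE_AT_END_MARKER) {
      return first;                         // old format: count is at the beginning
    }
    long pos = in.getFilePointer();
    in.seek(in.length() - 8);               // new format: count was appended at close time
    long count = in.readLong();
    in.seek(pos);                           // continue reading where we left off
    return count;
  }
}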
Re: PhraseQuery with synonyms or having n tokens at the same tokenposition.
On Monday, 27 March 2006 11:17, Ramana Jelda wrote:
> I have indexed the name "sony dsc-d cybershot" as the following tokens,
> with the given token positions:
> 1: [sony:0->4]
> 2: [dsc:5->10]
> 3: [dscd:5->10]
> 4: [d:5->10]
> 5: [cybershot:11->20]

If the first number is the token position, the tokens "dsc", "dscd", and "d" are obviously *not* at the same position. You need to call setPositionIncrement(0) to add a token at the same position during indexing. If that doesn't help, please provide a small test case that shows the problem.

Regards,
Daniel

--
http://www.danielnaber.de
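[Editor's note: to illustrate Daniel's point, a minimal sketch of a filter that emits extra tokens with a position increment of 0, written against the Lucene 1.9-era Token/TokenFilter API; the class name and the hard-coded "dsc-d" expansion are made up for the example.]

import java.io.IOException;
import java.util.LinkedList;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class SameSpotSynonymFilter extends TokenFilter {
  private final LinkedList pending = new LinkedList();

  public SameSpotSynonymFilter(TokenStream input) {
    super(input);
  }

  public Token next() throws IOException {
    if (!pending.isEmpty()) {
      return (Token) pending.removeFirst();
    }
    Token t = input.next();
    if (t == null) {
      return null;
    }
    // Hypothetical expansion: when "dsc-d" appears, also emit "dscd" and "d"
    // at the same position so a phrase like "dsc-d cybershot" and its variants match.
    if ("dsc-d".equals(t.termText())) {
      Token syn1 = new Token("dscd", t.startOffset(), t.endOffset());
      syn1.setPositionIncrement(0);          // same position as the original token
      Token syn2 = new Token("d", t.startOffset(), t.endOffset());
      syn2.setPositionIncrement(0);
      pending.add(syn1);
      pending.add(syn2);
    }
    return t;
  }
}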
Re: delete documents into index
> On Saturday, 25 March 2006 00:39, Tom Hill wrote:
>> IndexModifier won't work in a multithreaded scenario, at least as far as I can tell.
>
> Yes it does, but you need to use one IndexModifier object from all classes (see the javadoc).
>
> Regards,
> Daniel

I stand corrected (after going back and reading the code more carefully ;-).

Thanks,
Tom
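[Editor's note: a minimal sketch of the "one IndexModifier object" advice, assuming the Lucene 1.9 IndexModifier API; the holder class, the index path handling, and the "id" field are hypothetical.]

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexModifier;
import org.apache.lucene.index.Term;

/** Holder that shares a single IndexModifier across all threads of the application. */
public class SharedModifier {
  private static IndexModifier modifier;

  public static synchronized IndexModifier get(String indexDir) throws IOException {
    if (modifier == null) {
      modifier = new IndexModifier(indexDir, new StandardAnalyzer(), false);
    }
    return modifier;
  }

  /** Example update: every thread goes through the same instance. */
  public static void replaceDocument(String indexDir, String id, Document doc) throws IOException {
    IndexModifier m = get(indexDir);
    m.deleteDocuments(new Term("id", id));   // remove the old version by its unique id
    m.addDocument(doc);                       // then add the new one
  }
}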
Re: Lucene indexing on Hadoop distributed file system
Does it make sense to change TermInfosWriter.FORMAT in the patch?

Igor

On 3/27/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Igor Bolotin wrote:
>> If somebody is interested, I can post our changes in TermInfosWriter and
>> SegmentTermEnum code, although they are pretty trivial.
>
> Please submit this as a patch attached to a bug report.
>
> I contemplated making this change to Lucene myself, when writing Nutch's
> FsDirectory, but thought that no one else would ever be interested in
> using it. Now that's been proven wrong!
>
> Note that any change to the file format must be back-compatible.
>
> Doug
to OR or not
Hi everybody,

I am using Lucene in almost every web application I am working on. It's simply great software.

I have developed an advanced search with Lucene 1.4. Now I am looking at developing a fuzzy search, i.e. take one search string from the user and search across all fields of the member documents. I can think of two options:

- form an OR query over all fields using the given search string
- add one more field (say "keyword") to the member document containing all the information about the user

Are there any other options? Which would be the better option for a system with around one million documents, each having 20 fields, where performance is a major concern?

Thanks,
Amol
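[Editor's note: a minimal sketch of both options, assuming the Lucene 1.4-era API the poster mentions; the field names and the catch-all field are hypothetical.]

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class MemberSearch {

  // Option 1: OR the user's string across several fields at query time.
  public static Hits searchAcrossFields(IndexSearcher searcher, String userInput)
      throws IOException, ParseException {
    String[] fields = {"name", "address", "email"};    // hypothetical field names
    Query q = MultiFieldQueryParser.parse(userInput, fields, new StandardAnalyzer());
    return searcher.search(q);
  }

  // Option 2: build one catch-all field at index time and query only that field.
  public static void addCatchAll(Document doc, String name, String address, String email) {
    doc.add(Field.Text("keyword", name + " " + address + " " + email));
  }
}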
Re: Lucene indexing on Hadoop distributed file system
Igor Bolotin wrote:
> Does it make sense to change TermInfosWriter.FORMAT in the patch?

Yes. This should be updated for any change to the format of the file, and this certainly constitutes a format change.

This discussion should move to [EMAIL PROTECTED]

Doug
How to write to and read from the same index
I'm using Lucene running on Tomcat to index a large amount of email data. As the indexer runs through the mailbox creating, merging and deleting documents, it also does lots of searches to check whether a document already exists. All my modification operations are done in batches, every x seconds or so.

This seems to cause me lots of problems. I believe it is not possible to keep a single Searcher open while the index is being modified, so the only way is to detect the index changes, close the old searcher and create a new one. However, doing this causes the number of file handles to grow beyond the maximum allowed by the system.

I have tried using Luc's DelayCloseIndexSearcher with his Factory example, but as my index is modified frequently this creates lots of new DelayCloseIndexSearcher objects. The way it calls close on them when there are no more usages doesn't seem to keep the number of file handles down; they just grow. I would expect close to release file handles to the system when nothing is using the object (I even set it explicitly to null), but this does not happen.

If this problem makes sense, has anyone else faced it, and does anyone have a solution?

Cheers,
Nick.
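[Editor's note: whatever wrapper is used, the file handles only go back to the OS when IndexSearcher.close() is actually reached. An illustrative reference-counting sketch of that idea follows (not the LUCENE-445 code): the close is deferred until the last in-flight search has released the searcher.]

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

/** Illustrative wrapper: close() releases file handles only after the last user is done. */
public class RefCountedSearcher {
  private final IndexSearcher searcher;
  private int refCount = 1;              // the "owner" reference held by the factory
  private boolean closeRequested = false;

  public RefCountedSearcher(String indexDir) throws IOException {
    this.searcher = new IndexSearcher(indexDir);
  }

  public synchronized IndexSearcher acquire() {
    refCount++;
    return searcher;
  }

  public synchronized void release() throws IOException {
    refCount--;
    maybeClose();
  }

  /** Called by the code that swaps in a new searcher. */
  public synchronized void closeWhenDone() throws IOException {
    closeRequested = true;
    refCount--;                          // drop the owner reference
    maybeClose();
  }

  private void maybeClose() throws IOException {
    if (closeRequested && refCount == 0) {
      searcher.close();                  // this is what actually gives the handles back
    }
  }
}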
Re: add word filtering?
No, sorry, I didn't convey my question very well. Anyway, thanks a lot for the info.

What I really meant is that I want to filter out some words, for example "violence" and "hatred", from the search engine results. Lucene would then display some alternative result for the attempted search, such as "Peace to the world." instead of results for "violence". How can I do that?

On 3/27/06, Satuluri, Venu_Madhav <[EMAIL PROTECTED]> wrote:
> Are you asking that common words not be searched? For this, you can use
> StopFilter to prevent words from being indexed and searched.
> Alternatively, you can use StandardAnalyzer, which in addition to
> removing stop words also does more sophisticated tokenizing.
>
> Venu
>
> -----Original Message-----
> From: abdul muhaimin [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 27, 2006 3:13 PM
> To: java-user@lucene.apache.org
> Subject: add word filtering?
>
> Hi all,
>
> I'm really new to Lucene. In fact I just found it when I googled a few
> days ago. I never thought Java had this kind of excellent library for free.
>
> I have a few questions: where do I hook in if I want to filter certain
> text from being searched, filter certain results from being displayed,
> or display an alternative result in place of a filtered one when using
> Lucene? Is there a better way than just editing the results .jsp page
> (from the demo)?
>
> Any information is greatly appreciated.
>
> Thanks in advance
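[Editor's note: since this is about intercepting the query before Lucene ever sees it rather than a Lucene feature, here is a minimal sketch of one way to do it; the blocked-word list, the substitute message, and the method names are all hypothetical.]

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

public class BlockedWordCheck {
  // Hypothetical list of words whose searches should be redirected.
  private static final Set BLOCKED =
      new HashSet(Arrays.asList(new String[] {"violence", "hatred"}));

  /** Returns the substitute text to display, or null if the query is fine
   *  and the normal Lucene search should run. */
  public static String substituteFor(String userQuery) {
    StringTokenizer st = new StringTokenizer(userQuery.toLowerCase());
    while (st.hasMoreTokens()) {
      if (BLOCKED.contains(st.nextToken())) {
        return "Peace to the world.";
      }
    }
    return null;   // not blocked: go ahead and search the index as usual
  }
}

In the demo's results .jsp you would call substituteFor(queryString) before building the Lucene Query, and render the returned message instead of the hit list when it is non-null.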