Re: reading indice

2006-10-16 Thread heritrix . lucene
Read org.apache.lucene.index.IndexReader and org.apache.lucene.search.IndexSearcher. There are descriptions available in these docs. On 10/17/06, EDMOND KEMOKAI <[EMAIL PROTECTED]> wrote: Can someone tell me how to read an index into memory, or how to open an existing index for reading?

reading indice

2006-10-16 Thread EDMOND KEMOKAI
Can someone tell me how to read an index into memory, or how to open an existing index for reading? -- "talk trash and carry a small stick." PAUL KRUGMAN (NYT)

reloading index

2006-10-16 Thread EDMOND KEMOKAI
Hi guys, how do you reload an index? I have a webapp which might need to be redeployed, but whenever I test FSDirectory.list(), nothing is returned. The segments and .cfs files are in the directory, but those aren't recognized either. -- "talk trash and carry a small stick." PAUL KRUGMAN (NYT)

Re: PrefixFilter and WildcardQuery

2006-10-16 Thread Erick Erickson
Well, depending on what you mean by wildcard, a PrefixFilter isn't necessarily what you want. If wildcard means abc*, then PrefixFilter is right. If it means ab*cd?fg, a PrefixFilter isn't useful unless you want to do some fancy indexing. Think about writing your own filter. Wrap it in a Constan
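The distinction drawn above between a pure prefix pattern (abc*) and a general wildcard pattern (ab*cd?fg) can be sketched in plain Java. This is an illustration only, not Lucene code: the class and method names are hypothetical, and it assumes `*` and `?` are the only wildcard characters.

```java
// Hypothetical helper, not part of Lucene: decides whether a wildcard
// pattern is a pure prefix pattern such as "abc*".
final class WildcardKindSketch {

    // True only when the sole wildcard is a single trailing '*'
    // and at least one literal character precedes it.
    static boolean isPurePrefix(String pattern) {
        if (pattern.length() < 2 || !pattern.endsWith("*")) {
            return false;
        }
        String head = pattern.substring(0, pattern.length() - 1);
        return head.indexOf('*') < 0 && head.indexOf('?') < 0;
    }

    public static void main(String[] args) {
        System.out.println(isPurePrefix("abc*"));     // a prefix filter would fit
        System.out.println(isPurePrefix("ab*cd?fg")); // needs a general wildcard match
    }
}
```

A check like this could decide, per field, whether a prefix-based filter is enough or a custom filter is needed.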

Re: PrefixFilter and WildcardQuery

2006-10-16 Thread Doron Cohen
Hi Vasu, how about using ChainedFilter(yourPrefixFilters[], ChainedFilter.AND)? vasu shah <[EMAIL PROTECTED]> wrote on 16/10/2006 17:50:27: > Hi, > > I have multiple fields that I need to search on. All these > fields need to support wildcard search. I am ANDing these search > fields using Bo
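As a rough picture of what ANDing several filters means: in Lucene of this era a filter yields a java.util.BitSet with one bit per document, and an AND combination keeps only the documents that are set in every one of them. The sketch below uses only java.util.BitSet; the class name, method name, and sample bit patterns are made up for illustration, and it does not call ChainedFilter itself.

```java
import java.util.BitSet;

// Hypothetical illustration of AND-combining per-filter document bit sets.
final class AndFiltersSketch {

    // Intersects all per-filter bit sets; bit i stays set only if
    // document i passed every filter. The inputs are left unmodified.
    static BitSet andAll(BitSet[] perFilterBits) {
        if (perFilterBits.length == 0) {
            return new BitSet();
        }
        BitSet result = (BitSet) perFilterBits[0].clone();
        for (int i = 1; i < perFilterBits.length; i++) {
            result.and(perFilterBits[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet titleMatches = new BitSet();
        titleMatches.set(0);
        titleMatches.set(2);
        titleMatches.set(5);
        BitSet authorMatches = new BitSet();
        authorMatches.set(2);
        authorMatches.set(3);
        authorMatches.set(5);
        // Only documents 2 and 5 passed both filters.
        System.out.println(andAll(new BitSet[] { titleMatches, authorMatches }));
    }
}
```

Because no scores are involved, the result is just a set of matching document numbers, which fits the "no need for score" requirement in the question.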

PrefixFilter and WildcardQuery

2006-10-16 Thread vasu shah
Hi, I have multiple fields that I need to search on. All these fields need to support wildcard search. I am ANDing these search fields using BooleanQuery. There is no need for score in my search. How do I implement this? I have seen PrefixFilter and it sounds promising. But then how do I

Re: Help with Custom Analyzer

2006-10-16 Thread Doron Cohen
Otis Gospodnetic <[EMAIL PROTECTED]> wrote on 16/10/2006 14:32:13: > Hi Ryan, > > StandardAnalyzer should already be smart about keeping email > addresses as a single token: > > // email addresses > | <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM> > (("."|"-") <ALPHANUM>)+ > > > (this is from StandardAnalyzer.jj) > > As for cha

Re: Help with Custom Analyzer

2006-10-16 Thread Ryan O'Hara
Sorry, I wasn't really concerned with email addresses - I was just using that as an example. How would I tell the StandardAnalyzer that I want a certain phrase to be tokenized as a single token? Surround it with quotes or ...? Also, how would you recommend manipulating the Reader object? You said s

Re: Help with Custom Analyzer

2006-10-16 Thread Bill Taylor
It is not THAT hard to write a custom analyzer; that is what I did. I found that there is a bug in the setup, however, in that there are two incompatible definitions of Token. The generated file Tokenizer.java refers to the wrong definition of Token so I have to patch it before it will compil

Re: Help with Custom Analyzer

2006-10-16 Thread Otis Gospodnetic
Hi Ryan, StandardAnalyzer should already be smart about keeping email addresses as a single token: // email addresses | <EMAIL: <ALPHANUM> (("."|"-"|"_") <ALPHANUM>)* "@" <ALPHANUM> (("."|"-") <ALPHANUM>)+ > (this is from StandardAnalyzer.jj) As for changing the text you feed to Lucene, that's all up to you. Changing the String seems l

Help with Custom Analyzer

2006-10-16 Thread Ryan O'Hara
I have a few questions regarding writing a custom analyzer. My situation is that I would like to use the StandardAnalyzer but with some data-specific rules. I was wondering if there was a way of telling the StandardAnalyzer to treat a string of text, that would normally be tokenized into m

Re: java.io.IOException: read past EOF

2006-10-16 Thread John Gilbert
Turns out I needed a seek method. I ended up modeling it after the RAMDirectory. I turned the RAMFile into an @Entity. The directory accesses the EntityManager, and I am using JBossCache. Preliminary testing shows comparable response times.

Re: BooleanQuery.TooManyClauses exception

2006-10-16 Thread Chris Hostetter
RangeQueries expand to a boolean query containing all terms in the range, so it doesn't matter if you search on a coarse-grained range, if you store the dates with high granularity -- the number of terms will be high. This wiki page discusses some of the merits of using multiple date fields with v
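To make the point about granularity concrete, here is a small plain-Java sketch (no Lucene calls). It assumes dates are indexed as sortable strings, once at day granularity (yyyyMMdd) and once at minute granularity (yyyyMMddHHmm), and counts how many distinct terms fall inside the same one-month range. The class name, the formats, and the sample data (one document per hour in October 2006) are assumptions made for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.TreeSet;

// Hypothetical illustration of how term granularity drives range expansion.
final class RangeExpansionSketch {

    // Counts the distinct indexed terms in [lower, upper], which is roughly
    // the number of clauses a term-expanding range query would need.
    static int countTermsInRange(TreeSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, true, upper, true).size();
    }

    // Builds the distinct terms for one document per hour in October 2006,
    // formatted with the given date pattern.
    static TreeSet<String> hourlyTerms(String pattern) {
        DateTimeFormatter format = DateTimeFormatter.ofPattern(pattern);
        TreeSet<String> terms = new TreeSet<>();
        LocalDateTime t = LocalDateTime.of(2006, 10, 1, 0, 0);
        LocalDateTime end = LocalDateTime.of(2006, 11, 1, 0, 0);
        while (t.isBefore(end)) {
            terms.add(t.format(format));
            t = t.plusHours(1);
        }
        return terms;
    }

    public static void main(String[] args) {
        int dayTerms = countTermsInRange(
                hourlyTerms("yyyyMMdd"), "20061001", "20061031");
        int minuteTerms = countTermsInRange(
                hourlyTerms("yyyyMMddHHmm"), "200610010000", "200610312359");
        System.out.println("day granularity: " + dayTerms + " terms");
        System.out.println("minute granularity: " + minuteTerms + " terms");
    }
}
```

With one document per hour, the same month-long range covers 31 terms at day granularity and 744 at minute granularity. Finer timestamps over more data would reach the default clause limit quickly, even though the query itself looks coarse.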

BooleanQuery.TooManyClauses exception

2006-10-16 Thread Bushey, John
Hi - Can someone explain the reason why I'm getting the TooManyClauses exception? I have a general understanding of the issue based on my reading, but I don't understand the mechanics of it. Specifically, how is my query being expanded to cause this problem? How am I exceeding the default 102

Re: Searching pdf, getting page number

2006-10-16 Thread Steven Rowe
Hi Bill, Bill Taylor wrote: > On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote: >> I know that I can index pdf-files (using a third-party library). > > Could you please tell me where to find this library? There are several PDF extraction packages listed here (look under the "Lucene Document

Re: Looking for a stemmer that can return all inflected forms

2006-10-16 Thread Steven Rowe
Hi Jong, Jong Kim wrote: > I'm looking for a stemmer that is capable of returning all morphological > variants of a query term (to be used for high-recall search). For example, > given a query term of 'cares', I would like to be able to generate 'cares', > 'care', 'cared', and 'caring'. To ac

Re: Parallel Index Search

2006-10-16 Thread Michael McCandless
Supriya Kumar Shyamal wrote: If I am not mistaken the process of locking the Index by different objects like IndexReader or IndexWriter, theoretically only one Thread can access the index at a time. Actually, only one writer can write to the index at once. Multiple readers can read from the

Re: Searching pdf, getting page number

2006-10-16 Thread Bill Taylor
On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote: Hi, I know that I can index pdf-files (using a third-party library). Could you please tell me where to find this library? Is it possible to search the index for a phrase, getting not only the document, but also the page number in the (pd

Re: Parallel Index Search

2006-10-16 Thread Supriya Kumar Shyamal
Michael McCandless wrote: Supriya Kumar Shyamal wrote: If I am not mistaken the process of locking the Index by different objects like IndexReader or IndexWriter, theoretically only one Thread can access the index at a time. Actually, only one writer can write to the index at once. Multiple

RE: Avoiding sort by date

2006-10-16 Thread Graham Stead
Given that you want to score new documents higher (implicitly sorting them), I wonder whether Solr's FunctionQuery (specifically ReciprocalFloatFunction - http://incubator.apache.org/solr/docs/api/org/apache/solr/search/function/ReciprocalFloatFunction.html) may also be helpful. It gives newer doc

Re: Parallel Index Search

2006-10-16 Thread Michael McCandless
Supriya Kumar Shyamal wrote: If I am not mistaken the process of locking the Index by different objects like IndexReader or IndexWriter, theoretically only one Thread can access the index at a time. Actually, only one writer can write to the index at once. Multiple readers can read from the

Re: Big problem with big indexes

2006-10-16 Thread Ariel Isaac Romero Cartaya
First of all, what is your machine architecture? Do you have a super PC? I'm running this on a dual Xeon with hyperthreading, 2.4 GHz, 1 GB RAM, SATA HD. I cannot get the timing results you get. I think that the problem may be in the structure of my index, for example I use a special analyzer fo

Re: Searching pdf, getting page number

2006-10-16 Thread Erick Erickson
Well, anything's possible. There's nothing magic about Lucene and its interaction with, say, a PDF document. What you put into the index is all you can get out. So... you could index the PDF document by pages. That is, each page is a Lucene "document", related by some ID (NOT the Lucene doc_id,
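The page-per-document idea can be sketched without any Lucene calls: each page becomes its own record carrying an application-level ID for the source PDF plus a page number, so a hit identifies the page directly. Everything here (class names, fields, and the naive substring match standing in for a real phrase query) is a hypothetical illustration, not the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of indexing one record per PDF page.
final class PageIndexSketch {

    // One indexed unit per PDF page; pdfId is an application-level ID,
    // not Lucene's internal document number.
    static final class PageDoc {
        final String pdfId;
        final int pageNumber;
        final String text;

        PageDoc(String pdfId, int pageNumber, String text) {
            this.pdfId = pdfId;
            this.pageNumber = pageNumber;
            this.text = text;
        }
    }

    private final List<PageDoc> pages = new ArrayList<>();

    // Adds every page of one PDF; pageTexts.get(0) is page 1.
    void addPdf(String pdfId, List<String> pageTexts) {
        for (int i = 0; i < pageTexts.size(); i++) {
            pages.add(new PageDoc(pdfId, i + 1, pageTexts.get(i)));
        }
    }

    // Naive case-insensitive substring match standing in for a phrase query.
    List<PageDoc> search(String phrase) {
        String needle = phrase.toLowerCase();
        List<PageDoc> hits = new ArrayList<>();
        for (PageDoc page : pages) {
            if (page.text.toLowerCase().contains(needle)) {
                hits.add(page);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        PageIndexSketch index = new PageIndexSketch();
        index.addPdf("manual.pdf",
                Arrays.asList("Getting started", "Opening an index", "Closing an index"));
        for (PageDoc hit : index.search("an index")) {
            System.out.println(hit.pdfId + " page " + hit.pageNumber);
        }
    }
}
```

One consequence of this layout is that a phrase spanning a page break would not match any single page, so it trades some recall for the ability to report page numbers.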

Parallel Index Search

2006-10-16 Thread Supriya Kumar Shyamal
Hello All, If I am not mistaken the process of locking the Index by different objects like IndexReader or IndexWriter, theoretically only one Thread can access the index at a time. When we do search on the index it creates a commit lock so the other thread does not modify the index, so other

Searching pdf, getting page number

2006-10-16 Thread Christoph Pächter
Hi, I know that I can index pdf-files (using a third-party library). Is it possible to search the index for a phrase, getting not only the document, but also the page number in the (pdf-)document? Or is it even possible to get a bookmark, leading to this page? I am thankful for any information y

Re: Query not finding indexed data

2006-10-16 Thread Erik Hatcher
On Oct 16, 2006, at 2:44 AM, Antony Bowesman wrote: Doron Cohen wrote: Hi Antony, you cannot instruct the query parser to do that. Note that an Thanks, I suspected as much. I've changed it to make the field tokenized. field name. This is an application logic to know that a certain qu