Re: Too many results with RegexQuery

2009-05-18 Thread Huntsman84
I mean "too many terms", excuse me. JoelG wrote: > > "but in some cases the search returns too many results" > do you *really* mean you get "too many results"? or do you actually mean > you get a "too many terms" exception due to the query expansion? > > > > -Original Message- >

Using Lucene for a classification problem

2009-05-18 Thread Jeetendra Mirchandani
Hi Lucene users, This might seem a little vague to people just using Lucene. I am trying to see if I can use Lucene for my specific problem. I am trying to build a classification solution, wherein I need to index each *structured* document into its category in the training phase, and lookup a suitabl

CustomScoreQuery numerical precision problem

2009-05-18 Thread ac
Hello, I am using CustomScoreQuery for result ranking. A field of my documents is parsable as an integer value, the magnitude of which exceeds the precision of the float type. A sample value of this field is 24118569. However, due to the nature of CustomScoreQuery, a cast from int to float is perform
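The precision loss described above can be reproduced in plain Java, without Lucene. A float has a 24-bit significand, so not every integer above 2^24 (16777216) is representable. This is only a sketch of the int-to-float cast itself; the class and method names are made up for illustration and say nothing about how CustomScoreQuery performs the cast internally.

```java
public class FloatPrecisionDemo {

    // Casts an int to float and back, as happens when an int field value
    // is carried through a float-typed score.
    static int roundTrip(int value) {
        return (int) (float) value;
    }

    // True when the value comes back unchanged from the float cast.
    static boolean survivesFloatCast(int value) {
        return roundTrip(value) == value;
    }

    public static void main(String[] args) {
        int sample = 24118569; // sample field value from the message above
        System.out.println(roundTrip(sample));         // prints 24118568
        System.out.println(survivesFloatCast(sample)); // prints false
    }
}
```

Between 2^24 and 2^25 adjacent floats are 2 apart, so odd values such as the sample cannot be stored exactly.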

Re: Content is not allowed in prolog

2009-05-18 Thread Grant Ingersoll
I don't think this is a Lucene problem. If I had to guess, it looks like whatever you are uploading is expecting XML and you are sending a zip file. I'd suggest you look at the OpenCMS docs and ask over there. -Grant On May 18, 2009, at 5:14 PM, samurai123 wrote: Hi, Can someone pleas

Content is not allowed in prolog

2009-05-18 Thread samurai123
Hi, Can someone please help with this. I have been trying to import lucene 2.4.1 module with OpenCMS 7.0.5 and getting this error. The build was successful though. Any help will be appreciated! SAX error reading module import from Reason: SAX error reading module import from .. Reason:

Lucene 2.9

2009-05-18 Thread Zhang, Lisheng
Hi, I know Lucene 2.9 will be the next release. Do we have a release date yet (roughly 6 months away, or longer)? Knowing this would help us schedule our work. Thanks for the help! Lisheng

RE: Searching index problems with tomcat

2009-05-18 Thread Marco Lazzara
I've put the index in a folder named RDFIndexLucene (home/marco/RDFIndexLucene), and when I run the query, if (for example) I delete the folder, Tomcat says: "no segments* file found in org.apache.lucene.store.FSDirectory@/home/marco/RDFIndexLucene". It means that Lucene tries to search in the index but

RE: Searching index problems with tomcat

2009-05-18 Thread Uwe Schindler
If it is a webstart app, how do you distribute the index? The webstart app is downloaded to the user's computer and executed there. The index is not transferred on webapp download if it is not included in the JAR file. Opening indexes from within JAR files (using Class.getResourceAsStream) is not

Searching index problems with tomcat

2009-05-18 Thread Marco Lazzara
Hi everybody, I have a problem searching my index. I've created a standalone application and it works perfectly. I've put it on Tomcat, launching with Java Web Start, but if I run the query (the same query) I always obtain no results! Why? Obviously my Tomcat app is looking at the same index f

Re: Problems searching index

2009-05-18 Thread Ian Lea
Please start a new thread for a new question. And you need to provide more info. Here's a wild guess: your tomcat app is looking at a different index from your standalone app. -- Ian. On Mon, May 18, 2009 at 5:26 PM, Marco Lazzara wrote: > Hi everybody, > I've a problem with my searching ind

Re: Problems searching index

2009-05-18 Thread Marco Lazzara
Hi everybody, I have a problem searching my index. I've created a standalone application and it works perfectly. I've put it on Tomcat, launching with Java Web Start, but if I run the query (the same query) I always obtain no results. Please help me! Marco Lazzara On Mon, 18/05/2

Re: Too many results with RegexQuery

2009-05-18 Thread Joel Halbert
"but in some cases the search returns too many results" do you *really* mean you get "too many results"? or do you actually mean you get a "too many terms" exception due to the query expansion? -Original Message- From: Huntsman84 Reply-To: java-user@lucene.apache.org To: java-user@lucen

Re: Problems searching index

2009-05-18 Thread Marco Lazzara
Hi everybody, I have a problem searching my index. I've created a standalone application and it works perfectly. I've put it on Tomcat, launching with Java Web Start, but if I run the query (the same query) I always obtain no results! Why? Help me! Thanks a lot! Marco Lazzara 2009/5/18 Eri

Too many results with RegexQuery

2009-05-18 Thread Huntsman84
Hi, I am using RegexQuery aiming to get a list of records from a regular expression, but in some cases the search returns too many results, and for that my program throws an Exception. How could I customize the query or the searcher to, for example, get just a set of results? Thank you so much!

Re: Problems searching index

2009-05-18 Thread Eric LeVin
Apparently it was because I needed to actually look at the method signature a bit closer :) not reassigning my indexReader instance would mean I was always using the first instance created which wouldn't have the documents in them. Thanks again for your help! -Eric LeVin balasubramanian sud

Re: Problems searching index

2009-05-18 Thread Eric LeVin
Thanks so much for your help Balasubramanian. Interestingly enough, I tried doing the following: indexWriter.addDocument(doc); indexWriter.commit(); indexWriter.optimize(); indexReader.reopen(); indexSearcher = new IndexSearcher(indexReade

Re: Problems searching index

2009-05-18 Thread balasubramanian sudaakeran
Hi Eric LeVin, I think whenever you reopen the indexReader you have to re-create indexSearcher also. This is because reopen of indexReader will give you a new instance if the underlying data is changed. Function documentation for IndexReader.reopen * If the index has not changed since this i

RE: how to get the word before and the word after the matched Term?

2009-05-18 Thread Aditya
Continuing from what Matt said, the answer to your question is that there is no library that provides this directly. Also try the sandbox "highlight" code base. Best Regards, Aditya -Original Message- From: Matthew Hall [mailto:mh...@informatics.jax.org] Sent: Monday, May 18, 2009 6:58 PM To: java

Problems searching index

2009-05-18 Thread Eric LeVin
Hi Everyone-- So I'm not quite sure what is going on with my Lucene index, but I'm having some issues searching. I've created a simple little index of 10 documents as follows: id: 1 type: Article content: <> id: 10 type: Article content: <> So I created a simple TermQuery for start

Re: relevance function for scores

2009-05-18 Thread Joel Halbert
It's not really a Lucene code question, as such, but it's certainly something that Lucene users may have implemented before... I'm hoping ;) -Original Message- From: Erick Erickson Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: relevance function for s

Re: relevance function for scores

2009-05-18 Thread Erick Erickson
In that case, I'll have to defer to folks who actually know something about that part of the code. Erick On Mon, May 18, 2009 at 9:25 AM, Joel Halbert wrote: > Hi Erick, > > Thanks for the pointer. Sorry if the question was a bit unclear but > basically I'm looking to see if anyone has any poin

Re: Re: how to get the word before and the word after the matched Term?

2009-05-18 Thread Kamal Najib
Thank you for the reply. Kamal Original Message: Well, when you get the Document object, you have access to the fields in that document, including the text that was searched against. You could simply retrieve this string, and then use simple java String manipulation to get what you want. Matt

Re: how to get the word before and the word after the matched Term?

2009-05-18 Thread Matthew Hall
Well, when you get the Document object, you have access to the fields in that document, including the text that was searched against. You could simply retrieve this string, and then use simple java String manipulation to get what you want. Matt Kamal Najib wrote: Hi all, I want to get the
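A minimal sketch of the plain String manipulation suggested above, assuming the field text has already been retrieved from the Document as a String. The class name, the method name, and the term "patients" are chosen only for illustration. The sketch splits on whitespace and matches the exact token, so it ignores whatever analysis (lowercasing, stemming, punctuation handling) the index applied.

```java
import java.util.Arrays;
import java.util.List;

public class NeighborWords {

    // Returns {wordBefore, wordAfter} for the first exact occurrence of term.
    // A missing neighbour is null; the result is null if the term is absent.
    static String[] neighbors(String text, String term) {
        List<String> words = Arrays.asList(text.trim().split("\\s+"));
        int i = words.indexOf(term);
        if (i < 0) {
            return null;
        }
        String before = i > 0 ? words.get(i - 1) : null;
        String after = i < words.size() - 1 ? words.get(i + 1) : null;
        return new String[] { before, after };
    }

    public static void main(String[] args) {
        // Sample text taken from the original question in this thread.
        String text = " The drug was freshly prepared at 4-hour intervals . "
                + "Eleven courses were administered to seven patients at this dose level "
                + "and no patient experienced nausea or vomiting";
        String[] result = neighbors(text, "patients");
        System.out.println(result[0] + " | " + result[1]); // prints seven | at
    }
}
```

For large fields or many hits, the highlighter code mentioned elsewhere in this thread may be a better starting point than re-splitting the stored text.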

Re: relevance function for scores

2009-05-18 Thread Joel Halbert
Hi Erick, Thanks for the pointer. Sorry if the question was a bit unclear but basically I'm looking to see if anyone has any pointers on the actual mathematical functions or models to use (rather than the implementation). I'd be really interested to hear what others have used to solve this - since

how to get the word before and the word after the matched Term?

2009-05-18 Thread Kamal Najib
Hi all, I want to get the word before and the word after the matched Term. For example, if I have the text " The drug was freshly prepared at 4-hour intervals . Eleven courses were administered to seven patients at this dose level and no patient experienced nausea or vomiting" and the matched Te

Re: relevance function for scores

2009-05-18 Thread Erick Erickson
Have you looked at TopDocCollector? Basically, you can tell it to only return you the top N docs by score (N is arbitrary). What you then have is an array of raw score and doc ID pairs AND a max score. NOTE: "raw score" is not normalized, i.e. is not guaranteed to be between 0 and 1. So now you ca
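One way to turn raw scores and a max score into a relative filter is sketched below in plain Java. This is an assumption about a possible approach, not the method proposed in the message, which is cut off. The arrays stand in for the score and doc ID pairs, and `keepRelative` and `minRatio` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class RelativeScoreFilter {

    // Keeps the doc IDs whose raw score is at least minRatio of the max score.
    // docIds and rawScores are parallel arrays.
    static List<Integer> keepRelative(int[] docIds, float[] rawScores,
                                      float maxScore, float minRatio) {
        List<Integer> kept = new ArrayList<>();
        if (maxScore <= 0f) {
            return kept; // nothing to normalize against
        }
        for (int i = 0; i < docIds.length; i++) {
            if (rawScores[i] / maxScore >= minRatio) {
                kept.add(docIds[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] docIds = { 7, 3, 9 };
        float[] rawScores = { 4.0f, 2.0f, 0.5f };
        System.out.println(keepRelative(docIds, rawScores, 4.0f, 0.5f)); // prints [7, 3]
    }
}
```

Dividing by the max score only rescales scores within one result set; it does not make scores comparable across different queries.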

Re: Getting a score of a specific document

2009-05-18 Thread Erick Erickson
As best I understand it, you DO NOT WANT A FILTER. Filters do not contribute to scoring, therefore do not rank your documents. If you use a filter, the most irrelevant document could be first. You want to use a HitCollector, see the link in my last e-mail. That link includes an example of using a bi

relevance function for scores

2009-05-18 Thread Joel Halbert
Hi, I'd like to apply a score filter. I realise that filtering by absolute (i.e. anything less than x) scores is pretty meaningless. In my case I want to filter based on relative score - or on some function of score which looks for clustering of documents around certain score values. Context: I

Re: Max size of index? How do search engines avoid this?

2009-05-18 Thread Danil ŢORIN
2GB size is a limitation of OS and/or file systems, not of the index as supported by Lucene. There is some other kind of limitation in Lucene: number of documents < 2147483648 However the size of the lucene index may reach tens and hundreds of GB way before that. If you are thinking about BIG inde

Re: Max size of index? How do search engines avoid this?

2009-05-18 Thread mark harwood
>techniques used by big search engines to search among such huge data. Two keywords here - partitioning and replication. Partitioning is breaking the content down into shards and assigning shards to servers. These can then be queried in parallel to make search response times independent of the

Max size of index? How do search engines avoid this?

2009-05-18 Thread raistlink
Hi, I think I've read that there is a limit for the index, maybe 2 GB for fat machines. If this is right, I'd like to ask for good resources (websites or books) about programming search engines, to learn about the techniques used by big search engines to search among such huge data. Thanks -- View this messag