Re: lucene/solr full text search

2010-07-30 Thread Shuai Weng
I just tried the long query string as you suggested and it works great. Thanks, Shuai On Jul 30, 2010, at 1:35 PM, Ian Lea wrote: > Yes, you can do that. Make a Query for the 30 papers and use that > with your main query in a BooleanQuery if doing it programmatically. > Or with so few documents

Re: lucene/solr full text search

2010-07-30 Thread Ian Lea
Yes, you can do that. Make a Query for the 30 papers and use that with your main query in a BooleanQuery if doing it programmatically. Or with so few documents and papers to match, just in a long string via QueryParser. See http://lucene.apache.org/java/3_0_2/queryparsersyntax.html for details on
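The "long string via QueryParser" option mentioned above could look roughly like the sketch below. It only builds the query text and does not call Lucene. The field names `text` and `pmid` and the helper `build` are hypothetical, not taken from the thread, and the sketch assumes the ID field is indexed so that each ID is a single term and that the search term contains no characters needing escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SubsetQueryString {

    // Builds a string in Lucene query parser syntax: the term is required,
    // and the document ID must be one of the listed IDs.
    // All field names passed in are illustrative assumptions.
    static String build(String textField, String term, String idField, List<String> ids) {
        String idClause = ids.stream()
                .map(id -> idField + ":" + id)
                .collect(Collectors.joining(" OR "));
        return "+" + textField + ":" + term + " +(" + idClause + ")";
    }

    public static void main(String[] args) {
        // prints: +text:kinase +(pmid:101 OR pmid:102 OR pmid:103)
        System.out.println(build("text", "kinase", "pmid", Arrays.asList("101", "102", "103")));
    }
}
```

The resulting string would then be handed to QueryParser. With only about 30 IDs the number of clauses stays small; a much longer ID list may be better served by building the BooleanQuery programmatically.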

Re: InverseWildcardQuery

2010-07-30 Thread Justin
> make both a stemmed field and an unstemmed field While this approach is easy and would work, it means increasing the size of the index and reindexing every document. However, the information is already available in the existing field and runtime analysis is certainly faster than more disk I/O

RE: InverseWildcardQuery

2010-07-30 Thread Steven A Rowe
> > you want what Lucene already does, but that's clearly not true > Hmmm, let's pretend that "contents" field in my example wasn't analyzed at index time. The unstemmed form of terms will be indexed. But if I query with a stemmed form or use QueryParser with the SnowballAnalyzer, I'm

Re: InverseWildcardQuery

2010-07-30 Thread Justin
> you want what Lucene already does, but that's clearly not true Hmmm, let's pretend that "contents" field in my example wasn't analyzed at index time. The unstemmed form of terms will be indexed. But if I query with a stemmed form or use QueryParser with the SnowballAnalyzer, I'm not going to

RE: Term browsing much slower in Lucene 3.x.x

2010-07-30 Thread Nader, John P
Mike, We took your suggestion and refactored like this: TermEnum termEnum = indexReader.terms(new Term(field, "0")); TermDocs allTermDocs = indexReader.termDocs(); while(termEnum.next() && termEnum.term().field().equals(field)) { allTermDocs.seek(termEnum); while(allTermDocs.next()) {

Re: lucene/solr full text search

2010-07-30 Thread Shuai Weng
Hi Ian, In your example below, how do we set the parameters so we can search for "category:computers" AND "text:words"? Thanks, Shuai On Jul 30, 2010, at 9:56 AM, Ian Lea wrote: > Depending on what exactly you mean by "subset" and "index pool", then yes. > > If you've got one lucene index co

RE: InverseWildcardQuery

2010-07-30 Thread Steven A Rowe
Hi Justin, > > an example > > PerFieldAnalyzerWrapper analyzers = > new PerFieldAnalyzerWrapper(new KeywordAnalyzer()); > // myfield defaults to KeywordAnalyzer > analyzers.addAnalyzer("content", new SnowballAnalyzer(luceneVersion, > "English")); > // analyzers affects the indexed field valu

Re: lucene/solr full text search

2010-07-30 Thread Shuai Weng
Sorry for the confusion. Currently, we have a total of 7000 fulltext papers (with the pubmed IDs stored as the unique IDs) in the lucene index. We were wondering if we can search for a given term in a subset of these papers (e.g., 30 papers; by providing a list of the pubmed IDs) instead of search

Re: MultiPhraseQuery throws ArrayIndexOutOfBounds Exception

2010-07-30 Thread Michael McCandless
Nice catch -- thanks! I will fix. Mike On Fri, Jul 30, 2010 at 11:20 AM, jayendra patil wrote: > Working on the nightly build of solr and lucene - > > MultiPhraseQuery throws ArrayIndexOutOfBounds Exception for the words > defined as synonyms > > SEVERE: java.lang.ArrayIndexOutOfBoundsException

Re: InverseWildcardQuery

2010-07-30 Thread Justin
> an example PerFieldAnalyzerWrapper analyzers = new PerFieldAnalyzerWrapper(new KeywordAnalyzer()); // myfield defaults to KeywordAnalyzer analyzers.addAnalyzer("content", new SnowballAnalyzer(luceneVersion, "English")); // analyzers affects the indexed field value IndexWriter writer = new I

RE: InverseWildcardQuery

2010-07-30 Thread Steven A Rowe
Hi Justin, > Unfortunately the suffix requires a wildcard as well in our case. There > are a limited number of prefixes though (10ish), so perhaps we could > combine them all into one query. We'd still need some sort of > InverseWildcardQuery implementation. > > > use another analyzer so you don'

Re: lucene/solr full text search

2010-07-30 Thread Ian Lea
Depending on what exactly you mean by "subset" and "index pool", then yes. If you've got one lucene index containing docs docno: 1 category: computers text: some words about computers docno: 2 category: computers text: some more words about computers docno: 3 category: finance text: some words

Re: InverseWildcardQuery

2010-07-30 Thread Justin
> assuming that you mistakenly used the same field name Nope, wasn't a mistake. We'd have to dynamically iterate through an unknown number of fields if we didn't use the same one.

RE: InverseWildcardQuery

2010-07-30 Thread Steven A Rowe
Hi Justin, > [...] "*:* AND -myfield:foo*". > > If my document contains "myfield:foobar" and "myfield:dog", the document > would be thrown out because of the first field. I want to keep the > document because the second field does not match. I'm assuming that you mistakenly used the same field n

Re: InverseWildcardQuery

2010-07-30 Thread Justin
> indexing your terms in reverse Unfortunately the suffix requires a wildcard as well in our case. There are a limited number of prefixes though (10ish), so perhaps we could combine them all into one query. We'd still need some sort of InverseWildcardQuery implementation. > use another analyze

RE: InverseWildcardQuery

2010-07-30 Thread Uwe Schindler
With all these requirements you slow down your queries immensely. You should think about indexing your terms differently: - if you need leading wildcards, think about indexing your terms in reverse! Wildcards starting with * need to iterate all terms, so it's very slow (and because of this defaults t
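The idea of indexing terms in reverse can be illustrated without Lucene. The sketch below is a plain-Java stand-in for a sorted term dictionary, not Lucene's implementation; the class and method names are made up for illustration. It shows why a leading wildcard such as `*ing` becomes a cheap prefix scan once the terms are stored reversed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class ReversedTermIndex {

    // Sorted set of reversed terms, standing in for a sorted term dictionary.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    void add(String term) {
        reversedTerms.add(new StringBuilder(term).reverse().toString());
    }

    // Answers "*suffix" by visiting only the reversed terms that start with
    // the reversed suffix, instead of iterating over every term.
    List<String> endingWith(String suffix) {
        String prefix = new StringBuilder(suffix).reverse().toString();
        List<String> matches = new ArrayList<>();
        for (String reversed : reversedTerms.tailSet(prefix, true)) {
            if (!reversed.startsWith(prefix)) {
                break; // left the contiguous block of terms sharing the prefix
            }
            matches.add(new StringBuilder(reversed).reverse().toString());
        }
        return matches;
    }

    public static void main(String[] args) {
        ReversedTermIndex index = new ReversedTermIndex();
        index.add("indexing");
        index.add("searching");
        index.add("index");
        System.out.println(index.endingWith("ing"));
    }
}
```

In an actual index the same effect would come from reversing the token text at index time and reversing the query term before running a prefix query against that field.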

Re: InverseWildcardQuery

2010-07-30 Thread Ian Lea
> I think you're suggesting, for example, "*:* AND -myfield:foo*". Yes, I think that is equivalent. > If my document contains "myfield:foobar" and "myfield:dog", the document would > be thrown out because of the first field. I want to keep the document because > the second field does not match.

MultiPhraseQuery throws ArrayIndexOutOfBounds Exception

2010-07-30 Thread jayendra patil
Working on the nightly build of solr and lucene - MultiPhraseQuery throws ArrayIndexOutOfBounds Exception for the words defined as synonyms SEVERE: java.lang.ArrayIndexOutOfBoundsException: 5 at org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhraseQuery.java:191)

Re: InverseWildcardQuery

2010-07-30 Thread Justin
I think you're suggesting, for example, "*:* AND -myfield:foo*". If my document contains "myfield:foobar" and "myfield:dog", the document would be thrown out because of the first field. I want to keep the document because the second field does not match. Related, is there a way to use wildcards
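The difference between the two behaviours described here can be shown with a small plain-Java sketch. It does not use Lucene; a document is modelled simply as the list of values of a multi-valued `myfield`, `startsWith("foo")` stands in for the wildcard `foo*`, and the method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class InverseMatchSemantics {

    // Stand-in for the wildcard pattern "foo*".
    static boolean matchesFooPrefix(String value) {
        return value.startsWith("foo");
    }

    // Semantics of "*:* AND -myfield:foo*": the document is kept only if
    // NONE of its field values match the wildcard.
    static boolean keptByNegatedQuery(List<String> values) {
        return values.stream().noneMatch(InverseMatchSemantics::matchesFooPrefix);
    }

    // Semantics asked for in the thread: the document is kept if AT LEAST ONE
    // of its field values does not match the wildcard.
    static boolean keptByInverseWildcard(List<String> values) {
        return values.stream().anyMatch(v -> !matchesFooPrefix(v));
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("foobar", "dog");
        System.out.println(keptByNegatedQuery(doc));    // false: "foobar" matches foo*
        System.out.println(keptByInverseWildcard(doc)); // true: "dog" does not match foo*
    }
}
```

The two predicates agree for single-valued fields and differ only when a document carries several values in the same field, which is the case described above.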

lucene/solr full text search

2010-07-30 Thread Shuai Weng
Hey, I was wondering if we can search info from a subset of papers instead of from the whole index pool. Thanks, Shuai

Re: InverseWildcardQuery

2010-07-30 Thread Ian Lea
I can't get my head round exactly what you want, but a standard lucene technique is a BooleanQuery holding a MatchAllDocsQuery and a second query, can be anything, having Occur.MUST_NOT. I guess that is a way of inverting the second query. -- Ian. On Fri, Jul 30, 2010 at 3:29 PM, Justin wrote

InverseWildcardQuery

2010-07-30 Thread Justin
Any hints on making something like an InverseWildcardQuery? We're trying to find all documents that have at least one field that doesn't match the wildcard query. Or is there a way to inverse any particular query?

Reply: Re: Re: Highlighter wildcard problems: NoClassDefFoundError in Linux/CentOS 5.4, works in Windows XP

2010-07-30 Thread Markus Roth
Well it turns out that your suggestion was true. I added lucene-memory-3.0.2.jar from the contrib/memory folder to the CLASSPATH and it works. The odd thing is that I most definitely have not added the jar to the CP in Windows - and there wildcards work (with just core and highlight). Thanks

Re: Re: Highlighter wildcard problems: NoClassDefFoundError in Linux/CentOS 5.4, works in Windows XP

2010-07-30 Thread Ian Lea
Because the highlighter only uses MemoryIndex if wildcards are involved? I don't use the highlighter package so have no idea if that is correct or not, but the message java.lang.NoClassDefFoundError: org/apache/lucene/index/memory/MemoryIndex is clear. The jvm can't find that class. -- Ian.

Reply: Re: Highlighter wildcard problems: NoClassDefFoundError in Linux/CentOS 5.4, works in Windows XP

2010-07-30 Thread Markus Roth
First of all, thanks for your response. But how can that be true if a search-term without a wildcard (and the highlighting of the results) works fine? Greetings, Markus Ian Lea

Re: Highlighter wildcard problems: NoClassDefFoundError in Linux/CentOS 5.4, works in Windows XP

2010-07-30 Thread Ian Lea
Your linux set up is evidently missing a jar file - the one that contains org/apache/lucene/index/memory/MemoryIndex. Or it is there but not in the CLASSPATH, or something else along those lines. -- Ian. On Fri, Jul 30, 2010 at 2:30 PM, Markus Roth wrote: > > > Hello everyone, > > I'm using

Highlighter wildcard problems: NoClassDefFoundError in Linux/CentOS 5.4, works in Windows XP

2010-07-30 Thread Markus Roth
Hello everyone, I'm using lucene for obvious purposes and I'm trying to highlight search-term results. libraries I use: lucene-core version: 3.0.2 lucene-highlighter version: 3.0.2 Dev-System: WinXP Pro 32Bit, jdk1.6.0_20, java version "1.6.0_20" Java

Modifying idf()?

2010-07-30 Thread Pablo Mendes
Hi all, I'd like to do a very simple change to the idf computation, but I can't seem to wrap my head around it. There are very useful hints in the javadocs for "Changing Similarity" for new tf() and lengthNorm() behavior, but it was a little bit blurrier for idf() http://lucene.apache.org/java/3_0

Re: Closing and reopening readers

2010-07-30 Thread Ian Lea
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/index/IndexReader.html#reopen%28%29 ... If the index has not changed since this instance was (re)opened, then this call is a NOOP and returns this instance -- Ian. On Fri, Jul 30, 2010 at 9:16 AM, Gregory Tarr wrote: > I'm having t
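The consequence of that javadoc sentence is that the old reader should be closed only when reopen() actually returned a different instance. The sketch below illustrates the pattern with a hypothetical `FakeReader` class that mimics only this one aspect of the contract; it is not Lucene's IndexReader, and passing the index version to `reopen` is a simplification made up for the example.

```java
public class ReopenPattern {

    // Hypothetical stand-in: returns itself when the index is unchanged,
    // and a new instance otherwise.
    static class FakeReader {
        private final long version;
        private boolean closed = false;

        FakeReader(long version) {
            this.version = version;
        }

        FakeReader reopen(long currentIndexVersion) {
            return currentIndexVersion == version ? this : new FakeReader(currentIndexVersion);
        }

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Returns the reader to use from now on. The old reader is closed only
    // if reopen() handed back a different instance.
    static FakeReader refresh(FakeReader reader, long currentIndexVersion) {
        FakeReader reopened = reader.reopen(currentIndexVersion);
        if (reopened != reader) {
            reader.close();
        }
        return reopened;
    }

    public static void main(String[] args) {
        FakeReader reader = new FakeReader(1);
        FakeReader unchanged = refresh(reader, 1);
        System.out.println(unchanged == reader);   // true: same instance, still open
        System.out.println(unchanged.isClosed());  // false
        FakeReader updated = refresh(reader, 2);
        System.out.println(updated == reader);     // false: new instance
        System.out.println(reader.isClosed());     // true: the old reader was closed
    }
}
```

Applied to a real reader, the same identity check before close() avoids closing the instance that is still in use when the index has not changed.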

Closing and reopening readers

2010-07-30 Thread Gregory Tarr
I'm having trouble with the IndexReader class as per below: (using lucene 2.9.1) RAMDirectory dir = new RAMDirectory(); createIndex(dir); IndexReader reader = IndexReader.open(dir); IndexReader reader2 = reader.reopen(); reader.close(); reader2.terms(); // AlreadyClosedException - this IndexReader