I just tried the long query string as you suggested and it works great.
Thanks,
Shuai
On Jul 30, 2010, at 1:35 PM, Ian Lea wrote:
> Yes, you can do that. Make a Query for the 30 papers and use that
> with your main query in a BooleanQuery if doing it programmatically.
> Or with so few documents
Yes, you can do that. Make a Query for the 30 papers and use that
with your main query in a BooleanQuery if doing it programmatically.
Or with so few documents and papers to match, just in a long string
via QueryParser. See
http://lucene.apache.org/java/3_0_2/queryparsersyntax.html for details
on
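For reference, a minimal sketch of the "long query string" approach, assuming the IDs live in an indexed field. The field name "pmid" and the term "kinase" are made up for illustration, and the IDs are assumed to be plain numbers that need no QueryParser escaping. The helper only builds the string; it would then be handed to QueryParser as usual.

```java
import java.util.Arrays;
import java.util.List;

final class SubsetQueryString {

    /**
     * Builds a QueryParser-style string that requires BOTH the main query
     * and a match on one of the given IDs.
     * idField is a hypothetical field name; use whichever indexed field holds the ID.
     */
    static String restrictToIds(String mainQuery, String idField, List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("ids must not be empty");
        }
        StringBuilder sb = new StringBuilder();
        // '+' marks a clause as required in the classic query syntax.
        sb.append("+(").append(mainQuery).append(") +").append(idField).append(":(");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append(ids.get(i));
        }
        sb.append(")");
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: +(text:kinase) +pmid:(101 OR 102 OR 103)
        System.out.println(restrictToIds("text:kinase", "pmid",
                Arrays.asList("101", "102", "103")));
    }
}
```

With 30 IDs this stays well under BooleanQuery's default clause limit. The analyzer given to QueryParser has to leave the ID values unchanged for the match to work.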
> make both a stemmed field and an unstemmed field
While this approach is easy and would work, it means increasing the size of the
index and reindexing every document. However, the information is already
available in the existing field and runtime analysis is certainly faster than
more disk I/O
> > you want what Lucene already does, but that's clearly not true
>
> Hmmm, let's pretend that "contents" field in my example wasn't analyzed at
> index
> time. The unstemmed form of terms will be indexed. But if I query with a
> stemmed
> form or use QueryParser with the SnowballAnalyzer, I'm
> you want what Lucene already does, but that's clearly not true
Hmmm, let's pretend that "contents" field in my example wasn't analyzed at
index
time. The unstemmed form of terms will be indexed. But if I query with a
stemmed
form or use QueryParser with the SnowballAnalyzer, I'm not going to
Mike,
We took your suggestion and refactored like this:
TermEnum termEnum = indexReader.terms(new Term(field, "0"));
TermDocs allTermDocs = indexReader.termDocs();
while(termEnum.next() && termEnum.term().field().equals(field)) {
allTermDocs.seek(termEnum);
while(allTermDocs.next()) {
Hi Ian,
In your example below, how do we set the parameters so we can search for
"category:computers" AND "text:words"?
Thanks,
Shuai
On Jul 30, 2010, at 9:56 AM, Ian Lea wrote:
> Depending on what exactly you mean by "subset" and "index pool", then yes.
>
> If you've got one lucene index co
Hi Justin,
> > an example
>
> PerFieldAnalyzerWrapper analyzers =
> new PerFieldAnalyzerWrapper(new KeywordAnalyzer());
> // myfield defaults to KeywordAnalyzer
> analyzers.addAnalyzer("content", new SnowballAnalyzer(luceneVersion,
> "English"));
> // analyzers affects the indexed field valu
Sorry for the confusion.
Currently, we have a total of 7000 full-text papers (with the PubMed IDs stored as
the unique IDs)
in the lucene index. We were wondering if we can search for a given term in a
subset of these papers
(eg, 30 papers; by providing a list of the pubmed IDs) instead of search
Nice catch -- thanks! I will fix.
Mike
On Fri, Jul 30, 2010 at 11:20 AM, jayendra patil
wrote:
> Working on the nightly build of solr and lucene -
>
> MultiPhraseQuery throws ArrayIndexOutOfBounds Exception for the words
> defined as synonyms
>
> SEVERE: java.lang.ArrayIndexOutOfBoundsException
> an example
PerFieldAnalyzerWrapper analyzers =
new PerFieldAnalyzerWrapper(new KeywordAnalyzer());
// myfield defaults to KeywordAnalyzer
analyzers.addAnalyzer("content", new SnowballAnalyzer(luceneVersion,
"English"));
// analyzers affects the indexed field value
IndexWriter writer = new I
Hi Justin,
> Unfortunately the suffix requires a wildcard as well in our case. There
> are a limited number of prefixes though (10ish), so perhaps we could
> combine them all into one query. We'd still need some sort of
> InverseWildcardQuery implementation.
>
> > use another analyzer so you don'
Depending on what exactly you mean by "subset" and "index pool", then yes.
If you've got one lucene index containing docs
docno: 1
category: computers
text: some words about computers
docno: 2
category: computers
text: some more words about computers
docno: 3
category: finance
text: some words
> assuming that you mistakenly used the same field name
Nope, wasn't a mistake. We'd have to dynamically iterate through an unknown
number of fields if we didn't use the same one.
- Original Message
From: Steven A Rowe
To: "java-user@lucene.apache.org"
Sent: Fri, July 30, 2010 11:1
Hi Justin,
> [...] "*:* AND -myfield:foo*".
>
> If my document contains "myfield:foobar" and "myfield:dog", the document
> would be thrown out because of the first field. I want to keep the
> document because the second field does not match.
I'm assuming that you mistakenly used the same field n
> indexing your terms in reverse
Unfortunately the suffix requires a wildcard as well in our case. There are a
limited number of prefixes though (10ish), so perhaps we could combine them all
into one query. We'd still need some sort of InverseWildcardQuery
implementation.
> use another analyze
With all these requirements you slow down your queries immensely. You should
think about indexing your terms differently:
- if you need leading wildcards, think about indexing your terms in reverse!
Wildcards starting with * need to iterate all terms, so it's very slow (and
because of this defaults t
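A rough sketch of the reversed-term idea, in plain Java with no Lucene calls: each term is reversed before it goes into a second field, and a leading-wildcard pattern such as "*ters" is rewritten into the prefix pattern "sret*" against that field. The class and method names are invented for illustration. This only helps when the wildcard is at the front; a pattern with wildcards on both ends is not covered.

```java
final class ReversedTerms {

    /** Reverses a term, e.g. for indexing into a separate "reversed" field. */
    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    /**
     * Rewrites a pattern with exactly one leading '*' (e.g. "*ters") into a
     * trailing-wildcard pattern for the reversed field (e.g. "sret*").
     */
    static String toReversedPrefixPattern(String pattern) {
        boolean singleLeadingStar = pattern.length() > 1
                && pattern.charAt(0) == '*'
                && pattern.indexOf('*', 1) < 0
                && pattern.indexOf('?') < 0;
        if (!singleLeadingStar) {
            throw new IllegalArgumentException(
                    "only patterns of the form *suffix are handled: " + pattern);
        }
        return reverse(pattern.substring(1)) + "*";
    }

    public static void main(String[] args) {
        String indexed = reverse("computers");                 // stored in the reversed field
        String rewritten = toReversedPrefixPattern("*ters");   // now a cheap prefix pattern
        String prefix = rewritten.substring(0, rewritten.length() - 1);
        System.out.println(indexed + " starts with " + prefix + ": "
                + indexed.startsWith(prefix));
    }
}
```

The rewritten pattern can be run as an ordinary prefix query on the reversed field, which avoids enumerating every term in the index.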
> I think you're suggesting, for example, "*:* AND -myfield:foo*".
Yes, I think that is equivalent.
> If my document contains "myfield:foobar" and "myfield:dog", the document would
> be thrown out because of the first field. I want to keep the document because
> the second field does not match.
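To make the difference between the two readings concrete, here is a small plain-Java model. It does not call Lucene; startsWith stands in for the wildcard foo*, and the class and method names are invented for illustration. The first method mirrors "*:* AND -myfield:foo*" (drop the document if any value matches), the second mirrors what is being asked for (keep the document if at least one value does not match).

```java
import java.util.Arrays;
import java.util.List;

final class MultiValueExclusionModel {

    /** Plain-Java stand-in for the wildcard "prefix*"; not a Lucene call. */
    static boolean matchesPrefix(String value, String prefix) {
        return value.startsWith(prefix);
    }

    /** Models "*:* AND -myfield:foo*": kept only if NO value matches. */
    static boolean keptByNegatedQuery(List<String> values, String prefix) {
        for (String v : values) {
            if (matchesPrefix(v, prefix)) {
                return false;
            }
        }
        return true;
    }

    /** Models the requested behaviour: kept if AT LEAST ONE value does not match. */
    static boolean keptIfAnyValueDiffers(List<String> values, String prefix) {
        for (String v : values) {
            if (!matchesPrefix(v, prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> myfield = Arrays.asList("foobar", "dog");
        System.out.println("negated query keeps doc:   " + keptByNegatedQuery(myfield, "foo"));
        System.out.println("requested logic keeps doc: " + keptIfAnyValueDiffers(myfield, "foo"));
    }
}
```

For the document with values "foobar" and "dog", the negated query drops it while the requested logic keeps it, which is why the two are not interchangeable for multi-valued fields.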
Working on the nightly build of solr and lucene -
MultiPhraseQuery throws ArrayIndexOutOfBounds Exception for the words
defined as synonyms
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 5
at
org.apache.lucene.search.MultiPhraseQuery$MultiPhraseWeight.scorer(MultiPhraseQuery.java:191)
I think you're suggesting, for example, "*:* AND -myfield:foo*".
If my document contains "myfield:foobar" and "myfield:dog", the document would
be thrown out because of the first field. I want to keep the document because
the second field does not match.
Related, is there a way to use wildcards
Hey,
I was wondering if we can search info from a subset of papers
instead of from the whole index pool.
Thanks,
Shuai
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-u
I can't get my head round exactly what you want, but a standard lucene
technique is a BooleanQuery holding a MatchAllDocsQuery and a second
query, which can be anything, with Occur.MUST_NOT. I guess that is a way
of inverting the second query.
--
Ian.
On Fri, Jul 30, 2010 at 3:29 PM, Justin wrote
Any hints on making something like an InverseWildcardQuery?
We're trying to find all documents that have at least one field that doesn't
match the wildcard query.
Or is there a way to invert any particular query?
Well it turns out that your suggestion was true. I added
lucene-memory-3.0.2.jar from the contrib/memory folder to the CLASSPATH and
it works.
The odd thing is that I most definitely have not added the jar to the CP in
Windows - and there wildcards work (with just core and highlight).
Thanks
Because the highlighter only uses MemoryIndex if wildcards are involved? I
don't use the highlighter package so have no idea if that is correct or not,
but the message
java.lang.NoClassDefFoundError: org/apache/lucene/index/memory/MemoryIndex
is clear. The JVM can't find that class.
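One quick way to confirm this on the affected machine is a small check like the following. It is only a diagnostic sketch using standard JDK calls; the class name ClasspathCheck is made up. It tries to load the class named in the error without initializing it and prints the effective classpath.

```java
final class ClasspathCheck {

    /** Returns true if the named class can be loaded by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate and load the class, but do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but could not be linked (e.g. a dependency is missing).
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.lucene.index.memory.MemoryIndex";
        System.out.println(name + " loadable: " + isOnClasspath(name));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Running it with the same classpath as the failing application on both systems should show whether the jar containing that class is visible on one and not on the other.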
--
Ian.
First of all, thanks for your response.
But how can that be true if a search-term without a wildcard (and the
highlighting of the results) works fine?
Greetings,
Markus
Ian Lea
Your Linux setup is evidently missing a jar file - the one that contains
org/apache/lucene/index/memory/MemoryIndex. Or it is there but not in the
CLASSPATH, or something else along those lines.
--
Ian.
On Fri, Jul 30, 2010 at 2:30 PM, Markus Roth wrote:
>
>
> Hello everyone,
>
> I'm using
Hello everyone,
I'm using lucene for obvious purposes and I'm trying to highlight
search-term results.
libraries I use:
lucene-core version: 3.0.2
lucene-highlighter version: 3.0.2
Dev-System:
WinXP Pro 32Bit, jdk1.6.0_20,
java version "1.6.0_20"
Java
Hi all,
I'd like to do a very simple change to the idf computation, but I can't seem
to wrap my head around it.
There are very useful hints in the javadocs for "Changing Similarity" for
new tf() and lengthNorm() behavior, but it was a little bit blurrier for
idf()
http://lucene.apache.org/java/3_0
http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/index/IndexReader.html#reopen%28%29
...
If the index has not changed since this instance was (re)opened, then
this call is a NOOP and returns this instance
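The consequence of that sentence can be shown with a toy model. This is not the Lucene IndexReader, just a made-up ToyReader with the same "returns this instance when nothing changed" contract, so the version numbers and method signatures here are assumptions for illustration. The point is the refresh idiom: close the old reader only when reopen handed back a different object.

```java
final class ReopenModel {

    /** Toy stand-in for an index reader; NOT the Lucene class. */
    static final class ToyReader {
        private final long version;
        private boolean closed;

        ToyReader(long version) {
            this.version = version;
        }

        /** Returns this same instance if the index version is unchanged, else a new reader. */
        ToyReader reopen(long currentIndexVersion) {
            return currentIndexVersion == version ? this : new ToyReader(currentIndexVersion);
        }

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Safe refresh: close the old reader only if reopen returned a different one. */
    static ToyReader refresh(ToyReader reader, long currentIndexVersion) {
        ToyReader fresh = reader.reopen(currentIndexVersion);
        if (fresh != reader) {
            reader.close();
        }
        return fresh;
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(1);
        ToyReader same = refresh(reader, 1);   // index unchanged: same object, still open
        System.out.println("same instance: " + (same == reader) + ", closed: " + same.isClosed());
        ToyReader newer = refresh(reader, 2);  // index changed: new object, old one closed
        System.out.println("new instance: " + (newer != reader) + ", old closed: " + reader.isClosed());
    }
}
```

Closing the original reader unconditionally after reopen, when nothing has changed, closes the very instance that was just returned, which would explain an AlreadyClosedException on the next call.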
--
Ian.
On Fri, Jul 30, 2010 at 9:16 AM, Gregory Tarr wrote:
> I'm having t
I'm having trouble with the IndexReader class as per below: (using
lucene 2.9.1)
RAMDirectory dir = new RAMDirectory();
createIndex(dir);
IndexReader reader = IndexReader.open(dir);
IndexReader reader2 = reader.reopen();
reader.close();
reader2.terms(); // AlreadyClosedException - this IndexReader
31 matches