An update: I have managed to stop it failing by debugging and changing the
value of org.apache.lucene.store.IndexInput.preUTF8Strings to true. The value
is always false when it fails.
Mike
-Original Message-
From: Mike Streeton [mailto:mike.stree...@connexica.com]
Sent: 28 April
I have an index that works fine on Lucene 2.3.2 but fails to open in 2.4.1; it
always fails with a "Read past EOF" error. The index does contain some field
names with German umlaut characters in them.
Any ideas?
Many Thanks
Mike
CheckIndex v2.3.2
NOTE: testing will be more thorough if you run java with
I have now managed to quantify the error: it only affects indexes built with
Lucene 2.2 and occurs after a period of reusing a TermDocs object. I have
modified my test app to be a little more verbose about the conditions it fails
under. Hopefully someone can track the bug down in Lucene. I have
Thanks
Mike
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: 10 November 2007 22:49
To: java-user@lucene.apache.org
Subject: Re: TermDocs.skipTo error
On Nov 9, 2007 11:40 AM, Mike Streeton <[EMAIL PROTECTED]> wrote:
> I have just t
I have just tried this again using the index I built with Lucene 2.1 but
running the test using Lucene 2.2, and it works okay, so it seems to be
something related to an index built using Lucene 2.2.
Mike
-Original Message-
From: Mike Streeton [mailto:[EMAIL PROTECTED]
Sent: 09 November
I have tried this again using Lucene 2.1 and, as Erick found, it works okay. I
have tried it on JDK 1.6 u1 and u3; both work, but both fail when using Lucene
2.2.
Mike
-Original Message-
From: Mike Streeton [mailto:[EMAIL PROTECTED]
Sent: 09 November 2007 16:05
To: java-user
Subject: Re: TermDocs.skipTo error
FWIW, running Lucene 2.1 / Java 1.5, all I get is some numbers being printed
out
0
1
2
.
.
.
90,000
and it ran through the above 4 times or so
Erick
On Nov 9, 2007 5:51 AM, Mike Streeton <[EMAIL PROTECTED]>
wrote:
> I have posted before about
I have posted before about a problem with TermDocs.skipTo() but never managed
to reproduce it. I have now got it to fail using the following program; please
can someone try it and see if they get the stack trace:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Array
index
Can TermDocs be reused, i.e. can you do:
TermDocs docs = reader.termDocs();
// first term: position and count the matching docs
docs.seek(term1);
int i = 0;
while (docs.next()) {
    i++;
}
// reuse the same TermDocs for a second term
docs.seek(term2);
int j = 0;
while (docs.next()) {
    j++;
}
Reuse does seem to work, but I get ArrayIndexOutOfBoundsExceptions from
BitVector if I reu
Are there any issues surrounding TermDocs.skipTo()? I have an index that works
okay if I use TermDocs.next() to find the next doc id, but using skipTo() to
jump to the doc after a given point can sometimes miss.
e.g. iterating using TermDocs.next() and TermDocs.doc() gives 1,50,1,2 but
using TermDocs.skipTo
I would use a RangeFilter instead of the default Boolean query, as the latter
will always break at some point with "Too many Boolean clauses".
Extend QueryParser to sort this out. As far as extracting information
from log files goes, I would look at creating yourself a LogAnalyzer that can
interpret the co
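For what it's worth, a minimal sketch of the kind of QueryParser extension being suggested, assuming the 2.x-era ConstantScoreRangeQuery is available; the class name is illustrative, not the poster's actual code:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.ConstantScoreRangeQuery;
import org.apache.lucene.search.Query;

// Turns range syntax into a filter-backed query instead of letting the
// default parser expand it into one BooleanQuery clause per matching term,
// which is what eventually trips "Too many Boolean clauses".
public class RangeSafeQueryParser extends QueryParser {
    public RangeSafeQueryParser(String defaultField) {
        super(defaultField, new StandardAnalyzer());
    }

    protected Query getRangeQuery(String field, String part1, String part2,
                                  boolean inclusive) {
        return new ConstantScoreRangeQuery(field, part1, part2, inclusive, inclusive);
    }
}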
The only way you might get the performance you want is to have multiple
IndexWriters writing to different indexes and then addIndexes() at the end.
You would obviously have to handle the multithreading and the distribution
of the parts of the log to each writer.
Mike
www.ardentia.com the home of NetSearc
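A rough sketch of the parallel-writers-then-merge idea, assuming RAM-based part indexes and an on-disk final index (the path and the two-part split are illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class ParallelBuild {
    public static void main(String[] args) throws Exception {
        // One small index per worker; each would normally be filled by its
        // own thread with its share of the log.
        Directory[] parts = { new RAMDirectory(), new RAMDirectory() };
        for (int i = 0; i < parts.length; i++) {
            IndexWriter part = new IndexWriter(parts[i], new StandardAnalyzer(), true);
            // ... addDocument() calls from the worker thread go here ...
            part.close();
        }

        // Merge all the parts into the final index in one call at the end.
        IndexWriter writer = new IndexWriter("/path/to/final-index",
                new StandardAnalyzer(), true);
        writer.addIndexes(parts);
        writer.close();
    }
}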
Chris,
Thanks for this, I will have to do it the long-hand way; we are trying
to create "search marts" containing a smaller index drawn from a much larger
one, so cloning and deleting will not work.
Thanks
Mike
www.ardentia.com the home of NetSearch
-Original Message-
From: Chris Hostetter
I want to copy a selection of documents from one index to another. I can
get the Document objects from the IndexReader and write them to the
target index using the IndexWriter. The problem I have is that this loses
fields that have not been stored; is there a way around this?
Thanks
Mike
www.
This is how we solve the range query problem using filters. The nice
part about it is that you can use a range in a query, so several ranges can be
ORed/ANDed or NOTed together if required, instead of applying a range
filter to the whole query. (Assumes dates in MMDD format.)
Hope this helps.
Mike
Ext
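The code referred to above is cut off in the digest, but here is a minimal sketch of the idea, assuming a "date" field holding zero-padded YYYYMMDD terms and the 2.x ConstantScoreRangeQuery (the field name and dates are illustrative):

import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreRangeQuery;
import org.apache.lucene.search.Query;

public class DateRanges {
    // Two date ranges ORed together; because each range is itself a Query
    // backed by a filter, it can be ANDed/ORed/NOTed with other clauses
    // instead of filtering the whole query.
    public static Query januaryOrDecember2006() {
        BooleanQuery ranges = new BooleanQuery();
        ranges.add(new ConstantScoreRangeQuery("date", "20060101", "20060131", true, true),
                BooleanClause.Occur.SHOULD);
        ranges.add(new ConstantScoreRangeQuery("date", "20061201", "20061231", true, true),
                BooleanClause.Occur.SHOULD);
        return ranges;
    }
}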
The simplest solution is always the best - when storing the page, do not
break up sentences. So a page will be all the sentences that occur on
it. If a sentence starts on one page and finishes on the next it will be
included in both pages in the index.
Hope this helps
Mike
www.ardentia.com the h
What performs best across multiple indexes:
Each index with an IndexReader with an IndexSearcher on top and the
searchers linked with a ParallelMultiSearcher
Or
Each index with an IndexReader linked with a MultiReader and an
IndexSearcher on top
Many Thanks
Mike
www.ardentia
The simplest solution to this I would suggest is to decode the id to a
relevance score, e.g.
Select id, addfield
From mytable
Where id in (1,2,3,4,5,50,60,70)
Order by case id when 1 then 0.9 when 2 then 0.8 when 3 then 0.7
end desc
You will have to generate the in () and the case statement bu
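A hedged sketch of generating that SQL from Lucene Hits; the stored "id" field and the table/column names are taken from the example above or are illustrative:

import java.io.IOException;
import org.apache.lucene.search.Hits;

public class SqlFromHits {
    // Builds the IN (...) list and the ORDER BY CASE ... END DESC clause so
    // the database returns rows in Lucene's relevance order.
    public static String buildSql(Hits hits) throws IOException {
        StringBuffer in = new StringBuffer();
        StringBuffer order = new StringBuffer("case id");
        for (int i = 0; i < hits.length(); i++) {
            String id = hits.doc(i).get("id");   // stored primary-key field
            if (i > 0) {
                in.append(",");
            }
            in.append(id);
            order.append(" when ").append(id).append(" then ").append(hits.score(i));
        }
        order.append(" end desc");
        return "Select id, addfield From mytable Where id in (" + in
                + ") Order by " + order;
    }
}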
We recently ran some benchmarks on Linux with 4 Xeon CPUs and 2GB of
heap (not that this was needed). We managed to easily get 1,000 term-based
queries a second, including the query execution time and
retrieving the top 10 documents from the index. We did notice some
contention as adding more c
From memory, addIndexes() also does an optimization beforehand; this
might be what is taking the time.
Mike
www.ardentia.com the home of NetSearch
-Original Message-
From: heritrix.lucene [mailto:[EMAIL PROTECTED]
Sent: 22 June 2006 05:05
To: java-user@lucene.apache.org
Subject: Re: ad
When you talk about indexing emails, are you indexing Outlook mails? We
have only found a few libraries that will do this, and all require
Outlook to be online at the time, i.e. you cannot index PST files
standalone.
As far as indexing goes, index each address in a separate un-tokenized
field, not spac
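A small sketch of the per-address fields being described, with illustrative field names and addresses; UN_TOKENIZED keeps each address as a single term:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class EmailDoc {
    public static Document build() {
        Document doc = new Document();
        // One un-tokenized value per recipient, so the whole address is a
        // single term rather than being split on '.' and '@'.
        doc.add(new Field("to", "joe.bloggs@example.com",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("to", "jane.doe@example.com",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        return doc;
    }
}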
We wrote ours for NetSearch to handle this specific issue. I suggest you
create a holder class to hold the IndexReader and IndexSearcher; this
can close them in its finalizer. Clients keep the holder until they are
finished and then discard it. Once it is completely dereferenced it
will be closed.
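A minimal sketch of such a holder, assuming the index is opened from a path; the class name is illustrative:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Clients hold a SearcherHolder while they search and simply drop the
// reference when done; the reader and searcher are closed once nothing
// references the holder any more.
public class SearcherHolder {
    private final IndexReader reader;
    private final IndexSearcher searcher;

    public SearcherHolder(String indexPath) throws IOException {
        reader = IndexReader.open(indexPath);
        searcher = new IndexSearcher(reader);
    }

    public IndexSearcher getSearcher() {
        return searcher;
    }

    protected void finalize() throws Throwable {
        try {
            searcher.close();
            reader.close();
        } finally {
            super.finalize();
        }
    }
}

Relying on the finalizer means the files are only released when the garbage collector gets around to it; that is the trade-off accepted for not having to track clients explicitly.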
When doing this, use a filter to restrict the query results to just those
for a user's company. That way it will not affect the ranking.
Mike
www.ardentia.com the home of NetSearch
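A sketch of that filtering, assuming a "company" keyword field on every document (the field name is an assumption):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;

public class CompanySearch {
    // The company restriction is passed as a Filter rather than added as a
    // query clause, so it narrows the results without contributing to scores.
    public static Hits search(IndexSearcher searcher, Query userQuery,
                              String companyId) throws IOException {
        QueryWrapperFilter companyFilter =
                new QueryWrapperFilter(new TermQuery(new Term("company", companyId)));
        return searcher.search(userQuery, companyFilter);
    }
}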
-Original Message-
From: Mufaddal Khumri [mailto:[EMAIL PROTECTED]
Sent: 31 March 2006 20:33
To: java-user@luce
You need to encode the numbers by padding on the left or some other method.
We do this where we know which fields are numeric and extend QueryParser to
encode the fields for searching. We also decode the number on display.
Below are the functions we use; the tricky bit is getting negative
numbers to work corr
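The functions themselves are cut off above; a minimal sketch of the left-padding idea for non-negative ints (negative numbers need extra handling, as noted):

public class NumberCodec {
    // Zero-pad so that string order matches numeric order,
    // e.g. 7 -> "0000000007", 123 -> "0000000123".
    public static String encodeInt(int num) {
        String s = Integer.toString(num);
        StringBuffer sb = new StringBuffer("0000000000");   // 10 digits
        sb.replace(10 - s.length(), 10, s);
        return sb.toString();
    }

    public static int decodeInt(String s) {
        return Integer.parseInt(s);   // leading zeros are dropped on parse
    }
}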
Override QueryParser and intercept queries on specific fields, producing a
TermQuery instead of letting one be generated from the analyzed value
by the default parser. If you want to look for "New Yo", try also
creating a PrefixQuery from the TermQuery.
Mike
www.ardentia.com the home of NetSearch
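A hedged sketch of that override, assuming a hypothetical "city" field that was indexed as a single keyword; the PrefixQuery variant mentioned above is shown in a comment:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class KeywordAwareQueryParser extends QueryParser {
    public KeywordAwareQueryParser(String defaultField) {
        super(defaultField, new StandardAnalyzer());
    }

    protected Query getFieldQuery(String field, String queryText) throws ParseException {
        if ("city".equals(field)) {
            // Build the term directly from the raw text instead of analyzing it.
            // For "New Yo" style lookups, return
            //     new PrefixQuery(new Term(field, queryText))
            // here instead.
            return new TermQuery(new Term(field, queryText));
        }
        return super.getFieldQuery(field, queryText);
    }
}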
When using the TermEnum method, won't the terms be analyzed, i.e. split
into single words and lowercased? Will this be a problem if your grouping
name is 2+ words, mixed case, etc.?
Mike
www.ardentia.com the home of NetSearch
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTE
A simple solution if you only have 20,000 docs is just to iterate
through the hits and count them up against each color etc.; this could be
done in a HitCollector. The balance here is performance vs memory usage; if
you have a lot of users I would go for a solution that was less
efficient but used a lot
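A rough sketch of that counting collector, assuming a single-valued "color" field and using the FieldCache so no stored fields are loaded per hit:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ColorCounts {
    // Returns a map of color -> Integer count of matching documents.
    public static Map count(IndexSearcher searcher, Query query) throws IOException {
        final String[] colors =
                FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), "color");
        final Map counts = new HashMap();
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                String color = colors[doc];
                Integer n = (Integer) counts.get(color);
                counts.put(color, new Integer(n == null ? 1 : n.intValue() + 1));
            }
        });
        return counts;
    }
}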
There are a number of ways of doing this. One way I would suggest is simply to
store the CONTENTS field and prefix it with the field name. So instead of
storing a single CONTENTS field for a document, store a CONTENTS field for each
other field with the field name prefixing each field value. E.
Use BitSets to intersect the two queries. First knock up a HitCollector
that generates a bit set for the document set you want to search
(A,B,C,X,Y,Z). Then do another query generating a bit set for the
criteria on (C,X,Y). Then just intersect the two bit sets using the
"and" method.
Mike
www.ard
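A minimal sketch of those two steps with java.util.BitSet; the queries would come from the application:

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class BitSetIntersect {
    // Run each query through a collector that records matching doc ids in a
    // BitSet, then AND the two sets to get the intersection.
    public static BitSet intersect(IndexSearcher searcher, Query docSetQuery,
                                   Query criteriaQuery) throws IOException {
        BitSet docSet = collect(searcher, docSetQuery);
        BitSet criteria = collect(searcher, criteriaQuery);
        docSet.and(criteria);   // docSet now holds ids matching both queries
        return docSet;
    }

    private static BitSet collect(IndexSearcher searcher, Query query) throws IOException {
        final BitSet bits = new BitSet(searcher.maxDoc());
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                bits.set(doc);
            }
        });
        return bits;
    }
}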
For the recent questions about this, here are a couple of methods for
encoding/decoding long values so that they sort into order for a range
query:
public static String encodeLong(long num) {
    String hex = Long.toHexString(num < 0 ? Long.MAX_VALUE -
            (0xL ^ num) : num);
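The snippet above is cut off (and its hex constant was mangled in the archive); here is a self-contained sketch of the same idea, hex-encoding a long so that string order matches numeric order, though not necessarily the exact code from the post:

import java.math.BigInteger;

public class LongCodec {
    // Flip the sign bit so negatives sort before positives, then zero-pad
    // to 16 hex digits so lexicographic order matches numeric order.
    public static String encodeLong(long num) {
        String hex = Long.toHexString(num ^ 0x8000000000000000L);
        StringBuffer sb = new StringBuffer("0000000000000000");
        sb.replace(16 - hex.length(), 16, hex);
        return sb.toString();
    }

    public static long decodeLong(String hex) {
        // BigInteger copes with 16-digit values whose top bit is set.
        return new BigInteger(hex, 16).longValue() ^ 0x8000000000000000L;
    }
}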
many Boolean queries or does not return any results at all.
Mike
-Original Message-
From: Mike Streeton [mailto:[EMAIL PROTECTED]
Sent: 25 January 2006 11:28
To: java-user@lucene.apache.org
Subject: RE: Range queries
I can recommend this method; this is how we do it, but what we store in
the index is the long converted to a 16-digit hex number. The extended
parser converts entered queries containing long fields to hex. We
obviously also do the conversion back before we display the value. Floating
point numbers
How do you go about getting our product listed on the Powered By Lucene
web site (http://wiki.apache.org/jakarta-lucene/PoweredBy) and in the latest
news in the Wiki?
Many Thanks
Mike
www.ardentia.com
Is there a way of altering the way Lucene parses a default string to use
AND instead of OR? E.g. usually "joe bloggs" is executed as "joe OR
bloggs"; is there a flag to change this to "joe AND bloggs", which seems
to be the way most search engines work?
Thanks
Mike
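For reference, the 2.x QueryParser does have a switch for this; a minimal sketch (the "contents" field name is illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class AndByDefault {
    public static Query parse(String text) throws Exception {
        QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
        // Unqualified terms now combine with AND: "joe bloggs" -> joe AND bloggs.
        parser.setDefaultOperator(QueryParser.AND_OPERATOR);
        return parser.parse(text);
    }
}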
Thanks for this. I did not really explain myself well in the original
question; what I was interested to know is whether a single Searcher
constructed from a MultiReader (across several different indexes) would work
better than a MultiSearcher constructed from IndexSearchers each
pointing at a single inde
I have several indexes I want to search together. What performs better: a
single searcher on a MultiReader, or a single MultiSearcher over multiple
searchers (one per index)?
Thanks
Mike
I have been given an index with a term that has been stored as a keyword
and contains spaces. We are parsing a query using QueryParser but given
'myfield:"abc def"' it generates a PhraseQuery for myfield:abc and
myfield:def. What is needed is a TermQuery(new Term("myfield", "abc def")).
Can you tell q