RE: Read past EOF

2009-04-28 Thread Mike Streeton
An update: I have managed to stop it failing by setting org.apache.lucene.store.IndexInput.preUTF8Strings to true in the debugger. The value is always false when it fails. Mike -Original Message- From: Mike Streeton [mailto:mike.stree...@connexica.com] Sent: 28 April

Read past EOF

2009-04-28 Thread Mike Streeton
I have an index that works fine on Lucene 2.3.2 but fails to open in 2.4.1; it always fails with a Read past EOF error. The index does contain some field names with German umlaut characters. Any ideas? Many Thanks Mike CheckIndex v2.3.2 NOTE: testing will be more thorough if you run java with

RE: TermDocs.skipTo error

2007-11-14 Thread Mike Streeton
I have now managed to quantify the error: it only affects indexes built with Lucene 2.2 and occurs after a period of time reusing a TermDocs object. I have modified my test app to be a little more verbose about the conditions it fails under. Hopefully someone can track the bug down in Lucene. I have

RE: TermDocs.skipTo error

2007-11-12 Thread Mike Streeton
Thanks Mike -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: 10 November 2007 22:49 To: java-user@lucene.apache.org Subject: Re: TermDocs.skipTo error On Nov 9, 2007 11:40 AM, Mike Streeton <[EMAIL PROTECTED]> wrote: > I have just t

RE: TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
I have just tried this again using the index I built with lucene 2.1 but running the test using lucene 2.2 and it works okay, so it seems to be something related to an index built using lucene 2.2. Mike -Original Message- From: Mike Streeton [mailto:[EMAIL PROTECTED] Sent: 09 November

RE: TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
I have tried this again using Lucene 2.1 and, as Erick found, it works okay. I have tried it on JDK 1.6 u1 and u3: both work, but both fail when using Lucene 2.2. Mike -Original Message- From: Mike Streeton [mailto:[EMAIL PROTECTED] Sent: 09 November 2007 16:05 To: java-user

RE: TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
Subject: Re: TermDocs.skipTo error FWIW, running Lucene 2.1, Java 1.5 all I get is some numbers being printed out 0 1 2 . . . 90,000 and ran through the above 4 times or so Erick On Nov 9, 2007 5:51 AM, Mike Streeton <[EMAIL PROTECTED]> wrote: > I have posted before about

TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
I have posted before about a problem with TermDocs.skipTo() but never managed to reproduce it. I have now got it to fail using the following program; please can someone try it and see if they get the stack trace: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Array index

Reuse TermDocs

2007-11-05 Thread Mike Streeton
Can TermDocs be reused, i.e. can you do: TermDocs docs = reader.termDocs(); docs.seek(term1); int i = 0; while (docs.next()) { i++; } docs.seek(term2); int j = 0; while (docs.next()) { j++; } Reuse does seem to work but I get ArrayIndexOutOfBoundsExceptions from BitVector if I reu

TermDocs.skipTo

2007-10-29 Thread Mike Streeton
Are there any issues surrounding TermDocs.skipTo()? I have an index that works okay if I use TermDocs.next() to find the next doc id, but using skipTo to go to the one after a point can sometimes miss, e.g. iterating using TermDocs.next() and TermDocs.doc() 1,50,1,2 but using TermDocs.skipTo

RE: Using Lucene to search log files

2006-12-11 Thread Mike Streeton
I would use a RangeFilter instead of the default Boolean query, as the latter will always break at some point with Too many Boolean clauses. Extend QueryParser to sort this out. As far as extracting information from log files goes, I would look at creating yourself a LogAnalyzer that can interpret the co

RE: Index Rows as Documents? Help me design a solution

2006-07-26 Thread Mike Streeton
The only way you might get the performance you want is to have multiple IndexWriters writing to different indexes and then addAll at the end. You would obviously have to handle the multithreading and the distribution of the parts of the log to each writer. Mike www.ardentia.com the home of NetSearc

RE: Copying documents

2006-07-26 Thread Mike Streeton
Chris, Thanks for this. I will have to do it the longhand way: we are trying to create "search marts" containing a smaller index from a much larger one, so cloning and deleting will not work. Thanks Mike www.ardentia.com the home of NetSearch -Original Message- From: Chris Hostetter

Copying documents

2006-07-25 Thread Mike Streeton
I want to copy a selection of documents from one index to another. I can get the Document objects from the IndexReader and write them to the target index using the IndexWriter. The problem I have is that this loses fields that have not been stored. Is there a way round this? Thanks Mike www.

RE: Date ranges - getting the approach right

2006-07-20 Thread Mike Streeton
This is how we solve the range query problem using filters. The nice part about it is that you can use a range in a query, so several ranges can be ORed/ANDed or NOTed together if required, instead of applying a range filter to the whole query. (Assumes dates in MMDD format) Hope this helps Mike. Ext

RE: Searching for a phrase which spans on 2 pages

2006-07-12 Thread Mike Streeton
The simplest solution is always the best - when storing the page, do not break up sentences. So a page will be all the sentences that occur on it. If a sentence starts on one page and finishes on the next it will be included in both pages in the index. Hope this helps Mike www.ardentia.com the h
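A minimal sketch of this page-building rule in plain Java, with no Lucene calls. The class PageSentences, the Sentence type and its firstPage/lastPage fields are hypothetical names, and the sketch assumes the text has already been split into sentences whose page numbers are known. A sentence that crosses a page break is added to both pages, so a phrase query against either page can match it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PageSentences {
    /** A sentence plus the (hypothetical) range of pages it occupies. */
    static final class Sentence {
        final String text;
        final int firstPage;
        final int lastPage;

        Sentence(String text, int firstPage, int lastPage) {
            this.text = text;
            this.firstPage = firstPage;
            this.lastPage = lastPage;
        }
    }

    /** Groups sentences by page; a sentence spanning pages appears on each of them. */
    static Map<Integer, List<String>> sentencesByPage(List<Sentence> sentences) {
        Map<Integer, List<String>> pages = new TreeMap<>();
        for (Sentence s : sentences) {
            for (int page = s.firstPage; page <= s.lastPage; page++) {
                pages.computeIfAbsent(page, p -> new ArrayList<>()).add(s.text);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Sentence> doc = new ArrayList<>();
        doc.add(new Sentence("Alpha ends here.", 1, 1));
        doc.add(new Sentence("Beta crosses the page break.", 1, 2));
        doc.add(new Sentence("Gamma starts page two.", 2, 2));
        // Each map value is the text that would be indexed for that page.
        System.out.println(sentencesByPage(doc));
    }
}
```

The cost of this design is that boundary sentences are indexed twice, so a phrase inside one can be reported as a hit on both pages.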

Searcher performance

2006-07-07 Thread Mike Streeton
What performs best across multiple indexes: (a) each index with an IndexReader and an IndexSearcher on top, with the searchers linked by a ParallelMultiSearcher, or (b) each index with an IndexReader, linked with a MultiReader and a single IndexSearcher on top? Many Thanks Mike www.ardentia

RE: Sorting & SQL-Database

2006-07-04 Thread Mike Streeton
The simplest solution to this, I would suggest, is to decode the id to a relevance score, e.g. Select id, addfield From mytable Where id in (1,2,3,4,5,50,60,70) Order by case id when 1 then 0.9 when 2 then 0.8 when 3 then 0.7 end desc You will have to generate the in () and the case statement bu
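Generating the in () list and the case expression could look roughly like the sketch below. The class RelevanceOrderSql and the method buildQuery are hypothetical names; the table and column names are the ones used in the message. It assumes the ids are integers taken from the search hits, so concatenating them into the SQL text is safe; string ids would need bind parameters instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class RelevanceOrderSql {
    /** Builds a query that returns the given ids ordered by their relevance scores. */
    static String buildQuery(Map<Integer, Double> scoreById) {
        if (scoreById.isEmpty()) {
            throw new IllegalArgumentException("need at least one id; 'in ()' is not valid SQL");
        }
        StringJoiner in = new StringJoiner(",", "(", ")");
        StringBuilder cases = new StringBuilder("case id");
        for (Map.Entry<Integer, Double> e : scoreById.entrySet()) {
            in.add(String.valueOf(e.getKey()));
            cases.append(" when ").append(e.getKey())
                 .append(" then ").append(e.getValue());
        }
        cases.append(" end");
        return "select id, addfield from mytable where id in " + in
                + " order by " + cases + " desc";
    }

    public static void main(String[] args) {
        // Scores as they might come back from a search, keyed by database id.
        Map<Integer, Double> scores = new LinkedHashMap<>();
        scores.put(1, 0.9);
        scores.put(2, 0.8);
        scores.put(3, 0.7);
        System.out.println(buildQuery(scores));
    }
}
```

For long hit lists the statement grows with every id, so this is best kept to the page of results actually being displayed.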

RE: search performance benchmarks

2006-06-26 Thread Mike Streeton
We recently ran some benchmarks on Linux with 4 Xeon CPUs and 2 GB of heap (not that this was needed). We easily managed 1000 term-based queries a second, including the query execution time and retrieving the top 10 documents from the index. We did notice some contention as adding more c

RE: addIndexes() is taking infinite time ...

2006-06-21 Thread Mike Streeton
From memory, addIndexes() also does an optimization beforehand; this might be what is taking the time. Mike www.ardentia.com the home of NetSearch -Original Message- From: heritrix.lucene [mailto:[EMAIL PROTECTED] Sent: 22 June 2006 05:05 To: java-user@lucene.apache.org Subject: Re: ad

RE: indexing emails

2006-06-19 Thread Mike Streeton
When you talk about indexing emails, are you indexing Outlook mails? We have only found a few libraries that will do this and all require Outlook to be online at the time, i.e. you cannot index PST files standalone. As far as indexing goes, index each address in a separate un-tokenized field, not spac

RE: spring & lucene

2006-06-06 Thread Mike Streeton
We wrote ours for NetSearch to handle this specific issue. I suggest you create a holder class to hold the IndexReader and IndexSearcher, this can close them in the finalizer. Clients keep the holder until they are finished and then discard it. When it is completely de-referenced it will be closed.
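The holder idea can be sketched without Lucene as below. Note the swap: this version uses explicit reference counting rather than the finalizer described in the message, because the JVM gives no guarantee about when a finalizer runs. SharedHolder, acquire and release are hypothetical names, and any java.io.Closeable stands in for the IndexReader/IndexSearcher pair.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

final class SharedHolder<T extends Closeable> {
    private final T resource;
    private int users = 1; // the creator holds the first reference

    SharedHolder(T resource) {
        this.resource = resource;
    }

    /** Registers one more client and hands out the shared resource. */
    synchronized T acquire() {
        if (users == 0) {
            throw new IllegalStateException("holder already closed");
        }
        users++;
        return resource;
    }

    /** Drops one reference; the last release closes the resource. */
    synchronized void release() {
        if (users == 0) {
            throw new IllegalStateException("holder already closed");
        }
        if (--users == 0) {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        final boolean[] closed = {false};
        Closeable fakeSearcher = () -> closed[0] = true; // stand-in resource
        SharedHolder<Closeable> holder = new SharedHolder<>(fakeSearcher);
        holder.acquire();  // a client starts a search
        holder.release();  // the client finishes; the creator still holds a reference
        holder.release();  // the creator swaps in a new index; the old one closes now
        System.out.println("closed = " + closed[0]);
    }
}
```

Clients must pair every acquire() with a release(), typically in a finally block; that discipline is the price of not relying on garbage collection.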

RE: Regarding Indexes

2006-04-03 Thread Mike Streeton
When doing this, use a filter to restrict the query results to just those for a user's company. This will not affect the ranking then. Mike www.ardentia.com the home of NetSearch -Original Message- From: Mufaddal Khumri [mailto:[EMAIL PROTECTED] Sent: 31 March 2006 20:33 To: java-user@luce

RE: Using Range Queries

2006-02-08 Thread Mike Streeton
You need to encode the numbers by padding to the left or by another method. We do this: we know which fields are numeric and extend QueryParser to encode those fields for searching. We also decode the number on display. Below are the functions we use; the tricky bit is getting negative numbers to work corr

RE: Search on Keyword rather than Text?

2006-02-08 Thread Mike Streeton
Override QueryParser and intercept queries of specific fields producing TermQuery instead of letting it be generated from the analyzed value using the default parser. If you want to look for "New Yo" try also creating a prefix query from the TermQuery. Mike www.ardentia.com the home of NetSearch

RE: grouping results by fields

2006-01-31 Thread Mike Streeton
When using the TermEnum method, won't the terms be analyzed, i.e. split into single words and lowercased? Will this be a problem if your grouping name is 2+ words, mixed case etc.? Mike www.ardentia.com the home of NetSearch -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTE

RE: grouping results by fields

2006-01-30 Thread Mike Streeton
A simple solution if you only have 20,000 docs is just to iterate through the hits and count them up against each color etc, this could be in a HitCollector. The balance here is performance vs memory usage, if you have a lot of users I would go for a solution that was less efficient but used a lot
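A minimal sketch of the counting step, assuming the grouping value (for example the color) of each matching document has already been read from the hit. GroupCounter and countByValue are hypothetical names; in a real HitCollector the same merge call would run once per collected document.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GroupCounter {
    /** Counts how many hits carry each grouping value. */
    static Map<String, Integer> countByValue(List<String> valuesOfHits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : valuesOfHits) {
            counts.merge(value, 1, Integer::sum); // start at 1, then add 1 per repeat
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the "color" field of each hit, in hit order.
        List<String> colors = Arrays.asList("red", "blue", "red", "green", "red");
        System.out.println(countByValue(colors));
    }
}
```

Memory use is one map entry per distinct value rather than per document, which is why this stays cheap for small result sets.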

RE: Searching over more than one Fields

2006-01-30 Thread Mike Streeton
There are a number of ways of doing this. One way I would suggest is simply to store the CONTENTS field prefixed with the field name. So instead of storing a single CONTENTS field for a document, store a CONTENTS field for each other field, with the field name prefixing each field value. E.

RE: searching specific documents

2006-01-30 Thread Mike Streeton
Use BitSets to intersect the two queries. First knock up a HitCollector that generates a bit set for the document set you want to search (A,B,C,X,Y,Z). Then do another query generating a bit set for the criteria on (C,X,Y). Then just intersect the two bit sets using the "and" method. Mike www.ard
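The intersection step itself needs only java.util.BitSet, as in the sketch below. BitSetIntersection, intersect and of are hypothetical names, and the doc ids 0 to 5 stand in for documents A, B, C, X, Y, Z; in practice each bit would be set from a HitCollector as documents are collected.

```java
import java.util.BitSet;

final class BitSetIntersection {
    /** Returns the doc ids present in both sets, leaving the inputs untouched. */
    static BitSet intersect(BitSet allowedDocs, BitSet matchingDocs) {
        BitSet result = (BitSet) allowedDocs.clone(); // and() mutates, so work on a copy
        result.and(matchingDocs);
        return result;
    }

    /** Builds a bit set from doc ids, one set() call per collected document. */
    static BitSet of(int... docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet subset = of(0, 1, 2, 3, 4, 5); // documents A,B,C,X,Y,Z
        BitSet criteria = of(2, 3, 4, 9);     // hits for the second query
        System.out.println(intersect(subset, criteria));
    }
}
```

Cloning first keeps the subset bit set reusable across queries, which matters if the same restricted document set is searched repeatedly.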

Range number queries

2006-01-26 Thread Mike Streeton
For the recent questions about this, here are a couple of methods for encoding/decoding long values so that they sort into order for a range query: public static String encodeLong(long num) { String hex = Long.toHexString(num < 0 ? Long.MAX_VALUE - (0xL ^ num) : num);

RE: Range queries

2006-01-25 Thread Mike Streeton
many Boolean queries or does not return any results at all. Mike -Original Message- From: Mike Streeton [mailto:[EMAIL PROTECTED] Sent: 25 January 2006 11:28 To: java-user@lucene.apache.org Subject: RE: Range queries I can recommend this method, this is how we do it, but what we store in

RE: Range queries

2006-01-25 Thread Mike Streeton
I can recommend this method; this is how we do it, but what we store in the index is the long converted to a 16-digit hex number. The extended parser converts the long fields in entered queries to hex. We obviously also do the conversion back before we display the value. Floating point numbers
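One possible 16-digit hex encoding is sketched below. It is not necessarily the scheme used in the message: this version flips the sign bit so that negative values sort before positive ones, then zero-pads to a fixed width so that string order equals numeric order. SortableLong, encode and decode are hypothetical names.

```java
final class SortableLong {
    /** Encodes a long as 16 lowercase hex digits whose string order matches numeric order. */
    static String encode(long num) {
        // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto
        // 0..2^64-1 in unsigned order, so negatives come first.
        return String.format("%016x", num ^ Long.MIN_VALUE);
    }

    /** Reverses encode(), e.g. before displaying a stored value. */
    static long decode(String hex) {
        return Long.parseUnsignedLong(hex, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -42L, -1L, 0L, 1L, 42L, Long.MAX_VALUE};
        for (long value : samples) {
            String term = encode(value);
            System.out.println(value + " -> " + term + " -> " + decode(term));
        }
    }
}
```

Because every term has the same length, a lexicographic range over the encoded terms selects exactly the numeric range between the two decoded bounds.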

Lucene Web Site

2006-01-24 Thread Mike Streeton
How do you go about getting our product listed on the Powered By Lucene web site (http://wiki.apache.org/jakarta-lucene/PoweredBy) and in the latest news in the Wiki? Many Thanks Mike www.ardentia.com

Switching default parsing for Or and AND

2006-01-03 Thread Mike Streeton
Is there a way of altering the way lucene parses a default string to use AND instead of OR, e.g. usually "joe bloggs" is executed as "joe OR bloggs", is there a flag to change this to "joe AND bloggs" which seems to be the way most search engines work. Thanks Mike

RE: Performance Question

2005-11-14 Thread Mike Streeton
Thanks for this. I did not really explain myself well in the original question; what I was interested to know is whether a single Searcher constructed from a MultiReader (across several different indexes) would work better than a MultiSearcher constructed from IndexSearchers each pointing at a single inde

Performance Question

2005-11-11 Thread Mike Streeton
I have several indexes I want to search together. Which performs better: a single searcher on a multi reader, or a single multi searcher on multiple searchers (1 per index)? Thanks Mike

Terms contain spaces

2005-10-27 Thread Mike Streeton
I have been given an index with a term that has been stored as a keyword and contains spaces. We are parsing a query using QueryParser, but given 'myfield:"abc def"' it generates a PhraseQuery for myfield:abc and myfield:def. What is needed is a TermQuery(new Term("myfield","abc def")). Can you tell q