faceting and categorizing on color?

2006-06-16 Thread James Pine
Hey Everyone, I have been reading several threads about facet counts and category counts and was wondering if/how they might apply to searching for colors. Let's say that there is a Lucene index where each document corresponds to an image. In addition, each document contains the top 10 most freque

Re: faceting and categorizing on color?

2006-06-20 Thread James Pine
> First off, let me clear up something regarding your > index field structure, > you mentioned that you currently have documents that > look like this... > > : IMAGE 1 > : COLORS F0 FF FFF000 00 F0 FF > : E0 EE EEE000 00 > > If you are indexing it as Fie

Re: Can't open index

2006-06-22 Thread James Pine
Hey Thomas, It looks like your index file(s) are being stored on a remote file system. Is it possible that the network connection fails sometimes during your indexing/searching operation? If that's not the issue, you mention that you're creating your index file at the same time that you're search

Re: termquery beginners question

2006-06-26 Thread James Pine
Hey, I do not think mergeBooleanQueries is necessary. It sounds like what you want to do is this: Query userEntered = QueryParser.parse("foo bar"); Query otherQuery = new TermQuery(new Term("myfield","abc defg")); BooleanQuery completeQuery = new BooleanQuery(); completeQuery.add(userEntered,tru

Re: batch indexing using RAMDirectory

2006-06-28 Thread James Pine
Hey Eric, I think you want: fsWriter.addIndexes(Directory[] {ramDir}); to be: fsWriter.addIndexes(new Directory[]{ramDir}); JAMES --- zheng <[EMAIL PROTECTED]> wrote: > I am a novice in lucene. I write some code to do > batch indexing using > RAMDirectory according to the code provided in >

Re: Searching is taking a lot...

2006-06-28 Thread James Pine
A HitCollector object invokes its collect method on every document which matches the query/filter submitted to the Searcher.search method. I think all you would need to do is pass in the page number and results per page to your HitCollector constructor and then in the collect method do the bookeepi
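
The paging bookkeeping described above could look roughly like the following sketch. It is not the poster's code and does not depend on Lucene: `PagingCollector` is a hypothetical stand-in that only mirrors the `collect(int doc, float score)` callback shape of `HitCollector`, and the 1-based page number is an assumption. Hits are kept in the order `collect` is called, which is not necessarily score order.

```java
import java.util.Arrays;

// Hypothetical stand-in: mirrors the collect(int doc, float score) callback
// of Lucene's HitCollector, but is plain Java with no Lucene dependency.
class PagingCollector {
    private final int start;       // index of the first hit on the requested page
    private final int[] pageDocs;  // fixed-size buffer, one slot per result on the page
    private int totalHits = 0;     // every matching document seen so far
    private int stored = 0;        // how many slots of pageDocs are filled

    // pageNumber is assumed to be 1-based.
    PagingCollector(int pageNumber, int resultsPerPage) {
        if (pageNumber < 1 || resultsPerPage < 1) {
            throw new IllegalArgumentException("pageNumber and resultsPerPage must be >= 1");
        }
        this.start = (pageNumber - 1) * resultsPerPage;
        this.pageDocs = new int[resultsPerPage];
    }

    // Called once per matching document; keeps only ids that fall on the page.
    void collect(int doc, float score) {
        if (totalHits >= start && stored < pageDocs.length) {
            pageDocs[stored++] = doc;
        }
        totalHits++;
    }

    int getTotalHits() {
        return totalHits;
    }

    // Returns a copy trimmed to the number of ids actually stored.
    int[] getPageDocs() {
        return Arrays.copyOf(pageDocs, stored);
    }
}
```

With this shape, memory use is bounded by the page size rather than by the number of matches, while the total hit count is still available after the search.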

Re: Searching is taking a lot...

2006-06-29 Thread James Pine
Hey, I'm not a performance guru, but it seems to me that if you've got millions of results coming back then you probably don't want to call ArrayList.add() each time, as it will have to grow itself a bunch of times. Also, even ints take up space in memory, so if you only need 20 of them, then stor

HitCollector and Sort Objects

2006-06-29 Thread James Pine
Hey, I've looked at the documentation for: org.apache.lucene.search.Searchable org.apache.lucene.search.Searcher org.apache.lucene.search.IndexSearcher and it struck me that there are no search methods with these signatures: void search(Query query, Filter filter, HitCollector results, Sort sor

BitSet in a HitCollector

2006-07-05 Thread James Pine
Hey Everyone, I'm using a HitCollector and would like to know the total number of results that matched a given query. Based on the JavaDoc, I think this will do the trick: Searcher searcher = new IndexSearcher(indexReader); final BitSet bits = new BitSet(indexReader.maxDoc()); searcher.search(que
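
As a rough illustration of the idea in this message, the sketch below sets one bit per matching document and then reads the total off the `BitSet`. The preview is cut off, so this is a guess at the general pattern and not the poster's actual code. `DocCollector` and `fakeSearch` are hypothetical stand-ins for Lucene's `HitCollector` and `Searcher.search`, so the example runs on the Java standard library alone.

```java
import java.util.BitSet;

// Hypothetical stand-in for the HitCollector callback shape.
interface DocCollector {
    void collect(int doc, float score);
}

class BitSetCountSketch {
    // Hypothetical stand-in for a search call: feeds each matching
    // document id to the collector, as a real search would.
    static void fakeSearch(int[] matchingDocs, DocCollector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);
        }
    }

    // Sets one bit per matching document id. maxDoc only sizes the BitSet up front.
    static BitSet collectMatches(int maxDoc, int[] matchingDocs) {
        final BitSet bits = new BitSet(maxDoc);
        fakeSearch(matchingDocs, (doc, score) -> bits.set(doc));
        return bits;
    }
}
```

Calling `cardinality()` on the returned `BitSet` gives the number of set bits, which is the number of distinct matching documents; `get(docId)` tells whether a particular document matched.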

Re: BitSet in a HitCollector

2006-07-06 Thread James Pine
Hey, Sorry, I will explain a bit more about my collect method. Currently my collect method is executing IndexSearcher.doc(id) and storing some stuff in a Map which I can then retrieve from the HitCollector (much like the example in the Lucene In Action book). Of course that's somewhat expensive, s

RE: Managing a large archival (and constantly changing) database

2006-07-06 Thread James Pine
Hey, I found this thread to be very useful when deciding upon an indexing strategy. http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12700.html The system I work on has 3 million or so documents and it was (until a non-lucene performance issue came up) set up to add/delete new docum

RE: Managing a large archival (and constantly changing) database

2006-07-07 Thread James Pine
txt will contain "update"; however if the lucene internals do this: mkdir x echo original > x/x.txt cp -lr x x.copy --> rm x/x.txt echo update > x/x.txt diff x/x.txt x.copy/x.txt Then x/x.txt will have a different inode from x.copy/x.txt and thei

Re: General Approach: Analyzer versus Query

2006-07-10 Thread James Pine
Would Lucene's FuzzyQuery be useful in this case? I suppose it would depend on how meaningful the sequences of numbers are. http://lucene.apache.org/java/docs/api/org/apache/lucene/search/FuzzyQuery.html --- Chris Hostetter <[EMAIL PROTECTED]> wrote: > > : I could (1) up front, put in both vers

SortComparatorSources and ScoreDocComparators

2006-07-11 Thread James Pine
Hey Everyone, I've had success in the past creating my own SortComparatorSources and ScoreDocComparators (basing my code on sec 6.1 from LIA); however, I'm starting to run into some performance issues with large indexes. When I started to probe deeper it seems that enumerating through the TermDocs