Re: Looking for a way to customize how StandardAnalyzer handles punctuation

2008-12-10 Thread Grant Ingersoll
Let's take a quick step back and see if it helps. Why do you feel you need the StandardAnalyzer to solve your problem? What else are you gaining from it? Would you be better served by a WhitespaceTokenizer? That being said, hacking up the grammar isn't as bad as you might think. There a

Re: Taxonomy in Lucene

2008-12-10 Thread Chris Hostetter
: From what I understand: : faceted browse is a taxonomy of depth =1 Not inherently; that's just one of the most common mental models. The common approaches for dealing with faceted search in Lucene all work equally well when dealing with a taxonomy as long as your faceting code is *aware* o

Re: Taxonomy in Lucene

2008-12-10 Thread Glen Newton
Oops. Thanks! :-) 2008/12/10 Gary Moore <[EMAIL PROTECTED]>: > svn co https://bobo-browse.svn.sourceforge.net/svnroot/bobo-browse/trunk > bobo-browse > -Gary > Glen Newton wrote: >> >> I don't think this is an Open Source project: I couldn't find any >> source on the site and the only download i

Re: Re Lucene analyzers

2008-12-10 Thread Chris Hostetter
: public final TokenStream tokenStream(String fieldName, Reader reader) : : Usually does a bunch of new filters, from what I've seen in most of these : filters none of them use class member variables. Has anybody tried making : them static to avoid the creation of new objects? that wouldn't really

Re: Taxonomy in Lucene

2008-12-10 Thread Gary Moore
svn co https://bobo-browse.svn.sourceforge.net/svnroot/bobo-browse/trunk bobo-browse -Gary Glen Newton wrote: I don't think this is an Open Source project: I couldn't find any source on the site and the only download is a jar with .class files... -glen 2008/12/10 John Wang <[EMAIL PROTECTED]

Re: Taxonomy in Lucene

2008-12-10 Thread Glen Newton
I don't think this is an Open Source project: I couldn't find any source on the site and the only download is a jar with .class files... -glen 2008/12/10 John Wang <[EMAIL PROTECTED]>: > www.browseengine.com > -John > > On Wed, Dec 10, 2008 at 10:55 AM, Glen Newton <[EMAIL PROTECTED]> wrote: > >>

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Tim Sturge
It's LUCENE-1487. Tim On 12/10/08 1:13 PM, "Tim Sturge" <[EMAIL PROTECTED]> wrote: > Yes (mostly). It turns those terms into an OpenBitSet on the term array. > Then it does a fastGet() in the next() and skipTo() loops to see if the term > for that document is in the set. > > The issue is that

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Tim Sturge
Yes (mostly). It turns those terms into an OpenBitSet on the term array. Then it does a fastGet() in the next() and skipTo() loops to see if the term for that document is in the set. The issue is that fastGet() is not as fast as the two inequalities in FCRF. I didn't directly benchmark FCTF agains
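The approach described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code attached to the issue: `java.util.BitSet` stands in for Lucene's OpenBitSet, the two arrays stand in for the field cache's term array, and every class and method name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

public class TermsFilterSketch {
    // Hypothetical stand-in for a field cache:
    // lookup holds the distinct terms, order[doc] is the index into
    // lookup of the single term stored for that document.
    private final String[] lookup;
    private final int[] order;

    public TermsFilterSketch(String[] lookup, int[] order) {
        this.lookup = lookup;
        this.order = order;
    }

    // Built once per filter: set one bit for each term ordinal that is wanted.
    public BitSet acceptedOrdinals(Set<String> wanted) {
        BitSet bits = new BitSet(lookup.length);
        for (int i = 0; i < lookup.length; i++) {
            if (wanted.contains(lookup[i])) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Per document: a single bit test on the document's term ordinal,
    // so no per-filter bit set over all documents is materialized.
    public List<Integer> matchingDocs(Set<String> wanted) {
        BitSet accepted = acceptedOrdinals(wanted);
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < order.length; doc++) {
            if (accepted.get(order[doc])) {
                docs.add(doc);
            }
        }
        return docs;
    }
}
```

The bit set here is sized by the number of distinct terms rather than by the number of documents, which is the trade-off the message is weighing against a materialized filter.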

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Michael McCandless
It'd be great to get this into Lucene. Does FieldCacheTermsFilter let you specify a set of arbitrary terms to filter for, like TermsFilter in contrib/queries? And it's space/time efficient once FieldCache is populated? Mike Tim Sturge wrote: Mike, Mike, I have an implementation of Fie

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Tim Sturge
Mike, Mike, I have an implementation of FieldCacheTermsFilter (which uses field cache to filter for a predefined set of terms) around if either of you are interested. It is faster than materializing the filter roughly when the filter matches more than 1% of the documents. So it's not better for a

Re: Taxonomy in Lucene

2008-12-10 Thread John Wang
We are doing a release shortly which contains an API change. Let us know if you need help. -John On Wed, Dec 10, 2008 at 11:27 AM, John Wang <[EMAIL PROTECTED]> wrote: > www.browseengine.com > -John > > > On Wed, Dec 10, 2008 at 10:55 AM, Glen Newton <[EMAIL PROTECTED]> wrote: > >> From what I unders

Re: Taxonomy in Lucene

2008-12-10 Thread John Wang
www.browseengine.com -John On Wed, Dec 10, 2008 at 10:55 AM, Glen Newton <[EMAIL PROTECTED]> wrote: > From what I understand: > faceted browse is a taxonomy of depth =1 > > A taxonomy in general has an arbitrary depth: > > Example: Biological taxonomy: > > Kingdom Animalia > Phylum Acanthocepha

Re: Taxonomy in Lucene

2008-12-10 Thread Glen Newton
From what I understand: faceted browse is a taxonomy of depth =1 A taxonomy in general has an arbitrary depth: Example: Biological taxonomy: Kingdom Animalia Phylum Acanthocephala Class Archiacanthocephala Phylum Annelida Kingdom Fungi Phylum Ascomycota Class Ascomycetes
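One way to picture the difference between depth-1 facets and a deeper taxonomy is to count documents at every level of a path. The sketch below is a plain-Java illustration under assumptions, not a Lucene API: each document is assumed to carry a single slash-separated taxonomy path, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TaxonomyCountSketch {
    // Each document carries one taxonomy path, e.g. "Animalia/Annelida".
    // A document is counted once for its own node and once for every ancestor,
    // so depth-1 facet counts are simply the entries without a slash.
    public static Map<String, Integer> countByNode(List<String> docPaths) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String path : docPaths) {
            int slash = -1;
            do {
                slash = path.indexOf('/', slash + 1);
                // No further slash means the full path is the last node to count.
                String node = (slash < 0) ? path : path.substring(0, slash);
                counts.merge(node, 1, Integer::sum);
            } while (slash >= 0);
        }
        return counts;
    }
}
```

With the example taxonomy from the message, a document filed under "Animalia/Annelida" contributes to both "Animalia" and "Animalia/Annelida".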

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Jason Rutherglen
Hi M.S., Do you think it would be cool to have some faceting built into Lucene at some point? -J On Tue, Dec 9, 2008 at 10:11 PM, Michael Stoppelman <[EMAIL PROTECTED]> wrote: > Yeah looks similar to what we've implemented for ourselves (although I > haven't looked at the implementation). We've

Re: Taxonomy in Lucene

2008-12-10 Thread Karsten F.
Hi Dipak, What kind of "Taxonomy"? How does it differ from "faceted browsing" in your case? Best regards, Karsten Kesarkar, Dipak wrote: > > Hi > > I want to include a Taxonomy feature in my search. > > Does Lucene support Taxonomy? How? > > If not, is there a different way to add Tax

Re: Chinese Analyzer evaluation

2008-12-10 Thread Grant Ingersoll
Have you tried the Chinese options in the contrib/analysis JAR? I can't speak to their quality, so you will need to test. On Dec 9, 2008, at 10:02 PM, Cooper Geng wrote: I found these libraries from the google engine. But I have no experience using these classes. Do you have any suggestion o

Re: Lucene SpellChecker returns no suggetions after changing Server

2008-12-10 Thread Grant Ingersoll
So, what changed with the server? From the looks of your code, you're passing the same index into both the Spellchecker and the IndexReader. The spelling index is separate from the main index. See the example at: http://lucene.apache.org/java/2_4_0/api/contrib-spellchecker/org/apache/luc

Re: Taxonomy in Lucene

2008-12-10 Thread Niels Ott
Paul et al, Paul Libbrecht wrote: The way we have solved this in intergeo (or... are about to) is to use query expansion, we explain that in: http://www.activemath.org/~paul/pubs/cross-curriculum-search.html more hints welcome. The Lucene in Action book provides an example solution for i

Re: Issue upgrading from lucene 2.3.2 to 2.4 (moving from bitset to docidset)

2008-12-10 Thread Michael McCandless
In your approach, roughly how many filters do you have cached? It seems like it could be quite a few (one for each color, one for each type, etc)? You might be able to modify the new (on Lucene trunk) FieldCacheRangeFilter to achieve this same filtering without actually having to mater

Re: Taxonomy in Lucene

2008-12-10 Thread Paul Libbrecht
Hello Dipak, I'm interested in an answer here if you find one. The way we have solved this in intergeo (or... are about to) is to use query expansion, we explain that in: http://www.activemath.org/~paul/pubs/cross-curriculum-search.html more hints welcome. paul On 10 Dec 08, at 06:29, Ke