Re: Hebrew and Hindi analyzers

2009-02-17 Thread Robert Muir
Hey, I've played around with trying to get towards a reasonable GPL Hebrew analyzer for Lucene but don't have anything yet... just messing around in my spare time. In general it wasn't hard to munge the hspell Perl scripts with some Java code into producing a morphological analyzer, but from what I see

Re: Unique Filter on search results

2009-02-17 Thread selvaa
It won't fit my requirement. Then I would need to maintain a different Indexer and Searcher, which would make a mess of my architecture... 黄成 wrote: > > Does it make sense to add another index that includes only UserName, Web Page > Name > and other statistic fields? > > On Tue, Feb 17, 2009 at 2:33 P

Hebrew and Hindi analyzers

2009-02-17 Thread Zhang, Lisheng
Hi, Are there free Hebrew and Hindi language analyzers for Lucene? I searched the archive and found some discussions, but did not see clear pointers to downloadable classes. Thanks very much for your help, Lisheng

Re: Querying for a catagory

2009-02-17 Thread Erick Erickson
OK, I think I'm getting it, but I'm slow sometimes. The first thing I'd try is to make sure you index the user with each document. Then in your HitCollector.collect, use FieldSelector to load ONLY the user ID from each document and add the score for that doc to that user (you'll have to keep some s
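A minimal sketch of the per-user score tally described in this message, in plain Java with no Lucene dependency. The `collect(int doc, float score)` method mirrors the shape of `HitCollector.collect`; the `UserLookup` interface and the class name are hypothetical stand-ins for however the user ID is loaded from each document (the message suggests a FieldSelector).

```java
import java.util.HashMap;
import java.util.Map;

public class UserScoreTally {
    // Hypothetical stand-in for loading ONLY the user ID of a document.
    public interface UserLookup {
        String userFor(int docId);
    }

    // Running sum of document scores, keyed by user ID.
    private final Map<String, Float> totals = new HashMap<>();
    private final UserLookup lookup;

    public UserScoreTally(UserLookup lookup) {
        this.lookup = lookup;
    }

    // Same shape as HitCollector.collect(int doc, float score):
    // add this document's score to the total for its user.
    public void collect(int doc, float score) {
        totals.merge(lookup.userFor(doc), score, Float::sum);
    }

    public Map<String, Float> totals() {
        return totals;
    }
}
```

In a real collector the lookup is the expensive part, which is why the message recommends loading only the one field that is needed.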

Re: Querying for a catagory

2009-02-17 Thread AmigoProgrammer
In previous posts I have used "document" for both a file (e.g. Word or PDF) and a Lucene document. Let me try again: A client can have many files but a file only has one client. For some queries I am not interested in the individual files that match the query, but rather in the sum of the score for

Identify the fields with matching only

2009-02-17 Thread Haroldo Nascimento
Hi, I did a search and need to identify which fields the matches occurred in. Is that possible? Thanks

Re: newbie seeking explanation of semantics of "Field" class

2009-02-17 Thread Erick Erickson
This confused me on my first encounter, but it all makes sense after a while. The first thing to understand is that Store and Index are orthogonal. That is, when you index a field that data is placed in the inverted index and is searchable, whether or not you store it. But it is not retrievable
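A toy model of the Store/Index distinction, as a sketch only: this is plain Java, not the Lucene API, and `ToyIndex` and its methods are hypothetical names. Indexing fills a term-to-document map used for searching; storing keeps the original text so it can be read back. The two flags are independent of each other.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ToyIndex {
    // term -> ids of documents containing it (the "inverted index")
    private final Map<String, Set<Integer>> inverted = new HashMap<>();
    // doc id -> original text (the "stored" value)
    private final Map<Integer, String> stored = new HashMap<>();

    public void add(int docId, String text, boolean store, boolean index) {
        if (index) {
            // Whitespace tokenizing and lowercasing stand in for a real analyzer.
            for (String term : text.toLowerCase().split("\\s+")) {
                inverted.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        if (store) {
            stored.put(docId, text);
        }
    }

    // Only indexed text can be found by a search.
    public Set<Integer> search(String term) {
        return inverted.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    // Returns null when the text was not stored.
    public String retrieve(int docId) {
        return stored.get(docId);
    }
}
```

A document added with `store=false, index=true` is found by `search` but `retrieve` returns null for it; the reverse combination can be read back but never matches a search.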

RE: newbie seeking explanation of semantics of "Field" class

2009-02-17 Thread Uwe Schindler
Hi Paul, > I have copied some code and it is working for me, but I am a little > uncertain how to decide what value of Field.Index and Field.Store to > choose in order to get the behavior I'd like. If I read the javadocs, and > decide to ignore all the "expert" items, it looks like this: > > Fiel

Re: newbie seeking explanation of semantics of "Field" class

2009-02-17 Thread Matthew Hall
Comments inline: rolaren...@earthlink.net wrote: R2.4 I have been looking through the soon-to-be-superseded (by its 2nd ed.) book "Lucene In Action" (hope it's ok on this newsgroup to say I like that book); also at these two tutorials: http://darksleep.com/lucene/ and http://www.informit.com/ar

newbie seeking explanation of semantics of "Field" class

2009-02-17 Thread rolarenfan
R2.4 I have been looking through the soon-to-be-superseded (by its 2nd ed.) book "Lucene In Action" (hope it's ok on this newsgroup to say I like that book); also at these two tutorials: http://darksleep.com/lucene/ and http://www.informit.com/articles/article.aspx?p=461633&seqNum=3 and also a

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-17 Thread Paul Elschot
On Tuesday 17 February 2009 10:12:12 Raffaella Ventaglio wrote: > Thanks for sharing this info. > In any case, this is not a problem for me since I have used only the "idea" > to choose between OpenBitSet and SortedVIntList from contrib BooleanFilter, > but I have then implemented it in my own fac

Re: Unique Filter on search results

2009-02-17 Thread 黄成
Does it make sense to add another index that includes only UserName, Web Page Name and other statistic fields? On Tue, Feb 17, 2009 at 2:33 PM, selvaa wrote: > > Hi, > I am creating a tracker for web applications. I am indexing all the > user credentials while they are logging in. > The

Re: distinct queries for search and scoring

2009-02-17 Thread Michael McCandless
Is your scoring query also doing some filtering? If so, you could drive the search with your scoring query, and then pass in as a filter your second query wrapped with QueryWrapperFilter. I think that's effectively your last option, which should be the most efficient one. Or, if the scoring que
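The idea in this message, letting one query supply the scores while the other only decides which documents are allowed, can be modelled without Lucene as follows. This is a sketch under assumptions: the class and parameter names are hypothetical, and a plain `Set` of document ids stands in for the second query wrapped as a filter.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FilteredScoring {
    // scores:  doc id -> score produced by the scoring query
    // allowed: doc ids accepted by the second (filtering) query
    // Returns only the allowed docs, each keeping its scoring-query score.
    public static Map<Integer, Float> search(Map<Integer, Float> scores,
                                             Set<Integer> allowed) {
        Map<Integer, Float> out = new TreeMap<>();
        for (Map.Entry<Integer, Float> e : scores.entrySet()) {
            if (allowed.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

The filter never contributes to the score; it only removes documents, which matches the requirement that the result score come from the similarity criteria alone.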

Re: Querying for a catagory

2009-02-17 Thread 黄成
Sort is helpful. Maybe you should change your index structure if you think you need a group by. On Tue, Feb 17, 2009 at 9:30 PM, Erick Erickson wrote: > Well, I can imagine several schemes; how suitable they are depends > upon some as yet unspecified characteristics of your problem space. > > You

Re: Querying for a catagory

2009-02-17 Thread Erick Erickson
Well, I can imagine several schemes; how suitable they are depends upon some as yet unspecified characteristics of your problem space. You don't want to iterate blindly over the responses in a HitCollector.collect method unless your index is quite small (see the API docs for an explanation). If

Re: termDocs / termEnums performance increase for 2.4.0

2009-02-17 Thread Michael McCandless
It's interesting that you found this speedup... I'm not sure offhand what changes led to the speedup (but I'm still happy about it!). But... why do you need to iterate through all terms, and all docs for each term, in the first place? E.g., this is what FieldCache does in order to populate v

Re: optimization problem

2009-02-17 Thread Michael McCandless
It's odd that optimize is creating such tiny segments, and then that these tiny segments wind up consuming so much disk space. Can you turn on IndexWriter's infoStream, and post the output of the attempts to optimize? Are you sure there are no unhandled exceptions being logged to the sy

Re: Querying for a catagory

2009-02-17 Thread AmigoProgrammer
A relevant client is one that is related to one or more documents found by a search. I would store the client as a keyword with each document, and I would like the query to return clients with the sum of their relevant documents' scores. A client with many low scoring documents could be as relevant as a client

distinct queries for search and scoring

2009-02-17 Thread Morus Walter
Hello, I'm currently thinking about what the best solution would be for the following request: - a Lucene index should be queried for a number of search criteria - the score for each result should not be the normal query score, but an indicator of the similarity between the matched document and

Re: Faceted search with OpenBitSet/SortedVIntList

2009-02-17 Thread Raffaella Ventaglio
Thanks for sharing this info. In any case, this is not a problem for me since I have used only the "idea" to choose between OpenBitSet and SortedVIntList from contrib BooleanFilter, but I have then implemented it in my own facets manager structure, so I do not use the "removed" finalResult method.
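A rough sketch of the "idea" mentioned here: keep a sparse set of document ids as a sorted list and a dense one as a bit set. It uses `java.util.BitSet` and an `int[]` in place of Lucene's OpenBitSet and SortedVIntList, and the divisor is a parameter chosen for illustration, not a value taken from this thread.

```java
import java.util.BitSet;

public class DocSetChooser {
    // True when the set is sparse enough that a sorted list of ids is
    // expected to take less memory than one bit per document.
    // The divisor is an assumption supplied by the caller.
    public static boolean preferSortedList(int cardinality, int maxDoc, int divisor) {
        return cardinality < maxDoc / divisor;
    }

    // Converts a bit set to its ids in ascending order.
    // BitSet.stream() yields the indices of the set bits from low to high.
    public static int[] toSortedList(BitSet bits) {
        return bits.stream().toArray();
    }
}
```

A facet with three matching documents out of a thousand would be kept as a short sorted list, while one matching most of the index would stay a bit set.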