Re: Problems about using Lucene to generate tag cloud..

2008-04-04 Thread John Wang
Check out http://www.browseengine.com; a tag cloud implementation on Lucene is available. -John On Wed, Apr 2, 2008 at 4:12 PM, Daniel Noll <[EMAIL PROTECTED]> wrote: > On Thursday 03 April 2008 08:08:09 Dominique Béjean wrote: > > Hum, it looks like it is not true. > > Use a do-while loop make the first terms.t

document boost and omitted norms

2008-04-04 Thread Karl Wettin
Are document and field boosts omitted together with the norms when using Field#setOmitNorms? Would setting lengthNorm to 1f in the Similarity for these fields, and not omitting norms, fix it? karl

Question About Hits

2008-04-04 Thread Matthew Hall
This is more of a trying-to-understand-the-design sort of question, but it's still something I need to be able to succinctly express to my project manager. I know that Lucene is by design not allowing us to see which fields were hit for a given document in an easy manner. Instead it presents us w

Re: Ability to sort integer fields on large index

2008-04-04 Thread Mark Miller
You should use DateTools to break up your time stamp into multiple fields. This can work a lot faster than using a field with so many different terms. Are you using a RangeQuery? If you are, ditch it and use a ConstantScoreRangeQuery. The former will expand the query to a boolean that contains eac
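A minimal sketch of the "multiple fields" idea, using only the JDK instead of Lucene's DateTools so that it stays self-contained. The field names (ts_day, ts_hour, ts_minsec) and the choice of three granularities are assumptions made for illustration; they do not come from the thread. The point is that a range over whole days only has to look at the coarse field, which has far fewer distinct terms than a per-second field.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits one epoch timestamp into several coarse-grained
// string values, one per index field. Not part of Lucene.
final class TimestampFields {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("HH").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter MIN_SEC =
            DateTimeFormatter.ofPattern("mmss").withZone(ZoneOffset.UTC);

    // Returns field name -> term text, in insertion order. Field names are assumptions.
    static Map<String, String> split(long epochSeconds) {
        Instant t = Instant.ofEpochSecond(epochSeconds);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("ts_day", DAY.format(t));        // at most one term per calendar day
        fields.put("ts_hour", HOUR.format(t));      // 24 possible terms
        fields.put("ts_minsec", MIN_SEC.format(t)); // 3600 possible terms
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split(1207314309L));
    }
}
```

Each resulting value would be indexed as its own untokenized field; a day-level range then expands to a handful of terms rather than one term per second.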

Ability to sort integer fields on large index

2008-04-04 Thread Fleming Shi
Here is the problem: - Single large index with up to 200 million documents - Each document contains a field using epoch timestamp format (padding is required when creating range requests) - One of the frequently used search queries is a range request on the timestamp field (10 digits) - Other searc
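On the padding point: terms are compared as strings, so a timestamp has to be zero-padded to a fixed width for lexicographic order to match numeric order. A small sketch, assuming non-negative epoch seconds and the 10-digit width mentioned above; the class and method names are made up for this example.

```java
import java.util.Locale;

// Hypothetical helper for building range endpoints on a string-indexed timestamp field.
final class PaddedTimestamp {
    static final long MAX = 9_999_999_999L; // largest value that fits in 10 digits

    // Left-pads with zeros to exactly 10 characters.
    static String pad(long epochSeconds) {
        if (epochSeconds < 0 || epochSeconds > MAX) {
            throw new IllegalArgumentException("out of 10-digit range: " + epochSeconds);
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%010d", epochSeconds);
    }

    public static void main(String[] args) {
        // Unpadded, "999999999" sorts after "1207267200"; padded, it sorts before.
        System.out.println("999999999".compareTo("1207267200") > 0);
        System.out.println(pad(999999999L).compareTo(pad(1207267200L)) < 0);
    }
}
```

The same pad call has to be used both at indexing time and when building the lower and upper bounds of the range, otherwise the comparison breaks.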

Re: Search emails - parsing mailbox (mbox) files

2008-04-04 Thread Grant Ingersoll
You might have a look at Aperture (http://aperture.sourceforge.net). It supports a fair number of mail sources including mbox and imap, I think. -Grant On Apr 4, 2008, at 1:52 PM, Antony Bowesman wrote: Subodh Damle wrote: Is there any reliable implementation for parsing email mailbox f

Re: Error tolerant text search with Lucene?

2008-04-04 Thread Mathieu Lecarme
Marjan Celikik wrote: Mathieu Lecarme wrote: however I don't fully understand what you mean by "iterate over your query". I would like a conceptual answer on how this is done with Lucene, not a technical one.. Your query is a tree, with BooleanQuery as branch and other queries as leaves. If you

Re: unexpected query results (AND and OR)

2008-04-04 Thread Erick Erickson
I believe you must capitalize the AND. Lower-case 'and' is ignored. You could also construct your own BooleanQuery if you wanted. I recommend getting a copy of Luke to interactively examine how queries are parsed. Also, toString is your friend. Best Erick On Fri, Apr 4, 2008 at 9:41 AM, Jamie <[EMAI

Re: unexpected query results (AND and OR)

2008-04-04 Thread Jamie
I would also like to point out that we also thought about using a filter but it is being used for other things. Jamie wrote: Hi there I need some help in understanding Lucene's query mechanism. I am receiving unexpected query results when combining terms with AND and OR operators. We are usi

unexpected query results (AND and OR)

2008-04-04 Thread Jamie
Hi there I need some help in understanding Lucene's query mechanism. I am receiving unexpected query results when combining terms with AND and OR operators. We are using Lucene to index emails. Our problem is that when we execute a search such as '(from:"[EMAIL PROTECTED]") and (to:"[EMAIL PR

Re: Error tolerant text search with Lucene?

2008-04-04 Thread Marjan Celikik
Mathieu Lecarme wrote: however I don't fully understand what you mean by "iterate over your query". I would like a conceptual answer on how this is done with Lucene, not a technical one.. Your query is a tree, with BooleanQuery as branch and other queries as leaves. If you want to transform a query

Re: Error tolerant text search with Lucene?

2008-04-04 Thread Mathieu Lecarme
Marjan Celikik wrote: Mathieu Lecarme wrote: You have to iterate over your query, if it's a BooleanQuery, keep it, if it's a TermQuery, replace it with a BooleanQuery with all variants of the Term with Occur.SHOULD M. Thanks.. however I don't fully understand what do you mean by "iterat
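The "iterate over your query" advice can be sketched roughly as below. To keep it runnable without Lucene on the classpath, the node types here (TermNode, BoolNode, Clause, Occur) are tiny stand-ins for TermQuery, BooleanQuery, BooleanClause and Occur, not the real classes, and the variants function is a placeholder for whatever produces the misspellings. Recursing into the children of a boolean node is my reading of "keep it"; the message does not spell that out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Stand-in query tree, for illustration only; not Lucene's API.
final class QueryRewriteSketch {
    interface Node {}
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class TermNode implements Node {
        final String field, text;
        TermNode(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    static final class Clause {
        final Node node; final Occur occur;
        Clause(Node node, Occur occur) { this.node = node; this.occur = occur; }
        @Override public String toString() {
            String prefix = occur == Occur.MUST ? "+" : occur == Occur.MUST_NOT ? "-" : "";
            return prefix + node;
        }
    }

    static final class BoolNode implements Node {
        final List<Clause> clauses = new ArrayList<>();
        BoolNode add(Node node, Occur occur) { clauses.add(new Clause(node, occur)); return this; }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < clauses.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(clauses.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Walks the tree: boolean nodes are kept (children rewritten, occur preserved),
    // term nodes become a boolean node holding every variant as a SHOULD clause.
    // The variants function decides whether the original spelling is included.
    static Node expand(Node q, Function<String, List<String>> variants) {
        if (q instanceof BoolNode) {
            BoolNode out = new BoolNode();
            for (Clause c : ((BoolNode) q).clauses) {
                out.add(expand(c.node, variants), c.occur);
            }
            return out;
        }
        if (q instanceof TermNode) {
            TermNode t = (TermNode) q;
            BoolNode alternatives = new BoolNode();
            for (String v : variants.apply(t.text)) {
                alternatives.add(new TermNode(t.field, v), Occur.SHOULD);
            }
            return alternatives;
        }
        return q; // any other node type is left untouched
    }
}
```

With a variants function that maps "apple" to ["apple", "aple"], a query requiring both body:apple and body:pie becomes one that requires (apple OR aple) and pie, so the MUST/SHOULD structure of the original query is preserved around the expanded terms.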

Re: Error tolerant text search with Lucene?

2008-04-04 Thread Marjan Celikik
Mathieu Lecarme wrote: You have to iterate over your query, if it's a BooleanQuery, keep it, if it's a TermQuery, replace it with a BooleanQuery with all variants of the Term with Occur.SHOULD M. Thanks.. however I don't fully understand what do you mean by "iterate over your query". I wou

Re: Search emails - parsing mailbox (mbox) files

2008-04-04 Thread Antony Bowesman
Subodh Damle wrote: Is there any reliable implementation for parsing email mailbox files (mbox format), especially large (>50MB) archives ? Even after searching lucene mailing list archives, googling around, I couldn't find one. I took a look at Apache James project which seems to offer some supp

Re: Error tolerant text search with Lucene?

2008-04-04 Thread Mathieu Lecarme
Marjan Celikik wrote: Hi everyone, I know that there are packages that support the "Did you mean ... ?" search feature with Lucene, which tries to find the most suitable correctly spelled query.. however, so far I haven't encountered the opposite search feature: given a correct query, find all docum

Re: Lucene Proximity Searches

2008-04-04 Thread Ana Rabade
I am using ngrams and I need to require that a group of them appear together, but if any of them fails to match, I still need the document to be scored. Perhaps you could help me find a solution or give me a reference on which changes I must make. I am using SpanNearQuery, because the ngrams must be in order

Re: Lucene 2.3.0 and NFS

2008-04-04 Thread Michael McCandless
Rajesh parab wrote: Hi, We are currently using Lucene 2.0 for full-text searches within our enterprise application, which can be deployed in a clustered environment. We generate a Lucene index for data stored inside a relational database. As Lucene 2.0 did not have solid NFS support and as we wanted

Re: PhraseQuery little bug?

2008-04-04 Thread Ivan Vasilev
In Lucene syntax, in this case ~5 means slop=5 - this is for Span queries. I think the problem is that in the class PhraseQuery the slop that we set is sometimes interpreted as inclusive and other times as exclusive. When it is considered inclusive, then the distance between "apple" and "pear" is 5, bec