Re: Q: Highlighter + Search symbols "*, ?, ~"

2006-11-21 Thread Stephan Spat
Hey Jeff! Storey, Jeff wrote: Could you explain what you did for your solution? This is a problem I'm currently facing as well. But, for example, if the user searches for "head~" would you also be able to highlight "read" and "dead" if they are returned or just "head" without the ~. It i

Re: Multi-Index Spellchecker

2006-11-21 Thread Mark Miller
Thanks Hoss, I hadn't looked at the indexDictionary method yet. It does not appear to be what I am looking for though...I should have been more explicit - I am using the spellchecker for a 'did you mean search', so I am not using a dedicated spell check index. Instead I am passing the index tha

Multi-Index Spellchecker

2006-11-21 Thread Mark Miller
Does anyone have any interest in making the spellchecker work across more than one index? Does the coder of the spellchecker have any advice, "don't do that, moron" info, etc.? - Mark

Fw: Urgent : Specific search problem with whitespace analyzer

2006-11-21 Thread Krishnendra Nandi
Hi, I am doing "field:text" kind of search using my own analyzer which behaves like whitespaceanalyzer. Following are the code snippets for my own whitespaceanalyzer and whitespacetokenizer. // WhiteSpaceAnalyzerMaestro.java package com.hewitt.itk.maestro.support.service.simplesearch; import

Re: Q: Wildcard searching with German umlauts (ä, ö, ß, ...)

2006-11-21 Thread Antony Bowesman
Stephan Spat wrote: Hello again! It replaces German umlauts, e.g. ä <=> a, ü <=> u, ... . So no umlauts are in the index. For searching I use the same Analyzer. When I do a simple search for a word with umlauts there is no problem. But if I additionally use wildcards I suppose the word is not
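
If the wildcard term really is not passed through the analyzer (which seems to be what the poster suspects), one possible workaround is to fold the umlauts in the wildcard term by hand before building the query. The following is a minimal plain-Java sketch under that assumption. `UmlautFolder` and `fold` are hypothetical names, not Lucene API, and only the ä and ü mappings come from the post; the ö, ß and upper-case mappings are guesses.

```java
import java.util.Map;

final class UmlautFolder {
    // Only "ä -> a" and "ü -> u" are taken from the post; the other entries are assumptions.
    private static final Map<Character, String> FOLD = Map.of(
        '\u00e4', "a",   // ä
        '\u00f6', "o",   // ö
        '\u00fc', "u",   // ü
        '\u00c4', "A",   // Ä
        '\u00d6', "O",   // Ö
        '\u00dc', "U",   // Ü
        '\u00df', "ss"); // ß

    /** Replaces umlauts in a term and leaves wildcard characters such as * and ? untouched. */
    static String fold(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            String replacement = FOLD.get(c);
            out.append(replacement != null ? replacement : String.valueOf(c));
        }
        return out.toString();
    }
}
```

Calling `UmlautFolder.fold` on the raw user term before constructing the wildcard query would keep query-time terms consistent with what the index-time analyzer produced.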

Re: Ordered Proximity searching, does it exist?

2006-11-21 Thread Erik Hatcher
On Nov 20, 2006, at 11:20 PM, Adam wrote: Dear Lucene Users, Is there a way or has someone been able to implement an ordered proximity search. Lucene currently uses the "word1 word2"~5 query to find tokens that are within 5 words of each other in any order. What I've been asked to do is

Ordered Proximity searching, does it exist?

2006-11-21 Thread Adam
Dear Lucene Users, Is there a way or has someone been able to implement an ordered proximity search. Lucene currently uses the "word1 word2"~5 query to find tokens that are within 5 words of each other in any order. What I've been asked to do is find only the results that are for instance with
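
In Lucene this kind of request is usually a job for `SpanNearQuery`, whose constructor takes an in-order flag. The plain-Java sketch below does not use Lucene at all; it only illustrates the matching rule being asked for (first word, then second word, within a maximum number of positions). `OrderedProximity`, `matches` and `maxDistance` are hypothetical names, and the distance definition is a simplification that may differ from how Lucene counts slop.

```java
import java.util.List;

final class OrderedProximity {
    /**
     * Returns true if some occurrence of {@code first} is followed by
     * {@code second} at most {@code maxDistance} positions later.
     */
    static boolean matches(List<String> tokens, String first, String second, int maxDistance) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            // Only look forward, so "second ... first" never matches.
            int end = Math.min(tokens.size() - 1, i + maxDistance);
            for (int j = i + 1; j <= end; j++) {
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With this rule, "word1 a b word2" matches for a distance of 5, while "word2 a word1" does not, because the order is reversed.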

Re: is there any way to find unique records ?

2006-11-21 Thread Bhavin Pandya
Hi Erick, If you're asking for a list of all the unique values for a particular field, see TermDocs and/or TermEnum which will allow you to look at, say, all the values stored for some field. A trick here is to seek (new Term("field", ""));. By putting nothing in the value, you effectively enumera

Re: Limiting QueryParser

2006-11-21 Thread Antony Bowesman
Mark Miller wrote: if you scan the query and escape all colons (i.e. \:) then you should be good (I have not verified). Of course you will not be able to do a field search, but that seems to be what you're after. Thanks for that suggestion. However, a standard un-escaped parse gives Input - impo

Re: Combining scores

2006-11-21 Thread José Ramón Pérez Agüera
I have some code to do that, but it is not really friendly yet :-( Anyway, it is quite simple. You need to merge the postings that you obtain for the different queries using TermDocs. With TermDocs you obtain the internal ids for the docs related to terms. If you merge the TermDocs for each word that a

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Stanislav Jordanov
Switching to the old scorer (via BooleanQuery.setUseScorer14(true)) solved the performance issue - now Lucene 1.9.1 & 2.0.0 perform on the same load test just as 1.4.3 does. Thanks a lot Yonik! Any chance there exists a non-professional explanation of the difference between old and new boole

Re: Fw: Urgent : Specific search problem with whitespace analyzer

2006-11-21 Thread Chris Hostetter
: I have modified the tokenizer class by making it return characters in : lower case. there is really no reason to do this ... have your analyzer use the WhitespaceTokenizer, wrapped in a LowerCaseFilter ... that will eliminate some of your custom code, and perhaps some of your problems as well.
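
The effect of the suggested chain (split on whitespace, then lower-case each token) can be mimicked in plain Java. This is only a sketch of the behaviour, not the Lucene classes themselves; `WhitespaceLowercase` and `analyze` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WhitespaceLowercase {
    /** Splits on runs of whitespace, then lower-cases each token as a separate step. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                // Lower-casing is a filter applied after tokenizing,
                // so the tokenizer itself needs no custom code.
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }
}
```

Keeping the two steps separate is the point of the advice above: the tokenizer stays stock, and case handling lives in its own stage.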

Delete contents from index

2006-11-21 Thread spinergywmy
Hi, How can I delete the contents from Index file? Is there any example that I can refer to? Thanks. regards, Wooi Meng

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Yonik Seeley
On 11/21/06, Stanislav Jordanov <[EMAIL PROTECTED]> wrote: We've identified a significant querying performance decrease after switching from Lucene 1.4.3 to 1.9.1. It shows up consistently no matter whether there are 1, 2, 4 or 8 (or even more) concurrent querying threads - If N queries are executed a

Re: Implementing scoring in Lucene

2006-11-21 Thread Chris Hostetter
: Is there a step by step guide on how to implement the scoring function : for Apache Lucene? : The help given on the website is not easy to follow. : : How do I integrate the search function into my website? First off, what help did you look at? ... did you start with the tutorial? http://luce

Combining scores

2006-11-21 Thread Luis Rodrigo Aguado
Hi all, I am working on a project that, for each query from the user, builds four or five different queries and tries to combine the results. The first part is already working, but, as I have read that the scores from different queries are not comparable with each other at all, I am a bit stuck in

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Chris Hostetter
: Could you also try a nightly build to test the later performance improvement : on BooleanScorer2? The nightly builds are here: : http://people.apache.org/builds/lucene/java/nightly/ : The jar is called lucene-core-nightly.jar in the .tar.gz build. : : It's not likely that this is faster than th

Re: RAMDirectory vs MemoryIndex

2006-11-21 Thread jm
Ok, thanks, I'll give MemoryIndex a go, and if that is not good enough I will explore the other options then. On 11/21/06, Wolfgang Hoschek <[EMAIL PROTECTED]> wrote: On Nov 21, 2006, at 7:43 AM, jm wrote: > Hi, > > I have to decide between using a RAMDirectory and MemoryIndex, but > not sure

Re: is there any way to find unique records ?

2006-11-21 Thread Chris Hostetter
search the archives for "faceted searching" and "category counts" and you should find lots of discussions on this topic. : Date: Tue, 21 Nov 2006 20:30:22 +0530 : From: Bhavin Pandya <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org, Bhavin Pandya <[EMAIL PROTECTED]> : To: java-user@luc

Re: RAMDirectory vs MemoryIndex

2006-11-21 Thread Wolfgang Hoschek
On Nov 21, 2006, at 7:43 AM, jm wrote: Hi, I have to decide between using a RAMDirectory and MemoryIndex, but not sure what approach will work better... I have to run many items (tens of thousands) against some queries (100 at most), but I have to do it one item at a time. And I already have

Federated search (lucene custom and nutch)?

2006-11-21 Thread spamsucks
Hi Everybody, I am successfully using lucene to index/display results for a hugely successful tourism site... We even for nearby's of attractions of different categories. Love it. The next step is to start indexing all the "legacy" content, which numbers around 3000 or so JSP's that will n

Re: Limiting QueryParser

2006-11-21 Thread Antony Bowesman
Chris Hostetter wrote: : important:conference agenda : I want to end up with : : +subject:important +subject:conference +subject:agenda : : I've written something to do this, but I know it is not as clever as QP as : currently it can only create BooleanQueries with TermQueries and cannot handle

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Paul Elschot
Stanislav, Could you also try a nightly build to test the later performance improvement on BooleanScorer2? The nightly builds are here: http://people.apache.org/builds/lucene/java/nightly/ The jar is called lucene-core-nightly.jar in the .tar.gz build. It's not likely that this is faster than th

Re: how to search string with words

2006-11-21 Thread spinergywmy
Hi Erick, I did take a look at the link that you provided me, and I have tried it myself but I get no results. My search string is "third party license readme". Below is the code that I wrote; please point out where I have gone wrong. readerA = IndexReader.open(DsConstant.ind

Re: RAMDirectory vs MemoryIndex

2006-11-21 Thread karl wettin
On 21 Nov 2006, at 16:43, jm wrote: Any thoughts? You can also try InstantiatedIndex, similar in speed and design to a MemoryIndex, but can handle multiple documents, IndexReader, IndexWriter, IndexModifier, etc., just like any Directory implementation. It requires a minor patch to the Lu

Re: Use case for term vector's token position/offset?

2006-11-21 Thread Grant Ingersoll
Hi Jong, I think these are useful for things like highlighting (I think contrib/highlighter can use them); other post processing algorithms such as: question answering, calculating co-occurrences (find the 6 terms to the left and right of the term at position 16). Perhaps you want to giv

Sorting on distance from a long/lat

2006-11-21 Thread spamsucks
I am successfully able to search for "nearbys" given a longitude and a latitude. The basic summary of how I do this is that I add 1000 to the long/lat values and use a RangeFilter in my query. In my display results, I display the results ordered by distance from the original long/lat. What I
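
One way to order already-retrieved results by distance from the original point is to compute a great-circle distance per result and sort on it. The sketch below uses the haversine formula in plain Java and is independent of Lucene; `DistanceSort`, `haversineKm` and `sortByDistance` are hypothetical names, the Earth radius is the usual mean value of about 6371 km, and it assumes plain latitude/longitude in degrees (without the +1000 offset mentioned above).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DistanceSort {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in kilometres between two points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 to guard against rounding slightly above the valid asin range.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Returns a copy of the {lat, lon} pairs ordered nearest-first from the origin. */
    static List<double[]> sortByDistance(List<double[]> points, double lat, double lon) {
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble(
            (double[] p) -> haversineKm(lat, lon, p[0], p[1])));
        return sorted;
    }
}
```

This sorts after the search has run, so it suits small result sets; it does not address sorting inside the index itself.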

Re: how to search string with words

2006-11-21 Thread spinergywmy
Hi, Thanks Martin. I have one question: what does the slop do within a span near query? What is the difference between 0 and 1? I have seen the source from Lucene, where one of the examples sets the slop to 4. Could you please explain that to me? Thanks. regards, Wooi Meng

Re: NOT queries

2006-11-21 Thread Antony Bowesman
Daniel Naber wrote: That's correct. For the "find everything" part you can use MatchAllDocsQuery. Thanks - I hadn't noticed that Query. Antony

Re: Combining scores

2006-11-21 Thread Erick Erickson
This is a *really* simplistic approach, but why not just submit all 4 or 5 queries at once in a BooleanQuery and let Lucene do all the work for you? Or are the 4 or 5 queries such that they don't combine easily with MUST, MUST_NOT or SHOULD in a BooleanQuery? Best Erick On 11/21/06, Luis Rodrigo

Querying performance decrease in 1.9.1 and 2.0.0

2006-11-21 Thread Stanislav Jordanov
Hi guys, We've identified a significant querying performance decrease after switching from Lucene 1.4.3 to 1.9.1. It shows up consistently no matter whether there are 1, 2, 4 or 8 (or even more) concurrent querying threads - If N queries are executed against 1.9.1 for a given time, then 1.4.3 execu

Re: is there any way to find unique records ?

2006-11-21 Thread Erick Erickson
I don't think I understand what "only unique records from a single field" means. If it's a unique value in a field, there'll only be one document in the hits object and there's no cost to iterating, so I doubt that's what you mean. If you're asking for a list of all the unique values for a particu

Re: NOT queries

2006-11-21 Thread Daniel Naber
On Tuesday 21 November 2006 23:14, Antony Bowesman wrote: > I > assume that you first have to create a BooleanClause that finds > everything and then another Clause that removes the "attribute". > > Is this right or is there another way to do it? That's correct. For the "find everything" part yo

Re: is there any n-gram analyzer available??

2006-11-21 Thread Bob Carpenter
heritrix.lucene wrote: Thanks for your reply. This analyzer creates combinations of words. I am looking for an analyzer where you can break up the words into their n-grams. For example: 2-grams of google -> go, oo, og, gl, le, like that. This is also easy. You can check out our sample in Gospoden
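
The character n-grams described in the question can be produced with a sliding window over the word. The following is a minimal plain-Java sketch, not the sample the reply refers to; `CharNGrams` and `ngrams` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class CharNGrams {
    /** Returns every substring of length n, in order; empty if the word is shorter than n. */
    static List<String> ngrams(String word, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of width n one character at a time.
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }
}
```

For the example in the question, `CharNGrams.ngrams("google", 2)` yields go, oo, og, gl, le.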

Re: how to search string with words

2006-11-21 Thread Martin Braun
spinergywmy wrote: > Hi Erick, > >I did take a look at the link that you provided me, and I have tried it myself > but I get no results. > >My search string is "third party license readme" > Hmm, at a quick look I would suggest that you have to split the string into individual terms,

RAMDirectory vs MemoryIndex

2006-11-21 Thread jm
Hi, I have to decide between using a RAMDirectory and MemoryIndex, but not sure what approach will work better... I have to run many items (tens of thousands) against some queries (100 at most), but I have to do it one item at a time. And I already have the lucene Document associated with each

Re: does anyone know of a 'smart' categorizing text pattern finder?

2006-11-21 Thread Erik Hatcher
On Nov 21, 2006, at 5:46 PM, Bob Carpenter wrote: LingPipe in Action. Now that's a book I'd love to own!

Re: Sorting on distance from a long/lat

2006-11-21 Thread Dennis Watson
Hi, I apologize if this is slightly off topic. I have not implemented this, but the idea came to me after reading another post about measuring distance in lucene. It may be completely impractical, however it seems it COULD work at least if the area to be indexed could be constrained. What if

Re: How to do a "starts with" search

2006-11-21 Thread Antony Bowesman
Martin Braun wrote: Please refer to the answers to my question on this list: http://www.nabble.com/forum/ViewPost.jtp?post=7337585&framed=y In short: SpanFirstQuery works like a charm :) Thanks Martin, that looks just right. I'll try it. Antony

Re: Fwd: Hibernate Lucene trademark issues

2006-11-21 Thread adasal
Thanks for the link and your write-up. On 19/11/06, Shay Banon <[EMAIL PROTECTED]> wrote: Since I do not want to invade the Lucene user list with a discussion about Compass and Hibernate Search, but I still think that it is something that needs to get answered, here is a link to my blog post disc

Re: Limiting QueryParser

2006-11-21 Thread Mark Miller
Keep in mind that this would not work for important:"conference agenda" as the quotes would be escaped and queryparser will not generate a phrase query - Mark Steven Rowe wrote: static String QueryParser.escape(String) should do the trick:

Re: does anyone know of a 'smart' categorizing text pattern finder?

2006-11-21 Thread Bob Carpenter
Vladimir Olenin wrote: Hi, I wonder if anyone here knows if there is a 'smart' text pattern finder, ideally written in Java. The library I'm looking for should be able to 'guess' the category of the particular text on the page, most probably by finding similarities between the bulk of the pag

Re: is there any way to find unique records ?

2006-11-21 Thread Erick Erickson
Ok, I think I get it now. You're right that you probably don't want to iterate the Hits object since that has performance issues once you get beyond 100 docs or so. Although, I don't know how big your result sets are. If they are guaranteed to be small, this may not matter. I'm guessing you want

Re: how to search string with words

2006-11-21 Thread spinergywmy
Hi guys, I have this problem searching all fields (metadata) using SpanFirstQuery. My scenario is that searching on just one thing using SpanFirstQuery is not a problem. However, if I have to search everything then I get no results back. For example, I search ba

Re: Analyzers and multiple languages (language detection)

2006-11-21 Thread Bob Carpenter
Antony Bowesman wrote: Hello, I'm new to Lucene and wanted some advice on analyzers, stemmers and language analysis. I've got LIA, so have read its chapters. I am writing a framework that needs to be able to index documents from a range of languages where just the character set of the docu

Limiting QueryParser

2006-11-21 Thread Antony Bowesman
Hi, I have a search UI that allows search criteria to be input against specific fields, e.g. Subject. In order to create a suitable Lucene Query, I must analyze that String so that it becomes a set of Tokens which I can then turn into Terms. QueryParser seems to fit the bill for that, howev

Re: Analysis/tokenization of compound words (German, Chinese, etc.)

2006-11-21 Thread Bob Carpenter
eks dev wrote: Depends what you need to do with it. If you need this only as a "kind of stemming" for searching documents, the solution is not all that complex. If you need linguistically correct splitting then it gets complicated. This is a very good point. Stemming for high recall is mu

Re: Sorting on distance from a long/lat

2006-11-21 Thread Dennis Watson
It is similar to the two Range Filter approach except my way is precomputed and probably faster than filtering through a potentially large result set. Also I can quickly compute a rough max distance between any two lat, lon pairs by comparing their X1.X2.X3... path. Dennis Watson Sr SW Engin

Re: NOT queries

2006-11-21 Thread Daniel Noll
Antony Bowesman wrote: Hi, I'm writing a mapping mechanism between an existing search interface and Lucene and wondered how to support a single NOT/- query. Given the query "-attribute", then from an earlier comment by Chris Hostetter where he says "you can't have a negative clause in isolati

is there any way to find unique records ?

2006-11-21 Thread Bhavin Pandya
Hi, In Lucene, is there any way to find only unique records from a single field? Otherwise I have to iterate through Hits unnecessarily to find the unique ones. Please help. - Bhavin Pandya

Re: Limiting QueryParser

2006-11-21 Thread Chris Hostetter
: important:conference agenda : I want to end up with : : +subject:important +subject:conference +subject:agenda : : I've written something to do this, but I know it is not as clever as QP as : currently it can only create BooleanQueries with TermQueries and cannot handle : PhraseQuery, so would

Re: Limiting QueryParser

2006-11-21 Thread Steven Rowe
static String QueryParser.escape(String) should do the trick: Look at the bottom of the below-linked page for the list of characters that the above method will escape:
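
The real method is the static `QueryParser.escape(String)` named above. The plain-Java sketch below only illustrates the general idea of backslash-escaping query syntax characters so that input is treated as literal text. `QueryEscaper` is a hypothetical name, and the `SPECIAL` character set is an assumption for illustration that may not match the list Lucene documents.

```java
final class QueryEscaper {
    // Assumed set of query-syntax characters; consult the Lucene docs for the real list.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?";

    /** Prefixes every character found in SPECIAL with a backslash. */
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Note that in this sketch double quotes are escaped as well, so a quoted phrase in the input would no longer be read as phrase syntax after escaping.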

RE: Q: Highlighter + Search symbols "*, ?, ~"

2006-11-21 Thread Storey, Jeff
Thanks for the quick reply. I'll be implementing this in the next couple of days. Appreciate it! Jeff -Original Message- From: Stephan Spat [mailto:[EMAIL PROTECTED] Sent: Monday, November 20, 2006 8:43 AM To: java-user@lucene.apache.org Subject: Re: Q: Highlighter + Search symbols "*, ?

NOT queries

2006-11-21 Thread Antony Bowesman
Hi, I'm writing a mapping mechanism between an existing search interface and Lucene and wondered how to support a single NOT/- query. Given the query "-attribute", then from an earlier comment by Chris Hostetter where he says "you can't have a negative clause in isolation by itself", I assume

Re: Sorting on distance from a long/lat

2006-11-21 Thread Chris Hostetter
I'm not really sure what an approach like this gains you ... it provides a mechanism for ensuring that the lat/lon of all results are within a bounding box around your start location -- but those bounding boxes are fixed when building your index. couldn't you achieve the same thing using a "lat

Re: is there any way to find unique records ?

2006-11-21 Thread Steven Rowe
Bhavin, Mark Harwood gives a solution that looks almost exactly like what you want: http://www.mail-archive.com/java-user@lucene.apache.org/msg05154.html Steve Chris Hostetter wrote: > search the archives for "faceted searching" and "category counts" and you > should find lots of discussions

Re: a "fair" similarity

2006-11-21 Thread Bob Carpenter
Michael D. Curtin wrote: Daniel Naber wrote: Hi, as some of you may have noticed, Lucene prefers shorter documents over longer ones, i.e. shorter documents get a higher ranking, even if the ratio "matched terms / total terms in document" is the same. There's even more interesting kinds of

Re: RAMDirectory vs MemoryIndex

2006-11-21 Thread Wolfgang Hoschek
On Nov 21, 2006, at 12:38 PM, jm wrote: Ok, thanks, I'll give MemoryIndex a go, and if that is not good enough I will explore the other options then. To get started you can use something like this: for each document D: MemoryIndex index = createMemoryIndex(D, ...) for each query Q: