Re: similar contrib in lucene 2.1.0

2007-03-02 Thread Martin Braun
Hi Hans, > I'm in the process of upgrading from 2.0 to 2.1, but I'm missing the similar contrib (the jar only contains a manifest). Is this a bug or is it on purpose? Take a look in lucene-2.1.0/contrib/queries/; this is the new home. The changelog explains why the code moved...

autocomplete with multiple terms

2007-02-22 Thread Martin Braun
Hello All, I am implementing a query auto-complete function à la Google. Right now I am using a TermEnum enumerator on a specific field and list the terms found. That works well for searches with only one term, but when the user types two or three words, the function will autocomplete each term
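A minimal sketch of one way around this, assuming the field's terms are available as a sorted set (standing in for walking a `TermEnum` from the typed prefix): keep the words the user has already finished and complete only the last, partial one. `LastTermCompleter` is a hypothetical name, and lowercasing the input is an assumption about how the field was analyzed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: completes only the last word of a multi-word query,
// using a sorted term dictionary as a stand-in for iterating a TermEnum.
final class LastTermCompleter {
    private final TreeSet<String> terms;

    LastTermCompleter(SortedSet<String> terms) {
        this.terms = new TreeSet<>(terms);
    }

    List<String> complete(String typed, int max) {
        String input = typed.toLowerCase();
        int cut = input.lastIndexOf(' ');
        String head = input.substring(0, cut + 1);   // finished words, kept verbatim
        String prefix = input.substring(cut + 1);    // partial last word
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) {
            return out;                              // nothing to complete yet
        }
        // tailSet starts at the first term >= prefix, like seeking a TermEnum
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix) || out.size() >= max) {
                break;                               // left the prefix range, or enough results
            }
            out.add(head + t);
        }
        return out;
    }
}
```

For example, with the terms `lucene`, `lucid` and `search`, typing `apache luc` yields `apache lucene` and `apache lucid`; the earlier words are never re-completed.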

SpanFirstQuery and multiple field instances

2006-12-21 Thread Martin Braun
hello, with a SpanFirstQuery I want to implement a "starts with" search, and that seems to work fine. But my problem is that I have documents with multiple titles, and I thought I could do a SpanFirstQuery search on each title by adding multiple instances of the specific field: fo
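Why a "starts with" check only hits the first title: when the same field name is added several times, Lucene indexes the values as one stream whose positions keep counting across instances (plus the analyzer's position increment gap, 0 by default), and `SpanFirstQuery` only accepts spans ending within the first `end` positions. The pure-Java simulation below models that position bookkeeping; it is not Lucene code, and the lowercase/whitespace tokenization is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation (not Lucene code) of the positions given to tokens when one
// field name is added several times to the same document.
final class MultiValuePositions {
    static final class Token {
        final String text;
        final int position;
        Token(String text, int position) { this.text = text; this.position = position; }
    }

    // Each value is whitespace-tokenized; positions continue across values,
    // with an extra positionIncrementGap between consecutive values.
    static List<Token> invert(List<String> values, int positionIncrementGap) {
        List<Token> tokens = new ArrayList<>();
        int position = -1;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                position += positionIncrementGap;
            }
            for (String word : values.get(v).toLowerCase().split("\\s+")) {
                position++;
                tokens.add(new Token(word, position));
            }
        }
        return tokens;
    }

    // Mimics SpanFirstQuery(SpanTermQuery(term), end): a single-term span at
    // position p ends at p + 1 and matches only if that end is <= end.
    static boolean spanFirstMatches(List<Token> tokens, String term, int end) {
        for (Token t : tokens) {
            if (t.text.equals(term) && t.position + 1 <= end) {
                return true;
            }
        }
        return false;
    }
}
```

With the titles "Action and knowledge" and "Knowledge in action", the second title's first word lands at position 3, so a starts-with check for `knowledge` with `end = 1` fails even though one of the titles does start with it.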

boosting instead of sorting WAS: to boost or not to boost

2006-12-21 Thread Martin Braun
Hi Daniel, >> so a doc from 1973 should get a boost of 1.1973 and a doc from 1975 should get a boost of 1.1975. > The boost is stored with a limited resolution. Try boosting one doc by 10, the other one by 20 or something like that. You're right. I thought that with the float values the r
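To see why 1.1973 and 1.1975 end up identical: the boost is folded into the field norm, which Lucene stores in a single byte. The sketch below re-implements, for illustration, that one-byte float format (3 significant bits, 5 exponent bits, zero exponent 15, as in Lucene's `SmallFloat.floatToByte315`) so it can run on its own; it ignores the length normalization that is multiplied in before encoding.

```java
// Illustrative re-implementation of the one-byte float format used for norms:
// 3 significant bits (including the implicit leading 1), 5 exponent bits.
final class NormEncoding {
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);          // sign, exponent, top mantissa bits
        int fzero = (63 - 15) << 3;                 // offset to the format's zero exponent
        if (smallfloat <= fzero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;   // zero/negative, or underflow
        }
        if (smallfloat >= fzero + 0x100) {
            return -1;                              // overflow: largest representable value
        }
        return (byte) (smallfloat - fzero);         // lower mantissa bits are truncated
    }

    static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

Between 1.0 and 2.0 this format only has the steps 1.0, 1.25, 1.5 and 1.75, so both year-based boosts decode to 1.0 (the same as no boost at all), while 10 and 20 are represented exactly.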

to boost or not to boost

2006-12-20 Thread Martin Braun
Hello all, I am trying to boost more recent docs, i.e. docs with a greater year value, like this: if (title.getEJ() != null) { titleDocument.setBoost(new Float("1." + title.getEJ())); } so a doc from 1973 should get a boost of 1.1973 and a do

Re: Index XML file

2006-12-14 Thread Martin Braun
Hi Wooi, > Just wondering, has anyone used Digester to extract XML content and index the XML file? Is there any source I can refer to on how to extract the XML contents? Or is there another XML parser that is much easier to use? Perhaps this article may help: http://www-128.ibm.com
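If Digester feels heavyweight, the SAX parser bundled with the JDK (`javax.xml.parsers`) is enough to pull element text out for indexing. A minimal sketch, assuming each element's text should become the value of a field named after the element (repeated elements are joined with a space); `XmlFieldExtractor` and that mapping are illustrative choices, and copying the map into a Lucene `Document` is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Collects the text of each element into a map: element name -> text.
final class XmlFieldExtractor extends DefaultHandler {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);                       // start collecting fresh text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);          // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {                  // skip container elements and whitespace
            fields.merge(qName, value, (a, b) -> a + " " + b);
        }
        text.setLength(0);
    }

    static Map<String, String> extract(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            XmlFieldExtractor handler = new XmlFieldExtractor();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.fields;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```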

Re: how to search string with words

2006-11-21 Thread Martin Braun
spinergywmy wrote: > Hi Erick, > I did take a look at the link that you provided, and I have tried it myself, but I get no results. > My search string is "third party license readme" Hmm, at a quick look I would suggest that you have to split the string into individual terms,
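A rough sketch of that suggestion, assuming the analyzer lowercases and splits on whitespace: turn the input into one required clause per term, written in QueryParser syntax (`+field:term`). The field name `contents` used below is a placeholder, and the terms still have to match what the analyzer produced at index time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Turns free text into a QueryParser-style string that requires every term.
final class RequiredTermsQuery {
    static String build(String field, String userInput) {
        List<String> clauses = new ArrayList<>();
        for (String term : userInput.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                clauses.add("+" + field + ":" + term);   // "+" marks the clause as required
            }
        }
        return String.join(" ", clauses);
    }
}
```

For the search string above, `build("contents", "third party license readme")` gives `+contents:third +contents:party +contents:license +contents:readme`, which matches documents containing all four terms in any order.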

Search "C++" with Solr's WordDelimiterFilter

2006-11-17 Thread Martin Braun
hi all, I would like to make it possible to search for "C++" and "C#". I found a hint in the archive to customize the appropriate *.jj file with the code from NutchAnalysis.jj: // irregular words | <#IRREGULAR_WORD: (<C_PLUS_PLUS>|<C_SHARP>)> | <#C_PLUS_PLUS: ("C"|"c") "++" > | <#C_SHARP: ("C"|"c") "#"
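The grammar rules quoted above boil down to "treat `C++`/`c++` and `C#`/`c#` as whole tokens before splitting on punctuation". A stand-alone approximation with a regular expression follows; it is neither the JavaCC grammar nor Lucene's `StandardTokenizer`, and the lowercasing and the letter/digit splitting rule are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Approximates the C_PLUS_PLUS / C_SHARP rules: "c++" and "c#" survive as
// tokens, everything else is split into runs of letters/digits.
final class IrregularWordTokenizer {
    // First alternative: an irregular word not preceded by a letter/digit.
    // Second alternative: an ordinary word.
    private static final Pattern TOKEN =
        Pattern.compile("(?<![\\p{L}\\p{N}])[Cc](?:\\+\\+|#)|[\\p{L}\\p{N}]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }
}
```

The same tokenization has to be applied at query time; otherwise the query parser side will still reduce `C++` to `c`.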

Re: Best approach for exact Prefix Field Query

2006-11-16 Thread Martin Braun
hi Erik, > "action and" is likely not a single Term, so you'll want to create a SpanNearQuery of those individual terms (that match the way they were when analyzed and indexed, mind you) and use a SpanNearQuery inside a SpanFirstQuery. Make sense? Yes, it works (see below)! ... but with my
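To make the difference concrete, here is a pure-Java model (not Lucene code) of the two behaviours on an already-analyzed title: a `PrefixQuery` such as `action*` matches any term starting with the prefix, wherever it occurs in the field, while `SpanFirstQuery(SpanNearQuery([action, and], slop 0, in order), 2)` only matches when those terms are the first ones in the field. The lowercase/whitespace analysis is an assumption; in practice the query terms must come from the same analyzer used at index time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class StartsWithModel {
    // Assumed analysis: lowercase, split on whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Like PrefixQuery: any term of the field that starts with the prefix.
    static boolean prefixMatches(String fieldText, String prefix) {
        for (String term : analyze(fieldText)) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Like SpanFirstQuery(SpanNearQuery(terms, 0, true), terms.size()):
    // the terms must be adjacent, in order, and occupy positions 0..n-1.
    static boolean startsWith(String fieldText, String phrase) {
        List<String> field = analyze(fieldText);
        List<String> query = analyze(phrase);
        return field.size() >= query.size()
            && field.subList(0, query.size()).equals(query);
    }
}
```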

Re: Best approach for exact Prefix Field Query

2006-11-14 Thread Martin Braun
... in the title), but I get (correct) results for "action". What am I doing wrong here? TIA, martin > Erik > On Nov 14, 2006, at 8:32 AM, Martin Braun wrote: >> hi, >> I would like to provide an exact "PrefixField Search"

Best approach for exact Prefix Field Query

2006-11-14 Thread Martin Braun
hi, I would like to provide an exact "PrefixField Search", i.e. a search for exactly the first words in a field. I think I can't use a PrefixQuery because it would also find substrings inside the field, e.g. action* would find titles like "Action and knowledge" but also (that's what I don't want it

Re: Update an existing index

2006-11-08 Thread Martin Braun
WATHELET Thomas wrote: > how to update a field in lucene? I think you'll have to delete the whole doc and add the doc with the new field to the index... hth, martin

Re: experiences with lingpipe

2006-11-02 Thread Martin Braun
Hi Breck, I have tried your tutorial and built (hopefully) a successful SpellCheck.model file of 49 MB. My Lucene index directory is 2.4 GB. When I try to read the model with the readmodel function, I get an "Exception in thread "main" java.lang.OutOfMemoryError: Java heap space", though I started j

Re: experiences with lingpipe

2006-10-25 Thread Martin Braun
Hi Breck, thanks for your answer. >> With Lucene's spellcheck contribution I am not really satisfied because the index has some (many?) misspelled words, so the did-you-mean class (from the java.net example) is good at finding similar misspelled words. >> With the similarWords function the

experiences with lingpipe

2006-10-23 Thread Martin Braun
hi all, does anybody have practical experience with LingPipe's spell checker (http://www.alias-i.com/lingpipe/demos/tutorial/querySpellChecker/read-me.html)? With Lucene's spellcheck contribution I am not really satisfied because the index has some (many?) misspelled words, so the did-you-mean clas
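One common mitigation for an index that contains many misspellings, sketched here under assumptions (it is neither LingPipe's model nor the code of the Lucene spellchecker contribution): collect the index terms within a small edit distance of the input and prefer the most frequent one, on the theory that correct spellings occur more often than their typos. The term-to-frequency map stands in for document frequencies read from the index.

```java
import java.util.Map;

final class FrequencySpeller {
    // Classic dynamic-programming Levenshtein distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;   // previous row for next iteration
        }
        return prev[b.length()];
    }

    // Returns the most frequent term within maxDistance edits that is more
    // frequent than the input itself, or the input if nothing better exists.
    static String suggest(String word, Map<String, Integer> termFreqs, int maxDistance) {
        String best = word;
        int bestFreq = termFreqs.getOrDefault(word, 0);
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            if (e.getValue() > bestFreq && editDistance(word, e.getKey()) <= maxDistance) {
                best = e.getKey();
                bestFreq = e.getValue();
            }
        }
        return best;
    }
}
```

A misspelled term that occurs in the index is then only proposed when nothing similar is more frequent, which is exactly the case a plain similarity ranking does not handle.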

[Fwd: Spam filter for lucene project]

2006-10-05 Thread Martin Braun
Hello Rajiv, perhaps CAPTCHAs will solve your problem: http://en.wikipedia.org/wiki/CAPTCHA Many open-source PHP products, such as phpMyFAQ and phpBB, use them, so you can take a look at their code. hth, martin Original Message From: Rajiv Roopan <[EMAIL PROTECTED]> Subject

best way indexing user queries

2006-09-07 Thread Martin Braun
Hello, I would like to index user-submitted queries to a given index. Building on that, I want to provide something like: "people who searched for test also searched with these queries: +title:test +author:somename". I think the simple approach of just adding the queries as a string in a docu
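The "people who searched for X also searched for Y" part can be prototyped without a second index. A minimal sketch, assuming (hypothetically) that the query log can be grouped into user sessions: count how often two queries occur in the same session and list the most frequent partners. `QueryCooccurrence` and the session model are illustrative, not taken from the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Counts, for each query, how often other queries appear in the same session.
final class QueryCooccurrence {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    void addSession(List<String> queries) {
        Set<String> distinct = new LinkedHashSet<>(queries);   // count each pair once per session
        for (String a : distinct) {
            for (String b : distinct) {
                if (!a.equals(b)) {
                    counts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                }
            }
        }
    }

    // Related queries ordered by co-occurrence count, ties broken alphabetically.
    List<String> related(String query, int max) {
        Map<String, Integer> partners = counts.getOrDefault(query, new HashMap<>());
        List<String> result = new ArrayList<>(partners.keySet());
        result.sort((x, y) -> {
            int byCount = Integer.compare(partners.get(y), partners.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return result.subList(0, Math.min(max, result.size()));
    }
}
```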

what do i get with FieldCache.DEFAULT.getStrings(...);

2006-08-25 Thread Martin Braun
hello, I am using FieldCache.DEFAULT.getStrings in combination with my own HitCollector (I loop through all results and count the number of occurrences of a field value in the results). My problem is that I have field values like dt.|lat or ger.|eng., and it seems that only the last token of the field
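`FieldCache` is built for fields with a single term per document, so a tokenized field such as `dt.|lat` leaves only one of its terms in the cache. A sketch of the counting step done on the whole value instead, assuming the raw value is also indexed as one untokenized term (so the cached string per document is `dt.|lat`) and that `|` is the separator; the `String[]` indexed by document number stands in for what the cache returns.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Counts the individual values inside multi-valued strings such as "dt.|lat".
final class MultiValueCounter {
    private static final Pattern SEPARATOR = Pattern.compile("\\|");

    // valuesByDoc[doc] is the raw field value per document number;
    // hits are the document numbers of the matching documents.
    static Map<String, Integer> count(String[] valuesByDoc, int[] hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : hits) {
            String raw = valuesByDoc[doc];
            if (raw == null) {
                continue;                        // document has no value for this field
            }
            for (String value : SEPARATOR.split(raw)) {
                String v = value.trim();
                if (!v.isEmpty()) {
                    counts.merge(v, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```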

Re: Update index

2006-08-23 Thread Martin Braun
Hi Thomas, > Is it possible to update fields in an existing index? If yes, how should I proceed? I think you can only delete a document and then reindex the updated document: public static int delTitle(String ID) { try { return writer.deleteDocuments(new Term("ID",ID))

Re: search for web address

2006-08-18 Thread Martin Braun
hello ould, sid'ahmed wrote: > Hello, I indexed my document, but when I search for a web address it returns no result, and when I search the same address with a query like "http*" it returns a result. It depends on which analyzer you use: the StandardAnalyzer will do this with

Re: Special characters

2006-08-10 Thread Martin Braun
Hello Adrian, >> I am indexing some text in a Java object that is "%772B" with the standard analyser and Lucene 2. Should I be able to search for this with the same text as the query, or do I need to do any escaping of characters? Besides Luke, there are the AnalyzerUtils from the LIA
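On the escaping part: `%` is not one of the characters the classic Lucene query syntax treats as an operator, so `%772B` needs no escaping for the parser; what decides the match is whether the analyzer produces the same tokens at index and query time, which Luke or AnalyzerUtils will show. For input that does contain operators, a sketch of an escaping helper modeled on the special-character list of the classic query syntax (`QueryParser.escape` does the equivalent; check the list against your Lucene version):

```java
// Backslash-escapes the characters that the classic Lucene query syntax
// treats as operators, so user input is parsed as literal text.
final class QuerySyntax {
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');                 // prefix operator characters
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```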

Re: About the use of HitCollector

2006-08-07 Thread Martin Braun
hi andy, > How can I use HitCollector to iterate over every returned document? You have to override the collect method of the HitCollector class and then store the retrieved data in an array or map. Here is just a source-code sketch (is = IndexSearcher): is.search(query, null
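The callback pattern looks roughly like this. To keep the example self-contained, a small abstract class stands in for Lucene's `HitCollector`, whose `collect(int doc, float score)` method is the one to override; in real code you would subclass `org.apache.lucene.search.HitCollector` and pass the instance as the third argument of `is.search(query, null, ...)`. `ScoreMapCollector` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for org.apache.lucene.search.HitCollector, so this compiles alone.
abstract class SimpleHitCollector {
    public abstract void collect(int doc, float score);
}

// Stores every hit's score keyed by document number, in the order collected.
final class ScoreMapCollector extends SimpleHitCollector {
    final Map<Integer, Float> scores = new LinkedHashMap<>();

    @Override
    public void collect(int doc, float score) {
        scores.put(doc, score);   // called once per matching document, so keep it cheap
    }
}
```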

Re: dash-words

2006-08-01 Thread Martin Braun
Hi Yonik, >> So a phrase search for "The xmen story" will fail. With a slop of 1 the doc will be found. >> But when generating the query I won't know when to use a slop, so adding slops isn't a nice solution. > If you can't tolerate slop, this is a problem. I use the WordDelimiterFilte
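Why the phrase fails without slop can be checked with a small model of positional matching. The sketch assumes "x-men" was indexed as `x` and `men` on consecutive positions with the catenated `xmen` stacked on one of them (which one depends on the filter's settings, so both variants are worth trying), and it implements a simplified, in-order notion of slop (total skipped positions), not Lucene's exact sloppy-phrase algorithm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PhraseModel {
    private final Map<String, List<Integer>> positions = new HashMap<>();

    PhraseModel add(String term, int position) {
        positions.computeIfAbsent(term, k -> new ArrayList<>()).add(position);
        return this;
    }

    // True if the terms occur in order and the positions skipped between
    // consecutive terms add up to at most `slop` (simplified in-order slop).
    boolean matches(List<String> phrase, int slop) {
        return match(phrase, 0, -1, 0, slop);
    }

    private boolean match(List<String> phrase, int i, int prevPos, int usedGap, int slop) {
        if (i == phrase.size()) {
            return true;
        }
        for (int pos : positions.getOrDefault(phrase.get(i), Collections.<Integer>emptyList())) {
            if (i > 0 && pos <= prevPos) {
                continue;                                  // terms must stay in order
            }
            int gap = (i == 0) ? 0 : pos - prevPos - 1;     // positions skipped
            if (usedGap + gap <= slop && match(phrase, i + 1, pos, usedGap + gap, slop)) {
                return true;
            }
        }
        return false;
    }
}
```

With `the`=0, `x`=1, `men`=2, `story`=3, the phrase `the xmen story` fails at slop 0 whether `xmen` sits at position 1 or 2, because the dashed word occupies two positions, and it matches at slop 1, which is what the thread observes.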

Re: email libraries

2006-07-27 Thread Martin Braun
Hi John, > Just for the record: I've been using the JavaMail POP and IMAP providers in the past, and they were prone to hanging with some servers, and resource-intensive. I've also been using Outlook (proper, not Outlook Express; this is AFAIK impossible to work with) via a Java-COM bridge suc

Re: dash-words

2006-07-25 Thread Martin Braun
Hi Yonik, >> I can't figure out what the parameters do. ;) > Yes, it will fail without slop... I don't think there is a practical way around that. I am trying to analyze your WordDelimiterFilter. If I have x-men, after analyzing (with catenateAll) I get this: Analyzing "The x-men story

Re: dash-words

2006-07-24 Thread Martin Braun
Yonik Seeley wrote: > On 7/23/06, karl wettin <[EMAIL PROTECTED]> wrote: >> I want to filter words with a dash in them. >> ["x-men"] >> ["xmen"] >> ["x", "men"] >> All of the above should be synonyms. The problem is ["x", "men"] requiring a distance between the terms and thus also matching

Re: Special character & ; : % index/search question

2006-07-24 Thread Martin Braun
hi herbert, >> WhitespaceAnalyzer looks brutal. Is it possible to keep StandardAnalyzer and at the same time tell the parser to keep a list of chars during indexing? Perhaps it would be sufficient to use the WhitespaceAnalyzer for that field and keep StandardAnalyzer for the other fields by using a
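The idea of choosing the analysis per field can be sketched without Lucene: a map from field name to tokenizer, with a default for every other field. Both tokenizers are simplified stand-ins (whitespace splitting for the field whose `& ; : %` must survive, and lowercase letter/digit splitting as a rough proxy for StandardAnalyzer, not its actual grammar); the field names used in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Picks a tokenizer by field name, falling back to a default one.
final class PerFieldTokenizer {
    static final Function<String, List<String>> WHITESPACE =
        text -> splitNonEmpty(text.trim(), "\\s+");
    static final Function<String, List<String>> LETTERS_AND_DIGITS =
        text -> splitNonEmpty(text.toLowerCase(Locale.ROOT), "[^\\p{L}\\p{N}]+");

    private final Map<String, Function<String, List<String>>> byField = new HashMap<>();
    private final Function<String, List<String>> fallback;

    PerFieldTokenizer(Function<String, List<String>> fallback) {
        this.fallback = fallback;
    }

    PerFieldTokenizer with(String field, Function<String, List<String>> tokenizer) {
        byField.put(field, tokenizer);
        return this;
    }

    List<String> tokenize(String field, String text) {
        return byField.getOrDefault(field, fallback).apply(text);
    }

    private static List<String> splitNonEmpty(String text, String regex) {
        List<String> out = new ArrayList<>();
        for (String s : text.split(regex)) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }
}
```

For example, `new PerFieldTokenizer(LETTERS_AND_DIGITS).with("code", WHITESPACE)` keeps `A&B;` intact in the `code` field but reduces it to `a`, `b` in every other field. The same per-field choice has to be made when parsing queries.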

drill-down heuristics WAS: Where to find drill-down examples (source code)

2006-07-24 Thread Martin Braun
hi miles, thanks for the response. I think I didn't explain my problem well enough. The harder problem for me is how to get the proposals for the refinement. I have a date range from 16xx to now, for about 4 bn. docs, so the number of found documents could be quite large. But the distribution of t
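For a skewed distribution, one heuristic (an assumption here, not something established in this thread) is to derive the proposed ranges from the years of the current hits instead of fixing them in advance: sort the years and cut them into buckets of roughly equal size, so every proposed range holds a useful share of the results. `EqualCountRanges` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class EqualCountRanges {
    // Splits the hit years into at most `buckets` contiguous ranges of roughly
    // equal size; one year never straddles two ranges. Returns {from, to} pairs.
    static List<int[]> propose(int[] years, int buckets) {
        int[] sorted = years.clone();
        Arrays.sort(sorted);
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        for (int b = 1; b <= buckets && start < sorted.length; b++) {
            int end = (int) ((long) sorted.length * b / buckets);   // exclusive target index
            if (end <= start) {
                continue;                                            // bucket would be empty
            }
            // extend the range so that all hits of the boundary year stay together
            while (end < sorted.length && sorted[end] == sorted[end - 1]) {
                end++;
            }
            ranges.add(new int[] {sorted[start], sorted[end - 1]});
            start = end;
        }
        return ranges;
    }
}
```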

Where to find drill-down examples (source code)

2006-07-21 Thread Martin Braun
Hello all, I want to implement a drill-down function aka "narrow search" aka "refine search". I want to have something like: Refine by Date: * 1990-2000 (30 Docs) * 2001-2003 (200 Docs) * 2004-2006 (10 Docs) But not only for date ranges, also for other categories. What I have found in the List-Arc
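Counting the hits per predefined range, the step behind a "Refine by Date" list, is a small loop once the year of every hit is available (for instance gathered in a HitCollector from a cached field, which is an assumption about the setup). A sketch using the ranges from the example above; `RangeFacets` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RangeFacets {
    // ranges[i] = {from, to}, both inclusive; labels keep the given order.
    static Map<String, Integer> count(int[] hitYears, int[][] ranges) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int[] r : ranges) {
            counts.put(r[0] + "-" + r[1], 0);
        }
        for (int year : hitYears) {
            for (int[] r : ranges) {
                if (year >= r[0] && year <= r[1]) {
                    counts.merge(r[0] + "-" + r[1], 1, Integer::sum);
                    break;                      // ranges are assumed not to overlap
                }
            }
        }
        return counts;
    }
}
```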

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-21 Thread Martin Braun
...the example doc, which may produce tokens that do not match those of the indexed content. Use setAnalyzer() to ensure they are in sync. > Original Message > From: Martin Braun <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Sent: Frid

Re: Problem finding similar documents with MoreLikeThis method.

2006-07-21 Thread Martin Braun
Hello, inspired by this thread, I also tried to implement a MoreLikeThis search, but I have the same problem of a null query. I did set the field name to a field that is stored in the index, but "like" just returns null. Here is my code: Hits hits = this.is.search(new Ter
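One thing worth ruling out is that no term of the example document passes MoreLikeThis's selection thresholds: a minimum term frequency within the document and a minimum document frequency in the index (check the defaults of your version, or lower them with `setMinTermFreq` and `setMinDocFreq`). The following pure-Java model of that selection step is not MoreLikeThis's code; it scores with a simplified tf times an idf of the form log(numDocs/(df+1)) + 1, and it shows how easily a short document loses all its terms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InterestingTerms {
    // Picks up to maxTerms terms of a document, scored by tf * idf, after
    // dropping terms that are too rare in the document or in the index.
    static List<String> select(List<String> docTokens, Map<String, Integer> docFreqs, int numDocs,
                               int minTermFreq, int minDocFreq, int maxTerms) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : docTokens) {
            tf.merge(t, 1, Integer::sum);
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (e.getValue() < minTermFreq || df < minDocFreq) {
                continue;                                   // filtered out by a threshold
            }
            double idf = Math.log(numDocs / (double) (df + 1)) + 1.0;
            scores.put(e.getKey(), e.getValue() * idf);
        }
        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return terms.subList(0, Math.min(maxTerms, terms.size()));
    }
}
```

With a minimum term frequency of 2, a document in which every word occurs once yields no terms at all, and there is nothing left to build a query from.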

Spellchecker Download at lucene wiki outdated

2006-06-30 Thread Martin Braun
Hi all, I don't know who can update the wiki pages, so I am just mailing here. The download of the spellchecker1.1.zip contribution does not work with Lucene 2.0 anymore. http://wiki.apache.org/jakarta-lucene/SpellChecker?highlight=spellchecker1.1.zip So I wanted to build _only_ the spellcheck-contri

Re: question

2006-06-29 Thread Martin Braun
[EMAIL PROTECTED] wrote: > hi, my problem is that I am using a MySQL db with one table, and I want to index each row in the table and then search it. Please reply: how can this be done? http://wiki.apache.org/jakarta-lucene/LuceneFAQ How can I use Lucene to index a database? Co

Re: Searching is taking a lot...

2006-06-27 Thread Martin Braun
Hi chris, > Searching every time with a new searcher was taking time. So for testing, I made it a static one and reused it. This gave me a lot of improvement. Previously my query was taking approx. 25 sec, but now most of the queries take between 100 and 800 ms. Do you

Re: Search within multiple different subfolders

2006-06-22 Thread Martin Braun
hi, > I'm hardly the Lucene expert, but I don't think you can search just a portion of the index. But that's effectively what you're doing if you restrict the search to "son and.". I think there is also the possibility to write a custom search filter (org.apache.lucene.search.Filter), an
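The filter approach boils down to computing, once, the set of document numbers whose path lies in the wanted subfolders and intersecting it with the matches. In Lucene 2.x a `Filter` supplies such a set as a `java.util.BitSet` from `bits(IndexReader)`; the sketch below models only the set logic in plain Java, with a hypothetical `paths` array standing in for a stored or cached path field.

```java
import java.util.BitSet;
import java.util.List;

final class FolderFilterModel {
    // Marks the documents whose path starts with one of the allowed folders.
    // Folders should end with "/" so that "/docs/a/" does not match "/docs/ab".
    static BitSet allowedDocs(String[] paths, List<String> folders) {
        BitSet bits = new BitSet(paths.length);
        for (int doc = 0; doc < paths.length; doc++) {
            for (String folder : folders) {
                if (paths[doc] != null && paths[doc].startsWith(folder)) {
                    bits.set(doc);
                    break;
                }
            }
        }
        return bits;
    }

    // Keeps only the matching documents that the filter allows.
    static BitSet restrict(BitSet matches, BitSet allowed) {
        BitSet result = (BitSet) matches.clone();
        result.and(allowed);
        return result;
    }
}
```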

Indexing Dash concatenated words vs SynonymAnalyzer

2006-06-20 Thread Martin Braun
Hello all, German words are often dash-concatenated, e.g. West-Berlin, or something like "C*-algebras and W*-algebras". I am inclined to write my own analyzer like the SynonymAnalyzer from the LIA book. I want to index these words like this: West-Berlin => Westberlin | West | Berlin | "West Berlin" C*-
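The expansion described above (West-Berlin => Westberlin | West | Berlin) can be prototyped independently of the analyzer plumbing. A sketch of the variant generation only: emitting the variants as tokens (for example stacked at the same position, the way the LIA book's SynonymAnalyzer stacks synonyms) is not shown, the "West Berlin" phrase form corresponds to the parts on consecutive positions, and characters such as `*` in "C*-algebras" are kept as they are.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class DashVariants {
    // "West-Berlin" -> [westberlin, west, berlin, west-berlin]: the joined form,
    // the individual parts, and the original dashed form, all lowercased.
    static List<String> variants(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        Set<String> out = new LinkedHashSet<>();          // keeps order, drops duplicates
        if (lower.indexOf('-') < 0) {
            out.add(lower);                               // nothing to expand
            return new ArrayList<>(out);
        }
        String[] parts = lower.split("-+");
        out.add(String.join("", parts));                  // catenated form
        for (String p : parts) {
            if (!p.isEmpty()) {
                out.add(p);                               // each part on its own
            }
        }
        out.add(lower);                                   // original dashed form
        return new ArrayList<>(out);
    }
}
```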