Re: German analyzers xes me

2009-05-13 Thread Daniel Naber
On Tuesday 12 May 2009, Timon Roth wrote: > the queryparser is feeded with the germananalyzer and translates the > phrase to "offentlich finanx abgaberech". Have you checked the FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71 ? If that doesn't he

Re: changes

2008-11-09 Thread Daniel Naber
On Friday 7 November 2008, ChadDavis wrote: > For > example, Field.Keyword() is gone. Shouldn't I find this in that change > log? This was removed between 1.9 and 2.0. The plan was that users upgrade to 1.9, fix the deprecation warnings and only then go to 2.0. Thus not every method is ment

Re: Strange behaviour of FrenchAnalyzer when using accents

2008-11-08 Thread Daniel Naber
On Saturday 8 November 2008, lamino wrote: > String q = "secrétaire"; Does it help if you escape it like this: "secr\u00e9taire"? The Java compiler might interpret non-ASCII chars differently, depending on the environment it runs in. Regards Daniel -- http://www.danielnaber.de
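A small JDK-only sketch of the point above: the escape always denotes U+00E9, while a literal é whose UTF-8 bytes are read as ISO-8859-1 (one way a compiler running with the wrong default encoding can mangle a source file) turns into two characters. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class AccentEscapeDemo {
    // javac translates the escape before parsing, so this literal is always
    // U+00E9, whatever encoding the source file is read with.
    static final String QUERY = "secr\u00e9taire";

    // Simulates UTF-8 text being decoded as ISO-8859-1: the two UTF-8 bytes
    // of U+00E9 (0xC3 0xA9) become two separate characters, U+00C3 and U+00A9.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(QUERY.length());            // 10
        System.out.println(misdecode(QUERY).length()); // 11
    }
}
```

A query built from the mangled string would look for a term that was never indexed.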

Re: Please help to interpret Lucene Boost results

2008-09-26 Thread Daniel Naber
On Friday 26 September 2008, student_t wrote: > A. query1 = +(content:(Pepsi)) I guess this is the string input you use for your queries, isn't it? It's more helpful to look at the toString() output of the parsed query to see how Lucene interpreted your input. Regards Daniel

Re: Lucene debug logging?

2008-09-04 Thread Daniel Naber
On Thursday 4 September 2008, Justin Grunau wrote: > Is there a way to turn on debug logging / trace logging for Lucene? You can use IndexWriter's setInfoStream(). Besides that, Lucene doesn't do any logging AFAIK. Are you experiencing any problems that you want to diagnose with debugging?

Re: Case Sensitivity

2008-08-27 Thread Daniel Naber
On Wednesday 27 August 2008, Michael McCandless wrote: > Probably we should rename it to Field.Index.UN_TOKENIZED_NO_NORMS? I think it's enough if the API doc explains it; there's no need to rename it. What's more confusing is that (UN_)TOKENIZED should actually be called (UN_)ANALYZED IMHO. Regards

Re: MultiPhrase search

2008-08-26 Thread Daniel Naber
On Tuesday 26 August 2008, Andre Rubin wrote: > Now I was the one who didn't follow: How do I add a query to an existing > query? Something like this should work: BooleanQuery bq = new BooleanQuery(); PrefixQuery pq = new PrefixQuery(...); bq.add(pq, BooleanClause.Occur.MUST); TermQuery tq =

Re: Combining Wildcard and Term Queries?

2008-08-26 Thread Daniel Naber
On Tuesday 26 August 2008, Chris Bamford wrote: > That sounds like what I'm after - but how do I get hold of the > IndexReader so I can call IndexReader.terms(Term) ? > The code where I am doing this work is getFieldQuery(String field, > String queryText) of my custom query parser ... QueryPar

Re: MultiPhrase search

2008-08-26 Thread Daniel Naber
On Tuesday 26 August 2008, Andre Rubin wrote: > I just have one more use case. I want the same prefix search as before, > plus another match in another field. Not sure if I'm following you, but you can create your own BooleanQuery programmatically, and then add the original PrefixQuery and an

Re: Combining Wildcard and Term Queries?

2008-08-26 Thread Daniel Naber
On Tuesday 26 August 2008, Chris Bamford wrote: > Can you combine these two queries somehow so that they behave like a > PhraseQuery? You can use MultiPhraseQuery, see http://lucene.apache.org/java/2_3_2/api/core/org/apache/lucene/search/MultiPhraseQuery.html Regards Daniel

Re: MultiPhrase search

2008-08-26 Thread Daniel Naber
On Tuesday 26 August 2008, Andre Rubin wrote: > For some reason, the TermQuery is not returning any results, even when > querying for a single word (like on*). Sorry, I meant PrefixQuery. Also, do not add the "*" to the search string when creating the PrefixQuery. Regards Daniel

Re: MultiPhrase search

2008-08-25 Thread Daniel Naber
On Monday 25 August 2008, Andre Rubin wrote: > I tried it out but with no luck (I think I did it wrong). In any > case, is MultiPhraseQuery what I'm looking for? If it is, how should I > use the MultiPhraseQuery class? No, you won't need it. If you know that the field is not really tokenized

Re: Unique list of keywords

2008-08-08 Thread Daniel Naber
On Friday 8 August 2008, Martin vWysiecki wrote: > i have very much data, about 20GB of text, and need a unique list of > keywords based on my text in all docs from the whole index. Simply use IndexReader.terms() to iterate over all terms in the index. You can then use IndexReader.docFreq(Ter
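Conceptually, the term dictionary that IndexReader.terms() walks is a sorted set of unique terms, and the document frequency is the number of documents containing each one. The toy model below is plain JDK, not Lucene, with hypothetical names and a deliberately naive tokenizer; it only shows what such an enumeration yields.

```java
import java.util.*;

final class ToyTermDictionary {
    // term -> ids of the documents containing it; TreeMap keeps terms sorted
    private final SortedMap<String, Set<Integer>> postings = new TreeMap<>();

    // Naive analysis: lowercase, then split on anything that is not a letter.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // All unique terms in sorted order, like walking the term dictionary.
    List<String> terms() {
        return new ArrayList<>(postings.keySet());
    }

    // Number of documents containing the term (0 if it never occurs).
    int docFreq(String term) {
        Set<Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    public static void main(String[] args) {
        ToyTermDictionary dict = new ToyTermDictionary();
        dict.addDocument(0, "Lucene in Action");
        dict.addDocument(1, "Lucene and Solr");
        for (String term : dict.terms()) {
            System.out.println(term + " " + dict.docFreq(term));
        }
    }
}
```

On a real 20GB index the enumeration is streamed from disk rather than held in a map, which is why iterating terms scales where collecting them yourself would not.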

Re: Lucene performance issues..

2008-07-27 Thread Daniel Naber
On Sunday 27 July 2008, Mazhar Lateef wrote: > We have also tried upgrading the lucene version to 2.3 in hope to > improve performance but the results were quite the opposite. but from my > research on the internet the Lucene version 2.3 is much faster and > better so why are we seeing such inc

Re: Boost token when storing document?

2008-07-13 Thread Daniel Naber
On Sunday 13 July 2008, Darren Govoni wrote: > Hi, > Sorry if I missed this in the documentation, but I wanted to know if > Lucene allows boosting of tokens _within_ a field when a document is > stored? Yes, you can use payloads for that, see http://wiki.apache.org/lucene-java/Payloads

Re: too many clauses exception

2008-07-04 Thread Daniel Naber
On Friday 4 July 2008, Gaurav Sharma wrote: > I am stuck with an exception in lucene (too many clauses). > When i am using a wild card such as a* i am getting too many clauses > exception. It saying maximum clause count is set to 1024. Is there any > way to increase this count. Please see http

Re: document retrieval 100 times slower after finishing some heavy disk operation

2008-06-28 Thread Daniel Naber
On Sunday 29 June 2008, qaz zaq wrote: > I have 2 FSDirectory indexes each with size about 500M. I have 2 > parallel search threads fetching 200 documents from these 2 > indexes which usually take less then 16ms. Fetching documents means that per document about 2 disk seeks are needed to acce
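A back-of-envelope estimate of why uncached fetches can take seconds. The 200 documents and roughly 2 seeks per document come from the thread; the 8 ms average seek time of a spinning disk and the assumption that none of the index is left in the operating system's file cache (for example after a large file copy has evicted it) are hypothetical.

```java
final class SeekCostEstimate {
    // Rough time to load stored fields when every seek goes to the physical disk.
    static double seconds(int docs, int seeksPerDoc, double msPerSeek) {
        return docs * seeksPerDoc * msPerSeek / 1000.0;
    }

    public static void main(String[] args) {
        // 200 documents, about 2 seeks each, hypothetical 8 ms per seek
        System.out.println(seconds(200, 2, 8.0)); // 3.2
    }
}
```

Compared with the 16 ms observed when the data is cached, that is the same order of magnitude as the slowdown described.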

Re: document retrieval 100 times slower after finishing some heavy disk operation

2008-06-28 Thread Daniel Naber
On Sunday 29 June 2008, qaz zaq wrote: > indexes which usually take less then 16ms. However, everytime afer some > heavy disk operations (such as copy 1G size of a file into that disk) , > the document retrieval slows down to couple seconds immediately, > even well after this disk operation bei

Re: Getting irrelevant results using fuzzy query

2008-06-23 Thread Daniel Naber
On Monday 23 June 2008, László Monda wrote: > According to the current Lucene documentation at > http://lucene.apache.org/java/2_3_2/api/index.html it seems to me that > the Query class doesn't have any explain() methods. It's in the IndexSearcher and it takes a query and a document number as i

Re: Getting irrelevant results using fuzzy query

2008-06-18 Thread Daniel Naber
On Wednesday 18 June 2008, László Monda wrote: > Additional info: Lucene seems to do the right thing when only few > documents are present, but goes crazy when there is about 1.5 million > documents in the index. Lucene works well with more documents (currently using it with 9 million). But the

Re: Getting irrelevant results using fuzzy query

2008-06-18 Thread Daniel Naber
On Wednesday 18 June 2008, László Monda wrote: > Since fuzzy searching is based on the Levenshtein distance, the distance > between "coldplay" and "coldplay" is 0 and the distance between > "coldplay" and "downplay" is 3 so how on earth is possible that when > searching for "coldplay", Lucene ret
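The arithmetic can be checked with a small sketch. It assumes the Lucene 2.x FuzzyTermEnum scoring with the default prefix length of 0, where similarity = 1 - distance / min(length1, length2), together with the default minimum similarity of 0.5. Under those assumptions "downplay" lies well inside the fuzzy match range of "coldplay". Plain JDK; the class name is hypothetical.

```java
final class FuzzySimilarityDemo {
    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    // Assumed Lucene 2.x-style similarity for prefixLength = 0.
    static float similarity(String a, String b) {
        return 1.0f - (float) levenshtein(a, b) / Math.min(a.length(), b.length());
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("coldplay", "downplay")); // 3
        System.out.println(similarity("coldplay", "downplay"));  // 0.625
    }
}
```

With 0.625 above the 0.5 threshold, the term qualifies; raising the minimum similarity (or the prefix length) on the FuzzyQuery is the usual way to tighten this.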

Re: Displaying and highlighting results from a Wild Card and Fuzzy search using Lucene in Java

2008-06-01 Thread Daniel Naber
On Sunday 1 June 2008, syedfa wrote: > I am trying to display my results from doing a search of an xml document > (some quotes from shakespeare's "Hamlet") using a WildCard and Fuzzy > search, and then I'm trying to highlight the keyword(s) in the results, > but unfortunately I am having proble

Re: Fwd: Snowball not finding "purple"

2008-05-10 Thread Daniel Naber
On Saturday 10 May 2008, Stephen Cresswell wrote: > If it was a difference between indexing / querying, why would lucene > find the word "ribbon" and not "purple" even though they appear in the > same document and are both exact matches? Using Snowball, "purple" becomes "purpl" but "ribbon" isn'

Re: Fwd: Snowball not finding "purple"

2008-05-10 Thread Daniel Naber
On Saturday 10 May 2008, Stephen Cresswell wrote: > For some reason it seems that either Lucene or Snowball has a problem > with the color purple. According the snowball experts the problem is > with lucene. Can anyone shed any light? Thanks, You are aware that you need to use the same analyzer

Re: Search for phrases

2008-04-15 Thread Daniel Naber
On Tuesday 15 April 2008, palexv wrote: > I have not tokenized phrases in index. > What query should I use? > Simple TermQuery does not work. Probably PhraseQuery with an argument like "java dev" (no asterisk). > If I try to use QueryParser , what analyzer should I use? Probably KeywordAnaly

Re: Search for phrases

2008-04-14 Thread Daniel Naber
On Monday 14 April 2008, palexv wrote: > For example I need to search for "java de*" and recieve "java > developers", "java development", "developed by java" etc. If your text is tokenized, this is not supported by QueryParser but you can create such queries using MultiPhraseQuery. If you don'

Re: Compiled Term Highlighter

2008-02-09 Thread Daniel Naber
On Saturday 9 February 2008, Cesar Ronchese wrote: > I'm not a java developer, so I'm getting stuck on compiling the Term > Highlighter of source files acquired from the Lucene Sandbox. The highlighter is part of the release; in Lucene 2.3 it's under /build/contrib/highlighter/lucene-highlighter

Re: Escape character and Special character

2008-01-30 Thread Daniel Naber
On Wednesday 30 January 2008, Joshua W Hui wrote: > Thanks for the information. Does it also apply to fuzzy search? I think so. > Also, a simple question... how can I find out which release the fix will > go in? Currently, it only has a patch. It's not yet assigned to any version (it says "Fix

Re: Escape character and Special character

2008-01-30 Thread Daniel Naber
On Wednesday 30 January 2008, Joshua W Hui wrote: > When I tried to do a lucene search using escape character with other > special character like the following: > > SUBJECT:Yahoo\!~0.5 > SUBJECT:Yahoo\!* > > It seems the parser totally ignores the escape character, and becomes It's a known bug, s

Re: Retain the index

2008-01-27 Thread Daniel Naber
On Sunday 27 January 2008, anjana m wrote: > IndexWriter writer = new IndexWriter(indexDir, new > StandardAnalyzer(), true); The true parameter means that the old index will be deleted; is that your problem? Regards Daniel

Re: Stemmers remove part of a query when using QueryParser

2008-01-26 Thread Daniel Naber
On Saturday 26 January 2008, Jay Hill wrote: > I have added stemming Analyzer to my indexing and searching. I've tried > both Porter and KStem, have gotten very good results with both with > KStem being the best. The only problem is that, when analyzing on the > search end using QueryParser part o

Re: Is Fair Similarity working with lucene 2.2 ?

2008-01-22 Thread Daniel Naber
On Tuesday 22 January 2008, Fabrice Robini wrote: > Oooops sorry, bad cut/paste... > > Here is the right one :-) The score is the same, so documents with a lower id (inserted earlier) will be returned first. So everything looks okay to me, or am I missing something? Regards Daniel

Re: Is Fair Similarity working with lucene 2.2 ?

2008-01-21 Thread Daniel Naber
On Monday 21 January 2008, Fabrice Robini wrote: > I've tried the "fair" similarity described here > (http://www.nabble.com/a-%22fair%22-similarity-to5806739.html#a5806739) > with lucene 2.2 but it does not seems to work. What exactly doesn't work? Don't you see an effect? At least the scores s

Re: Stemming and highlighting

2008-01-04 Thread Daniel Naber
On Friday 4 January 2008, Marjan Celikik wrote: > I am a new Lucene user and I would like to know the following. How does > Lucene bring together fuzzy queries and highlighting? You need to call rewrite() on the fuzzy query. This will expand the fuzzy query to all similar terms (e.g. belies~ -

Re: Prioritize new documents

2007-12-30 Thread Daniel Naber
On Sunday 30 December 2007, Dominik Bruhn wrote: > I already know this, but ALL documents got a bost of 1. The values > should a least differ some how, shouldnt they? Yes, it looks like a bug, at least in the javadoc. In FieldsReader, the document is created but setBoost() is never called. So

Re: Prioritize new documents

2007-12-30 Thread Daniel Naber
On Sunday 30 December 2007, Dominik Bruhn wrote: > Although I set the Boost via doc.setBoost(value) for each document > before writing it to the index it doesnt change anything. Even worse if > I look at the index using Luke (Version 0.7.1) each document got a boost > of 1 not of the value supp

Re: Simple Filter-Question

2007-12-30 Thread Daniel Naber
On Sunday 30 December 2007, Dominik Bruhn wrote: > I know this is the wrong approach and that the right solution should be > a Filter. But I dont know which filter to use and how. The simplest approach is probably to wrap your limiting query with this class: http://lucene.apache.org/java/2_2

Re: Analyzer to use with MultiSearcher using various indexes for multiple languages

2007-12-18 Thread Daniel Naber
On Tuesday 18 December 2007, Jay Hill wrote: > We > have a requirement to search across multiple languages, so I'm planning > to use MultiSearcher, passing an array of all IndexSearchers for each > language. You will need to analyze the query once per language and then build a new BooleanQuer

Re: SpellChecker: Spanish Dictionary

2007-12-13 Thread Daniel Naber
On Thursday 13 December 2007, Haroldo Nascimento wrote: > I am using the SpellCheck classes of Lucene for create the "Did you > Mean" feature. > I need load into memory all verbets of Spanish language (it wil be my > dictinary). > > Where I can get (download) this dictionary. Maybe .txt

Re: content depending Analyzing

2007-12-10 Thread Daniel Naber
On Monday 10 December 2007, Helmut Jarausch wrote: > an Analyzer > implements a 'TokenStream(String fieldName, Reader reader)" > But for me that's too late. When tokenizing the TOC > field I would need access to the LANG field to decide > how to tokenize. IndexWriter contains an addDocument()

Re: Explanation

2007-11-23 Thread Daniel Naber
On Saturday 24 November 2007, John Griffin wrote: > System.out.println(indexSearcher.explain(query, > counter).toString()); I think you need to use hits.id(counter) instead of counter. Regards Daniel

Re: AND query in SHOULD

2007-11-22 Thread Daniel Naber
On Thursday 22 November 2007, Rapthor wrote: > I want to realize a search that finds the exact phrase I provide. You simply need to create a PhraseQuery. See http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/PhraseQuery.html Regards Daniel

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-20 Thread Daniel Naber
On Monday 19 November 2007, flateric wrote: > the number returned by delete is 0, but the "uid" shows up in Luke so it > is there. Not sure what the problem might be, but it can surely be analyzed if you write a small self-contained test case and post it here. Regards Daniel

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-19 Thread Daniel Naber
On Sunday 18 November 2007, flateric wrote: > IndexReader ir = IndexReader.open(fsDir); > ir.deleteDocuments(new Term("uid", uid)); > ir.close(); > > Has absolutely no effect. What number does ir.deleteDocuments return? If it's 0, the uid cannot be found. If it's greater than 0: note that you need to re

Re: neither IndexWriter nor IndexReader would delete documents

2007-11-18 Thread Daniel Naber
On Sunday 18 November 2007, flateric wrote: > Has absolutely no effect. I also tried delete on the IndexWriter - no > effect. Please use the tool Luke to have a look inside your index to see if a document with field "uid" and the uid you're expecting really exists. The field should be UN_TOK

Re: Can I use Ispell dictionaries roe analyzers in Lucene?

2007-11-18 Thread Daniel Naber
On Sunday 18 November 2007, Alebu wrote: > So what ispell dictionary actually is? List of rules for translation > some words (or sentence?) to 'base form'? Or what? It's a list of terms with optional flags. For example: "walk/xy". In a different file, the flag "x" would then be defined as "appe
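A minimal sketch of how such an entry expands into word forms. The flag meanings are hypothetical (x appends "ed", y appends "ing"); real affix files define their own rules, usually with conditions on the word ending, so this only illustrates the split into stem and flags.

```java
import java.util.*;

final class IspellEntryDemo {
    // Hypothetical flag table: flag -> suffix to append to the stem.
    static final Map<Character, String> SUFFIXES = Map.of('x', "ed", 'y', "ing");

    // Expands an entry such as "walk/xy" into the stem plus one form per known flag.
    static List<String> expand(String entry) {
        int slash = entry.indexOf('/');
        String stem = slash < 0 ? entry : entry.substring(0, slash);
        String flags = slash < 0 ? "" : entry.substring(slash + 1);
        List<String> forms = new ArrayList<>();
        forms.add(stem);
        for (char flag : flags.toCharArray()) {
            String suffix = SUFFIXES.get(flag);
            if (suffix != null) forms.add(stem + suffix); // unknown flags are ignored
        }
        return forms;
    }

    public static void main(String[] args) {
        System.out.println(expand("walk/xy")); // [walk, walked, walking]
    }
}
```

For stemming, the mapping is used in the other direction: every expanded form is mapped back to its stem.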

Re: Can I use Ispell dictionaries roe analyzers in Lucene?

2007-11-18 Thread Daniel Naber
On Sunday 18 November 2007, Alebu wrote: > 1. To analyze non English language I need to use specific analyzer. You don't have to, but it helps improve recall. > Can I use Ispell dictionaries with Lucene? It depends on the dictionary. Some dictionary authors use the ispell flagging system

Re: OutOfMemoryError on small search in large, simple index

2007-11-13 Thread Daniel Naber
On Tuesday 13 November 2007, Lars Clausen wrote: > Can it be right that memory usage depends on size of the index rather > than size of the result? Yes, see IndexWriter.setTermIndexInterval(). How much RAM are you giving to the JVM now? Regards Daniel

Re: Question regarding proximity search

2007-11-01 Thread Daniel Naber
On Thursday 01 November 2007 10:45, Sonu SR wrote: > I got confused of proximity search. I am getting different results for > the queries TTL:"test device"~2 and TTL:"device test"~2 Order is significant; this is described here: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc
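The asymmetry follows from the position arithmetic. This sketch assumes the rule used by Lucene's sloppy phrase matching: subtract each term's offset within the query phrase from its position in the document, and accept the match if the spread (max minus min) of those values is at most the slop. It is plain JDK with hypothetical names and considers only one occurrence per term.

```java
import java.util.*;

final class SloppyPhraseDemo {
    // Returns true if the phrase matches within the slop, assuming each query
    // term occurs once in the document at the position given in docPositions.
    static boolean matches(List<String> phrase, Map<String, Integer> docPositions, int slop) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (int offset = 0; offset < phrase.size(); offset++) {
            Integer pos = docPositions.get(phrase.get(offset));
            if (pos == null) return false;   // term missing: no match
            int relative = pos - offset;     // where the phrase would have to start
            min = Math.min(min, relative);
            max = Math.max(max, relative);
        }
        return max - min <= slop;            // spread must fit within the slop
    }

    public static void main(String[] args) {
        // Document "device foo test": device at position 0, test at position 2.
        Map<String, Integer> doc = Map.of("device", 0, "test", 2);
        System.out.println(matches(List.of("test", "device"), doc, 2)); // false
        System.out.println(matches(List.of("device", "test"), doc, 2)); // true
    }
}
```

Swapping two adjacent terms already costs a slop of 2, so reversed-order matches need more slop than in-order ones at the same distance.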

Re: Hits.score mystery

2007-11-01 Thread Daniel Naber
On Wednesday 31 October 2007 19:14, Tom Conlon wrote: > 119.txt 17.865013 97% (13 occurences) > 45.txt 8.600986 47% (18 occurences) 45.txt might be a document with more terms, so its score is lower although it contains more matches. Regards Daniel
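A quick calculation shows how that can happen. It assumes Lucene 2.x DefaultSimilarity, where a matching term contributes tf = sqrt(freq) times lengthNorm = 1/sqrt(numTerms). The document lengths are hypothetical (they weren't posted). Idf and the query-level factors are left out because they are the same for both documents, and the one-byte rounding of the stored norm is ignored.

```java
import java.util.Locale;

final class LengthNormDemo {
    // DefaultSimilarity-style factors as in Lucene 2.x.
    static double tf(int freq) { return Math.sqrt(freq); }
    static double lengthNorm(int numTerms) { return 1.0 / Math.sqrt(numTerms); }

    // Per-document part of the score only; factors shared by both documents are omitted.
    static double partialScore(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Hypothetical lengths: 119.txt has 200 terms, 45.txt has 1000 terms.
        System.out.println(String.format(Locale.ROOT, "%.3f", partialScore(13, 200)));  // 0.255
        System.out.println(String.format(Locale.ROOT, "%.3f", partialScore(18, 1000))); // 0.134
    }
}
```

Because tf grows only with the square root of the count, five extra occurrences can't make up for a document five times longer.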

Re: org.apache.lucene.analysis.ngram ???

2007-10-30 Thread Daniel Naber
On Tuesday 30 October 2007 11:57, Marco wrote: > I'm trying to use the class > org.apache.lucene.analysis.ngram.EdgeNGramTokenizer. > I 'm using lucene 2.2.0 and I included i my classpath > lucene-core-2.2.0.jar. I have: That class is in contrib/analyzers/lucene-analyzers-2.2.0.jar. Regards Daniel

Re: Exception with org.apache.lucene.store.Directory

2007-10-27 Thread Daniel Naber
On Saturday 27 October 2007 15:11, dinesh chothe wrote: > Thanks for your reply. I have changed my all imports. > Even I am using > <%@ page import= "org.apache.lucene.store.Directory "%> > still also I am getting same error. Are your JAR files (Lucene etc) in WEB-INF/lib in your web ap

Re: Exception with org.apache.lucene.store.Directory

2007-10-27 Thread Daniel Naber
On Saturday 27 October 2007 13:20, dinesh chothe wrote: > <%@ page import= "org.apache.lucene.store.Directory.* "%> That's a class, not a package, so try: <%@ page import="org.apache.lucene.store.Directory"%> The same goes for the other classes. Regards Daniel

Re: fuzzy search MultifieldQueryParser - Lucene 2.2

2007-10-26 Thread Daniel Naber
On Friday 26 October 2007 19:06, Zdeněk Vráblík wrote: > It works if query string ends with ~, but how to switch it on for all > query? That's not supported AFAIK. You will need to iterate over the query (recursively if it's an instance of BooleanQuery) and create a new query where all parts ar
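The recursive walk can be sketched with a tiny hypothetical query tree in place of Lucene's classes, so it runs on a plain JDK. In Lucene 2.x the same walk would iterate over BooleanQuery.getClauses() and replace each TermQuery with a FuzzyQuery built from its getTerm(), keeping each clause's Occur flag. This model has no such flags, so it only shows the recursion.

```java
import java.util.*;

final class FuzzyRewriteDemo {
    // Hypothetical, minimal query tree standing in for Lucene's query classes.
    interface Node {}

    static final class TermNode implements Node {
        final String field, text;
        TermNode(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    static final class FuzzyNode implements Node {
        final String field, text;
        FuzzyNode(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text + "~"; }
    }

    static final class BoolNode implements Node {
        final List<Node> clauses;
        BoolNode(List<Node> clauses) { this.clauses = clauses; }
        @Override public String toString() {
            StringJoiner j = new StringJoiner(" ", "(", ")");
            for (Node c : clauses) j.add(c.toString());
            return j.toString();
        }
    }

    // Walks the tree recursively and turns every term into a fuzzy term.
    static Node toFuzzy(Node node) {
        if (node instanceof TermNode) {
            TermNode t = (TermNode) node;
            return new FuzzyNode(t.field, t.text);
        }
        if (node instanceof BoolNode) {
            List<Node> rewritten = new ArrayList<>();
            for (Node c : ((BoolNode) node).clauses) rewritten.add(toFuzzy(c));
            return new BoolNode(rewritten);
        }
        return node; // other node types are left unchanged
    }

    public static void main(String[] args) {
        Node query = new BoolNode(Arrays.<Node>asList(
            new TermNode("body", "lucene"),
            new BoolNode(Arrays.<Node>asList(new TermNode("title", "search")))));
        System.out.println(toFuzzy(query)); // (body:lucene~ (title:search~))
    }
}
```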

Re: Sort by date with Lucene 2.2.0 ...

2007-10-23 Thread Daniel Naber
On Tuesday 23 October 2007 15:57, Dragon Fly wrote: > I tried specifying the field type using a SortField object but I got the > same result.  I'll be glad to write a stand-alone test case.  Should I > post the code to this thread when I'm done or should I submit some sort > of bug report? Thanks.

Re: MoreLikeThis across multiple fields question...

2007-10-21 Thread Daniel Naber
On Sunday 21 October 2007 17:21, Chris Sizemore wrote: > i'm using MoreLikeThis. i'm trying to run the document comparison across > more than one field in my index, but i'm not at all sure that it's > actually happening -- when i examine the constructed query, only one > field is mentioned! here's

Re: Norm - please light it up for me

2007-10-19 Thread Daniel Naber
On Friday 19 October 2007 19:07, Karl Wettin wrote: > doc[0] > doc[1] > > With normalization doc[0] and doc[1] are equally important. Omitting   > normalization makes doc[0] (usually) three times as important as doc[1]. Not quite, as the normalization only refers to the length of the document.

Re: Sample SynonymAnalyzer vs. Lucene 2.2

2007-10-19 Thread Daniel Naber
On Friday 19 October 2007 14:42, Sean Dague wrote: > Ends up only indexing the synonym, but not the base word itself. I cannot reproduce the problem, i.e. I see both the original term and its synonyms in the index. Maybe you can post the analyzer that uses this filter or a test case to reproduc

Re: Sort by date with Lucene 2.2.0 ...

2007-10-19 Thread Daniel Naber
On Thursday 18 October 2007 21:35, Dragon Fly wrote: > I'm am trying to sort a date field in my index but I'm seeing strange > results.  I have searched the Lucene user mail archive for Datetools but > still couldn't figure out the problem. It shouldn't make a difference but does it help if you s

Re: Problems with stemming/SpellChecker

2007-10-13 Thread Daniel Naber
On Saturday 13 October 2007 07:57, Christian Aschoff wrote: > But as fare as i see (in the API DOC), the GermanAnalyzer is attached   > to the IndexWriter, i can't find an way to attach an analyzer it to a   > single field... Or do i miss something? See PerFieldAnalyzerWrapper. Regards Daniel

Re: Problems with stemming/SpellChecker

2007-10-12 Thread Daniel Naber
On Friday 12 October 2007 15:48, Christian Aschoff wrote: > indexWriter = new IndexWriter(MiscConstants.luceneDir, > new GermanAnalyzer(), create); > [...] The problem is not NO_NORMS but GermanAnalyzer. Try StandardAnalyzer on the field you get the suggestions from. Regards Daniel

Re: Weird operator precedence with default operator AND

2007-10-09 Thread Daniel Naber
On Tuesday 09 October 2007 09:55, Martin Dietze wrote: > I've been going nuts trying to use LuceneParser parse query > strings using the default operator AND correctly: The operator precedence is known to be buggy. You need to use parentheses, e.g. (aa AND bb) OR (cc AND dd). Regards Daniel

Re: Indexing Speed using Java Lucene 2.0 and Lucene.NET 2.0

2007-09-10 Thread Daniel Naber
On Monday 10 September 2007 14:59, Laxmilal Menaria wrote: > I have created a Index Application using Java lucene 2.0 in java and > Lucene.Net 2.0 in VB.net. Both application have same logic. But when I > have indexed a database with 14000 rows from both application and same > machine, I surprised

Re: Reading Existing index

2007-08-11 Thread Daniel Naber
On Saturday 11 August 2007 02:20, Aleesh wrote: >  Need your help regarding reading existing index. Actually I am trying > to read an existing index ans just wanted to know, is there a way to > identify type of 'Analyzer' which was used at the index creation time? That information is not part of

Re: Fastest way to perform 'like' searches

2007-08-08 Thread Daniel Naber
On Wednesday 08 August 2007 10:28, Ard Schrijvers wrote: > Does anybody know a more efficient way? A PhraseQuery might get me > somewhere, isn't? No, you need to use MultiPhraseQuery, and you will need to first expand the terms with the "*" yourself (e.g. using term enumeration). > as a phrase
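Expanding the "*" yourself amounts to walking the sorted term dictionary from the prefix onward until a term no longer starts with it. The sketch models the dictionary with a TreeSet (plain JDK, hypothetical names). Against a real index you would enumerate from IndexReader.terms(new Term(field, prefix)) and pass the collected terms as one position of a MultiPhraseQuery via add(Term[]).

```java
import java.util.*;

final class PrefixExpansionDemo {
    // Returns all dictionary terms that start with the prefix, in sorted order.
    static List<String> expand(NavigableSet<String> dictionary, String prefix) {
        List<String> result = new ArrayList<>();
        // tailSet(prefix, true) starts at the first term >= prefix
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) break; // sorted order: no later term can match
            result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<String> dict = new TreeSet<>(List.of(
            "by", "developed", "developers", "development", "java", "javascript"));
        System.out.println(expand(dict, "de")); // [developed, developers, development]
    }
}
```

Stopping at the first non-matching term is what keeps the expansion cheap: only the matching range of the dictionary is read.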

Re: docFreq takes long time to execute in a multiple index environment

2007-08-06 Thread Daniel Naber
On Monday 06 August 2007 01:40, tierecke wrote: > Term term=new Term("contents", termstr); > TermEnum termenum=multireader.terms(term); > int freq=termenum.docFreq(); IndexReader has a docFreq() method, so there's no need to get a term enumeration. Regards Daniel

Re: Query parsing?

2007-07-25 Thread Daniel Naber
On Wednesday 25 July 2007 00:44, Lindsey Hess wrote: > Now, I do not need Lucene to index anything, but I'm wondering if Lucene > has query parsing classes that will allow me to transform the queries. The Lucene QueryParser class can parse the format described at http://lucene.apache.org/java/d

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread Daniel Naber
On Monday 21 May 2007 22:53, bhecht wrote: > If someone searches for mainstrasse, my tools will split it again to > main and strasse, and then lucene will be able to find it. "strasse" will match "mainstrasse" but the phrase query "schöne strasse" will not match "schöne mainstrasse". However, th

Re: stop words, synonyms... what's in it for me?

2007-05-21 Thread Daniel Naber
On Monday 21 May 2007 22:05, bhecht wrote: > Is there any point for me to start creating custom analyzers with filter > for stop words, synonyms, and implementing my own "sub string" filter, > for separating tokens into "sub words" (like "mainstrasse"=> "main", > "strasse") Yes: I assume your doc
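A minimal sketch of such "sub word" splitting, assuming a hypothetical dictionary of known words and a recursive longest-first search. A real decompounding filter needs a much larger dictionary and rules for linking elements (such as the German "s"), and would typically emit the parts in addition to the original token.

```java
import java.util.*;

final class DecompounderDemo {
    // Splits a word into dictionary words, preferring the longest first part.
    // Returns an empty list if no complete split exists.
    static List<String> split(String word, Set<String> dictionary) {
        for (int end = word.length(); end > 0; end--) {
            String head = word.substring(0, end);
            if (!dictionary.contains(head)) continue;
            if (end == word.length()) return new ArrayList<>(List.of(head));
            List<String> rest = split(word.substring(end), dictionary);
            if (!rest.isEmpty()) {
                rest.add(0, head);
                return rest;
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("main", "strasse");
        System.out.println(split("mainstrasse", dict)); // [main, strasse]
    }
}
```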

Re: Lucene Developer

2007-05-11 Thread Daniel Naber
On Friday 11 May 2007 19:21, Chris Hostetter wrote: > Please do not send resume requests to any of the @lucene email lists. > There is a wiki page listing parties available for hire who are > knowledgable in Lucene for this explicit purpose... > > http://wiki.apache.org/lucene-java/Support

Re: Simple, always do wildcard or fuzzy query

2007-05-11 Thread Daniel Naber
On Thursday 10 May 2007 23:09, bbrown wrote: > I think this is a simple question; or dont know. Is there a way to > automatically convert all tokens to wildcard query with any given input. Either just append the "*" to your terms before you pass them, or extend QueryParser and override getFieldQuery()
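The first option can be sketched as a plain string rewrite done before the input reaches QueryParser. This hypothetical helper is deliberately naive: it only handles whitespace-separated words and leaves the operators AND, OR and NOT alone. It does not understand quoted phrases, field prefixes or escaped characters, which is where overriding getFieldQuery() is more robust.

```java
import java.util.*;

final class WildcardAppender {
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    // Appends "*" to every whitespace-separated word that is not an operator
    // and does not already end in a wildcard.
    static String appendWildcards(String input) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty() || OPERATORS.contains(token) || token.endsWith("*")) {
                out.add(token);
            } else {
                out.add(token + "*");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(appendWildcards("java AND dev")); // java* AND dev*
    }
}
```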

Re: Locking in Lucene 2.1

2007-05-09 Thread Daniel Naber
On Wednesday 09 May 2007 21:18, Andreas Guther wrote: > Do I miss something here or is the documentation not updated? Looks like that part of the documentation isn't up-to-date. The file is called write.lock and it's stored in the index directory. Could you file an issue so the documentation ge

Re: search problem/odd results

2007-05-09 Thread Daniel Naber
On Wednesday 09 May 2007 16:17, John Powers wrote: > Yes, it doesn't work. it gives an error modal dialog box that says > "IMPL". Is there a more useful error message when you start Luke from the command line and try to open the index? Regards Daniel

Re: search problem/odd results

2007-05-08 Thread Daniel Naber
On Tuesday 08 May 2007 23:42, John Powers wrote: > I've had problems with luke in the past not being able to read the > files. Just make sure you specify the directory, not the files, when opening an index with Luke. Also use the latest version (0.7). Regards Daniel

Re: Leading and trailing wildcard together

2007-04-21 Thread Daniel Naber
On Saturday 21 April 2007 17:16, Mohsen Saboorian wrote: > however I wasn't able to search > for *Foo* (while "?Foo*" and even "?*Foo*" works). Is it possible to > have leading and trailing star wildcard together? That's a bug in the 2.1 release which has been fixed in SVN trunk. There's also a

Re: Issue with : Searcher.search() returning Hits of same length for different searches

2007-04-11 Thread Daniel Naber
On Wednesday 11 April 2007 18:51, Lokeya wrote: > Thanks for your reply. I should have given more information and will > keep in mind this for my future queries. If nothing else helps, please write a small, standalone test-case that shows the problem. This can then easily be debugged by someone

Re: Issue with search() Help Appreciated.

2007-04-09 Thread Daniel Naber
On Tuesday 10 April 2007 08:40, Lokeya wrote: > But when i try to get hits.length() it is 0. > > Can anyone point out whats wrong ? Please check the FAQ first: http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71 Regards Daniel

Re: luke v0.7 and SnowBallAnalyzer

2007-04-06 Thread Daniel Naber
On Thursday 05 April 2007 17:07, Paul Hermans wrote: > I do receive the message > "java.lang.ClassNotFound: > net.sf.snowball.ext.GermansStemmer". This class is not part of the lukeall-0.7.jar, but it's in lucene-snowball-2.1.0.jar (which you can find on the Luke homepage). You will then need t

Re: indexing and searching the document title question

2007-02-27 Thread Daniel Naber
On Tuesday 27 February 2007 23:07, Phillip Rhodes wrote: > NAME:"color me mine"^2.0 (CONTENTS:color CONTENTS:me CONTENTS:mine) Try a (much) higher boost like 20 or 50. Does that help? Regards Daniel

Re: Fwd: Unable to retrieve 2/13 field values

2007-02-27 Thread Daniel Naber
On Tuesday 27 February 2007 19:21, Michael Barbarelli wrote: > GB821628930 (+VAT_reg:GB* doesn't work) What about VAT_reg:gb*? Also see QueryParser.setLowercaseExpandedTerms(). Regards Daniel

Re: possible to disable internal caching?

2007-02-14 Thread Daniel Naber
On Wednesday 14 February 2007 17:12, jm wrote: > So my question, is it possible to disable some of the caching lucene > does so the memory consumption will be smaller (I am a bit concerned > on the memory usage side)? Or the memory savings would not pay off? You could set IndexWriter.setTermIndex

Re: Merge factor problem,

2007-02-09 Thread Daniel Naber
On Friday 09 February 2007 17:14, Sairaj Sunil wrote: > I have increased the merge factor from 10 to 50. Please try increasing setMaxBufferedDocs() instead. Does that help? Regards Daniel

Re: Slow performance (Fetching Hits)

2007-02-08 Thread Daniel Naber
On Thursday 08 February 2007 13:54, Laxmilal Menaria wrote: > This will take more than 30 secs for 1,50,000 docs (40 > MB Index).. What exactly takes this much time? You're not iterating over all hits, are you? Also see http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-1b15abeee21b0a72492b1

Re: upgrading from Lucene 1.4.3 to Lucene 2.0

2007-02-06 Thread Daniel Naber
On Tuesday 06 February 2007 13:21, [EMAIL PROTECTED] wrote: > Which performance improvements can I expect when upgrading from Lucene > 1.4.3 to Lucene 2.0 ? This is difficult to say, but you can probably update to Lucene 1.9 without making any changes to your code and then make a performance test

Re: search keyword Fields +Text Field with BooleanQuery

2007-01-22 Thread Daniel Naber
On Monday 22 January 2007 17:19, Xue, Yijun wrote: > I try a query "Secondname:Beckwith AND Firstname:Louise AND > content:school" > on Luke with WhitespaceAnalyzer, I can get hits, but nothing if I use > StandardAnalyzer You need to use the same analyzer for indexing and searching. For example,

Re: rewriting wildcard query before highlighting

2007-01-18 Thread Daniel Naber
On Thursday 18 January 2007 14:48, Mark Miller wrote: > Would it be more efficient to make a RAM index with just > the doc to be highlighted and then pass the reader of that into the > rewrite method before highlighting a query that expands? Yes, that's a valid approach, especially using MemoryIn

Re: confuse of required and prohibited in BooleanQuery

2007-01-17 Thread Daniel Naber
On Wednesday 17 January 2007 11:30, David wrote: >    2.There are four logical combinations of these flags, but the case > where both are true is an illogical and invalid combination >    but I don't know why, Can anybody explain it to me? You're right. Because of this the API was changed in Luce
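The old flags gave four combinations, but only three of them mean anything. The `BooleanClause.Occur` constants (MUST, SHOULD, MUST_NOT, as used elsewhere in this thread) have exactly three values. The sketch below is a hypothetical migration helper that mirrors those constants with a local enum so it compiles without Lucene. It rejects the one impossible combination.

```java
// Hypothetical helper: maps the old (required, prohibited) flag pair onto a
// local enum mirroring BooleanClause.Occur, so it compiles without Lucene.
class ClauseFlags {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static Occur fromFlags(boolean required, boolean prohibited) {
        if (required && prohibited) {
            // "must appear" and "must not appear" at once can never be satisfied
            throw new IllegalArgumentException("clause cannot be both required and prohibited");
        }
        if (required) return Occur.MUST;
        if (prohibited) return Occur.MUST_NOT;
        return Occur.SHOULD;
    }

    public static void main(String[] args) {
        System.out.println(fromFlags(true, false));  // MUST
        System.out.println(fromFlags(false, true));  // MUST_NOT
        System.out.println(fromFlags(false, false)); // SHOULD
    }
}
```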

Re: Remote Searcher performance and Document retrieval

2007-01-08 Thread Daniel Naber
On Monday 08 January 2007 23:08, sashaman wrote: > Can anyone comment on this performance issue? Have you compared it with a local index? It's not uncommon for several doc() calls to take more time than searching, as doc() requires a lot of I/O, even locally. Regards Daniel -- http://www.danielnabe

Re: boosting instead of sorting WAS: to boost or not to boost

2006-12-21 Thread Daniel Naber
On Thursday 21 December 2006 10:55, Martin Braun wrote: > and in my case I have some documents > which have same values in many fields (=>same score) and the only > difference is the year. Andrzej's response sounds like a good solution, so just for completeness: you can sort by more than one cri
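In Lucene 2.x, sorting on several criteria would presumably be a `Sort` built from several `SortField`s, something like `new Sort(new SortField[] { SortField.FIELD_SCORE, new SortField("year", SortField.INT, true) })`. The stdlib sketch below only illustrates the resulting order with a chained `Comparator`. It uses a hypothetical `Hit` class holding a score and a year.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of a two-key ordering: score first, year as the tie-breaker.
class MultiKeySort {
    // Hypothetical hit: a relevance score plus the document's year.
    static final class Hit {
        final String id;
        final float score;
        final int year;
        Hit(String id, float score, int year) { this.id = id; this.score = score; this.year = year; }
    }

    // Higher score first; among equal scores, newer year first.
    static final Comparator<Hit> BY_SCORE_THEN_YEAR =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
                  .thenComparing(Comparator.comparingInt((Hit h) -> h.year).reversed());

    static List<String> order(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(BY_SCORE_THEN_YEAR);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) ids.add(h.id);
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("a", 0.8f, 1973), new Hit("b", 0.8f, 1975), new Hit("c", 0.9f, 1960));
        System.out.println(order(hits)); // [c, b, a]
    }
}
```

Documents with identical scores ("a" and "b") are ordered by year, which matches the situation described in the question.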

Re: lucene injection

2006-12-21 Thread Daniel Naber
On Thursday 21 December 2006 10:56, Deepan wrote: > I am bothered about security problems with lucene. Is it vulnerable to > any kind of injection like mysql injection? many times the query from > user is passed to lucene for search without validating. This is only an issue if your index has perm
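If raw user input is passed to QueryParser, one precaution is to escape the query syntax first. Lucene's `QueryParser.escape(String)` exists for this purpose. The stdlib sketch below is a hypothetical reimplementation for illustration only, not Lucene's code. It covers the special characters of the classic query syntax (`+ - && || ! ( ) { } [ ] ^ " ~ * ? : \`), and the exact set may differ between versions.

```java
// Hypothetical re-implementation of query-syntax escaping, for illustration only.
class QueryEscaper {
    // Characters with special meaning in the classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String userInput) {
        StringBuilder sb = new StringBuilder(userInput.length());
        for (int i = 0; i < userInput.length(); i++) {
            char c = userInput.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // An attempt to target another field becomes a literal colon and asterisk.
        System.out.println(escape("secret:*")); // secret\:\*
    }
}
```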

Re: to boost or not to boost

2006-12-20 Thread Daniel Naber
On Wednesday 20 December 2006 17:32, Martin Braun wrote: > so a doc from 1973 should get a boost of 1.1973 and a doc of 1975 should > get a boost of 1.1975 . The boost is stored with a limited resolution. Try boosting one doc by 10, the other one by 20 or something like that. Regards Daniel -
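Index-time boosts end up in the one-byte field norm. According to the `Similarity.encodeNorm` documentation, this norm uses a three-bit mantissa. The stdlib sketch below is a simplified model, not Lucene's encoder. It keeps a float's sign, exponent and top three mantissa bits, and it ignores the real encoding's reduced exponent range. It shows that 1.1973 and 1.1975 collapse to the same value while 10 and 20 stay distinct.

```java
// Simplified model of a three-bit-mantissa encoding: keep the sign, exponent
// and top three mantissa bits, truncate the rest. Not Lucene's actual encoder.
class BoostResolution {
    static float truncateToThreeBitMantissa(float f) {
        int bits = Float.floatToIntBits(f);
        // float layout: 1 sign bit, 8 exponent bits, 23 mantissa bits;
        // clearing the low 20 mantissa bits leaves only the top 3
        return Float.intBitsToFloat(bits & 0xFFF00000);
    }

    public static void main(String[] args) {
        System.out.println(truncateToThreeBitMantissa(1.1973f)); // 1.125
        System.out.println(truncateToThreeBitMantissa(1.1975f)); // 1.125
        System.out.println(truncateToThreeBitMantissa(10f));     // 10.0
        System.out.println(truncateToThreeBitMantissa(20f));     // 20.0
    }
}
```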

Re: MultiFieldQueryParser doesn't properly filter out documents when the query string specifies to exclude certain terms

2006-12-19 Thread Daniel Naber
On Tuesday 19 December 2006 23:05, Scott Sellman wrote: >                         new > BooleanClause.Occur[]{BooleanClause.Occur.SHOULD, > BooleanClause.Occur.SHOULD} Why do you explicitly specify these operators? > q.add(keywordQuery, BooleanClause.Occur.MUST); //true, false); You seem to wra

Re: Indexing clarification , please advice

2006-12-13 Thread Daniel Naber
On Wednesday 13 December 2006 14:10, abdul aleem wrote: > a) Indexing large file ( more than 4MB ) >    Do i need to read the entire file as string using >    java.io and create a Document object ? You can also use a reader: http://lucene.apache.org/java/2_0_0/api/org/apache/lucene/document/Fiel

Re: de-boosting fields

2006-12-09 Thread Daniel Naber
On Saturday 09 December 2006 02:25, Scott Smith wrote: > What is the best way to do this?  Is changing the boost the right > answer?  Can a field's boost be zero? Yes, just use: term1 term2 category1^0 category2^0. Erick's Filter idea is also useful. Regards Daniel -- http://www.danielnaber.

too many parentheses confuse Lucene

2006-12-05 Thread Daniel Naber
Hi, a query like (-merkel) AND schröder is parsed as +(-body:merkel) +body:schröder I get no hits for this query because +(-body:merkel) doesn't return any hits (it's not a valid query for Lucene). However, a query like -merkel AND schröder works fine. From the user's point of view, both q
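The behavior described above can be reproduced with a simplified set-based model. The model makes two assumptions: a boolean query made only of prohibited clauses matches nothing, and optional clauses only decide matching when there are no required ones. Documents are integer ids, each clause's matches are given up front, and `BoolModel` is hypothetical. Putting a match-everything clause inside the parentheses (Lucene has `MatchAllDocsQuery` for this) makes the nested negation usable.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of boolean clause matching over sets of document ids.
class BoolModel {
    static Set<Integer> evaluate(List<Set<Integer>> must, List<Set<Integer>> should,
                                 List<Set<Integer>> mustNot) {
        Set<Integer> result = new HashSet<>();
        if (!must.isEmpty()) {
            result.addAll(must.get(0));
            for (Set<Integer> m : must) result.retainAll(m);   // intersection of required clauses
        } else {
            for (Set<Integer> s : should) result.addAll(s);    // union of optional clauses
        }
        // Prohibited clauses only remove documents, so a query consisting
        // solely of prohibited clauses starts from nothing and matches nothing.
        for (Set<Integer> n : mustNot) result.removeAll(n);
        return result;
    }

    static Set<Integer> ids(Integer... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        Set<Integer> all = ids(1, 2, 3);      // every document in the index
        Set<Integer> merkel = ids(1);         // docs containing body:merkel
        Set<Integer> schroeder = ids(1, 2);   // docs containing body:schröder
        List<Set<Integer>> none = Collections.emptyList();

        // +(-body:merkel) +body:schröder : the nested purely negative query is empty
        Set<Integer> inner = evaluate(none, none, Arrays.asList(merkel));
        System.out.println(evaluate(Arrays.asList(inner, schroeder), none, none)); // []

        // Same query with a match-all clause inside the parentheses
        Set<Integer> fixed = evaluate(Arrays.asList(all), none, Arrays.asList(merkel));
        System.out.println(evaluate(Arrays.asList(fixed, schroeder), none, none)); // [2]
    }
}
```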

Re: Customized Analyzer

2006-12-05 Thread Daniel Naber
On Tuesday 05 December 2006 20:14, Alice wrote: > It returns > content:"(wind window)" That might be the correct representation of a MultiPhraseQuery. So does your query work anyway? It's just that you cannot use QueryParser again to parse this output (similar to some other queries like SpanQue

Re: Customized Analyzer

2006-12-05 Thread Daniel Naber
On Tuesday 05 December 2006 21:37, Alice wrote: > It does not work. > > Even with the synonyms indexed it is not found. So if your text contains "wind" it is not found by the query that prints as content:"(wind window)"? Then I suggest you post a small test case that shows this problem. As Chri

Re: Lucene search performance: linear?

2006-12-05 Thread Daniel Naber
On Tuesday 05 December 2006 03:49, Zhang, Lisheng wrote: > I found that search time is about linear: 2nd time is about 2 times > longer than 1st query. What exactly did you measure, only the search() or also opening the IndexSearcher? The latter depends on index size, so you shouldn't re-open

Phrase queries with wildcards

2006-12-03 Thread Daniel Naber
Hi, Lucene's phrase queries don't support wildcards and I'm thinking about the best way to "fix" this. One way would be to change QueryParser so that it builds a MultiPhraseQuery when it encounters a wildcard inside a phrase. However, to expand the wildcard the QueryParser needs an IndexReader.
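Expanding the wildcard requires enumerating the index's terms, which is why an IndexReader would be needed. The stdlib sketch below uses a sorted set as a hypothetical stand-in for that term dictionary. It returns the terms that match a pattern containing `*` and `?`, first seeking to the literal prefix. The resulting terms are the kind of input one would pass to `MultiPhraseQuery.add(Term[])` for that phrase position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.regex.Pattern;

// Sketch: expand one wildcard phrase term against a sorted term dictionary.
class WildcardExpander {
    static List<String> expand(NavigableSet<String> terms, String pattern) {
        // The literal prefix before the first wildcard lets us seek directly
        // into the sorted dictionary instead of scanning every term.
        int firstWildcard = indexOfWildcard(pattern);
        String prefix = firstWildcard < 0 ? pattern : pattern.substring(0, firstWildcard);
        Pattern regex = Pattern.compile(toRegex(pattern));
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) break;   // left the prefix range
            if (regex.matcher(term).matches()) matches.add(term);
        }
        return matches;
    }

    private static int indexOfWildcard(String p) {
        for (int i = 0; i < p.length(); i++) {
            char c = p.charAt(i);
            if (c == '*' || c == '?') return i;
        }
        return -1;
    }

    // '*' matches any sequence, '?' exactly one character; all else is literal.
    private static String toRegex(String p) {
        StringBuilder sb = new StringBuilder();
        for (char c : p.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new java.util.TreeSet<>(
            java.util.Arrays.asList("apple", "wind", "window", "windows", "wine", "winter"));
        System.out.println(expand(terms, "wind*")); // [wind, window, windows]
        System.out.println(expand(terms, "win?"));  // [wind, wine]
    }
}
```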
