Re: Getting Started with Korean

2005-11-11 Thread Cheolgoo Kang
Hi, On 11/11/05, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Hi, > > Was wondering if someone could help me out with a few things in Korean > as related to Lucene: > 1. Which Analyzer do you recommend? From the list, I see that some > have had success with the StandardAnalyzer. Are there any c

Re: Performance Question

2005-11-11 Thread Yonik Seeley
Look at IndexReader.open() It actually uses a MultiReader if there are multiple segments. -Yonik On 11/11/05, Charles Lloyd <[EMAIL PROTECTED]> wrote: > You should run your own tests, but I found the MultiReader to be slower > than a regular IndexR

Re: Max score of two fields

2005-11-11 Thread Yonik Seeley
It doesn't seem like a custom Similarity would work. Always returning 1.0 for coord would still rank a doc higher if both current_name and old_name matched. -Yonik On 11/11/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > I believe if you create a cu
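The coord point can be illustrated outside Lucene. With coord fixed at 1.0, an OR over two fields still adds the per-clause scores, so a document matching both current_name and old_name outscores one matching a single field. The following is a minimal sketch in plain Java, not Lucene code: the class name, method names, and score values are assumptions made up for illustration. It contrasts summing with taking the maximum, which is what the thread title suggests is wanted.

```java
final class FieldScoreCombine {
    // Sum of per-field scores: roughly what an OR of two clauses yields when coord is 1.0.
    static float sum(float... fieldScores) {
        float total = 0f;
        for (float s : fieldScores) {
            total += s;
        }
        return total;
    }

    // Maximum per-field score: matching a second field adds nothing on top of the best one.
    static float max(float... fieldScores) {
        float best = 0f;
        for (float s : fieldScores) {
            best = Math.max(best, s);
        }
        return best;
    }

    public static void main(String[] args) {
        float currentName = 0.8f; // hypothetical score for a current_name match
        float oldName = 0.6f;     // hypothetical score for an old_name match
        System.out.println("both fields, summed: " + sum(currentName, oldName));
        System.out.println("both fields, max:    " + max(currentName, oldName));
        System.out.println("current_name only:   " + sum(currentName));
    }
}
```

With summing, the two-field match ranks above the single-field match; with max, both rank the same.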

Re: Document as Paramter (Rephrased)

2005-11-11 Thread bib_lucene bib
Thanks for your time. -- Text I want to highlight is stored in the file system and index -- I can search and highlight the searched terms in results page ( just snippets) -- I have given a download link next to snippets ( which will point to file I stored in ROOT webapp of tomcat) I understo

Re: Max score of two fields

2005-11-11 Thread Erik Hatcher
On 11 Nov 2005, at 13:27, Lasse L wrote: I am indexing persons that has the usual fields name, address etc. I need to keep track of which name and addresses are active now and which ones are old. I do that by having a two sets of fields e.g.: current_name and old_name When I search for a per

Re: Document as Paramter (Rephrased)

2005-11-11 Thread Erik Hatcher
On 11 Nov 2005, at 12:54, bib_lucene bib wrote: My requirement is that I do a search, the results of the search are displayed. I am displaying results by using getBestFragments and highlighting searched text. So basically the user can search and see what documents matched his search with s

Re: Getting Started with Korean

2005-11-11 Thread Youngho Cho
Hello, - Original Message - From: "Grant Ingersoll" <[EMAIL PROTECTED]> To: Sent: Friday, November 11, 2005 10:36 PM Subject: Getting Started with Korean > Hi, > > Was wondering if someone could help me out with a few things in Korean > as related to Lucene: > 1. Which Analyzer do y

Re: A lot of short documents, optimal query?

2005-11-11 Thread Paul Elschot
On Friday 11 November 2005 23:04, Chris Hostetter wrote: > > : Wouldn't it make sense to have BooleanFilter, > : TermFilter, MultiTermFilter, RangeFilter... family to > : "mirror" xxxQuery world with same idioms and > : interfaces? Is this the direction already taken in > : Lucene development (

Re: A lot of short documents, optimal query?

2005-11-11 Thread Chris Hostetter
: Wouldn't it make sense to have BooleanFilter, : TermFilter, MultiTermFilter, RangeFilter... family to : "mirror" xxxQuery world with same idioms and : interfaces? Is this the direction already taken in : Lucene development (an alternative would be to : parametrize existing Query world). How

Re: Max score of two fields

2005-11-11 Thread Chris Hostetter
: There is no way around using a separate Scorer for this. : You can make (could have made) the scorer by starting from : DisjunctionSumScorer.java here: : http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/search/ : and rewrite it into a DisjunctionMaxScorer. Coincid

Re: A lot of short documents, optimal query?

2005-11-11 Thread eks dev
Everything is perfect with your suggestion, scoring is not needed. I am going to try all also approach with ChainedFilter, but for this I need to think a bit more on how to get it right. The Query in the example is just one variation on the same topic and there are a few more cases I need to cover

Re: Max score of two fields

2005-11-11 Thread Paul Elschot
Lasse, On Friday 11 November 2005 19:27, Lasse L wrote: > I am indexing persons that has the usual fields name, address etc. > I need to keep track of which name and addresses are active now and > which ones are old. > I do that by having a two sets of fields e.g.: current_name and old_name > > W

Re: highlight only one field

2005-11-11 Thread Ernesto De Santis
Yes, this works. String strQuery = query.toString(); WeightedTerm[] weightedTerm = QueryTermExtractor.getTerms(query); ArrayList bodyQueryTerms = new ArrayList(); for (int i = 0; i < weightedTerm.length; i++) { String term = weightedTerm[i].getTe

Max score of two fields

2005-11-11 Thread Lasse L
I am indexing persons that has the usual fields name, address etc. I need to keep track of which name and addresses are active now and which ones are old. I do that by having a two sets of fields e.g.: current_name and old_name When I search for a person and I search in just the current fields ran

Re: highlight only one field

2005-11-11 Thread mark harwood
>>> This don't work, because Ah, crap. You'll have to drop down another level. Every line of code in QueryTermsExtractor that calls terms.add(new WeightedTerm(..)) would be the place to test the field name then. For now you could copy QueryTermsExtractor and put an "if" around these lines whi

Re: korean and lucene

2005-11-11 Thread Andrzej Bialecki
Cheolgoo Kang wrote: >Thanks Bialecki, > > Bialecki is my last name, my first name is Andrzej. No problem, it's similarly confusing for Europeans to decide between the first and last name in Asian names... :-) Is your first name Kang? >I'm trying to test your program, thanks a lot! > >And also

Document as Paramter (Rephrased)

2005-11-11 Thread bib_lucene bib
Hi Erik & All My requirement is that I do a search, the results of the search are displayed. I am displaying results by using getBestFragments and highlighting searched text. So basically the user can search and see what documents matched his search with snippets of text shown in the result of

Re: highlight only one field

2005-11-11 Thread Ernesto De Santis
Hi Mark This doesn't work, because WeightedTerm[] weightedTerm = QueryTermExtractor.getTerms(query); returns the query term values, not the field names. example: for "body:mark title:highlight" return [mark, highlight], I can't compare these values with "body" field. Ernesto. mark harwood

Re: highlight only one field

2005-11-11 Thread mark harwood
Ah. You're right. Looks like the current highlighter api doesn't offer you that degree of control. The way to fix it is probably to tweak the list of WeightedTerms you give the highlighter: [pseudo code follows...] terms=QueryTermExtractor.getTerms(query); bodyQueryTerms=new ArrayList(); for all
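The idea of trimming the term list before highlighting can be sketched without the highlighter API. The snippet below is plain Java and only an illustration: FieldTerm is a hypothetical stand-in for a term that carries its field name (the thread notes that the extracted terms do not), and termsForField is an assumed helper, not a Lucene method. It keeps only the terms that belong to the field being highlighted.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldTermFilter {
    // Hypothetical pair of field name and term text; not a Lucene class.
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Keep only the term texts whose field matches the field being highlighted.
    static List<String> termsForField(List<FieldTerm> allTerms, String field) {
        List<String> kept = new ArrayList<>();
        for (FieldTerm t : allTerms) {
            if (t.field.equals(field)) {
                kept.add(t.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Mirrors the query from the original question: body:home AND title:sale
        List<FieldTerm> terms = new ArrayList<>();
        terms.add(new FieldTerm("body", "home"));
        terms.add(new FieldTerm("title", "sale"));
        System.out.println(termsForField(terms, "body"));
    }
}
```

Only the body terms would then be handed to the highlighter, so "sale" is left unmarked even when it occurs in the body text.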

Basic Question on Lucene Document

2005-11-11 Thread Ashwin Satyanarayana
Hello, I am new to Lucene. I was trying to use Lucene with TREC-6 Data. The dataset for TREC-6 used in 1997 contains many input files. Each input file has multiple documents (some files contain over 200 documents) tagged by and the text is tagged by .The result given by Lucene to a

Re: Performance Question

2005-11-11 Thread Charles Lloyd
You should run your own tests, but I found the MultiReader to be slower than a regular IndexReader. I was running on a dual-cpu box and two separate disk drives. Charles.

RE: Insert new records into index

2005-11-11 Thread Paul . Illingworth
I queue up all my index operations. If the app stops the queue gets saved to disk. When the app restarts the queue is loaded and everything carries on. I haven't looked at the app failing just yet. I know the JVM has hooks that can be used to ensure clean up code gets called when the JVM exits

Re: korean and lucene

2005-11-11 Thread Cheolgoo Kang
Thanks Bialecki, I'm trying to test your program, thanks a lot! And also, can you give me the paper you've cited [1] and [2]? I've googled(entire web and google scholar) about it but got nothing. On 11/8/05, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: > KwonNam Son wrote: > > >First of all, I re

RE: Insert new records into index

2005-11-11 Thread Aigner, Thomas
Thanks for the advice Paul, I thought about doing two passes.. Delete all and then insert all, but the problem with that approach is if my program fails somewhere in between start and end.. I may end up with many deleted records and none changed. The same could happen with a batch build. How are

Re: Insert new records into index

2005-11-11 Thread Paul . Illingworth
Hello, You really do need to batch up your deletes and inserts otherwise it will take a long time. If you can, do all your deletes and then all of your inserts. I have gone to the trouble of queueing index operations and when a new operation comes along I reorder the job queue to ensure delet
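The reordering described above can be sketched in plain Java. This is an illustration under stated assumptions, not the poster's code: IndexOpQueue, Op, Kind, and deletesFirst are made-up names, and the actual Lucene calls that would execute each operation are left out. The sketch moves every delete ahead of every insert while preserving the relative order within each kind.

```java
import java.util.ArrayList;
import java.util.List;

final class IndexOpQueue {
    enum Kind { DELETE, INSERT }

    // One pending index operation; docId is a hypothetical unique key for the record.
    static final class Op {
        final Kind kind;
        final String docId;

        Op(Kind kind, String docId) {
            this.kind = kind;
            this.docId = docId;
        }

        @Override
        public String toString() {
            return kind + ":" + docId;
        }
    }

    // Returns a new list with all deletes first, then all inserts.
    // Order inside each group is unchanged (a stable partition).
    static List<Op> deletesFirst(List<Op> pending) {
        List<Op> ordered = new ArrayList<>(pending.size());
        for (Op op : pending) {
            if (op.kind == Kind.DELETE) {
                ordered.add(op);
            }
        }
        for (Op op : pending) {
            if (op.kind == Kind.INSERT) {
                ordered.add(op);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Op> pending = new ArrayList<>();
        pending.add(new Op(Kind.DELETE, "1"));
        pending.add(new Op(Kind.INSERT, "1"));
        pending.add(new Op(Kind.DELETE, "2"));
        pending.add(new Op(Kind.INSERT, "2"));
        System.out.println(deletesFirst(pending));
    }
}
```

Grouping the work this way means the delete phase and the insert phase each run once per batch rather than alternating for every record.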

Re: Performance Question

2005-11-11 Thread Yonik Seeley
The IndexSearcher(MultiReader) will be faster (it's what's used for indices with multiple segments too). -Yonik On 11/11/05, Mike Streeton <[EMAIL PROTECTED]> wrote: > I have several indexes I want to search together. What performs better a > sing

highlight only one field

2005-11-11 Thread Ernesto De Santis
Hi I'm using highlighter and have this problem: The query is over two or more fields, like: *body:home AND title:sale* I want to highlight over body field, but not highlight "sale" if "sale" is in body. How I can do this? When I create a Highlighter instance, the parameter is the query: *hi

Re: RAMDirectory and Hibernate

2005-11-11 Thread Peter Gelderbloem
Martijn, Sorry for the late reply. I've been on holiday. I had other more pressing things come up. The problem I was trying to solve was clustering the indexing and search. I am thinking of breaking my application into indexing and search nodes and keep them coordinated in some fashion. It would

Insert new records into index

2005-11-11 Thread Aigner, Thomas
Howdy all, I am having a problem with inserting/updating records into my index. I have approximately 1.5M records in the index taking about 2.5G space when optimized. If I want to update 1000 records, I delete the old item and insert the new one. This is taking a LONG time to accomplis

Re: Fwd: Re: Term Vectors

2005-11-11 Thread Grant Ingersoll
If you are storing the term vector when you index, then you can ask the IndexReader for the vector using the getTermFreqVector() method, which will return the TermFreqVector which should have the information you need [EMAIL PROTECTED] wrote: I hope that this isn't a newbies question, but let

Re: Fwd: Re: Term Vectors

2005-11-11 Thread marigoldcc
I hope that this isn't a newbies question, but let me ask the more general question. While IndexReader can return the documents containing the term t, I need to do the opposite. Is there a method, given document d, that will return all of the terms in that document (I need to calculate the averag

Re: Document as parameter?

2005-11-11 Thread Erik Hatcher
On 11 Nov 2005, at 01:22, bib_lucene bib wrote: Hi All I use the following code to display search results LuceneHitHighlighter highlighter = new LuceneHitHighlighter (queryStr, "snippet", "body"); for (int i = 0; i < hits.size(); i++) { Document doc = (D

Getting Started with Korean

2005-11-11 Thread Grant Ingersoll
Hi, Was wondering if someone could help me out with a few things in Korean as related to Lucene: 1. Which Analyzer do you recommend? From the list, I see that some have had success with the StandardAnalyzer. Are there any caveats I should be aware of if I choose to use it? 2. Could anyone

Performance Question

2005-11-11 Thread Mike Streeton
I have several indexes I want to search together. Which performs better: a single searcher on a multi reader, or a single multi searcher on multiple searchers (1 per index)? Thanks Mike

Re: A lot of short documents, optimal query?

2005-11-11 Thread Chris Hostetter
: - What is the purpose of hasCode and equals methods in : XxxFilter? (this is a question about actual usage in : Lucene, not java elementary :) You mean hashCode right? ... those methods are generally important for Hashing, which makes them key for effective caching in most cases. CachingWrapper
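The hashing point can be shown with a generic example. The sketch below is plain Java and does not reproduce any Lucene caching code: ValueKey, IdentityKey, and distinctEntries are hypothetical names. It shows that two logically identical keys share one cache entry only when equals and hashCode are defined on their contents.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class FilterKeyDemo {
    // Key with value-based equals/hashCode, as a cacheable filter would need.
    static final class ValueKey {
        final String field;
        final String text;

        ValueKey(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValueKey)) return false;
            ValueKey other = (ValueKey) o;
            return field.equals(other.field) && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, text);
        }
    }

    // Same data, but inherits identity-based equals/hashCode from Object.
    static final class IdentityKey {
        final String field;
        final String text;

        IdentityKey(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    // Stores a placeholder result under both keys and reports how many entries exist.
    static int distinctEntries(Object first, Object second) {
        Map<Object, String> cache = new HashMap<>();
        cache.put(first, "cached result");
        cache.put(second, "cached result");
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctEntries(new ValueKey("type", "book"), new ValueKey("type", "book")));
        System.out.println(distinctEntries(new IdentityKey("type", "book"), new IdentityKey("type", "book")));
    }
}
```

With value-based keys the second put reuses the first entry; with identity-based keys every new instance misses the cache and the work is repeated.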