Re: search for special condition.

2008-08-13 Thread 장용석
Hi. I am very happy that you love the Korean language so much :) So, do you want to search for special characters? If you want to include special characters when indexing, you can override a method in the Tokenizer class. The method's name is isTokenChar(char c). protected boolean isTokenChar(char c) { return
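
The excerpt above is cut off before the method body, so the following is only a rough plain-Java sketch of the idea it describes: deciding per character what counts as part of a token, so that ".net" and "net" become different terms. The class name `DotAwareTokenizer` and the `tokenize` helper are hypothetical, and the code deliberately avoids Lucene classes; in Lucene itself the equivalent decision would go in the overridden `isTokenChar(char c)` mentioned in the message.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a stand-alone tokenizer, not Lucene's implementation.
class DotAwareTokenizer {

    // Assumption: letters, digits and '.' are token characters.
    static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.';
    }

    // Splits text into lower-cased tokens using isTokenChar as the boundary rule.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this rule, `DotAwareTokenizer.tokenize("I use .NET and net")` keeps ".net" and "net" as separate tokens. One trade-off of such a simple rule is that a sentence-final period also sticks to the preceding word ("net." rather than "net"), so a real tokenizer would probably need a more careful condition.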

Re: search for special condition.

2008-08-13 Thread Mr Shore
Can Nutch or Lucene support searching for special characters like "."? When I search for ".net", many results come back for "net"; I want to exclude them. PS: I love the Korean language a lot. 2008/8/13 장용석 <[EMAIL PROTECTED]> > Hi. Thank you for your response. > > I found the way with your help. > > There are class

Payloads and tokenizers

2008-08-13 Thread Antony Bowesman
I started playing with payloads and have been trying to work out how to get the data into the payload. I have a field where I want to add the following untokenized fields: A1 A2 A3. With these fields, I would like to add the payloads B1 B2 B3. Firstly, it looks like you cannot add payloads to un

Re: Searching Tokenized x Un_tokenized

2008-08-13 Thread Andre Rubin
Thanks Otis, I created a custom analyzer and it's working fine. Here's my analyzer, for reference: public class KeywordLowerAnalyzer extends Analyzer{ public KeywordLowerAnalyzer() { } public TokenStream tokenStream(String fieldName, Reader reader) { T

Re: possible to read index into memory?

2008-08-13 Thread Darren Govoni
Erick, Thank you for the valuable tips. The time I'm measuring is just around the lucene search calls with standard analyzer, such as: word = "helloo" starttime = ... query = QueryParser("word", analyzer).parse(word+"~0.76") hits = searcher.search(query) endtime = ... e

Re: Case Sensitivity

2008-08-13 Thread Erick Erickson
What analyzer are you using at *query* time? I suspect that's where your problem lies if you indeed "don't use any sophisticated analyzers", since you *are* using a sophisticated analyzer at index time. You almost invariably want to use the same analyzer at query time and index time. Please sta

RE: Case Sensitivity

2008-08-13 Thread Steven A Rowe
Hi Dino, StandardAnalyzer incorporates StandardTokenizer, StandardFilter, LowerCaseFilter, and StopFilter. Any index you create using it will only provide case-insensitive matching. Steve On 08/13/2008 at 12:15 PM, Dino Korah wrote: > Also would like to highlight the version of Lucene I am >
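
To make the point about lower-casing concrete, here is a minimal stand-alone Java sketch, assuming a toy in-memory term set rather than a real Lucene index. The class `LowercaseIndexSketch` and its methods are hypothetical names chosen for illustration. It shows that once terms are lower-cased at index time, a query only matches if it is normalized the same way.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of an index whose terms were lower-cased at index time.
class LowercaseIndexSketch {
    private final Set<String> terms = new HashSet<>();

    // The normalization applied to every term (stands in for a lower-casing filter).
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // "Index time": split on whitespace and store normalized terms.
    void index(String text) {
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(normalize(t));
            }
        }
    }

    // "Query time" with the same normalization as index time.
    boolean matchesNormalized(String queryTerm) {
        return terms.contains(normalize(queryTerm));
    }

    // "Query time" without normalization: only exact lower-case input can match.
    boolean matchesRaw(String queryTerm) {
        return terms.contains(queryTerm);
    }
}
```

After `index("Hello World")`, `matchesNormalized("HELLO")` succeeds while `matchesRaw("Hello")` does not, because the stored term is "hello". The original casing is no longer recoverable from the stored terms, which is consistent with the statement above that such an index only provides case-insensitive matching.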

Re: Indexing sections of TEI XML files

2008-08-13 Thread Tricia Williams
Hi, Take a look at what I've done with SOLR-380 (https://issues.apache.org/jira/browse/SOLR-380). The part you might find particularly useful is the Tokenizer. Tricia [EMAIL PROTECTED] wrote: Dear users, Question on approaches to indexing TEI XML or similar section/subsectioned files.

RE: Case Sensitivity

2008-08-13 Thread Dino Korah
Also would like to highlight the version of Lucene I am using; it is 2.0.0. From: Dino Korah [mailto:[EMAIL PROTECTED] Sent: 13 August 2008 17:10 To: 'java-user@lucene.apache.org' Subject: Case Sensitivity Hi All, Once I index a bunch of documents with a StandardAnalyzer (and if t

Case Sensitivity

2008-08-13 Thread Dino Korah
Hi All, Once I index a bunch of documents with a StandardAnalyzer (and if the effort needed to reindex the documents is not worthwhile), is there a way to search on the index without case sensitivity? I do not use any sophisticated Analyzer that makes use of LowerCaseTokenizer.

Re: Indexing sections of TEI XML files

2008-08-13 Thread Karsten F.
Hi A., the starting point of XTF was the TEI format. I am very curious whether you find anything missing for your needs. (I have already used it with Cocoon.) I have never seen a better implementation of XML-aware searching: each hit knows its exact position inside the indexed (= source) XML file :-) If you dive into

Re: possible to read index into memory?

2008-08-13 Thread Erick Erickson
How are you measuring? There is a bunch of setup work for the first few queries that go through the system. In either case (RAM or FS), you should fire a few representative warmup queries at the search engine before you go ahead and measure the response time. You also *must* isolate your search ti

Re: Number range search

2008-08-13 Thread Doron Cohen
The code seems correct (although it doesn't show which analyzer was used at indexing). Note that when adding numbers like this there's no real point in analyzing them, so I would add that field as UN_TOKENIZED. This would be more efficient, and would also comply with the query parser, which does not
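
The excerpt is truncated, so the rest of the advice is not visible here. As a related sketch, and as an assumption not taken from the message: when numbers are indexed as plain string terms, a range comparison on those terms is lexicographic, so "900" sorts after "1450". A common workaround is to zero-pad values to a fixed width before indexing and querying. The class `PaddedNumbers`, its methods, and the width of 6 used in the usage note are hypothetical and chosen only for illustration.

```java
import java.util.Locale;

// Illustration only: string comparison of numeric terms, with and without padding.
class PaddedNumbers {

    // Left-pads a non-negative value with zeros to a fixed width, e.g. 900 -> "000900".
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Inclusive range check using plain string ordering, as a term-based range would.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }
}
```

For example, `inRange("900", "500", "1450")` is false under string ordering, whereas `inRange(pad(900, 6), pad(500, 6), pad(1450, 6))` is true. The same padding has to be applied to the indexed values and to the range bounds in the query, and the width must be large enough for the largest expected value.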

Re: Listing fields in an index

2008-08-13 Thread John Patterson
Thanks! I was looking in IndexReader for a good couple of minutes and didn't see that! Erik Hatcher wrote: > > > On Aug 13, 2008, at 5:02 AM, John Patterson wrote: >> How do I list all the fields in an index? Some documents do not >> contain all >> fields. > > Have a look at IndexReader#ge

Re: possible to read index into memory?

2008-08-13 Thread Darren Govoni
Hoss, Thank you for the detailed response. What I found weird was it seemed to take 0.09 seconds to create a RAMDirectory off a 17MB index. Suspiciously fast, but ok. Yet, when I do a simple fuzzy search on a single field, "word: someword~0.76", it was taking .35 seconds. That's a very very lo

Re: Listing fields in an index

2008-08-13 Thread Erik Hatcher
On Aug 13, 2008, at 5:02 AM, John Patterson wrote: How do I list all the fields in an index? Some documents do not contain all fields. Have a look at IndexReader#getFieldNames(). That'll give you back field names regardless of which documents have them. Erik --

Number range search

2008-08-13 Thread m.harig
Hi all. I am indexing a price field with doc.add(new Field("price", "1450", Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field("price", "3800", Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field("pri

Listing fields in an index

2008-08-13 Thread John Patterson
Hi, How do I list all the fields in an index? Some documents do not contain all fields. Thanks, John -- View this message in context: http://www.nabble.com/Listing-fields-in-an-index-tp18959436p18959436.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. -

Listing fields in an index

2008-08-13 Thread John Patterson
Hi, How do I list all the fields in an index? Some documents do not contain all fields. Thanks, John -- View this message in context: http://www.nabble.com/Listing-fields-in-an-index-tp18959421p18959421.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. -

Re: Indexing sections of TEI XML files

2008-08-13 Thread ao1
Thanks, Erik, but I'm developing this system from scratch as it has specific use cases including dealing with multiple languages including multiple forms of a specific minority language (Irish). I'm going to look at XTF anyway just to see how they managed it! Thanks, A. > Have you looked at XTF

Re: Indexing sections of TEI XML files

2008-08-13 Thread Erik Hatcher
Have you looked at XTF? It does what you're after and much, much more. Erik On Aug 13, 2008, at 4:03 AM, [EMAIL PROTECTED] wrote: Dear users, Question on approaches to indexing TEI XML or similar section/subsectioned files. I'm indexi

Indexing sections of TEI XML files

2008-08-13 Thread ao1
Dear users, Question on approaches to indexing TEI XML or similar section/subsectioned files. I'm indexing TEI P4 XML files using Lucene 2.x. Currently, each TEI XML file corresponds to a Lucene document. I extract the data from each XML file using XPath expressions e.g. for the body text: "/TEI