Is analyzing same as tokenizing???

2005-09-26 Thread Anand Kishore
Hi, Is 'Analyzing' the same as 'Tokenizing'? When we say the Keyword field is not analyzed, but indexed and stored, does it indicate it is not tokenized as well? That means in order to find a query match against a keyword there has to be an exact match (case sensitive). -- - Andy
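
On the question above: in Lucene releases of this period a Keyword field is indexed as a single untokenized term, so a term query has to supply the exact, case-sensitive value. The toy Java below illustrates that difference without using Lucene at all. `FieldAnalysisSketch`, `analyzedTerms`, `keywordTerms`, and `matches` are hypothetical names, and the split-and-lowercase step is only an assumed stand-in for what a real analyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model only: these are hypothetical helpers, not Lucene classes.
class FieldAnalysisSketch {

    // Stand-in for an analyzed field: split on non-alphanumerics, lowercase each token.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Stand-in for a keyword (not analyzed) field: the whole value is one term.
    static List<String> keywordTerms(String value) {
        return Collections.singletonList(value);
    }

    // A term query matches only if it equals one of the indexed terms exactly.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }
}
```

With these helpers, `matches(keywordTerms("MN12345"), "mn12345")` is false while `matches(keywordTerms("MN12345"), "MN12345")` is true, which is the exact-match behaviour the question asks about.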

Re: May I use a mixture of indexing methods altogether?

2005-09-26 Thread Otis Gospodnetic
Hi, You need to do a little bit of research :) Please search the mailing list archives (links to them are on the Lucene site). Search for JDBCDirectory. Use Google and search for +JDBCDirectory +Vito (this is the name of its author). Otis --- Mag Gam <[EMAIL PROTECTED]> wrote: > well, it seems I

Re: May I use a mixture of indexing methods altogether?

2005-09-26 Thread Mag Gam
well, it seems I want to store the index into the database itself. Any ideas for that? Is that even possible? On 9/26/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > > Lucene indices are created in the file system (FSDirectory) or in > memory (RAMDirectory). If you want to store them elsewhere

Re: May I use a mixture of indexing methods altogether?

2005-09-26 Thread Otis Gospodnetic
Lucene indices are created in the file system (FSDirectory) or in memory (RAMDirectory). If you want to store them elsewhere, you need to implement your own Directory. Otis --- Mag Gam <[EMAIL PROTECTED]> wrote: > Otis: > > Thanks for the good and clean explanation! I will first try this out,

Re: May I use a mixture of indexing methods altogether?

2005-09-26 Thread Mag Gam
Otis: Thanks for the good and clean explanation! I will first try this out, and let you know how that goes... What you are saying makes VERY good sense! Once I index them, will this go to the filesystem, or somewhere else? I want this index to be created in the table, so I can do quick SELECTs t

Re: May I use a mixture of indexing methods altogether?

2005-09-26 Thread Otis Gospodnetic
It's easy: pull the data from the DB using something like JDBC, and from the retrieved rows create Lucene Documents. Of course, it gets more complicated than this, but start with something simple like using JDBC to run SELECTs, converting results to Lucene Documents, and indexing them with IndexWriter. Ther
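
The flow described above (run a SELECT, turn each row into a document, add it to an index) can be sketched in plain Java. This is only an illustration under stated assumptions: rows are modeled as maps rather than a JDBC `ResultSet`, the index is a tiny in-memory map, and `RowIndexSketch`, `addRow`, and `search` are hypothetical names. A real setup would build Lucene `Document` objects and hand them to `IndexWriter.addDocument`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for "rows in, searchable index out"; not a Lucene API.
class RowIndexSketch {

    // term -> ids of the rows whose text column contains that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index one row: the id column identifies it, the text column is tokenized.
    void addRow(Map<String, String> row, String idColumn, String textColumn) {
        String id = row.get(idColumn);
        String text = row.get(textColumn).toLowerCase(Locale.ROOT);
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Return the ids of all rows containing the term, in sorted order.
    List<String> search(String term) {
        Set<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }
}
```

The point of the sketch is the shape of the pipeline: the database stays the source of the rows, while the index is a separate structure built from them.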

Re: query behavior

2005-09-26 Thread Chris Hostetter
I *believe* that because of the ConjunctionScorer in 1.9, BooleanQueries consisting of all required terms are now optimized for situations like this: the Scorer for the common clause won't be asked to score things that the uncommon clause has already given a score of 0.0. : Date: Mon, 26 Sep 2

Re: May I use a mixture of indexing methods altogether?

2005-09-26 Thread Mag Gam
Otis: How do you do that? Got a quick and simple example? We have been looking for an example for the last 3-4 months, but no luck On 9/25/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > > > Is it possible to do that in a database instead of a flat text file? > > Huh? > You mean is it po

query behavior

2005-09-26 Thread Alberto Squassabia
Hi! I learnt from a mailing list archive that the following applies: - Tue, 06 Jan 2004 [...] I have a index with documents that have only 2 fields, the first (unique) is 'very unique', in that most documents have at least somewhat varying terms, the second is a

Re: Can Lucene do this?

2005-09-26 Thread Mag Gam
Maik: Thanks for the reply. I was going to go that way, but it involves a lot of work, since my text file is about 3 meg of information. However, I am looking into integrating my data with Derby plus Lucene. TIA! On 9/26/05, Maik Schreiber <[EMAIL PROTECTED]> wrote: > > > Can I use lucene to

Re: Problem in .txt file indexing

2005-09-26 Thread Erik Hatcher
On Sep 26, 2005, at 2:30 PM, tirupathi reddy wrote: Hello, SO now How can I index the text files in Lucene? Didn't we go over this before?! :) Follow these steps _exactly_: 1) Download the source code from Lucene in Action from http://www.lucenebook.com (you'll see a link at the top o

Re: Indexing .txt file containing english, german or french alphabet

2005-09-26 Thread Ian Soboroff
Otis Gospodnetic <[EMAIL PROTECTED]> writes: > For indexing text that has multiple languages I don't know what to > recommend. Well, I do - try the StandardAnalyzer and see if that > produces satisfactory results, but you'd really need a smart analyzer > that knows how to properly tokenize an

Re: Single Analyzer for multiple European languages

2005-09-26 Thread Andrzej Bialecki
Shashikant Kore wrote: Search: - Get the superset of stopwords by merging the stopwords from all the languages. This step doesn't make sense. Stopwords ARE language specific. A stopword in one language may be a valid content word in another language - e.g. English stopwords "is, by, far" mea
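
The objection above can be made concrete with a small sketch. The stopword lists here are tiny illustrative assumptions, not the lists shipped with any analyzer, and `StopwordSketch`, `filter`, and `merged` are hypothetical names. The German/English pair is an example chosen for this sketch: "die" is a German article but a content word in English.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy comparison of per-language stopword filtering against a merged superset.
class StopwordSketch {

    // Tiny illustrative lists keyed by language code (assumed for this example).
    static final Map<String, Set<String>> STOPWORDS = new HashMap<>();
    static {
        STOPWORDS.put("en", new HashSet<>(Arrays.asList("the", "is", "by")));
        STOPWORDS.put("de", new HashSet<>(Arrays.asList("der", "die", "das")));
    }

    // Drop every token that appears in the given stopword set.
    static List<String> filter(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Union of all per-language lists, i.e. the "superset" approach.
    static Set<String> merged() {
        Set<String> all = new HashSet<>();
        for (Set<String> words : STOPWORDS.values()) {
            all.addAll(words);
        }
        return all;
    }
}
```

Filtering the English tokens `the heroes die` with the English list keeps `heroes die`, while the merged list also removes `die`, so an English content word is lost because it happens to be a German stopword.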

Re: Cannot sort results with multisearcher when mismatched field names

2005-09-26 Thread Chris Hostetter
I've never dealt with multisearchers before, so I'm not sure what caveats there are when doing Sorts with them, but you should be able to make your own SortComparatorSource which knows about any special fields you have that might go by multiple names, and when it's requested to sort on one of tho
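
The idea in the reply above, a comparator that knows a field may go by several names, can be sketched in plain Java. This does not implement Lucene's SortComparatorSource; documents are modeled as maps, and `AliasSortSketch`, `resolve`, and `byAliases` are hypothetical names. The field names "author" and "writer" come from the original question in this thread.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy alias-aware sort; documents are plain maps, not Lucene Documents.
class AliasSortSketch {

    // Value of the first alias present in the document, or "" if none is present.
    static String resolve(Map<String, String> doc, List<String> aliases) {
        for (String name : aliases) {
            String value = doc.get(name);
            if (value != null) {
                return value;
            }
        }
        return "";
    }

    // Comparator that sorts on whichever of the aliases each document carries.
    static Comparator<Map<String, String>> byAliases(String... aliases) {
        final List<String> names = Arrays.asList(aliases);
        return (a, b) -> resolve(a, names).compareTo(resolve(b, names));
    }
}
```

Sorting a mixed list with `byAliases("author", "writer")` orders documents from both indices by one logical field, regardless of which name each index used.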

Re: Problem in .txt file indexing

2005-09-26 Thread tirupathi reddy
Hello, So now, how can I index the text files in Lucene? Thanx, MTREDDY

Re: Problem in .txt file indexing

2005-09-26 Thread Chris Hostetter
You seem to be using the example code from LIA chapter 7, but you appear to be confusing the "DocumentHandlerException" class mentioned there with org.ujac.print.DocumentHandlerException -- a completely unrelated class that just happens to have the same name (I'm guessing you found it doing a go

RE: Problem in .txt file indexing

2005-09-26 Thread M å n i s h
This class, org.ujac.print.DocumentHandlerException, is not part of Lucene. Check whether you are using some other third-party class files in your program. -Original Message- From: tirupathi reddy [mailto:[EMAIL PROTECTED] Sent: Monday, September 26, 2005 11:20 PM To: java-user@lucene.

Single Analyzer for multiple European languages

2005-09-26 Thread Shashikant Kore
Hi, I plan to use Lucene to index documents in multiple languages (i.e., each document in more than one European language) as follows. Index: - Before indexing, find the language of the document (using Nutch's Language Identifier) - Use the Analyzer for that language to index the document. Analyzer

Problem in .txt file indexing

2005-09-26 Thread tirupathi reddy
hello, I am using the following code to index text files. InputStream is = new FileInputStream(pdf); DefaultStyledDocument styledDoc = new DefaultStyledDocument(); try { new RTFEditorKit().read(is, styledDoc, 0); bodyText = styledDoc.getText(0, styledDoc.getLe

Re: Lucene trunk update question. WAS RE: search performance enhancement

2005-09-26 Thread Erik Hatcher
On Sep 26, 2005, at 3:10 AM, Paul Elschot wrote: I used my bug votes already. I hope more people will do that, hint: http://issues.apache.org/jira/secure/BrowseProject.jspa?id=12310110 Is there a way to view the open issues sorted by number of votes? There is the "Popular Issues" view: <

Exception in IndexWriter recreating

2005-09-26 Thread Alex Kiselevski
Hi, I have a strange exception when I'm trying to recreate an IndexWriter, that was previously defined. I did the following steps: 1. mWriter = new IndexWriter(indexPath, analyzer, true); 2. mWriter.addDocument(document); 3. mWriter.optimize(); 4. mWriter.c

RE: Problems in standard Analyzer

2005-09-26 Thread M å n i s h
Frank, I am using Limo just to see the index properties. Even if I search for the same value with my application's search component, I am not getting results. I think I have to build the query using different analyzers. -Original Message- From: Kunemann Frank [mailto:[EMAIL PROTECTED]

RE: Problems in standard Analyzer

2005-09-26 Thread Kunemann Frank
The problem is that in limo you can only use standard analyzers for your queries. As you've already seen some of them will change the key value to something else or even remove them completely. So I don't think limo will be the right tool for you or at least you'll have to change it for your nee

Cannot sort results with multisearcher when mismatched field names

2005-09-26 Thread JMA
Greetings! I have a relatively simple problem: I want to sort a set of search results by a field, say "author". Fine for one index, or more than one if the field "author" is the same. However, say I want to use a multisearcher (2+ indices), but the second index uses field name "writer". If I set

Displaying Document [Highlighting Terms]

2005-09-26 Thread bib_lucene bib
Hi All, I have indexed and displayed highlighted search results [following the Lucene in Action examples; thanks, authors]. Now I want to display the content of the file with highlighted terms. An idea I could come up with is: clicking on a search result, I can open a stream to a document, search the

Re: Problems in standard Analyzer

2005-09-26 Thread Anand Kishore
Try using a PerFieldAnalyzerWrapper, through which you can specify analyzers on a per-field basis. This way you could skip analyzing this particular field while searching. On 9/26/05, M å n i s h <[EMAIL PROTECTED]> wrote: > > > Actually in Index I can see that MN12345 value very clearly that too in
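
The per-field idea suggested above can be illustrated without Lucene: keep one default analysis and override it for selected fields. The sketch below only mimics that dispatch and is not the PerFieldAnalyzerWrapper API. `PerFieldSketch`, `addAnalysis`, `terms`, `lowercaseWords`, `wholeValue`, and the field name `partNumber` are hypothetical; the value MN12345 is the one from this thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy per-field dispatch: each field name maps to its own analysis function.
class PerFieldSketch {

    private final Function<String, List<String>> defaultAnalysis;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldSketch(Function<String, List<String>> defaultAnalysis) {
        this.defaultAnalysis = defaultAnalysis;
    }

    // Register a field-specific analysis that overrides the default.
    void addAnalysis(String field, Function<String, List<String>> analysis) {
        perField.put(field, analysis);
    }

    // Analyze text with the field's own function, falling back to the default.
    List<String> terms(String field, String text) {
        return perField.getOrDefault(field, defaultAnalysis).apply(text);
    }

    // Assumed stand-in for a general-purpose analyzer: lowercase, split on whitespace.
    static List<String> lowercaseWords(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Assumed stand-in for "do not analyze": the whole value is one term.
    static List<String> wholeValue(String text) {
        return Collections.singletonList(text);
    }
}
```

Using the same dispatch at index time and at query time keeps an identifier such as MN12345 intact on both sides, while free-text fields still go through the default analysis.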

RE: Problems in standard Analyzer

2005-09-26 Thread M å n i s h
Actually, in the index I can see the MN12345 value very clearly, and in the same case (I have Limo, the Lucene Index Monitor), but when I try to search for the same value, I am not getting any results. I think the problem lies somewhere else. -Original Message- From: Kunemann Frank

RE: Problems in standard Analyzer

2005-09-26 Thread Kunemann Frank
It should be possible to combine queries using different types of analyzers. The only problem I can see is if you're using one single line for the whole query. Frank -Original Message- From: "M å n i s h " [mailto:[EMAIL PROTECTED] Sent: Monday, September 26, 2005 9:05 AM To: java-user

Re: Lucene trunk update question. WAS RE: search performance enhancement

2005-09-26 Thread Paul Elschot
Otis, On Monday 26 September 2005 00:37, Otis Gospodnetic wrote: > As Erik Hatcher noted in another email (it might have been on the -dev > list), we'll go through JIRA before making the next release and try to > push the patches like this one into the core. Personally, it has been I used my bug

RE: Problems in standard Analyzer

2005-09-26 Thread M å n i s h
I thought of not using any Analyzer, but the problem is that I have other queries that I am appending to this value with either OR or AND, so for that part of the query I need the Standard Analyzer. I think I should index that value like normal text; then maybe it will work. -Original Message- From