Re: I need 100 most frequently used words in different languages.

2005-05-11 Thread Ahmet Aksoy
Hi David, Thanks for your suggestion. I'll give it a try. David Spencer wrote: You could try downloading a copy of Wikipedia and processing the entries yourself. I don't know how well other languages are represented, but there's a lot of English. Ahmet Aksoy wrote: Hi, I have a project which will

Re: I need 100 most frequently used words in different languages.

2005-05-11 Thread David Spencer
You could try downloading a copy of Wikipedia and processing the entries yourself. I don't know how well other languages are represented, but there's a lot of English. Ahmet Aksoy wrote: Hi, I have a project which will be used in order to supply automatic dictionary help in different language
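A minimal sketch of the counting step David suggests, in Java since the project is Java-based. It assumes the article text has already been extracted from the dump (Wikipedia markup is not handled), treats any run of non-letters as a word boundary, and lowercases with `Locale.ROOT`. All of these are assumptions; a language such as Turkish may need its own case-folding rules.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch only: counts words in plain text and returns the n most frequent.
class WordFrequency {

    static List<String> topN(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // \P{L}+ splits on runs of non-letters (digits and punctuation are dropped),
        // so accented letters stay inside words. Case folding via Locale.ROOT is an assumption.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically so the result is stable.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

For example, `WordFrequency.topN(text, 100)` returns the 100 most frequent words of `text`, highest count first.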

RE: problem while merging two indexes

2005-05-11 Thread Omar Didi
Thanks Otis, I copied the index and I am playing around with the copy. I first had to change the code to force the directory to be unlocked. From what you just said, the index doesn't know about any of the new segments in my directory, so deleting them shouldn't hurt. -Origina

Re: problem while merging two indexes

2005-05-11 Thread Otis Gospodnetic
You should be able to retry the merge (from the beginning - there is no way to restart it at any point other than the beginning). The merge and the new index are "finalized" at the very end of the merge, so if it failed before that, your Lucene index (the segments file) still doesn't know about th

Re: end of line in queries

2005-05-11 Thread Chris Hostetter
It is, as long as you use an Analyzer (when indexing, and when parsing your query strings) that doesn't strip/convert whatever characters you consider an "end of line" (newline? linefeed?) during tokenization. : Date: Wed, 11 May 2005 12:41:52 -0400 : From: "Govoni, Darren" <[EMAIL PROTECTED]> :
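One way to apply this, sketched in plain Java rather than with Lucene's Analyzer classes: tokenization emits an explicit marker token at every line break, so "pattern at the end of a line" becomes the two-token sequence [word, marker]. The marker string `_eol_` and the class are hypothetical. Whatever marker you choose must be produced identically at index time and at query time. Here the end of the text also counts as the end of the last line, and `\W+` splits on ASCII word boundaries only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: keeps line breaks as an explicit (hypothetical) marker token.
class EndOfLineTokens {
    static final String EOL = "_eol_"; // hypothetical marker token

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Split into lines first (\r\n, \n or \r), then each line into words.
        for (String line : text.split("\\r\\n|\\n|\\r", -1)) {
            for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!word.isEmpty()) tokens.add(word);
            }
            tokens.add(EOL); // end of this line (or of the whole text)
        }
        return tokens;
    }

    // True if the single-word pattern occurs immediately before a line break.
    static boolean endsSomeLine(String text, String word) {
        List<String> tokens = tokenize(text);
        String w = word.toLowerCase(Locale.ROOT);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(w) && tokens.get(i + 1).equals(EOL)) return true;
        }
        return false;
    }
}
```

With Lucene, the equivalent would be an Analyzer that emits the marker token, used both when indexing and when parsing queries, as Chris describes.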

problem while merging two indexes

2005-05-11 Thread Omar Didi
Hey guys, my application died while I was merging two indexes. According to my understanding, if I just delete the new files that were created after I started merging, the index won't be affected. Is this true? What will happen if I just restart the merging from where the application died?

I need 100 most frequently used words in different languages.

2005-05-11 Thread Ahmet Aksoy
Hi, I have a project which will be used to supply automatic dictionary help in different languages. I'm using Lucene for indexing, and for searching the words in it. It is an open source Java project at http://belletmen.dev.java.net Now, I will prepare a function to find the natu

Zilverline Search Engine version 1.3.0 released

2005-05-11 Thread Zilverline info
All, I've just released Zilverline version 1.3.0. This version has a web service for indexing, and is localized for the Chinese language. This version is fully web-based: all settings, collections and preferences can be set via the web interface. You don't need to edit any config files anymore. Also I'm

RE: Strange results using QueryParser (?)

2005-05-11 Thread Chris Hostetter
In your query parser, you'll need to use an Analyzer that knows that "documenttype" should not be tokenized, and that the raw string entered by the user should be treated as the query Term value. You can make your own analyzer that subclasses StandardAnalyzer and only does the special behavior for t
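A plain-Java sketch of the per-field behavior Chris describes, assuming a hypothetical set of fields that must stay untokenized: values of those fields are kept as one exact term, while other fields are split and lowercased roughly the way an ordinary tokenizer would. This is not Lucene's Analyzer API. It also shows how a value like "Blankett/Mall" ends up as two terms when it is tokenized like ordinary text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Sketch only: per-field analysis with a hypothetical list of untokenized fields.
class PerFieldAnalysis {
    static final Set<String> UNTOKENIZED_FIELDS = new HashSet<>(Arrays.asList("documenttype"));

    static List<String> analyze(String field, String text) {
        if (UNTOKENIZED_FIELDS.contains(field)) {
            // Keep the user's exact value, e.g. "Blankett/Mall", as a single term.
            return Collections.singletonList(text);
        }
        List<String> terms = new ArrayList<>();
        // Rough stand-in for a standard tokenizer: split on non-alphanumerics, lowercase.
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }
}
```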

Re: categorized search

2005-05-11 Thread Chris Hostetter
Well ... once you have the list of all "category" names that are in docs which match your original query, you can either redo the original query with "and category:" to get the counts, or you can pre-compute (and save) a BitSet for each category in your index (easy to build using a HitCollec
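A sketch of the pre-computed approach using `java.util.BitSet`: one set per category, with bit i set when document number i is in that category. For a query, the matching document numbers are collected into another BitSet (in Lucene that would be the job of a HitCollector, as Chris says), which is then intersected with each category set. Category names and document numbers here are made up.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: per-category counts from cached BitSets.
class CategoryCounts {

    // queryHits: bit i set if document i matched the query.
    // categories: category name -> documents in that category (computed once and cached).
    static Map<String, Integer> count(BitSet queryHits, Map<String, BitSet> categories) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : categories.entrySet()) {
            BitSet both = (BitSet) e.getValue().clone(); // don't modify the cached set
            both.and(queryHits);
            int n = both.cardinality();
            if (n > 0) counts.put(e.getKey(), n); // only categories that occur in the results
        }
        return counts;
    }
}
```

For example, with books = {0, 2, 3}, music = {1, 4} and query hits {2, 3, 4}, `count` returns {books=2, music=1}.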

Re: MultiFieldQueryParser Problems about how to give the fields weight

2005-05-11 Thread Otis Gospodnetic
If you think the content field is more important, you could boost it at indexing time. If you want to boost at search time, and you are using QueryParser, you could just use the term^float syntax. I think what you have down there is ok, too, but I suppose you'd need an if/else so you boost only the c
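A sketch of the search-time option: rather than overloading MultiFieldQueryParser, build a query string that uses QueryParser's `field:term^boost` syntax for each field, then parse it as usual. It assumes a single search term. Multi-word input and escaping of QueryParser's special characters are left out, and the boost values are only examples.

```java
import java.util.Map;

// Sketch only: builds a QueryParser-syntax string with per-field boosts.
class BoostedQueryString {

    static String build(String term, Map<String, Float> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append(':').append(term);
            float boost = e.getValue();
            // Only emit ^boost when it differs from the default of 1.0.
            if (boost != 1.0f) sb.append('^').append(boost);
        }
        return sb.toString();
    }
}
```

With content boosted to 2.0 and summary left at 1.0 (inserted in that order into a `LinkedHashMap`), `build("lucene", boosts)` produces `content:lucene^2.0 summary:lucene`.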

Re: AW: only getting Hits with score >= threshold

2005-05-11 Thread Otis Gospodnetic
In that case just look at the first N hits and don't even mention the rest. Otis --- Kai Gülzau <[EMAIL PROTECTED]> wrote: > >Note that it may not make sense filtering by an arbitrary score > >(normalized or not). > > I don't like the Google effect > with an endless amo
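Since results already come back ordered by score, Otis's suggestion amounts to slicing the ranked list. A minimal sketch with a hypothetical `Hit` class standing in for a search result. The `topFraction` variant reflects Kai's wish to show a top percentage; rounding up to at least one hit is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: show the first n hits of a list already sorted by descending score.
class FirstNHits {
    // Hypothetical stand-in for a search result.
    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    static List<Hit> firstN(List<Hit> rankedHits, int n) {
        return new ArrayList<>(rankedHits.subList(0, Math.min(n, rankedHits.size())));
    }

    // Keep the top fraction (e.g. 0.1 = 10%) of the results, rounded up, at least one.
    static List<Hit> topFraction(List<Hit> rankedHits, double fraction) {
        int n = (int) Math.ceil(rankedHits.size() * fraction);
        return firstN(rankedHits, Math.max(1, n));
    }
}
```

With Lucene's `Hits`, the same idea is a loop from 0 to `Math.min(n, hits.length())`.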

RE: Real time indexing with RAMDirectory

2005-05-11 Thread Otis Gospodnetic
What happens if you swap these 2 lines? System.out.println("Docs number : " + ir.numDocs()); ir.close(); If I were you, I'd try using minMergeDocs instead of RAMDirectory. It makes things much simpler. You shouldn't need to optimize the index. Otis --- Rifflar

Re: a few basic questions

2005-05-11 Thread Otis Gospodnetic
Hello, It sounds like you missed the Index Format page: http://lucene.apache.org/java/docs/fileformats.html That's the best index format documentation currently available. Otis --- Sujatha Das <[EMAIL PROTECTED]> wrote: > > Hi, > I couldn't find documentation on these issues, > so a url as

RE: Getting subpart of Lucene Query

2005-05-11 Thread Yagnesh Shah
Hi Seema, Change your document.java so that the content field is added, for example: doc.add(Field.Text("contents", "some dummy text")); -Original Message- From: Seema Jain [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 11, 2005 6:20 AM To: java-user@lucene.apache.org Subject: Getting subpar

RE: indexing relational table(s)

2005-05-11 Thread Govoni, Darren
You can also leverage the 'fields' capability in lucene and perhaps match them against columns to do field-based searching. -Original Message- From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] Sent: Wed 5/11/2005 12:50 PM To: java-user@lucene.apache.org Subject: Re: indexing relational ta
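A sketch of mapping one row to document fields along these lines: each non-null text column becomes its own field for field-based searching, and all text is also joined into a catch-all `contents` field. The row is a plain `Map` here; with JDBC it would be read from a `ResultSet`. The `table` and `id` fields pointing back to the source row are hypothetical conventions, and columns whose names collide with them would be overwritten.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: flattens one relational row into field -> text pairs.
class RowToDocument {

    static Map<String, String> toFields(String table, String primaryKey, Map<String, String> row) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("table", table);   // hypothetical: lets a hit point back to its table
        fields.put("id", primaryKey); // ...and to its row
        StringJoiner all = new StringJoiner(" ");
        for (Map.Entry<String, String> col : row.entrySet()) {
            if (col.getValue() == null) continue; // skip SQL NULLs
            fields.put(col.getKey(), col.getValue()); // one field per column
            all.add(col.getValue());
        }
        fields.put("contents", all.toString()); // catch-all field for general searches
        return fields;
    }
}
```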

Re: indexing relational table(s)

2005-05-11 Thread Andrzej Bialecki
Dick Hollenbeck wrote: As sources of indexable text we always see HTML, XML, PDF, etc. but I have not seen much mention of relational tables as a source. Anybody know why? I think there's no specific reason - Lucene is able to index only plain text, so anything else must go through format converters first

end of line in queries

2005-05-11 Thread Govoni, Darren
Hi, I'm trying to perform a query and need to specify a string pattern occurring at the end of a line. Is this possible? Thanks. Darren

indexing relational table(s)

2005-05-11 Thread Dick Hollenbeck
As sources of indexable text we always see HTML, XML, PDF, etc. but I have not seen much mention of relational tables as a source. Anybody know why? We have a database with 60,000 records in 6 tables and approximately 15 *text* fields per table. Can we use lucene to index this with JDBC being

MultiFieldQueryParser Problems about how to give the fields weight

2005-05-11 Thread luqun lou
Now suppose there are two fields, "content" and "summary", and I think matches in the content field should have a higher weight than matches in the summary field. How can I do it? I overloaded the parse function and added a weights parameter that stores each field's weight. public static Query parse(String query,String[] fie

Re: sanity check - large, long running index updates and concurrent read-only service

2005-05-11 Thread Yonik Seeley
When created, an IndexReader opens all the segment files and hangs onto them. Any updates to the index through an IndexWriter (including commit and optimize) will not affect already open IndexReaders. -Yonik On 5/11/05, Naomi Dushay <[EMAIL PROTECTED]> wrote: > It's my impression that with optimi
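An analogy in plain Java, not Lucene's implementation: a reader captures the segment list that exists when it is opened, and a commit publishes a new list instead of changing the old one. Readers opened before the commit keep their snapshot, and only readers opened afterwards see the merged segments. The segment names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Analogy only: point-in-time readers over an immutable, swappable segment list.
class SnapshotIndex {
    private volatile List<String> segments = Collections.emptyList();

    // "Writer" side: publish a new immutable segment list (e.g. after a merge or optimize).
    void commit(List<String> newSegments) {
        segments = Collections.unmodifiableList(new ArrayList<>(newSegments));
    }

    // "Reader" side: the returned list never changes after the reader is opened.
    List<String> openReader() {
        return segments;
    }
}
```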

RE: sanity check - large, long running index updates and concurrent read-only service

2005-05-11 Thread Naomi Dushay
It's my impression that with optimize running so long, there will be a significant period of time (many minutes) when the old IndexReader will not be able to find the segment/documents it needs. Am I wrong about that? - Naomi > Could you explain why you need to copy the index? It doesn't seem

RE: Strange results using QueryParser (?)

2005-05-11 Thread Lilja, Bjorn
Hi, Daniel's suggestion was quite correct. Is the "/" supposed to be turned into whitespace? If so, how do I stop it? I want to search for the exact whole word "Blankett/Mall". Regards, Björn _ Björn Lilja | Technology S

Getting subpart of Lucene Query

2005-05-11 Thread Seema Jain
Hi, I am using the Lucene API for text indexing, searching and highlighting. I am using the Lucene Sandbox API for highlighting keywords. My requirement is to get a subpart of a Lucene query. A Lucene query is made up of field-value pairs. How can I get the value of a particular field?
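If the query starts out as a user-entered string, one hedged option is to pull the field values out of the string before it is parsed. This sketch only understands simple `field:value` and `field:"phrase"` clauses (a boost such as `^2` stays attached to the value) and does not handle nesting or escaping. Walking the parsed Query objects instead is more robust but depends on the Lucene version's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: extracts the values given for one field from a simple query string.
class FieldValues {
    // group 1 = field name, group 2 = quoted phrase, group 3 = bare value
    private static final Pattern CLAUSE =
        Pattern.compile("(\\w+):(?:\"([^\"]*)\"|([^\\s()]+))");

    static List<String> valuesFor(String query, String field) {
        List<String> values = new ArrayList<>();
        Matcher m = CLAUSE.matcher(query);
        while (m.find()) {
            if (m.group(1).equals(field)) {
                values.add(m.group(2) != null ? m.group(2) : m.group(3));
            }
        }
        return values;
    }
}
```

For the query `+title:lucene contents:"full text" -author:smith (title:search)`, `valuesFor(query, "title")` returns [lucene, search].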

Re: proximity search in lucene (fwd)

2005-05-11 Thread Sujatha Das
Consider a situation in which I have indexed the terms under two different fields (say FIELD_TEXT and FIELD_SYNONYM). What if I wanted to support queries like "jaguar NEAR london", when I have indexed a document with "panthers in zoos around London"? So given that Lucene doesn't support cross-fie
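A common workaround is to index synonyms in the same field and at the same position as the original word (with Lucene, an analyzer that emits the synonym with a position increment of 0), so a NEAR check needs positions from one field only. This plain-Java sketch uses a hypothetical one-entry synonym table and treats NEAR as "within `slop` positions, in either order".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch only: synonyms share the original word's position in a single field.
class SynonymPositions {
    static final Map<String, List<String>> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("panthers", Arrays.asList("jaguar")); // hypothetical synonym table
    }

    // term -> positions at which it occurs (original words and injected synonyms)
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = 0;
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (w.isEmpty()) continue;
            positions.computeIfAbsent(w, k -> new ArrayList<>()).add(pos);
            for (String syn : SYNONYMS.getOrDefault(w, Collections.emptyList())) {
                positions.computeIfAbsent(syn, k -> new ArrayList<>()).add(pos); // same position
            }
            pos++;
        }
        return positions;
    }

    // True if lowercase terms a and b occur within slop positions of each other.
    static boolean near(Map<String, List<Integer>> index, String a, String b, int slop) {
        for (int pa : index.getOrDefault(a, Collections.emptyList())) {
            for (int pb : index.getOrDefault(b, Collections.emptyList())) {
                if (Math.abs(pa - pb) <= slop) return true;
            }
        }
        return false;
    }
}
```

For "panthers in zoos around London", `jaguar` is indexed at position 0 alongside `panthers`, so `near(index, "jaguar", "london", 5)` is true while a slop of 3 is not.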

Re: proximity search in lucene (fwd)

2005-05-11 Thread Sujatha Das
-- Forwarded message -- Date: Fri, 1 Apr 2005 15:34:10 -0500 From: Erik Hatcher <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: proximity search in lucene On Apr 1, 2005, at 2:29 PM, Sujatha Das wrote: Hi, Does Lucene support "

a few basic questions

2005-05-11 Thread Sujatha Das
Hi, I couldn't find documentation on these issues, so a URL as a response would be just fine. I assume the inverted index looks like FIELD-1: term -> (doc, offset) pairs. Is this correct? Say I am trying to index the documents in a corpus under two different fields. For instance, I want to store with every w
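A toy model of the structure in the question: per field, a sorted term dictionary whose entries list (document, position) pairs. It illustrates the idea only. It records token positions, and what Lucene actually stores on disk is described on the file formats page Otis links to. Indexing the same document under two fields simply adds entries under two field names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy model: field -> term -> list of (doc, position) postings.
class ToyInvertedIndex {
    static final class Posting {
        final int doc;
        final int position;
        Posting(int doc, int position) { this.doc = doc; this.position = position; }
        @Override public String toString() { return "(" + doc + "," + position + ")"; }
    }

    // TreeMap keeps terms sorted, like a term dictionary.
    private final Map<String, TreeMap<String, List<Posting>>> fields = new TreeMap<>();

    void add(int doc, String field, String text) {
        TreeMap<String, List<Posting>> terms = fields.computeIfAbsent(field, f -> new TreeMap<>());
        int pos = 0;
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (t.isEmpty()) continue;
            terms.computeIfAbsent(t, k -> new ArrayList<>()).add(new Posting(doc, pos++));
        }
    }

    List<Posting> postings(String field, String term) {
        TreeMap<String, List<Posting>> terms = fields.get(field);
        if (terms == null) return new ArrayList<>();
        return terms.getOrDefault(term, new ArrayList<>());
    }
}
```

After indexing "Lucene in action" as document 0 and "Action movies" as document 1 in field `title`, `postings("title", "action")` returns [(0,2), (1,0)].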

AW: only getting Hits with score >= threshold

2005-05-11 Thread Kai Gülzau
>Note that it may not make sense filtering by an arbitrary score >(normalized or not). I don't like the Google effect with an endless amount of paging links. ;) The user should get only the top percentage of docs/products he can reasonably handle. Regards, Kai Gü