Re: No.of Files in Directory

2005-06-29 Thread bib_lucene bib
Thanks, Jian. I need to retrieve the original document sometimes. I did not quite understand your second suggestion. Can you please help me understand it better? A pointer to some web resource would also help. jian chen <[EMAIL PROTECTED]> wrote: Hi, Depending on the operating system, there might be

RE: No.of Files in Directory

2005-06-29 Thread Karthik N S
Hi, Apologies. From my experience with Lucene since 2004, I can say that you should update the index once a day (rather than doing it for every upload). But if your requirements say you have to make the doc available on the fly, then you may do so. With regards, Karthik -Original Message

Re: Index comparison

2005-06-29 Thread Nader Henein
I am as interested in the answer to the first question as you are, so we'll have to wait for an answer from one of the senior guys. I imagine that in a perfect world both indices should be the same if the same data is fed in, assuming no errors occurred during indexing. As for the second question, if yo

Re: No.of Files in Directory

2005-06-29 Thread jian chen
Hi, Depending on the operating system, there might be a hard limit on the number of files in one directory (windoze versions). Even with operating systems that don't have a hard limit, it is still better not to put too many files in one directory (linux). Typically, the file system won't be very
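One common way to keep any single directory small is to spread uploads across hashed subdirectories. The sketch below is only an illustration of that general idea and is not necessarily what jian suggested in the truncated message. The class name `UploadPaths`, the two-level layout, and the assumption that `docId` is already a safe file name are all hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene or Tomcat.
class UploadPaths {
    // Maps a document id to root/xx/yy/<docId>, where xx and yy are two
    // hex bytes taken from the id's hash code. This gives up to
    // 256 * 256 leaf directories, so each one stays small.
    // Assumes docId contains no path separators.
    static Path shardedPath(Path root, String docId) {
        int h = docId.hashCode();
        String level1 = String.format("%02x", (h >>> 8) & 0xff);
        String level2 = String.format("%02x", h & 0xff);
        return root.resolve(level1).resolve(level2).resolve(docId);
    }

    public static void main(String[] args) {
        // Prints the sharded location for one example upload.
        System.out.println(shardedPath(Paths.get("uploads"), "report-42.doc"));
    }
}
```

Because the path is derived from the id alone, the original document can be located again later without a separate lookup table.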

No.of Files in Directory

2005-06-29 Thread bib_lucene bib
Hi All, In my webapp I have people uploading their documents. My server is Windows/Tomcat. I am thinking there will be a limit on the number of files in a directory. Typically application users will load 3-5 page Word docs. 1. How does one design the system such that there will not be any problem

Re: lucene query

2005-06-29 Thread eshwari pss
Thanks for the reply. -Eshwari --- Erik Hatcher <[EMAIL PROTECTED]> wrote: > > On Jun 29, 2005, at 1:28 PM, eshwari pss wrote: > > Does Lucene support XML searching? - I mean not > > treating the xml file as text file. > > The short answer is yes. > http://www.lucenebook.com/search?query=xml

Re: Strategy for making short documents not bubble to the top?

2005-06-29 Thread Andrzej Bialecki
Chris Hostetter wrote: I believe what you want to do is write your own custom Scorer class, and override lengthNorm. I've never done this myself, but I'm basing this guess on previous discussions I've seen regarding this method... http://www.google.com/search?q=lengthNorm+site%3Amail-archive.com

Re: Strategy for making short documents not bubble to the top?

2005-06-29 Thread Chris Hostetter
I believe what you want to do is write your own custom Scorer class, and override lengthNorm. I've never done this myself, but I'm basing this guess on previous discussions I've seen regarding this method... http://www.google.com/search?q=lengthNorm+site%3Amail-archive.com+lucene : Date: 29 Jun

Re: SF.net search system

2005-06-29 Thread Chris Lu
How is your crawler done? I saw that SF.net searches several types of documents, like "People", "Freshmeat.net", "Site Doc". Are they all from a database? A little bit of marketing here: I am working on an off-the-shelf product called DBSight. It's basically Database+Lucene+Query Display. It can do most

Re: SF.net search system

2005-06-29 Thread David Spencer
Chris Conrad wrote: I know I've been asked before for a description of how SourceForge.net is using Lucene. I wrote a blog entry about it and thought people might be interested in seeing at a high level how it was designed. Take a look at http://blog.dev.sf.net. Any comments are welcome.

Re: Strategy for making short documents not bubble to the top?

2005-06-29 Thread yahootintin . 11533894
Hi Jian, Thanks for the reply. The problem with that is it completely ignores document length. A book that mentions "frog" 5 times in its 2,000 pages should be less relevant than a book that mentions "frog" 4 times in its 4 pages. I really want to lower the document length weight instead of rem

RE: issues building a large index

2005-06-29 Thread Lokesh Bajaj
We ran into some disk issues that have delayed my testing. We are not sure, but it might also have been causing the problem that we saw when running the large Lucene merges. I will send out another note once our disk problems are fixed. Lokesh -Original Message- From: Otis Gospodnetic [ma

SF.net search system

2005-06-29 Thread Chris Conrad
I know I've been asked before for a description of how SourceForge.net is using Lucene. I wrote a blog entry about it and thought people might be interested in seeing at a high level how it was designed. Take a look at http://blog.dev.sf.net. Any comments are welcome. --Chris Conrad So

Re: Strategy for making short documents not bubble to the top?

2005-06-29 Thread jian chen
Hi, I would use a pure span or cover-density based ranking algorithm, which does not take document length into consideration. (tweaking whatever is currently in the standard Lucene distribution?) For example, searching for the keywords "beautiful house", span/cover ranking will treat a long document and

Strategy for making short documents not bubble to the top?

2005-06-29 Thread yahootintin . 11533894
Hi, Short documents bubble to the top of the results because the field length is short. Does anyone have a good strategy for working around this? Will doing something like log(document length) flatten out my results while still making them meaningful? I'm going to try some different approaches
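To get a feel for the log(document length) idea, the sketch below compares two length-normalization curves in plain Java. It assumes Lucene's default length norm of 1/sqrt(numTerms); the alternative 1/(1 + ln(numTerms)) is just one hypothetical flattening, and the class and method names are made up for illustration. In Lucene itself the hook for this is the lengthNorm method of a Similarity subclass, which is not shown here.

```java
// Stand-alone arithmetic sketch; no Lucene classes are used.
class LengthNormSketch {
    // Assumed default behaviour: 1 / sqrt(numTerms).
    static double defaultNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical flatter alternative: 1 / (1 + ln(numTerms)).
    // Assumes numTerms >= 1.
    static double logNorm(int numTerms) {
        return 1.0 / (1.0 + Math.log(numTerms));
    }

    public static void main(String[] args) {
        // Print both norms for a short, a medium, and a long field.
        for (int n : new int[] {10, 1000, 100000}) {
            System.out.println(n + " terms: default=" + defaultNorm(n)
                    + " log=" + logNorm(n));
        }
    }
}
```

With the square-root form a 10-term field outweighs a 100,000-term field by a factor of 100; with the logarithmic form the advantage shrinks to roughly a factor of four, so length still counts but short documents dominate less.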

Re: Vedr. Re: Design question [too many fields?]

2005-06-29 Thread markharw00d
I suspect the most performant is as follows (but could require bags of RAM): Here's the pseudo code. [on IndexReader open, initialize map] int[] luceneDocIdsByDbKey = new int[largestDbKey]; //could be large array! for (int i=0;i;Should be super-quick but requires (int size* num db records) m
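The loop body in the pseudo code above did not survive in this archive, so the following is not a reconstruction of Mark's code. It is a minimal sketch of the same general idea under stated assumptions: an array indexed by database key that yields the Lucene document id, plus a BitSet of matching documents built from keys the database returns. The input array `dbKeyOfDoc`, the `-1` sentinel, and the `largestDbKey + 1` sizing are all hypothetical choices.

```java
import java.util.Arrays;
import java.util.BitSet;

// Stand-alone sketch; a real version would read the keys via an IndexReader.
class DocIdMap {
    // dbKeyOfDoc[docId] is the database key stored with that document
    // (hypothetical input). Returns an array indexed by database key.
    static int[] buildMap(int[] dbKeyOfDoc, int largestDbKey) {
        int[] luceneDocIdsByDbKey = new int[largestDbKey + 1]; // could be large!
        Arrays.fill(luceneDocIdsByDbKey, -1); // -1 marks "no document for this key"
        for (int docId = 0; docId < dbKeyOfDoc.length; docId++) {
            luceneDocIdsByDbKey[dbKeyOfDoc[docId]] = docId;
        }
        return luceneDocIdsByDbKey;
    }

    // Turns database keys (for example, rooms free on the requested dates)
    // into a BitSet of Lucene document ids, skipping unknown keys.
    static BitSet toDocBits(int[] luceneDocIdsByDbKey, int[] dbKeys) {
        BitSet bits = new BitSet();
        for (int key : dbKeys) {
            if (key >= 0 && key < luceneDocIdsByDbKey.length
                    && luceneDocIdsByDbKey[key] >= 0) {
                bits.set(luceneDocIdsByDbKey[key]);
            }
        }
        return bits;
    }
}
```

The lookup is a single array access per key, which is where the speed comes from; the cost is one int per possible database key held in memory.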

RE: Indexing puncutation

2005-06-29 Thread Chris Hostetter
keep in mind, you can "store" the raw field for display purposes and "index" many different token sequences that represent the same original data parsed in several ways -- all using the same field name. : Date: Wed, 29 Jun 2005 13:33:42 -0400 : From: "Aigner, Thomas" <[EMAIL PROTECTED]> : Repl

Vedr. Re: Design question [too many fields?]

2005-06-29 Thread Naimdjon Takhirov
Hi Jian, Thanks for your inputs. The (DB) datamodel is quite complex with rooms and room units (I skipped it to make the case easier to understand), so I guess the easiest and actually best way to do it is with the filter. Mark: yes, there are a lot of text fields the user should be able to searc

Re: lucene query

2005-06-29 Thread Erik Hatcher
On Jun 29, 2005, at 1:28 PM, eshwari pss wrote: Does Lucene support XML searching? - I mean not treating the xml file as text file. The short answer is yes. http://www.lucenebook.com/search?query=xml The longer response is more involved - what are your needs? I built a search engine for the

Re: Design question [too many fields?]

2005-06-29 Thread jian chen
Hi, Naimdjon, I have some suggestions as well along the lines of Mark Harwood. As an example, suppose for each hotel room there is a description, and you want the user to do free text search on the description field. You could do the following: 1) store hotel room reservation info as rows in a

Index comparison

2005-06-29 Thread Sergeev Alexey
I am building the same index in different ways: 1) Whole index at once; 2) Step by step and then merging all parts together. When I compare the index files I see that they have different sizes, which is why I'm not sure whether the indexes have the same content or I've made a mistake in my index build

RE: Indexing puncutation

2005-06-29 Thread Aigner, Thomas
Thanks for the advice. I have replaced punctuation before the index is built and then queried on the same lack of punctuation. I had to create a separate index for this as well so I have the original information, but I think I will take your advice and build a custom token to filter out the punct

lucene query

2005-06-29 Thread eshwari pss
Does Lucene support XML searching? - I mean not treating the xml file as text file. thanks, Eshwari

Re: newbie question on Mac OS X

2005-06-29 Thread Peter A. Friend
On Jun 29, 2005, at 1:12 AM, Xing Li wrote: 1) Downloaded 1.4.3 src 2) ran ant... everything builds 3) $ cd builds 4) $ java -jar lucene-1.5-rc1-dev.jar Failed to load Main-Class manifest attribute from lucene-1.5-rc1-dev.jar I haven't built anything in Java for almost 5 years so not sure wha

[ANN] naisQuest 1.0 search engine and fireQuest extension for Firefox released

2005-06-29 Thread Radomir Mladenovic
EVERSOFT is pleased to announce the availability of the first tools from its naisQuest product line: http://www.naisquest.com * naisQuest is a Java based (J2EE) search engine designed for web sites and corporate networks. It collects and indexes full text and metadata from the most popular docume

Re: Indexing puncutation

2005-06-29 Thread Ken Krugler
I do a vaguely similar thing; I have to strip accents from characters such as e-acute out of both my input data and my incoming search queries to put them into a standard form. I do this with a custom TokenFilter subclass. I have an analyzer that includes this filter along with some of the s

Re: no EnglishAnalyzer ?

2005-06-29 Thread Erik Hatcher
On Jun 29, 2005, at 4:03 AM, Paul Libbrecht wrote: On 29 June 05, at 00:57, Erik Hatcher wrote: Paul - if stemming is what you're looking for, then grab the SnowballAnalyzer code from Subversion under contrib/snowball. Or you could get a binary copy of the JAR from the source code di

Re: Design question [too many fields?]

2005-06-29 Thread Erik Hatcher
I second Mark's suggestion over the alternative I posted. My alternative was merely to invert the field structure originally described, but using a Filter for the volatile information is wiser. Erik On Jun 29, 2005, at 9:58 AM, mark harwood wrote: Presumably there is also a free-text

Re: Design question [too many fields?]

2005-06-29 Thread Erik Hatcher
On Jun 29, 2005, at 9:18 AM, Naimdjon Takhirov wrote: Hi, We are porting our search functionality over to Lucene in our hotel solution, which is Java-based. Today search is done directly against the database. There is a date search, i.e. a tourist would like to search for free rooms between fromDate and to

Re: Design question [too many fields?]

2005-06-29 Thread mark harwood
Presumably there is also a free-text element to the search or you wouldn't be using Lucene. Multiple fields is not the way to go. A single Lucene field could contain multiple terms ( the available dates) but I still don't think that's the best solution. The availability info is likely to be pretty

Design question [too many fields?]

2005-06-29 Thread Naimdjon Takhirov
Hi, We are porting our search functionality over to Lucene in our hotel solution, which is Java-based. Today search is done directly against the database. There is a date search, i.e. a tourist would like to search for free rooms between fromDate and toDate. The documents are added to the index per hotel room (p

Re: Indexing puncutation

2005-06-29 Thread Peter Pimley
I'm not sure how useful this reply is, but hey ;) me too! I do a vaguely similar thing; I have to strip accents from characters such as e-acute out of both my input data and my incoming search queries to put them into a standard form. I do this with a custom TokenFilter subclass. I have a
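The accent-stripping step described above can be sketched in plain Java. This is not Peter's TokenFilter, which is not shown in the thread; it only illustrates the per-token string transformation such a filter might apply. It relies on `java.text.Normalizer`, which arrived in Java 6 and so postdates this discussion, and the class name `AccentStripper` is hypothetical.

```java
import java.text.Normalizer;

// Stand-alone sketch of the normalization a custom filter could perform.
class AccentStripper {
    // Decomposes each character into base letter plus combining marks (NFD),
    // then removes the combining marks, so e-acute becomes plain e.
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // \u00e9 is e-acute; the escape keeps this file ASCII-only.
        System.out.println(stripAccents("caf\u00e9"));
    }
}
```

The same function has to be applied to both the indexed text and the incoming queries, otherwise the two sides will no longer match.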

Re: question regarding the "commit.lock"

2005-06-29 Thread Ian Lea
http://lucene.apache.org/java/docs/api/org/apache/lucene/store/Lock.With.html#run() -- Ian. On 29/06/05, jian chen <[EMAIL PROTECTED]> wrote: > Hi, > > I am looking at and trying to understand more about Lucene's > reader/writer synchronization. Does anyone know when the commit.lock > is releas

Re: newbie question on Mac OS X

2005-06-29 Thread Paul Libbrecht
Which main class would you expect to run? I don't think there's one. Lucene is a library. paul PS: there is nothing Mac OS X specific about this. On 29 June 05, at 10:12, Xing Li wrote: 1) Downloaded 1.4.3 src 2) ran ant... everything builds 3) $ cd builds 4) $ java -jar lucene-1.5-rc1-dev.jar
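The "Failed to load Main-Class manifest attribute" message means that `java -jar` found no Main-Class entry in the jar's manifest, which is expected for a library jar. The sketch below shows one way to check that entry with the standard `java.util.jar` classes; the class name `ManifestCheck` is hypothetical and the jar path is whatever is passed on the command line.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Stand-alone diagnostic sketch using only the JDK.
class ManifestCheck {
    // Returns the Main-Class value, or null if the manifest is missing
    // or does not declare one (the usual case for a library).
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java ManifestCheck <path-to-jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            String mainClass = mainClassOf(jar.getManifest());
            System.out.println(mainClass == null
                    ? "No Main-Class: use this jar on the classpath instead"
                    : "Main-Class: " + mainClass);
        }
    }
}
```

A jar without a Main-Class is used by putting it on the classpath of a program that has its own main method.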

newbie question on Mac OS X

2005-06-29 Thread Xing Li
1) Downloaded 1.4.3 src 2) ran ant... everything builds 3) $ cd builds 4) $ java -jar lucene-1.5-rc1-dev.jar Failed to load Main-Class manifest attribute from lucene-1.5-rc1-dev.jar I haven't built anything in Java for almost 5 years so not sure what it means. Did a good search online on the e

Re: no EnglishAnalyzer ?

2005-06-29 Thread Paul Libbrecht
On 29 June 05, at 00:57, Erik Hatcher wrote: Paul - if stemming is what you're looking for, then grab the SnowballAnalyzer code from Subversion under contrib/snowball. Or you could get a binary copy of the JAR from the source code distribution of Lucene in Action at http://www.lucenebook.co