Thanks, Jian.
I need to retrieve the original document sometimes. I did not quite understand
your second suggestion.
Can you please help me understand it better? A pointer to a web resource would
also help.
jian chen <[EMAIL PROTECTED]> wrote:
Hi,
Depending on the operating system, there might be
Hi
Apologies.
With my experience of Lucene since 2004, I can say that you need to update the
index once a day (rather than doing it for every upload). But if your
requirement says you have to make the doc available on the fly, then you may do
so.
with regards
Karthik
-Original Message
I am as interested in the answer to the first question as you, so we'll
have to wait for an answer from one of the senior guys. I imagine in a
perfect world both indices should be the same if the same data is fed
in, assuming no errors occurred during indexing.
As for the second question, if yo
Hi,
Depending on the operating system, there might be a hard limit on the
number of files in one directory (some Windows versions). Even with
operating systems that don't have a hard limit (Linux), it is still better
not to put too many files in one directory.
Typically, the file system won't be very
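One common workaround (a sketch of my own, not something from this thread — the class and shard count are assumptions) is to spread uploads across a fixed set of subdirectories derived from a hash of the document id, so no single directory accumulates an unbounded number of files:

```java
import java.io.File;

public class UploadSharder {
    // Number of subdirectories to spread uploads across (an assumption;
    // pick based on expected volume).
    static final int NUM_SHARDS = 256;

    // Derive a stable two-hex-digit shard name like "a7" from the document id.
    static String shardFor(String docId) {
        int bucket = Math.abs(docId.hashCode() % NUM_SHARDS);
        return String.format("%02x", bucket);
    }

    // Build the relative path uploads/<shard>/<docId>.doc
    static String pathFor(String docId) {
        return "uploads" + File.separator + shardFor(docId)
                + File.separator + docId + ".doc";
    }

    public static void main(String[] args) {
        System.out.println(pathFor("report-123"));
    }
}
```

The same id always maps to the same shard, so lookup never requires scanning directories.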
Hi All
In my webapp I have people uploading their documents. My server is
Windows/Tomcat. I am thinking there will be a limit on the number of files in a
directory. Typically application users will upload 3-5 page Word docs.
1. How does one design the system such that there will not be any problem
Thanks for the reply.
-Eshwari
--- Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> On Jun 29, 2005, at 1:28 PM, eshwari pss wrote:
> > Does Lucene support XML searching? - I mean not
> > treating the xml file as text file.
>
> The short answer is yes.
> http://www.lucenebook.com/search?query=xml
Chris Hostetter wrote:
I believe what you want to do is write your own custom Similarity class and
override lengthNorm.
I've never done this myself, but I'm basing this guess on previous
discussions I've seen regarding this method:
http://www.google.com/search?q=lengthNorm+site%3Amail-archive.com+lucene
: Date: 29 Jun
How is your crawler done?
I saw SF.net searches several types of documents, like "People",
"Freshmeat.net", "Site Doc". Are they all from a database?
A little bit of marketing here:
I am working on an off-the-shelf product called DBSight. It's
basically Database+Lucene+Query Display. It can do most
Chris Conrad wrote:
I know I've been asked before for a description of how SourceForge.net
is using Lucene. I wrote a blog entry about it and thought people
might be interested in seeing at a high level how it was designed.
Take a look at http://blog.dev.sf.net. Any comments are welcome.
Hi Jian,
Thanks for the reply. The problem with that is it completely
ignores document length. A book that mentions "frog" 5 times in its 2,000
pages should be less relevant than a book that mentions "frog" 4 times in
its 4 pages.
I really want to lower the document length weight instead
of rem
We ran into some disk issues that have delayed my testing. We are not
sure, but it might also have been causing the problem that we saw when
running the large Lucene merges. I will send out another note once our
disk problems are fixed.
Lokesh
-Original Message-
From: Otis Gospodnetic [ma
I know I've been asked before for a description of how
SourceForge.net is using Lucene. I wrote a blog entry about it and
thought people might be interested in seeing at a high level how it
was designed. Take a look at http://blog.dev.sf.net. Any comments
are welcome.
--Chris Conrad
So
Hi,
I would use a pure span- or cover-density-based ranking algorithm, which
does not take document length into consideration (tweaking whatever is
currently in the standard Lucene distribution?).
For example, searching for the keywords "beautiful house", span/cover
ranking will treat a long document and
Hi,
Short documents bubble to the top of the results because the field
length is short. Does anyone have a good strategy for working around this?
Will doing something like log(document length) flatten out my results while
still making them meaningful? I'm going to try some different approaches
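To get a feel for how much a log-based norm flattens things, here is a back-of-the-envelope comparison in plain Java. The 1/sqrt(numTerms) form matches Lucene's default length norm; the log variant and the term counts are my own assumptions for illustration, echoing the "frog" example earlier in the thread:

```java
public class LengthNormDemo {
    // Lucene's default length norm shape: 1/sqrt(numTerms).
    static double defaultNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // A flatter, log-based alternative (hypothetical, for comparison only).
    static double logNorm(int numTerms) {
        return 1.0 / (1.0 + Math.log(numTerms));
    }

    public static void main(String[] args) {
        // Roughly: a 4-page doc (~1000 terms) vs a 2000-page book (~500000 terms).
        System.out.printf("default: %f vs %f%n", defaultNorm(1000), defaultNorm(500000));
        System.out.printf("log:     %f vs %f%n", logNorm(1000), logNorm(500000));
    }
}
```

With sqrt, the short doc's norm is over 20x the book's; with log it is under 2x, so short documents still win on length but far less dramatically.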
I suspect the most performant is as follows (but could require bags of
RAM):
Here's the pseudo code:
[on IndexReader open, initialize map]
int[] luceneDocIdsByDbKey = new int[largestDbKey]; // could be a large array!
for (int i = 0; i < reader.maxDoc(); i++) {
    // skip deleted docs in a real implementation
    luceneDocIdsByDbKey[Integer.parseInt(reader.document(i).get("dbKey"))] = i;
}
Should be super-quick but requires (int size * num db records) memory.
Keep in mind, you can "store" the raw field for display purposes and
"index" many different token sequences that represent the same original
data parsed in several ways -- all using the same field name.
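The store-once, index-many-ways idea above can be modeled outside Lucene with plain Java (the FieldInstance record and the two analysis variants are my own assumptions, just to make the shape concrete):

```java
import java.util.ArrayList;
import java.util.List;

public class MultiParseField {
    // One field instance: a name plus a token; the stored flag marks the raw copy.
    record FieldInstance(String name, String token, boolean stored) {}

    // Add the raw value once (stored, for display) plus two analyzed
    // variants -- all under the SAME field name, as the tip describes.
    static List<FieldInstance> fieldsFor(String raw) {
        List<FieldInstance> fields = new ArrayList<>();
        fields.add(new FieldInstance("title", raw, true));
        fields.add(new FieldInstance("title", raw.toLowerCase(), false));
        fields.add(new FieldInstance("title",
                raw.toLowerCase().replaceAll("\\p{Punct}", ""), false));
        return fields;
    }

    public static void main(String[] args) {
        for (FieldInstance f : fieldsFor("O'Brien-Smith")) {
            System.out.println(f);
        }
    }
}
```

A query against "title" can then match any of the parsed forms, while display code reads back only the stored raw value.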
: Date: Wed, 29 Jun 2005 13:33:42 -0400
: From: "Aigner, Thomas" <[EMAIL PROTECTED]>
: Repl
Hi Jian,
Thanks for your input. The (DB) data model is quite
complex with rooms and room units (I skipped it to
make the case easier to understand), so I guess the
easiest and actually best way to do it is with the
filter.
Mark: yes, there are a lot of text fields the user
should be able to searc
On Jun 29, 2005, at 1:28 PM, eshwari pss wrote:
Does Lucene support XML searching? - I mean not
treating the xml file as text file.
The short answer is yes. http://www.lucenebook.com/search?query=xml
The longer response is more involved - what are your needs?
I built a search engine for the
Hi, Naimdjon,
I have some suggestions as well along the lines of Mark Harwood.
As an example, suppose for each hotel room there is a description, and
you want the user to do free text search on the description field.
You could do the following:
1) store hotel room reservation info as rows in a
I am building the same index using different ways:
1) Whole index at once;
2) Step by step and then merging all parts together;
When I compare the index files I see that they have different sizes, which is
why I'm not sure whether the indexes have the same content or I've made a
mistake in my index build
Thanks for the advice. I have replaced punctuation before the index is
built and then queried on the same lack of punctuation. I had to create
a separate index for this as well so I have the original information,
but I think I will take your advice and build a custom token filter to strip
out the punct
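The key point in the exchange above is that the same normalization must be applied at index time and at query time. A minimal sketch of that symmetry (a hypothetical helper, not an actual Lucene TokenFilter):

```java
public class PunctuationNormalizer {
    // Apply the SAME normalization to indexed text and to queries,
    // so "can't" in a query matches "cant" in the index.
    static String normalize(String text) {
        return text.toLowerCase().replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        String indexed = normalize("Can't stop, won't stop.");
        String query   = normalize("can't");
        // Both sides went through the same normalization, so they line up.
        System.out.println(indexed.contains(query));
    }
}
```

Wiring this logic into a TokenFilter used by the analyzer at both index and query time removes the need for a separate punctuation-free index.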
Does Lucene support XML searching? - I mean not
treating the xml file as text file.
thanks,
Eshwari
---
On Jun 29, 2005, at 1:12 AM, Xing Li wrote:
1) Downloaded 1.4.3 src
2) ran ant... everything builds
3) $ cd builds
4) $ java -jar lucene-1.5-rc1-dev.jar
Failed to load Main-Class manifest attribute from
lucene-1.5-rc1-dev.jar
I haven't built anything in Java for almost 5 years so I'm not sure what it means.
EVERSOFT is pleased to announce availability of the first tools from its
naisQuest product line:
http://www.naisquest.com
* naisQuest is a Java-based (J2EE) search engine designed for web sites and
corporate networks. It collects and indexes full text and metadata from the
most popular docume
I do a vaguely similar thing; I have to strip accents from
characters such as e-acute out of both my input data and my incoming
search queries to put them into a standard form. I do this with a
custom TokenFilter subclass. I have an analyzer that includes this
filter along with some of the s
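The accent-stripping step described above can be done with java.text.Normalizer: decompose to NFD and drop the combining marks. This sketches only the normalization (the class name is mine; the poster's actual TokenFilter wiring is not shown in the thread):

```java
import java.text.Normalizer;

public class AccentFolder {
    // Decompose accented characters (NFD), then remove the combining
    // marks, putting text into a standard accent-free form. Applied to
    // both input data and incoming queries so they match.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("café crème")); // e-acute and e-grave become plain e
    }
}
```

Placing this logic in a TokenFilter inside the analyzer guarantees index and query sides normalize identically.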
On Jun 29, 2005, at 4:03 AM, Paul Libbrecht wrote:
Le 29 juin 05, à 00:57, Erik Hatcher a écrit :
Paul - if stemming is what you're looking for, then grab the
SnowballAnalyzer code from Subversion under contrib/snowball. Or
you could get a binary copy of the JAR from the source code distribution
of Lucene in Action.
I second Mark's suggestion over the alternative I posted. My
alternative was merely to invert the field structure originally
described, but using a Filter for the volatile information is wiser.
Erik
On Jun 29, 2005, at 9:58 AM, mark harwood wrote:
Presumably there is also a free-text
On Jun 29, 2005, at 9:18 AM, Naimdjon Takhirov wrote:
Hi,
We are porting our search functionality over to Lucene
in our hotel solution, which is Java based. Today
search is done directly against the database.
There is a date search, i.e. a tourist would like to
search for free rooms fromDate and toDate.
Presumably there is also a free-text element to the
search or you wouldn't be using Lucene.
Multiple fields is not the way to go.
A single Lucene field could contain multiple terms (
the available dates) but I still don't think that's
the best solution.
The availability info is likely to be pretty
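The Filter approach suggested here can be modeled outside Lucene: let the index handle the free-text match, and apply the volatile availability data as a post-filter using a standard interval-overlap check. A sketch under my own assumptions (the Booking record and half-open [from, to) convention are hypothetical):

```java
import java.time.LocalDate;
import java.util.List;

public class AvailabilityFilter {
    // A booked interval [from, to) for a room.
    record Booking(LocalDate from, LocalDate to) {}

    // A room is free for [from, to) if no existing booking overlaps it.
    static boolean isFree(List<Booking> bookings, LocalDate from, LocalDate to) {
        for (Booking b : bookings) {
            // Two half-open intervals overlap iff each starts before the other ends.
            boolean overlaps = from.isBefore(b.to()) && b.from().isBefore(to);
            if (overlaps) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Booking> bookings = List.of(
                new Booking(LocalDate.of(2005, 7, 1), LocalDate.of(2005, 7, 5)));
        System.out.println(isFree(bookings, LocalDate.of(2005, 7, 5), LocalDate.of(2005, 7, 8)));
        System.out.println(isFree(bookings, LocalDate.of(2005, 7, 3), LocalDate.of(2005, 7, 6)));
    }
}
```

Keeping this check in a Lucene Filter (or a post-query step backed by the database) means the frequently-changing availability data never forces a reindex of the text fields.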
Hi,
We are porting our search functionality over to Lucene
in our hotel solution, which is Java based. Today
search is done directly against the database.
There is a date search, i.e. a tourist would like to
search for free rooms fromDate and toDate.
The documents are added to the index per hotel
room (p
I'm not sure how useful this reply is, but hey ;)
me too!
I do a vaguely similar thing; I have to strip accents from characters
such as e-acute out of both my input data and my incoming search queries
to put them into a standard form. I do this with a custom TokenFilter
subclass. I have a
http://lucene.apache.org/java/docs/api/org/apache/lucene/store/Lock.With.html#run()
--
Ian.
On 29/06/05, jian chen <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am looking at and trying to understand more about Lucene's
> reader/writer synchronization. Does anyone know when the commit.lock
> is released
Which main class would you expect to run ?
I don't think there's one.
Lucene is a library.
paul
PS: there is nothing MacOSX-specific about this
Le 29 juin 05, à 10:12, Xing Li a écrit :
1) Downloaded 1.4.3 src
2) ran ant... everything builds
3) $ cd builds
4) $ java -jar lucene-1.5-rc1-dev.jar
1) Downloaded 1.4.3 src
2) ran ant... everything builds
3) $ cd builds
4) $ java -jar lucene-1.5-rc1-dev.jar
Failed to load Main-Class manifest attribute from
lucene-1.5-rc1-dev.jar
I haven't built anything in Java for almost 5 years so I'm not sure what it
means. Did a good search online on the e
Le 29 juin 05, à 00:57, Erik Hatcher a écrit :
Paul - if stemming is what you're looking for, then grab the
SnowballAnalyzer code from Subversion under contrib/snowball. Or you
could get a binary copy of the JAR from the source code distribution
of Lucene in Action at http://www.lucenebook.co