Hello all,
We are upgrading from Lucene 1.4.3 to 1.9.1, and have many customers
with large existing index files. In our testing we have reused large
indexes created in 1.4.3 in 1.9.1 without incident. We have looked
through the changelog and the code and can't see any reason there should
be a
Chris Hostetter wrote:
: I added one record to the index and did flush(), optimize() and close() in
that order.
: I had one index file _twca.cfs. After inserting the document and doing
optimization, I have two index files _twca.cfs and _twcf.cfs (both approx. same
size) and deletable f
: OK, that being the case, is there a prescribed method for dealing with this?
: Does anybody have a "best practice" for me?
It depends on how/why you see it as a problem.
If all you want to do is sort on the date - you have no problem, they will
sort correctly.
If you want to display the date
Hi,
I have an application. It has a large number of records (around 1.2 million)
with a possibility of doubling every year. On average, around 3000 records are
added per day, distributed over the day. Each inserted record has to be
searchable immediately once it is entered into the databa
: I added one record to the index and did flush(), optimize() and close() in
that order.
: I had one index file _twca.cfs. After inserting the document and doing
optimization, I have two index files _twca.cfs and _twcf.cfs (both approx. same
size) and deletable file having entry for _twc
Hi,
I added one record to the index and did flush(), optimize() and close() in
that order.
I had one index file _twca.cfs. After inserting the document and doing
optimization, I have two index files _twca.cfs and _twcf.cfs (both approx. same
size) and deletable file having entry for _twc
First off, when trying to make sense of scores you should always use
either HitCollector or one of the TopDocs methods of the Searcher
interface -- otherwise the "normalize if greater than 1" logic of the Hits
class might confuse you.
Second: Searcher.explain(Query,int) is your friend ... it wi
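To make the first point concrete, here is a minimal sketch of what "normalize if greater than 1" means for the numbers you see. It is plain Java and does not call Lucene; the class and method names are made up for illustration, and it only mimics the idea (scale every score by 1/max when the top raw score exceeds 1.0), not the actual Hits source.

```java
public class HitsNormalizationSketch {

    // Hypothetical helper: mimics the idea of the Hits normalization only.
    // If the best raw score is above 1.0, every score is scaled by 1/max;
    // otherwise the raw scores are returned unchanged.
    static float[] normalizeLikeHits(float[] rawScores) {
        float max = 0.0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] shown = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            shown[i] = rawScores[i] * norm;
        }
        return shown;
    }

    public static void main(String[] args) {
        float[] raw = {2.0f, 1.0f, 0.5f};
        float[] shown = normalizeLikeHits(raw);
        for (int i = 0; i < raw.length; i++) {
            System.out.println("raw " + raw[i] + " -> shown " + shown[i]);
        }
    }
}
```

Since the scaling depends on the top hit of each query, the displayed values are not comparable across queries, which is why the raw scores from HitCollector or TopDocs are easier to reason about.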
In my solution, you can use one doc for each mesh term, or use different
keywords such as "mesh_1" ... "mesh_10" for your top 10 terms, or you can group
your mesh terms as one string and add it to a field, which requires a simple
string parser for the grouped string when you want to read the terms...
Hello,
I ran into some very strange behavior in Lucene 1.9. Boost factors under 1.3
do not affect the resulting score! I wrote a simple test to isolate the
issue:
Writing test index
Creating 3 documents with same KEY and boosts of default, 1.1, 1.2, and 1.3
public static void writeTestI
Hi,
I have a design question. Here is what we try to do for indexing:
We designed an indexing tool to generate standard MeSH terms from medical
citations, and then use Lucene to save the terms and citations for future
search. The information we need to save is:
a) the exact mesh terms (top 10)
b
If you want to get "file.txt" out of "/documents/file.txt" simply cut off
everything before the last "/":
String path = doc.get("path");
String name = path != null ? path.substring(path.lastIndexOf("/") + 1) :
path;
Otherwise, if you want to store only the name in the index, you will have to d
Index the "filename" when you are indexing as you did the "path". You
can get it back with doc.get("filename");
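A minimal sketch of how the value for such a "filename" field could be derived at indexing time, using only the JDK. The class and method names are hypothetical, and the actual call that adds the field to the Lucene Document is left out on purpose.

```java
import java.io.File;

public class FileNameSketch {

    // Hypothetical helper: returns the last path component, e.g. the value
    // you might store in a separate "filename" field next to "path".
    static String fileNameOf(String path) {
        return path == null ? null : new File(path).getName();
    }

    public static void main(String[] args) {
        // prints file.txt
        System.out.println(fileNameOf("/documents/file.txt"));
    }
}
```

Storing the name at indexing time costs a little index space but saves parsing the path for every hit you display.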
suba suresh.
Mag Gam wrote:
Is it possible to get Document Name, instead of its entire path?
Currently, i have something like this:
out.println (doc.get ("path")); // Which gives
Thanks for everyone's help. I understand how it works now. I can get rid
of MultiFieldQueryParser in search.
thanks
suba suresh.
Erik Hatcher wrote:
Yeah, I used a cruder form by appending all the text together into a
single string with a space separator in that LIA example.
Given the posit
It would be nice if someone shared the design documents of Lucene with me.
You could start with the javadocs here:
http://lucene.apache.org/java/docs/api/index.html
Click on the "Document" class to see some description for Documents in
particular.
Or for a broader "get your feet wet" introduction,
It is up to you. Whatever you put in the document during indexing
you'll get back. If you add a field with just the document name you can
retrieve that, or just parse the file name from the path.
Aviran
http://www.aviransplace.com
-Original Message-
From: Mag Gam [mailto:[EMAIL PROTEC
Is it possible to get Document Name, instead of its entire path?
Currently, i have something like this:
out.println (doc.get ("path")); // Which gives me /documents/file.txt
Is it possible to get "file.txt"
Thank you! You are right. Seems like Tomcat overwrites my path. I had to
manually move the .jar files into Tomcat's presence.
On 8/24/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
My hunch is you don't have the Lucene JAR in the classpath at runtime.
Erik
On Aug 24, 2006, at 7:58 AM,
OK, that being the case, is there a prescribed method for dealing with this?
Does anybody have a "best practice" for me?
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 24, 2006 2:21 AM
To: java-user@lucene.apache.org
Subject: Re: DateTools.set-
It is interesting to note that Lucene would also seem to suffer from
bugs when using spans if you only have a single document in the index.
At least with the NotSpanQuery, the spans could wrap around the document
from end to beginning. This would be unexpected but would also go away
if you add
My hunch is you don't have the Lucene JAR in the classpath at runtime.
Erik
On Aug 24, 2006, at 7:58 AM, Mag Gam wrote:
Hi All,
I keep getting this error in my Tomcat logs
Aug 24, 2006 7:44:09 AM org.apache.catalina.core.ApplicationContext
log
INFO: Marking servlet search as unav
Hi All,
I keep getting this error in my Tomcat logs
Aug 24, 2006 7:44:09 AM org.apache.catalina.core.ApplicationContext log
INFO: Marking servlet search as unavailable
Aug 24, 2006 7:44:09 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Allocate exception for servlet search
java.
Erik Hatcher wrote:
On Aug 24, 2006, at 3:29 AM, Michael Wechner wrote:
As an alternative I would rather suggest that one generates a
well-defined XML with JSP or a servlet and then applies
an XSLT. If somebody is afraid of performance issues then one might
want to consider generating the serv
It would be nice if someone shared the design documents of Lucene with me.
Erik Hatcher wrote:
On Aug 24, 2006, at 3:29 AM, Michael Wechner wrote:
As an alternative I would rather suggest that one generates a well-
defined XML with JSP or a servlet and then applies
an XSLT. If somebody is afraid of performance issues then one might
want to consider generating the
Yeah, I used a cruder form by appending all the text together into a
single string with a space separator in that LIA example.
Given the position increment gap between instances of same-named
fields that is now part of Lucene, I recommend using multiple field
instances instead.
Er
On Aug 24, 2006, at 3:29 AM, Michael Wechner wrote:
As an alternative I would rather suggest that one generates a well-
defined XML with JSP or a servlet and then applies
an XSLT. If somebody is afraid of performance issues then one might
want to consider generating the servlet or jsp code
dy
Thanks a lot.
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: 23 August 2006 14:26
To: java-user@lucene.apache.org
Subject: Re: Change index structure
On Aug 23, 2006, at 6:22 AM, WATHELET Thomas wrote:
> If I want to add a new field for example into an existing i
I think I start to understand this :) .. Thanks guys.
~KEGan
On 8/24/06, Gopikrishnan Subramani <[EMAIL PROTECTED]> wrote:
Erik has used a space as the field separator. Maybe you can use a
different field separator that your analyzer won't eat up, so that will
change the token position by
Mag Gam wrote:
Thanks!
So, when working with Tomcat, for a simple Index + Search, is it
recommended
to use JSP over servlets?
Any advice?
Well, the issue seems to me rather that the (X)HTML is hardcoded into
the JSP or servlet, respectively, which creates
a maintenance nightmare when one wants to cust
I think the confusion here is that when DateTools looks at a Date object
and a Resolution, it does its calculations in GMT (so when you ask what
"day" it is at a particular moment, it tells you the current day in GMT,
when you ask which month, it tells you the month in GMT, etc...)
This may seem
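To illustrate the GMT point above, here is a small sketch using only the JDK. It does not call DateTools; it assumes day resolution corresponds to a "yyyyMMdd" stamp and formats the same instant in two zones. The class name, the method name and the "GMT-08:00" example zone are made up for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class GmtDaySketch {

    // Hypothetical helper: formats an instant as a day stamp in a given zone.
    static String dayIn(Date when, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(when);
    }

    public static void main(String[] args) {
        // 2006-08-24 02:00:00 GMT, which is still the evening of the 23rd
        // for anyone eight hours behind GMT.
        Date when = new Date(1156384800000L);
        System.out.println(dayIn(when, "GMT"));       // prints 20060824
        System.out.println(dayIn(when, "GMT-08:00")); // prints 20060823
    }
}
```

The same moment lands on different days depending on the zone, which is the mismatch you may notice when comparing the indexed value with a locally displayed date.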
Erik has used a space as the field separator. Maybe you can use a
different field separator that your analyzer won't eat up, so that will
change the token position by 1.
Gopi
On 8/24/06, KEGan <[EMAIL PROTECTED]> wrote:
Erik,
What is generally the reason for indexing both individual fields
: What is generally the reason for indexing both individual fields, and the
: general-purpose "content" field ?
so you can explicitly query for "name:paris" or "city:paris" instead of
just "paris"
: name : John Smith
: food : subway sandwich
:
: So the general-purpose "content" would have the f