On 4 May 2007, at 02:20, Chris Lu wrote:
I suppose if a document is indexed as English or French,
when users search the document,
we need to parse the query as English or French also?
If you do some language-specific token analysis such as stemming, yes.
Detecting the language on such small t
Oh, fix as in make constant, not fix as in broken ...
No, I don't know of any way to do this. Can your installer just
pack up everything in a directory?
Erick
On 5/3/07, Shaw, James <[EMAIL PROTECTED]> wrote:
I mean specifying the name of the .cfs file, rather than letting Lucene
come up with a name by itself.
I'm actually using Lucene.Net, and we pre-index during our build and
want to include the index in the installer, but the installer can only
reference named files, and it wouldn't work if the .cfs
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.db
Uh, what do you mean "fix"? You shouldn't have to do anything
with it at all. What behavior are you observing that you want
to change and why?
Erick
On 5/3/07, Shaw, James <[EMAIL PROTECTED]> wrote:
Does anyone know how to fix the .cfs file name in an index directory?
jim shirreffs wrote:
Hi, I'm a relative Lucene newbie and would appreciate some expert advice.
Sounds like you might want to start a new thread, otherwise people who
know the answer to your problem might not see your post.
Daniel
--
Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo N
Does anyone know how to fix the .cfs file name in an index directory?
The deletable and segments file names are always the same, but we have
observed that the .cfs file name changes each time you index a content
directory with some changes to the directory (some deleted files, added
files, etc). H
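Why the name changes comes down to how segments are named. To my understanding, Lucene 2.x names each new segment "_" followed by a per-index counter written in base 36 (`Character.MAX_RADIX`), and the compound file takes its segment's name, so every flush or merge that creates a new segment produces a new .cfs name. The sketch below reproduces only that naming rule; the `SegmentNames` class is hypothetical, and the rule should be checked against your Lucene (or Lucene.Net) version.

```java
/** Hypothetical reproduction of a counter-based segment naming rule: "_" + base-36 counter. */
class SegmentNames {
    private int counter; // incremented for every new segment and never reused

    /** Name for the next segment, e.g. "_0", "_1", ..., "_z", "_10". */
    String newSegmentName() {
        return "_" + Integer.toString(counter++, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        SegmentNames names = new SegmentNames();
        for (int i = 0; i < 3; i++) {
            // each flush or merge creates a brand-new segment, hence a new compound file name
            System.out.println(names.newSegmentName() + ".cfs");
        }
    }
}
```

Since the counter keeps growing across re-indexing runs, no single .cfs name can be relied on, which is why packing up the whole index directory, as Erick suggests, sidesteps the problem.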
On 3 May 2007, at 22:06, Mordo, Aviran (EXP N-NANNATEK) wrote:
Does anyone know of a good language detection library that can detect what
language a document (text) is in?
I posted this some time back:
https://issues.apache.org/jira/browse/LUCENE-826
A bit proof-of-concept-ish, but it does the job.
Jason Pump wrote:
http://software.wise-guys.nl/libtextcat/
... which is what Nutch implements in its language-identifier plugin.
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
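libtextcat is based on character n-gram profiles: for each language, rank the most frequent character n-grams in sample text, then pick the language whose ranked profile is closest to the document's by an "out-of-place" rank distance. The sketch below illustrates that idea only; the class name, the 1-to-3-gram range, the profile size of 300, and the tiny training samples are illustrative assumptions, and real profiles are built from far larger corpora.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy n-gram language guesser: ranked n-gram profiles compared by out-of-place distance. */
class NGramLanguageGuesser {
    static final int MAX_N = 3;          // 1- to 3-grams (assumed value)
    static final int PROFILE_SIZE = 300; // keep only the top-ranked n-grams (assumed value)

    private final Map<String, List<String>> profiles = new HashMap<>();

    /** Character n-grams ranked by descending frequency. */
    static List<String> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) continue;
            String padded = "_" + word + "_"; // mark word boundaries
            for (int n = 1; n <= MAX_N; n++) {
                for (int i = 0; i + n <= padded.length(); i++) {
                    counts.merge(padded.substring(i, i + n), 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // descending frequency, then alphabetical so that ties are deterministic
        ranked.sort((a, b) -> {
            int c = Integer.compare(counts.get(b), counts.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return ranked.size() > PROFILE_SIZE ? new ArrayList<>(ranked.subList(0, PROFILE_SIZE)) : ranked;
    }

    /** Sum of rank differences; n-grams missing from the language profile get the maximum penalty. */
    static int distance(List<String> docProfile, List<String> langProfile) {
        Map<String, Integer> langRank = new HashMap<>();
        for (int i = 0; i < langProfile.size(); i++) langRank.put(langProfile.get(i), i);
        int total = 0;
        for (int i = 0; i < docProfile.size(); i++) {
            Integer r = langRank.get(docProfile.get(i));
            total += (r == null) ? PROFILE_SIZE : Math.abs(r - i);
        }
        return total;
    }

    void train(String language, String sampleText) {
        profiles.put(language, profile(sampleText));
    }

    /** The trained language whose profile is closest to the text, or null if nothing is trained. */
    String guess(String text) {
        List<String> doc = profile(text);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : profiles.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NGramLanguageGuesser g = new NGramLanguageGuesser();
        g.train("en", "the quick brown fox jumps over the lazy dog");
        g.train("fr", "le renard brun saute par-dessus le chien paresseux");
        System.out.println(g.guess("the dog sleeps"));
    }
}
```

With samples this small the guesses are unreliable on unseen text; the point is the shape of the computation, not its accuracy.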
Coincidentally, I'm hacking at this very problem
First, are you sure your free memory calculation is OK? Why not
just use freeMemory? Perhaps also call the GC if the available memory isn't
enough. Although I confess I don't know the innards of the
interplay of getting the various memory amounts
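For reference, `java.lang.Runtime` exposes three numbers: `freeMemory()` is the unused part of the heap the JVM has currently claimed, `totalMemory()` is that claimed heap, and `maxMemory()` is the most it may grow to. `freeMemory()` alone therefore ignores heap not yet claimed. The sketch below combines all three and retries once after a `System.gc()` hint; the `HeapHeadroom` class and its methods are hypothetical, not part of Lucene.

```java
/** Hypothetical helper: estimates how much heap the JVM could still hand out. */
class HeapHeadroom {

    /** Bytes still obtainable: unused part of the current heap plus room to grow to the max heap. */
    static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // bytes currently occupied
        return rt.maxMemory() - used;                   // Long.MAX_VALUE - used if there is no limit
    }

    /** True if at least {@code needed} bytes seem available, requesting a GC once before giving up. */
    static boolean hasRoomFor(long needed) {
        if (availableBytes() >= needed) {
            return true;
        }
        System.gc(); // only a hint; the JVM may ignore it
        return availableBytes() >= needed;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("free=%d total=%d max=%d available=%d%n",
                rt.freeMemory(), rt.totalMemory(), rt.maxMemory(), availableBytes());
        System.out.println("room for 1 MB? " + hasRoomFor(1L << 20));
    }
}
```

These numbers are snapshots: other indexing threads allocate concurrently, so a check like this reduces, but cannot rule out, an OutOfMemoryError.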
Otis Gospodnetic wrote:
LingPipe - commercial unless your data/product/service is free.
Nutch language id plugin.
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share
- Original Message
From: "Mordo, Aviran (EXP N-NANNATEK)" <[EMAIL PROTEC
Does anyone know of a good language detection library that can detect what
language a document (text) is in?
Our application includes an indexing server that writes to multiple
indexes in parallel (each thread writes to a single index). In order
to avoid an OutOfMemoryError, each request to index a document is
checked to see if the JVM has enough memory available to index the
document.
I know that Index
It seems to me like a French stemmer is what you need instead of a fuzzy
query. What analyzer are you using for your documents and queries?
-- Stefan
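To illustrate Stefan's point: a stemmer maps inflected forms to a shared stem at both index and query time, so "sociétés américaines" and "société américaine" end up as identical terms and no fuzzy matching is needed. The sketch below is a deliberately tiny, hypothetical suffix stripper (a trailing plural "s", then a trailing "e"); it is not Lucene's French stemmer, which implements a much fuller algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy French "stemmer": strips a trailing plural 's', then a trailing 'e'. Illustration only. */
class ToyFrenchStem {

    static String stem(String word) {
        String w = word.toLowerCase(Locale.FRENCH);
        if (w.length() > 3 && w.endsWith("s")) {
            w = w.substring(0, w.length() - 1); // sociétés -> société, américaines -> américaine
        }
        if (w.length() > 3 && w.endsWith("e")) {
            w = w.substring(0, w.length() - 1); // américaine -> américain ("é" is not "e")
        }
        return w;
    }

    /** Splits on whitespace and stems each token, applied the same way to documents and queries. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            terms.add(stem(token));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("société américaine"));
        System.out.println(analyze("sociétés américaines"));
    }
}
```

Both calls yield the same two terms, so an ordinary term or phrase query matches both spellings.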
[EMAIL PROTECTED] wrote:
Hi!
I have a problem dealing with a fuzzy query in Lucene 2.1.0.
In order to explain my problem, I illustrate it
Hi, I'm a relative Lucene newbie and would appreciate some expert advice.
I would like to make files distributed on various local hosts in the
intranet full-text searchable. My startup plan is to index these files locally
and then merge all the little indexes into a master index on a search
I don't think you're doing yourself any good by explicitly using a
RAMDirectory in the first place. If you use a simple FSDirectory, a
number of documents are added in RAM before being flushed to the
FS.
Why do you add this complexity to your code with no proof that
it does you any good? Or do yo
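A rough model of what Erick describes: the writer itself keeps a batch of documents in memory and writes a segment only when the batch is full, so wrapping it in an extra RAMDirectory duplicates buffering that already happens. The class below is a hypothetical stand-in (the `maxBufferedDocs` name echoes `IndexWriter.setMaxBufferedDocs` in Lucene 2.x), with "flushing" reduced to recording segment sizes; it is not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical writer that buffers documents in RAM and flushes each full batch as one "segment". */
class BufferingWriter {
    private final int maxBufferedDocs;
    private final List<String> buffer = new ArrayList<>();
    private final List<Integer> flushedSegments = new ArrayList<>(); // size of each flushed segment

    BufferingWriter(int maxBufferedDocs) {
        if (maxBufferedDocs < 1) throw new IllegalArgumentException("maxBufferedDocs must be >= 1");
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxBufferedDocs) {
            flush(); // buffer is full: write it out
        }
    }

    /** Writes buffered documents out as one segment (here: just records how many there were). */
    void flush() {
        if (!buffer.isEmpty()) {
            flushedSegments.add(buffer.size());
            buffer.clear();
        }
    }

    /** Flushes the remainder, as closing a writer would. */
    void close() {
        flush();
    }

    List<Integer> segments() {
        return flushedSegments;
    }

    public static void main(String[] args) {
        BufferingWriter w = new BufferingWriter(10);
        for (int i = 0; i < 25; i++) w.addDocument("doc" + i);
        w.close();
        System.out.println(w.segments()); // [10, 10, 5]
    }
}
```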
I don't think (but don't know for sure) that optimizing before the
end of the run buys you anything. And you're right, it takes a while.
I've assumed that it was best done at the end of the entire run,
but that's only an assumption.
Search the archives for the thread titled
MergeFactor and Ma
It would help a lot if you can either post a snippet of code showing
how you construct the fuzzy query or create a small, self-contained
program illustrating the problem.
With the latter approach, I've often found that in the middle of creating
the program, what I'm doing wrong surfaces ...
Best
I can just see Hatcher's reply now..
Would you be willing to submit the correct code ?
Erick
On 5/2/07, Winton Davies <[EMAIL PROTECTED]> wrote:
Hey guys,
Does someone who makes commits want to fix the EMAIL definition in
StandardTokenizer.jj
It's a little-known exception to the n
See IndexWriter#addIndexesNoOptimize, released with 2.1. Note that it
doesn't optimize before or after, so if you want an optimize at the end,
you need to ask for it manually.
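Conceptually, merging per-host indexes into a master index means appending their postings while renumbering document ids so that they don't collide. The sketch below models each index as a hypothetical term-to-doc-id map; it is a toy illustration of the idea behind `addIndexes`/`addIndexesNoOptimize`, not their code, and like `addIndexesNoOptimize` it performs no optimize pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index: term -> ascending doc ids, plus a document count. */
class ToyIndex {
    final Map<String, List<Integer>> postings = new TreeMap<>();
    int docCount = 0;

    /** Adds one document given its already-analyzed terms; returns its doc id. */
    int addDocument(String... terms) {
        int id = docCount++;
        for (String t : terms) {
            List<Integer> list = postings.computeIfAbsent(t, k -> new ArrayList<>());
            if (list.isEmpty() || list.get(list.size() - 1) != id) {
                list.add(id); // skip repeated terms within the same document
            }
        }
        return id;
    }

    /** Appends other indexes, shifting their doc ids past the documents already present. */
    void addIndexes(ToyIndex... others) {
        for (ToyIndex other : others) {
            int base = docCount;
            for (Map.Entry<String, List<Integer>> e : other.postings.entrySet()) {
                List<Integer> target = postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
                for (int id : e.getValue()) {
                    target.add(base + id); // stays ascending: base exceeds every existing id
                }
            }
            docCount += other.docCount;
        }
    }

    public static void main(String[] args) {
        ToyIndex hostA = new ToyIndex();
        hostA.addDocument("lucene", "index");
        hostA.addDocument("search");
        ToyIndex hostB = new ToyIndex();
        hostB.addDocument("lucene", "merge");
        ToyIndex master = new ToyIndex();
        master.addIndexes(hostA, hostB);
        System.out.println(master.postings); // {index=[0], lucene=[0, 2], merge=[2], search=[1]}
    }
}
```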
-Original Message-
From: Chandan Tamrakar [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 03, 2007 12:46 AM
To: jav
What if we are using the addIndexes(RAMDirectory) method? It calls optimize
inside the method itself, right?
Any solution to this?
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 03, 2007 4:03 PM
To: java-user@lucene.apache.org
Subject: Re: MergeFac
OK, but then you would not optimize at all? Not even at the end of the
indexing run?
On Thu, 03 May 2007 12:17:40 +0200, Mark Miller <[EMAIL PROTECTED]>
wrote:
I think it is worth your time to do some benchmarking. I think
mergeFactor is not very helpful in the end...if you set it high, you'll
index faster but then your searches will be slower prompting you to
optimize...after which you'll find that you paid all your gains back.
Test things out for yo
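One way to see the tradeoff Mark describes is a simplified model of logarithmic merging: every flush adds a small segment, and whenever `mergeFactor` segments of the same level accumulate they are merged into one segment at the next level. Under that assumption (it ignores deletions, size-based merge selection and maxMergeDocs, so it is not Lucene's actual merge policy), the number of live segments after n flushes is the digit sum of n written in base `mergeFactor`. A higher mergeFactor merges less often while indexing but leaves more segments for each search to visit.

```java
/** Simplified model of level-based segment merging; not Lucene's actual merge policy. */
class MergeModel {

    /** Live segments after {@code flushes} flushes with the given mergeFactor. */
    static int segmentsAfter(int flushes, int mergeFactor) {
        if (mergeFactor < 2) throw new IllegalArgumentException("mergeFactor must be >= 2");
        int[] perLevel = new int[64]; // perLevel[i] = segments built from mergeFactor^i flushes
        for (int f = 0; f < flushes; f++) {
            perLevel[0]++; // a newly flushed segment
            for (int level = 0; perLevel[level] == mergeFactor; level++) {
                perLevel[level] = 0;   // merge mergeFactor equal-level segments...
                perLevel[level + 1]++; // ...into one segment a level up
            }
        }
        int total = 0;
        for (int count : perLevel) total += count;
        return total;
    }

    public static void main(String[] args) {
        for (int mf : new int[] {10, 50, 100}) {
            System.out.println("mergeFactor=" + mf + " -> " + segmentsAfter(999, mf) + " segments");
        }
    }
}
```

With 999 flushes this model gives 27, 68 and 108 segments for mergeFactor 10, 50 and 100, and the count collapses after an optimize; only benchmarking with your own data shows where the real break-even lies.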
Hello everyone!
I'm wondering if any of you have any helpful advice on what MergeFactor I
should use...
The indexing process is handling a large number of documents, and I would
like to index as fast as possible.
Initial thought was to increase the mergeFactor to make the indexer work
more in
Hi!
I have a problem dealing with a fuzzy query in Lucene 2.1.0.
In order to explain my problem, I illustrate it with a simple example:
I would like to retrieve files containing the strings
"société américaine" and "sociétés américaines"
from a fuzzy query on the string "société
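A starting point for the self-contained example Erick asks for: FuzzyQuery compares terms by Levenshtein edit distance. The sketch below computes that distance and a similarity score modeled on what I understand Lucene 2.x's FuzzyTermEnum to use with a prefix length of 0, `1 - distance / min(len1, len2)`, accepting a term when the score exceeds the default minimum similarity of 0.5. Treat that formula as an assumption and check it against the source of your Lucene version; the class is hypothetical.

```java
/** Edit distance plus a FuzzyQuery-style similarity (assumed formula, prefix length 0). */
class FuzzySimilarity {

    /** Dynamic-programming Levenshtein distance: insert, delete, substitute each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 1 - distance / shorter length; values at or below the threshold mean "no match". */
    static float similarity(String query, String term) {
        int shorter = Math.min(query.length(), term.length());
        if (shorter == 0) return 0f;
        return 1f - (float) levenshtein(query, term) / shorter;
    }

    public static void main(String[] args) {
        String[][] pairs = {{"société", "sociétés"}, {"américaine", "américaines"}};
        for (String[] p : pairs) {
            float s = similarity(p[0], p[1]);
            System.out.printf("%s ~ %s: distance=%d similarity=%.3f match(>0.5)=%b%n",
                    p[0], p[1], levenshtein(p[0], p[1]), s, s > 0.5f);
        }
    }
}
```

Note that FuzzyQuery works on single terms, so a multi-word string is matched term by term after analysis, which is one reason a stemming analyzer is often the simpler fix for plural forms.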
I found that IndexWriter.addIndexes(Directory[]) always calls the optimize
method twice.
I am indexing documents in batches, i.e. I call this method whenever X
documents are buffered in RAM using a RAMDirectory. So as the index size
grows, the optimize calls will only increase my indexing time.
C