Excellent!
Exactly what I was looking for!
Thanks Grant!
-John
On Thu, Mar 13, 2008 at 5:39 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:
> There is an addDocument method that takes an Analyzer and overrides
> the one used at construction of the IndexWriter. See
>
> http://lucene.apache.org/j
Hi Patrick,
I noticed that we do not package the *.pom.template files in the source
release files. That's why it is not possible to build the maven
artifacts using official releases.
I'll open a JIRA issue and make sure that we will ship 2.3.2 with the
template files. In the meantime, you ca
Hi all,
I guess this question is a bit off track. Are there any language
identification modules inside Lucene? If not, can somebody please suggest
a good one?
Thank You.
Dear all,
Is it possible to list the scores of all terms inside a document by some
simple method? Right now I just use each term as a query to search the
whole index to get the score, which seems very cumbersome. Is there any
simpler approach?
Cheers!
weixiong
Hi,
I've looked around (mailing lists, jira) and I can't seem to find
information about how to generate maven artifacts, especially for
contrib.
I mean, I can get lucene from the maven repo, and I know I have to
build the contrib for myself.
But I kind of hoped I would be able to deploy contrib l
There is an addDocument method that takes an Analyzer and overrides
the one used at construction of the IndexWriter. See
http://lucene.apache.org/java/2_3_1/api/core/org/apache/lucene/index/IndexWriter.html#addDocument(org.apache.lucene.document.Document,%20org.apache.lucene.analysis.Analyzer)
Daniel Noll wrote:
For interest's sake I also timed fetching the document with no FieldSelector;
that takes around 410ms for the same documents. So there is still a big
benefit in using the field selector, it just isn't anywhere near enough to
get it close to the time it takes to retrieve th
On Thu, Mar 13, 2008 at 9:30 PM, Doron Cohen <[EMAIL PROTECTED]> wrote:
> Hi Daniel, LUCENE-1228 fixes a problem in IndexWriter.commit().
> I suspect this can be related to the problem you see though I am not sure.
> Could you try with the patch there?
> Thanks,
> Doron
Daniel, I was wrong about
Hi Grant:
For our corpus, we don't rely much on idf in the scoring calculation,
so I don't see that being much of a problem.
About performance, instantiating 1 indexWriter for a batch of say 1000
docs, e.g. iterate over 1000 docs and do addDocument; comparing with
instantiating and clo
Hi Daniel, LUCENE-1228 fixes a problem in IndexWriter.commit().
I suspect this can be related to the problem you see though I am not sure.
Could you try with the patch there?
Thanks,
Doron
On Thu, Mar 13, 2008 at 10:46 AM, Michael McCandless <
[EMAIL PROTECTED]> wrote:
>
> Daniel Noll wrote:
>
>
On Mar 13, 2008, at 11:03 AM, John Wang wrote:
Yes, but usually it's a good idea to add documents in batch and not having
to reinstantiate the writer for every document and then closing it.
It would be nice if one can specify to the writer which analyzer to use.
PerFieldAnalyzerWrapper wouldn't
On Mar 13, 2008, at 11:03 AM, John Wang wrote:
Yes, but usually it's a good idea to add documents in batch and not having
to reinstantiate the writer for every document and then closing it.
Why does what I suggested require instantiating a new writer for every
document? It uses the anal
Yes, but usually it's a good idea to add documents in batch and not having
to reinstantiate the writer for every document and then closing it.
It would be nice if one can specify to the writer which analyzer to use.
PerFieldAnalyzerWrapper wouldn't work because different analyzers may apply on the
same
Well ... yes and no?
Yes, the Log*MergePolicy will still at certain times merge the index
all the way down to one segment. If mergeFactor is 10 then this will
happen every "power of 10" flushed segments. I.e., after 10 flushes a
merge will merge them down to 1 segment, then after 100 flush
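The "power of 10" cascade described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Lucene's actual Log*MergePolicy code: it assumes every flush writes exactly one lowest-level segment and that mergeFactor segments at the same level are merged immediately into one segment at the next level. The class and method names (MergeCascadeSketch, segmentsAfter) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a logarithmic merge cascade; NOT Lucene's real merge policy. */
class MergeCascadeSketch {

    /**
     * Returns how many segments remain after the given number of flushes,
     * assuming each flush adds one level-0 segment and any mergeFactor
     * segments on one level are merged into a single segment one level up.
     */
    static int segmentsAfter(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        // perLevel.get(i) is the number of segments currently at level i.
        List<Integer> perLevel = new ArrayList<>();
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            while (true) {
                if (perLevel.size() <= level) {
                    perLevel.add(0);
                }
                int count = perLevel.get(level) + 1;
                if (count < mergeFactor) {
                    perLevel.set(level, count);
                    break;
                }
                // This level is full: merge it away and carry one segment up.
                perLevel.set(level, 0);
                level++;
            }
        }
        int total = 0;
        for (int c : perLevel) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int flushes : new int[] {9, 10, 11, 99, 100}) {
            System.out.println(flushes + " flushes: "
                    + segmentsAfter(flushes, 10) + " segment(s)");
        }
    }
}
```

In this model, with mergeFactor 10, the index collapses to a single segment after 10 flushes and again after 100 flushes, while 99 flushes leave 18 segments (9 merged ones plus 9 freshly flushed ones).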
Thanks a lot Mike...one more question:
I remember reading that a regular addDocument call could basically
trigger an optimize on a given call. Is this true? Maybe not true anymore?
It doesn't sound right to me, but I do remember reading about it. This
was pre background merging when it was men
Hello
I index once every 24 hours. If a single search takes place within those
24 hours, the next indexing run will generate a new cfs file, because the
old one cannot be deleted.
Yes, I've read in the API that it's best not to open and close an
IndexReader for every search, but right now I'm not con
My notion of unique is more like synonym. For instance, Brain cancer,
Cancer of the brain, and Brain neoplasm are the same, so I need to tokenize
the title, remove the stop words, etc.
I have a problem with the indexing: with a new title, first I have to
search in the index; if the title is not found, write
Yes this should reduce transient (while merging) disk usage.
However, optimize disregards this parameter, so it will still use the
same disk space. However, if you call optimize(N) then that should
use less space since it does not merge all the way down to 1 segment.
Note that the limit
If I use LogByteSizeMergePolicy#setMaxMergeMB, can I clamp down on the
space needed for optimize/merge? My thought is, if a segment is maxed
out, it will never need to be copied for a merge right? So you could
significantly reduce merge/optimize space requirements (now at like 2x-4x
if readers c
>> Upping the amount of RAM does not help us when the
>> index is replaced before we pass the 50.000 queries.
Have you seen https://issues.apache.org/jira/browse/LUCENE-1035? It would be
interesting to see if this one changes the HD numbers. You have plenty of free
memory in this setup...
On Thu, 2008-03-13 at 08:37 -0400, Grant Ingersoll wrote:
> Is this corpus publicly available? If so, please share. I'm always
> on the hunt for free data!
I'm sorry. It's the bibliographic records from the State and University
Library of Denmark and we're not allowed to share them.
Slight aside below?
On Mar 13, 2008, at 7:58 AM, Srikant Jakilinki wrote:
Remember, this is all searches with an optimized index. This is on the
corpus from the Danish State and University Library and should be seen
as nothing else than inspiration.
Is this corpus publicly available? If
Hi Toke,
Thanks for the write-up. Speaking for the community, the graphs (as
earlier) would be great.
There is no benchmarks page on the Wiki. There is one on the main site
to which you can add your stuff -
http://lucene.apache.org/java/2_1_0/benchmarks.html
Maybe one should create one on th
Not sure why you can't close, but it's a bit suspicious that you are
opening the IndexReader every time you do a search. Can you explain a
little more about your process? When are you indexing, how often, etc.?
-Grant
On Mar 12, 2008, at 11:50 AM, Ioannis Cherouvim wrote:
Hello
I can in
What's in "field"? What are your docs? More info is needed to help...
-Grant
On Mar 13, 2008, at 6:50 AM, sandyg wrote:
Hi,
Thanks for spending time on the problem.
When sorting the results of a Lucene search, it takes more time and
does not seem that useful. Can anyone help?
Below i
On IndexWriter, you can pass in the Analyzer when you add a Document,
thus your application can identify the language, choose the analyzer
for the given doc, and then add the document.
See
public void addDocument(Document doc, Analyzer analyzer)
On Mar 12, 2008, at 8:40 PM, John Wang wrote:
Time for another dose of inspiration for investigating Solid State
Drives. And no, I don't get percentages from the chip manufacturers :-)
This time I'll argue that there's little gain in using a RAMDirectory
over SSDs, when performing searches. At least for our setting.
We've taken our producti
Hi,
Thanks for spending time on the problem.
When sorting the results of a Lucene search, it takes more time and
does not seem that useful. Can anyone help?
Below is my code:
sort = new Sort(new SortField(field));
hits = searcher.search(query,sort);
Once
Hi All,
What is the minimum number of records needed to create an index store? When I
try to create an index store with 5 records, it creates a segments file only.
--
View this message in context:
http://www.nabble.com/Minimum-records-to-create-IndexStore-tp16024349p16024349.html
Sent from the Lucene - Ja
Hi list,
I'm new to Lucene and I'm trying to index a set of XML documents
(document-centric) with the same structure. All these documents have a
header, a front, and a body (where there's a lot of text).
The problem is that in the header I have two fields, author and title, but
one document can hav
Daniel Noll wrote:
On Wednesday 12 March 2008 19:36:57 Michael McCandless wrote:
OK, I think very likely this is the issue: when IndexWriter hits an
exception while processing a document, the portion of the document
already indexed is left in the index, and then its docID is marked
for deletio