We've used Hadoop MapReduce with Solr to parallelize indexing for a customer,
and that brought their multi-hour indexing process down to a couple of
minutes. There is/was also a Lucene-level contrib in Hadoop that makes use of
MapReduce to parallelize indexing.
Otis
Sematext :: http://
Sorry - kind of my fault. When I fixed this to use maxDocCharsToAnalyze, I
didn't set a default other than 0 because I didn't really count on this being
used beyond how it is in the Highlighter - which always sets
maxDocCharsToAnalyze with its default.
You've got to explicitly set it higher t
I tried something similar, and failed - I think the API is lacking
there? My only advice is to vote for this:
https://issues.apache.org/jira/browse/LUCENE-2878 which should provide
a better alternative API, but it's not near completion.
-Mike
On 7/6/2011 5:34 PM, Jahangir Anwari wrote:
I h
I have a CustomHighlighter that extends the SolrHighlighter and overrides
the doHighlighting() method. Then for each document I am trying to extract
the span terms so that later I can use them to get the span positions. I tried
to get the weightedSpanTerms using WeightedSpanTermExtractor but was
unsu
I just profiled the application and tst.TernaryTreeNode takes 99.99..% of
the memory.
I'll test further tomorrow and report on mem usage for runnable smaller
indexes.
I will email you privately for sharing the index to work with.
BR,
Elmer
-Original message-
From: Michael McC
Hmm... so I suspect the fst suggest module must first gather up all
titles, then sort them, in RAM, and then build the actual FST. Maybe
it's this gather + sort that's taking so much RAM?
1.3 M publications times 100 chars times 2 bytes/char = ~248 MB. So
that shouldn't be it...
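
The back-of-the-envelope estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is hypothetical, and the inputs (1.3 M titles, 100 chars each, 2 bytes per char) are the rough figures from the message, not measurements.

```python
# Rough size of the raw title text, using the figures quoted in the message.
# estimate_raw_title_bytes is a hypothetical helper, not part of Lucene.
def estimate_raw_title_bytes(num_titles: int, avg_chars: int, bytes_per_char: int = 2) -> int:
    """Return the approximate number of bytes needed to hold all titles as UTF-16 chars."""
    return num_titles * avg_chars * bytes_per_char


raw_bytes = estimate_raw_title_bytes(1_300_000, 100)
raw_mib = raw_bytes / (1024 * 1024)  # convert bytes to MiB (2**20 bytes)
print(f"{raw_bytes} bytes = {raw_mib:.0f} MiB")  # prints "260000000 bytes = 248 MiB"
```

So the "~248 MB" figure is 260 million bytes expressed in units of 2^20 bytes.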
Is this an ac
You could try storing your autocomplete index in a RAMDirectory?
I forgot to mention. I tried this previously, but that also resulted in heap
space problems. That's why I was interested in using the new suggest classes
:)
BR,
Elmer
-Original message-
From: Michael McCandless
Hi Mike,
That's what I thought when I started indexing it. To be clear, it happens at
build time.
I don't know if memory efficiency is better when building has finished.
The titles I index are titles from the dblp computer science bibliography.
They can take up to... say 100 characters.
Examp
You could try storing your autocomplete index in a RAMDirectory?
But: I'm surprised you see the FST suggest impl using up so much RAM;
very low memory usage is one of the strengths of the FST approach.
Can you share the text (titles) you are feeding to the suggest module?
Mike McCandless
http://
Hi again.
I have created my own autocompleter based on the spellchecker. This
works well in the sense that it is able to create an auto completion index
from my 'publication' index. However, integrated in my web application,
each keypress asks the autocompleter to search the index, which is stored on
di
Hello,
First off, I am using the QueryParser with the StandardAnalyzer. It
seems that whenever I search for the # symbol, nothing is found. This
wouldn't be a problem, but the documents I am searching contain "C#",
which needs to be searchable.
I have tried escaping the # symbol but when I d
Hello,
Until now, we have used a stupid %like% SQL query script to assign the
following terms for ID/item mapping in different ID spaces:
john wayne == john wayne
wayne, john == john wayne
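
As an illustration of the mapping in the two examples above, here is a minimal Python sketch that reduces both spellings to one canonical key. It is not Lucene code and not the poster's script: the function name and the rule (lowercase, collapse whitespace, swap "last, first" around a single comma) are assumptions made for this example only.

```python
# Hypothetical normalizer: maps "wayne, john" and "john wayne" to the same key.
def normalize_name(raw: str) -> str:
    """Lowercase, collapse whitespace, and turn 'last, first' into 'first last'."""
    text = " ".join(raw.lower().split())
    if "," in text:
        # Split only on the first comma: everything before it is treated as the last name.
        last, first = (part.strip() for part in text.split(",", 1))
        text = f"{first} {last}".strip()
    return text


print(normalize_name("john wayne"))   # john wayne
print(normalize_name("wayne, john"))  # john wayne
```

Two strings can then be assigned to the same item when their normalized keys are equal.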
I can imagine that Lucene offers many more possibilities for this
assignment.
Maybe with Lucene it is also pos
Hi all,
Luke 3.3.0 has been released and is available for download here:
http://code.google.com/p/luke/
Apart from the updated Lucene libraries there were no changes in
functionality.
--
Best regards,
Andrzej Bialecki <><
Ant is the official Lucene/Solr build system. Snapshot and release artifacts
are produced with Ant.
While Maven is capable of producing artifacts, the artifacts produced in this
way may not be the same as the official Ant artifacts. For this reason: no,
the artifacts should not be built with
Thanks. It was what I expected, but it's nice to have it confirmed.
On Tue, Jul 5, 2011 at 9:39 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> This API doesn't exist today.
>
> Lucene has long needed query impls to do this, so that we can
> properly plan/optimize how the query
Hi, I'm looking inside the Jenkins Maven repository. For example, the package
in
https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts/org/apache/lucene/lucene-misc/4.0-SNAPSHOT/lucene-misc-4.0-20110705.223250-1.jar
seems to be built with Ant instea
On Tue, 2011-07-05 at 17:50 +0200, Hiller, Dean x66079 wrote:
> We are using a sort of nosql environment and deleting 200 gig on one machine
> from the database is fast, but then we go and delete 5 gigs of indexes that
> were created and it takes forever
8 million indexes is at a minimum 16