Adrien,
I have created a 370 MB index of 1 million docs, each with 40 fields of 40
chars, and did the profiling.
I see that indexing, and in particular addDocument and
ConcurrentMergeScheduler, in 4.1 takes double the time compared to 3.0.2.
Looks like CompressingStoredFieldsFormat is of little
Hi Nicola,
There might be a way to do what you want, with some coding on your part. If
you're interested in counting the top-10 of the "Brand" facet, but also
return the count of "Brand/X", even if it's not in the top-10, then what
you should do is write code similar to this:
FacetArrays facetArr
Thanks, we will try the classpath trickery.
How do we avoid similar situations in the future? Is Pulsing41PostingsFormat
going to be maintained in future versions of Lucene? Which
PostingsFormats/codecs are safe to use? Every PostingsFormat/codec is
@deprecated or @experimental.
Sean
On Tue
Another option, using 4.0, would be to use addIndexes(IndexReader[]) to
write a new index, converting your entire index into the supported (back
compat) codec (Lucene40).
Don't use addIndexes(Directory[]) as this just copies files.
Then you can read that resulting index with 4.1.
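A minimal sketch of that conversion, assuming Lucene 4.0 on the classpath. The directory paths and the analyzer choice are made up for illustration; treat this as an untested outline of the addIndexes(IndexReader...) approach, not a definitive recipe:

```java
import java.io.File;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ConvertCodec {
  public static void main(String[] args) throws Exception {
    // "old-index" holds the index written with the Pulsing postings format.
    Directory oldDir = FSDirectory.open(new File("old-index"));
    Directory newDir = FSDirectory.open(new File("converted-index"));

    IndexWriterConfig cfg = new IndexWriterConfig(
        Version.LUCENE_40, new WhitespaceAnalyzer(Version.LUCENE_40));
    cfg.setCodec(Codec.forName("Lucene40")); // write with the back-compat codec

    IndexReader reader = DirectoryReader.open(oldDir);
    IndexWriter writer = new IndexWriter(newDir, cfg);
    // addIndexes(IndexReader...) re-encodes every document through the
    // configured codec; addIndexes(Directory...) would only copy files.
    writer.addIndexes(reader);
    writer.close();
    reader.close();
  }
}
```

The resulting "converted-index" should then open under 4.1 without the Pulsing classes.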
Mike McCandless
http:
Woops, sorry: PulsingPostingsFormat was already moved from core to
codecs as of 4.0 (not in 4.1 like I said before).
And ... yes, you need Pulsing40PostingsFormat on your classpath to
read your 4.0 indices with 4.1. I think you need to excise the
sources and get them on the classpath? But this w
Do I need the Pulsing40PostingsFormat class to read my indexes though?
Pulsing40PostingsFormat isn't shipped with Lucene 4.1.
I have index files with names like _0_Pulsing40_0.frq. When I try to open
my index I get,
java.lang.IllegalArgumentException: A SPI class of type
org.apache.lucene.codecs
Pulsing41PostingsFormat was just moved out of core to the "codecs" module.
Still, the worst case (had it been deleted) would be to revive the
code from the past release and put it in your classpath, so old
indices could be read.
Mike McCandless
http://blog.mikemccandless.com
On Tue, Jan 29, 201
I'm sorry, but for anybody to help you here, you really need to be able to
provide a concise test case, like 10-20 lines of code, completely
self-contained. If you think you need a million documents to repro what you
claimed was a simple scenario, then you leave me very, very confused - and
una
Hi Jack,
The problematic query is "scar"+"wads".
There are several (more than 10) documents in the data with the content
"star wars", so I think that query should be able to find all these
documents.
I was trying to provide a minimal test case, but I couldn't reduce the size
of data showing the
Hello,
I have the following production code which currently works with
Lucene 3.0:
this.luceneWriter = new IndexWriter( directory, analyzer, true,
MaxFieldLength.UNLIMITED );
this.fs.delete( this.finalOutput, true );
this.luceneWriter.setUseCompoundFile( true );
this.luceneWriter.setMer
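For what it's worth, a hedged sketch of what the 4.x equivalent of that constructor might look like. It reuses the `directory` and `analyzer` variables from the snippet above; the merge-policy method names are assumptions from the 4.x API and this is untested:

```java
// The 3.0 four-arg IndexWriter constructor is gone in 4.x; configuration
// moves onto IndexWriterConfig, and the compound-file setting moves onto
// the merge policy. MaxFieldLength.UNLIMITED needs no replacement, since
// 4.x does not truncate fields by default.
IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_41, analyzer);
cfg.setOpenMode(IndexWriterConfig.OpenMode.CREATE); // was the "create=true" flag
TieredMergePolicy mp = new TieredMergePolicy();
mp.setNoCFSRatio(1.0); // roughly setUseCompoundFile(true)
cfg.setMergePolicy(mp);
this.luceneWriter = new IndexWriter(directory, cfg);
```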
I also noticed that you have "MUST" for your full string of fuzzy terms -
that means every one of them must appear in an indexed document to be
matched. Is it possible that even one term was not in the same indexed
document?
Try to provide a complete example that shows the input data and
Hi Nicola,
How does the interface allow the user to select a facet value not from the
top-10? How does the interface know which other facet values exist?
Does it query the taxonomy somehow?
One thing you can do is to set numResults to Integer.MAX_VALUE and
numToLabel to 10. That way your Fac
Hi,
I have a FacetRequest with numResults set to 10; how can I specify
additional facet values to add to the FacetResult?
I try to explain the use-case:
- the user views 10 facet results
- the interface permits the user to choose a facet value not in the
top-10 results
- the user executes the qu
Hi Jack,
ed is set to 1 here and I have lowercased all the data and queries.
Regarding the indexed data factor you mentioned, can you elaborate more?
Thanks!
George
On Tue, Jan 29, 2013 at 9:10 AM, Jack Krupansky wrote:
> That depends on the value of "ed", and the indexed data.
>
> Another f
Hi Michael,
I'm looking into implementing a solution.
On Fri, 2013-01-25 at 16:23 -0500, Michael McCandless wrote:
> On Fri, Jan 25, 2013 at 3:48 PM, Nicola Buso wrote:
>
> > if you have experiences in this use case can you share solutions? What
> > is reusable from Lucene 4.x implementation?
>
That depends on the value of "ed", and the indexed data.
Another factor to take into consideration is that a case change ("Star" vs.
"star") also counts as an edit.
-- Jack Krupansky
-Original Message-
From: George Kelvin
Sent: Tuesday, January 29, 2013 11:49 AM
To: java-user@lucene
: Subject: Large Index Query Help!
: References: <1359429227142-4036943.p...@n3.nabble.com>
https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh emai
Arun,
Lucene uses a very light compression algorithm so I'm a little
surprised it can make indexing 2x slower. Could you run indexing under
a profiler to make sure it really is what makes indexing slower?
Thanks!
--
Adrien
-
T
Fair point, but with such small numbers per index any variation
between versions is likely to just be noise. It is also certainly
possible that on your index the compressing format may not help.
An aside: have you considered merging the thousands of small indexes
into one, with some field to iden
Hi Ian,
you're right that if we have one index of, say, 15 MB there is no problem,
but I have thousands of such indexes.
So the time will add up, with that many indexes being open
simultaneously and indexed in parallel.
Arun
On Tue, Jan 29, 2013 at 7:09 PM, Ian Lea wrote:
> I make that about
Well, Lucene is a Java API, so if you can make Java calls from an
Oracle stored procedure, maybe you can make it work. Sounds like a
terrible idea to me, but then using Oracle is, in my opinion, a
terrible idea, as are stored procedures.
How about using the stored procedure to set some "reindexthisreco
I make that about 15 MB of data - trivial. What happens if you make
each field 400 chars and index a million or two? If you really have
that few docs, what are you worrying about?
A doubling of indexing time from 3.0.2 to 4.1 is surprising, but for
40k docs are we talking about it taking 2 second
Hi
we are using an Oracle stored procedure which accepts a UDT (Java object) and
returns the results as a UDT (Java object).
We want to use this SP to update the index. Does Lucene allow updating the
index in such a scenario?
We plan to use Solr (Solr uses the Lucene library) for building the index and
searc
Hi,
> 4 JVM Flag : -Xms512m -Xmx1576m
> 5 Other app don't occupy too much memory
As Ian said, read this blog post and you will understand that Lucene is not
eating your memory. The "RES" column in TOP shows the actual memory usage
(resident memory):
PID USER PR NI VIRT RES SHR S
OK, I will describe the setup.
1 OS : Redhat 5
2 JVM : 64-bit JDK 1.7
3 Lucene 4.0
4 JVM Flag : -Xms512m -Xmx1576m
5 Other app don't occupy too much memory
My Code is as follow:
public synchronized IndexSearcher openSearcher() {
if (QueryUtil.indexExist(SearchEnv.searchEnv.indexDir)
Please try and phrase your question in terms of Lucene. Oracle?
What's that? User-defined type? What's that?
IndexWriter has various updateDocuments() methods. I usually give all
docs in my indexes a unique id, supplied by me (primary key in
database terminology) and use the method that "Update
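The update-by-unique-id pattern described above might look like the following sketch. The `writer` variable, the field names, and the "42" id are made up for illustration, and this is untested:

```java
// Build a replacement document carrying the application-supplied primary key.
Document doc = new Document();
doc.add(new StringField("id", "42", Field.Store.YES)); // untokenized key field
doc.add(new TextField("body", "new content for this record", Field.Store.NO));

// updateDocument atomically deletes any existing document whose "id" term
// is "42" and adds the new one, so re-running a stored procedure for the
// same row never produces duplicates.
writer.updateDocument(new Term("id", "42"), doc);
writer.commit();
```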
Lucene won't load the whole index into memory. See
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
What version of lucene?
How are you opening index readers?
How are you searching?
How much memory are you giving the jvm?
What else in your app is using all the memory?