Hi Shireesha,
I'm not sure what it is that you have been using, but I'm fairly sure
that you'd have to check for deprecated APIs as well as improved ones
while upgrading. 1.2 to 2.4 is certainly a huge jump, with the compound
index structure etc. coming into place.
You would have to try it and see.
Hi,
I am trying to upgrade the version of Lucene from 1.2 to 2.4. Can we do
this directly?
Is it possible to have two versions of Lucene on the same machine?
Shireesha
I had the same kind of problem and managed to find a workaround by
initializing the IndexSearcher from the new reader.
try {
IndexReader newReader = reader.reopen();
if (newReader != reader) {
// reader was reopened
}
} catch (IOException e) { /* handle or rethrow */ }
On Wed, Nov 19, 2008 at 3:27 AM, karl wettin <[EMAIL PROTECTED]> wrote:
> rewritten query. I.e. this is probably as much a store related expense
> as it is a Levenshtein calculation expense.
"this is probably *not* as much a store related.." that is.
karl
---
On an index of around 20 gigs I've been seeing a performance drop of
around 35% after upgrading to 2.4 (measured on ~1 identical
requests, executed in parallel against a threaded Lucene /
Apache setup, after a roughly 1-query warmup). The principal
changes I've made so far are just
The actual performance depends on how much you load into the index. Can
you tell us how many documents and how large these documents are that
you have in your index?
Compared with RAMDirectory I've seen performance boosts of
up to 100x in a small index that contains (1-20) Wikipedia-sized
documents
> With "Allow Filter as clause to BooleanQuery":
> https://issues.apache.org/jira/browse/LUCENE-1345
> one could even skip the ConstantScoreQuery with this.
> Unfortunately 1345 is unfinished for now.
>
That would be interesting; I'd like to see how much performance improves.
>> startup: 2811
On Wednesday 19 November 2008 00:43:56, Tim Sturge wrote:
> I've finished a query time implementation of a column stride filter,
> which implements DocIdSetIterator. This just builds the filter at
> process start and uses it for each subsequent query. The index itself
> is unchanged.
>
> The results are very impressive.
I've finished a query time implementation of a column stride filter, which
implements DocIdSetIterator. This just builds the filter at process start
and uses it for each subsequent query. The index itself is unchanged.
The results are very impressive. Here are the results on a 45M document
index:
There has been discussion in the past about how PhraseQuery artificially
requires that the Terms you add to it must be in the same field ... you
could theoretically modify PhraseQuery to have a type of query that
required terms in one field to be within (slop) N positions of a term in a
"parallel"
I'll provide a better example, perhaps it will help in formulating a
solution.
Suppose I am designing an index that stores invoices. One document
corresponds to one invoice, which has a unique id. Any number of employees
can make comments on the invoices, and comments have different
classification
Thanks for the suggestion, but I think I will need a more robust solution,
because this will only work with pairs of fields. I should have specified
that the example I gave was somewhat contrived, but in practice there could
be more than two parallel fields. I'm trying to find a general solution th
Flexible indexing (LUCENE-1458) should make this possible.
I.e. you could use your own codec which discards doc/freq/prox/payloads
during indexing (for this one field) and simply stores the term
frequency in the terms dict. However, one problem will be deletions
(in case it matters to you
How about using variable field names?
url: http://www.cnn.com/
page_description: cnn breaking news
page_title_ajax: news
page_title_paris: cnn news
page_title_daniel: homepage
username: ajax
username: paris
username: daniel
and search for +user:ajax +page_title_ajax:news or maybe just
pag
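The per-user field scheme suggested above can be sketched at indexing time like this. This is a hedged sketch against the Lucene 2.4 `Field` API; the `page_title_` prefix and the example values are simply the ones from the mail above, not an established convention.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch: one document per URL, with one title field per user who
// titled it. A query like +username:ajax +page_title_ajax:news then
// constrains the title to the matching user's field.
public class PerUserFields {
    static Document build() {
        Document doc = new Document();
        doc.add(new Field("url", "http://www.cnn.com/",
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("page_description", "cnn breaking news",
                Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("page_title_ajax", "news",
                Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("page_title_paris", "cnn news",
                Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("username", "ajax",
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("username", "paris",
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }
}
```

One trade-off of this scheme is an unbounded number of distinct field names as users grow, which inflates the field name dictionary.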
Hello,
I am designing an index in which one url corresponds to one document. Each
document also contains multiple parallel repeating fields. For example:
Document 1:
url: http://www.cnn.com/
page_description: cnn breaking news
page_title: news
page_title: cnn news
page_title: homepage
I would like to store a set of keywords in a single field of a document.
For example, I now have three keywords: "One", "Two" and "Three"
and I am going to add them into a document.
At first, is this code correct?
//
String[] keyword
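On the keyword question just above: Lucene treats repeated `Field` instances with the same name as one logical field, so the usual approach is to add the field once per keyword. A minimal sketch, assuming the 2.4 `Field` API and that the keywords should be matched exactly rather than analyzed:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch: storing a set of keywords under a single field name.
// A search on the "keyword" field then matches any of the values.
public class KeywordDoc {
    static Document build(String[] keywords) {
        Document doc = new Document();
        for (String kw : keywords) {
            doc.add(new Field("keyword", kw,
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
        }
        return doc;
    }
}
```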
Hi all,
I am wondering if the raw scores obtained from HitCollector can be used to
compare relevance of documents to different queries?
E.g. two phrase queries are issued (PQ1: "Barack Obama" and PQ2: "John
McCain"). If a document (doc1) belongs to the result sets of both queries
and has th
What analyzer are you using at index and search time? Typical problems
include:
using an analyzer that doesn't understand accented chars (StandardAnalyzer
for instance)
using a different analyzer during search and index.
Search the user list for "accent" and you'll find this kind of problem
discussed
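As a rough illustration of what accent folding does, here is a plain-JDK approximation using `java.text.Normalizer` (available since Java 6). This is not how ISOLatin1AccentFilter is actually implemented, but for Spanish text the effect is similar: decompose accented characters, then drop the combining marks.

```java
import java.text.Normalizer;

public class AccentStripper {
    // NFD decomposition splits "á" into "a" + combining acute accent;
    // \p{M} then removes all combining marks.
    static String strip(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("análisis rápido")); // prints "analisis rapido"
    }
}
```

If both the indexing and the query analyzer apply the same folding, accented and unaccented spellings will match each other.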
Hi!
I'm having problems with entities including special characters (Spanish
language) not getting indexed.
I haven't been able to find the reason why some entities get indexed
while some don't.
I have 3 fields that (currently) hold the same value. The value for the
fields is example "¡
Naming this class to include "Latin2" may be misleading.
Latin2 means ISO-8859-2 character set.
http://en.wikipedia.org/wiki/ISO_8859-2
> From: Uwe Goetzke [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, November 18, 2008 7:26 AM
> To: java-user@lucene.apache.org
> Cc: [EMAIL PROTECTED]
> Subject: A
Sascha Fahl wrote:
Where do I get the CharFilter library? I'm using Lucene, not Solr.
Thanks,
Sascha
CharFilter is included in recent Solr nightly build.
It is not an OOTB solution for Lucene now, sorry.
If I have time, I will make it for Lucene this weekend.
Koji
--
You are right.
Cheers,
Zhibin
From: Chris Lu <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, November 17, 2008 11:13:44 PM
Subject: Re: how to estimate how much memory is required to support the large
index search
So looks like you are not
Where do I get the CharFilter library? I'm using Lucene, not Solr.
Thanks,
Sascha
On 18.11.2008 at 14:11, Koji Sekiguchi wrote:
Uwe Goetzke wrote:
> Use ISOLatin1AccentFilter, although it is not perfect...
> So I made ISOLatin2AccentFilter for me and changed this method.
Or use CharFilter library.
Uwe Goetzke wrote:
> Use ISOLatin1AccentFilter, although it is not perfect...
> So I made ISOLatin2AccentFilter for me and changed this method.
Or use CharFilter library. It is for Solr as of now, though.
See:
https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
Well... we certainly do our best to have each release be stable, but
we do make mistakes, so you'll have to use your own judgement on when
to upgrade.
However, it's only through users like yourself upgrading that we then
find & fix any uncaught issues in each new release.
Mike
Ganesh w
Use ISOLatin1AccentFilter, although it is not perfect...
So I made ISOLatin2AccentFilter for me and changed this method.
We use our own analysers, so you would use something like this
result = new
org.apache.lucene.analysis.WhitespaceTokenizer(reader);
result = new
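Read in context, the truncated snippet above is presumably building a filter chain inside a custom Analyzer. A minimal sketch of that pattern using the stock ISOLatin1AccentFilter (Lucene 2.4 API assumed; the exact filters in the original mail are unknown):

```java
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.ISOLatin1AccentFilter;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Sketch: a custom analyzer chaining an accent filter after the
// tokenizer. The same chain must be used at index and search time.
public class AccentFoldingAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new WhitespaceTokenizer(reader);
        result = new ISOLatin1AccentFilter(result);
        result = new LowerCaseFilter(result);
        return result;
    }
}
```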
I am creating IndexSearcher using a String; this is working fine with version
2.3.2.
I tried replacing it with the Directory ctor of IndexSearcher and it is
working fine with v2.4.
I have recently upgraded from v2.3.2 to 2.4. Is v2.4 stable, so that I can
move forward with it, or shall I revert back to 2.3?
Hi,
what is the best way to transform the German umlauts ö, ä, ü, ß into oe, ae,
ue, ss during the process of analyzing?
Thanks,
Sascha Fahl
Softwareentwicklung
evenity GmbH
Zu den Mühlen 19
D-35390 Gießen
Mail: [EMAIL PROTECTED]
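Note that ISOLatin1AccentFilter maps ö to o rather than oe, so the German digraph convention needs a custom mapping. A minimal plain-Java sketch of the character substitution itself (the class and method names are just illustrative):

```java
public class UmlautFolder {
    // Replace each German umlaut/eszett with its digraph equivalent,
    // leaving all other characters untouched.
    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case 'ä': sb.append("ae"); break;
                case 'ö': sb.append("oe"); break;
                case 'ü': sb.append("ue"); break;
                case 'ß': sb.append("ss"); break;
                case 'Ä': sb.append("Ae"); break;
                case 'Ö': sb.append("Oe"); break;
                case 'Ü': sb.append("Ue"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Gießen")); // prints "Giessen"
    }
}
```

In a Lucene analyzer this logic would live in a TokenFilter applied at both index and search time.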
--
Did you create your IndexSearcher using a String or File (not
Directory)?
If so, it sounds like you are hitting this issue (just fixed this
morning, on 2.9-dev (trunk)):
https://issues.apache.org/jira/browse/LUCENE-1453
The workaround is to use the Directory ctor of IndexSearcher.
M
Hello all,
I am using version 2.4. The following code throws an AlreadyClosedException:
IndexReader reader = searcher.getIndexReader();
IndexReader newReader = reader.reopen();
if (reader != newReader) {
reader.close();
boolean isCurrent = newReader.isCurrent();
}
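A likely cause of the AlreadyClosedException above is that the old searcher still wraps the reader that was just closed. A sketch of the usual ordering, assuming the Lucene 2.4 API: build the new searcher first, then retire the old reader.

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Sketch: refresh a searcher via IndexReader.reopen() without
// closing a reader that a live searcher still depends on.
public class SearcherRefresher {
    private IndexReader reader;
    private IndexSearcher searcher;

    public synchronized void maybeRefresh() throws IOException {
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            // Swap in the new searcher before closing the old reader,
            // so no live searcher ever holds a closed reader.
            IndexReader oldReader = reader;
            searcher = new IndexSearcher(newReader);
            reader = newReader;
            oldReader.close();
        }
    }
}
```

In a multi-threaded setup the old reader should only be closed once all in-flight searches against it have completed.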
Can you post the code fragment in AccentFilter.java that's setting the
Token?
In 2.4 we added that check (for IllegalArgumentException) to ensure
you don't setTermLength to something longer than the current term
buffer. You should call resizeTermBuffer() first, then fill in the
char[]
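The check described above means a filter must grow the token's buffer before setting a longer term length. A sketch against the 2.4 `Token` API (the helper class and method names are just illustrative):

```java
import org.apache.lucene.analysis.Token;

// Sketch: writing a replacement term of arbitrary length into a Token.
// resizeTermBuffer() guarantees capacity; setTermLength() must never
// exceed it, or 2.4 throws IllegalArgumentException.
public class TokenRewriter {
    static void setTerm(Token token, char[] newText, int length) {
        token.resizeTermBuffer(length);                        // grow first
        System.arraycopy(newText, 0, token.termBuffer(), 0, length);
        token.setTermLength(length);                           // then set length
    }
}
```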
BTW, upcoming changes in Lucene for flexible indexing should improve
the RAM usage of the terms index substantially:
https://issues.apache.org/jira/browse/LUCENE-1458
In the current (first) iteration on that patch, TermInfo is no longer
used at all when loading the index. I think for