Dear fellow Java developers:
I am setting up an advanced search page that is very similar to Google's
and Yahoo's. I have four text fields with the following labels:
With all of the words:
With the exact phrase:
With at least one of the words:
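For what it's worth, a common way to map those inputs onto a single Lucene
query is a BooleanQuery. Below is a sketch; the field name "body" and the
whitespace splitting are assumptions, not anything from your setup:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.TermQuery;

String allWords = "java lucene";
String exactPhrase = "advanced search";
String anyWords = "index query";

BooleanQuery query = new BooleanQuery();

// "With all of the words": every term is required.
for (String w : allWords.split("\\s+")) {
    query.add(new TermQuery(new Term("body", w)), BooleanClause.Occur.MUST);
}

// "With the exact phrase": terms must appear adjacently, in order.
PhraseQuery phrase = new PhraseQuery();
for (String w : exactPhrase.split("\\s+")) {
    phrase.add(new Term("body", w));
}
query.add(phrase, BooleanClause.Occur.MUST);

// "With at least one of the words": nested OR, required as a group.
BooleanQuery any = new BooleanQuery();
for (String w : anyWords.split("\\s+")) {
    any.add(new TermQuery(new Term("body", w)), BooleanClause.Occur.SHOULD);
}
query.add(any, BooleanClause.Occur.MUST);
```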
Hi -
One thing to consider is field norms. If your fields aren't analyzed, this
doesn't apply to you.
But if you do have norms, I believe it's one byte per field with norms x
the number of documents. It doesn't matter whether the field occurs in a
document or not; it's nTotalFields x nDocs.
So, an ind
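As a back-of-the-envelope illustration (the numbers below are hypothetical,
just to show the scale):

```java
// One byte per document for every field with norms, whether or not
// the field occurs in that document: nTotalFields x nDocs.
// Hypothetical numbers: 10,000 fields with norms, 1,000,000 docs.
long fieldsWithNorms = 10000L;
long numDocs = 1000000L;
long normsBytes = fieldsWithNorms * numDocs;
System.out.println(normsBytes / (1024 * 1024) + " MB"); // prints "9536 MB"
```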
> It's an open question whether this is more or less work than
> re-parsing the document (I infer that you have the originals
> available). Before trying to reconstruct the document I'd
> ask how often you need to do this. The gremlins coming out
> of the woodwork from reconstruction would consume
It is possible to reconstruct a document from the terms, but
it's a lossy process. Luke does this (you can see from the
UI, and the code is available). There's no utility that I know
of to make this easy.
It's an open question whether this is more or less work than
re-parsing the document (I infer that you have the originals available).
Before trying to reconstruct the document I'd ask how often you need to
do this.
On Dec 30, 2009, at 5:08 PM, tsuraan wrote:
> Suppose I have a (useful) document stored in a Lucene index, and I
> have a variant that I'd also like to be able to search. This variant
> has the exact same data as the original document, but with some extra
> fields. I'd like to be able to use an IndexReader to get the document
> that I stored, use th
> Alternatively, if one of the "regular" analyzers works for you *except*
> for lower-casing, just use that one for your mixed-case field and
> lower-case your input and send it to your lower-case field.
>
> Be careful to do the same steps when querying.
>
Thanks Erick, I hadn't thought about this.
See PerFieldAnalyzerWrapper for an easy way to implement two fields
in the same document processed with different analyzers. So basically
you're copying the input to two fields that handle things slightly
differently.
As far as re-implementing stuff, no real re-implementing is necessary,
just crea
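A minimal sketch of that setup, with the field names "content" and
"contentExact" made up for illustration:

```java
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.util.Version;

// Default analyzer lower-cases; "contentExact" keeps original case.
PerFieldAnalyzerWrapper wrapper =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_30));
wrapper.addAnalyzer("contentExact", new WhitespaceAnalyzer());

// Copy the same input into both fields; pass "wrapper" to your IndexWriter.
String text = "Some Mixed-Case Input";
Document doc = new Document();
doc.add(new Field("content", text, Field.Store.NO, Field.Index.ANALYZED));
doc.add(new Field("contentExact", text, Field.Store.NO, Field.Index.ANALYZED));
```

Remember to pick the matching field (and analyzer) at query time as well.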
Suppose I have a (useful) document stored in a Lucene index, and I
have a variant that I'd also like to be able to search. This variant
has the exact same data as the original document, but with some extra
fields. I'd like to be able to use an IndexReader to get the document
that I stored, use th
I am getting an IOException when I am doing a "real-time" search, i.e. I am
creating an index using the IndexWriter and also opening the index using an
IndexReader (writer.getReader()) to make sure the document does not exist
prior to adding it to the index file.
The code works perfectly fine multiple times ind
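For reference, one shape that pattern usually takes in 2.9/3.0 is below; it
is only a sketch, and the "id" field is an assumption. A frequent cause of
IOExceptions here is closing (or failing to reopen) the reader while it is
still in use:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

IndexReader reader = writer.getReader();        // near-real-time view
IndexSearcher searcher = new IndexSearcher(reader);
TopDocs hits = searcher.search(new TermQuery(new Term("id", docId)), 1);
if (hits.totalHits == 0) {
    writer.addDocument(doc);                    // not there yet, add it
}

// Later, refresh cheaply instead of opening a brand-new reader:
IndexReader newReader = reader.reopen();
if (newReader != reader) {
    reader.close();                             // release the stale reader
    reader = newReader;
}
```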
> System.out.println(typeAtt.type());
> ??? And this typeAtt?
>
> Thanks!
>
Yes. You can add the other attributes if you want. By the way, I forgot to
remove the (TermAttribute) and (TypeAttribute) casts; you don't need them in 3.0.0.
TermAttribute termAtt = tokenStream.getAttribute(TermAttribute.class);
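For completeness, a minimal 3.0-style loop that prints both attributes might
look like this (the analyzer, field name, and input text are placeholders):

```java
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

TokenStream ts = analyzer.tokenStream("field", new StringReader(text));
TermAttribute termAtt = ts.addAttribute(TermAttribute.class);
TypeAttribute typeAtt = ts.addAttribute(TypeAttribute.class);
while (ts.incrementToken()) {
    System.out.println(termAtt.term() + " / " + typeAtt.type());
}
ts.end();
ts.close();
```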
That would be good, if you could test it!
Please checkout Lucene 2.9 branch from svn
(http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9), compile
the whole package (at least lucene-core.jar) and then replace the lucene jar
files in solr's lib folder.
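In case it helps, the steps might look roughly like this from the command
line (the ant target name and the paths are assumptions; adjust them to your
checkout and Solr install):

```shell
svn checkout http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9 lucene_2_9
cd lucene_2_9
ant jar-core                          # builds build/lucene-core-*.jar
cp build/lucene-core-*.jar /path/to/solr/lib/
```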
Uwe
-
Uwe Schindler
H.-H.-M
Hi,
just sharing some personal experiences in this domain,
We performed some benchmarks in a similar setup (indexing millions of
documents with thousands of fields) to measure the impact of large
number of fields on a Lucene index.
We observed that the more fields you have, the more the dictionary wil
> I just want to see if it's safe to use two different analyzers for the
> following situation:
>
> I have an index that I want to preserve case with so I can do
> case-sensitive
> searches with my WhitespaceAnalyzer. However, I also want to do case
> insensitive searches.
you should also make su
System.out.println(typeAtt.type()); ??? And this typeAtt?
Thanks!
-
Mário André
Instituto Federal de Educação, Ciência e Tecnologia de Sergipe - IFS
Mestrando em MCC - Universidade Federal de Alagoas - UFAL
http://www.marioandre.
> Using PorterStemFilter and removing the stopwords, but how can I use
> TokenStream in release 3.0 (print the result of this method)?
>
> I tried to use:
>
> public static void main(String[] args) throws IOException, ParseException
> {
>     StringReader sr = new StringReader("T
Yes I can (though I need some time, since I have my nested custom analyzers
and filter). I'll try to write a test scenario to reproduce this issue.
For now, can you tell me if these steps are correct for instantiating and
using highlighter:
IndexSearcher is = new IndexSearcher(indexReader);
Quer
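For comparison, a 2.9/3.0-style sequence that tends to work with wildcard
queries rewrites the query first. This is only a sketch; the field name
"content" and the surrounding variables are assumptions:

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

IndexSearcher is = new IndexSearcher(indexReader);
Query query = parser.parse(userInput);
// Expand wildcard/prefix queries into concrete terms so the
// highlighter can see them:
Query rewritten = query.rewrite(indexReader);
Highlighter highlighter = new Highlighter(new QueryScorer(rewritten));
String fragment = highlighter.getBestFragment(analyzer, "content", fieldText);
```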
Hi,
I have the method below:
public final TokenStream tokenStream(String fieldName, Reader reader)
{
    TokenStream result = new LowerCaseTokenizer(reader);
    result = new StopFilter(true, result, stopWords, true);
    result = new PorterStemFilter(result);
    return result;
}
* Yes, I understand the first part about two rows and querying.
* The problem is, I'm not the one creating those Analyzers and storing
documents into indexes. All I can say is "add this field to the document";
it's as simple as that.
Luckily the system is built using Pico and OSGi, so I will try t
As far as I know, no problem. There's no penalty that I
know of for having this kind of setup. Of course your
mileage may vary, and a relevant question is "why do
you care?" That is, if your total index is 100M in size,
pretty much no matter how Lucene implements the internal
data structures you wo
Mohsen Saboorian wrote:
> After updating to 2.9.x or 3.0, highlighter doesn't work on wildcard queries
> like "abc*". I thought that it would be because of scoring, so I also set
> myIndexSearcher.setDefaultFieldSortScoring(true, true) before searching.
> I tested with both QueryScorer and QueryTer
After updating to 2.9.x or 3.0, highlighter doesn't work on wildcard queries
like "abc*". I thought that it would be because of scoring, so I also set
myIndexSearcher.setDefaultFieldSortScoring(true, true) before searching.
I tested with both QueryScorer and QueryTermScorer.
In my custom highligh
You'll have one problem if you can't return a different increment gap:
you'll match across rows.
Say you index row 1 with "aaa" "bbb" "ccc", then row two with
"ddd", "eee", "fff". Just adding multiple rows to a single document,
that document would match the phrase "ccc ddd".
I don't understand wh
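A sketch of the fix, assuming you control the Analyzer (the gap value is
arbitrary, just larger than any phrase slop you use):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Analyzer analyzer = new WhitespaceAnalyzer() {
    @Override
    public int getPositionIncrementGap(String fieldName) {
        return 100; // gap between successive values of the same field
    }
};

Document doc = new Document();
doc.add(new Field("rows", "aaa bbb ccc", Field.Store.NO, Field.Index.ANALYZED));
doc.add(new Field("rows", "ddd eee fff", Field.Store.NO, Field.Index.ANALYZED));
// With the gap in place, the phrase "ccc ddd" no longer matches.
```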
I have a situation where I might have 1000 different types of Lucene
Documents each with 10 or so fields with different names that get
indexed.
I am wondering if this is bad to do within Lucene. I end up with
10,000 fields within the index although any given document has only 10
or so.
I was hop
Thanks all for your interest, especially Uwe. I asked this question on
solr-user at the beginning but I got no reply. That's why I re-asked the
question at java-user.
Thanks for your efforts. I will try it now.
On Mon, Dec 28, 2009 at 12:02 PM, Uwe Schindler wrote:
> I opened https://issues.apac