Oh yes, I also use Spring Cache, which works fine, and I don't have to store
products in Lucene, making the index smaller and faster.
On Fri, 23 Sept 2022, 19:26 Stephane Passignat wrote:
> Hi
>
> I wouldn't store the original value. That's "just" an index. But store
> the value of your db identifiers
Well, my bad: I used the wrong word. I'm not storing, just giving
keywords to the analyzer. That was my mistake in writing. So far I don't index
exotic letters, just normalized ones.
Additionally, I put in the index something like "Prod_3443", which is a product
ID for the situation when a specific product is
Good point!
For now I'll leave it normalized. Every search term coming from the frontend is
stored and its counter updated, which will help me after some time to
see trends and decide whether to change the logic or not.
P.S. Here is the funny part: in Croatian "pišanje" means peeing while
"pisanje" means writing.
Hi
I wouldn't store the original value. That's "just" an index. But store the
value of your db identifiers, because I think you'll want it at some point. (I
made the same kind of feature on top of DataNucleus.)
I've always had tech ids in my db, even more since I started to use JDO/JPA some
20
I think it depends on how precise you want to make the search. If you
want to enable diacritic-sensitive search, in order to avoid confusion
when users actually are able to enter the diacritics, you can index
both ways (ASCII-folded and not folded) and not normalize the query
terms. Or you can just fo
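A minimal sketch of that both-ways idea, assuming a recent Lucene (5+) where
ASCIIFoldingFilter takes a preserveOriginal flag; the class name is made up
for illustration:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

// Emits the folded and the original token at the same position, so both
// "pišanje" and "pisanje" stay searchable without normalizing the query.
public class FoldingAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        TokenStream stream = new ASCIIFoldingFilter(source, true); // true = preserveOriginal
        return new TokenStreamComponents(source, stream);
    }
}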
Hi Stephane!
Actually, I have exactly that kind of conversion, but I didn't mention it as
my mail was long enough without it :)
My main concern is whether I should let Lucene index the original keywords or not.
Considering what you wrote, I guess your answer would be to store only
converted values without exotic letters.
Hello,
The way I did it took me some time, and I'm almost sure it's applicable to all
languages.
I normalized the words, replacing letters or groups of letters with another,
similar-sounding one.
In French, e é è ê ai ei sound a bit the same, and for someone who makes
writing mistakes, having to use the right lett
Hi!
I'm using Hibernate Search / Lucene to index my entities in a Spring Boot
application.
One thing I'm not sure about is how to handle Croatian-specific letters.
The Croatian language has a few additional letters: č Č ć Ć đ Đ š Š ž Ž.
Letters đ and Đ are commonly replaced with "dj" and "DJ"
I put a comment on the StackOverflow question.
Mike McCandless
http://blog.mikemccandless.com
On Fri, May 30, 2014 at 2:51 AM, Gaurav gupta
wrote:
> Hi,
>
> I am implementing NRT and looking for the best practice to implement it. I
> found that from the 4.4.0 release onwards the Near Real
Hi,
I am implementing NRT and looking for the best practice to implement it. I
found that from the 4.4.0 release onwards the Near Real Time Manager
(org.apache.lucene.search.NRTManager) has been replaced by
ControlledRealTimeReopenThread. But as per the Javadoc it appears
"experimental".
Please advise
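For reference, a minimal sketch of the 4.4-era wiring (TrackingIndexWriter
plus SearcherManager); the openWriter() helper and the staleness values are
assumptions:

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TrackingIndexWriter;
import org.apache.lucene.search.ControlledRealTimeReopenThread;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;

IndexWriter writer = openWriter(); // hypothetical helper returning an open writer
TrackingIndexWriter trackingWriter = new TrackingIndexWriter(writer);
SearcherManager manager = new SearcherManager(writer, true, new SearcherFactory());

ControlledRealTimeReopenThread<IndexSearcher> nrtThread =
    new ControlledRealTimeReopenThread<>(trackingWriter, manager,
        5.0,    // reopen at least every 5 s
        0.025); // ...but within 25 ms when a caller waits on a generation
nrtThread.setDaemon(true);
nrtThread.start();

Document doc = new Document();
long gen = trackingWriter.addDocument(doc); // remember the update's generation

nrtThread.waitForGeneration(gen); // block until that change is searchable
IndexSearcher searcher = manager.acquire();
try {
    // run the query...
} finally {
    manager.release(searcher);
}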
Thank you, that helped me a lot.
Sven Teichmann
__
Software for Intellectual Property GmbH
Gewerbering 14a
83607 Holzkirchen (Germany)
Phone: +49 (0)8024 46699-00
Fax: +49 (0)8024 46699-02
E-Mail: s.teichm...@s4ip.de
Local Court of Munich
On Tue, May 13, 2014 at 1:34 AM, Sven Teichmann wrote:
> Hi,
>
> I also found this response very useful and right now I am playing around
> with DocValues.
>
>> If the default DocValuesFormat isn't fast enough, you can always
>> switch to e.g. DirectDocValuesFormat (uses lots of RAM but it's just an
Hi,
I also found this response very useful and right now I am playing around
with DocValues.
If the default DocValuesFormat isn't fast enough, you can always
switch to e.g. DirectDocValuesFormat (uses lots of RAM but it's just an
array lookup).
How do I switch to DirectDocValuesFormat? And how do
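A hedged sketch of the usual per-field Codec override, against the 4.6-era
API (the field name is hypothetical; DirectDocValuesFormat lives in the
lucene-codecs jar, which must also be on the classpath at search time):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.DocValuesFormat;
import org.apache.lucene.codecs.lucene46.Lucene46Codec;
import org.apache.lucene.codecs.memory.DirectDocValuesFormat;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.util.Version;

IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46,
        new StandardAnalyzer(Version.LUCENE_46));
iwc.setCodec(new Lucene46Codec() {
    @Override
    public DocValuesFormat getDocValuesFormatForField(String field) {
        if ("myKey".equals(field)) {            // hypothetical field name
            return new DirectDocValuesFormat(); // RAM-heavy, plain array lookup
        }
        return super.getDocValuesFormatForField(field);
    }
});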
Hey Mike,
That was a very useful response, also for long-time Lucene users like
myself who were stuck in legacy ways of doing things!
I managed to easily change the indexing of keys to DocValues and found myself
wondering why I did not get anything returned; it appears indexing works
transparent to an
Doc values are far faster than a stored field.
If the default DocValuesFormat isn't fast enough, you can always
switch to e.g. DirectDocValuesFormat (uses lots of RAM but it's just an
array lookup).
Mike McCandless
http://blog.mikemccandless.com
On Tue, May 6, 2014 at 4:33 AM, Sven Teichmann wrote:
Hi,
I would index it in a field; you can use the database id and even add
additional information to compose your own key, and retrieve that (only)
when you collect search results.
Wouter
> Hi,
>
> what is the best way to retrieve our "real" ids (as they are in our
> database) while searching?
>
>
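A minimal sketch of that suggestion against the 4.x-era API (the field name
and value are made up), assuming an open IndexSearcher named searcher and a
Query named query:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// index time: carry the database key inside each Lucene document
Document doc = new Document();
doc.add(new StringField("dbId", "12345", Field.Store.YES));

// search time: map each hit straight back to the database id
TopDocs hits = searcher.search(query, 10);
for (ScoreDoc hit : hits.scoreDocs) {
    String dbId = searcher.doc(hit.doc).get("dbId");
    // look up the row/entity by dbId ...
}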
Hi,
what is the best way to retrieve our "real" ids (as they are in our
database) while searching?
Right now we generate a file after indexing which contains all Lucene
docids and the matching id in our database. Our own Collector converts
the docids to our ids while collecting. This works a
ScoreDoc[] sd = is.search(query, 10).scoreDocs;
for (ScoreDoc scoreDoc : sd) {
    System.out.println(ir.document(scoreDoc.doc));
}
is.close();
ir.close();
iw.close();
*--Snip--*
--
Anshum Gupta
http://ai-cafe.blogspot.com
On Fri, Apr 15,
I know that it's best practice to reuse the Document object when
indexing, but I'm curious how multi-valued fields affect this. I tried
this before indexing each document:
doc.removeFields(myMultiValuedField);
for (String fieldName : fieldNames) {
    Field field = doc.getField(fieldName);
On Tue, Mar 29, 2011 at 6:56 PM, Christopher Condit wrote:
> Ideally I'd like to have the parser use the
> custom analyzer for everything unless it's going to parse a clause into
> a PhraseQuery or a MultiPhraseQuery, in which case it uses the
> SimpleAnalyzer and looks in the _exact field - but
MultiPhraseQuery, in which case it uses the
SimpleAnalyzer and looks in the _exact field - but I can't figure out
the best way to accomplish this.
Has anyone else encountered the same problem?
Is there a best practice for doing this - or something much
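One way to get that routing, sketched against a recent classic QueryParser
(the field names and customAnalyzer are assumptions): PerFieldAnalyzerWrapper
gives the _exact field its own analyzer, and getFieldQuery's quoted flag
marks the clauses that will become (Multi)PhraseQueries:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import java.util.Map;

Analyzer analyzer = new PerFieldAnalyzerWrapper(customAnalyzer,
        Map.of("content_exact", new SimpleAnalyzer()));

QueryParser parser = new QueryParser("content", analyzer) {
    @Override
    protected Query getFieldQuery(String field, String queryText, boolean quoted)
            throws ParseException {
        if (quoted && "content".equals(field)) {
            // phrase clause: send it to the _exact field instead
            return super.getFieldQuery("content_exact", queryText, true);
        }
        return super.getFieldQuery(field, queryText, quoted);
    }
};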
use IndexSearcher with MultiReader?
>
> Regards
> Ganesh
>
> - Original Message -
> From: "Robert Muir"
> To:
> Sent: Saturday, November 27, 2010 1:28 AM
> Subject: Re: best practice: 1.4 billions documents
>
>
>> On Fri, Nov 26, 2010 at 12:4
- Original Message -
From: "Robert Muir"
To:
Sent: Saturday, November 27, 2010 1:28 AM
Subject: Re: best practice: 1.4 billions documents
> On Fri, Nov 26, 2010 at 12:49 PM, Uwe Schindler wrote:
>> This is the problem for Fuzzy: each searcher expands the fuzzy quer
On Fri, Nov 26, 2010 at 12:49 PM, Uwe Schindler wrote:
> This is the problem for Fuzzy: each searcher expands the fuzzy query to a
> different Boolean Query and so the scores are not comparable - MultiSearcher
> (but not Solr) tries to combine the resulting rewritten queries into one
> query, so e
On Mon, Nov 22, 2010 at 12:49 PM, Uwe Schindler wrote:
> (Fuzzy scores on
> MultiSearcher and Solr are totally wrong because each shard uses another
> rewritten query).
Hmmm, really? I thought that fuzzy scoring should just rely on edit distance?
Oh wait, I think I see - it's because we can use
Thanks for the input.
My results are sorted by date and I am not much bothered about score. Will I
still be in trouble?
Regards
Ganesh
- Original Message -
From: "Robert Muir"
To:
Sent: Thursday, November 25, 2010 1:45 PM
Subject: Re: best practice: 1.4 billions document
On Thu, Nov 25, 2010 at 2:58 AM, Uwe Schindler wrote:
> ParallelMultiSearcher, as a subclass of MultiSearcher, has the same problems.
> These are not crashes, but rather that some queries do not return correctly
> scored results. This affects especially all MultiTermQueries
> (TermRang
> Since there was a debate about using multisearcher, what about using
> ParallelMultiSearcher?
>
> I am having indexe
now I haven't faced any issue. I
used Lucene 2.9 and recently upgraded to 3.0.2.
Do I need to switch to MultiReader?
Regards
Ganesh
- Original Message -
From: "Luca Rondanini"
To:
Sent: Monday, November 22, 2010 11:29 PM
Subject: Re: best practice: 1.4 billions docu
eheheheh,
1.4 billion documents = 1,400,000,000 documents, for almost 2T = 2
terabytes = 2000 gigabytes on disk!
On Mon, Nov 22, 2010 at 10:16 AM, wrote:
> > of course I will distribute my index over many machines:
> > store everything on
> > one computer is just crazy, 1.4B docs is going to b
> of course I will distribute my index over many machines:
> store everything on
> one computer is just crazy, 1.4B docs is going to be an index
> of almost 2T
> (in my case)
billion = giga (10^9) in English
billion = tera (10^12) in non-English usage
2T docs = 2.000.000.000.000 docs... ;)
AFAIK 2^31 - 1 docs is
On Mon, Nov 22, 2010 at 12:17 PM, Uwe Schindler wrote:
> The latest discussion was more about MultiReader vs. MultiSearcher.
>
> But you are right, 1.4 B documents is not easy, especially when your
> index grows and you get to the 2.1 B marker; then no MultiSearcher or
> whatever helps.
>
> O
> Am I the only one who thinks this is not the way to go? MultiReader (or
> MultiSearcher) is not going to fix your problems. Having 1.4B documents on
> one machine is a big number, no matter how you
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Monday, November 22, 2010 11:19 AM
To: java-user@lucene.apache.org
Subject: RE: best practice: 1.4 billions documents
There is no reason to use MultiSearcher instead of the much more consistent and
effective MultiReader! We (Robert and I) are
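A minimal 3.x-era sketch of the MultiReader approach (directory paths
hypothetical):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;
import java.io.File;

IndexReader r1 = IndexReader.open(FSDirectory.open(new File("/indexes/shard1")));
IndexReader r2 = IndexReader.open(FSDirectory.open(new File("/indexes/shard2")));
// one logical index over all shards; closes sub-readers when closed
IndexReader multi = new MultiReader(new IndexReader[] { r1, r2 }, true);
IndexSearcher searcher = new IndexSearcher(multi);
// MultiTermQuery rewrites (fuzzy, range, ...) now happen once, over the
// combined index, so scores stay comparable -- unlike with MultiSearcher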
Hi David, thanks for your answer. It really helped a lot! So, you have an
index with more than 2 billion segments. This is pretty
thank you both!
Johannes, Katta seems interesting but I will need to solve the problem of
"hot" updates to the index.
Yonik, I see your point - so your suggestion would be to build an
architecture based on ParallelMultiSearcher?
On Sun, Nov 21, 2010 at 3:48 PM, Yonik Seeley wrote:
> On Sun, No
On Sun, Nov 21, 2010 at 6:33 PM, Luca Rondanini
wrote:
> Hi everybody,
>
> I really need some good advice! I need to index in Lucene something like 1.4
> billion documents. I have experience with Lucene but I've never worked with
> such a big number of documents. Also this is just the number of docs
Hi Luca,
Katta is an open-source project that integrates Lucene with Hadoop
http://katta.sourceforge.net
Johannes
2010/11/21 Luca Rondanini
> Hi everybody,
>
> I really need some good advice! I need to index in Lucene something like
> 1.4
> billion documents. I have experience with Lucene but I'
Hi everybody,
I really need some good advice! I need to index in Lucene something like 1.4
billion documents. I have experience with Lucene but I've never worked with
such a big number of documents. Also, this is just the number of docs at
"start-up": they are going to grow, and fast.
I don't have to
Off the top of my head...
1) is certainly easiest. This looks suspiciously like synonyms. That is, at
index time you inject the ID as a synonym in the text and it gets indexed at
the same position as the token. Why this helps is that phrase queries then
continue to work. Lucene in Action
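A sketch of such an injecting TokenFilter (not from the thread; the class
name and the term-to-id map are made up). It emits the entity id at the same
position as the recognized token, which is what keeps phrase queries working:

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.util.AttributeSource;

public final class EntityIdInjectFilter extends TokenFilter {
    private final Map<String, String> termToId; // e.g. "lucene" -> "ENT_42"
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncAtt =
            addAttribute(PositionIncrementAttribute.class);
    private String pendingId;
    private AttributeSource.State savedState;

    public EntityIdInjectFilter(TokenStream input, Map<String, String> termToId) {
        super(input);
        this.termToId = termToId;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (pendingId != null) {
            restoreState(savedState);             // copy offsets etc. of the match
            termAtt.setEmpty().append(pendingId); // emit the id...
            posIncAtt.setPositionIncrement(0);    // ...at the same position
            pendingId = null;
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        String id = termToId.get(termAtt.toString());
        if (id != null) {       // matched a recognized entity:
            pendingId = id;     // queue the synonym token
            savedState = captureState();
        }
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        pendingId = null;
        savedState = null;
    }
}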
I'm curious about embedding extra information in an index (and being able to
search the extra information as well). In this case certain tokens correspond
to recognized entities with ids. I'd like to get the ids into the index so that
searching for the id of the entity will also return that docu
Ahmet,
Thanks for your suggestion. Could you explain more about this, or give me
a reference article that explains the reason in detail?
Thanks
On Tue, Mar 23, 2010 at 6:33 PM, Ahmet Arslan wrote:
>
>
> > I'd like to use synonyms in my project. And I think
> > there are two
> > candidate s
Index time is a much better approach. The only negative about it is the
index size increase. I've used it for a considerably sized dataset, and even
the indexing time doesn't seem to go up much.
Searching for multiple terms is generally unoptimized compared to doing it
with one.
--
Anshum Gupta
Nau
Hi all,
I'd like to use synonyms in my project, and I think there are two
candidate solutions:
1. use synonyms at the indexing stage, enhancing the index;
2. use synonyms at the search stage, enhancing the search query.
I'd like to know which one is better
See http://wiki.apache.org/lucene-java/ImproveIndexingSpeed for plenty
of tips. Suggested by Mike just a few hours ago in another thread ...
--
Ian.
On Mon, Mar 15, 2010 at 2:41 PM, Murdoch, Paul wrote:
> Hi,
>
>
>
> I'm using Lucene 2.9.2. Currently, when creating my index, I'm calling
> ind
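For 2.9.x, the highest-leverage tips from that wiki page boil down to
something like this sketch (the path and buffer size are placeholders):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

IndexWriter writer = new IndexWriter(
        FSDirectory.open(new File("/path/to/index")),
        new StandardAnalyzer(Version.LUCENE_29),
        true,                                   // create a new index
        IndexWriter.MaxFieldLength.UNLIMITED);
writer.setRAMBufferSizeMB(128);   // flush by RAM usage, not per-doc
writer.setUseCompoundFile(false); // faster, at the cost of file descriptors
// ... addDocument() from several threads; IndexWriter is thread-safe ...
writer.close();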
Hi,
I'm using Lucene 2.9.2. Currently, when creating my index, I'm calling
indexWriter.addDocument(doc) for each Document I want to index. The
Documents aren't large and I'm averaging indexing about 500 documents
every 90 seconds. I'd like to try and speed this up, unless 90
seconds for 50
Use IndexWriter.getReader to get a near real-time reader, after making
changes...
Mike
On Mon, Feb 8, 2010 at 3:45 AM, NanoE wrote:
>
> Hello,
>
> I am writing a small library search and want to know what the best
> practice is for Lucene 3.0.0 for almost-real-time index updates.
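A minimal 3.0-era sketch of that pattern (writer and doc assumed to already
exist):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

writer.addDocument(doc);
IndexReader reader = writer.getReader(); // NRT: sees uncommitted changes
IndexSearcher searcher = new IndexSearcher(reader);
// after further updates, refresh cheaply:
IndexReader newReader = reader.reopen();
if (newReader != reader) {
    reader.close();
    reader = newReader;
    searcher = new IndexSearcher(reader);
}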
Hello,
I am writing a small library search and want to know what the best
practice is for Lucene 3.0.0 for almost-real-time index updates.
Thanks Nano
Phew :) Thanks for bringing closure!
Mike
On Fri, Nov 27, 2009 at 6:02 AM, Michael McCandless
wrote:
> If in fact you are using CFS (it is the default), and your OS is
> letting you use 10240 descriptors, and you haven't changed the
> mergeFactor, then something is seriously wrong. I would tri
You were right, my bad...
I have an async reader closing on a scheduled basis (after the writer
refreshes the index, so as not to interrupt ongoing searches), but while
I'd set up the scheduling for my first two indexes, I forgot it in
my third... oh dear...
Thanks anyway for the info, it was useful
If in fact you are using CFS (it is the default), and your OS is
letting you use 10240 descriptors, and you haven't changed the
mergeFactor, then something is seriously wrong. I would triple check
that all readers are being closed.
Or... if you list the index directory, how many files do you see?
On Fri, Nov 27, 2009 at 11:37 AM, Michael McCandless
wrote:
> Are you sure you're closing all readers that you're opening?
Absolutely. :) (okay, never say this, but I had bugz because of this
previously so I'm pretty sure that one is ok).
> It's surprising with normal usage of Lucene that you'd
Are you sure you're closing all readers that you're opening?
It's surprising with normal usage of Lucene that you'd run out of
descriptors, with its default mergeFactor (have you increased the
mergeFactor)?
You can also enable compound file, which uses far fewer file
descriptors, at some cost to
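For the 2.x/3.x API that would look roughly like this (writer assumed open;
compound files trade some indexing/search speed for far fewer descriptors):

writer.setUseCompoundFile(true); // one .cfs per segment instead of many files
writer.setMergeFactor(10);       // the default; raising it multiplies open segments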
Hi,
I've a requirement that involves frequent, batched updates of my Lucene
index. This is done by a memory queue and a process that periodically
wakes and processes that queue into the Lucene index.
If I do not optimize my index, I'll receive a "too many open files"
exception (yeah, right, I can get th
The best practice is, well, "It Depends" (tm). First off, I wouldn't do any
caching of results unless and until you had a reasonable certainty that
you had performance issues, so that would be my first choice. And if
you *did* start to see performance issues, I'd look first at
Subject: Re: what's the best practice for getting "next page" of hits?
Date: Thu, 19 Feb 2009 10:48:02 +0530
Your solution (b) is better than using your own way of paging.
Do the search for every page and collect the (pageno * count) results, discard
(pageno
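A sketch of that option (b) against the 2.4-era API (searcher, query, and the
page numbers are placeholders):

import org.apache.lucene.document.Document;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopFieldDocs;

int page = 3, pageSize = 10; // 1-based page number
TopFieldDocs top = searcher.search(query, null, page * pageSize, new Sort());
int end = Math.min(page * pageSize, top.totalHits);
for (int i = (page - 1) * pageSize; i < end; i++) {
    Document d = searcher.doc(top.scoreDocs[i].doc);
    // render the hit ...
}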
R2.4
So, I may well be missing something here, but: I use
IndexSearcher.search(someQuery, null, count, new
Sort());
to get an instance of TopFieldDocs (the "Hits" class is deprecated). So far, all
fine; I get a bunch of documents. Now, what is the Lucene best practice for
getting the *n
I like the point about doing things the easiest way possible until it starts
to become a problem.
Thank you very much for your answers and for the insight into how you handle
this issue. You helped me a lot.
Ilwes
...details you have/expect to have, because the answer varies depending
upon what you need/expect.
Best
Erick
On Fri, Jan 30, 2009 at 10:08 AM, ilwes wrote:
>
> Hello,
>
> I googled, searched this Forum and read the manual, but I'm not sure what
> would be the best practic
> That answer is fine, but there are others. We store denormalized
- just what is returned for product searches.
Overall I don't think there is a single best practice recommendation.
As so often, it depends on your setup, requirements and preferences.
--
Ian.
On Fri, Jan 30, 2009 at 3:13 PM, Nilesh Thatte wrote:
> Hello
>
> I would store normalised
Hello
I would store normalised data in MySQL and index only searchable content in
Lucene.
Regards
Nilesh
From: ilwes
To: java-user@lucene.apache.org
Sent: Friday, 30 January, 2009 15:08:10
Subject: Best Practice for Lucene Search
Hello,
I googled
Hello,
I googled, searched this Forum and read the manual, but I'm not sure what
would be the best practice for Lucene search.
I have an e-commerce application with about 10 MySQL tables for my products.
And I have an index (which is working fine), with about 10 fields for every
product.
OK, sounds good. Fall will be here before you know it!
Mike
Christopher Kolstad wrote:
The only way to make this work with svn is if you can have svn
perform a
switch without doing any removal, then restart your IndexSearcher,
then do a
normal svn switch to remove the now unused files.
OK, got it.
The only way to make this work with svn is if you can have svn perform
a switch without doing any removal, then restart your IndexSearcher,
then do a normal svn switch to remove the now unused files. Does svn
have an option to "switch but don't remove any removed files"?
Bec
Hi.
First, thanks for the reply.
> Why does SubversionUpdate require shutting down the IndexSearcher? What
> goes wrong?
SubversionUpdate requires shutting down the IndexSearcher in our current
implementation because the old index files are deleted in the tag we're
switching to. Sorry, just rea
Why does SubversionUpdate require shutting down the IndexSearcher?
What goes wrong?
You might want to switch instead to rsync.
A Lucene index is fundamentally write-once, so syncing changes over
should simply be copying over new files and removing now-deleted
files. You won't be able
Hi.
Currently using Lucene 2.3.2 in a Tomcat webapp. We have an action
configured that performs reindexing on our staging server. However, our live
server cannot reindex since it does not have the necessary DTD files to
process the XML.
To update the index on the live server we perform a subvers
Hello,
I would like to search documents by "CUSTOMER".
So I search on the field "CUSTOMER" using a KeywordAnalyzer.
The CUSTOMER field is indexed with these params:
Field.Index.UN_TOKENIZED
Field.Store.YES
Is this the best practice?
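That combination is the usual one for exact-match keys. A 2.x-era sketch (the
value is a placeholder); note that with an un-tokenized field a plain
TermQuery works, and KeywordAnalyzer only matters when the value goes through
QueryParser:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// index time: the whole value becomes a single term
Document doc = new Document();
doc.add(new Field("CUSTOMER", "ACME Corp",
        Field.Store.YES, Field.Index.UN_TOKENIZED));

// search time: exact match on that single term
Query q = new TermQuery(new Term("CUSTOMER", "ACME Corp"));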
Oh rats. Thunderbird ate the indenting. The two examples should be:
multipart/alternative
    text/plain
    multipart/related
        text/html
        image/gif
        image/gif
application/msword
and
multipart/related
    text/html
    image/
lude wrote:
You also mentioned indexing each bodypart ("attachment") separately.
Why?
To my mind, there is no use case where it makes sense to search a
particular bodypart
I will give you the use case:
[snip]
3.) The result list would show this:
1. mail-1 'subject'
'Abstract of the messa
Hi Johan,
thanks again for the many words and explanations!
You also mentioned indexing each bodypart ("attachment") separately.
Why?
To my mind, there is no use case where it makes sense to search a
particular bodypart
I will give you the use case:
1.) User searches for "abcd"
2.) Luc
Subject: Best Practice: emails and file-attachments
Hello,
Does anybody have an idea of the best design approach for realizing
the following:
The goal is to index
Hi John,
thanks for the detailed answer.
You wrote:
If you're indexing a
multipart/alternative bodypart then index all the MIME headers, but only
index the content of the *first* bodypart.
Does this mean you index just the first file-attachment?
What do you advise, if you have to index mulitp