Well, thanks, sounds like the best option to me. Does anybody use the
PerFieldAnalyzerWrapper? I'm just curious to know if there is any impact on the
performance when using different analyzers.
Mélanie
-Original Message-
From: Doron Cohen [mailto:[EMAIL PROTECTED]
Sent: Thursday, Ma
If language is known also at search time, PerFieldAnalyzerWrapper seems a
nice third option: single document per feed, with a separate field for each
language, additional field(s) for the common data; using
PerFieldAnalyzerWrapper at both indexing and search; using FieldSelector
at search to retr
OOPs!!!
Sorry,
My last message was sent here by mistake. It was meant for someone else. It was just a
silly mistake.
Sorry, everyone.
- Original Message
From: aslam bari <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, 22 March, 2007 12:12:57 PM
Subject: Re: indexing rss feed
Hi,
Have a look at my resume attached to this mail. If it suits you, let me know.
Thanks...
- Original Message
From: Melanie Langlois <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, 22 March, 2007 11:33:03 AM
Subject: indexing rss feeds in multiple languages
Hi,
Martin,
This sounds like the spellchecker dictionary needs to be built in parallel with
the main Lucene index. Is it possible to create a dictionary out of an
existing (and no longer modified) Lucene index?
Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.si
Lokeya <[EMAIL PROTECTED]> wrote on 21/03/2007 22:09:06:
>
> Initially I was writing into the Index 7,00,000 times. I changed the code to
> now write only 70 times, which means I am putting a lot of data in an array
> list, adding it to the doc, and indexing in one shot. This is where the improvement
> came from
Hi,
I saw that there are many posts on the mailing list about indexing in multiple
languages, so I will try not to post a duplicate question. In my case, I want to
index RSS feeds, where one feed contains several items in different languages and
some common data for all the items (date, source, etc.).
I have indexed objects that contain one or more attachments. Each attachment is
indexed as a separate Document along with the object metadata.
When I make a search, I may get hits in more than one Document that refer to the
same object. I have a HitCollector which knows if the object has alre
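A minimal sketch of one way to collapse several Document hits onto a single object, assuming each hit can be mapped to an object id and a score. ObjectHitCollector, objectId and results() are hypothetical names used for illustration; this is plain Java and does not extend Lucene's HitCollector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: keeps one entry per object, however many
// attachment Documents of that object matched the query.
class ObjectHitCollector {
    // Insertion-ordered map from object id to the best score seen so far.
    private final Map<String, Float> bestScoreByObject =
            new LinkedHashMap<String, Float>();

    // Call once per matching Document, passing the id of the owning object.
    void collect(String objectId, float score) {
        Float previous = bestScoreByObject.get(objectId);
        if (previous == null || score > previous.floatValue()) {
            bestScoreByObject.put(objectId, Float.valueOf(score));
        }
    }

    // One entry per distinct object, in order of first appearance.
    Map<String, Float> results() {
        return bestScoreByObject;
    }
}
```

Keeping the highest score per object is only one possible policy; summing the scores or counting the matching attachments would fit the same structure.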
Initially I was writing into the Index 7,00,000 times. I changed the code to
now write only 70 times, which means I am putting a lot of data in an array
list, adding it to the doc, and indexing in one shot. This is where the improvement
came from. To be precise, IndexWriter is now adding a document 70 times vs.
7,0
there's a few options...
you can define a custom Similarity that makes the score based entirely on
the sloppyFreq ... it's not trivial, but it's certainly possible.
the other option is to call SpanQuery.getSpans directly, and then iterate
over it and compare end() - start() for each span.
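A rough sketch of the second option, assuming each span reports a start and an end position so that end - start gives its width. Span and narrowSpans are hypothetical stand-ins written in plain Java; in real code the values would come from iterating whatever SpanQuery.getSpans returns.

```java
import java.util.ArrayList;
import java.util.List;

class SpanWidths {
    // Hypothetical stand-in for one reported span.
    static final class Span {
        final int doc;
        final int start;
        final int end;

        Span(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    // Keeps only the spans whose width (end - start) is at most maxWidth.
    static List<Span> narrowSpans(List<Span> spans, int maxWidth) {
        List<Span> result = new ArrayList<Span>();
        for (Span s : spans) {
            if (s.end - s.start <= maxWidth) {
                result.add(s);
            }
        }
        return result;
    }
}
```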
: Dat
: IndexReader class for lazy field loading, the search API in IndexSearcher
: does not contain such facilities. Hence, the Documents I get from the
: Hits.doc() would not benefit from the mentioned feature.
Lazy loading stored fields is really about performance tweaking ... if
you are that conce
You all rock. I'm clearing the semi-official legal hurdle with my CTO and
our head counsel to full (or something close to full) disclosure of some of
the architectural details, so stay tuned for as much as I'm allowed to share
(and btw, for any of you that live/work/vacation in the SF Bay area, I
Hi
I am looking to make use of the new lazy field loading in Lucene 2.1.
I store the original bytes of a document, say a PDF file for example, in a
special untokenized field in the index. Though there are enough facilities in
IndexReader class for lazy field loading, the search API in IndexSea
: Care to write up a Use Case when you have a few spare cycles?
: http://wiki.apache.org/lucene-java/UseCases
Oh, Oh OH! ... competing requests for wiki submissions: Can you add some of
the info about the performance numbers you are seeing to...
http://wiki.apache.org/solr/SolrPerforman
Hi Cass,
Care to write up a Use Case when you have a few spare cycles?
http://wiki.apache.org/lucene-java/UseCases
-Grant
On Mar 20, 2007, at 4:49 PM, Cass Costello wrote:
Heh - it used to be in my sig ... my bad.
Thanks, all. :)
http://www.stubhub.com
On 3/20/07, bruce <[EMAIL PROTEC
The dictionary is generated from the corpus, with the result that a larger
corpus gives better results.
Words are queued up during an index run, and at the end are munged to create
an optimized dictionary. It also supports incremental building, though the
overhead would be too much for those appl
On 3/21/07, Peter Keegan <[EMAIL PROTECTED]> wrote:
On a similar topic, has anybody measured query performance as a function of
index size?
Well, I did and the results surprised me. I measured query throughput on 8
indexes that varied in size from 55,000 to 4.4 million documents. When
plotted on
On a similar topic, has anybody measured query performance as a function of
index size?
Well, I did and the results surprised me. I measured query throughput on 8
indexes that varied in size from 55,000 to 4.4 million documents. When
plotted on a graph, there is a distinct hyperbolic curve (1/x).
Sorry, I don't think there is any POI in my future :-) Long story.
Maybe I'll blog about it or something. Stay tuned.
I have another project that I'm interested in spending time on. Not
sure if it's going to be open source at this point but it will utilize
the textmining.org library so I plan on
Last I remember, it was being voted on by the Incubator committee.
Good to hear TextMining is back in action! Does that mean you are
back on POI Word again too?
-Grant
On Mar 20, 2007, at 10:35 PM, Ryan Ackley wrote:
Someone pointed me there already. Looks interesting. Is there a
mailing
Is it a fair restatement of your problem that you want to generate
a list of all children of a node? That's what I'm reading.
Would it work for you to store the complete ancestry in each node?
By that I mean (from your example),
NOTE: it's no problem in Lucene to store different values for t
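A minimal sketch of the "store the complete ancestry in each node" idea, assuming the tree is available as a child-to-parent map with no cycles. Ancestry, ancestors, descendants and parentOf are hypothetical names; the sketch is plain Java and leaves out how the ancestor ids would be written to an index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class Ancestry {
    // Walks from a node up to the root and returns every ancestor id,
    // nearest parent first. Root nodes have no entry in parentOf.
    static List<String> ancestors(String node, Map<String, String> parentOf) {
        List<String> chain = new ArrayList<String>();
        String current = parentOf.get(node);
        while (current != null) {
            chain.add(current);
            current = parentOf.get(current);
        }
        return chain;
    }

    // A node sits below `node` exactly when `node` appears in its ancestry.
    static List<String> descendants(String node, Map<String, String> parentOf) {
        List<String> result = new ArrayList<String>();
        for (String candidate : parentOf.keySet()) {
            if (ancestors(candidate, parentOf).contains(node)) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

If each node's ancestor ids are stored with it, finding everything below a node becomes a single lookup on that stored list instead of a walk over the tree.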
Hi,
first, thanks for this great resource, and sorry if I am oversimplifying a few
things; I am still rather new to Lucene.
I have been thinking how to integrate my app with Lucene - it is a CMS type
system that has documents organized in a tree-style layout. A few facts about
the system:
-