Hi all,
I want to know the difference between a search without a RangeFilter and
a search with a RangeFilter. Is the latter slower than the former?
Thanks
Sen Zhou
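To make the comparison concrete: in Lucene releases of this period, RangeFilter computes a bit set by walking every term of the field from the lower bound to the upper bound and marking each document that contains one of those terms; the filtered search then keeps only the hits whose bit is set. So a filtered search does roughly the same query work plus the cost of building that bit set, which grows with the number of distinct terms in the range and their postings (wrapping the filter in CachingWrapperFilter lets that cost be paid once per reader instead of once per search). The following is a simplified, hypothetical in-memory model of that process, not Lucene code; RangeFilterModel, rangeBits and filterHits are made-up names.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical, simplified model of a range filter over one field.
 * postings maps each term of the field (sorted, like a term dictionary)
 * to the ids of the documents that contain it.
 */
class RangeFilterModel {

    /** Walks every term in [lower, upper] and marks the documents that contain it. */
    static BitSet rangeBits(TreeMap<String, int[]> postings, String lower, String upper, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        // Cost grows with the number of distinct terms in the range and the size of their postings.
        for (int[] docs : postings.subMap(lower, true, upper, true).values()) {
            for (int doc : docs) {
                bits.set(doc);
            }
        }
        return bits;
    }

    /** Keeps only the query hits whose document id is set in the filter bits. */
    static List<Integer> filterHits(List<Integer> queryHits, BitSet allowed) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : queryHits) {
            if (allowed.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> dateField = new TreeMap<>();
        dateField.put("20060101", new int[] {0, 3});
        dateField.put("20060315", new int[] {1});
        dateField.put("20060421", new int[] {2, 4});
        BitSet allowed = rangeBits(dateField, "20060301", "20060421", 5);
        System.out.println(filterHits(List.of(0, 1, 2, 3, 4), allowed)); // [1, 2, 4]
    }
}
```

Because the bit set depends only on the field and the bounds, not on the query, it is the natural thing to cache when the same range is reused across many searches.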
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
On 4/21/06, Dragon Fly <[EMAIL PROTECTED]> wrote:
> I don't think the SynonymAnalyzer described in LIA would work because
> some of my "synonyms" contain multiple words.
The SynonymFilter in Solr can handle multi-word synonyms.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
http://in
On Apr 21, 2006, at 11:56 AM, Malcolm Clark wrote:
Has anyone attempted to index/search the Reuters collection, which
consists of SGML?
Mine seems to run through the process okay, but alas I'm left with
nothing in the index when I check with Luke or my own search engine.
Anyone got any hints?
Hi,
I am trying to get the frequency of a phrase using SpanNearQuery.
How can I use SpanNearQuery for Boolean queries? The code I have is
for a single query. How can I extend it to multiple queries?
SpanTermQuery[] phrase = new SpanTermQuery[phraseTerms.length];
for(int termCount=0; termCou
Okay, converting to XML sounds like a great option.
Thanks,
Malcolm
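When the index ends up empty, a quick check is whether the parser actually finds any articles before anything is added to the index. A minimal sketch using only the JDK's DOM parser, assuming the converted files keep the Reuters-21578 tag names (REUTERS elements with a NEWID attribute, containing TITLE and BODY) and wrap all articles in a single root element (COLLECTION here is a made-up name, since XML requires exactly one root); ReutersXmlReader and Article are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: pulls (newId, title, body) records out of Reuters articles converted to XML. */
class ReutersXmlReader {

    record Article(String newId, String title, String body) {}

    static List<Article> parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        List<Article> articles = new ArrayList<>();
        NodeList stories = doc.getElementsByTagName("REUTERS");
        for (int i = 0; i < stories.getLength(); i++) {
            Element story = (Element) stories.item(i);
            // getAttribute returns "" when the attribute is missing.
            articles.add(new Article(story.getAttribute("NEWID"), text(story, "TITLE"), text(story, "BODY")));
        }
        return articles;
    }

    /** Trimmed text of the first descendant element with the given name, or "" if there is none. */
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<COLLECTION><REUTERS NEWID=\"1\"><TEXT><TITLE>Cocoa</TITLE>"
                + "<BODY>Showers continued.</BODY></TEXT></REUTERS></COLLECTION>";
        for (Article a : parse(xml)) {
            // Each record would become one Lucene Document with title and body fields.
            System.out.println(a.newId() + " | " + a.title() + " | " + a.body());
        }
    }
}
```

If `parse` returns an empty list for a converted file, the problem lies in the conversion or the tag names rather than in the indexing code.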
Some months ago I created an index from the Reuters collection. I converted
the SGML files to XML using a tool that I found somewhere on the net
(just google for it), then parsed the files to create the index using a
standard DOM parser. If you have problems parsing the SGML files I think you
Hi all,
I didn't know whether to add this to the thread asking about TREC indexing or
start a new one.
Anyway, has anyone attempted to index/search the Reuters collection, which
consists of SGML?
Mine seems to run through the process okay, but alas I'm left with nothing in
the index when I check w
Hi,
What is the best way to implement the following?
Document 1 contains the following text:
"THE CZECH REPUBLIC ORGANIZATION"
Document 2 contains the following text:
"THE CZE ORGANISATION"
Synonym rules:
(1) CZECH REPUBLIC --> CZE
(2) CZE --> CZECH REPUBLIC
(3) ORGANIZATION --> ORG, ORGA
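One common way to handle rules like these, sketched here independently of Lucene's analyzer API, is to collapse each group of equivalent phrases to a single canonical token, matching the longest phrase first, and to apply exactly the same rewrite to documents at index time and to queries at search time (in Lucene this logic would sit in a custom TokenFilter used by the analyzer). This sketch treats each group as fully equivalent, so the one-way rule (3) becomes two-way; SynonymCanonicalizer, addGroup and canonicalize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: rewrites single- and multi-word synonyms to one canonical token
 * so that documents and queries end up with the same form.
 */
class SynonymCanonicalizer {
    private final Map<List<String>, String> phraseToCanonical = new HashMap<>();
    private int longestPhrase = 1;

    /** Every variant in the group (single or multi-word) maps to the canonical token. */
    void addGroup(String canonical, String... variants) {
        for (String variant : variants) {
            List<String> words = Arrays.asList(variant.split("\\s+"));
            phraseToCanonical.put(words, canonical);
            longestPhrase = Math.max(longestPhrase, words.size());
        }
    }

    /** Greedy longest match: at each position, try the longest known phrase first. */
    List<String> canonicalize(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            boolean matched = false;
            for (int len = Math.min(longestPhrase, tokens.size() - i); len >= 1; len--) {
                String canonical = phraseToCanonical.get(tokens.subList(i, i + len));
                if (canonical != null) {
                    out.add(canonical);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.add(tokens.get(i)); // no rule applies: keep the token unchanged
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SynonymCanonicalizer syn = new SynonymCanonicalizer();
        syn.addGroup("CZE", "CZECH REPUBLIC", "CZE");                 // rules (1) and (2)
        syn.addGroup("ORGANIZATION", "ORGANIZATION", "ORG", "ORGA");  // rule (3), made two-way
        System.out.println(syn.canonicalize(List.of("THE", "CZECH", "REPUBLIC", "ORGANIZATION"))); // [THE, CZE, ORGANIZATION]
        System.out.println(syn.canonicalize(List.of("THE", "CZE", "ORGANISATION")));               // [THE, CZE, ORGANISATION]
    }
}
```

The second result keeps ORGANISATION as-is because none of the listed rules mention that spelling, so document 2 would still need an extra rule to match on it.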
See the "Lucene in Action" book or my ApacheCon talk at
http://www.cnlp.org/apachecon2005. Both of these have examples.
Hi,
can anyone suggest how I can generate document and query vectors containing
term frequencies from a Lucene index?
I need them to implement a vector space model using WordNet.
cheers,
trupti mulajkar
MSc Advanced Computer Science
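For the document side, Lucene 1.9/2.x can store per-document term vectors when a field is indexed with term vectors enabled (Field.TermVector.YES); IndexReader.getTermFreqVector(docId, field) then returns a TermFreqVector whose getTerms() and getTermFrequencies() give parallel arrays of terms and counts. A query vector can be built the same way from the analyzed query terms. Once both are available, the vector space comparison is plain arithmetic. A minimal sketch that takes those two parallel arrays; TermVectorCosine and its methods are hypothetical names, and it uses raw term frequencies with no idf weighting or WordNet expansion.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the vector space model over raw term frequencies.
 * The parallel arrays mirror what a stored term vector provides: terms plus their counts.
 */
class TermVectorCosine {

    /** Builds a sparse term -> frequency vector from parallel arrays, summing duplicate terms. */
    static Map<String, Integer> toVector(String[] terms, int[] freqs) {
        Map<String, Integer> vector = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            vector.merge(terms[i], freqs[i], Integer::sum);
        }
        return vector;
    }

    /** Cosine of the angle between two sparse frequency vectors (0 if either is empty). */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double norms = norm(a) * norm(b);
        return norms == 0 ? 0 : dot / norms;
    }

    /** Euclidean length of a frequency vector. */
    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = toVector(new String[] {"car", "engine", "wheel"}, new int[] {2, 1, 1});
        Map<String, Integer> query = toVector(new String[] {"car", "wheel"}, new int[] {1, 1});
        System.out.println(cosine(doc, query));
    }
}
```

Keeping the vectors sparse (only terms that actually occur) matters here, since a Lucene index vocabulary is far larger than any single document.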
Lucene can index the TREC documents, but it depends on how you want to index them.
If you want to index the individual documents (sub-files) inside the TREC data, you have to modify
IndexFiles.java to read the tags; otherwise you can index the files normally.
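A minimal sketch of what "reading the tags" might involve, assuming the usual TREC layout in which every document sits between <DOC> and </DOC> and carries its identifier in <DOCNO>. The class and record names are hypothetical, and the regular expressions are a simplification: real collections may need extra handling, such as stripping HTML from web pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: splits one TREC-style file into its <DOC> ... </DOC> records
 * so that each record can be indexed as a separate Lucene Document.
 */
class TrecSplitter {
    private static final Pattern DOC = Pattern.compile("<DOC>(.*?)</DOC>", Pattern.DOTALL);
    private static final Pattern DOCNO = Pattern.compile("<DOCNO>\\s*(.*?)\\s*</DOCNO>", Pattern.DOTALL);

    record TrecDoc(String docNo, String content) {}

    static List<TrecDoc> split(String fileText) {
        List<TrecDoc> docs = new ArrayList<>();
        Matcher doc = DOC.matcher(fileText);
        while (doc.find()) {
            String body = doc.group(1);
            Matcher no = DOCNO.matcher(body);
            // The DOCNO is what trec_eval expects in a results file, so keep it as its own field.
            String docNo = no.find() ? no.group(1) : "";
            docs.add(new TrecDoc(docNo, body));
        }
        return docs;
    }

    public static void main(String[] args) {
        String sample = "<DOC>\n<DOCNO>D-001</DOCNO>\nfirst page\n</DOC>\n"
                + "<DOC>\n<DOCNO>D-002</DOCNO>\nsecond page\n</DOC>\n";
        for (TrecDoc d : split(sample)) {
            System.out.println(d.docNo());
        }
    }
}
```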
cheers,
trupti mulajkar
Quoting thanh nguyen <[EMAIL PROTECTED]>:
> Hi
Simon,
I wonder if using Zoe might do the trick - http://guests.evectors.it/zoe/
Have you tried it?
- Dmitry
From: Fisheye [mailto:[EMAIL PROTECTED]
Sent: Fri 4/21/2006 7:23 AM
To: java-user@lucene.apache.org
Subject: Lucene - FileFormat
I'm trying to const
Hi all,
Did anyone use Lucene to index WT10G? Can it index
WT10G in compressed format (.gz), or do we have to unzip
it first?
Furthermore, does Lucene support the TREC format? I mean,
can it take a topic file like " 1
abc def " and produce a results file
that we can use with the trec_eval program?
A
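On the WT10G questions above: Lucene indexes whatever text your own indexing code hands it, so compressed files can be decompressed on the fly with the JDK's GZIPInputStream instead of being unzipped first. Producing a results file for trec_eval is then a matter of writing one line per retrieved document in the usual run format: query id, the literal Q0, document id, rank, score, run tag. A minimal sketch of both pieces; TrecIo, readGzip and runLine are hypothetical names, and decoding the text as ISO-8859-1 is an assumption about the collection's encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helpers for reading .gz collection files and writing trec_eval run lines. */
class TrecIo {

    /** Reads a whole gzip stream as text (lines rejoined with \n); the caller supplies the stream. */
    static String readGzip(InputStream compressed) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), StandardCharsets.ISO_8859_1))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }

    /** One line of a trec_eval run file: query id, Q0, doc id, rank, score, run tag. */
    static String runLine(String queryId, String docNo, int rank, double score, String runTag) {
        return String.format(Locale.ROOT, "%s Q0 %s %d %.4f %s", queryId, docNo, rank, score, runTag);
    }

    public static void main(String[] args) throws IOException {
        // Compress a tiny sample in memory so the example runs without any files.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write("<DOC><DOCNO>D1</DOCNO>text</DOC>".getBytes(StandardCharsets.ISO_8859_1));
        }
        System.out.println(readGzip(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(runLine("451", "D1", 1, 3.25, "myrun")); // 451 Q0 D1 1 3.2500 myrun
    }
}
```

In practice the stream would come from a FileInputStream on each .gz file, and the decompressed text could go straight into a TREC document splitter before indexing.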
This is a puzzler: I'm not sure whether I'm doing something wrong or whether
I have a poisoned document, a corrupted index (failing to close my
IndexModifier properly?) or what. The setup is this: I have two
processes (the backend and frontend of a CMS) that run in two different
VMs -- both use Luc
I'm trying to construct a plain-text parser for different file formats like MS
Word, Excel, PowerPoint, Rich Text Format, plain text, HTML, PDF, etc.
I use the well-known libraries PDFBox and POI and some parts of AtLeap, and now I
also need to support the OpenOffice formats and, more importantly, the msg format (
Thanks for the reply; perhaps using something like Luke is the best
option.
My idea is to build a tag cloud (see the example on this page) for every
group (field group with the id) and every portal (with the id).
The problem is that I think calling reader.terms() is not the best option in
my case, becau
Hi,
If I have understood your question correctly, you want the terms in a
field with the maximum number of occurrences.
Try Luke [www.getopt.org/luke/].
Or else, if you are not able to initialize graphical content on your
system, you may use the following script:
src/org/getopt/luke/HighF
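A sketch of the usual idea behind a high-frequency-terms tool like this one, not Luke's actual code: walk the term dictionary once (in Lucene 1.9/2.x, IndexReader.terms() returns a TermEnum whose docFreq() gives each term's document frequency) and keep only the top N terms in a bounded min-heap. The sketch takes any term-to-count map, so per-group counts for a tag cloud could be supplied instead of index-wide document frequencies; TopTerms, TermCount and top are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: top-N terms by frequency using a bounded min-heap. */
class TopTerms {

    record TermCount(String term, int docFreq) {}

    /** Weakest first: lower count first; on ties, the alphabetically later term is weaker. */
    private static final Comparator<TermCount> WEAKEST_FIRST =
            Comparator.comparingInt(TermCount::docFreq)
                    .thenComparing(TermCount::term, Comparator.reverseOrder());

    static List<TermCount> top(Map<String, Integer> docFreqs, int n) {
        PriorityQueue<TermCount> heap = new PriorityQueue<>(WEAKEST_FIRST);
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            heap.offer(new TermCount(e.getKey(), e.getValue()));
            if (heap.size() > n) {
                heap.poll(); // drop the weakest so the heap never holds more than n terms
            }
        }
        List<TermCount> result = new ArrayList<>(heap);
        result.sort(WEAKEST_FIRST.reversed()); // strongest first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqs = Map.of("lucene", 12, "index", 7, "query", 9, "luke", 3);
        for (TermCount t : top(docFreqs, 2)) {
            System.out.println(t.term() + " " + t.docFreq());
        }
    }
}
```

The bounded heap keeps memory at O(N) no matter how many terms the index holds, which is why this pattern suits a single pass over a large term dictionary.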