Erik Hatcher wrote:
>
> On May 27, 2005, at 12:14 PM, Gusenbauer Stefan wrote:
>
>> Max Pfingsthorn wrote:
>>
>>
>>> Hi!
>>>
>>> Thanks for the reply. I figured already that fields are actually
>>> not tokenized... I lost track of the filenames/dirnames and there
>>> were some duplicates...
>>>
Dear Rasha,
Sorry for the delay. I've indexed Arabic and English seamlessly in
Lucene; the only thing you have to watch out for is stemming. As for
indexing PDFs, I have not used that part of the API, but in my
experience it comes down to using, or in some cases forcing, the
correct encoding.
On May 27, 2005, at 12:14 PM, Gusenbauer Stefan wrote:
Max Pfingsthorn wrote:
Hi!
Thanks for the reply. I figured already that fields are actually
not tokenized... I lost track of the filenames/dirnames and there
were some duplicates...
About case-insensitivity: Okay, I can make my qu
Hello,
The "results by YAHOO! search" is a marketing thing that we have no
real control over. I promise that the actual project search engine
is using Lucene.
As for defaulting to OR, it was decided that the new search should
function as similarly as possible to the old search system by
On May 27, 2005, at 11:22 AM, Max Pfingsthorn wrote:
Hi!
In my application, I index some strings (like filenames)
untokenized, meaning via
doc.add(new Field(FIELD,VALUE,false,true,false));
When I later take a look at it with Luke, I still get tokens of the
filenames (like "news" instead
Also, see if
http://wiki.apache.org/jakarta-lucene/IndexingOtherLanguages helps
at all.
>>> [EMAIL PROTECTED] 5/27/2005 12:09:32 PM >>>
Probably your Unix system has a different default encoding than your
Windows
machine.
You have to make sure you give the IndexWriter a string that has the
corre
Max Pfingsthorn wrote:
>Hi!
>
>Thanks for the reply. I figured already that fields are actually not
>tokenized... I lost track of the filenames/dirnames and there were some
>duplicates...
>
>About case-insensitivity: Okay, I can make my query lower case, but my strings
>in the field are not...
Probably your Unix system has a different default encoding than your Windows
machine.
You have to make sure you give the IndexWriter a string that has the correct
encoding.
Do you specifically set the encoding in your code before you index it with
Lucene?
Ross
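Ross's point about platform default encodings can be shown without Lucene at all. The minimal Java sketch below decodes the same UTF-8 bytes twice: once with the charset named explicitly, and once with ISO-8859-1 standing in for a mismatched platform default. Only the first gives back the original text, which is why the string handed to the IndexWriter should be built with an explicit charset rather than the JVM default. It assumes Java 7 or later for StandardCharsets (on the older JVMs in this thread, new String(bytes, "UTF-8") does the same job); the class name ExplicitEncoding is purely illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {
    // Decode raw bytes with an explicitly named charset instead of relying on
    // the JVM's platform default, which can differ between Windows and Solaris.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "cafe" with an acute e, encoded as UTF-8: the accented letter becomes two bytes.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct: the bytes are read back with the charset they were written in.
        String right = decode(utf8, StandardCharsets.UTF_8);
        // Mismatch: each of the two bytes becomes its own (wrong) character.
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.length() + " chars vs " + wrong.length() + " chars");
        System.out.println("platform default here: " + Charset.defaultCharset());
    }
}
```

Running it on both machines and comparing the "platform default" line is a quick way to confirm whether the two systems really disagree.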
-Original Message-
From: gau
Hi!
Thanks for the reply. I figured already that fields are actually not
tokenized... I lost track of the filenames/dirnames and there were some
duplicates...
About case-insensitivity: Okay, I can make my query lower case, but my strings
in the field are not... I guess I have to do that manual
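Because an untokenized field bypasses the analyzer, the index holds exactly the string you pass in, so case-insensitive matching means normalizing the value yourself at index time and normalizing the query text the same way. A minimal plain-Java sketch of that idea follows; normalize and matches are hypothetical helpers, and in the indexing code from this thread normalize would wrap VALUE before it reaches new Field(FIELD, VALUE, false, true, false). If the original spelling is needed for display, one option is to keep it in a separate stored field.

```java
import java.util.Locale;

public class CaseInsensitiveKeyword {
    // Apply the same normalization at index time and at query time.
    // Locale.ROOT avoids locale-specific rules such as the Turkish dotless i.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Exact-match comparison, standing in for a single-term lookup
    // against an untokenized field.
    static boolean matches(String indexedValue, String queryText) {
        return normalize(indexedValue).equals(normalize(queryText));
    }

    public static void main(String[] args) {
        String stored = "News-Item.XML";
        System.out.println(matches(stored, "news-item.xml")); // true
        System.out.println(stored.equals("news-item.xml"));   // false
    }
}
```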
Max Pfingsthorn wrote:
>Hi!
>
>In my application, I index some strings (like filenames) untokenized, meaning
>via
>
>doc.add(new Field(FIELD,VALUE,false,true,false));
>
>When I later take a look at it with Luke, I still get tokens of the filenames
>(like "news" instead of "news-item.xml") in the
Hi,
I don't get a UTF-8 index when I use Lucene on Solaris, while my
characters are OK under Windows. My indexing program is the same on both
and uses Lucene 1.4.3.
Does anyone have an idea that could help me?
Regards,
Arnaud.
Hi,
I'm having problems with Lucene optimization. Two of the indexes are
about 2 GB each, and every day about 30 documents are added to each of
them. At the end of the indexing, the IndexWriter optimize() method is
executed, and it takes about 30 minutes to finish the optimization for each
What you can do is open the index and loop through all the documents in
descending order.
The code below explains more.
Directory dir = FSDirectory.getDirectory( args[ 0 ], false );
IndexReader reader = IndexReader.open( dir );
int numDocs = reader.numDocs();
HashSet items = new HashSet( size
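The snippet above is cut off, so its exact purpose is not visible. Assuming the goal is to find documents that share a key (for example a filename field) and keep only the most recently added copy, the descending loop can be sketched in plain Java as below. Here an array stands in for the index: keys[i] is the hypothetical key of document number i, and null marks a deleted document. In a real index the key would come from reader.document(i), skipping documents for which reader.isDeleted(i) is true, and the loop would run up to reader.maxDoc() - 1 rather than numDocs(), since numDocs() excludes deleted documents and therefore undercounts document numbers once the index has deletions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NewestWinsDedup {
    // Walking from the highest document number down means the first time a
    // key is seen it belongs to the most recently added copy; every later
    // sighting of that key is an older duplicate.
    static List<Integer> olderDuplicates(String[] keys) {
        Set<String> seen = new HashSet<>();
        List<Integer> toDelete = new ArrayList<>();
        for (int doc = keys.length - 1; doc >= 0; doc--) {
            if (keys[doc] == null) continue;            // skip deleted documents
            if (!seen.add(keys[doc])) toDelete.add(doc); // key already kept: older copy
        }
        return toDelete;
    }

    public static void main(String[] args) {
        String[] keys = {"a.xml", "b.xml", null, "a.xml", "c.xml", "b.xml"};
        System.out.println(olderDuplicates(keys)); // [1, 0]
    }
}
```

Collecting the document numbers first and deleting afterwards keeps the scan itself read-only.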
Hi!
In my application, I index some strings (like filenames) untokenized, meaning
via
doc.add(new Field(FIELD,VALUE,false,true,false));
When I later take a look at it with Luke, I still get tokens of the filenames
(like "news" instead of "news-item.xml") in the list of most frequent terms.
Sh
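For comparison, a term like "news" is what you would expect to see only when a value like "news-item.xml" has been run through an analyzer somewhere. The rough plain-Java simulation below contrasts a letter-splitting tokenizer with an untokenized value. It is not Lucene's analyzer code; letterTokens is a hypothetical stand-in that splits on non-letters and lower-cases, and keywordTerm shows the single term an untokenized field produces.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeDemo {
    // Rough stand-in for a letter-based analyzer: split on anything that is
    // not a letter and lower-case the pieces. Not Lucene's implementation.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    // An untokenized field is indexed as one term, exactly as given.
    static List<String> keywordTerm(String text) {
        List<String> single = new ArrayList<>();
        single.add(text);
        return single;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("news-item.xml")); // [news, item, xml]
        System.out.println(keywordTerm("news-item.xml"));  // [news-item.xml]
    }
}
```

So if Luke lists "news" among the most frequent terms, that term most likely came from some other, tokenized field or document rather than from the untokenized one.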
Hi,
I found that SourceForge's search still shows "results by YAHOO! search". What
does that mean?
Also, the search condition for keywords currently seems to be OR,
not AND.
This makes a search for "lucene java" return a long list, yet without the
one I wanted in the first several rows.
Chris L
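The difference between OR and AND defaults can be shown with a toy matcher. The sketch below is plain Java over an in-memory list of made-up documents, not the SourceForge engine or Lucene itself: with OR, any single query term is enough for a hit, so "lucene java" also pulls in documents about only one of the two words; with AND, every term must appear. In Lucene's query syntax, marking each term as required (for example +lucene +java) gives the AND behaviour explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AndVsOr {
    // Split on whitespace and lower-case: a crude stand-in for a real analyzer.
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // requireAll = true behaves like AND (every query term must occur);
    // requireAll = false behaves like OR (any one query term is enough).
    static List<String> search(List<String> docs, String query, boolean requireAll) {
        Set<String> queryTerms = terms(query);
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            Set<String> docTerms = terms(doc);
            boolean hit;
            if (requireAll) {
                hit = docTerms.containsAll(queryTerms);
            } else {
                Set<String> overlap = new HashSet<>(queryTerms);
                overlap.retainAll(docTerms);
                hit = !overlap.isEmpty();
            }
            if (hit) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "lucene java search library",
            "java servlet container",
            "lucene port to dotnet",
            "python web framework");
        System.out.println(search(docs, "lucene java", false).size()); // OR: 3
        System.out.println(search(docs, "lucene java", true).size());  // AND: 1
    }
}
```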
hi,
Lucene is a great project to serve as a source code
search engine.
I have made a source code search engine based on
Lucene, and it performs very well.
Unfortunately, my version is in Chinese.
The URL is:
http://www.domolo.com/domolo/ctrlc/index.aspx
It searches 101732 j