Hi,
I am working on Lucene. I saw your suggestion about Lucene in a Google
search. I am facing some problems with searching. Please go through my sample code
and suggest where I have gone wrong.
I will be thankful to you.
This is my sample code:
private static Document createDocument(File f
Hey all, I am working on a project that requires a search engine on an
embedded Linux system that is also Bluetooth capable. Is there a Lucene mobile, or
can I recompile the code in the J2ME Wireless Toolkit? Any help would be
appreciated. Thanks
--
: Yes, when I say "duplicate" sentences, they are exact copies of the same
: string.
You still haven't explained how you indexed these sentences. What do you
mean by "each lucene document actually contains exactly one sentence."?
Did you tokenize the sentence into one field? Do you have a field for
I'd have to see your indexing code to see if there are any obvious
performance gotchas there. If you can run your indexer under a
profiler (OptimizeIt, JProbe, or just the free one with java using
-Xprof), it will tell you in which methods most of your CPU time is
spent. If you're using StandardA
Hi David,
>>
I would like to poll the community's opinion on good strategies for identifying
duplicate documents in a Lucene index.
>>
Do you mean 100% duplicates or some kind of similarity?
>>
Obviously the brute force method of pairwise compares would take forever. I
have tried
grouping sen
Thanks for the quick reply, Chris.
Yes, when I say "duplicate" sentences, they are exact copies of the same string.
The MD5 hash is a good idea. I wish I had thought of it earlier, as it would have
saved me a lot of trouble. Right now it is not feasible to reindex, because
indexing is a very
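
For readers following the MD5 suggestion, here is a minimal sketch of the idea, not code from the thread. It hashes each sentence string and groups positions that share a hash. The class name `SentenceHasher`, the method names, and the in-memory map are all assumptions made for illustration; with 25 million sentences the hash would more likely be stored alongside each document at indexing time rather than held in a map.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
public class SentenceHasher {

    // Returns the MD5 digest of the sentence as 32 lowercase hex characters.
    static String md5Hex(String sentence) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(sentence.getBytes(StandardCharsets.UTF_8));
            // BigInteger(1, ...) treats the bytes as an unsigned number;
            // %032x restores any leading zeros.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Maps each hash to the positions of the sentences that produced it,
    // keeping only hashes shared by two or more sentences.
    static Map<String, List<Integer>> groupByHash(List<String> sentences) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < sentences.size(); i++) {
            groups.computeIfAbsent(md5Hex(sentences.get(i)),
                                   k -> new ArrayList<>()).add(i);
        }
        groups.values().removeIf(ids -> ids.size() < 2);
        return groups;
    }
}
```

Because equal strings always produce equal hashes, finding exact duplicates becomes a lookup on the hash value instead of a pairwise comparison.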
Dave,
Can you define exactly what you consider "duplicate sentences"? Is it
the same exact string, or the same words in the same order, or the
same words in any order, etc?
If you can normalize each sentence first, so two "duplicate" sentences
are always the exact same string, then you should be
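
As an illustration of the normalization step mentioned above, here is a rough sketch under stated assumptions. It treats two sentences as the same if they differ only in letter case or whitespace. That rule is an assumption chosen for the example, and the class `SentenceNormalizer` is hypothetical. The right rule depends on which definition of "duplicate" applies.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
public class SentenceNormalizer {

    // Trims the sentence, lowercases it, and collapses each run of
    // whitespace into a single space.
    static String normalize(String sentence) {
        return sentence.trim()
                       .toLowerCase(Locale.ROOT)
                       .replaceAll("\\s+", " ");
    }
}
```

With this rule, `normalize("  The  Quick\tfox ")` and `normalize("the quick fox")` return the same string, so the two inputs would be compared as equal.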
Hi,
I would like to poll the community's opinion on good strategies for identifying
duplicate documents in a Lucene index.
You see, I have an index containing roughly 25 million Lucene documents. My task
requires me to work at sentence level, so each Lucene document actually contains
exactly one s
Hi,
I prepared a dictionary application that uses Lucene.
I made the application downloadable via Java Web Start.
Everything is OK, but I can't access the Lucene index files.
When I searched the internet on the subject,
I found some clues saying that it is impossible to put Lucene
indexes in
Joshua Slive wrote:
On Sat, 11 Jun 2005, Erik Hatcher wrote:
On Jun 11, 2005, at 1:08 PM, Chris Lu wrote:
Thanks.
Somehow I found the "Powered By" Lucene page is "Immutable Page",
even if I logged in.
http://wiki.apache.org/jakarta-lucene/PoweredBy
Wow, it sure is. I'm CC'ing infra
Thanks, guys!
I have made the changes to the wiki, following Joshua's advice.
It's the cookie/refreshing problem.
Chris Lu