hello all
I have a question about the spell checker. I'm creating my spell index from my
original index, but the original index itself contains some misspelled words. So
I decided to use a proper English dictionary as the word source for my spell checker.
Can anyone tell me whether there is an option in Lucene to do this?
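A minimal sketch of that approach with the contrib SpellChecker: instead of feeding it terms from the main index, point indexDictionary at a plain word list. The file name and spell-index path below are assumptions.

    // build the spell index from a plain English word list (one word per line);
    // "english-words.txt" and "spellindex" are placeholder paths
    Directory spellDir = FSDirectory.getDirectory("spellindex");
    SpellChecker spellChecker = new SpellChecker(spellDir);
    spellChecker.indexDictionary(new PlainTextDictionary(new File("english-words.txt")));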
hello all
How do I update my existing index to avoid duplicates? This is
how I'm doing my indexing:
doc.add(new Field("id", "" + i, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
hello all
I have a question about splitting words at search time. For example, if I search
for dualcore it should also return documents containing dual core. How do I split such a word? Is
there any analyzer in Lucene that can do this? Please, can anyone help me?
What should I do now? Could you clarify this for me?
Grant Ingersoll-6 wrote:
>
>
> On Nov 24, 2009, at 1:16 AM, m.harig wrote:
>
>>
>> String[] suggestions = spellChecker.suggestSimilar("hoem", 3,indexReader,
>> "contents", true);
>
String[] suggestions = spellChecker.suggestSimilar("hoem", 3, indexReader,
"contents", true);
This is how I'm retrieving my "did you mean" words.
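For context, a hedged sketch of the full invocation path; it assumes the spell index was built from the "contents" field of the main index with LuceneDictionary, which may not match the actual setup.

    IndexReader indexReader = IndexReader.open("index");   // path is an assumption
    SpellChecker spellChecker = new SpellChecker(FSDirectory.getDirectory("spellindex"));
    spellChecker.indexDictionary(new LuceneDictionary(indexReader, "contents"));
    // only ask for suggestions when the word is not already a known term in the field
    if (!spellChecker.exist("hoem")) {
        String[] suggestions =
            spellChecker.suggestSimilar("hoem", 3, indexReader, "contents", true);
    }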
Grant Ingersoll-6 wrote:
>
> How are you invoking the spell checker?
>
>
> On Nov 19, 2009, at 1:22 AM,
hello all
Is there any way to update the spell index directory? Please, can anyone help
me out with this?
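A hedged sketch of one way to refresh it after the main index changes: clear the spell index and rebuild it from the source dictionary. The directory path and field name are assumptions.

    SpellChecker spellChecker = new SpellChecker(FSDirectory.getDirectory("spellindex"));
    spellChecker.clearIndex();   // drop the old suggestion terms
    spellChecker.indexDictionary(new LuceneDictionary(indexReader, "contents"));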
hello all
I have a question about the spell checker. When I search for the keyword hoem
I get the spelling suggestions in the following order (I am
retrieving 4 suggested words):
form
hold
home
them
What I need is for the word home to be fetched first, but it is in the third
position, howeve
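One common fix is to re-sort the returned suggestions by how often each one actually occurs in the index, so a frequent word like home wins ties. A sketch, assuming the suggestions come from the "contents" field and that corpus frequency is the ordering you want:

    final String field = "contents";
    final IndexReader reader = indexReader;   // must be final to use inside the comparator
    String[] suggestions = spellChecker.suggestSimilar("hoem", 4, reader, field, true);
    Arrays.sort(suggestions, new Comparator<String>() {
        public int compare(String a, String b) {
            try {
                // suggestions that occur in more documents come first
                return reader.docFreq(new Term(field, b)) - reader.docFreq(new Term(field, a));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    });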
Thanks Ian, it works. Thanks a lot.
Ian Lea wrote:
>
> Try updateDocument(new Term("id", ""+i), doc).
>
> See javadocs for Term constructors.
>
>
>
> --
> Ian.
>
>
> On Tue, Nov 10, 2009 at 9:47 AM, m.harig wrote:
>>
>>
Thanks Simon,
this is my code:
doc.add(new Field("id", "" + i, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES,
Field.Index.ANALYZED));
doc.add(new Field("contents",
Thanks again,
this is my code:
doc.add(new Field("id", "" + i, Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES,
Field.Index.ANALYZED));
doc.add(new Field("contents",
document) this will delete
> the old document and add the new one.
>
> simon
>
> On Tue, Nov 10, 2009 at 10:05 AM, m.harig wrote:
>>
>> hello all,
>>
>> This is my situation , i've multiple indexes , for example , index1 ,
>> index2 ,
hello all,
This is my situation: I have multiple indexes, for example index1,
index2, index3, and I have to update them every night. If I open my
IndexWriter with create=false (since I want to update the existing index), I
get duplicate documents appended to the existing indexes.
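A minimal sketch of the approach that settles this thread (see Ian's updateDocument suggestion above): give every document a unique id field and call updateDocument with a Term on that field, so nightly re-indexing replaces instead of appending.

    Document doc = new Document();
    doc.add(new Field("id", "" + i, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
    // deletes any existing document whose "id" term matches, then adds this one
    indexWriter.updateDocument(new Term("id", "" + i), doc);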
Thanks Erick,
I understand the issue, but my question is about searching for a keyword
that is originally a single word. For example, metacity really is a single
keyword; when I search for meta city I am not able to get the results. That
is my question.
If you go to Google and search for m
hello all
I have a question about search. I have the word welcomelucene (without
spaces) in my index; when I search for welcome lucene (with a space) I do not
get any hits. It should pick up the document containing welcomelucene. Is there any way to
do it? I have tried the wildcard option too, but got no results. Ple
Thanks Erick,
It works fine if I use the same analyzer (the code snippet found on Nabble)
for both indexing and querying.
But the highlighter no longer works for plural words. I think I need to look into it more;
I'll come back to you if I can't figure it out. Thanks again, Erick.
Thanks Erick,
> A little more information would help here. 1> Are you using the same analyzer
> at both index and query time?
No, sorry. I'm using StandardAnalyzer at index time; at query time
I'm using the code snippet found on Nabble.
> 2> Assuming <1> is "yes", did you re-index your data
hello all
I have a question about plural and singular word searching. I got this code
snippet from the Nabble forum:
private static Analyzer createEnglishAnalyzer() {
    return new Analyzer() {
        public TokenStream tokenStream(String fieldName, Reader reader) {
            TokenStream result =
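The snippet above is cut off. A hedged completion, assuming the Nabble recipe chains StandardTokenizer, LowerCaseFilter and PorterStemFilter (the usual way to make plural and singular forms match); the same analyzer has to be used at both index and query time.

    private static Analyzer createEnglishAnalyzer() {
        return new Analyzer() {
            public TokenStream tokenStream(String fieldName, Reader reader) {
                TokenStream result = new StandardTokenizer(reader);
                result = new StandardFilter(result);
                result = new LowerCaseFilter(result);
                result = new PorterStemFilter(result);  // "cards" and "card" stem to the same token
                return result;
            }
        };
    }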
which is an IndexReader on top of various
> Sub-IndexReaders.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
hello all,
I am merging more than one index to search documents. How do I use
IndexReader here to open multiple indexes (since an IndexReader opens one
directory at a time)? Could anyone please advise me?
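As Uwe's reply above suggests, the sub-readers can be wrapped into a single reader. A sketch, with the index directory names taken from the other thread and otherwise assumed:

    IndexReader r1 = IndexReader.open("index1");
    IndexReader r2 = IndexReader.open("index2");
    IndexReader r3 = IndexReader.open("index3");
    // MultiReader presents the three indexes as one IndexReader
    IndexReader merged = new MultiReader(new IndexReader[] { r1, r2, r3 });
    IndexSearcher searcher = new IndexSearcher(merged);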
Thanks Ahmet, I found the solution. Thanks a lot.
Ahmet Arslan wrote:
>
>
>> hello all, is there any way to get all
>> tokens from my index ? please anyone
>> suggest me
>
> The code below prints all terms of a field.
>
>String path = "E:\\ThesaurusSolrHome\\data\\index";
>St
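Ahmet's snippet is truncated above; a sketch of the same idea, walking every term of one field with a TermEnum. The field name "contents" is an assumption.

    IndexReader reader = IndexReader.open("E:\\ThesaurusSolrHome\\data\\index");
    TermEnum terms = reader.terms();
    while (terms.next()) {
        Term term = terms.term();
        if ("contents".equals(term.field())) {
            System.out.println(term.text());
        }
    }
    terms.close();
    reader.close();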
hello all,
Is there any way to get all tokens from my index? Please, can anyone
advise me?
Hello
Will my reader.reopen() call work on a Windows machine when the index gets
updated? I mean, will my Tomcat server allow the reader to pick up the updated index?
Please help me.
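reopen() itself is not platform-specific. The usual pattern (per the 2.4-era IndexReader javadocs) is to reopen and, if a new reader comes back, close the old one and switch over. A sketch, assuming the reader and searcher are long-lived fields in the web application:

    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
        // the index changed on disk: switch to the fresh reader and release the old one
        reader.close();
        reader = newReader;
        searcher = new IndexSearcher(reader);
    }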
Thanks,
this is my code snippet:
public void doSearch() {
    ...
    Query query = ...
    ...
    IndexSearcher searcher = new IndexSearcher(directory);
hello all,
Thanks to Lucene. I'm using Lucene 2.4.0 for my application. My
question is: can I read the index many times? I mean, I have a
search application which reads the index, which is 300MB in size, and I am
opening the index every time a user hits the page. Is it goo
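Opening a fresh IndexSearcher on every page hit re-reads index structures each time. A common pattern is to open one searcher and share it across requests, reopening only when the index changes. A minimal sketch; the holder class and path are made up for illustration:

    public class SearcherHolder {
        private static IndexSearcher searcher;   // shared by all requests

        public static synchronized IndexSearcher get() throws IOException {
            if (searcher == null) {
                searcher = new IndexSearcher(IndexReader.open("index"));
            }
            return searcher;
        }
    }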
Hello
Do you have any idea about integrating Lucene with Hadoop?
BrickMcLargeHuge wrote:
>
> Hey all,
>
> I just wanted to send a link to a presentation I made on how my
> company is building its entire core BI infrastructure around Hadoop,
> HBase, Lucene, and more. It fea
Thanks all,
But how does Nutch handle this problem? I am aware of Nutch but not in
depth. If I search for the keyword "about us", Nutch gives me exactly what I
want. Are there any scoring techniques involved? Please let me know.
Thanks,
I've noticed that, but the code is for known tokens. How do I
do it for dynamic tokens? Meaning, I don't know the URLs; someone picks
the URLs and I index them. Is there any technique to use while indexing?
I am using Lucene version 2.4.0. Please advise me.
Thanks for your reply,
my original code snippet is:
IndexSearcher searcher = new IndexSearcher(indexDir);
Analyzer analyzer = new StopAnalyzer();
BooleanClause.Occur[] flags = { BooleanClause.Occur.SHOULD,
Boolea
Thanks
This is my code snippet:
IndexSearcher searcher = new IndexSearcher(indexDir);
Analyzer analyzer = new StopAnalyzer();
WildcardQuery query = new WildcardQuery(new
Term(DEFAULT_FIELD));
searcher.search(
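One thing to note: a WildcardQuery needs a Term that carries both the field and the wildcard pattern; with only the field name there is nothing to expand. A sketch, where the pattern string is an assumption:

    IndexSearcher searcher = new IndexSearcher(indexDir);
    // the term text is the pattern to expand, e.g. everything starting with "luc"
    WildcardQuery query = new WildcardQuery(new Term(DEFAULT_FIELD, "luc*"));
    TopDocs topDocs = searcher.search(query, 200);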
Thanks all,
Very thankful to all. I am tired of Hadoop settings; is it
good to read such a large index with Lucene alone? Will it run into OOM?
Can anyone please advise me?
Is there any article or forum about using Hadoop with Lucene? Please, can anyone help
me?
Thanks Shai
So there won't be a problem when searching that kind of large index,
am I right?
Can anyone tell me whether it is possible to use Hadoop with Lucene?
hello all
We've got 100GB of data in doc, txt, pdf, ppt, etc. formats, and we have a
separate parser for each file format, so we're going to index that data with
Lucene. (We were scared of the Nutch setup; that's why we didn't use it.) My
question is: will it be scalable when I index those documents?
hello all,
I am using Lucene.Net for my search application. How do I index non-English
pages? Are there any analyzers to do it? I am struggling with a
UTF-8 problem. Please, can anyone help me?
Thanks Uwe,
Can you please give me a code snippet so that I can resolve my
issue? Please.
The correct way to iterate over all results is to use a custom HitCollector
(Collector in 2.9) instance. The HitCollector's method collect(docid, score)
is called for every hit. No need to a
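A minimal sketch of that in the 2.4 API; the collector below just records matching doc ids (the list name is made up):

    final List<Integer> matchingDocs = new ArrayList<Integer>();
    searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            // called once per matching document; keep the work here as small as possible
            matchingDocs.add(doc);
        }
    });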
Thanks Erick
> In Ian's link, particularly see the section "Don't iterate over more hits
> than necessary".
> A couple of other things:
> 1> Loading the entire document just to get a field or two isn't
> very efficient, think about lazy loading (See FieldSelector)
I've done it, but I have a couple of ques
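On the FieldSelector point, a sketch using MapFieldSelector so only the fields actually rendered are loaded for each hit; the field names are assumptions, and scoreDoc stands for one entry from topDocs.scoreDocs:

    // load only "title" and skip large fields such as "contents"
    FieldSelector selector = new MapFieldSelector(new String[] { "title" });
    Document doc = indexReader.document(scoreDoc.doc, selector);
    String title = doc.get("title");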
Hi there,
On Tue, Jun 30, 2009 at 12:41 PM, m.harig wrote:
>
> Thanks Simon ,
>
> Its working now , thanks a lot , i've a doubt
>
> i've got 30,000 pdf files indexed , but if i use the code which you
> sent , returns only 200 results , becau
Thanks Simon,
It's working now, thanks a lot. I have one question:
I have 30,000 PDF files indexed, but if I use the code you
sent it returns only 200 results, because I am setting TopDocs topDocs =
searcher.search(query, 200); as I said, if I use Integer.MAX_VALUE, it return
hello all,
I've gone through most of the posts on this forum. I need a code
snippet for searching a large index; currently I am iterating like this:
hits = searcher.search(query);
for (int inc = 0; inc < hits.length(); inc++) {
    Document doc = hits.doc(inc);
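The rest of this thread moves from the Hits loop to TopDocs; a sketch that collects a bounded number of hits and only loads the documents actually being shown (the cutoff of 200 mirrors the later messages, the page size is an assumption):

    TopDocs topDocs = searcher.search(query, 200);      // collect the top 200 hits only
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    int pageSize = 10;                                   // results rendered per page
    for (int i = 0; i < Math.min(pageSize, scoreDocs.length); i++) {
        Document doc = searcher.doc(scoreDocs[i].doc);   // load only the docs being displayed
        // render doc.get("title"), etc.
    }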
Thanks Simon,
> Example:
> IndexReader open = IndexReader.open("/tmp/testindex/");
> IndexSearcher searcher = new IndexSearcher(open);
> final String fName = "test";
Is fName a field like summary or contents?
> TopDocs topDocs = searcher.search(new TermQuery(new Term(fName,
> "lucene")),
Thanks Simon,
> Hey there, that makes things easier. :)
> OK, here are some questions:
> Do you iterate over all docs calling hits.doc(i)? If so, do you have to
> load all fields to render your results? If not, you should not retrieve
> all of them.
Yes, I am iterating over all docs by calling hits.doc
Thanks again,
Did I index my files correctly? I need some tips. The following
is the error I get when I run my keyword search. I typed pdf, that's it, because I have
around 30,000 files named pdf:
HTTP Status 500 -
type Exception report
message
description The server encountered a
Thanks Simon,
This is how I am indexing my documents:
indexWriter.addDocument(doc, new StopAnalyzer());
indexWriter.setMergeFactor(10);
indexWriter.setMaxBufferedDocs(100);
indexWriter.setMaxMergeDocs(Integer.MAX_VALUE);
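For what it's worth, a sketch of the same setup with the tuning calls applied once, before any documents are added, rather than alongside each addDocument; the constructor arguments are assumptions:

    IndexWriter indexWriter = new IndexWriter(indexDir, new StopAnalyzer(), true);
    indexWriter.setMergeFactor(10);
    indexWriter.setMaxBufferedDocs(100);
    indexWriter.setMaxMergeDocs(Integer.MAX_VALUE);
    // ... then add all documents ...
    indexWriter.addDocument(doc);
    indexWriter.optimize();
    indexWriter.close();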
Thanks Simon
I don't run any other application on the Tomcat server; moreover, I restarted
it, and I am not doing any jobs except searching. We have a 500GB drive, we've
indexed around 100,000 documents, and it gives me around a 1GB index. When I
tried to search for pdf I got the heap space error.
Simon Willnauer wrote:
>
> On Mon, Jun 29, 2009 at 1:48 PM, m.harig wrote:
>>
>>
>>
>> Simon Willnauer wrote:
>>>
>>> Hey there,
>>> before going out to use hadoop (hadoop mailing list would help you
>>> better I guess) you co
> - how much heap space
> - where does the OOM occure
>
> or maybe there is already an issue that is related to you like this
> one: https://issues.apache.org/jira/browse/LUCENE-1566
>
> simon
>
> On Mon, Jun 29, 2009 at 12:49 PM, m.harig wrote:
>>
>> hello a
hello all
I am building a search application on Lucene. It works fine when my
index size is small, but I get a java heap space error when I use a large
index. I came to know about using Hadoop with Lucene to solve this problem,
but I don't have any idea about Hadoop. I've searched thru th
Hello all
Can anyone tell me the difference between query.setBoost()
and doc.setBoost()? Moreover, if I use query.setBoost(4.0f) I am not able
to boost my results. Which one makes my results better? Please, can anyone
help me out with this?
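For reference, a sketch of where each boost applies: doc.setBoost() is applied at index time and is folded into the stored norms, while query.setBoost() only scales one clause's contribution at search time. The field names and values here are assumptions:

    // index time: make every field of this document score higher
    Document doc = new Document();
    doc.setBoost(4.0f);

    // search time: make the "title" clause count more than "contents"
    Query titleQuery = new TermQuery(new Term("title", "cards"));
    titleQuery.setBoost(4.0f);
    Query contentsQuery = new TermQuery(new Term("contents", "cards"));
    BooleanQuery combined = new BooleanQuery();
    combined.add(titleQuery, BooleanClause.Occur.SHOULD);
    combined.add(contentsQuery, BooleanClause.Occur.SHOULD);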
Hello all
I have a search application running on lucene-2.3.0. Say,
for example, I index 10 URLs as input; when I search I am not able
to get the expected result at the top of the ranking, i.e. unrelated hits
come up rather than related hits. I've been working on this for a w
Hello all,
I have a search application which uses lucene-2.3.0, and my
application runs in a banking domain. I am indexing some banking URLs as
input and searching for some keywords. My question is: when I search for
"cards", the URL with the lower keyword count comes up first. I mean, for exa
hi all.
I am indexing a price field like this:
doc.add(new Field("price", "1450", Field.Store.YES,
Field.Index.TOKENIZED));
doc.add(new Field("price", "3800", Field.Store.YES,
Field.Index.TOKENIZED));
doc.add(new Field("pri