english dictionary for spelling

2009-12-06 Thread m.harig
Hello all, I have a question about the spell checker. I am creating the spell index from my original index, but my original index itself contains some misspelled words, so I decided to use words from a proper English dictionary for the spell checker instead. Can anyone tell me whether Lucene has an option for this?

english dictionary for spelling

2009-12-06 Thread m.harig
-- View this message in context: http://old.nabble.com/english-dictionary-for-spelling-tp26672045p26672045.html Sent from the Lucene - Java Users mailing list archive at Nabble.com. - To unsubscribe, e-mail: java-user-unsubsc

updating index

2009-12-04 Thread m.harig
Hello all, how do I update my existing index so that I avoid duplicates? This is how I am doing my indexing: doc.add(new Field("id",""+i,Field.Store.YES,Field.Index.NOT_ANALYZED)); doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES,

splitting words

2009-11-30 Thread m.harig
Hello all, I have a question about searching for split words in Lucene. For example, if I search for dualcore it should return dual core. How do I split this word? Is there an analyzer in Lucene that does it? Please, can anyone help me?
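One way to picture the dualcore to dual core split asked about here is a dictionary lookup over every cut position. The sketch below is plain Java, not a Lucene analyzer; the class name `CompoundSplitter` and the two-word dictionary are made up for illustration, and it only tries two-way splits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: splits a token into two dictionary words.
class CompoundSplitter {

    // Returns the first two-way split whose halves are both in the dictionary,
    // or a single-element list holding the original token if no split fits.
    static List<String> split(String token, Set<String> dictionary) {
        for (int i = 1; i < token.length(); i++) {
            String left = token.substring(0, i);
            String right = token.substring(i);
            if (dictionary.contains(left) && dictionary.contains(right)) {
                return Arrays.asList(left, right);
            }
        }
        List<String> single = new ArrayList<>();
        single.add(token);
        return single;
    }

    public static void main(String[] args) {
        // Assumed dictionary; a real one would come from a word list.
        Set<String> dictionary = new HashSet<>(Arrays.asList("dual", "core"));
        System.out.println(split("dualcore", dictionary)); // prints [dual, core]
    }
}
```

In an actual index the same idea would have to run inside the analysis chain at index time, query time, or both, so that dualcore and dual core end up as the same terms.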

Re: did you mean issue

2009-11-24 Thread m.harig
What should I do now? Could you make it clear for me? Grant Ingersoll-6 wrote: > On Nov 24, 2009, at 1:16 AM, m.harig wrote: >> String[] suggestions = spellChecker.suggestSimilar("hoem", 3, indexReader, "contents", true);

Re: did you mean issue

2009-11-23 Thread m.harig
String[] suggestions = spellChecker.suggestSimilar("hoem", 3, indexReader, "contents", true); This is how I am retrieving my "did you mean" words. Grant Ingersoll-6 wrote: > How are you invoking the spell checker? > On Nov 19, 2009, at 1:22 AM,

updating spell index

2009-11-23 Thread m.harig
Hello all, is there any way to update the spell index directory? Please, can anyone help me out with this?

did you mean issue

2009-11-18 Thread m.harig
Hello all, I have a question about the spell checker. When I search for the keyword hoem, I get the suggestions in the following order (I am retrieving 4 suggested words): form, hold, home, them. I need home to be returned first, but it is in the third position, howeve
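A possible reason home does not stand out: under plain Levenshtein distance, hoem is two edits away from all four of form, hold, home and them, because swapping two adjacent letters counts as two substitutions. The sketch below is plain Java and not Lucene's SpellChecker; it assumes the ranking is driven purely by edit distance (which I have not checked against the Lucene source) and swaps in the optimal string alignment distance, which counts an adjacent transposition as one edit. The class name `SuggestionRanker` is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical re-ranker, not part of Lucene.
class SuggestionRanker {

    // Optimal string alignment distance: Levenshtein edits plus
    // a single-cost swap of two adjacent characters.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "em" versus "me".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // Stable sort: candidates with equal distance keep their incoming order.
    static List<String> rank(String misspelled, List<String> candidates) {
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingInt(c -> distance(misspelled, c)));
        return ranked;
    }

    public static void main(String[] args) {
        List<String> suggestions = Arrays.asList("form", "hold", "home", "them");
        System.out.println(rank("hoem", suggestions)); // prints [home, form, hold, them]
    }
}
```

Re-sorting the returned suggestions this way is only a post-processing step; asking for more suggestions than needed and then keeping the top few after re-ranking would give it more candidates to work with.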

Re: remove duplicate when merging indexes

2009-11-10 Thread m.harig
Thanks Ian, it works, thanks a lot. Ian Lea wrote: > Try updateDocument(new Term("id", ""+i), doc). > See javadocs for Term constructors. > -- > Ian. > On Tue, Nov 10, 2009 at 9:47 AM, m.harig wrote:
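To see why the suggested updateDocument call removes the duplicates, it can help to model the two operations on a toy in-memory list. The sketch below is plain Java and deliberately not Lucene: `ToyIndex` and its methods are invented for illustration, and the only assumption carried over from the thread is that an update deletes whatever already matches the id before adding the new document.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an index; each "document" is just an {id, title} pair.
class ToyIndex {
    private final List<String[]> docs = new ArrayList<>();

    // Blind append: re-running the nightly job adds a second copy.
    void addDocument(String id, String title) {
        docs.add(new String[] {id, title});
    }

    // Delete-then-add keyed on the id: re-running the job replaces the old copy.
    void updateDocument(String id, String title) {
        docs.removeIf(d -> d[0].equals(id));
        docs.add(new String[] {id, title});
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex appended = new ToyIndex();
        appended.addDocument("1", "first run");
        appended.addDocument("1", "second run");
        System.out.println(appended.size()); // prints 2

        ToyIndex updated = new ToyIndex();
        updated.updateDocument("1", "first run");
        updated.updateDocument("1", "second run");
        System.out.println(updated.size()); // prints 1
    }
}
```

The id field in the posted code is indexed NOT_ANALYZED, which is what lets a single exact term identify the document to replace.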

Re: remove duplicate when merging indexes

2009-11-10 Thread m.harig
Thanks Simon, this is my code: doc.add(new Field("id",""+i,Field.Store.YES,Field.Index.NOT_ANALYZED)); doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES, Field.Index.ANALYZED)); doc.add(new Field("conte

Re: remove duplicate when merging indexes

2009-11-10 Thread m.harig
Thanks again, this is my code: doc.add(new Field("id",""+i,Field.Store.YES,Field.Index.NOT_ANALYZED)); doc.add(new Field("title", indexForm.getTitle(), Field.Store.YES, Field.Index.ANALYZED)); doc.add(new Field("contents",

Re: remove duplicate when merging indexes

2009-11-10 Thread m.harig
document) this will delete the old document and add the new one. > simon > On Tue, Nov 10, 2009 at 10:05 AM, m.harig wrote: >> hello all, >> This is my situation, i've multiple indexes, for example, index1, index2,

remove duplicate when merging indexes

2009-11-10 Thread m.harig
Hello all, this is my situation: I have multiple indexes, for example index1, index2, index3 ... and I have to update the indexes every night. If I open my IndexWriter with create=false (since I want to update the existing index), duplicate documents get appended to the existing indexes,

Re: search problem

2009-10-29 Thread m.harig
Thanks Erick, I understand the issue, but my question is about searching for a keyword that is originally a single word. For example, metacity really is a single keyword, but when I search for meta city I am not able to get the results. That is my question. If you go to Google and search for m

search problem

2009-10-29 Thread m.harig
Hello all, I have a question about search. I have the word welcomelucene (without spaces) in my index. When I search for welcome lucene (with a space), I am not able to get any hits. It should pick up the document containing welcomelucene. Is there any way to do it? I have tried the wildcard option too, but got no results, ple
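One approach to the welcome lucene versus welcomelucene mismatch is to expand the query with the concatenation of each adjacent pair of terms, so the joined form is also searched. The sketch below is plain Java with an invented `QueryJoiner` class; it only builds the list of terms to try and does not touch any Lucene query API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical query-side helper, not part of Lucene.
class QueryJoiner {

    // Returns the original whitespace-separated terms followed by
    // every adjacent pair glued together without a space.
    static List<String> expand(String query) {
        String[] terms = query.trim().split("\\s+");
        List<String> out = new ArrayList<>(Arrays.asList(terms));
        for (int i = 0; i + 1 < terms.length; i++) {
            out.add(terms[i] + terms[i + 1]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("welcome lucene")); // prints [welcome, lucene, welcomelucene]
    }
}
```

Each returned term could then be added as an optional clause of the final query, so a document containing only welcomelucene still matches.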

Re: singular and plural search

2009-10-21 Thread m.harig
Thanks Erick, it works fine if I use the same analyzer (the code snippet I found on Nabble) for both indexing and querying. But highlighting has stopped working for plural words. I think I need to search some more; I will come back to you if I can't find out. Thanks again, Erick.

Re: singular and plural search

2009-10-21 Thread m.harig
Thanks Erick. A little more information would help here. 1> Are you using the same analyzer at both index and query time? No, sorry, I am using StandardAnalyzer at index time, and during querying I am using the code snippet found on Nabble. 2> Assuming <1> is "yes", did you re-index your data

singular and plural search

2009-10-21 Thread m.harig
Hello all, I have a question about searching for plural and singular words. I got this code snippet from the Nabble forum: private static Analyzer createEnglishAnalyzer() { return new Analyzer() { public TokenStream tokenStream(String fieldName, Reader reader) { TokenStream result =

RE: index reader for multiple indexes

2009-10-02 Thread m.harig
which is an IndexReader on top of various > Sub-IndexReaders. > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > >> -----Original Message- >> From: m.harig [mailto:m.ha...@gmail.com] >> Se

index reader for multiple indexes

2009-10-02 Thread m.harig
Hello all, I am merging more than one index in order to search for a document. How do I use IndexReader here to open multiple indexes (since an IndexReader opens only one directory at a time)? Could anyone please advise me?

Re: get all tokens from index

2009-09-09 Thread m.harig
Thanks Ahmet, I found the solution. Thanks a lot. Ahmet Arslan wrote: >> hello all, is there any way to get all tokens from my index ? please anyone suggest me > The code below prints all terms of a field. > String path = "E:\\ThesaurusSolrHome\\data\\index"; > St

get all tokens from index

2009-09-08 Thread m.harig
Hello all, is there any way to get all the tokens from my index? Please, can anyone advise me?

RE: reading index

2009-08-08 Thread m.harig
Hello, will my reader.reopen() method work on a Windows machine when the index gets updated? I mean, will my Tomcat server allow the reader to be updated when the index changes? Please help me.

Re: reading index

2009-08-07 Thread m.harig
Thanks, this is my code snippet: public void doSearch(){ .. Query query = . IndexSearcher searcher = new IndexSearcher(directory);

reading index

2009-08-07 Thread m.harig
Hello all, thanks to Lucene. I am using Lucene 2.4.0 for my application. My question is: can I read the index any number of times? I mean, I have a search application that reads the index, which is 300MB in size, and I am reading the index every time a user hits the page. Is it goo

Re: A Presentation on Building a Hadoop + Lucene System Architecture

2009-08-04 Thread m.harig
Hello, do you have any idea about integrating Lucene with Hadoop? BrickMcLargeHuge wrote: > Hey all, > I just wanted to send a link to a presentation I made on how my company is building its entire core BI infrastructure around Hadoop, HBase, Lucene, and more. It fea

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks all, but how does Nutch handle this problem? I am aware of Nutch, but not in depth. If I search for the keyword "about us", Nutch gives me exactly what I want. Are there any scoring techniques? Please let me know.

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks, I have noticed that, but the code is for known tokens. How do I do it for dynamic tokens? Meaning, I don't know the URLs; someone else picks the URLs and I index them. Is there any technique to use while indexing? I am using Lucene version 2.4.0. Please advise me.

Re: Searching doubt

2009-08-04 Thread m.harig
Thanks for your reply. My original code snippet is: IndexSearcher searcher = new IndexSearcher(indexDir); Analyzer analyzer = new StopAnalyzer(); BooleanClause.Occur[] flags = { BooleanClause.Occur.SHOULD, Boolea

Re: Searching doubt

2009-08-03 Thread m.harig
Thanks. This is my code snippet: IndexSearcher searcher = new IndexSearcher(indexDir); Analyzer analyzer = new StopAnalyzer(); WildcardQuery query = new WildcardQuery(new Term(DEFAULT_FIELD)); searcher.search(

RE: indexing 100GB of data

2009-07-23 Thread m.harig
Thanks all, I am very thankful to everyone. I am tired of the Hadoop settings. Is it fine to read such a large index with Lucene alone? Will it run out of memory (OOM)? Can anyone please advise me?

Re: indexing 100GB of data

2009-07-22 Thread m.harig
Is there any article or forum about using Hadoop with Lucene? Please, can anyone help me?

Re: indexing 100GB of data

2009-07-22 Thread m.harig
Thanks Shai. So there won't be a problem when searching that kind of large index, am I right? Can anyone tell me whether it is possible to use Hadoop with Lucene?

indexing 100GB of data

2009-07-21 Thread m.harig
Hello all, we have 100GB of data in doc, txt, pdf, ppt, etc. formats. We have a separate parser for each file format, and we are going to index the data with Lucene (we were scared of the Nutch setup, which is why we didn't use it). My question is: will it be scalable when I index those documents?

.net lucene doubt

2009-07-15 Thread m.harig
Hello all, I am using .NET Lucene for my search application. How do I index non-English pages? Are there any analyzers for this? I am struggling with a UTF-8 problem. Please, can anyone help me?

RE: Read large size index

2009-06-30 Thread m.harig
Thanks Uwe, can you please give me a code snippet so that I can resolve my issue? The correct way to iterate over all results is to use a custom HitCollector (Collector in 2.9) instance. The HitCollector's method collect(docid, score) is called for every hit. No need to a
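The quoted advice describes a callback that receives (docid, score) for every hit. A common way to use such a callback without holding all hits in memory is a bounded min-heap that keeps only the best N. The sketch below is plain Java and does not extend Lucene's HitCollector; `TopNCollector` and `Hit` are invented names, and only the collect(docid, score) shape is taken from the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical collector, shaped like the collect(docid, score) callback quoted above.
class TopNCollector {

    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int n;
    // Min-heap on score: the weakest hit currently kept sits at the head.
    private final PriorityQueue<Hit> heap =
            new PriorityQueue<>((x, y) -> Float.compare(x.score, y.score));

    TopNCollector(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        this.n = n;
    }

    // Called once per hit; memory stays bounded by n no matter how many hits arrive.
    void collect(int doc, float score) {
        if (heap.size() < n) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();
            heap.add(new Hit(doc, score));
        }
    }

    // Document ids of the kept hits, best score first.
    List<Integer> topDocIds() {
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort((x, y) -> Float.compare(y.score, x.score));
        List<Integer> ids = new ArrayList<>();
        for (Hit h : hits) ids.add(h.doc);
        return ids;
    }

    public static void main(String[] args) {
        TopNCollector collector = new TopNCollector(2);
        collector.collect(1, 0.5f);
        collector.collect(2, 0.9f);
        collector.collect(3, 0.1f);
        collector.collect(4, 0.7f);
        System.out.println(collector.topDocIds()); // prints [2, 4]
    }
}
```

Only document ids and scores are kept here; stored fields would be loaded afterwards for the few ids that are actually displayed.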

Re: optimized searching

2009-06-30 Thread m.harig
Thanks Eric. In Ian's link, particularly see the section "Don't iterate over more hits than necessary". A couple of other things: 1> Loading the entire document just to get a field or two isn't very efficient, think about lazy loading (See FieldSelector). I have done that, but I have a couple of ques

Re: Read large size index

2009-06-30 Thread m.harig
Hi there, On Tue, Jun 30, 2009 at 12:41 PM, m.harig wrote: > Thanks Simon, > Its working now, thanks a lot, i've a doubt > i've got 30,000 pdf files indexed, but if i use the code which you sent, returns only 200 results, becau

Re: Read large size index

2009-06-30 Thread m.harig
Thanks Simon, it's working now, thanks a lot. I have a question: I've got 30,000 PDF files indexed, but the code you sent returns only 200 results, because I am setting TopDocs topDocs = searcher.search(query,200); As I said, if I use Integer.MAX_VALUE, it return

optimized searching

2009-06-29 Thread m.harig
Hello all, I have gone through most of the posts on this forum. I need a code snippet for searching a large index. Currently I am iterating like this: hits = searcher.search(query); for (int inc = 0; inc < hits.length(); inc++) { Document doc = hits.doc(inc);

Re: Read large size index

2009-06-29 Thread m.harig
Thanks Simon. Example: IndexReader open = IndexReader.open("/tmp/testindex/"); IndexSearcher searcher = new IndexSearcher(open); final String fName = "test"; Is fName a field like summary or contents? TopDocs topDocs = searcher.search(new TermQuery(new Term(fName, "lucene")),

Re: Read large size index

2009-06-29 Thread m.harig
Thanks Simon. Hey there, that makes things easier. :) ok here are some questions: >>> Do you iterate over all docs calling hits.doc(i)? If so, do you have to load all fields to render your results? If not, you should not retrieve all of them. Yes, I am iterating over all docs by calling hits.doc

Re: Read large size index

2009-06-29 Thread m.harig
Thanks again. Did I index my files correctly? Please, I need some tips. The following is the error I get when I run my keyword search. I typed pdf, that's it, because I've got around 30,000 files named pdf. HTTP Status 500 - type Exception report message description The server encountered a

Re: Read large size index

2009-06-29 Thread m.harig
Thanks Simon, this is how I am indexing my documents: indexWriter.addDocument(doc, new StopAnalyzer()); indexWriter.setMergeFactor(10); indexWriter.setMaxBufferedDocs(100); indexWriter.setMaxMergeDocs(Integer.MAX_VA

Re: Read large size index

2009-06-29 Thread m.harig
Thanks Simon. I don't run any other application on Tomcat; moreover, I restarted it. I am not running any jobs except searching. We have a 500GB drive and have indexed around 100,000 documents, which gives me an index of around 1GB. When I tried to search for pdf I got the heap space error.

Re: Read large size index

2009-06-29 Thread m.harig
Simon Willnauer wrote: > > On Mon, Jun 29, 2009 at 1:48 PM, m.harig wrote: >> >> >> >> Simon Willnauer wrote: >>> >>> Hey there, >>> before going out to use hadoop (hadoop mailing list would help you >>> better I guess) you co

Re: Read large size index

2009-06-29 Thread m.harig
> - how much heap space > - where does the OOM occur > > or maybe there is already an issue that is related to you like this > one: https://issues.apache.org/jira/browse/LUCENE-1566 > > simon > > On Mon, Jun 29, 2009 at 12:49 PM, m.harig wrote: >> >> hello a

Read large size index

2009-06-29 Thread m.harig
Hello all, I am building a search application on Lucene. It works fine when my index is small, but I get a Java heap space error when I use a large index. I came to know about using Hadoop with Lucene to solve this problem, but I don't have any idea about Hadoop. I've searched thru th

query & doc boost difference

2009-03-25 Thread m.harig
Hello all, can anyone tell me the difference between query.setBoost() and doc.setBoost()? Moreover, if I use query.setBoost(4.0f) I am not able to boost my results. Which one makes my results better? Please, can anyone help me out with this?

need scoring help

2009-03-20 Thread m.harig
Hello all, I have a search application running on lucene-2.3.0. Say, for example, I am indexing 10 URLs as input. When I search, I am not able to get the expected result at the best ranking, i.e., unrelated hits come up ahead of related hits. I've been working this for a w

boosting query

2009-03-19 Thread m.harig
Hello all, I have a search application that uses lucene-2.3.0, and it runs for a banking domain. I am indexing some banking URLs as input and searching for some keywords. My question is: when I search for "cards", the URL with the lower keyword count comes up. I mean, for exa

Number range search

2008-08-13 Thread m.harig
Hi all, I am indexing a price field with: doc.add(new Field("price", "1450", Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field("price", "3800", Field.Store.YES, Field.Index.TOKENIZED)); doc.add(new Field("pri
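The post is cut off before the actual question, so this is a guess based on the subject line: when numbers are indexed as plain strings such as "1450" and "3800", any range comparison done on the terms is lexicographic, and "900" sorts after "1450". As far as I know that is how term ranges behave in Lucene of this era, but treat it as an assumption. A usual workaround is to zero-pad every value to a fixed width before indexing and before building the range bounds. The sketch below is plain Java; `PaddedNumber` and the width of 10 digits are illustrative choices.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: fixed-width encoding of non-negative ints.
class PaddedNumber {

    // Pads to 10 digits so that string order matches numeric order.
    static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%010d", value);
    }

    // Inclusive range check done purely on the padded strings,
    // the way a term-based range comparison would see them.
    static boolean inRange(String paddedTerm, int low, int high) {
        return paddedTerm.compareTo(pad(low)) >= 0 && paddedTerm.compareTo(pad(high)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println("900".compareTo("1450") > 0);          // prints true: unpadded order is wrong
        System.out.println(pad(900).compareTo(pad(1450)) < 0);    // prints true: padded order is right
        System.out.println(inRange(pad(1450), 1000, 2000));       // prints true
        System.out.println(inRange(pad(3800), 1000, 2000));       // prints false
    }
}
```

The padded value would need to be indexed as a single untokenized term, and the same pad call applied to both ends of the range at query time.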