This is how I am indexing my files:

    InputStream is = new BufferedInputStream(new FileInputStream(file));
    BufferedReader bufr = new BufferedReader(new InputStreamReader(is));
    String inputLine = "";
    while ((inputLine = bufr.readLine()) != null) {
        Document doc = new Document();
        doc.add(new Field("contents", inputLine, Field.Store.YES,
                Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        doc.add(new Field("title", section, Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        String newRem = new String(rem);
        doc.add(new Field("fieldsort", newRem, Field.Store.YES,
                Field.Index.ANALYZED));
        doc.add(new Field("fieldsort2",
                rem.toLowerCase().replaceAll("-", "").replaceAll(" ", ""),
                Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("field1", Author, Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        doc.add(new Field("field2", Book, Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        doc.add(new Field("field3", sec, Field.Store.YES,
                Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }
    is.close();

Can you please explain your solution in steps? It would be very helpful.

Thanks,
Neeraj
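For reference, a minimal sketch of the chapter-as-a-document indexing that Jacques suggests below, written against the same Lucene 3.6 field API as the snippet above. The isChapterHeading() heuristic is an illustrative assumption (the thread never shows how chapter names, the `section` variable, are derived), and `chapterWriter` is assumed to be an IndexWriter on a separate, chapter-level index:

{code}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ChapterIndexer {

    // One Document per chapter: accumulate lines until the next heading.
    static void indexChapters(File file, IndexWriter chapterWriter) throws IOException {
        BufferedReader bufr = new BufferedReader(
                new InputStreamReader(new FileInputStream(file)));
        String line;
        String chapterName = null;
        StringBuilder chapterText = new StringBuilder();
        while ((line = bufr.readLine()) != null) {
            if (isChapterHeading(line)) {
                addChapter(chapterWriter, chapterName, chapterText); // flush previous chapter
                chapterName = line.trim();
                chapterText.setLength(0);
            } else {
                chapterText.append(line).append('\n');
            }
        }
        addChapter(chapterWriter, chapterName, chapterText);         // flush last chapter
        bufr.close();
    }

    static void addChapter(IndexWriter writer, String title, StringBuilder text)
            throws IOException {
        if (title == null) return;                // no chapter heading seen yet
        Document doc = new Document();
        // The chapter body only needs to be searchable, not stored.
        doc.add(new Field("contents", text.toString(),
                Field.Store.NO, Field.Index.ANALYZED));
        doc.add(new Field("title", title,
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }

    // Assumption: chapter headings are recognizable from the line itself.
    static boolean isChapterHeading(String line) {
        return line.startsWith("Chapter ");
    }
}
{/code}

Since `title` is indexed NOT_ANALYZED here exactly as in the line-level index, the same chapter name can later be used as an exact term filter across both indexes.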
On Sun, Apr 23, 2017 at 1:17 AM, Jacques Uber <ub...@miradortech.com> wrote:

> Have you considered indexing chapters as documents? Using your example you
> would have three documents corresponding to your three chapters: A, B, and
> D. Once you have that structure, the query "pain AND head" returns only
> chapters A and B. Using the information gained from this new chapter index,
> you could then use your existing index to do "pain AND head AND (chapter:A
> OR chapter:B)".
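A minimal sketch of that two-pass search, assuming a chapter-level index built as sketched above (fields `contents` and `title`) alongside the existing line-level index; `chapterDir` and `lineDir` are hypothetical Directory handles, and the literal terms "pain" and "head" stand in for the parsed, lowercased query terms:

{code}
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

public class TwoPassSearch {

    static void search(Directory chapterDir, Directory lineDir) throws IOException {
        // Pass 1: which chapters contain BOTH words?
        IndexReader chapterReader = IndexReader.open(chapterDir);
        IndexSearcher chapterSearcher = new IndexSearcher(chapterReader);
        BooleanQuery bothWords = new BooleanQuery();
        bothWords.add(new TermQuery(new Term("contents", "pain")), BooleanClause.Occur.MUST);
        bothWords.add(new TermQuery(new Term("contents", "head")), BooleanClause.Occur.MUST);
        TopDocs chapters = chapterSearcher.search(bothWords, 1000);

        // Turn the matching chapter names into an OR filter (chapter:A OR chapter:B ...).
        BooleanQuery chapterFilter = new BooleanQuery();
        for (ScoreDoc sd : chapters.scoreDocs) {
            String title = chapterSearcher.doc(sd.doc).get("title");
            chapterFilter.add(new TermQuery(new Term("title", title)),
                    BooleanClause.Occur.SHOULD);
        }
        chapterReader.close();
        if (chapters.totalHits == 0) return;      // no chapter has both words

        // Pass 2: lines containing either word, restricted to those chapters.
        BooleanQuery eitherWord = new BooleanQuery();
        eitherWord.add(new TermQuery(new Term("contents", "pain")), BooleanClause.Occur.SHOULD);
        eitherWord.add(new TermQuery(new Term("contents", "head")), BooleanClause.Occur.SHOULD);

        BooleanQuery lineQuery = new BooleanQuery();
        lineQuery.add(eitherWord, BooleanClause.Occur.MUST);
        lineQuery.add(chapterFilter, BooleanClause.Occur.MUST);

        IndexReader lineReader = IndexReader.open(lineDir);
        IndexSearcher lineSearcher = new IndexSearcher(lineReader);
        TopDocs lines = lineSearcher.search(lineQuery, 1000);
        for (ScoreDoc sd : lines.scoreDocs) {
            Document hit = lineSearcher.doc(sd.doc);
            System.out.println(hit.get("title") + ": " + hit.get("contents"));
        }
        lineReader.close();
    }
}
{/code}

Jacques' literal query uses MUST for both words; the sketch uses SHOULD in the second pass because the requirement described further down the thread asks for lines with at least one of the words inside the matched chapters. Since `title` is NOT_ANALYZED in both indexes, the chapter filter is an exact term match, and a few hundred chapter clauses stay well under BooleanQuery's default 1024-clause limit.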
> On Fri, Apr 21, 2017 at 10:40 PM, neeraj shah <neerajsha...@gmail.com>
> wrote:
>
> > Hello,
> > Let me explain my case. Suppose I am searching for ("pain" (in same
> > chapter) "head"); this is my query. First I need to search for "pain",
> > then search for "head" separately, and then I need the file names
> > common to both result sets. Now suppose the files are:
> >
> > FileA - Chapter A - has only the word "pain"
> > FileB - Chapter B - has both "head" and "pain"
> > FileC - Chapter A - has only the word "head"
> > FileD - Chapter D - has only the word "head"
> > FileE - Chapter A - has only the word "pain"
> >
> > The result should be:
> >
> > FileA - Chapter A - has only the word "pain"
> > FileB - Chapter B - has both "head" and "pain"
> > FileC - Chapter A - has only the word "head"
> > FileE - Chapter A - has only the word "pain"
> >
> > FileD - Chapter D - will not appear in the results because chapter D's
> > name does not match any chapter that has both search words.
> > In short, I have to show only those chapters, from any book, that have
> > both search words or at least one of them, provided the chapter name is
> > shared with chapters that together cover both words.
> >
> > That is my requirement, and it is why I was parsing all hits for "pain"
> > and "head" separately and then collecting the common "title" (chapter
> > name) from both result sets, along with results that have at least one
> > search word and the same chapter name.
> > In my results the word "pain" alone has 5 lakh (500,000) hits and
> > "head" has 60K.
> >
> > Please suggest another approach if you have one in mind.
> >
> > Thanks,
> > Neeraj
> >
> > On Sat, Apr 22, 2017 at 12:20 AM, Chris Hostetter
> > <hossman_luc...@fucit.org> wrote:
> >
> > > : then which one is the right tool for text searching in files?
> > > : please can you suggest me?
> > >
> > > So far all you've done is show us your *indexing* code, and said that
> > > after you do a search, calling searcher.doc(docid) on 500,000
> > > documents is slow.
> > >
> > > But you still haven't described the use case you are trying to solve --
> > > ie: *WHY* do you want these 500,000 results from your search? Once you
> > > get those Documents back, *WHAT* are you going to do with them?
> > >
> > > If you show us some code, and talk us through your goal, then we can
> > > help you -- otherwise all we can do is warn you that the specific
> > > searcher.doc(docid) API isn't designed to be efficient at that large a
> > > scale. Other APIs in Lucene are designed to be efficient at large
> > > scale, but we don't really know what to suggest w/o knowing what
> > > you're trying to do...
> > >
> > > https://people.apache.org/~hossman/#xyproblem
> > > XY Problem
> > >
> > > Your question appears to be an "XY Problem" ... that is: you are
> > > dealing with "X", you are assuming "Y" will help you, and you are
> > > asking about "Y" without giving more details about the "X" so that we
> > > can understand the full issue. Perhaps the best solution doesn't
> > > involve "Y" at all?
> > > See also: http://www.perlmonks.org/index.pl?node_id=542341
> > >
> > > PS: please, Please PLEASE upgrade to Lucene 6.x. 3.6 is more than 5
> > > years old and completely unsupported -- any advice you are given on
> > > this list is likely to refer to APIs that are completely different
> > > than the version of Lucene you are working with.
> > >
> > > : On Fri, Apr 21, 2017 at 2:01 PM, Adrien Grand <jpou...@gmail.com>
> > > : wrote:
> > > :
> > > : > Lucene is not designed for retrieving that many results. What are
> > > : > you doing with those 5 lakh documents? I suspect this is too much
> > > : > to display, so you probably perform some computations on them? If
> > > : > so, maybe you could move them to Lucene using e.g. facets? If that
> > > : > does not work, I'm afraid that Lucene is not the right tool for
> > > : > your problem.
> > > : >
> > > : > On Fri, Apr 21, 2017 at 08:56, neeraj shah
> > > : > <neerajsha...@gmail.com> wrote:
> > > : >
> > > : > > Yes, I am fetching around 5 lakh results from the index
> > > : > > searcher. Also, I am indexing each line of each file because,
> > > : > > while searching, I need all the lines of a file that match the
> > > : > > term. Please tell me, am I doing it right?
> > > : > >
> > > : > > {code}
> > > : > > InputStream is = new BufferedInputStream(new FileInputStream(file));
> > > : > > BufferedReader bufr = new BufferedReader(new InputStreamReader(is));
> > > : > > String inputLine = "";
> > > : > >
> > > : > > while ((inputLine = bufr.readLine()) != null) {
> > > : > >     Document doc = new Document();
> > > : > >     doc.add(new Field("contents", inputLine, Field.Store.YES,
> > > : > >             Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> > > : > >     doc.add(new Field("title", section, Field.Store.YES,
> > > : > >             Field.Index.NOT_ANALYZED));
> > > : > >     String newRem = new String(rem);
> > > : > >     doc.add(new Field("fieldsort", newRem, Field.Store.YES,
> > > : > >             Field.Index.ANALYZED));
> > > : > >     doc.add(new Field("fieldsort2", rem.toLowerCase().replaceAll("-",
> > > : > >             "").replaceAll(" ", ""), Field.Store.YES, Field.Index.ANALYZED));
> > > : > >     doc.add(new Field("field1", Author, Field.Store.YES,
> > > : > >             Field.Index.NOT_ANALYZED));
> > > : > >     doc.add(new Field("field2", Book, Field.Store.YES,
> > > : > >             Field.Index.NOT_ANALYZED));
> > > : > >     doc.add(new Field("field3", sec, Field.Store.YES,
> > > : > >             Field.Index.NOT_ANALYZED));
> > > : > >     writer.addDocument(doc);
> > > : > > }
> > > : > > is.close();
> > > : > > {/code}
> > > : > >
> > > : > > On Thu, Apr 20, 2017 at 5:57 PM, Adrien Grand <jpou...@gmail.com>
> > > : > > wrote:
> > > : > >
> > > : > > > IndexSearcher.doc is the right way to retrieve documents. If
> > > : > > > this is slowing things down for you, I'm wondering whether you
> > > : > > > might be fetching too many results?
> > > : > > >
> > > : > > > On Thu, Apr 20, 2017 at 14:16, neeraj shah
> > > : > > > <neerajsha...@gmail.com> wrote:
> > > : > > >
> > > : > > > > Hello everyone,
> > > : > > > >
> > > : > > > > I am using Lucene 3.6. I have to index around 60k documents.
> > > : > > > > After performing the search, when I try to retrieve
> > > : > > > > documents from the searcher using searcher.doc(docid), it
> > > : > > > > slows down the search. Is there any other way to get the
> > > : > > > > documents?
> > > : > > > >
> > > : > > > > Also, can anyone give me an end-to-end example of working
> > > : > > > > with FieldCache? While implementing the cache I have:
> > > : > > > >
> > > : > > > > int[] fieldIds = FieldCache.DEFAULT.getInts(indexMultiReader, "id");
> > > : > > > >
> > > : > > > > Now I don't know how to use fieldIds further to improve the
> > > : > > > > search. Please give me an end-to-end example.
> > > : > > > >
> > > : > > > > Thanks,
> > > : > > > > Neeraj
> > >
> > > -Hoss
> > > http://www.lucidworks.com/
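On the FieldCache question that opened the thread, a minimal end-to-end sketch against the Lucene 3.6 Collector API. It assumes an `id` field indexed NOT_ANALYZED with integer-parseable values (the indexing code shown in the thread does not add such a field), which is what FieldCache.DEFAULT.getInts() requires:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

// Gathers the "id" field of every hit straight from the FieldCache,
// so searcher.doc(docid) is never called per hit.
public class IdCollector extends Collector {

    private final List<Integer> ids = new ArrayList<Integer>();
    private int[] segmentIds;   // cached "id" values for the current segment

    @Override
    public void setScorer(Scorer scorer) {
        // Scores are not needed for collecting ids.
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        // Per-segment lookup: cheap after the first access to each segment.
        segmentIds = FieldCache.DEFAULT.getInts(reader, "id");
    }

    @Override
    public void collect(int doc) {
        // doc is segment-relative here, which matches the segment array.
        ids.add(segmentIds[doc]);
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true;            // order does not matter when just gathering ids
    }

    public List<Integer> getIds() {
        return ids;
    }
}
{/code}

Usage would be along these lines:

{code}
IdCollector collector = new IdCollector();
searcher.search(query, collector);
List<Integer> ids = collector.getIds();
{/code}

This is the kind of large-scale-friendly API Hoss refers to above: the per-hit work is an array lookup instead of a stored-document fetch.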