Have you considered indexing chapters as documents? Using your example you would have three documents corresponding to your three chapters: A, B, and D. Once you have that structure, the query "pain AND head" returns only chapters A and B. Using the chapter names gained from this new chapter index, you could then run a second query against your existing file-level index: "(pain OR head) AND (chapter:A OR chapter:B)", which returns exactly the four files you expect.
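To make that concrete, here is a minimal, untested sketch of the two-pass search against the 3.x API you are on. It assumes a second IndexSearcher over the chapter-level index, that both indexes use the "contents" and "title" fields from your indexing code (with "title" holding the chapter name), and arbitrary result cutoffs of 1000:

{code}
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class TwoPassChapterSearch {

    public static TopDocs search(IndexSearcher chapterSearcher,
                                 IndexSearcher fileSearcher,
                                 String word1, String word2) throws Exception {
        // Pass 1: chapters whose text contains BOTH words.
        BooleanQuery chapterQuery = new BooleanQuery();
        chapterQuery.add(new TermQuery(new Term("contents", word1)), Occur.MUST);
        chapterQuery.add(new TermQuery(new Term("contents", word2)), Occur.MUST);

        Set<String> chapters = new HashSet<String>();
        TopDocs chapterHits = chapterSearcher.search(chapterQuery, 1000);
        for (ScoreDoc sd : chapterHits.scoreDocs) {
            // only a handful of chapter documents, so doc() is cheap here
            Document d = chapterSearcher.doc(sd.doc);
            chapters.add(d.get("title"));
        }

        // Pass 2: files containing EITHER word, restricted to those chapters,
        // i.e. (word1 OR word2) AND (title:A OR title:B OR ...)
        BooleanQuery words = new BooleanQuery();
        words.add(new TermQuery(new Term("contents", word1)), Occur.SHOULD);
        words.add(new TermQuery(new Term("contents", word2)), Occur.SHOULD);

        BooleanQuery chapterFilter = new BooleanQuery();
        for (String chapter : chapters) {
            chapterFilter.add(new TermQuery(new Term("title", chapter)), Occur.SHOULD);
        }

        BooleanQuery fileQuery = new BooleanQuery();
        fileQuery.add(words, Occur.MUST);
        fileQuery.add(chapterFilter, Occur.MUST);
        return fileSearcher.search(fileQuery, 1000);
    }
}
{/code}

One caveat: if the first pass can match more than about 1000 chapters, the OR over chapter names will hit BooleanQuery's default 1024-clause limit, so you would want something like contrib's TermsFilter instead of the chapterFilter clause.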
On Fri, Apr 21, 2017 at 10:40 PM, neeraj shah <neerajsha...@gmail.com> wrote:

> Hello,
> Let me explain my case. Suppose I am searching for ("pain" (in same
> chapter) "head"); this is my query. What I need to do is first search
> "pain", then search "head" separately, and then find the file names
> common to both search results.
> Now the criteria is, suppose:
>
> FileA - Chapter A - has only the word "*pain*"
> FileB - Chapter B - has both words "*head*" and "*pain*"
> FileC - Chapter A - has only the word "*head*"
> FileD - Chapter D - has only the word "*head*"
> FileE - Chapter A - has only the word "*pain*"
>
> Then the result should be:
>
> FileA - Chapter A - has only the word "*pain*"
> FileB - Chapter B - has both words "*head*" and "*pain*"
> FileC - Chapter A - has only the word "*head*"
> FileE - Chapter A - has only the word "*pain*"
>
> FileD (Chapter D, only the word "*head*") will not appear in the result,
> because "Chapter D" does not match the name of any chapter that contains
> both search words. In short, I have to show only those files, from any
> book, whose shared chapter name covers both search words; each file
> itself needs to contain at least one of the words, but the chapter name
> must be the same.
>
> That requirement is why I was parsing all hits for "pain" and "head"
> separately, then collecting the common "title" (chapter name) from both
> result sets, together with any result that has at least one search word
> and the same chapter name. In my index the word "pain" alone has 5 lacs
> (500,000) results and "head" has 60K results.
>
> Please suggest another approach if you have one in mind.
>
> Thanks,
> Neeraj
>
> On Sat, Apr 22, 2017 at 12:20 AM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
>
> > : then which one is the right tool for text searching in files? Please
> > : can you suggest one?
> >
> > So far all you've done is show us your *indexing* code, and said that
> > after you do a search, calling searcher.doc(docid) on 500,000 documents
> > is slow.
> >
> > But you still haven't described the use case you are trying to solve --
> > i.e.: *WHY* do you want these 500,000 results from your search? Once
> > you get those Documents back, *WHAT* are you going to do with them?
> >
> > If you show us some code, and talk us through your goal, then we can
> > help you -- otherwise all we can do is warn you that the specific
> > searcher.doc(docid) API isn't designed to be efficient at that large a
> > scale. Other APIs in Lucene are designed to be efficient at large
> > scale, but we don't really know what to suggest w/o knowing what you're
> > trying to do...
> >
> > https://people.apache.org/~hossman/#xyproblem
> > XY Problem
> >
> > Your question appears to be an "XY Problem" ... that is: you are
> > dealing with "X", you are assuming "Y" will help you, and you are
> > asking about "Y" without giving more details about the "X" so that we
> > can understand the full issue. Perhaps the best solution doesn't
> > involve "Y" at all?
> > See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >
> > PS: please, Please PLEASE upgrade to Lucene 6.x. 3.6 is more than 5
> > years old, and completely unsupported -- any advice you are given on
> > this list is likely to refer to APIs that are completely different than
> > the version of Lucene you are working with.
> >
> > : On Fri, Apr 21, 2017 at 2:01 PM, Adrien Grand <jpou...@gmail.com>
> > : wrote:
> > :
> > : > Lucene is not designed for retrieving that many results.
> > : > What are you doing with those 5 lacs documents? I suspect this is
> > : > too much to display, so you probably perform some computations on
> > : > them? If so, maybe you could move those computations into Lucene
> > : > using e.g. facets? If that does not work, I'm afraid that Lucene is
> > : > not the right tool for your problem.
> > : >
> > : > On Fri, Apr 21, 2017 at 08:56, neeraj shah <neerajsha...@gmail.com>
> > : > wrote:
> > : >
> > : > > Yes, I am fetching around 5 lacs results from the index searcher.
> > : > > Also I am indexing each line of each file, because while searching
> > : > > I need all the lines of a file which contain the matched term.
> > : > > Please tell me, am I doing it right?
> > : > >
> > : > > {code}
> > : > > InputStream is = new BufferedInputStream(new FileInputStream(file));
> > : > > BufferedReader bufr = new BufferedReader(new InputStreamReader(is));
> > : > > String inputLine;
> > : > >
> > : > > while ((inputLine = bufr.readLine()) != null) {
> > : > >     // one document per line, so a hit can point at the exact line
> > : > >     Document doc = new Document();
> > : > >     doc.add(new Field("contents", inputLine, Field.Store.YES,
> > : > >         Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
> > : > >     doc.add(new Field("title", section, Field.Store.YES,
> > : > >         Field.Index.NOT_ANALYZED));
> > : > >     doc.add(new Field("fieldsort", rem, Field.Store.YES,
> > : > >         Field.Index.ANALYZED));
> > : > >     doc.add(new Field("fieldsort2",
> > : > >         rem.toLowerCase().replaceAll("-", "").replaceAll(" ", ""),
> > : > >         Field.Store.YES, Field.Index.ANALYZED));
> > : > >     doc.add(new Field("field1", Author, Field.Store.YES,
> > : > >         Field.Index.NOT_ANALYZED));
> > : > >     doc.add(new Field("field2", Book, Field.Store.YES,
> > : > >         Field.Index.NOT_ANALYZED));
> > : > >     doc.add(new Field("field3", sec, Field.Store.YES,
> > : > >         Field.Index.NOT_ANALYZED));
> > : > >     writer.addDocument(doc);
> > : > > }
> > : > > bufr.close();
> > : > > {/code}
> > : > >
> > : > > On Thu, Apr 20, 2017 at 5:57 PM, Adrien Grand <jpou...@gmail.com>
> > : > > wrote:
> > : > >
> > : > > > IndexSearcher.doc is the right way to retrieve documents. If
> > : > > > this is slowing things down for you, I wonder whether you might
> > : > > > be fetching too many results?
> > : > > >
> > : > > > On Thu, Apr 20, 2017 at 14:16, neeraj shah
> > : > > > <neerajsha...@gmail.com> wrote:
> > : > > >
> > : > > > > Hello Everyone,
> > : > > > >
> > : > > > > I am using Lucene 3.6. I have to index around 60k documents.
> > : > > > > After performing the search, when I try to retrieve documents
> > : > > > > from the searcher using searcher.doc(docid), it slows down the
> > : > > > > search. Please, is there any other way to get the documents?
> > : > > > >
> > : > > > > Also, could anyone give me an end-to-end example of working
> > : > > > > with FieldCache? While implementing the cache I have:
> > : > > > >
> > : > > > > int[] fieldIds = FieldCache.DEFAULT.getInts(indexMultiReader, "id");
> > : > > > >
> > : > > > > Now I don't know how to use the fieldIds further to improve
> > : > > > > the search. Please give me an end-to-end example.
> > : > > > >
> > : > > > > Thanks
> > : > > > > Neeraj
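Regarding the FieldCache question buried at the bottom of the thread: the usual way to visit a very large result set without paying the per-hit searcher.doc() cost is a custom Collector that reads the field from the FieldCache instead of from stored fields. Here is a rough, untested sketch against the 3.x API; the "title" field name comes from your indexing code, everything else is an assumption:

{code}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

// Collects the distinct chapter names of every matching document,
// without ever touching stored fields.
public class ChapterNameCollector extends Collector {
    private final Set<String> chapters = new HashSet<String>();
    private String[] titles; // "title" values for the current segment

    @Override
    public void setScorer(Scorer scorer) {
        // scores are not needed to collect chapter names
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        // loaded once per segment, then cached by Lucene
        titles = FieldCache.DEFAULT.getStrings(reader, "title");
    }

    @Override
    public void collect(int doc) {
        chapters.add(titles[doc]);
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true;
    }

    public Set<String> getChapters() {
        return chapters;
    }
}
{/code}

You would run it as searcher.search(query, collector) and read getChapters() afterwards. Two caveats: the field must be indexed un-tokenized (yours is), and the cache holds one value per document in RAM -- one more argument for upgrading to a current Lucene release, where doc values do this job properly.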
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
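P.S. For completeness, the chapter-level index I'm suggesting above could be built alongside your existing per-line loop: accumulate each file's lines into a per-chapter buffer keyed by chapter name, then flush one document per chapter. A sketch, with the same caveats as before (chapterWriter and the map are assumptions; field names mirror your current code):

{code}
import java.util.Map;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ChapterIndexer {

    // One document per chapter, holding the concatenated text of every
    // file that belongs to that chapter.
    public static void writeChapterDocs(IndexWriter chapterWriter,
                                        Map<String, StringBuilder> textByChapter)
            throws Exception {
        for (Map.Entry<String, StringBuilder> e : textByChapter.entrySet()) {
            Document doc = new Document();
            doc.add(new Field("title", e.getKey(),
                              Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("contents", e.getValue().toString(),
                              Field.Store.NO, Field.Index.ANALYZED));
            chapterWriter.addDocument(doc);
        }
    }
}
{/code}

Populating textByChapter is one extra line inside your existing while loop -- something like textByChapter.get(section).append(inputLine).append('\n'), creating the StringBuilder the first time a chapter is seen.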