Ian,

Thanks for your reply. I am facing the same problem even if I use WhitespaceTokenizer. My analyzer works perfectly in Lucene 3.6.

Thanks and Regards
Vignesh Srinivasan

On Thu, Oct 3, 2013 at 3:23 PM, Ian Lea <ian....@gmail.com> wrote:
> Certainly sounds like a bug in your analyzer. You could start a new
> thread if you need help with that. But from your previous email it
> sounds like you could use WhitespaceTokenizer chained with
> LowerCaseFilter.
>
>
> --
> Ian.
>
>
> On Thu, Oct 3, 2013 at 7:16 AM, VIGNESH S <vigneshkln...@gmail.com> wrote:
> > Hi,
> >
> > In my analyzer, the problem actually occurs for words which are preceded
> > by punctuation marks.
> >
> > For example:
> > If I am indexing the content ",Andrey Gubarev,JingGoogle,Inc."
> >
> > and I search "Andrew Gubarev", it is not working properly since the word
> > Andrew is preceded by the punctuation ",".
> >
> >
> > On Thu, Oct 3, 2013 at 11:23 AM, VIGNESH S <vigneshkln...@gmail.com> wrote:
> >
> >> Hi Ian,
> >>
> >> In Lucene, is there any default Analyzer we can use which ignores only
> >> spaces? All other numbers, punctuation, dates, everything, it should
> >> preserve.
> >>
> >> I created my analyzer with a tokenizer which returns
> >> Character.isDefined(cn) && (!Character.isWhitespace(cn)).
> >> My analyzer uses a lowercase filter on top of the tokenizer. This works
> >> perfectly in 3.6.
> >> In 4.3 it is creating problems in the offsets of tokens.
> >>
> >>
> >> On Mon, Sep 30, 2013 at 8:21 PM, Ian Lea <ian....@gmail.com> wrote:
> >>
> >>> Whenever someone says they are using a custom analyzer that has to be
> >>> a suspect. Does it work if you use one of the core lucene analyzers
> >>> instead? Have you used Luke to verify that the index holds what you
> >>> think it does?
> >>>
> >>>
> >>> --
> >>> Ian.
> >>>
> >>>
> >>> On Mon, Sep 30, 2013 at 3:21 PM, VIGNESH S <vigneshkln...@gmail.com>
> >>> wrote:
> >>> > Hi,
> >>> >
> >>> > It is not a problem with case, because I am using LowerCaseFilter.
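A minimal sketch of why the punctuation example fails, assuming the custom tokenizer behaves exactly as described (a character belongs to a token when Character.isDefined(c) && !Character.isWhitespace(c)) and that lowercasing is the only filter. This is plain JDK code, not a Lucene Tokenizer, and the class name WhitespaceOnlyTokenizerModel is hypothetical. Lucene's WhitespaceTokenizer also splits only on whitespace, so it would be expected to produce the same tokens, which is consistent with the same problem appearing with it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical JDK-only model of the tokenizer described in the thread:
// a code point is part of a token when Character.isDefined(c) && !Character.isWhitespace(c);
// each finished token is lowercased, standing in for LowerCaseFilter.
class WhitespaceOnlyTokenizerModel {

    static boolean isTokenChar(int codePoint) {
        return Character.isDefined(codePoint) && !Character.isWhitespace(codePoint);
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isTokenChar(cp)) {
                current.appendCodePoint(cp); // punctuation is kept inside the token
            } else if (current.length() > 0) {
                // a non-token code point (whitespace or undefined) ends the token
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [,andrey, gubarev,jinggoogle,inc.]
        System.out.println(tokenize(",Andrey Gubarev,JingGoogle,Inc."));
    }
}
```

The indexed terms come out as ",andrey" and "gubarev,jinggoogle,inc.", so a phrase built from the terms "andrey" and "gubarev" has nothing to match. Presumably the analyzer would also need to split on (or strip) punctuation for that query to hit.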
> >>> >
> >>> > My analyzer is a custom analyzer which ignores just white spaces. All
> >>> > other numbers, dates and other special characters it will consider. The
> >>> > same analyzer works for Lucene 3.6.
> >>> >
> >>> >
> >>> > When I do a single term query for "Geoffrey" it is giving hits, but when
> >>> > it is given as part of a multiphrase query, it is not able to find it.
> >>> > When the code below is executed with, say, word = "Geoffrey", it does not
> >>> > find the word itself:
> >>> >
> >>> > if (TermsEnum.SeekStatus.FOUND == trm.seekCeil(new BytesRef(word))) {
> >>> >     do {
> >>> >         String s = trm.term().utf8ToString();
> >>> >         if (s.equals(word)) {
> >>> >             termsWithPrefix.add(new Term("content", s));
> >>> >         } else {
> >>> >             break;
> >>> >         }
> >>> >     } while (trm.next() != null);
> >>> > }
> >>> >
> >>> >
> >>> > On Mon, Sep 30, 2013 at 3:01 PM, Ian Lea <ian....@gmail.com> wrote:
> >>> >
> >>> >> Whenever someone says something along the lines of a search for
> >>> >> "geoffrey" not matching "Geoffrey" the case difference springs out.
> >>> >> Can't recall what if anything you said about the analysis side of
> >>> >> things but that could be the cause. See
> >>> >>
> >>> >> http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2F_incorrect_hits.3F
> >>> >>
> >>> >> If on the other hand the problem is more obscure, and only related to
> >>> >> the multi phrase stuff, I suggest you build a tiny but complete
> >>> >> RAMDirectory based program or test case that shows the problem and
> >>> >> post it here.
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Ian.
> >>> >>
> >>> >>
> >>> >> On Mon, Sep 30, 2013 at 6:46 AM, VIGNESH S <vigneshkln...@gmail.com>
> >>> >> wrote:
> >>> >> > Hi,
> >>> >> >
> >>> >> > Thanks for your reply. The problem I face is that the text
> >>> >> > "Geoffrey Romer" is in my field.
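One likely reading of the seekCeil symptom, following Ian's case hint (not confirmed in the thread): the LowerCaseFilter lowercases terms at index time, but the seek above uses the raw word "Geoffrey". Term dictionaries are sorted by UTF-8 bytes, where uppercase ASCII letters sort before every lowercase one, so seekCeil on "Geoffrey" cannot return FOUND and instead positions on the first term that sorts after it, which matches the "different word" report. The sketch below models the term dictionary as a TreeSet (an assumption for illustration: ceiling() stands in for seekCeil, and String order agrees with UTF-8 byte order for ASCII terms like these). SeekCeilCaseModel and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeSet;

// JDK-only model: a TreeSet of already-lowercased terms plays the role of the
// field's sorted term dictionary.
class SeekCeilCaseModel {

    // Smallest term >= target, or null past the end (SeekStatus.END in Lucene).
    static String seekCeil(TreeSet<String> dict, String target) {
        return dict.ceiling(target);
    }

    // Normalize the word the same way the analyzer did at index time,
    // then treat the lookup as a hit only on an exact match.
    static boolean containsTerm(TreeSet<String> dict, String word) {
        String normalized = word.toLowerCase(Locale.ROOT);
        return normalized.equals(seekCeil(dict, normalized));
    }

    public static void main(String[] args) {
        // Hypothetical indexed terms, lowercased by the analyzer.
        TreeSet<String> dict = new TreeSet<String>(
                Arrays.asList("andrey", "geoffrey", "gubarev", "romer"));
        System.out.println(seekCeil(dict, "Geoffrey"));     // andrey
        System.out.println(containsTerm(dict, "Geoffrey")); // true
    }
}
```

Carried back to the quoted loop, the same idea would mean building the BytesRef passed to seekCeil from the lowercased word rather than from the raw user input.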
> >>> >> >
> >>> >> > I am forming a MultiPhraseQuery object properly, like "Geoffrey
> >>> >> > Romer", but when I do a search, it is not returning hits. The problem
> >>> >> > is not with all phrases; it happens with only a few phrases.
> >>> >> >
> >>> >> > When I do a single query like Geoffrey it is giving a hit, but when I
> >>> >> > do it in a MultiPhraseQuery it is not able to find "geoffrey". I
> >>> >> > confirmed this by doing trm.seekCeil(new BytesRef("Geoffrey")); when I
> >>> >> > then do String s = trm.term().utf8ToString(), it is pointing to a
> >>> >> > different word instead of geoffrey. seekCeil is working properly for
> >>> >> > many phrases though.
> >>> >> >
> >>> >> > What could be the problem? Please kindly suggest.
> >>> >> >
> >>> >> >
> >>> >> > On Fri, Sep 27, 2013 at 6:58 PM, Allison, Timothy B. <talli...@mitre.org> wrote:
> >>> >> >
> >>> >> >> 1) An alternate method to your original question would be to do
> >>> >> >> something like this (I haven't compiled or tested this!):
> >>> >> >>
> >>> >> >> PrefixQuery pq = new PrefixQuery(new Term("field", "app"));
> >>> >> >> // Scoring-boolean rewrite expands the prefix into TermQuery clauses so
> >>> >> >> // extractTerms() can see them (subject to BooleanQuery's max clause
> >>> >> >> // count); the default constant-score rewrite may hide the terms.
> >>> >> >> pq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
> >>> >> >> Query q = pq.rewrite(indexReader);
> >>> >> >> Set<Term> terms = new HashSet<Term>();
> >>> >> >> q.extractTerms(terms);
> >>> >> >> Term[] arr = terms.toArray(new Term[terms.size()]);
> >>> >> >> MultiPhraseQuery mpq = new MultiPhraseQuery();
> >>> >> >> mpq.add(new Term("field", "microsoft"));
> >>> >> >> mpq.add(arr);
> >>> >> >>
> >>> >> >>
> >>> >> >> 2) At a higher level, do you need to generate your query programmatically?
> >>> >> >> Here are three parsers that could handle this:
> >>> >> >> a) ComplexPhraseQueryParser
> >>> >> >> b) SurroundQueryParser: oal.queryparser.surround.parser.QueryParser
> >>> >> >> c) experimental: <self_promotion degree="shameless">
> >>> >> >> http://issues.apache.org/jira/browse/LUCENE-5205 </self_promotion>
> >>> >> >>
> >>> >> >>
> >>> >> >> -----Original Message-----
> >>> >> >> From: VIGNESH S [mailto:vigneshkln...@gmail.com]
> >>> >> >> Sent: Friday, September 27, 2013 3:33 AM
> >>> >> >> To: java-user@lucene.apache.org
> >>> >> >> Subject: Re: Multiphrase Query in Lucene 4.3
> >>> >> >>
> >>> >> >> Hi,
> >>> >> >>
> >>> >> >> The phrase I am giving is "Romer Geoffrey". The word is in the field.
> >>> >> >>
> >>> >> >> I call trm.seekCeil(new BytesRef("Geoffrey")), and then when I do
> >>> >> >> String s = trm.term().utf8ToString(); it is giving a different word.
> >>> >> >> I think this is why my MultiPhraseQuery is not giving the desired
> >>> >> >> results.
> >>> >> >>
> >>> >> >> What may be the reason?
> >>> >> >>
> >>> >> >>
> >>> >> >> On Fri, Sep 27, 2013 at 11:49 AM, VIGNESH S <vigneshkln...@gmail.com>
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >> > Hi Ian,
> >>> >> >> >
> >>> >> >> > Thanks for your reply.
> >>> >> >> >
> >>> >> >> > I am doing something similar to this. In the MultiPhraseQuery object
> >>> >> >> > the actual phrase is formed properly, but it is not returning any hits.
> >>> >> >> >
> >>> >> >> > In Lucene 3.6, I implemented the same logic and it is working.
> >>> >> >> >
> >>> >> >> > In Lucene 4.3, I implemented the index for that using
> >>> >> >> >
> >>> >> >> > FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
> >>> >> >> > offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> >>> >> >> >
> >>> >> >> > For MultiPhraseQuery, do I need to add any other parameter in
> >>> >> >> > addition to this while indexing?
> >>> >> >> >
> >>> >> >> > Is there any MultiPhraseQueryTest java file for Lucene 4.3? I checked
> >>> >> >> > in the Lucene branch and I was not able to find one. Please kindly help.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > On Thu, Sep 26, 2013 at 2:55 PM, Ian Lea <ian....@gmail.com> wrote:
> >>> >> >> >
> >>> >> >> >> I use the code below to do something like this. Not exactly what you
> >>> >> >> >> want but should be easy to adapt.
> >>> >> >> >>
> >>> >> >> >>
> >>> >> >> >> public List<String> findTerms(IndexReader _reader,
> >>> >> >> >>                               String _field) throws IOException {
> >>> >> >> >>     List<String> l = new ArrayList<String>();
> >>> >> >> >>     Fields ff = MultiFields.getFields(_reader);
> >>> >> >> >>     Terms trms = ff.terms(_field);
> >>> >> >> >>     TermsEnum te = trms.iterator(null);
> >>> >> >> >>     BytesRef br;
> >>> >> >> >>     while ((br = te.next()) != null) {
> >>> >> >> >>         l.add(br.utf8ToString());
> >>> >> >> >>     }
> >>> >> >> >>     return l;
> >>> >> >> >> }
> >>> >> >> >>
> >>> >> >> >> --
> >>> >> >> >> Ian.
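Ian's loop walks every term of the field. A minimal sketch of adapting it to a prefix (the job IndexReader.terms(Term) did in 3.x), again using a TreeSet as a stand-in for the sorted term dictionary: start at the first term >= the prefix, then step forward until a term no longer starts with it. With a Lucene 4.x TermsEnum this would correspond to seekCeil(new BytesRef(prefix)) followed by next(); for a prefix scan a NOT_FOUND result is a normal starting point (the enum sits on the first larger term), and only END means there is nothing to collect. PrefixTermsModel and termsWithPrefix are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// JDK-only model: the TreeSet plays the role of one field's sorted terms.
class PrefixTermsModel {

    static List<String> termsWithPrefix(TreeSet<String> dict, String prefix) {
        List<String> result = new ArrayList<String>();
        // tailSet(prefix, true) begins at the first term >= prefix, like seekCeil
        for (String term : dict.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms sharing a prefix are contiguous, so stop at the first miss
            }
            result.add(term);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> dict = new TreeSet<String>(Arrays.asList(
                "app", "apple", "application", "apricot", "bat", "microsoft"));
        System.out.println(termsWithPrefix(dict, "app")); // [app, apple, application]
    }
}
```

Because matching terms sit next to each other in sorted order, the scan stops at the first non-matching term instead of walking the whole dictionary as findTerms does.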
> >>> >> >> >>
> >>> >> >> >> On Wed, Sep 25, 2013 at 3:04 PM, VIGNESH S <vigneshkln...@gmail.com>
> >>> >> >> >> wrote:
> >>> >> >> >> > Hi,
> >>> >> >> >> >
> >>> >> >> >> > In the example for MultiPhraseQuery it is mentioned:
> >>> >> >> >> >
> >>> >> >> >> > "To use this class, to search for the phrase "Microsoft app*" first use
> >>> >> >> >> > add(Term) on the term "Microsoft", then find all terms that have "app" as
> >>> >> >> >> > prefix using IndexReader.terms(Term), and use MultiPhraseQuery.add(Term[]
> >>> >> >> >> > terms) to add them to the query"
> >>> >> >> >> >
> >>> >> >> >> > How can I replicate the same in Lucene 4.3, since
> >>> >> >> >> > IndexReader.terms(Term) is no longer available?
> >>> >> >> >>
> >>> >> >> >> ---------------------------------------------------------------------
> >>> >> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>> >> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
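To make the quoted javadoc concrete, here is a simplified model of what a MultiPhraseQuery checks inside one document with zero slop: each phrase position accepts any one of its alternative terms, and the positions must be consecutive in the token stream. This is not Lucene's implementation (no postings, scoring or slop), MultiPhraseModel is a hypothetical name, and the second set is a made-up stand-in for the prefix expansion of "app".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// JDK-only model of MultiPhraseQuery matching within one document, slop 0.
class MultiPhraseModel {

    // True if some window of consecutive tokens matches the phrase, where
    // phrase.get(k) holds the alternatives allowed at phrase position k.
    static boolean matches(List<String> tokens, List<Set<String>> phrase) {
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            boolean ok = true;
            for (int k = 0; k < phrase.size() && ok; k++) {
                ok = phrase.get(k).contains(tokens.get(start + k));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "microsoft app*": one term at position 0, a hypothetical expansion at position 1.
        List<Set<String>> phrase = Arrays.<Set<String>>asList(
                new HashSet<String>(Arrays.asList("microsoft")),
                new HashSet<String>(Arrays.asList("app", "apple", "application")));
        System.out.println(matches(Arrays.asList("microsoft", "application", "suite"), phrase)); // true
        System.out.println(matches(Arrays.asList(",microsoft", "application"), phrase));         // false
    }
}
```

The second call also shows how the punctuation issue discussed in the thread would surface: if the analyzer indexed ",microsoft" rather than "microsoft", the first position has nothing to match.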