Re: Twitter analyser

Stephane Nicoll Tue, 05 Nov 2013 05:35:17 -0800

Hi,

Thanks for the reply. It's an index of tweets, so really any word is a
target for this, which would mean a significant increase in index size. My
volumes are really small, so that shouldn't be a problem (though
performance/scalability is still a concern).
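
For reference, the index-side approach (emit each mention or hashtag both
with and without its prefix) could look roughly like the sketch below. It
is plain Java with no Lucene dependency; TweetTokens and indexTokens are
made-up names for illustration, and in a real analysis chain this logic
would presumably live in a custom TokenFilter rather than a helper like
this one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: mimics whitespace tokenization
// and duplicates @mentions and #hashtags without their prefix.
final class TweetTokens {

    static List<String> indexTokens(String tweet) {
        List<String> tokens = new ArrayList<>();
        for (String token : tweet.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields a single empty string
            }
            tokens.add(token); // always keep the token as written
            char first = token.charAt(0);
            if (token.length() > 1 && (first == '@' || first == '#')) {
                // Also index the bare word so a query on "foo" finds
                // "@foo" and "#foo".
                tokens.add(token.substring(1));
            }
        }
        return tokens;
    }
}
```

With tokens indexed this way, a search on foo hits the plain word, the
mention and the hashtag, while @foo and #foo only hit their own form. Only
prefixed tokens are duplicated, so the extra index size is bounded by the
number of mentions and hashtags.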


I have control over the query. Another solution would be to translate a
query on "foo" into "foo or #foo or @foo".
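
A minimal sketch of that rewrite, again in plain Java with no Lucene
dependency. QueryExpander and expand are made-up names, and it assumes the
index keeps the WhitespaceAnalyzer tokens unchanged. The operator is
written in upper case because that is what Lucene's classic query parser
expects.

```java
// Hypothetical helper, not part of Lucene: rewrites a single search term
// at query time instead of duplicating tokens at index time.
final class QueryExpander {

    static String expand(String term) {
        // A term that already carries a prefix must match only that form.
        if (term.startsWith("@") || term.startsWith("#")) {
            return term;
        }
        // A bare word matches the word itself, the hashtag or the mention.
        return term + " OR #" + term + " OR @" + term;
    }
}
```

The same thing could presumably be built programmatically as a
BooleanQuery holding three TermQuery clauses with Occur.SHOULD, which
avoids going through the query parser at all.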

WDYT?

Thanks!
S.




On Tue, Nov 5, 2013 at 2:17 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> If your universe of items you want to match this way is small,
> consider something akin to synonyms. Your indexing process
> emits two tokens, with and without the @ or # which should
> cover your situation.
>
> FWIW,
> Erick
>
>
> On Tue, Nov 5, 2013 at 2:40 AM, Stéphane Nicoll
> <stephane.nic...@gmail.com> wrote:
>
> > Hi,
> >
> > I am building an application that indexes tweets and offers some basic
> > search facilities on them.
> >
> > I am trying to find a combination where the following would work:
> >
> > * foo matches the plain word foo, the mention (@foo) or the hashtag (#foo)
> > * @foo matches only the mention
> > * #foo matches only the hashtag
> >
> > It should match complete words only, so I used the WhitespaceAnalyzer
> > for indexing.
> >
> > Any recommendation for this use case?
> >
> > Thanks !
> > S.
> >
>
