Yes, I ended up doing essentially that. No need to tokenize: I basically split
the input string into a sequence of alternating "word" and "nonword" tokens
based on Character.isLetter() and then looked up the words.
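A minimal, self-contained sketch of that splitting approach (the class and method names are my own, not from the original code):

```java
import java.util.ArrayList;
import java.util.List;

// Splits the input into alternating runs of letter and non-letter characters,
// so the "word" runs can be looked up while the separators pass through unchanged.
public class WordSplitter {
    public static List<String> split(String input) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= input.length(); i++) {
            // Close the current run at end of input or when the letter/non-letter
            // classification changes.
            if (i == input.length()
                    || Character.isLetter(input.charAt(i)) != Character.isLetter(input.charAt(start))) {
                parts.add(input.substring(start, i));
                start = i;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // elements: "foo", ", ", "bar", "!"
        System.out.println(split("foo, bar!"));
    }
}
```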
Ilya
-----Original Message-----
From: Danil ŢORIN [mailto:torin...@gmail.com]
Hi!
I am trying to extend the "mahout lucene.vector" driver so that it can be
fed arbitrary key-value constraints on Solr schema fields (and generate
Mahout vectors for only a subset, which seems to be a regular use case).
So the best (easiest) way I see, is to create an IndexReader implemen
I have a situation where users create n keywords. I'm storing them as
individual DB fields for aggregating scores and then building the query from
the fields. Is it faster for Lucene to parse a query of terms that are OR'd
together, or to build it up as a loop of BooleanQuery marked a
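For the second option, a sketch of the loop against the Lucene 3.x API (the field and variable names are illustrative, not from the original post):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class KeywordQueries {
    // Builds the OR query programmatically: one SHOULD clause per keyword,
    // equivalent to parsing "kw1 OR kw2 OR ..." but without a QueryParser pass.
    public static BooleanQuery buildKeywordQuery(String field, Iterable<String> keywords) {
        BooleanQuery query = new BooleanQuery();
        for (String keyword : keywords) {
            query.add(new TermQuery(new Term(field, keyword)), BooleanClause.Occur.SHOULD);
        }
        return query;
    }
}
```

Skipping the parser also avoids any analyzer/escaping surprises, since the terms go into the query exactly as stored.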
Hard to believe this ever worked. KeywordAnalyzer '"Tokenizes" the
entire stream as a single token', i.e. there will only be one term. So if
your document contains "ba foo", it would only match a search on "ba foo",
not a search on "foo". Are you sure you should be using KeywordAnalyzer?
Not usually used on s
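A quick way to see this, sketched against the Lucene 3.x analysis API (the field name "f" is arbitrary):

```java
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class KeywordAnalyzerDemo {
    public static void main(String[] args) throws IOException {
        TokenStream ts = new KeywordAnalyzer().tokenStream("f", new StringReader("ba foo"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        // The loop body runs exactly once: the whole input is one token, "ba foo".
        while (ts.incrementToken()) {
            System.out.println("term: " + term);
        }
    }
}
```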
Dear Lucene-developers,
I switched to Lucene 3.5 a few weeks ago and suddenly sentences are
not correctly indexed anymore. Basically, fields can be queried correctly if
they contain one term, but if there are multiple terms the analyzer fails (I
use the latest Luke for testing).
So my quer
Great, thanks!
On Mon, Jan 16, 2012 at 5:56 AM, findbestopensource <
findbestopensou...@gmail.com> wrote:
> Check out the presentation.
> http://java.dzone.com/videos/archive-it-scaling-beyond
>
> Web archive uses Lucene to index billions of pages.
>
> Regards
> Aditya
> www.findbestopensource.com
Welcome to the list.
This is hard with no quick and easy answers. For a similar index, but
books rather than music, I index author and title separately into 2
fields, author and title combined into another field, author and title
and blurb and whatever all combined into yet another field. Each
s
Or you may simply store the field as is, but index it in whatever way you
like (replacing some tokens with others, or maybe indexing both words with
position increment = 0).
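A sketch of the position-increment-0 trick as a custom TokenFilter for Lucene 3.x (the class name and the replacement map are my own illustration):

```java
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Emits each token, and when a replacement is known for it, also emits the
// replacement at the same position (increment 0) so both words are searchable.
public final class BothWordsFilter extends TokenFilter {
    private final Map<String, String> replacements;
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posAtt = addAttribute(PositionIncrementAttribute.class);
    private String pending;

    public BothWordsFilter(TokenStream in, Map<String, String> replacements) {
        super(in);
        this.replacements = replacements;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (pending != null) {
            // Emit the replacement stacked on top of the previous token.
            termAtt.setEmpty().append(pending);
            posAtt.setPositionIncrement(0);
            pending = null;
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        pending = replacements.get(termAtt.toString());
        return true;
    }
}
```

A production version would also capture and restore the full attribute state (offsets etc.) for the injected token; this sketch only sets the term and position increment.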
On Mon, Jan 16, 2012 at 13:23, Dmytro Barabash wrote:
> I think you need to index this field with
> org.apache.lucene.document
Some values in the norm/boost area are stored encoded with some loss
of precision. Details in the javadocs somewhere. What values do you
get when you change the boost?
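The encoding is modeled on Lucene's SmallFloat (a single byte with a 3-bit mantissa and 5-bit exponent); a self-contained sketch of that scheme shows the loss, with nearby boosts collapsing to the same byte:

```java
// Norm/boost values are packed into one byte with a 3-bit mantissa and a
// 5-bit exponent (as in Lucene's SmallFloat), so nearby values collapse.
public class NormPrecision {
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;  // underflow: zero or smallest positive
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;                                  // overflow: largest representable
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    public static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // 1.0f and 1.1f encode to the same byte, so a 1.1 boost decodes as 1.0.
        System.out.println(byte315ToFloat(floatToByte315(1.1f)));
    }
}
```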
--
Ian.
2012/1/14 ltomuno :
> the following message comes from Explanation explain
> 0.09375 = (MATCH) fieldWeight(name:85
I think you need to index this field with
org.apache.lucene.document.Field.TermVector != NO
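In the Lucene 3.x Field API that looks like this (the field name and store/index choices are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class TermVectorExample {
    public static Document makeDoc(String text) {
        Document doc = new Document();
        // Any TermVector value other than NO stores per-document term vectors,
        // which e.g. IndexReader.getTermFreqVector() needs at search time.
        doc.add(new Field("content", text, Field.Store.YES, Field.Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        return doc;
    }
}
```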
Check out the presentation.
http://java.dzone.com/videos/archive-it-scaling-beyond
Web archive uses Lucene to index billions of pages.
Regards
Aditya
www.findbestopensource.com
On Fri, Jan 13, 2012 at 4:31 PM, Peter K wrote:
> yes and no!
> Google is not only a search engine ...
>
> > Just c
Maybe you could simply use String.replace()?
Or does the text actually need to be tokenized?
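For the simple case, it's a one-liner with no analysis chain involved (the sample strings are my own):

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        // replace() substitutes every occurrence of the literal substring;
        // no tokenization, analyzers, or index required.
        String translated = "the colour of the colour wheel".replace("colour", "color");
        System.out.println(translated);  // the color of the color wheel
    }
}
```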
On Fri, Jan 13, 2012 at 18:44, Ilya Zavorin wrote:
> I am trying to perform a "translation" of sorts of a stream of text. More
> specifically, I need to tokenize the input stream, look up every term in a
> s