Fuzzy Phrase

2010-09-26 Thread Fabiano Nunes
Is it possible to search for fuzzy phrase queries like -- "colorless~ green~ ideas~" -- ? I have had some success with ComplexPhraseQuery, but I can't use it to query two fields at the same time, i.e., -- head:"hello~ world"~3 AND contents:"colorless~ green~ ideas~" -- Thank you.
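
For readers unfamiliar with the term, the following is a minimal plain-Java sketch of what a "fuzzy phrase" match means: every term of the phrase must match the token at the corresponding consecutive position within a small edit distance. It is an illustration only and is not how Lucene implements fuzzy or phrase queries. The class name `FuzzyPhraseSketch`, the `maxEdits` parameter, and the use of a raw Levenshtein distance are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of fuzzy phrase matching; hypothetical, not Lucene code. */
final class FuzzyPhraseSketch {

    /** Classic Levenshtein edit distance, computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True if some run of consecutive tokens matches every phrase term within maxEdits (assumed threshold). */
    static boolean matches(List<String> tokens, List<String> phrase, int maxEdits) {
        for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
            boolean ok = true;
            for (int k = 0; k < phrase.size() && ok; k++) {
                ok = editDistance(tokens.get(start + k), phrase.get(k)) <= maxEdits;
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("colourless", "gren", "ideas", "sleep");
        List<String> phrase = Arrays.asList("colorless", "green", "ideas");
        System.out.println(matches(tokens, phrase, 1));
    }
}
```

The sketch ignores slop (the `~3` in the query above) and scoring; it only shows the per-term tolerance combined with positional adjacency.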

Re: Fuzzy Phrase

2010-09-26 Thread Fabiano Nunes
> -Original Message- > > From: falha...@gmail.com [mailto:falha...@gmail.com] On Behalf Of > > Fabiano Nunes > > Sent: Sunday, September 26, 2010 10:32 AM > > To: java-user@lucene.apache.org > > Subject: Fuzzy Phrase > > > > Is it possible to search fo

Re: Indexing is hung or doesn't complete

2010-10-13 Thread Fabiano Nunes
What version of PDFBox are you running? PDFBox 0.72 does not work properly with some PDF documents. See more in https://issues.apache.org/jira/browse/PDFBOX-361. So, I wrote an extractor (a copy of the original, in fact) based on trunk version (1.2.1, actually). Furthermore, this version is faster e

Custom token attributes and payload. XML analyzing.

2010-11-29 Thread Fabiano Nunes
Hello, I'm trying to store some token attributes found in an XML document. More specifically, token coordinates for future highlighting. Example: I have an XML with this structure: Lucene in Action 2nd Edition I want to store the @c attribute from word element (coordinates left,width,top,height) i
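
The XML markup in the snippet above did not survive archiving, so the following is only a sketch under stated assumptions: that each token is a `word` element whose `c` attribute holds four comma-separated integers in the order left,width,top,height. The sample element layout, the class `WordCoordinatesSketch`, and its `Word` holder are hypothetical. It uses only the JDK's DOM parser and shows the extraction step that would precede storing the coordinates as a payload.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads word elements and their @c coordinates from XML. */
final class WordCoordinatesSketch {

    /** One word plus its box, in the assumed order left,width,top,height. */
    static final class Word {
        final String text;
        final int left, width, top, height;

        Word(String text, int left, int width, int top, int height) {
            this.text = text;
            this.left = left;
            this.width = width;
            this.top = top;
            this.height = height;
        }
    }

    /** Parses every word element; wraps parser failures in an unchecked exception. */
    static List<Word> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("word");
            List<Word> words = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                String[] c = e.getAttribute("c").split(",");
                words.add(new Word(e.getTextContent().trim(),
                        Integer.parseInt(c[0].trim()), Integer.parseInt(c[1].trim()),
                        Integer.parseInt(c[2].trim()), Integer.parseInt(c[3].trim())));
            }
            return words;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse word coordinates", ex);
        }
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String xml = "<page><word c=\"10,42,5,12\">Lucene</word><word c=\"60,14,5,12\">in</word></page>";
        for (Word w : parse(xml)) {
            System.out.println(w.text + " @ " + w.left + "," + w.width + "," + w.top + "," + w.height);
        }
    }
}
```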

Retrieving payload attribute in highlighter

2010-11-30 Thread Fabiano Nunes
Hello, I'm trying to retrieve payloads from the terms highlighted by the Highlighter class. In my tests, every term returned by Highlighter has a null payload. Example: Highlighter h = new Highlighter(new Formatter() { public String highlightTerm(String originalText, TokenGroup tokenGroup) { Token

Re: Retrieving payload attribute in highlighter

2010-11-30 Thread Fabiano Nunes
ance issues? Thanks. On Tue, Nov 30, 2010 at 1:20 PM, Fabiano Nunes wrote: > Hello, > I'm trying to retrieve payloads from the highlighteds terms by Highlighter > class. In my tests, all terms returned from Highlighter has null as payload. > Example: > > Highlighter h =

Re: Retrieving payload attribute in highlighter

2010-11-30 Thread Fabiano Nunes
that something > similar > will be there > in the future, but you may have to recompile if you get new jars. > > Best > Erick > > On Tue, Nov 30, 2010 at 11:06 AM, Fabiano Nunes wrote: > > > I've figured out the PayloadSpanUtil class. It's exactly

Re: Retrieving payload attribute in highlighter

2010-12-01 Thread Fabiano Nunes
riginal input? > > Now, this is largely a guess, so don't waste time if I'm really off base > with > this. > > Best > Erick > > On Tue, Nov 30, 2010 at 2:16 PM, Fabiano Nunes wrote: > > > Ok. I'll go ahead. > > Just one more thing: the apidocs

PayloadSpanUtil and unstored fields.

2010-12-01 Thread Fabiano Nunes
PayloadSpanUtil can't retrieve payloads from unstored fields (Field.Store.NO). Since payloads are stored with the terms, why do I need to store the fields? Example: PayloadSpanUtil psu = new PayloadSpanUtil(ireader); Collection tests = psu.getPayloadsForQuery(query); Assert.assertTrue((tests.size() > 0)

Re: PayloadSpanUtil and unstored fields.

2010-12-01 Thread Fabiano Nunes
Sorry. I'm opening it again. On Wed, Dec 1, 2010 at 10:18 AM, Fabiano Nunes wrote: > Please, ignore this thread. > It's *my misunderstanding* of query.getSpans(). > > Thanks! > > On Wed, Dec 1, 2010 at 10:15 AM, Fabiano Nunes wrote: > >> PayloadSpanUtil

Re: PayloadSpanUtil and unstored fields.

2010-12-01 Thread Fabiano Nunes
Please, ignore this thread. It's *my misunderstanding* of query.getSpans(). Thanks! On Wed, Dec 1, 2010 at 10:15 AM, Fabiano Nunes wrote: > PayloadSpanUtil can't retrieve payloads from unstored fields ( > Field.Store.NO). Since the payloads is stored in terms, why do I need

Re: PDF text extracted without spaces

2010-12-03 Thread Fabiano Nunes
Have you tried an extraction tool other than PDFBox? I used to extract contents with PDFBox: its ability to extract content wasn't a problem, but its lack of structural information was. You can try poppler-utils (pdftotext) to extract contents with layout structure. Fabiano Nunes O
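
As a rough sketch of the pdftotext suggestion above, the tool can be driven from Java with `ProcessBuilder`. This assumes poppler-utils is installed and `pdftotext` is on the PATH; `-layout` is the pdftotext option that keeps the original physical layout. The class `PdfToTextSketch` and the file names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper around the pdftotext command-line tool. */
final class PdfToTextSketch {

    /** Builds the argument list: pdftotext -layout <input.pdf> <output.txt>. */
    static List<String> command(Path pdf, Path txt) {
        return Arrays.asList("pdftotext", "-layout", pdf.toString(), txt.toString());
    }

    /** Runs pdftotext and returns its exit code (0 means success). */
    static int extract(Path pdf, Path txt) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(pdf, txt)).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder file names; requires pdftotext to be installed.
        int exit = extract(Paths.get("input.pdf"), Paths.get("output.txt"));
        System.out.println("pdftotext exited with " + exit);
    }
}
```

Keeping the command construction separate from the process launch makes the argument list easy to inspect without having the tool installed.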

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Fabiano Nunes
[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)