Is it possible to run fuzzy phrase queries such as -- "colorless~ green~
ideas~" -- ?
I have had some success with ComplexPhraseQuery, but I can't use it to
query two fields at the same time, e.g., -- head:"hello~ world"~3 AND
contents:"colorless~ green~ ideas~" --
Thank you.
> -Original Message-
> > From: falha...@gmail.com [mailto:falha...@gmail.com] On Behalf Of
> > Fabiano Nunes
> > Sent: Sunday, September 26, 2010 10:32 AM
> > To: java-user@lucene.apache.org
> > Subject: Fuzzy Phrase
> >
> > Is it possible to search fo
What version of PDFBox are you running?
PDFBox 0.72 does not work properly with some PDF documents. For details, see
https://issues.apache.org/jira/browse/PDFBOX-361.
So I wrote an extractor (a copy of the original, in fact) based on the trunk
version (1.2.1, actually). Furthermore, this version is faster e
Hello,
I'm trying to store some token attributes found in an XML document.
More specifically, token coordinates for future highlighting.
Example: I have an XML document with this structure:
Lucene
in
Action
2nd
Edition
I want to store the @c attribute of each word element (coordinates
left,width,top,height) i
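A minimal sketch of reading such coordinates with the JDK's DOM parser, before any indexing step. Only the word element, the @c attribute and the left,width,top,height order come from the message above; the <page> root element, the integer format of the values and the class and method names are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects each <word> with its @c coordinates.
class WordCoordinates {

    /** One token plus its box, in the order left,width,top,height. */
    static final class Word {
        final String text;
        final int left, width, top, height;

        Word(String text, int left, int width, int top, int height) {
            this.text = text;
            this.left = left;
            this.width = width;
            this.top = top;
            this.height = height;
        }
    }

    static List<Word> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("invalid XML", ex);
        }
        NodeList nodes = doc.getElementsByTagName("word");
        List<Word> words = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            // Assumption: @c holds four comma-separated integers.
            String[] c = e.getAttribute("c").split(",");
            if (c.length != 4) {
                throw new IllegalArgumentException("expected 4 values in @c");
            }
            words.add(new Word(e.getTextContent().trim(),
                    Integer.parseInt(c[0].trim()), Integer.parseInt(c[1].trim()),
                    Integer.parseInt(c[2].trim()), Integer.parseInt(c[3].trim())));
        }
        return words;
    }
}
```

The parsed values could then be attached to each token at analysis time, for instance as a payload, which is the direction the later messages in this list take.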
Hello,
I'm trying to retrieve payloads from the terms highlighted by the Highlighter
class. In my tests, all terms returned from Highlighter have null as payload.
Example:
Highlighter h = new Highlighter(new Formatter() {
public String highlightTerm(String originalText, TokenGroup tokenGroup) {
Token
ance issues?
Thanks.
On Tue, Nov 30, 2010 at 1:20 PM, Fabiano Nunes wrote:
> Hello,
> I'm trying to retrieve payloads from the terms highlighted by the Highlighter
> class. In my tests, all terms returned from Highlighter have null as payload.
> Example:
>
> Highlighter h =
that something
> similar will be there in the future, but you may have to recompile if you get
> new jars.
>
> Best
> Erick
>
> On Tue, Nov 30, 2010 at 11:06 AM, Fabiano Nunes wrote:
>
> > I've figured out the PayloadSpanUtil class. It's exactly
riginal input?
>
> Now, this is largely a guess, so don't waste time if I'm really off base
> with this.
>
> Best
> Erick
>
> On Tue, Nov 30, 2010 at 2:16 PM, Fabiano Nunes wrote:
>
> > Ok. I'll go ahead.
> > Just one more thing: the apidocs
PayloadSpanUtil can't retrieve payloads from unstored fields (Field.Store.NO).
Since the payloads are stored in the terms, why do I need to store the fields?
Example:
PayloadSpanUtil psu = new PayloadSpanUtil(ireader);
Collection tests = psu.getPayloadsForQuery(query);
Assert.assertTrue((tests.size() > 0)
Sorry. I'm opening it again.
On Wed, Dec 1, 2010 at 10:18 AM, Fabiano Nunes wrote:
> Please, ignore this thread.
> It's *my misunderstanding* of query.getSpans().
>
> Thanks!
>
> On Wed, Dec 1, 2010 at 10:15 AM, Fabiano Nunes wrote:
>
>> PayloadSpanUtil
Please, ignore this thread.
It's *my misunderstanding* of query.getSpans().
Thanks!
On Wed, Dec 1, 2010 at 10:15 AM, Fabiano Nunes wrote:
> PayloadSpanUtil can't retrieve payloads from unstored fields (
> Field.Store.NO). Since the payloads are stored in the terms, why do I need
>
Have you ever tried an extraction tool other than PDFBox? I used to extract
content with PDFBox: its ability to extract content wasn't a problem,
but its lack of structural information was.
You can try poppler-utils (pdftotext) to extract content with the
layout structure preserved.
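As a rough sketch, pdftotext can be called from Java through ProcessBuilder. The -layout option is the pdftotext flag that keeps the physical layout of the page; the class name, method names and file paths are hypothetical, and the sketch assumes pdftotext is installed and on the PATH.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the pdftotext command-line tool.
class PdfToTextLayout {

    /** Builds the command line: pdftotext -layout <pdf> <txt>. */
    static List<String> command(String pdfPath, String txtPath) {
        return Arrays.asList("pdftotext", "-layout", pdfPath, txtPath);
    }

    /** Runs pdftotext and returns its exit code (0 means success). */
    static int run(String pdfPath, String txtPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(pdfPath, txtPath))
                .inheritIO() // show pdftotext's own messages on the console
                .start();
        return process.waitFor();
    }
}
```

The resulting text file keeps columns and spacing roughly as they appear on the page, so it can be fed to the indexer in place of the PDFBox output.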
Fabiano Nunes
O
[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)