Re: Fast way to get the start of document

2012-06-25 Thread Mike Sokolov
l_text" field and only read _the_start_ of it? Otherwise, I'm thinking I'll go with an extra 1st page field for the too-huge documents. -Paul
-Original Message-
From: Mike Sokolov [mailto:soko...@ifactory.com]
Sent: Saturday, June 23, 2012 7:16 PM
To: java-user@lucene.ap

RE: Fast way to get the start of document

2012-06-25 Thread Paul Hill
...extra 1st page field for the too-huge documents. -Paul
> -Original Message-
> From: Mike Sokolov [mailto:soko...@ifactory.com]
> Sent: Saturday, June 23, 2012 7:16 PM
> To: java-user@lucene.apache.org
> Cc: Jack Krupansky
> Subject: Re: Fast way to get the start of document

Re: Fast way to get the start of document

2012-06-23 Thread Mike Sokolov
I got the sense from Paul's post that he wanted a solution that didn't require changing his index, although I'm not sure there is one. Paul, if you're willing to re-index, you could also store the length of the text as a numeric field, retrieve that and use it to drive the decision about whethe

Re: Fast way to get the start of document

2012-06-23 Thread Jack Krupansky
Simply have two fields, "full_body" and "limited_body". The former would index but not store the full document text from Tika (the "content" metadata). The latter would store but not necessarily index the first 10K or so characters of the full text. Do searches on the full body field and highli
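
For what it's worth, here is a rough sketch of how the value for the "limited_body" field might be derived from the full text at index time. It is plain Java with no Lucene calls, since the field-creation API differs between Lucene versions. The class name BodyFields, the method limitedBody and the constant DEFAULT_LIMIT are made up for illustration, and the 10,000 cap is only a reading of the "first 10K or so characters" above, not a recommended setting.

```java
public final class BodyFields {
    /** Hypothetical cap, read off the "first 10K or so characters" suggestion. */
    public static final int DEFAULT_LIMIT = 10_000;

    private BodyFields() {}

    /**
     * Returns the leading part of fullText, at most maxChars UTF-16 chars long.
     * The cut is moved back by one char if it would split a surrogate pair,
     * so the stored value never ends in half of a supplementary character.
     */
    public static String limitedBody(String fullText, int maxChars) {
        if (fullText == null || maxChars <= 0) {
            return "";
        }
        if (fullText.length() <= maxChars) {
            return fullText; // short documents are stored whole
        }
        int end = maxChars;
        if (Character.isHighSurrogate(fullText.charAt(end - 1))
                && Character.isLowSurrogate(fullText.charAt(end))) {
            end--; // do not cut between the two halves of a pair
        }
        return fullText.substring(0, end);
    }

    public static void main(String[] args) {
        String text = "The quick brown fox";
        System.out.println(limitedBody(text, 9)); // prints "The quick"
    }
}
```

The result would be what goes into the stored "limited_body" field, while the untruncated text goes into the indexed, unstored "full_body" field.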