Sorry for cross-posting, but the tika-ml does not seem to be too "lively":
I am trying to make use of the ForkParser. Unfortunately I am getting "Lost
connection to a forked server process" for an (encrypted) PDF which I can
extract "in-process". Extracting the document "in-process" takes appro
Thanks Ian. Also, if I have a unigram in the query, and I want to make sure
I match only index entries that do not have more than 2 tokens, is there a
way to do that too?
Thanks
On Wed, Feb 18, 2015 at 2:23 AM, Ian Lea wrote:
Break the query into words then add them as TermQuery instances as
optional clauses to a BooleanQuery with a call to
setMinimumNumberShouldMatch(2) somewhere along the line. You may want
to do some parsing or analysis on the query terms to avoid problems of
case matching and the like.
--
Ian.
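
As a rough illustration of what "at least 2 optional clauses must match" means, here is a plain-Java sketch that does not use Lucene at all. It only mimics the semantics Ian describes: break both texts into lowercased words, then require a minimum number of shared words. The class name MinShouldMatchSketch, the helper names and the sample strings are all made up for this example, and the crude word splitting is only a stand-in for a real analyzer. In Lucene itself the same effect comes from the SHOULD clauses and setMinimumNumberShouldMatch(2) mentioned above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class MinShouldMatchSketch {

    // Lowercase the text and split on runs of non-letter, non-digit characters.
    // This is a simplified stand-in for an analyzer, not Lucene's behaviour.
    static Set<String> words(String text) {
        Set<String> out = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    // True when query and entry share at least minShouldMatch distinct words.
    static boolean matches(String query, String entry, int minShouldMatch) {
        Set<String> shared = words(query);
        shared.retainAll(words(entry));
        return shared.size() >= minShouldMatch;
    }

    public static void main(String[] args) {
        String query = "Where can I buy a Red Apple";      // hypothetical query
        System.out.println(matches(query, "red apple pie", 2)); // shares "red", "apple"
        System.out.println(matches(query, "green apple", 2));   // shares only "apple"
    }
}
```

Lowercasing both sides before comparing is the kind of analysis step that avoids the case-matching problems mentioned above.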
Hello,
I have a rather simple query. I have a list where I have terms like and
then my query is more natural language. I want to be able to retrieve
matches that have at least 2 words in common between the query and the index.
Can you guys suggest a Query Type and a field that I should be using?
-
Hi All,
I am trying to query records based on fields in both parent and child
documents. The query is not considering the field in the child document.
Below is the structure of my Solr record.
user
21
test
M
***@gmail.com
1492932293590777856
permanentAddress
21_172
sec-38
L
Ah, you want to do it the hard way. Sorry, can't help you there - I
prefer to do things the simple way - easier to write and to maintain
and, in my experience, usually more robust in the long run.
--
Ian.
On Tue, Feb 17, 2015 at 11:42 AM, Ravikumar Govindarajan
wrote:
Thanks Ian
What I am currently doing is duplicating the data into 2 different fields
and using my own PerFieldAnalyzerWrapper, just like you pointed out.
Is there a good way to do this in a single pass? Like how Bi-Grams or
Common-Grams do…
--
Ravi
On Tue, Feb 17, 2015 at 3:08 PM, Ian Lea wrote:
Sounds like a job for
org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.
--
Ian.
On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan
wrote:
We have a requirement that e-mail addresses be added in tokenized form
to one field, while the untokenized form is added to another field.
Ex:
"I have mailed a...@xyz.com". It should tokenize as below
body = {"I", "have", "mailed", "abc", "xyz", "com"};
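
To make the two outputs concrete, here is a plain-Java sketch that produces both forms from one input string. It is not a Lucene analyzer and not a single-pass TokenStream. The class name EmailFieldsSketch, the address regex, and the full address abc@xyz.com (the archive obscures it, so it is inferred from the tokens above) are assumptions made for this example. What the second field should contain is also an assumption, based only on the "untokenized form" wording.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFieldsSketch {

    // Simplified address pattern for illustration; real-world addresses are messier.
    private static final Pattern ADDRESS =
            Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // Tokenized form: split on anything that is not a letter or digit,
    // so an address breaks into its parts.
    static List<String> bodyTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Untokenized form: each address kept whole (assumed content of the second field).
    static List<String> wholeAddresses(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ADDRESS.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "I have mailed abc@xyz.com"; // address assumed, see note above
        System.out.println(bodyTokens(text));      // [I, have, mailed, abc, xyz, com]
        System.out.println(wholeAddresses(text));  // [abc@xyz.com]
    }
}
```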
I also have a body-addr field. To