date:20150217

[tika] ForkParser, Lost connection to a forked server process

2015-02-17 Thread Clemens Wyss DEV

Sorry for cross-posting, but the tika-ml does not seem to be too "lively": I am trying to make use of the ForkParser. Unfortunately I am getting „Lost connection to a forked server process“ for an (encrypted) pdf which I can extract „in-process“. Extracting the document "in-process" takes appro

Re: Indexing Query

2015-02-17 Thread Deepak Gopalakrishnan

Thanks Ian. Also, if I have a unigram in the query, and I want to make sure I match only index entries that do not have more than 2 tokens, is there a way to do that too? Thanks On Wed, Feb 18, 2015 at 2:23 AM, Ian Lea wrote: > Break the query into words then add them as TermQuery instances as

Re: Indexing Query

2015-02-17 Thread Ian Lea

Break the query into words then add them as TermQuery instances as optional clauses to a BooleanQuery with a call to setMinimumNumberShouldMatch(2) somewhere along the line. You may want to do some parsing or analysis on the query terms to avoid problems of case matching and the like. -- Ian.
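
The matching rule Ian describes (optional TermQuery clauses in a BooleanQuery plus setMinimumNumberShouldMatch(2)) can be illustrated without Lucene. The plain-Java sketch below assumes one optional clause per distinct query word and a simple lower-casing analysis step, echoing the advice about case matching; the class and method names are hypothetical and this is not Lucene code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class MinShouldMatchSketch {

    // Hypothetical analysis step: lower-case, then split on anything that is
    // not a letter or digit, dropping empty pieces.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // One optional ("SHOULD") clause per distinct query word; the indexed entry
    // matches when at least minShouldMatch of those clauses hit one of its terms.
    static boolean matches(String query, String indexedEntry, int minShouldMatch) {
        Set<String> clauses = new HashSet<>(analyze(query));
        Set<String> entryTerms = new HashSet<>(analyze(indexedEntry));
        int matched = 0;
        for (String term : clauses) {
            if (entryTerms.contains(term)) matched++;
        }
        return matched >= minShouldMatch;
    }
}
```

For example, the query "red running shoes" matches an entry "Running Shoes" (two words in common) but not "red hat" (only one).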

Indexing Query

2015-02-17 Thread Deepak Gopalakrishnan

Hello, I have a rather simple query. I have a list where I have terms like and then my query is more natural language. I want to be able to retrieve matches that have at least 2 words in common between the query and the index. Can you guys suggest a Query type and a field that I should be using? -

Solr | query in parent and child documents

2015-02-17 Thread chandan khatri

Hi All, I am trying to query records based on fields in both parent and child documents. The query is not considering the field in the child document. Below is the structure of my solr record. user 21 test M ***@gmail.com 1492932293590777856 permanentAddress 21_172 sec-38 L

Re: URL/Email tokenizer

2015-02-17 Thread Ian Lea

Ah, you want to do it the hard way. Sorry, can't help you there - I prefer to do things the simple way - easier to write and to maintain and, in my experience, usually more robust in the long run. -- Ian. On Tue, Feb 17, 2015 at 11:42 AM, Ravikumar Govindarajan wrote: > Thanks Ian > > What I

Re: URL/Email tokenizer

2015-02-17 Thread Ravikumar Govindarajan

Thanks Ian. What I am currently doing is duplicating the data into 2 different fields and using my own PerFieldAnalyzerWrapper, just as you pointed out. Is there a good way to do this in a single pass? Like how Bi-Grams or Common-Grams do… -- Ravi On Tue, Feb 17, 2015 at 3:08 PM, Ian Lea wrote
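
A minimal sketch of the single-pass idea, in plain Java rather than as a Lucene TokenFilter: the text is walked once and two token lists are filled together, with the address parts going to the tokenized body list and the whole address to the body-addr list. The class name and the deliberately loose e-mail regex are assumptions for illustration; hooking this into a Lucene analysis chain is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SinglePassEmailSplitter {

    // Deliberately loose e-mail check (an assumption, not RFC 5322).
    private static final Pattern EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");

    final List<String> body = new ArrayList<>();     // tokenized field
    final List<String> bodyAddr = new ArrayList<>(); // untokenized addresses

    SinglePassEmailSplitter(String text) {
        // One pass over the whitespace-separated words fills both lists.
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            if (EMAIL.matcher(word).matches()) {
                bodyAddr.add(word);                    // keep the whole address
                for (String part : word.split("[@.]")) {
                    if (!part.isEmpty()) body.add(part); // e.g. user, domain, tld
                }
            } else {
                body.add(word);
            }
        }
    }
}
```

With a hypothetical input such as "I have mailed jane@example.org", body ends up as {"I", "have", "mailed", "jane", "example", "org"} and bodyAddr as {"jane@example.org"}.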

Re: URL/Email tokenizer

2015-02-17 Thread Ian Lea

Sounds like a job for org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper. -- Ian. On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan wrote: > We have a requirement in that E-mail addresses need to be added in a > tokenized form to one field while untokenized form is added to
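
PerFieldAnalyzerWrapper picks an analyzer by field name and falls back to a default analyzer for other fields. The plain-Java analog below uses no Lucene types and shows only that dispatch; the class name and the two toy "analyzers" (splitAll for body, emailsOnly for body-addr) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class PerFieldDispatchSketch {

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> fieldAnalyzers;

    PerFieldDispatchSketch(Function<String, List<String>> defaultAnalyzer,
                           Map<String, Function<String, List<String>>> fieldAnalyzers) {
        this.defaultAnalyzer = defaultAnalyzer;
        this.fieldAnalyzers = new HashMap<>(fieldAnalyzers);
    }

    // Use the field's own analyzer if one is registered, else the default.
    List<String> analyze(String field, String text) {
        return fieldAnalyzers.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Hypothetical "body" analyzer: split on whitespace, '@' and '.'.
    static List<String> splitAll(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[\\s@.]+")) if (!t.isEmpty()) out.add(t);
        return out;
    }

    // Hypothetical "body-addr" analyzer: keep only whole words containing '@'.
    static List<String> emailsOnly(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("\\s+")) if (t.contains("@")) out.add(t);
        return out;
    }
}
```

With "body-addr" mapped to emailsOnly and splitAll as the default, analyzing "I have mailed jane@example.org" yields {"jane@example.org"} for body-addr, while body (and any unmapped field) gets the split parts. Note that this still analyzes the text once per field, which is the two-pass setup Ravi describes.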

URL/Email tokenizer

2015-02-17 Thread Ravikumar Govindarajan

We have a requirement that e-mail addresses be added in tokenized form to one field, while the untokenized form is added to another field. Ex: "I have mailed a...@xyz.com". It should tokenize as below: body = {"I", "have", "mailed", "abc", "xyz", "com"}; I also have a body-addr field. To
