Re: Using POS payloads for chunking

2017-06-15 Thread José Tomás Atria
Ah, good to know! I'm actually using lower-level calls, as I'm building the TokenStream by hand from UIMA annotations and not using any analyzer, but I'll keep that in mind for future projects. Thanks! On Thu, Jun 15, 2017 at 12:10 PM Erick Erickson wrote: > José: > > Do note that, while the byt

Re: Using POS payloads for chunking

2017-06-15 Thread Erick Erickson
José: Do note that, while the byte array isn't limited, prior to LUCENE-7705 most of the tokenizers you would use limited the incoming token to 256 characters at most. This is not at all a _Lucene_ limitation at a low level; rather, if you're indexing data with a delimited payload (say abc|your_payload_here) t
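Erick's point concerns analysis chains that split each incoming token on a delimiter and store the suffix as a payload. As a minimal illustrative sketch (a hypothetical Solr field type, not taken from this thread), such a chain could be configured with DelimitedPayloadTokenFilterFactory:

```xml
<!-- Hypothetical field type: tokens arrive as "term|payload";
     the filter strips the suffix after '|' and stores it as a payload. -->
<fieldType name="text_payload" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- encoder="identity" stores the raw bytes; "float" and "integer"
         encoders are also available -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="identity"/>
  </analyzer>
</fieldType>
```

Note that the delimiter-splitting happens after tokenization, which is why the pre-LUCENE-7705 tokenizer length limit Erick mentions matters: the token plus its delimited payload must survive the tokenizer intact.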

Re: Using POS payloads for chunking

2017-06-15 Thread José Tomás Atria
Hi Markus, thanks for your response! Now I feel stupid; that is clearly a much simpler approach, and it has the added benefit that it would not require me to meddle with the scoring process, which I'm still a bit terrified of. Thanks for the tip. I guess the question is still valid though? i.e. h

Re: email field - analyzed and not analyzed in single field using custom analyzer

2017-06-15 Thread Steve Rowe
Hi Kumaran, WordDelimiterGraphFilter with PRESERVE_ORIGINAL should do what you want. Here's a test I added to TestWordDelimiterGraphFilter.java that passed for me:
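Steve's suggestion can be sketched as a Solr field type (an illustrative config using the factory wrapper and a placeholder address; the thread itself used a Lucene unit test, which is elided above):

```xml
<!-- Hypothetical field type: for a token like user@example.com,
     generateWordParts="1" emits the parts (user, example, com) and
     preserveOriginal="1" also keeps the whole address as one token,
     so the same field matches both exact and analyzed queries. -->
<fieldType name="text_email" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            generateWordParts="1" preserveOriginal="1"/>
  </analyzer>
</fieldType>
```

Because the filter emits both the original token and its parts at the same position, no second copy of the field is needed.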

email field - analyzed and not analyzed in single field using custom analyzer

2017-06-15 Thread Kumaran Ramasubramanian
Hi All, I want to index email fields as both analyzed and not analyzed using a custom analyzer. For example: sm...@yahoo.com will.sm...@yahoo.com That is, indexing sm...@yahoo.com as a single token as well as analyzed tokens in the same email field... My existing custom analyzer, public class Custom