Are you _sure_ you're looking at tokens and not stored data? That can
sometimes be confusing.

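The Schema Browser in the Solr admin UI (admin/<core>/schema browser) might
help here: it shows the terms actually indexed for a field rather than the
stored value.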
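
To illustrate the difference, here is a rough Python sketch. It is not Lucene
code: the regex-based `analyze` function is a hypothetical stand-in for a char
filter plus tokenizer, and it assumes only that offsets are reported against
the original input, as in the issue description.

```python
import re

# Simplified assumption: a "tag" is anything between a pair of angle brackets.
TAG = re.compile(r"<[^<>]*>")
WORD = re.compile(r"\w+")

def analyze(text):
    """Return (term, start, end) tuples with offsets into the original text."""
    # Blank out each tag with spaces of equal length so that the offsets of
    # the remaining words still line up with the original input.
    blanked = TAG.sub(lambda m: " " * len(m.group()), text)
    return [(m.group().lower(), m.start(), m.end()) for m in WORD.finditer(blanked)]

text = "I love <pizza  hut> so much"
stored = text            # the stored value is kept verbatim, tags included
tokens = analyze(text)   # the indexed tokens come from the analysis chain

print(stored)
print(tokens)
```

The stored value still contains "pizza  hut", while the token list does not, so
looking at stored data can make it seem as if nothing was stripped.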

Best,
Erick

On Thu, Sep 11, 2014 at 1:33 PM, suleman mubarik (JIRA) <[email protected]> wrote:
>
>     [ https://issues.apache.org/jira/browse/LUCENE-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130661#comment-14130661 ]
>
> suleman mubarik edited comment on LUCENE-5943 at 9/11/14 8:33 PM:
> ------------------------------------------------------------------
>
> Here is another example.
> If the input is "I love <pizza  hut>",
> then I get the tokens "i", "love", "pizza", "hut" and the offsets (0,1), (2,6),
> (7,11), (12,14).
> If HTMLStripCharFilter removes the text between angle brackets, then I should get
> "i", "love" and not "i", "love", "pizza", "hut".
>
> Here is another example: "I love <html>".
> The tokens I get are "i", "love", "html".
> I am on Lucene 4.8.
>
>
>> HTML strip filter removes text between < and >
>> ----------------------------------------------
>>
>>                 Key: LUCENE-5943
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-5943
>>             Project: Lucene - Core
>>          Issue Type: Bug
>>          Components: core/index
>>         Environment: Production
>>            Reporter: suleman mubarik
>>
>> If I have this as input: “I love <pizza  hut> so much”,
>> then when I apply the HTML strip filter it removes “pizza  hut” and I get the
>> tokens "i", "love", "so", "much".
>> These are the offsets I get back: (0,1), (2,6), (20,22), (23,27).
>> The HTML strip filter should return "i", "love", "pizza", "hut", "so", "much".
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
