Re: How can I let Tika know the resource name?

2012-08-15 Thread Dave Meikle
Hi, On 13 Aug 2012, at 12:31, 122jxgcn wrote: > Hello, > > I'm using Solr's ExtractingRequestHandler to let Tika know the name of the > file when indexing. > I'm currently sending an HTTP request something like > > /update/extract?stream.file=#{filepath}&literal.id=#{filepath}&resource.name=#{re

[jira] [Updated] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true)

2012-08-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated TIKA-961: --- Attachment: TIKA-961-1.3-2.patch Here's a new patch with unit test. The test breaks when checking for a

[jira] [Created] (TIKA-975) LinkBuilder to optionally collapse anchor whitespace

2012-08-15 Thread Markus Jelsma (JIRA)
Markus Jelsma created TIKA-975: -- Summary: LinkBuilder to optionally collapse anchor whitespace Key: TIKA-975 URL: https://issues.apache.org/jira/browse/TIKA-975 Project: Tika Issue Type: Improve

[jira] [Updated] (TIKA-975) LinkBuilder to optionally collapse anchor whitespace

2012-08-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated TIKA-975: --- Attachment: TIKA-975-1.3-1.patch Here's a patch for trunk. > LinkBuilder to optionally

[jira] [Commented] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true)

2012-08-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435139#comment-13435139 ] Markus Jelsma commented on TIKA-961: Browsing through the code, I believe we can conditio

[jira] [Updated] (TIKA-961) No whitespace added if BoilerpipeContentHandler.setIncludeMarkup(true)

2012-08-15 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated TIKA-961: --- Attachment: TIKA-961-1.3-3.patch It does! This is great! Both assertions now pass and CJK text does