[ https://issues.apache.org/jira/browse/LUCENE-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229181#comment-14229181 ]
Robert Muir commented on LUCENE-6073:
-------------------------------------
Hi, thanks for the cleanup here!
Just a few questions:
* I'm confused about what looks like leniency in extract(). Does
ExtractWikipedia do this too? Is there a good reason to ignore exceptions?
* extractFile should just use java.io.LineNumberReader.
* Is there any way to test this thing? There is a 20-line test file in
o.a.l.benchmark.byTask.
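
For reference, a minimal sketch of what the LineNumberReader suggestion could look like. This is not the code in ExtractReuters; the class LineNumberReaderSketch and the helper numberedLines are hypothetical names used only for illustration. It relies on java.io.LineNumberReader tracking the current line number itself, so no hand-rolled counter is needed.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

class LineNumberReaderSketch {
  // Hypothetical helper: returns "lineNumber: text" for every non-empty line.
  static List<String> numberedLines(Reader in) throws IOException {
    List<String> out = new ArrayList<>();
    try (LineNumberReader reader = new LineNumberReader(in)) {
      String line;
      while ((line = reader.readLine()) != null) {
        // After readLine() returns a line, getLineNumber() is the 1-based
        // number of that line, because the counter starts at 0.
        if (!line.isEmpty()) {
          out.add(reader.getLineNumber() + ": " + line);
        }
      }
    }
    return out;
  }
}
```

The same getLineNumber() value could be included in an error message, so that a bad record in an input file is reported with its position.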
> Fix directory deletion in ExtractReuters, recover from errors
> -------------------------------------------------------------
>
> Key: LUCENE-6073
> URL: https://issues.apache.org/jira/browse/LUCENE-6073
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/benchmark
> Reporter: Ramkumar Aiyengar
> Priority: Minor
>
> ExtractReuters in the benchmark module currently fails because it creates
> the output directory and then calls {{IOUtils.rm}} on it, which removes all
> files in it as well as the output directory itself. This issue fixes that
> behaviour.
> While I was at it, I also added a bit more logging in case of file errors
> (the download I had contained some bad data) and made the task recover in
> case of issues with one file.
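
The ordering problem described above can be illustrated with a small sketch. It is an assumption-laden example, not the actual patch: it uses only java.nio.file instead of Lucene's IOUtils, and the class OutputDirSketch and the helper resetDirectory are hypothetical names. The point is simply that any old directory is removed first and the directory is created afterwards, so it exists and is empty when extraction starts.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OutputDirSketch {
  // Hypothetical helper: leaves 'dir' existing and empty.
  static void resetDirectory(Path dir) throws IOException {
    if (Files.exists(dir)) {
      List<Path> ordered;
      try (Stream<Path> paths = Files.walk(dir)) {
        // Reverse order puts children before their parent directories,
        // so every directory is empty by the time it is deleted.
        ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      }
      for (Path p : ordered) {
        Files.delete(p);
      }
    }
    // Create the directory only after the old one is gone.
    Files.createDirectories(dir);
  }
}
```

Doing the two steps in the opposite order (create, then recursively remove) leaves no output directory at all, which matches the failure described in the issue.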