[
https://issues.apache.org/jira/browse/LUCENE-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229275#comment-14229275
]
Ramkumar Aiyengar commented on LUCENE-6073:
-------------------------------------------
bq. I'm confused about what looks like leniency in extract(). Does
ExtractWikipedia do this too? Is there a good reason to ignore exceptions?
I didn't look at ExtractWikipedia; it might be affected by the same directory
deletion issue, so I will check. The only "good reason" was that the particular
download I had happened to have bad data on one line, and it seemed reasonable
to continue with the other files in such a case. This is only benchmark data,
so at worst we would end up with a few fewer docs.
bq. extractFile should just use java.io.LineNumberReader
Will check..
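
To make the two points above concrete (continuing past a bad file, and letting
{{java.io.LineNumberReader}} track line numbers), here is a minimal sketch. It
is not the ExtractReuters code: the class name, the method names and the "a NUL
character means bad data" rule are all assumptions made up for illustration.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only; not the actual ExtractReuters implementation. */
final class LenientExtractSketch {

  /** Reads one file and returns its line count; throws on the first bad line. */
  static int extractFile(Path file) throws IOException {
    try (LineNumberReader reader =
        new LineNumberReader(Files.newBufferedReader(file, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        // Stand-in for real parsing (assumption): a NUL character marks bad data.
        if (line.indexOf('\0') >= 0) {
          // getLineNumber() is the 1-based number of the line just read.
          throw new IOException("bad data at " + file + ":" + reader.getLineNumber());
        }
      }
      return reader.getLineNumber();
    }
  }

  /** Processes every file, logging and skipping the ones that fail. */
  static List<Path> extractAll(List<Path> files) {
    List<Path> failed = new ArrayList<>();
    for (Path file : files) {
      try {
        extractFile(file);
      } catch (IOException e) {
        // Recover: report the problem and carry on with the remaining files.
        System.err.println("Skipping " + file + ": " + e.getMessage());
        failed.add(file);
      }
    }
    return failed;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("extract-sketch");
    Path good = Files.write(dir.resolve("good.txt"),
        "one\ntwo\n".getBytes(StandardCharsets.UTF_8));
    Path bad = Files.write(dir.resolve("bad.txt"),
        "one\ntw\0o\n".getBytes(StandardCharsets.UTF_8));
    System.out.println("failed: " + extractAll(Arrays.asList(bad, good)));
  }
}
```

Returning the list of failed files keeps the leniency visible to the caller,
which can then decide whether a few skipped files are acceptable.
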
bq. is there any way to test this thing? there is a 20-line testfile in
o.a.l.benchmark.byTask
I just checked this by running {{ant get-files}} in the benchmark module
(eventually called by {{ant run-task}}). On a clean checkout this previously
failed while trying to extract the files; with this change it no longer does.
But did you mean through Jenkins, as a proper test suite? It could probably
use one.
> Fix directory deletion in ExtractReuters, recover from errors
> -------------------------------------------------------------
>
> Key: LUCENE-6073
> URL: https://issues.apache.org/jira/browse/LUCENE-6073
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/benchmark
> Reporter: Ramkumar Aiyengar
> Priority: Minor
>
> ExtractReuters in the benchmark module currently fails because it creates
> the output directory and then calls {{IOUtils.rm}} on it, which removes all
> files in it as well as the output directory itself. This issue is to fix that
> behaviour.
> While I was at it, I also added a bit more logging in case of file errors
> (the download I had contained some bad data) and made the task recover in
> case of issues with one file.
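
The ordering problem described above can be sketched with plain
{{java.nio.file}} calls. This is an illustration under assumptions, not the
actual patch: {{deleteRecursively}} is a hypothetical stand-in for
{{IOUtils.rm}}, and {{prepareOutputDir}} is an invented name. The point is only
that the recursive delete must run before the directory is created, not after.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

/** Hypothetical sketch only; not the actual ExtractReuters change. */
final class OutputDirSketch {

  /** Deletes dir and everything below it (stand-in for IOUtils.rm). */
  static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return; // nothing to clean up
    }
    Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        Files.delete(file);
        return FileVisitResult.CONTINUE;
      }

      @Override
      public FileVisitResult postVisitDirectory(Path d, IOException exc)
          throws IOException {
        if (exc != null) {
          throw exc;
        }
        Files.delete(d); // children are gone by now, so the directory is empty
        return FileVisitResult.CONTINUE;
      }
    });
  }

  /** Removes stale output first, then creates the directory so it exists afterwards. */
  static Path prepareOutputDir(Path dir) throws IOException {
    deleteRecursively(dir);
    return Files.createDirectories(dir);
  }

  public static void main(String[] args) throws IOException {
    Path out = Files.createTempDirectory("outdir-sketch").resolve("out");
    Files.createDirectories(out);
    Files.write(out.resolve("stale.txt"), new byte[] {1});
    prepareOutputDir(out);
    System.out.println("exists after prepare: " + Files.isDirectory(out));
  }
}
```

Doing the two steps in the opposite order (create, then delete) reproduces the
failure in the description: the directory is gone by the time files are written.
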