[jira] [Commented] (LUCENE-6073) Fix directory deletion in ExtractReuters, recover from errors

Robert Muir (JIRA) Sun, 30 Nov 2014 10:03:27 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229181#comment-14229181 ]


Robert Muir commented on LUCENE-6073:
-------------------------------------

Hi, thanks for the cleanup here!

Just a few questions:
* I'm confused about what looks like leniency in extract(). Does
  ExtractWikipedia do this too? Is there a good reason to ignore exceptions?
* extractFile should just use java.io.LineNumberReader.
* Is there any way to test this thing? There is a 20-line test file in
  o.a.l.benchmark.byTask.
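
For reference, a minimal sketch of what java.io.LineNumberReader provides: it tracks the current line number while reading, so a parser can report where something was found (or went wrong) without keeping its own counter. The class name LineNumberSketch, the helper linesContaining, and the `<DOC>` sample markup are hypothetical and are not taken from ExtractReuters or the patch.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineNumberSketch {
    // Hypothetical helper (not the actual extractFile code): returns the
    // 1-based numbers of all lines that contain the given marker.
    static List<Integer> linesContaining(Reader in, String marker) throws IOException {
        List<Integer> hits = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(marker)) {
                    // After readLine() consumes a terminated line,
                    // getLineNumber() is the number of that line.
                    hits.add(reader.getLineNumber());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input; every line ends with a terminator.
        String sample = "<DOC>\nbody text\n</DOC>\n<DOC>\n</DOC>\n";
        System.out.println(linesContaining(new StringReader(sample), "</DOC>"));
    }
}
```

Any IOException propagates to the caller here rather than being swallowed, which is the stricter alternative to the leniency questioned above.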

> Fix directory deletion in ExtractReuters, recover from errors
> -------------------------------------------------------------
>
>                 Key: LUCENE-6073
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6073
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/benchmark
>            Reporter: Ramkumar Aiyengar
>            Priority: Minor
>
> ExtractReuters in the benchmark module currently fails because it creates 
> the output directory and then calls {{IOUtils.rm}} on it, which removes all 
> files in it as well as the output directory itself. This issue is to fix 
> that behaviour.
> While I was at it, I also added a bit more logging for file errors (the 
> download I had contained some bad data) and made the task recover when 
> there are issues with one file.
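
The ordering problem described in the quoted issue can be sketched roughly as follows. This is an illustration only, under the assumption that the fix amounts to removing any stale output directory first and creating it afterwards. It uses plain java.nio.file calls instead of Lucene's IOUtils.rm, and the names OutputDirSketch, deleteRecursively, and resetOutputDir are hypothetical, not part of the patch.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

class OutputDirSketch {
    // Deletes root and everything under it; does nothing if root is absent.
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc; // surface traversal errors instead of hiding them
                }
                Files.delete(dir); // directory is empty by now
                return FileVisitResult.CONTINUE;
            }
        });
    }

    // Delete first, create second: the directory exists when this returns.
    // Creating first and deleting second would leave no directory to write to.
    static Path resetOutputDir(Path outputDir) throws IOException {
        deleteRecursively(outputDir);
        return Files.createDirectories(outputDir);
    }
}
```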



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

