[
https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991179#comment-12991179
]
Robert Muir commented on LUCENE-1540:
-------------------------------------
Hi Doron, about the test random seeds:
It is complicated (though maybe we could fix this!) to make the same random
seed in trunk behave just like it does in 3.x.
But for the locales: the random locale is picked from the available system
locales. That list changes from JRE to JRE, so unfortunately we cannot
guarantee that the same seed chooses the same locale... It's the same with
timezones too... and these even change in minor JDK updates!
I wish we knew of a good solution, because I hate it when things aren't
completely reproducible everywhere.
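
To illustrate the locale problem, here is a minimal sketch. It is not the
actual test framework code; the class name SeededLocalePick and the helper
pick() are made up for this example. The seeded Random is deterministic, but
it only produces an index, and the array it indexes into comes from the JRE:

```java
import java.util.Locale;
import java.util.Random;

public class SeededLocalePick {
    // Hypothetical helper (not from the Lucene code base): choose one entry
    // of `available` using a Random built from the given seed.
    static Locale pick(long seed, Locale[] available) {
        Random random = new Random(seed);
        return available[random.nextInt(available.length)];
    }

    public static void main(String[] args) {
        long seed = 42L; // arbitrary seed, chosen only for illustration

        // The contents and order of this array are decided by the running
        // JRE, so the same index can land on a different locale elsewhere.
        Locale[] systemLocales = Locale.getAvailableLocales();

        System.out.println("locales reported by this JRE: " + systemLocales.length);
        System.out.println("seed " + seed + " picks: " + pick(seed, systemLocales));
    }
}
```

With a fixed array the result is stable for a given seed, so running this on
two different JREs and comparing the output shows whether they would pick the
same locale.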
> Improvements to contrib.benchmark for TREC collections
> ------------------------------------------------------
>
> Key: LUCENE-1540
> URL: https://issues.apache.org/jira/browse/LUCENE-1540
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/benchmark
> Reporter: Tim Armstrong
> Assignee: Doron Cohen
> Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-1540.patch, LUCENE-1540.patch, LUCENE-1540.patch,
> LUCENE-1540.patch, trecdocs.zip
>
>
> The benchmarking utilities for TREC test collections (http://trec.nist.gov)
> are quite limited and do not support some of the variations in format of
> older TREC collections.
> I have been doing some benchmarking work with Lucene and have had to modify
> the package to support:
> * Older TREC document formats, which the current parser fails on due to
> missing document headers.
> * Variations in query format - newlines after <title> tag causing the query
> parser to get confused.
> * Ability to detect and read in uncompressed text collections
> * Storage of document numbers by default without storing full text.
> I can submit a patch if there is interest, although I will probably want to
> write unit tests for the new functionality first.