[
https://issues.apache.org/jira/browse/SOLR-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley updated SOLR-2864:
-------------------------------
Fix Version/s: 4.8 (was: 4.7)
> DataImportHandler has non-deterministic sort order for XML files
> ----------------------------------------------------------------
>
> Key: SOLR-2864
> URL: https://issues.apache.org/jira/browse/SOLR-2864
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler
> Affects Versions: 3.4
> Reporter: Gabriel Cooper
> Assignee: Shalin Shekhar Mangar
> Priority: Minor
> Labels: dataimport, patch, xml
> Fix For: 4.8
>
> Attachments: lucene-2864.patch, lucene-2864.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> DataImportHandler's FileListEntityProcessor relies on Java's File.list()
> method to retrieve a list of files from the configured dataimport directory,
> but list() does not guarantee a sort order ^(1)^. This means that if you have
> two files that update the same record, the results are non-deterministic.
> Typically, list() does in fact return them lexicographically sorted, but this
> is not guaranteed ^(2)^.
> An example of how you can get into trouble is to imagine the following:
> xyz.xml -- Created one hour ago. Contains updates to records "Foo" and "Bar".
> abc.xml -- Created one minute ago. Contains updates to records "Bar" and
> "Baz".
> In this case, the newest file, abc.xml, would (likely, but not guaranteed)
> be run first, updating the "Bar" and "Baz" records. Next, the older file,
> xyz.xml, would update "Foo" and overwrite "Bar" with outdated changes.
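
One way to make the scenario above deterministic is to sort the listing explicitly before processing it. The following is only a minimal sketch, not the attached patch and not FileListEntityProcessor's actual code: the class and method names are hypothetical, and ordering by last-modified time (oldest first, ties broken by name) is an assumption about what an importer would want.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper for illustration; not part of Solr.
public class DeterministicFileOrder {

    /**
     * Lists the entries of dir oldest first, breaking ties by file name,
     * so the result does not depend on the order the file system returns.
     */
    static File[] listOldestFirst(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            // listFiles() returns null if dir is not a directory or an I/O error occurs.
            return new File[0];
        }
        Arrays.sort(files,
                Comparator.comparingLong(File::lastModified)
                          .thenComparing(File::getName));
        return files;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : listOldestFirst(dir)) {
            System.out.println(f.lastModified() + "  " + f.getName());
        }
    }
}
```

With this ordering, xyz.xml (one hour old) would be processed before abc.xml (one minute old), so the newer update to "Bar" would be applied last.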
> (1) Per
> http://download.oracle.com/javase/1,5,0/docs/api/java/io/File.html#list%28%29
> "There is no guarantee that the name strings in the resulting array will
> appear in any specific order; they are not, in particular, guaranteed to
> appear in alphabetical order."
> (2) Even if it were guaranteed, lexicographical sorting would give you the
> following sort order:
> 1.xml
> 10.xml
> 2.xml
> ...
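
To illustrate footnote (2), here is a small sketch that contrasts plain lexicographic sorting with a number-aware comparison of file names. The class and its methods are hypothetical and only meant to show the difference; they are not taken from Solr or from the attached patch.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Solr.
public class NaturalFileNameOrder {

    /** Drops leading zeros from a run of digits, keeping at least one digit. */
    static String stripLeadingZeros(String digits) {
        int k = 0;
        while (k < digits.length() - 1 && digits.charAt(k) == '0') {
            k++;
        }
        return digits.substring(k);
    }

    /** Compares two names, treating runs of digits as numbers rather than characters. */
    static int compareNatural(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (Character.isDigit(ca) && Character.isDigit(cb)) {
                int si = i, sj = j;
                while (i < a.length() && Character.isDigit(a.charAt(i))) i++;
                while (j < b.length() && Character.isDigit(b.charAt(j))) j++;
                String na = stripLeadingZeros(a.substring(si, i));
                String nb = stripLeadingZeros(b.substring(sj, j));
                // A longer digit run (without leading zeros) is the larger number.
                int c = na.length() != nb.length()
                        ? Integer.compare(na.length(), nb.length())
                        : na.compareTo(nb);
                if (c != 0) return c;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        int c = Integer.compare(a.length() - i, b.length() - j);
        // Fall back to plain comparison so distinct names never compare as equal.
        return c != 0 ? c : a.compareTo(b);
    }

    public static void main(String[] args) {
        String[] names = {"10.xml", "2.xml", "1.xml"};

        String[] lexicographic = names.clone();
        Arrays.sort(lexicographic);
        System.out.println(Arrays.toString(lexicographic)); // [1.xml, 10.xml, 2.xml]

        String[] natural = names.clone();
        Arrays.sort(natural, NaturalFileNameOrder::compareNatural);
        System.out.println(Arrays.toString(natural)); // [1.xml, 2.xml, 10.xml]
    }
}
```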
--
This message was sent by Atlassian JIRA
(v6.2#6252)