[jira] [Commented] (SOLR-2960) XPathEntityProcessor does not clear nulls from empty multi-valued fields

ASF subversion and git services (JIRA) Tue, 24 Dec 2013 09:12:10 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856395#comment-13856395
 ]


ASF subversion and git services commented on SOLR-2960:
-------------------------------------------------------

Commit 1553305 from [~jdyer] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1553305 ]

SOLR-2960: XPathEntityProcessor was adding spurious nulls to multi-valued fields

> XPathEntityProcessor does not clear nulls from empty multi-valued fields
> ------------------------------------------------------------------------
>
>                 Key: SOLR-2960
>                 URL: https://issues.apache.org/jira/browse/SOLR-2960
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>            Reporter: Michael Watts
>            Assignee: James Dyer
>            Priority: Minor
>         Attachments: SOLR-2960.patch, SOLR-2960.patch
>
>
> I can't confidently say I completely understand everything these classes so 
> boldly tackle (that is, XPathEntityProcessor and XPathRecordReader), but 
> there may be someone who does. Nonetheless, I think I've got some or most of 
> this right, and more likely there are more people like that. So, I won't 
> qualify everything I say with a maybe; let this note stand in for all of 
> those. 
> Whenever an XML file is mapped into a Solr index, within the XPathRecordReader 
> (used by the XPathEntityProcessor within the DataImportHandler), if (A) a 
> field is perceived to be null and is multivalued, a null value is pushed onto 
> it (on top of any other values it previously had). Otherwise (B) for 
> multivalued fields, any found value is pushed onto the field's existing list 
> of values, and the field is marked as found within the frame (a.k.a. record). 
> In general, when the end tag of a record is seen, (C) the XPathRecordReader 
> clears the values of every field that has been marked as found, for tidiness, 
> since they are presumably no longer needed. 
> However, suppose that for a given record and multivalued field, a value is 
> never found (though it may have been found for other fields in the record), 
> only (A) will have occurred, never will (B) have occurred, the field will 
> never have been marked as found, and thus (C) never will have occurred for 
> the field. 
> So, the field will remain, with its list of nulls. 
> This list of nulls will grow until either the last record or a non-null value 
> is seen. 
> And so, (1) an out-of-memory error may occur, given sufficiently many records 
> and a mortal computer. 
> Moreover, (2), a transformer cannot reliably depend on the number of nulls in 
> the field (and this information cannot be guaranteed to be determined by some 
> other value). 
> I will try to provide more information, if this seems an issue and if there 
> doesn't seem to be an answer. 
> At this point, if I understand the problem correctly, it seems the answer is 
> to 'mark' those null fields, considering 'null' an added value. 
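
The accumulation described in steps (A), (B) and (C) above can be sketched with a toy model. This is not the actual XPathRecordReader code: the class NullAccumulationSketch, its methods, and the markNulls flag are hypothetical names invented for illustration, and the flag only models the reporter's proposal of treating a pushed null as a found value, not necessarily what the commit does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the behaviour described in the issue; NOT Solr's real implementation. */
public class NullAccumulationSketch {
    // Pending values per multivalued field, and the fields marked as found in this record.
    private final Map<String, List<String>> values = new HashMap<>();
    private final Set<String> found = new HashSet<>();
    // Hypothetical switch: when true, a pushed null also marks the field as found.
    private final boolean markNulls;

    public NullAccumulationSketch(boolean markNulls) {
        this.markNulls = markNulls;
    }

    /** Steps (A) and (B): push a value (possibly null) onto a multivalued field. */
    public void push(String field, String value) {
        values.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        if (value != null || markNulls) {
            found.add(field);
        }
    }

    /** Step (C): at the record's end tag, clear only the fields marked as found. */
    public void endRecord() {
        for (String field : found) {
            values.remove(field);
        }
        found.clear();
    }

    /** Number of values still held for a field after the last endRecord(). */
    public int pending(String field) {
        List<String> list = values.get(field);
        return list == null ? 0 : list.size();
    }

    /** Feeds `records` records in which "tags" never has a real value. */
    public static int leftoverAfter(int records, boolean markNulls) {
        NullAccumulationSketch reader = new NullAccumulationSketch(markNulls);
        for (int i = 0; i < records; i++) {
            reader.push("title", "doc " + i); // found, so cleared at end of record
            reader.push("tags", null);        // only (A) happens for this field
            reader.endRecord();
        }
        return reader.pending("tags");
    }

    public static void main(String[] args) {
        System.out.println(leftoverAfter(1000, false)); // prints 1000: one null kept per record
        System.out.println(leftoverAfter(1000, true));  // prints 0: nulls cleared with each record
    }
}
```

With markNulls set to false the list of nulls grows by one per record, which is the unbounded growth behind point (1); with it set to true the field is cleared at every end tag, so a transformer would see at most the nulls belonging to the current record, which addresses point (2).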



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

