On Fri, Aug 16, 2013 at 02:31:53PM +0200, Age Jan Kuperus wrote:
> We are using libxml2/lib(e)xslt since 2004, and very happy with it
> in general. Recently we discovered that str:tokenize and str:split
> do not always meet our expectations. The problem we have is that
> empty elements are silently removed. As an example,
> str:tokenize('abcdef,fghij, klmnop, ,,qrstuvw , xyz, ,,', ',')
> generates a node-set with seven elements instead of the ten we
> expected. Some applications (conversion of .csv based files is the
> obvious example) really need to know where empty fields are present.
> A second enhancement we would like to have (in str:tokenize only) is
> an indication (in an attribute of the token) of the delimiter that
> was present between two tokens. What is your opinion about this?
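To make the two behaviors in the quoted message concrete, here is a small Python sketch. It only models the observable semantics (each character of the delimiter argument separates tokens, and the reported str:tokenize behavior skips empty tokens); it is not libexslt code. The function names `tokenize_drop_empty` and `tokenize_keep_empty` are hypothetical, and so is the choice of recording the delimiter that *ends* each field as one way to provide the requested delimiter indication.

```python
"""Sketch contrasting two tokenize semantics on the example from the thread.

Assumption: this models behavior only; it is not libexslt code, and the
function names are hypothetical.
"""
from typing import List, Optional, Tuple


def tokenize_drop_empty(s: str, delimiters: str) -> List[str]:
    """Each character of `delimiters` separates tokens; empty tokens are
    skipped, as reported for str:tokenize."""
    tokens: List[str] = []
    current = ""
    for ch in s:
        if ch in delimiters:
            if current:  # only non-empty tokens are kept
                tokens.append(current)
            current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


def tokenize_keep_empty(s: str, delimiters: str) -> List[Tuple[str, Optional[str]]]:
    """Same splitting rule, but empty fields are kept and each field is paired
    with the delimiter that ended it (None for the last field)."""
    fields: List[Tuple[str, Optional[str]]] = []
    current = ""
    for ch in s:
        if ch in delimiters:
            fields.append((current, ch))  # may be an empty field
            current = ""
        else:
            current += ch
    fields.append((current, None))  # trailing field, possibly empty
    return fields


if __name__ == "__main__":
    text = "abcdef,fghij, klmnop, ,,qrstuvw , xyz, ,,"
    print(len(tokenize_drop_empty(text, ",")))  # 7
    print(len(tokenize_keep_empty(text, ",")))  # 10
    print(tokenize_keep_empty("a;b,,c", ",;"))
```

For a single delimiter, Python's own `str.split(",")` already keeps empty fields and returns 10 items for this string; the explicit loop is there to handle a set of single-character delimiters and to record which one ended each field.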
Might be an oversight in the implementation; however, the definition at
http://www.exslt.org/str/functions/tokenize/ states:

"The str:tokenize function splits up a string and returns a node set of token elements, each containing one token from the string."

The problem is that in an XML context a token is usually taken to mean this definition:

http://www.w3.org/TR/REC-xml/#NT-Nmtoken
[7] Nmtoken ::= (NameChar)+

and, hum, that doesn't allow for an empty string.

I guess the best thing at this point would be to check what the other implementations are doing and try to follow the majority, because I don't think there is much maintenance on EXSLT at this point. The other option is to stick to the XSLT 2.0 semantics for the equivalent function, and indeed it seems to do what you expect; e.g. Example 3 in
http://zvon.org/comp/r/ref-XSLT_2.html#Functions~tokenize
is clear on that.

So it sounds like it can be characterized as a bug :-) but it's a bit fuzzy.

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml