Larry Trammell <ridge...@nwi.net> added the comment:
Not a bug, strictly speaking... more like user abuse.

The parsers (expat as well as SAX) must be able to return content text as a sequence of pieces when necessary: for example, when the text is interrupted by grouping or styling tags (like <span> or <i>), or when an extensive text block needs to be subdivided for efficient processing. Users would expect hazards like these and be wary. But how many users would suspect that a quoted string only 8 characters long would be returned in multiple pieces? Or that an entity notation would be split down the middle?

Virtually all existing tutorial examples showing content extraction are WRONG, because the ONLY content that can be trusted is content that has been filtered through some kind of aggregator object. How many users will know this instinctively?

It would be very useful for the parser systems to provide some kind of support for text aggregation. A guarantee that "small contiguous" text items will not be chopped might also be helpful.

----------
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43483>
_______________________________________
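
For illustration, here is a minimal sketch of the kind of aggregator object described in the comment, assuming Python's standard `xml.sax` module. The class name `TextAggregator` and its `texts` attribute are hypothetical, chosen for this example only; they are not part of any library. The handler buffers every piece passed to `characters()` and joins the pieces only when a tag boundary is reached, so it does not matter how the parser chooses to split the text.

```python
import xml.sax


class TextAggregator(xml.sax.ContentHandler):
    """Hypothetical handler: joins text pieces the parser may deliver separately."""

    def __init__(self):
        super().__init__()
        self._pieces = []   # fragments received since the last tag boundary
        self.texts = []     # (element name, complete text) pairs

    def startElement(self, name, attrs):
        # Simplification: text seen before a child start tag is dropped.
        self._pieces = []

    def characters(self, content):
        # May be called several times for one logical run of text.
        self._pieces.append(content)

    def endElement(self, name):
        # Only here is the text known to be complete.
        text = "".join(self._pieces).strip()
        if text:
            self.texts.append((name, text))
        self._pieces = []


handler = TextAggregator()
xml.sax.parseString(b"<doc><msg>say &quot;hi&quot; now</msg></doc>", handler)
print(handler.texts)
```

This sketch deliberately ignores mixed content (text before a child element is discarded); a fuller version would keep a stack of buffers, one per open element. For code that uses expat directly, the `buffer_text` attribute of the `xml.parsers.expat` parser object asks the parser to buffer character data and so reduce the number of callbacks, though, as far as I know, nothing equivalent is exposed through the SAX interface.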