From Muhammad Haggag <mhag...@gmail.com>:

Muhammad Haggag has uploaded a new change for review.

Change subject: fdo#53399 Word count is inconsistent and wrong with 
non-breaking space
......................................................................

fdo#53399 Word count is inconsistent and wrong with non-breaking space

This change replaces lcl_IsSkippableWhitespace with a call to ICU's u_isspace, 
which covers all Unicode separators. It also updates and fixes one of the 
SwScanner unit tests.

Bug details:
SwScanner::NextWord skips whitespace before calling into ICU's BreakIterator. 
The function used to identify whitespace (lcl_IsSkippableWhitespace) doesn't 
cover the full Unicode space-separator category (Zs, 18 code points in total; 
see http://www.fileformat.info/info/unicode/category/Zs/index.htm).

Since 0xA0 (no-break space) is not identified as whitespace, it is not skipped, 
and we end up calling ICU at the position of the 0xA0, asking it for the 
boundary of the next word forward. ICU sees that it was called at the end of a 
word, reverses the query direction to backward, and returns the preceding 
word. NextWord then concludes that it has hit the end of the string and 
stops, so word counting ends for the rest of the line.
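
The skipping step described above can be illustrated with a minimal, 
self-contained sketch. It does not use ICU or the real SwScanner code: 
isNarrowWhitespace, isBroadWhitespace and skipWhitespace are hypothetical names 
invented for this example, the narrow set is an assumed stand-in for the old 
check, and the broad set is a hand-written approximation of what a 
u_isspace-style test accepts (the 18 Zs code points mentioned above plus ASCII 
control whitespace).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a narrow whitespace check that knows only a
// handful of characters and therefore misses U+00A0 (no-break space).
bool isNarrowWhitespace(char32_t c)
{
    return c == 0x20 || c == 0x09 || c == 0x0A || c == 0x3000;
}

// Hand-written approximation of a broad check: ASCII control whitespace
// plus the 18 space separators (Zs) referred to in the bug details.
// This is an illustration only, not ICU's actual u_isspace implementation.
bool isBroadWhitespace(char32_t c)
{
    if (c >= 0x09 && c <= 0x0D)
        return true;
    switch (c)
    {
        case 0x0020: case 0x00A0: case 0x1680: case 0x180E:
        case 0x202F: case 0x205F: case 0x3000:
            return true;
        default:
            break;
    }
    return c >= 0x2000 && c <= 0x200A;
}

// Advance pos past every character the predicate accepts and return the
// index of the first character that is not skipped (or text.size()).
std::size_t skipWhitespace(const std::u32string& text, std::size_t pos,
                           bool (*isSpace)(char32_t))
{
    assert(pos <= text.size());
    while (pos < text.size() && isSpace(text[pos]))
        ++pos;
    return pos;
}
```

For the text "one", U+00A0, "two", starting at index 3 (just after "one"), the 
narrow predicate leaves the position on the no-break space, which is the 
situation in which the word-break query is issued at the end of a word. The 
broad predicate moves on to index 4, the start of "two".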

Change-Id: I29c89ddb0b26e88da822501253898856b28e3fa5
---
M sw/qa/core/swdoc-test.cxx
M sw/source/core/txtnode/txtedt.cxx
2 files changed, 11 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.libreoffice.org:29418/core refs/changes/53/453/1
--
To view, visit https://gerrit.libreoffice.org/453
To unsubscribe, visit https://gerrit.libreoffice.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I29c89ddb0b26e88da822501253898856b28e3fa5
Gerrit-PatchSet: 1
Gerrit-Project: core
Gerrit-Branch: master
Gerrit-Owner: Muhammad Haggag <mhag...@gmail.com>

_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice