On 28/05/2014 10:10, Aravinda VK wrote:
Hi,
How to find number of characters in a string?
Following example returns byte count instead of number of characters.
use std::string::String;

fn main() {
    let unicode_str = String::from_str("ಅ");
    let ascii_str = String::from_str("a");
    println!("unicode str: {}, ascii str: {}", unicode_str.len(),
             ascii_str.len());
}
It depends on what you call a "character". As you noted, the .len()
method returns the number of UTF-8 bytes. Since strings are represented
as UTF-8 internally, .len() takes O(1) time.
There is also the .char_len() method, which counts the number of Unicode
code points in O(n) time.
http://static.rust-lang.org/doc/master/std/str/trait.StrSlice.html#tymethod.char_len
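To make the difference between the two counts concrete, here is a minimal sketch. It assumes a recent Rust toolchain, where .char_len() is no longer available and .chars().count() plays the same role. The helper names byte_len and code_point_len are hypothetical, chosen only for this illustration.

```rust
// Hypothetical helper names; only `len` and `chars` come from the standard library.

/// Number of UTF-8 bytes. O(1): the length is stored alongside the string data.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode code points. O(n): the whole string has to be decoded.
/// Assumes a Rust version where `.chars().count()` replaces `.char_len()`.
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let unicode_str = "ಅ"; // U+0C85, three bytes in UTF-8
    let ascii_str = "a"; // U+0061, one byte in UTF-8

    println!(
        "bytes: unicode str: {}, ascii str: {}",
        byte_len(unicode_str),
        byte_len(ascii_str)
    );
    println!(
        "code points: unicode str: {}, ascii str: {}",
        code_point_len(unicode_str),
        code_point_len(ascii_str)
    );
}
```

With these inputs the byte counts differ (3 and 1) while the code point counts are both 1.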
However, what users perceive as a single "character" may be more than a
single code point. Such user-perceived characters are called "grapheme
clusters". For example, "áo" (which renders incorrectly in my email
client…) is two grapheme clusters, but is made of three code points:
U+0061, U+0301, and U+006F.
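The gap between code points and perceived characters can be checked with the standard library alone. The sketch below spells the string with escapes as "a" (U+0061), the combining acute accent (U+0301), and "o" (U+006F), so it does not depend on how an editor or mail client stores the accent. The helper name code_points is hypothetical, and a recent Rust toolchain is assumed.

```rust
/// Hypothetical helper: the code points of `s` as plain integers.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    // "a" + COMBINING ACUTE ACCENT + "o": displayed as two characters.
    let s = "a\u{0301}o";

    for cp in code_points(s) {
        println!("U+{:04X}", cp);
    }

    // Neither count matches the two characters a reader sees.
    println!("bytes: {}", s.len());
    println!("code points: {}", s.chars().count());
}
```

Here the string has four bytes and three code points, so neither standard-library count gives the two grapheme clusters a reader would expect.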
Rust’s standard libraries do not currently have a method for counting
grapheme clusters, as far as I can tell. However, except for very
specific cases (such as handling text selection in an editor), you
generally don’t need to deal with grapheme clusters. Twitter also has a
very specific idea of what "140 characters" means:
https://dev.twitter.com/docs/counting-characters
--
Simon Sapin
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev