On 28/05/2014 10:10, Aravinda VK wrote:
Hi,

How to find number of characters in a string?

Following example returns byte count instead of number of characters.

     use std::string::String;

     fn main() {
         let unicode_str = String::from_str("ಅ");
         let ascii_str = String::from_str("a");
         println!("unicode str: {}, ascii str: {}",
                  unicode_str.len(), ascii_str.len());
     }

It depends on what you call a "character". As you noted, the .len() method returns the number of UTF-8 bytes. Since strings are represented as UTF-8 internally, .len() takes O(1) time.

There is also the .char_len() method, which counts the number of Unicode code points in O(n) time.

http://static.rust-lang.org/doc/master/std/str/trait.StrSlice.html#tymethod.char_len
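
For readers on a current Rust toolchain, here is a minimal sketch of the same comparison. It assumes `.char_len()` is no longer available (it was removed before Rust 1.0) and uses `.chars().count()` in its place; the helper names `byte_len` and `code_point_count` are made up for illustration.

```rust
// Hypothetical helpers: byte length versus code point count.

/// Number of UTF-8 bytes; O(1) because the length is stored with the string.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode code points; O(n) because the string must be decoded.
fn code_point_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let unicode_str = String::from("ಅ");
    let ascii_str = String::from("a");

    // prints "bytes: 3 and 1"
    println!("bytes: {} and {}", byte_len(&unicode_str), byte_len(&ascii_str));
    // prints "code points: 1 and 1"
    println!(
        "code points: {} and {}",
        code_point_count(&unicode_str),
        code_point_count(&ascii_str)
    );
}
```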

However, what users perceive as a single "character" may be more than a single code point. These are sometimes called "grapheme clusters". For example, "áo" (which renders incorrectly in my email client…) is two grapheme clusters, but is made of three code points: U+0061, U+0301, and U+006F.
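
The gap between code points and grapheme clusters can be seen with an "a" followed by the combining acute accent U+0301 and an "o". The sketch below is only an illustration: `code_points` and `approx_cluster_count` are hypothetical helpers, and the second one merely skips code points in the Combining Diacritical Marks block (U+0300 to U+036F). That is a rough approximation and not real grapheme cluster segmentation, which follows much more involved rules.

```rust
/// Hypothetical helper: the code points of a string as numbers.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

/// Hypothetical helper and an ASSUMPTION for illustration only:
/// treats every code point outside U+0300..=U+036F as starting a new
/// cluster. This is not full Unicode grapheme segmentation.
fn approx_cluster_count(s: &str) -> usize {
    s.chars()
        .filter(|c| !('\u{0300}'..='\u{036F}').contains(c))
        .count()
}

fn main() {
    // "a" + combining acute accent + "o"
    let s = "a\u{301}o";

    for cp in code_points(s) {
        println!("U+{:04X}", cp);
    }
    println!("bytes: {}", s.len());
    println!("code points: {}", s.chars().count());
    println!("approximate clusters: {}", approx_cluster_count(s));
}
```

With this input the byte length is 4, the code point count is 3, and the approximate cluster count is 2.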

Rust’s standard libraries do not currently have a method for counting grapheme clusters, as far as I can tell. However, except for very specific cases (such as handling text selection in an editor), you generally don’t need to deal with grapheme clusters. Twitter also has a very specific idea of what "140 characters" means:

https://dev.twitter.com/docs/counting-characters

--
Simon Sapin