On Sun, Aug 20, 2006 at 11:34:43AM +0200, Abdelrazak Younes wrote:
> >>There's an added benefit if we go the basic_string way: I think most
> >>compilers (gcc, msvc) now do implicit sharing on strings so passing
> >>parameters won't be as costly as with std::vector().
> >
> >I doubt any recent co
On Sun, Aug 20, 2006 at 03:06:25PM +0200, Lars Gullik Bjønnes wrote:
> Andre Poenitz <[EMAIL PROTECTED]> writes:
>
> | On Wed, Aug 16, 2006 at 10:34:52PM +0200, Lars Gullik Bjønnes wrote:
> | > | I know it's a bit late to voice my opinion but I think it should have been:
> | >
> | > Yes. I hav
Lars Gullik Bjønnes wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Andre Poenitz wrote:
| > On Wed, Aug 16, 2006 at 05:05:53PM +0200, Abdelrazak Younes wrote:
| >> Abdelrazak Younes wrote:
| Here comes the next bit: I discovered that the result of
|
| std::vector ucs4_to_u
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Andre Poenitz wrote:
| > On Wed, Aug 16, 2006 at 05:05:53PM +0200, Abdelrazak Younes wrote:
| >> Abdelrazak Younes wrote:
| Here comes the next bit: I discovered that the result of
|
| std::vector<char> ucs4_to_utf8(boost::uint32_t c)
|
On Fri, Aug 18, 2006 at 01:45:23AM +0200, Andre Poenitz wrote:
> On Wed, Aug 16, 2006 at 10:34:52PM +0200, Lars Gullik Bjønnes wrote:
> > | I know it's a bit late to voice my opinion but I think it should have been:
> >
> > Yes. I have been calling on help on the unicode branch for months...
Andre Poenitz <[EMAIL PROTECTED]> writes:
| On Wed, Aug 16, 2006 at 10:34:52PM +0200, Lars Gullik Bjønnes wrote:
| > | I know it's a bit late to voice my opinion but I think it should have been:
| >
| > Yes. I have been calling on help on the unicode branch for months...
|
| Could we take a not
Andre Poenitz wrote:
On Wed, Aug 16, 2006 at 05:05:53PM +0200, Abdelrazak Younes wrote:
Abdelrazak Younes wrote:
Here comes the next bit: I discovered that the result of
std::vector<char> ucs4_to_utf8(boost::uint32_t c)
was never used as a vector. I changed it to std::string, and that
simplifies
On Wed, Aug 16, 2006 at 05:05:53PM +0200, Abdelrazak Younes wrote:
> Abdelrazak Younes wrote:
> >>Here comes the next bit: I discovered that the result of
> >>
> >>std::vector<char> ucs4_to_utf8(boost::uint32_t c)
> >>
> >>was never used as a vector. I changed it to std::string, and that
> >>simplifies
On Wed, Aug 16, 2006 at 10:34:52PM +0200, Lars Gullik Bjønnes wrote:
> | I know it's a bit late to voice my opinion but I think it should have been:
>
> Yes. I have been calling on help on the unicode branch for months...
Could we take a note for the future that working in branches does not
reall
Lars Gullik Bjønnes wrote:
Helge Hafting <[EMAIL PROTECTED]> writes:
| Angus Leeming wrote:
| > UTF-8 is a multi-byte encoding. It's useful for output to file
| > because the data are stored as characters (bytes). So, much of a
| > UTF-8 encoded file will be human readable; only the multi-byte
|
Helge Hafting <[EMAIL PROTECTED]> writes:
| Angus Leeming wrote:
| > UTF-8 is a multi-byte encoding. It's useful for output to file
| > because the data are stored as characters (bytes). So, much of a
| > UTF-8 encoded file will be human readable; only the multi-byte
| > sequences will not.
| >
|
Angus Leeming wrote:
UTF-8 is a multi-byte encoding. It's useful for output to file because the
data are stored as characters (bytes). So, much of a UTF-8 encoded file will
be human readable; only the multi-byte sequences will not.
Actually, the multibyte sequences are human readable
too, if
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
Lars> If you give a nice name to ascii_guill I'd think that the code
Lars> could be even clearer than it is now.
Lars> I am not sure that you can have your cake and eat it at all
Lars> times... unicode is more cumbersome to work with.
On Wednesday, 16 August 2006 23:40, Lars Gullik Bjønnes wrote:
> Georg Baum <[EMAIL PROTECTED]> writes:
>
> | Does this mean we have now agreed on using docstring always when dealing
> | with multibyte strings?
>
> when not close to the converter itself, imho yes.
OK. I am currently working o
Georg Baum <[EMAIL PROTECTED]> writes:
| On Wednesday, 16 August 2006 23:12, Lars Gullik Bjønnes wrote:
| > Change the code so that those conversions are not needed, don't change
| > the conversions.
|
| Does this mean we have now agreed on using docstring always when dealing
| with multibyte st
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes:
| Lars> if (disp == ascii_guill) ...
|
| This I do not like much. This hides the meaning of the code,
| especially since I have a
| else if (disp == ">>")
| two lines after. A piece of trivial code is going to be changed to
| something not so nic
On Wednesday, 16 August 2006 23:12, Lars Gullik Bjønnes wrote:
> Change the code so that those conversions are not needed, don't change
> the conversions.
Does this mean we have now agreed on using docstring always when dealing
with multibyte strings?
I have had a closer look, and noticed that we
Lars Gullik Bjønnes wrote:
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes:
| > "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
|
| Lars> So imho if docstring should change to anything as of now it is a
| Lars> std::vector
|
| Is it a threat? ;)
Yes. Stop bickering about the b
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Jean-Marc Lasgouttes wrote:
| >> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
| > Lars> So imho if docstring should change to anything as of now it is a
| > Lars> std::vector
| > Is it a threat? ;)
|
| No, just Lars reinventing
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
Lars> Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes:
Lars> | Here is a different (related) question. In the insetquote code there
Lars> | is this
Lars> | if (disp == "<<")
Lars> | code. How should I change it if disp is a docstring so tha
Lars Gullik Bjønnes wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Lars Gullik Bjønnes wrote:
| > Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > | > ucs-2 with qt.
| > | | so no utf8 here.
| > | | > utf-8/ucs-2 with pango.
| > | | So utf8 is not necessary there also as pango deals per
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes:
| > "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
|
| Lars> So imho if docstring should change to anything as of now it is a
| Lars> std::vector
|
| Is it a threat? ;)
Yes. Stop bickering about the basic_string already!
--
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Lars Gullik Bjønnes wrote:
| > Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > | > ucs-2 with qt.
| > | | so no utf8 here.
| > | | > utf-8/ucs-2 with pango.
| > | | So utf8 is not necessary there also as pango deals perfectly with ucs2
| > | (a
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes:
| Here is a different (related) question. In the insetquote code there
| is this
| if (disp == "<<")
| code. How should I change it if disp is a docstring so that it still
| fits on one line. How do I change a C string to something that
| compares
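One way to keep such a comparison on a single line is a small helper that widens an ASCII C string to the wide string type. The sketch below is only an illustration, not the project's code: it assumes a C++11 compiler, uses std::u32string as a stand-in for docstring, and the name from_ascii is hypothetical.

```cpp
#include <cassert>
#include <string>

// Assumption: std::u32string stands in for the ucs-4 docstring type.
typedef std::u32string docstring;

// Hypothetical helper: widen a pure-ASCII C string to a docstring.
// Every ASCII byte maps to the code point with the same value.
docstring from_ascii(char const * s)
{
    docstring out;
    for (; *s; ++s) {
        // Anything outside 7-bit ASCII would need a real decoder.
        assert(static_cast<unsigned char>(*s) < 0x80);
        out += static_cast<char32_t>(*s);
    }
    return out;
}
```

With such a helper the test stays on one line, e.g. `if (disp == from_ascii("<<"))`. The cost is one temporary string per comparison; a named constant would avoid that.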
Lars Gullik Bjønnes wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > ucs-2 with qt.
|
| so no utf8 here.
|
| > utf-8/ucs-2 with pango.
|
| So utf8 is not necessary there also as pango deals perfectly with ucs2
| (and 4?)
What is your point really?
My point is that you don't really
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes:
| I'd like to find a syntax such that the translation of our code to
| unicode does not transform every line into 3 lines (like the examples
| where we push \0 explicitly).
That is easy: just make everything use docstring and char_type.
Why do yo
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
| That said, I'd appreciate to work on strings instead of vectors for
| utf-8...
Lars> Huh? Where do you plan to work on utf-8 at all?
OK, I got it wrong.
Here is a different (related) question. In the insetquote code there
is thi
Jean-Marc Lasgouttes wrote:
"Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
Lars> So imho if docstring should change to anything as of now it is a
Lars> std::vector
Is it a threat? ;)
No, just Lars reinventing basic_string with vector ;-)
If you look at the STL code basic_string i
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes:
| > "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
|
| Lars> However our internal interface is in message.C and from
| Lars> Message::get we can perfectly well output a docstring instead of
| Lars> a string. (and thus ucs-4)
|
| Yes
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > ucs-2 with qt.
|
| so no utf8 here.
|
| > utf-8/ucs-2 with pango.
|
| So utf8 is not necessary there also as pango deals perfectly with ucs2
| (and 4?)
What is your point really?
| > What aspell can use I have no
| > idea about. (it can use uc
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
Lars> So imho if docstring should change to anything as of now it is a
Lars> std::vector
Is it a threat? ;)
JMarc
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
Lars> | why do we use unsigned short instead of boost::uint16_t here. I know
Lars> | they are the same, but wouldn't it be clearer?
Lars> Perhaps. (but of course we don't have a basic_string<unsigned short> anyway...)
I thought about the vec
Lars Gullik Bjønnes wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Lars Gullik Bjønnes wrote:
| > Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > | That's exactly what this means but, sure, that is just my opinion. You
| > | obviously are in love with your vector solution ;-)
| > I
> "Lars" == Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
Lars> However our internal interface is in message.C and from
Lars> Message::get we can perfectly well output a docstring instead of
Lars> a string. (and thus ucs-4)
Yes, and if it is costly, we'll change it later on.
But if our po
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes:
| > "Abdelrazak" == Abdelrazak Younes <[EMAIL PROTECTED]> writes:
|
| Abdelrazak> I know it's a bit late to voice my opinion but I think it
| Abdelrazak> should have been:
|
| Off-topic questions:
|
| Abdelrazak> typedef std::basic_string ucs
Lars Gullik Bjønnes wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Lars Gullik Bjønnes wrote:
| > Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > | > Both ucs2 and ucs4 use a fixed number of bytes for one character (2
| > | > and 4, respectively, surprise, surprise!). The problem i
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Lars Gullik Bjønnes wrote:
| > Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > | That's exactly what this means but, sure, that is just my opinion. You
| > | obviously are in love with your vector solution ;-)
| > If the only semantics are "bun
> "Abdelrazak" == Abdelrazak Younes <[EMAIL PROTECTED]> writes:
Abdelrazak> I know it's a bit late to voice my opinion but I think it
Abdelrazak> should have been:
Off-topic questions:
Abdelrazak> typedef std::basic_string<unsigned short> ucs2_string;
why do we use unsigned short instead of boost::uint16_t
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes:
| > "Abdelrazak" == Abdelrazak Younes <[EMAIL PROTECTED]> writes:
|
| >> Or communicating with other libs' APIs.
|
| Abdelrazak> Which one?
|
| gettext at least.
Gettext msgids should be ASCII. (And we require them to be ASCII..)
Gettext o
Lars Gullik Bjønnes wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| That's exactly what this means but, sure, that is just my opinion. You
| obviously are in love with your vector solution ;-)
If the only semantics are "bunch of bytes" etc. then a vector is
correct.
Sure but in this ca
> "Abdelrazak" == Abdelrazak Younes <[EMAIL PROTECTED]> writes:
>> Or communicating with other libs' APIs.
Abdelrazak> Which one?
gettext at least.
JMarc
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Lars Gullik Bjønnes wrote:
| > Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > | > Both ucs2 and ucs4 use a fixed number of bytes for one character (2
| > | > and 4, respectively, surprise, surprise!). The problem is a
| > | > variable-byte enc
Lars Gullik Bjønnes wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > Both ucs2 and ucs4 use a fixed number of bytes for one character (2
| > and 4, respectively, surprise, surprise!). The problem is a
| > variable-byte encoding such as utf8.
|
| Yes I understood that far, sorry for "qui
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| That's exactly what this means but, sure, that is just my opinion. You
| obviously are in love with your vector solution ;-)
If the only semantics are "bunch of bytes" etc. then a vector is
correct.
For passing ucs-4 strings around we already have
Lars Gullik Bjønnes wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > UTF-8 is a multi-byte encoding. It's useful for output to file
| > because the data are stored as characters (bytes). So, much of a
| > UTF-8 encoded file will be human readable; only the multi-byte
| > sequences will n
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > UTF-8 is a multi-byte encoding. It's useful for output to file
| > because the data are stored as characters (bytes). So, much of a
| > UTF-8 encoded file will be human readable; only the multi-byte
| > sequences will not.
| > Storing UTF-8 encoded
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| > Both ucs2 and ucs4 use a fixed number of bytes for one character (2
| > and 4, respectively, surprise, surprise!). The problem is a
| > variable-byte encoding such as utf8.
|
| Yes I understood that far, sorry for "quiproquo". IMHO, the only code
Angus Leeming wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
Hum... I am not sure I follow everything but let me summarize what I
understand from current code. The std::vectors I am talking about are:
* vector: could be replaced by std::basic_string
* vector: that is ucs2 right? That could b
Georg Baum wrote:
On Wednesday, 16 August 2006 18:41, Abdelrazak Younes wrote:
Hum... I am not sure I follow everything but let me summarize what I
understand from current code. The std::vectors I am talking about are:
* vector: could be replaced by std::basic_string
* vector: that is ucs2 right?
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Georg Baum wrote:
| > On Wednesday, 16 August 2006 18:12, Abdelrazak Younes wrote:
| >> Lars Gullik Bjønnes wrote:
| >>
| >>> string.length() will be lying to you when you store utf-8 in it.
| >> Why is that? Because of some trailing \0?
| > No. ut
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
> Hum... I am not sure I follow everything but let me summarize what I
> understand from current code. The std::vectors I am talking about are:
>
> * vector: could be replaced by std::basic_string
> * vector: that is ucs2 right? That could be replaced by
On Wednesday, 16 August 2006 18:41, Abdelrazak Younes wrote:
> Hum... I am not sure I follow everything but let me summarize what I
> understand from current code. The std::vectors I am talking about are:
>
> * vector: could be replaced by std::basic_string
> * vector: that is ucs2 right? That could
Georg Baum wrote:
On Wednesday, 16 August 2006 18:12, Abdelrazak Younes wrote:
Lars Gullik Bjønnes wrote:
string.length() will be lying to you when you store utf-8 in it.
Why is that? Because of some trailing \0?
No. utf8 is a multibyte encoding: Some characters use one byte, some two
and
On Wednesday, 16 August 2006 18:01, Lars Gullik Bjønnes wrote:
> Georg Baum <[EMAIL PROTECTED]> writes:
> | Here comes the next bit: I discovered that the result of
> |
> | std::vector<char> ucs4_to_utf8(boost::uint32_t c)
> |
> | was never used as a vector. I changed it to std::string, and that
simp
On Wednesday, 16 August 2006 18:12, Abdelrazak Younes wrote:
> Lars Gullik Bjønnes wrote:
>
> > string.length() will be lying to you when you store utf-8 in it.
>
> Why is that? Because of some trailing \0?
No. utf8 is a multibyte encoding: Some characters use one byte, some two
and some even m
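The point about length() can be checked with a few lines of code. This is a minimal sketch, assuming the std::string holds well-formed UTF-8; the helper name utf8_code_points is made up for the illustration and is not part of the code under discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count the code points in a std::string that is
// assumed to hold well-formed UTF-8. Continuation bytes have the bit
// pattern 10xxxxxx, so every byte that is not one starts a character.
std::size_t utf8_code_points(std::string const & s)
{
    std::size_t n = 0;
    for (unsigned char ch : s)
        if ((ch & 0xC0) != 0x80)
            ++n;
    // There can never be more characters than bytes.
    assert(n <= s.length());
    return n;
}
```

For the two-byte sequence "\xC3\xB8" (the letter ø), length() reports 2 while the helper reports 1. That gap is what "lying" refers to: length() counts bytes, not characters.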
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Abdelrazak Younes wrote:
| >> Here comes the next bit: I discovered that the result of
| >>
| >> std::vector<char> ucs4_to_utf8(boost::uint32_t c)
| >>
| >> was never used as a vector. I changed it to std::string, and that
| >> simplifies
| >> the code. In
Lars Gullik Bjønnes wrote:
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Georg Baum wrote:
| > Lars Gullik Bjønnes wrote:
| >
| >> Conversion between the different unicode encodings are pretty cheap.
| > Yes, but what I am more concerned about are lots of ucs4_to_utf8 or vice
| > versa in
Georg Baum <[EMAIL PROTECTED]> writes:
| Lars Gullik Bjønnes wrote:
|
| > Conversion between the different unicode encodings are pretty cheap.
|
| Yes, but what I am more concerned about are lots of ucs4_to_utf8 or vice
| versa in the code. That just makes it a bit less readable.
|
| > | Since
Abdelrazak Younes <[EMAIL PROTECTED]> writes:
| Georg Baum wrote:
| > Lars Gullik Bjønnes wrote:
| >
| >> Conversion between the different unicode encodings are pretty cheap.
| > Yes, but what I am more concerned about are lots of ucs4_to_utf8 or vice
| > versa in the code. That just makes it
Abdelrazak Younes wrote:
Here comes the next bit: I discovered that the result of
std::vector<char> ucs4_to_utf8(boost::uint32_t c)
was never used as a vector. I changed it to std::string, and that
simplifies
the code. In particular it removes manual fiddling with the terminating
'\0', which we sho
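To illustrate why a std::string return type removes the manual '\0' handling, here is a rough sketch of a single-character encoder. It is not the patch from this thread: the function name follows the signature quoted above, but the body is an assumption, std::uint32_t is used in place of boost::uint32_t, and surrogate code points are not rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Sketch: encode one ucs-4 code point as UTF-8. The std::string tracks
// its own length, so no terminating '\0' has to be pushed by hand.
std::string ucs4_to_utf8(std::uint32_t c)
{
    assert(c <= 0x10FFFF); // outside the Unicode range is a caller error
    std::string out;
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return out;
}
```

Callers can append the result directly to another std::string or pass c_str() to a C API, which supplies the terminator on demand.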
Georg Baum wrote:
Lars Gullik Bjønnes wrote:
Conversion between the different unicode encodings are pretty cheap.
Yes, but what I am more concerned about are lots of ucs4_to_utf8 or vice
versa in the code. That just makes it a bit less readable.
| Since the po
| files will eventually be in
Lars Gullik Bjønnes wrote:
> Conversion between the different unicode encodings are pretty cheap.
Yes, but what I am more concerned about are lots of ucs4_to_utf8 or vice
versa in the code. That just makes it a bit less readable.
> | Since the po
> | files will eventually be in utf8 it seems nat
Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
> Actually I guess that using a mix of utf-8 and ucs-4 is the cop-out to
> have everything work as soon as possible.
> A full change to ucs-4 will require more code changes than a mix.
Ok, you're being practical. That I understand even if I'm uncomf
Angus Leeming <[EMAIL PROTECTED]> writes:
| Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
| > Georg Baum <[EMAIL PROTECTED]> writes:
|
| > | > Should a call to gettext (_()) give us utf8 or ucs4?, so far I am
| > | > inclined to go for utf8.
| > | If we only knew which variant results in less
Lars Gullik Bjønnes <[EMAIL PROTECTED]> writes:
> Georg Baum <[EMAIL PROTECTED]> writes:
> | > Should a call to gettext (_()) give us utf8 or ucs4?, so far I am
> | > inclined to go for utf8.
> | If we only knew which variant results in less conversions.
> Conversion between the different unicode
Georg Baum <[EMAIL PROTECTED]> writes:
| > Should a call to gettext (_()) give us utf8 or ucs4?, so far I am
| > inclined to go for utf8.
|
| If we only knew which variant results in less conversions.
Conversion between the different unicode encodings are pretty cheap.
| Since the po
| files wi
Jean-Marc Lasgouttes <[EMAIL PROTECTED]> writes:
| > "Georg" == Georg Baum <[EMAIL PROTECTED]> writes:
|
| | Or should we not change the type, but use utf8 as encoding instead?
| | I believe the former is safer.
|
| >> This is one of the things I am thinking about... esp. in rel. to
| >> get
> "Georg" == Georg Baum <[EMAIL PROTECTED]> writes:
| Or should we not change the type, but use utf8 as encoding instead?
| I believe the former is safer.
>> This is one of the things I am thinking about... esp. in rel. to
>> gettext and l10n.
In general, we should declare what code should d
Lars Gullik Bjønnes wrote:
> Georg Baum <[EMAIL PROTECTED]> writes:
>
> So far I have only created what I needed. But even if we add more
> convenience functions we should be careful when adding them, we do not
> want too many imho.
Yes. We'll see what is useful as the conversion goes on.
> | O
Georg Baum <[EMAIL PROTECTED]> writes:
| This small patch makes most of plain text readable again (in utf8).
|
| Questions:
|
| 1) Is it on purpose that the functions in unicode.h convert only between
| std::vectors of characters and C strings, but not std::string/docstring? I
| think we should
70 matches