Re: String literals, ASCII vs UTF-8

2012-03-13 Thread Lubos Lunak
On Tuesday 13 of March 2012, Olivier Hallot wrote: > > Are we going to remove RTL_CONSTASCII_USTRINGPARAM from all the code? > There are 33599(*) lines in the whole code... That is the plan, but I still want to give it a little more time in case somebody runs into a problem. I just massively broke th

Re: String literals, ASCII vs UTF-8

2012-03-13 Thread Olivier Hallot
Hi Lubos On 05-03-2012 09:29, Lubos Lunak wrote: > On Tuesday 28 of February 2012, Lubos Lunak wrote: >> I'd like to revisit the choice of considering string literals to be either >> ASCII or UTF-8, as discussed in the thread about removing >> RTL

Re: Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-03-08 Thread Stephan Bergmann
On 03/05/2012 04:29 PM, Michael Meeks wrote: On Fri, 2012-03-02 at 17:16 +, Caolán McNamara wrote: Yeah, back the O[U]String contents with direct new/delete calls in a real implementation body instead of current thin header-only wrapper around the C-API which backs onto rtl_allocateMemory/rt

Re: Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-03-05 Thread Michael Meeks
On Mon, 2012-03-05 at 16:34 +, Caolán McNamara wrote: > In the general case presumably you then need to run around sticking > throw()/nothrow onto loads of things in order to tell the compiler that > stuff isn't going to throw exceptions, and/or disabling exceptions to > get the compiler to do

Re: Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-03-05 Thread Caolán McNamara
On Mon, 2012-03-05 at 15:29 +, Michael Meeks wrote: > On Fri, 2012-03-02 at 17:16 +, Caolán McNamara wrote: > At the most banal level, I suspect that: > > struct Empty { int unused; }; > Empty *p = new Empty(); > delete p; > > can't legitimately be optimised away i

Re: Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-03-05 Thread Michael Meeks
On Fri, 2012-03-02 at 17:16 +, Caolán McNamara wrote: > Yeah, back the O[U]String contents with direct new/delete calls in a real > implementation body instead of current thin header-only wrapper around > the C-API which backs onto rtl_allocateMemory/rtl_freeMemory. I'm really rather c

Re: String literals, ASCII vs UTF-8

2012-03-05 Thread Lubos Lunak
On Tuesday 28 of February 2012, Lubos Lunak wrote: > I'd like to revisit the choice of considering string literals to be either > ASCII or UTF-8, as discussed in the thread about removing > RTL_CONSTASCII_USTRINGPARAM. While I was ambivalent about it, I now think > we should go with ASCII only, un

Re: Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-03-02 Thread Caolán McNamara
On Fri, 2012-03-02 at 16:00 +0100, Stephan Bergmann wrote: > And when/if we replace the sal C API with a C++ one in LO 4 (where a > memory allocation function, if we would still need a home-grown one > anyway, would naturally throw bad_alloc) Yeah, back the O[U]String contents with direct new/del

Re: Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-03-02 Thread Stephan Bergmann
On 03/01/2012 03:42 PM, Lubos Lunak wrote: On Wednesday 29 of February 2012, Caolán McNamara wrote: But ok, it's too much to just abort in every case. I think however that whether to abort or try to recover does not actually depend on the class where the problem occurs, but on where the class i

Re: Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-03-01 Thread Lubos Lunak
On Wednesday 29 of February 2012, Caolán McNamara wrote: > On Wed, 2012-02-29 at 17:11 +0100, Lubos Lunak wrote: > > Do we actually have code that tries to gracefully handle running out of > > memory? Because if not, and I doubt we realistically do[*] > > It's not O[U]String related, but FWIW vcl/un

Re: Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-02-29 Thread Caolán McNamara
On Wed, 2012-02-29 at 17:11 +0100, Lubos Lunak wrote: > Do we actually have code that tries to gracefully handle running out of > memory? Because if not, and I doubt we realistically do[*] It's not O[U]String related, but FWIW vcl/unx/source/gdi/salbmp.cxx has some std::bad_alloc catches from comm

Re: Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-02-29 Thread Lubos Lunak
On Wednesday 29 of February 2012, Stephan Bergmann wrote: > On 02/29/2012 03:28 PM, Lubos Lunak wrote: > > On Wednesday 29 of February 2012, Stephan Bergmann wrote: > >> However, there are also situations where bad input (malicious or > >> otherwise) would cause an application to request excessive

Re: Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-02-29 Thread Stephan Bergmann
On 02/29/2012 03:28 PM, Lubos Lunak wrote: On Wednesday 29 of February 2012, Stephan Bergmann wrote: However, there are also situations where bad input (malicious or otherwise) would cause an application to request excessive amounts of memory to do a single task (e.g., open a document), and at l

Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-02-29 Thread Lubos Lunak
First, so that we have some numbers, libmswordlo.so debug build, but compiled with -O2 -g0 : 4627576 - without any extra checks in OUString 4625248 - with abort() 4642216 - with std::bad_alloc Yes, adding abort() there somehow makes it a tiny bit smaller, search me why. For std::bad_alloc

Re: String literals, ASCII vs UTF-8

2012-02-29 Thread Noel Grandin
I would expect that filters should be validating their inputs. By the time that we get a bad_alloc, it's too late to recover properly. Unless we're talking about someday running filters in a separate process, and then validating the document they generate, in which case the main process would re

Re: String literals, ASCII vs UTF-8

2012-02-29 Thread Eike Rathke
Hi Stephan, On Wednesday, 2012-02-29 08:42:35 +0100, Stephan Bergmann wrote: > However, there are also situations where bad input (malicious or > otherwise) would cause an application to request excessive amounts > of memory to do a single task (e.g., open a document), and at least > in theory th

Re: String literals, ASCII vs UTF-8

2012-02-29 Thread Stephan Bergmann
On 02/29/2012 10:57 AM, Michael Meeks wrote: Having said all this - I think we can agree that if we are calling this new 'createFromAscii_WithLength' method - which is (currently) only called during these magic constructors for compile time constants, that the chance of having a 4Gb compi

Re: String literals, ASCII vs UTF-8

2012-02-29 Thread Stephan Bergmann
On 02/28/2012 02:39 PM, Stephan Bergmann wrote: On 02/28/2012 12:30 PM, Lubos Lunak wrote: PS: Any idea why ' OUString foo() { return "foo";} ' does not work, even though the ctor is not explicit? I can't recall a reason why a return value would need to be different from the other cases. Looks

Re: String literals, ASCII vs UTF-8

2012-02-29 Thread Michael Meeks
On Wed, 2012-02-29 at 08:42 +0100, Stephan Bergmann wrote: > But how bad is that, anyway? A little experiment shows that the > compiler will happily outline those inline functions detecting for > bad_alloc, creating one instance of them per library. Heh ;-) so - I'd love to see the siz

Re: String literals, ASCII vs UTF-8

2012-02-29 Thread Stephan Bergmann
On 02/29/2012 08:57 AM, Noel Grandin wrote: Surely the cheapest call-site check for the result of malloc() is just to attempt a fetch from the memory location? That will trigger SIGSEGV, but at least you'll get a stack-trace out of it. But, as I argue, propagating to the call site is generally

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Noel Grandin
Surely the cheapest call-site check for the result of malloc() is just to attempt a fetch from the memory location? That will trigger SIGSEGV, but at least you'll get a stack-trace out of it. On 2012-02-29 09:42, Stephan Bergmann wrote: On 02/28/2012 02:48 PM, Lubos Lunak wrote: Speaking of t

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Stephan Bergmann
On 02/28/2012 02:48 PM, Lubos Lunak wrote: Speaking of the size at the call-site, a good part is the code trying to throw std::bad_alloc in case the allocation fails. That actually looks rather useless to me, for several reasons: - not all OUString methods check for this anyway - rtl_uString*

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Caolán McNamara
On Tue, 2012-02-28 at 14:48 +0100, Lubos Lunak wrote: > - with today's systems (overcommitting, etc.) it is rather pointless to guard > against allocation failures Another scenario of possibly more usefulness than actually "running out of memory" is being directed to allocate a lunatic string siz

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Michael Meeks
On Tue, 2012-02-28 at 15:07 -0300, Olivier Hallot wrote: > On 28-02-2012 14:40, Eike Rathke wrote: > > People also complained about Writer paragraphs being limited to 64k > > characters ... that's ~15 pages of eye-tiring continuous lines, but ... > > https://issues.apache.org/ooo/show_bug.cgi?

Checking string allocations (was Re: String literals, ASCII vs UTF-8)

2012-02-28 Thread Lubos Lunak
On Tuesday 28 of February 2012, Eike Rathke wrote: > On Tuesday, 2012-02-28 16:37:38 +, Michael Meeks wrote: > > Of course, on the very rare > > occasions that we do a huge allocation for a string - perhaps we store > > an entire VBA module in a single string or something silly ;-) > > As soon

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Michael Meeks
On Tue, 2012-02-28 at 18:40 +0100, Eike Rathke wrote: > As soon as Calc will use OUString instead of String for cell content and > formula results exactly that will happen.. :-) > I've seen Calc abused as a front end for some sort of web CMS, holding > entire HTML "template" fragments in

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Olivier Hallot
Hi gentlemen On 28-02-2012 14:40, Eike Rathke wrote: > People also complained about Writer paragraphs being limited to 64k > characters ... that's ~15 pages of eye-tiring continuous lines, but ... https://issues.apache.org/ooo/show_bug.cgi?id=1717

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Eike Rathke
Hi Michael, On Tuesday, 2012-02-28 16:37:38 +, Michael Meeks wrote: > Of course, on the very rare > occasions that we do a huge allocation for a string - perhaps we store > an entire VBA module in a single string or something silly ;-) As soon as Calc will use OUString instead of String for

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Michael Meeks
On Tue, 2012-02-28 at 14:48 +0100, Lubos Lunak wrote: > Speaking of the size at the call-site, a good part is the code trying to > throw std::bad_alloc in case the allocation fails. That actually looks rather > useless to me, for several reasons: > > - not all OUString methods check for this a

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Lubos Lunak
On Tuesday 28 of February 2012, Michael Meeks wrote: > I would really prefer to use a new: > > rtl_uString_newFromAsciiL( &pNew, literal, N - 1 ); Attached. > method - which should shrink the call-site, and allow for a rather > better implementation vs. Speaking of the siz

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Stephan Bergmann
On 02/28/2012 12:30 PM, Lubos Lunak wrote: The reason for this is that I have patches adding more functions taking string literals and there it makes much more sense to require only ASCII. For example OUString::operator== can be simply a call to OUString::equalsAsciiL() for ASCII, but for UTF-8
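A rough illustration of why an ASCII-only literal keeps comparison simple: each ASCII byte corresponds to exactly one UTF-16 code unit, so lengths can be compared up front and the contents checked unit by unit, whereas a UTF-8 literal has no such one-to-one mapping. The sketch below is not the real OUString code; equalsAsciiLiteral is a hypothetical helper and std::u16string stands in for the string class.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, only loosely modelled on the idea behind
// OUString::equalsAsciiL(); std::u16string stands in for OUString.
template <std::size_t N>
bool equalsAsciiLiteral(const std::u16string& s, const char (&literal)[N]) {
    // N includes the trailing '\0', so the literal holds N - 1 characters.
    if (s.size() != N - 1) {
        return false;
    }
    for (std::size_t i = 0; i + 1 < N; ++i) {
        // ASCII assumption: one byte widens to exactly one UTF-16 code unit.
        const char16_t unit =
            static_cast<char16_t>(static_cast<unsigned char>(literal[i]));
        if (s[i] != unit) {
            return false;
        }
    }
    return true;
}
```

With UTF-8 input the early length check would no longer be valid, since a multi-byte sequence occupies several bytes but at most two UTF-16 units.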

Re: String literals, ASCII vs UTF-8

2012-02-28 Thread Michael Meeks
On Tue, 2012-02-28 at 12:30 +0100, Lubos Lunak wrote: > RTL_CONSTASCII_USTRINGPARAM. While I was ambivalent about it, I now think we > should go with ASCII only, unless explicitly marked otherwise. :-) your arguments make sense to me at least. OUString( const char (&literal)[ N ] )
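For readers unfamiliar with the constructor signature quoted above, here is a minimal sketch of how a parameter of type const char (&)[N] lets the compiler deduce the literal's size. LiteralString is a hypothetical stand-in for OUString, and the ASCII-only widening loop is an assumption made for illustration, not the actual sal implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for OUString; not the real sal/rtl class.
class LiteralString {
public:
    // Binds to char arrays of known size, such as string literals.
    // N is deduced at compile time and counts the trailing '\0',
    // so the text itself has N - 1 characters.
    template <std::size_t N>
    LiteralString(const char (&literal)[N]) {
        data_.reserve(N - 1);
        for (std::size_t i = 0; i + 1 < N; ++i) {
            // ASCII-only assumption: each byte becomes one UTF-16 unit.
            data_.push_back(
                static_cast<char16_t>(static_cast<unsigned char>(literal[i])));
        }
    }

    std::size_t length() const { return data_.size(); }
    const std::u16string& str() const { return data_; }

private:
    std::u16string data_;
};
```

Because the length is known from the type, no strlen() call is needed at the call site, which fits the N - 1 argument discussed elsewhere in the thread.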