Re: [Kicad-developers] 6.0 string proposal

Jeff Young Fri, 03 May 2019 08:18:59 -0700

I don’t believe that’s the case.  Neither of the two crashes that I tracked 
down involved direct index access (or any change to the string).  One was 
calling foo.IsEmpty(), and the other foo.Length().  Both use const iterators 
under the hood.


When wxUSE_UNICODE_UTF8 is off, wxWidgets gets around parsing the string by 
just not doing it:

Remarks
Note that while the behaviour of wxString 
<https://docs.wxwidgets.org/trunk/classwx_string.html> when 
wxUSE_UNICODE_WCHAR==1 resembles UCS-2 encoding, it's not completely correct to 
refer to wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> as 
UCS-2 encoded since you can encode code points outside the BMP in a wxString 
<https://docs.wxwidgets.org/trunk/classwx_string.html> as two code units (i.e. 
as a surrogate pair; as already mentioned however wxString 
<https://docs.wxwidgets.org/trunk/classwx_string.html> will "see" them as two 
different code points)


> On 3 May 2019, at 16:06, John Beard <john.j.be...@gmail.com> wrote:
> 
> Hi Jeff,
> 
> I think it is the index access operator that performs this caching, to allow 
> you to access the n'th code point any number of times while only iterating 
> the string only once.
> 
> However, you can still use the iterator access safely. It is only index based 
> access that is cached and thread-unsafe.
> 
> This is what the wxString documention recommends. Furthermore, in any Unicode 
> string, regardless of encoding (8, 16, 32), index access is almost entirely 
> useless anyway, as code units/points are only indirectly related to glyphs 
> and/or perceived characters anyway. If you need to parse a Unicode string, 
> you must iterate from the start. There is no way around it.
> 
> If we're crashing due to cross thread access by index, the bug is probably 
> that we access the string by index at all. If this was accessed by iterator, 
> cross thread, and the string is not changed, it's fine. If the string is 
> changed in another thread, cached iterators are invalid (same as if you 
> change an C++ container in a single thread. The standard tells you what 
> iterators are invalidated for each operation on a container).
> 
> I may have got the wrong end of the wxStick here (I can't check it for myself 
> right now), but as far as I can tell, this is fixable by just never caching 
> indices, as if we were looking at a C-style char array, and using iterators 
> instead.
> 
> We should probably also turn off the unsafe string conversions by defining 
> wxNO_UNSAFE_WXSTRING_CONV, if it is not already define.
> 
> Cheers,
> 
> John
> 
> On 3 May 2019 16:35:30 CEST, Jeff Young <j...@rokeby.ie> wrote:
> Yes, we know exactly why it crashes: in order to speed up iterator access 
> each iterator keeps a pointer into the last location accessed (so that i-1 
> and i+1 can be fast).  These pointers are kept in a linked-list.  Adding and 
> removing pointers from this list is not thread-protected.
> 
> Note that wxWidgets will add/remove a pointer even for something seemingly 
> innocuous like an Empty() check.  So doing mutex locks on our side for 
> non-const iterator access is not sufficient.
> 
> The worst part is that since two threads collide on the same string only 
> rarely, we don’t even know how many of these bugs we have.  We’ve fixed 3 or 
> 4 of them (by adding our own mutex checking on any access), but are there 0 
> or 10 more?  Haven’t a clue.
> 
>>> It is between sad and breath taking.
> 
> Indeed.
> 
> Cheers,
> Jeff.
> 
>> On 3 May 2019, at 15:16, Dick Hollenbeck <d...@softplc.com 
>> <mailto:d...@softplc.com>> wrote:
>> 
>> Thanks Jeff.
>> 
>> On 5/3/19 4:22 AM, Jeff Young wrote:
>>> Hi Dick,
>>> 
>>>>> h) What is the list of deficiencies with current string usage?
>>> 
>>> I only have one issue with the current use of wxString, but it’s a big one: 
>>> it crashes
>>> (unpredictably) when used multi-threaded in UTF8 mode.
>> 
>> The fact that it is onely *One* issue is an important data point.
>> 
>> Since you know it is crashing in this class, you must know approximately 
>> where, and under
>> what kind of read/write activity.  Of course, if read activity triggers a 
>> lazy (deferred)
>> transformation, then this distinction can get blurred.  But more information 
>> on source
>> file locations would be very helpful to me.
>> 
>> Another important data point you brought is that the wx library designers 
>> are advising
>> against using wxString for core application.  It will take a couple of hours 
>> to even
>> contemplate that, it is basically staggering to me.  It is between sad and 
>> breath taking.
>> Sounds like they designed themselves into a corner and are now acknowledging 
>> that what
>> they designed is more of an API commitment that they want to disavow than a 
>> real solution.
>> 
>> I can see where that can happen.  Superior designs come from experience.  
>> Experience comes
>> with usage and time, neither of which are always available up front.
>> 
>> 
>> 
>> 
>> 
>>> 
>>> This design document makes for fascinating
>>> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support 
>>> <https://wiki.wxwidgets.org/Development:_UTF-8_Support>.  It appears that 
>>> the
>>> current wxString is at least in part modelled on QtString.
>>> 
>>> There’s also a bunch of interesting info
>>> here: https://docs.wxwidgets.org/trunk/overview_string.html 
>>> <https://docs.wxwidgets.org/trunk/overview_string.html>, which I believe is 
>>> more
>>> up-to-date than the previous link.  In particular, there’s the mention that 
>>> wxString
>>> handles extra-BMP characters transparently when compiled in UTF8 mode 
>>> (currently used by
>>> Kicad), but does NOT when compiled in default mode (in which case the app 
>>> must handle
>>> surrogate pairs).  This of course directly leads to your point (d):
>>> 
>>>>>> d) What does the set of characters that don't fall into UCS2 actually 
>>>>>> look like?  How big
>>>>>> is this set, really?  (UTF16 is bigger than UCS2 and picks up the 
>>>>>> difference.)
>>> 
>>> Do we really need to handle extra-BMP characters?
>>> 
>>> An even more recent version of the second document
>>> (https://docs.wxwidgets.org/trunk/classwx_string.html 
>>> <https://docs.wxwidgets.org/trunk/classwx_string.html>) finally makes an 
>>> oblique reference
>>> to the multi-threading issue by starting with this (rather unhelpful) 
>>> suggestion:
>>> 
>>> Note
>>>    While the use of wxString 
>>> <https://docs.wxwidgets.org/trunk/classwx_string.html 
>>> <https://docs.wxwidgets.org/trunk/classwx_string.html>> is
>>>    unavoidable in wxWidgets program, you are encouraged to use the standard 
>>> string
>>>    classes |std::string| or |std::wstring| in your applications and convert 
>>> them to and
>>>    from wxString <https://docs.wxwidgets.org/trunk/classwx_string.html 
>>> <https://docs.wxwidgets.org/trunk/classwx_string.html>> only when
>>>    interacting with wxWidgets.
>>> 
>>> 
>>> Cheers,
>>> Jeff.
>>> 
>>> 
>>>> On 3 May 2019, at 02:03, Dick Hollenbeck <d...@softplc.com 
>>>> <mailto:d...@softplc.com> <mailto:d...@softplc.com 
>>>> <mailto:d...@softplc.com>>> wrote:
>>>> 
>>>> On 5/2/19 5:32 PM, Dick Hollenbeck wrote:
>>>>> On 4/30/19 4:36 AM, Jeff Young wrote:
>>>>>> We had talked earlier about throwing the wxWidgets UTF8 compile switch 
>>>>>> to get rid of
>>>>>> our wxString re-entrancy problems.  However, I noticed that the 6.0 work 
>>>>>> packages doc
>>>>>> includes an item for std::string-ization of the BOARD.  (While a lot 
>>>>>> more work, this
>>>>>> is a better solution because it also increases our gui-toolkit-choice 
>>>>>> flexibility.)
>>>>>> 
>>>>>> I’d like to propose that we use std::wstring for that.  UTF8 should 
>>>>>> *only* be an
>>>>>> encoding format (similar to s-expr).  It should never be used 
>>>>>> internally. That’s what
>>>>>> unicode wchar_t’s are for.
>>>>>> 
>>>>>> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM 
>>>>>> and LIB_ITEM.
>>>>>>  (Then we can get rid of a bunch of our ugly mutex hacks.)
>>>>> 
>>>>> 
>>>>> I've been looking at this for a few months now.  I think it is so 
>>>>> important, that a
>>>>> sub-committee should be formed, and if that committee takes as long as 4 
>>>>> months to come to
>>>>> a recommendation, this would not be too long.  This issue is simply too 
>>>>> critical.
>>>>> 
>>>>> I would like to volunteer to be on that committee.  For the entire list 
>>>>> to participate in
>>>>> this simply does not make sense to me.  I would welcome the opportunity 
>>>>> to study this with
>>>>> a team of 5-6 players.  More than that probably leads to anxiety.  Then, 
>>>>> given the
>>>>> recommendations, the list would of course have an opportunity to raise 
>>>>> questions and take
>>>>> shots, before a strategy is formulated, and before anything is 
>>>>> implemented.
>>>>> 
>>>>> Again, approximately:
>>>>> 
>>>>>  committee recommendations -> list approval -> strategy formulation -> 
>>>>> implementation
>>>>> 
>>>>> 
>>>>> Up to now I have looked at many libraries and have [way *too* much] 
>>>>> experience in multiple
>>>>> languages on multiple platforms, so I think I can be valuable contributor.
>>>>> 
>>>>> The final work product initially would simply be a list of 
>>>>> recommendations, that quickly
>>>>> transforms to a strategy thereafter.  This is an enormous undertaking, so 
>>>>> I suggest
>>>>> against racing to a solution.  It could look a lot easier than it will 
>>>>> ultimately be, as
>>>>> is typical in software development.  But the return on investment needs 
>>>>> to be near optimal
>>>>> in the end.
>>>>> 
>>>>> Some questions to answer are:
>>>>> 
>>>>> a) How did wxString get to its current state?  Is is merely a 
>>>>> conglomeration of after
>>>>> thought, or is is anywhere near optimal.
>>>>> 
>>>>> b) Why so many forms of it?  Can one form be chosen for all platforms?
>>>>> 
>>>>> c) How does wxString it compare to QtString?
>>>>> 
>>>>> d) What does the set of characters that don't fall into UCS2 actually 
>>>>> look like?  How big
>>>>> is this set, really?  (UTF16 is bigger than UCS2 and picks up the 
>>>>> difference.)
>>>>> 
>>>>> e) For data files, I think UTF8 is fine.  So the change is for RAM 
>>>>> manipulation of
>>>>> strings.  Aren't we talking about a RAM resident string that bridges into 
>>>>> the GUI
>>>>> seamlessly?
>>>>> 
>>>>> f) What does new C++ language support offer?
>>>>> 
>>>>> g) What do C++ language designers suggest?
>>>> 
>>>> h) What is the list of deficiencies with current string usage?
>>>> 
>>>> 
>>>>> 
>>>>> 
>>>>> etc.
>>>>> 
>>>>> But this is best continued in a smaller group, as said.
>>>>> 
>>>>> 
>>>>> The other thing that I bring to this is vast familiarity with KiCad's 
>>>>> internal workings,
>>>>> string use cases, and goals.
>>>>> 
>>>>> Let me know if I can help.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Dick
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Mailing list: https://launchpad.net/~kicad-developers 
>>>>> <https://launchpad.net/~kicad-developers>
>>>>> Post to     : kicad-developers@lists.launchpad.net 
>>>>> <mailto:kicad-developers@lists.launchpad.net>
>>>>> <mailto:kicad-developers@lists.launchpad.net 
>>>>> <mailto:kicad-developers@lists.launchpad.net>>
>>>>> Unsubscribe : https://launchpad.net/~kicad-developers 
>>>>> <https://launchpad.net/~kicad-developers>
>>>>> More help   : https://help.launchpad.net/ListHelp 
>>>>> <https://help.launchpad.net/ListHelp>
>>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Mailing list: https://launchpad.net/~kicad-developers 
>>>> <https://launchpad.net/~kicad-developers>
>>>> Post to     : kicad-developers@lists.launchpad.net 
>>>> <mailto:kicad-developers@lists.launchpad.net>
>>>> <mailto:kicad-developers@lists.launchpad.net 
>>>> <mailto:kicad-developers@lists.launchpad.net>>
>>>> Unsubscribe : https://launchpad.net/~kicad-developers 
>>>> <https://launchpad.net/~kicad-developers>
>>>> More help   : https://help.launchpad.net/ListHelp 
>>>> <https://help.launchpad.net/ListHelp>

_______________________________________________
Mailing list: https://launchpad.net/~kicad-developers
Post to     : kicad-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kicad-developers
More help   : https://help.launchpad.net/ListHelp

Re: [Kicad-developers] 6.0 string proposal

Reply via email to