I don’t believe that’s the case. Neither of the two crashes that I tracked down involved direct index access (or any change to the string). One was calling foo.IsEmpty(), and the other foo.Length(). Both use const iterators under the hood.
When wxUSE_UNICODE_UTF8 is off, wxWidgets gets around parsing the string by just not doing it: Remarks Note that while the behaviour of wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> when wxUSE_UNICODE_WCHAR==1 resembles UCS-2 encoding, it's not completely correct to refer to wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> as UCS-2 encoded since you can encode code points outside the BMP in a wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> as two code units (i.e. as a surrogate pair; as already mentioned however wxString <https://docs.wxwidgets.org/trunk/classwx_string.html> will "see" them as two different code points) > On 3 May 2019, at 16:06, John Beard <john.j.be...@gmail.com> wrote: > > Hi Jeff, > > I think it is the index access operator that performs this caching, to allow > you to access the n'th code point any number of times while only iterating > the string only once. > > However, you can still use the iterator access safely. It is only index based > access that is cached and thread-unsafe. > > This is what the wxString documention recommends. Furthermore, in any Unicode > string, regardless of encoding (8, 16, 32), index access is almost entirely > useless anyway, as code units/points are only indirectly related to glyphs > and/or perceived characters anyway. If you need to parse a Unicode string, > you must iterate from the start. There is no way around it. > > If we're crashing due to cross thread access by index, the bug is probably > that we access the string by index at all. If this was accessed by iterator, > cross thread, and the string is not changed, it's fine. If the string is > changed in another thread, cached iterators are invalid (same as if you > change an C++ container in a single thread. The standard tells you what > iterators are invalidated for each operation on a container). > > I may have got the wrong end of the wxStick here (I can't check it for myself > right now), but as far as I can tell, this is fixable by just never caching > indices, as if we were looking at a C-style char array, and using iterators > instead. > > We should probably also turn off the unsafe string conversions by defining > wxNO_UNSAFE_WXSTRING_CONV, if it is not already define. > > Cheers, > > John > > On 3 May 2019 16:35:30 CEST, Jeff Young <j...@rokeby.ie> wrote: > Yes, we know exactly why it crashes: in order to speed up iterator access > each iterator keeps a pointer into the last location accessed (so that i-1 > and i+1 can be fast). These pointers are kept in a linked-list. Adding and > removing pointers from this list is not thread-protected. > > Note that wxWidgets will add/remove a pointer even for something seemingly > innocuous like an Empty() check. So doing mutex locks on our side for > non-const iterator access is not sufficient. > > The worst part is that since two threads collide on the same string only > rarely, we don’t even know how many of these bugs we have. We’ve fixed 3 or > 4 of them (by adding our own mutex checking on any access), but are there 0 > or 10 more? Haven’t a clue. > >>> It is between sad and breath taking. > > Indeed. > > Cheers, > Jeff. > >> On 3 May 2019, at 15:16, Dick Hollenbeck <d...@softplc.com >> <mailto:d...@softplc.com>> wrote: >> >> Thanks Jeff. >> >> On 5/3/19 4:22 AM, Jeff Young wrote: >>> Hi Dick, >>> >>>>> h) What is the list of deficiencies with current string usage? >>> >>> I only have one issue with the current use of wxString, but it’s a big one: >>> it crashes >>> (unpredictably) when used multi-threaded in UTF8 mode. >> >> The fact that it is onely *One* issue is an important data point. >> >> Since you know it is crashing in this class, you must know approximately >> where, and under >> what kind of read/write activity. Of course, if read activity triggers a >> lazy (deferred) >> transformation, then this distinction can get blurred. But more information >> on source >> file locations would be very helpful to me. >> >> Another important data point you brought is that the wx library designers >> are advising >> against using wxString for core application. It will take a couple of hours >> to even >> contemplate that, it is basically staggering to me. It is between sad and >> breath taking. >> Sounds like they designed themselves into a corner and are now acknowledging >> that what >> they designed is more of an API commitment that they want to disavow than a >> real solution. >> >> I can see where that can happen. Superior designs come from experience. >> Experience comes >> with usage and time, neither of which are always available up front. >> >> >> >> >> >>> >>> This design document makes for fascinating >>> reading: https://wiki.wxwidgets.org/Development:_UTF-8_Support >>> <https://wiki.wxwidgets.org/Development:_UTF-8_Support>. It appears that >>> the >>> current wxString is at least in part modelled on QtString. >>> >>> There’s also a bunch of interesting info >>> here: https://docs.wxwidgets.org/trunk/overview_string.html >>> <https://docs.wxwidgets.org/trunk/overview_string.html>, which I believe is >>> more >>> up-to-date than the previous link. In particular, there’s the mention that >>> wxString >>> handles extra-BMP characters transparently when compiled in UTF8 mode >>> (currently used by >>> Kicad), but does NOT when compiled in default mode (in which case the app >>> must handle >>> surrogate pairs). This of course directly leads to your point (d): >>> >>>>>> d) What does the set of characters that don't fall into UCS2 actually >>>>>> look like? How big >>>>>> is this set, really? (UTF16 is bigger than UCS2 and picks up the >>>>>> difference.) >>> >>> Do we really need to handle extra-BMP characters? >>> >>> An even more recent version of the second document >>> (https://docs.wxwidgets.org/trunk/classwx_string.html >>> <https://docs.wxwidgets.org/trunk/classwx_string.html>) finally makes an >>> oblique reference >>> to the multi-threading issue by starting with this (rather unhelpful) >>> suggestion: >>> >>> Note >>> While the use of wxString >>> <https://docs.wxwidgets.org/trunk/classwx_string.html >>> <https://docs.wxwidgets.org/trunk/classwx_string.html>> is >>> unavoidable in wxWidgets program, you are encouraged to use the standard >>> string >>> classes |std::string| or |std::wstring| in your applications and convert >>> them to and >>> from wxString <https://docs.wxwidgets.org/trunk/classwx_string.html >>> <https://docs.wxwidgets.org/trunk/classwx_string.html>> only when >>> interacting with wxWidgets. >>> >>> >>> Cheers, >>> Jeff. >>> >>> >>>> On 3 May 2019, at 02:03, Dick Hollenbeck <d...@softplc.com >>>> <mailto:d...@softplc.com> <mailto:d...@softplc.com >>>> <mailto:d...@softplc.com>>> wrote: >>>> >>>> On 5/2/19 5:32 PM, Dick Hollenbeck wrote: >>>>> On 4/30/19 4:36 AM, Jeff Young wrote: >>>>>> We had talked earlier about throwing the wxWidgets UTF8 compile switch >>>>>> to get rid of >>>>>> our wxString re-entrancy problems. However, I noticed that the 6.0 work >>>>>> packages doc >>>>>> includes an item for std::string-ization of the BOARD. (While a lot >>>>>> more work, this >>>>>> is a better solution because it also increases our gui-toolkit-choice >>>>>> flexibility.) >>>>>> >>>>>> I’d like to propose that we use std::wstring for that. UTF8 should >>>>>> *only* be an >>>>>> encoding format (similar to s-expr). It should never be used >>>>>> internally. That’s what >>>>>> unicode wchar_t’s are for. >>>>>> >>>>>> And I’d like to propose that we extend std::wstring-ization to SCH_ITEM >>>>>> and LIB_ITEM. >>>>>> (Then we can get rid of a bunch of our ugly mutex hacks.) >>>>> >>>>> >>>>> I've been looking at this for a few months now. I think it is so >>>>> important, that a >>>>> sub-committee should be formed, and if that committee takes as long as 4 >>>>> months to come to >>>>> a recommendation, this would not be too long. This issue is simply too >>>>> critical. >>>>> >>>>> I would like to volunteer to be on that committee. For the entire list >>>>> to participate in >>>>> this simply does not make sense to me. I would welcome the opportunity >>>>> to study this with >>>>> a team of 5-6 players. More than that probably leads to anxiety. Then, >>>>> given the >>>>> recommendations, the list would of course have an opportunity to raise >>>>> questions and take >>>>> shots, before a strategy is formulated, and before anything is >>>>> implemented. >>>>> >>>>> Again, approximately: >>>>> >>>>> committee recommendations -> list approval -> strategy formulation -> >>>>> implementation >>>>> >>>>> >>>>> Up to now I have looked at many libraries and have [way *too* much] >>>>> experience in multiple >>>>> languages on multiple platforms, so I think I can be valuable contributor. >>>>> >>>>> The final work product initially would simply be a list of >>>>> recommendations, that quickly >>>>> transforms to a strategy thereafter. This is an enormous undertaking, so >>>>> I suggest >>>>> against racing to a solution. It could look a lot easier than it will >>>>> ultimately be, as >>>>> is typical in software development. But the return on investment needs >>>>> to be near optimal >>>>> in the end. >>>>> >>>>> Some questions to answer are: >>>>> >>>>> a) How did wxString get to its current state? Is is merely a >>>>> conglomeration of after >>>>> thought, or is is anywhere near optimal. >>>>> >>>>> b) Why so many forms of it? Can one form be chosen for all platforms? >>>>> >>>>> c) How does wxString it compare to QtString? >>>>> >>>>> d) What does the set of characters that don't fall into UCS2 actually >>>>> look like? How big >>>>> is this set, really? (UTF16 is bigger than UCS2 and picks up the >>>>> difference.) >>>>> >>>>> e) For data files, I think UTF8 is fine. So the change is for RAM >>>>> manipulation of >>>>> strings. Aren't we talking about a RAM resident string that bridges into >>>>> the GUI >>>>> seamlessly? >>>>> >>>>> f) What does new C++ language support offer? >>>>> >>>>> g) What do C++ language designers suggest? >>>> >>>> h) What is the list of deficiencies with current string usage? >>>> >>>> >>>>> >>>>> >>>>> etc. >>>>> >>>>> But this is best continued in a smaller group, as said. >>>>> >>>>> >>>>> The other thing that I bring to this is vast familiarity with KiCad's >>>>> internal workings, >>>>> string use cases, and goals. >>>>> >>>>> Let me know if I can help. >>>>> >>>>> Regards, >>>>> >>>>> Dick >>>>> >>>>> >>>>> _______________________________________________ >>>>> Mailing list: https://launchpad.net/~kicad-developers >>>>> <https://launchpad.net/~kicad-developers> >>>>> Post to : kicad-developers@lists.launchpad.net >>>>> <mailto:kicad-developers@lists.launchpad.net> >>>>> <mailto:kicad-developers@lists.launchpad.net >>>>> <mailto:kicad-developers@lists.launchpad.net>> >>>>> Unsubscribe : https://launchpad.net/~kicad-developers >>>>> <https://launchpad.net/~kicad-developers> >>>>> More help : https://help.launchpad.net/ListHelp >>>>> <https://help.launchpad.net/ListHelp> >>>>> >>>> >>>> >>>> _______________________________________________ >>>> Mailing list: https://launchpad.net/~kicad-developers >>>> <https://launchpad.net/~kicad-developers> >>>> Post to : kicad-developers@lists.launchpad.net >>>> <mailto:kicad-developers@lists.launchpad.net> >>>> <mailto:kicad-developers@lists.launchpad.net >>>> <mailto:kicad-developers@lists.launchpad.net>> >>>> Unsubscribe : https://launchpad.net/~kicad-developers >>>> <https://launchpad.net/~kicad-developers> >>>> More help : https://help.launchpad.net/ListHelp >>>> <https://help.launchpad.net/ListHelp>
_______________________________________________ Mailing list: https://launchpad.net/~kicad-developers Post to : kicad-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~kicad-developers More help : https://help.launchpad.net/ListHelp