Re: readchars, seek back, and readchars again

2020-04-28 Thread Samantha McVey
" feature, it can be tricky to predict where you're > going to end up, because the point you're starting at depends on what > kind of text you've been reading, not just the number of bytes you've > read. > > Is that making any sense? I posted a later code examp

Re: readchars, seek back, and readchars again

2020-04-28 Thread Samantha McVey
On zaterdag 25 april 2020 21:51:41 CEST Joseph Brenner wrote: > > Yary has an issue posted regarding 'display-width' of UTF-16 encoded strings: > > https://github.com/rakudo/rakudo/issues/3461 > > > > I know it might be far-fetched, but what if your UTF-8 issue and > > Yary's UTF-16 issue wer

Re: "ICU - International Components for Unicode"

2020-09-27 Thread Samantha McVey
So MoarVM uses its own database of the UCD. One nice thing is that this can probably be faster than calling into the ICU to look up information for each codepoint in a long string. Secondly it implements its own text data structures, so the nice features of the UCD to do that would be difficult to use.

Bug #131383 for perl6: [IO][MOAR][REGRESSION] .readchars($size) sometimes returns $size+1 chars

2017-05-27 Thread Samantha McVey
I bisected MoarVM and the offending commit is here: https://github.com/MoarVM/MoarVM/commit/c98634cf2542874d7daa5b45f77f7de4cf04a081 From what I see, this commit did not actually cause the root bug, it just exposed it. The Unicode Database was rebuilt so that NFG_QC=False for Emoji characters,

[perl #131383] [IO][MOAR][REGRESSION] .readchars($size) sometimes returns $size+1 chars

2017-05-28 Thread Samantha McVey
I bisected MoarVM and the offending commit is here: https://github.com/MoarVM/MoarVM/commit/c98634cf2542874d7daa5b45f77f7de4cf04a081 From what I see, this commit did not actually cause the root bug, it just exposed it. The Unicode Database was rebuilt so that NFG_QC=False for Emoji characters,

[perl #130045]

2016-11-08 Thread Samantha McVey
Actually it has nothing to do with <$a>, and this triggers it as well: m: ' ' ~~ m:s/ /; OUTPUT«===SORRY!=== Error while compiling ␤Null regex not allowed␤at :1␤--> ' ' ~~ m:s/ ⏏/;␤» If this is indeed a bug, this should probably be renamed.

Re: [perl #130045] AutoReply: Regex: using variable interpolation and sigspace ignores spaces

2016-11-08 Thread Samantha McVey
https://design.perl6.org/S05.html Reading this again, it seems that leading whitespace is ignored. It says: "The new :s (:sigspace) modifier causes certain whitespace sequences to be considered "significant"; they are replaced by a whitespace matching rule, <.ws>. Only whitespace sequences immed

Re: [perl #130384] AutoReply: Mo or Mn Unicode characters incorrectly combine with any other character

2016-12-23 Thread Samantha McVey
It looks like, according to the Unicode grapheme rules, ‘degenerates’ do not have to be accounted for in order to support the spec. > Ignore degenerates. No special provisions are made to get marginally better behavior for degenerate cases that never occur in practice, such as an A followed by an Indi

#122470: [UNI] uniname("\0") returns NULL

2016-12-27 Thread Samantha McVey
I have fixed this in https://github.com/MoarVM/MoarVM/pull/469 There are already tests, but once this is accepted this issue should be closed. Cheers.

#122471: [UNI] uniname("\x[80]") returns empty string

2016-12-27 Thread Samantha McVey
I have fixed this in https://github.com/MoarVM/MoarVM/pull/469 There are already tests, but once this is accepted this issue should be closed. Cheers.

#129878: [BUG][UNI] Grapheme boundaries not recalculated for string repetition

2016-12-27 Thread Samantha McVey
What is going on here is not a bug in string repetition, but a bug in converting from List to a Str object. say ("\c[REGIONAL INDICATOR SYMBOL LETTER G]" xx 2).elems #> 2 say ("\c[REGIONAL INDICATOR SYMBOL LETTER G]" xx 2)[0].ords #> 127468 say ("\c[REGIONAL INDICATOR SYMBOL LETTER G]" xx 2)[0].c

[perl #117683] [UNI] Several unicode char (nick)names unrecognized

2017-01-13 Thread Samantha McVey
I have fixed it on the JVM as of NQP commit: # Fix RT #117683 on JVM \c[LINE FEED] \c[CARRIAGE RETURN] #Also fixes \c[NEXT LINE] as well. https://github.com/perl6/nqp/commit/0c249e7236a63325e6440df55a762a4378e6e63a Fixed on MoarVM as of MoarVM commit: # Fix RT #117683 \c[LINE FEED] \c[CARRIAGE

Re: [perl #130542] AutoReply: \c[BELL] returns the U+0007 control code not U+1F514 BELL

2017-01-13 Thread Samantha McVey
This has been fixed on MoarVM as of https://github.com/MoarVM/MoarVM/commit/816186484b5cc52f9ff1be6afa3b6f49264335bf BELL now resolves to 🔔 U+1F514 on MoarVM, but this is still broken on the JVM
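
A quick way to check which code point the name resolves to, as a minimal sketch. It assumes a Rakudo build on MoarVM that includes the commit above; the variable name $bell is only illustrative, and on the JVM backend the result may differ, as noted in the message.

```raku
# Resolve the character name and inspect what came back.
my $bell = "\c[BELL]";

say $bell.ord.base(16);   # hexadecimal code point; the message above reports U+1F514
say $bell.uniname;        # the Unicode name of whatever code point was chosen
```

If the first line prints 7 instead, the name is still being resolved through the old Unicode 1 name of U+0007.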

#130549: [UNI] <:Digit> apparently matches anything

2017-01-13 Thread Samantha McVey
This seems a little different than https://rt.perl.org/Ticket/Display.html?id=130483 Digit resolves to the Numeric_Type property, whose uniprop-int value is 0 for non-numbers. <:Digit> and <:Numeric_Type> both match everything. Will need more investigation.

Re: [perl #130542] AutoReply: \c[BELL] returns the U+0007 control code not U+1F514 BELL

2017-01-14 Thread Samantha McVey
On Saturday, 14 January 2017 02.06.57 PST you wrote: > > BELL now resolves to 🔔 U+1F514 on MoarVM, but this is still broken on the > > JVM > > What causes this kind of difference? > > > U+0007's Unicode 1 name was BELL, and with version 2 the name was removed. Unicode 1 names are essentiall

#127048: [UNI] Emoji sequences with ZERO WIDTH JOINER counted as separate chars when they probably shouldn't

2017-01-16 Thread Samantha McVey
There is a new roast test, S15-nfg/emoji-test.t We used to fail 1500+/1943 tests, but now we only fail 275 tests. Will keep this open until we are passing all the Emoji tests which contain ZWJ characters
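
What the tests are checking can be sketched with a single ZWJ sequence. This is an illustrative example, not one taken from emoji-test.t, and $family is a made-up name. The code point count is fixed, while the grapheme count depends on how far the build's segmentation support has come, so no value is claimed for it here.

```raku
# MAN + ZERO WIDTH JOINER + WOMAN: three code points that are meant
# to be treated as a single user-perceived character.
my $family = "\c[MAN]\c[ZERO WIDTH JOINER]\c[WOMAN]";

say $family.codes;   # number of code points, independent of segmentation
say $family.chars;   # number of graphemes; this is what the ticket is about
```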

Re: [perl #130638] [LTA] X::Seq don't say which Seq the exception occurred on

2017-01-25 Thread Samantha McVey
On Wednesday, 25 January 2017 01.45.59 PST you wrote: > On Tue, 24 Jan 2017 23:15:32 -0800, samant...@posteo.net wrote: > > CODE: > > my Seq $thing = (1,3,4).Seq; $thing.iterator; $thing.iterator > > > > STDERR: > > This Seq has already been iterated, and its values consumed > > (you might solve t

Re: [perl #130710] Cannot use hyper operator on $_ by doing .».method

2017-02-03 Thread Samantha McVey
The second code should have been: $_ = (' a ', ' b '); .».trim.perl.say which does have the error message.

[perl #131383] [IO][MOAR][REGRESSION] .readchars($size) sometimes returns $size+1 chars

2017-05-28 Thread Samantha McVey via RT
I bisected MoarVM and the offending commit is here: https://github.com/MoarVM/MoarVM/commit/c98634cf2542874d7daa5b45f77f7de4cf04a081 From what I see, this commit did not actually cause the root bug, it just exposed it. The Unicode Database was rebuilt so that NFG_QC=False for Emoji characters

[perl #128875] [BUG] ignoremark + ignorecase ignores everything but first letter

2017-06-07 Thread Samantha McVey via RT
On Mon, 08 Aug 2016 17:34:57 -0700, timo wrote: > to be more precise, the way we code-gen "literal" qregex nodes with > subtype "ignoremark+ignorecase" will only ever check the ordbaseat of > the first character in the literal against the haystack. > This has been fixed as of https://github.com/

[perl #125813] [UNI] Malformed UTF-8 (string out of bounds) with “say ('a' x 10000).IO.open”

2017-06-08 Thread Samantha McVey via RT
> Result: > Malformed UTF-8 at line 1 col 1029 > in block at ./test.pl:2

[perl #130384] [UNI] degenerates: Mo or Mn Unicode characters combine with punctuation

2017-07-15 Thread Samantha McVey via RT
This bug has been open a while. I have not forgotten it; I had just not reached a final decision. After further thought I'm closing this WONTFIX. It would needlessly complicate our grapheme concatenation and in addition I believe it may break some of the grapheme concatenation tests.

[perl #131881] [REGRESSION] JSON::Tiny tests output bad text on latest build

2017-08-14 Thread Samantha McVey via RT
I have fixed this as of this MoarVM commit: https://github.com/MoarVM/MoarVM/commit/712cff3341270362b808ba0f4c519f4557a4671d Full explanation in the commit description. Thanks a lot for reporting this bug :)

[perl #129878] [TESTNEEDED][UNI] Grapheme boundaries not recalculated for string repetition

2017-10-03 Thread Samantha McVey via RT
On Thu, 07 Sep 2017 09:52:07 -0700, sml...@gmail.com wrote: > On Wed, 06 Sep 2017 15:20:17 -0700, coke wrote: > > With a recent rakudo, these now both output 1 > > Bisectable shows that it was fixed during recent MoarVM changes: > > https://gist.github.com/Whateverable/01a82d07e8009c7beffe5893432
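
The two forms being compared in this ticket can be put side by side as follows. This is a sketch with illustrative variable names; the thread reports that both .chars calls print 1 on a recent rakudo, which is not asserted here since it depends on the build.

```raku
my $ri = "\c[REGIONAL INDICATOR SYMBOL LETTER G]";

my $repeated = $ri x 2;         # string repetition
my $reduced  = [~] $ri xx 2;    # list repetition, then concatenation

say $repeated.codes;   # 2: two code points either way
say $repeated.chars;   # grapheme count after string repetition
say $reduced.chars;    # grapheme count after concatenating the list
```

If grapheme boundaries are recalculated correctly, the last two lines agree.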

[perl #129878] [BUG][UNI] Grapheme boundaries not recalculated for string repetition

2016-12-27 Thread Samantha McVey via RT
On Fri, 14 Oct 2016 11:06:54 -0700, c...@cpan.org wrote: > Cf > > say ("\c[REGIONAL INDICATOR SYMBOL LETTER G]" x 2).chars #=> 2 > > vs > > say ([~] "\c[REGIONAL INDICATOR SYMBOL LETTER G]" xx 2).chars #=> 1 > What is going on here is not a bug in string repetition, but a bug in conv

[perl #129878] [BUG][UNI] Stringifying a List adds in spaces between each item

2016-12-27 Thread Samantha McVey via RT
Actually this is not a bug at all, and it is not limited to those characters. If you do ('a' xx 2).chars you will get 3 as well, because stringifying the list puts a space between the items. If you want to join the list after you create it: say ('a' xx 2).join.chars #> 2 Rejecting.
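
The difference between stringifying a list and joining it, as a small sketch with illustrative variable names:

```raku
# Stringifying a list separates the elements with a single space;
# .join with no argument concatenates them directly.
my @items = 'a' xx 2;

my $stringified = @items.Str;    # "a a"
my $joined      = @items.join;   # "aa"

say $stringified.chars;   # 3
say $joined.chars;        # 2
```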