Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-16 Thread Tomas Kalibera



On 8/15/23 16:00, Tomas Kalibera wrote:
>
> On 8/15/23 09:04, Ivan Krylov wrote:
>> On Tue, 15 Aug 2023 08:38:11 +0200
>> Tomas Kalibera wrote:
>>
>>> As this was reported to be a regression in 4.3, it is entirely possible
>>> this change came with a regression (though a bit surprising we didn't
>>> catch it earlier by testing), so it would be a great help if I could
>>> have the example and debug it.
>> Sorry, let me try to be more clear.
>>
>> The Windows filename length limit is 255(?) wide characters. The
>> WIN32_FIND_DATAA structure contains a 260-byte buffer for the filename
>> to be returned by FindFirstFileA()/FindNextFileA(). If a wide character
>> takes more than one byte to be represented in UTF-8, it may overflow
>> the 260 byte limit in the WIN32_FIND_DATAA structure despite being
>> below the 260 wide character limit. When such an overflow happens,
>> FindNextFile() returns FALSE with GetLastError() == ERROR_MORE_DATA,
>> which results in R_readdir() returning NULL and makes list_files() stop
>> before listing the rest of the directory.
>>
>> This is easier to make happen by accident with Chinese characters,
>> because they take three UTF-8 bytes per character.
>>
>> Take the ø (\uf8) letter. It takes two bytes to represent in UTF-8.
>> Create a file with a name consisting of this symbol repeated 140 times.
>> When you run list.files() on the resulting directory on Windows with a
>> UTF-8 locale, Windows tries to fit (0xc3 0xb8) times 140 into a
>> 260-byte buffer, which doesn't work. I'm afraid the only way to avoid
>> such a failure is to rewrite R_readdir using the wide character API and
>> convert the file names on the fly. (Just like mingw readdir() did in
>> the past?)
>>
>> stopifnot(.Platform$OS.type == 'windows', l10n_info()$`UTF-8`)
>> # any character for which nchar(enc2utf8(.), 'bytes') > 1 will do
>> # any number >260/2 should do
>> file.create(strrep('\uf8', 140))
>> list.files()
>>
>> Does this work? I don't have access to a UTF-8 Windows machine right
>> now.
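
The byte arithmetic behind this reproducer can be checked on any platform, without touching the file system. A minimal sketch, assuming the 260-byte buffer size quoted above for WIN32_FIND_DATAA; the variable names are illustrative only:

```r
# File name from the reproducer: U+00F8 repeated 140 times
name <- strrep("\u00f8", 140)

# Length in characters versus length in bytes once encoded as UTF-8
n_chars <- nchar(name, type = "chars")
n_bytes <- nchar(enc2utf8(name), type = "bytes")

# Assumed buffer size (in bytes) of the file name field, as quoted above
buf_size <- 260L

# The name fits only if its bytes plus the terminating NUL fit the buffer
fits <- (n_bytes + 1L) <= buf_size
```

The name is 140 characters long but needs 280 bytes in UTF-8, so `fits` is FALSE even though the name is well under the limit when counted in wide characters.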


> Thanks, yes, I can reproduce the problem. Some Windows functions
> impose a limit of 260 wide characters, but others a limit of 260 bytes,
> so one can create a file with a name too long to be found by
> FindNextFileA.
>
> In R 4.2, we used readdir() from mingw-w64, which itself used findnext.
> That had the same problem: it used a buffer of 260 bytes, and judging
> from the mingw-w64 code and the Windows documentation, it should have
> behaved the same way and stopped the search on such a long file name.
> However, in my use case, R 4.2.3 crashed inside findnext due to a stack
> overrun. R 4.1.3 worked, but as it didn't use UTF-8, it would clearly
> take a different use case to overrun this buffer. This suggests that
> findnext didn't have a check for this and hence caused memory
> corruption, which can lead to a crash or appear to work by coincidence.
> That could have been the case for the user reporting this as a
> regression compared to R 4.2. But it is not a regression; the problem
> has existed for a long time.
>
> So, yes, we'd probably have to use the wide variants of
> FindNext/FindFirst. I'll fix.


Fixed in R-devel (84960). Please let me know if you see any problem with 
the fix.


Thanks,
Tomas



> Thanks for debugging this,
> Tomas





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Heads up about documentation-related reports

2023-08-16 Thread Martin Maechler
> Elio Campitelli 
> on Mon, 14 Aug 2023 17:42:42 -0300 writes:

> For the upcoming R Sprint I'm proposing a project to improve documentation.

That's good, thank you.

However, do concentrate on the existing bugzilla issues
   
https://contributor.r-project.org/r-project-sprint-2023/projects/documentation/#list-of-bugs

(and can you *PLEASE* (change the CSS or ?? to) make the table
 of relevant bugzilla entries wider so it becomes readable ?!) 

> Part of the project will include several small related reports; for
> instance, trying to improve examples of many functions (here's a list of
> some of the possible reports).

> Would it be better to send one single big report or many small reports?

> Cheers,
> Elio

{Do you mean bugzilla bug reports?
 In that case, maybe rather *none* [see also below]:
 If you do not like an example, that is definitely not a bug in R!
 ---> https://www.r-project.org/bugs.html  "What is a bug"

 Also: Every bugzilla report (and comment) creates an e-mail to
 all of R core.  Yes, we can sort / pre-filter / .. e-mails, but still
}

Well, as co-author of many of R's help page examples, I must say
that "improving" an example needs to have a well defined notion
of "bad - better - good" etc.
In my opinion much of that is a matter of taste rather than objectivity.

As an R core member I'd not like you to propose changing
examples I or others had chosen to be "funny" , "cute",
"special", or "thought provoking" ... just because other people
think that such examples should be as simple (and boring) as possible.

I hope the helpers at the upcoming R Sprint will concentrate on improving R
by following what R core members Luke Tierney and Tomas Kalibera
wrote in their two R blog posts:
 ==> https://blog.r-project.org/
   and look for the 2 blog entries with  "Reviewing Bug Reports"
   in their title.


I'm sorry if the above does not sound encouraging.
I hope it still encourages you to concentrate on helping
to make R better by reviewing bugs, fixing bugs, exploring
problems, etc.

With regards,
Martin



Re: [Rd] Concerns with SVD -- and the Matrix Exponential

2023-08-16 Thread Martin Maechler
> Durga Prasad G me14d059 
> on Wed, 16 Aug 2023 13:36:10 +0530 writes:

> Dear Martin, I am getting different responses from different officials of
> R-Software, 

well, well, ..
Here on R-devel, we got two messages in addition to mine,
none by any "official" (even though John C Nash probably gets
the title of most senior professor still active on the R mailing list),
but both to the point,
notably Aidan Lakshman below showing you how you were unaware of
several things about the SVD, and confusing
positive semi definite matrices with arbitrary symmetric matrices.

> but those statements are contradicting with the statements
> discussed in your email. 

I don't think so.  I think we all agreed (John Nash, me, Aidan Lakshman),
even though focussing on different aspects of your partly
incorrect claims.

Martin Maechler



> Kindly go through the previous files and emails,
> and respond. I personally think, together we can fix the issue which is
> observed in SVD.

> Thanks and regards
> Durga Prasad


Re: [Rd] Concerns with SVD -- and the Matrix Exponential

2023-08-16 Thread Durga Prasad G me14d059
Dear Martin, I am getting different responses from different officials of
R-Software, and those statements contradict the statements
discussed in your email. Kindly go through the previous files and emails,
and respond. I personally think that together we can fix the issue
observed in SVD.

Thanks and regards
Durga Prasad

On Tue, Aug 1, 2023 at 4:51 PM Lakshman, Aidan H  wrote:

> Hi Durga,
>
> There’s an error in your calculations here. You mention that for the SVD
> of a symmetric matrix, we must have U=V, but this is not a correct
> statement. The unitary matrices are only equivalent if the matrix A is
> positive semidefinite.
>
> In your example, you provide the matrix {{1,4},{4,1}}, which has
> eigenvalues 5 and -3. This is not positive semidefinite, and thus there's
> no requirement that the unitary matrices be equivalent.
>
> If you verify your example with something like wolfram alpha, you’ll find
> that R’s solution is correct.
>
> -Aidan
>
> ---
>
> Aidan Lakshman (he/him) 
>
> Doctoral Fellow, Wright Lab 
>
> University of Pittsburgh School of Medicine
>
> Department of Biomedical Informatics
>
> ah...@pitt.edu
>
> (724) 612-9940
>
>
>
> --
> *From:* R-devel  on behalf of Durga Prasad
> G me14d059 
> *Sent:* Tuesday, August 1, 2023 4:18:20 AM
> *To:* Martin Maechler ; r-devel@r-project.org
> ; profjcn...@gmail.com 
> *Subject:* Re: [Rd] Concerns with SVD -- and the Matrix Exponential
>
> Hi Martin, Thank you for your reply. The response and the links provided by
> you helped to learn more. But I am not able to obtain the simple even
> powers of a matrix: one simple case is the square of a matrix. The square
> of the matrix using direct matrix multiplication operations and svd (A = U
> D V') are different. Kindly check the attached file for the complete
> explanation. I want to know which technique was used in building the svd in
> R-Software. I want to discuss about svd if you schedule a meeting.
>
> Thanks and Regards
> Durga Prasad
>
>
> On Mon, Jul 17, 2023 at 2:13 PM Martin Maechler <
> maech...@stat.math.ethz.ch>
> wrote:
>
> > > J C Nash
> > > on Sun, 16 Jul 2023 13:30:57 -0400 writes:
> >
> > > Better check your definitions of SVD -- there are several
> > > forms, but all I am aware of (and I wrote a couple of the
> > > codes in the early 1970s for the SVD) have positive
> > > singular values.
> >
> > > JN
> >
> > Indeed.
> >
> > More generally, the decomposition A = U D V'
> > (with diagonal D and orthogonal U,V)
> > is not at all unique.
> >
> > There are not only many possible different choices of the sign
> > of the diagonal entries, but also the *ordering* of the singular values
> > is non unique.
> > That's why R and 'Lapack', the world-standard for
> >   computer/numerical linear algebra, and others I think,
> > make the decomposition unique by requiring
> > non-negative entries in D and have them *sorted* decreasingly.
> >
> > The latter is what the help page   help(svd)  always said
> > (and you should have studied that before raising such concerns).
> >
> > -
> >
> > To your second point (in the document), the matrix exponential:
> > It is less known, but still has been known among experts for
> > many years (and I think even among students of a class on
> > numerical linear algebra), that there are quite a
> > few mathematically equivalent ways to compute the matrix exponential,
> > *BUT* that most of these may be numerically disastrous, for several
> > different reasons depending on the case.
> >
> > This has been known for close to 50 years now:
> >
> >  Cleve Moler and Charles Van Loan  (1978)
> >  Nineteen Dubious Ways to Compute the Exponential of a Matrix
> >  SIAM Review Vol. 20(4)
> >
> >  https://doi.org/10.1137/1020098
> >
> > Whereas that publication had been important and much cited at
> > the time, the same authors (known world experts in the field)
> > wrote a review of that review 25 years later which I think (and
> > hope) is even more widely cited  (in R's man/*.Rd syntax) :
> >
> >   Cleve Moler and Charles Van Loan (2003)
> >   Nineteen dubious ways to compute the exponential of a matrix,
> >   twenty-five years later. \emph{SIAM Review} \bold{45}, 1, 3--49.
> >   \doi{10.1137/S00361445024180}
> > i.e. https://doi.org/10.1137/S00361445024180

Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-16 Thread yu gong
A little more information on this issue.
Searching the Microsoft website today, I found a document about the
"Maximum Path Length Limitation" (Maximum Path Length Limitation -
Win32 apps | Microsoft Learn).
According to the document, two things are needed to avoid this issue on
Windows 10 and later:
1. Edit the registry or group policy to set
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
"LongPathsEnabled"=dword:00000001

2. Opt in through the app manifest (R has already done this).

Regards,
yu





Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-16 Thread Ivan Krylov
On Wed, 16 Aug 2023 09:42:09 +0200
Tomas Kalibera  wrote:

> Fixed in R-devel (84960). Please let me know if you see any problem
> with the fix.

Thank you for implementing the fix! I gave 叶月光 the link to the
GitHub Action build of the r84960 installer.

I'm worried that 叶月光 was seeing FindNextFileA fail for a different
reason (all the examples given at the Capital of Statistics forum
seemed to use less than 256/4 = 64 characters per file name...), but
maybe this won't reappear with the switch to FindNextFileW. If this
keeps happening, it might be worth producing a warning when
FindNextFileW() fails with an unexpected GetLastError() value.

fs::dir_ls() uses NtQueryDirectoryFile() and WideCharToMultiByte()
instead of FindNextFileW() and wcstombs(), but maybe this shouldn't
matter. In particular, both list.files() and fs::dir_ls() would fail
given a file name that cannot be represented in UTF-8 (invalid UTF-16
surrogate pairs?)

-- 
Best regards,
Ivan



Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-16 Thread Tomas Kalibera


On 8/16/23 13:11, yu gong wrote:
> A little more information on this issue.
> Searching the Microsoft website today, I found a document about the
> "Maximum Path Length Limitation" (Maximum Path Length Limitation -
> Win32 apps | Microsoft Learn).
> According to the document, two things are needed to avoid this issue
> on Windows 10 and later:
> 1. Edit the registry or group policy to set
> [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
> "LongPathsEnabled"=dword:00000001
>
> 2. Opt in through the app manifest (R has already done this).

These settings are for long paths (meaning a full path consisting of
multiple elements separated by backslashes); there is more about that in
[1].


But the problem that Ivan reported (it is not clear whether it is the
same problem as the one originally reported on this thread) is about
the limit for a single file/directory name, that is, for a single
element of a path. Having long paths enabled in the registry
wouldn't help with this.


These two limits are not directly related, except for the obvious: by
choosing rather long names for individual files, one usually soon runs
into the limit for the full path.
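
The distinction between the two limits can be illustrated in R by measuring a path both ways. A minimal sketch; path_limits() is a hypothetical helper, not part of R, and the example path is made up:

```r
# Hypothetical helper: report the length of the full path and the
# length of its longest single element (file or directory name)
path_limits <- function(path) {
  # Split on either kind of path separator and drop empty pieces
  parts <- strsplit(path, "[/\\\\]")[[1]]
  parts <- parts[nzchar(parts)]
  c(total   = nchar(path, type = "chars"),
    longest = max(nchar(parts, type = "chars")))
}

# Made-up example path
res <- path_limits("C:\\Users\\me\\data\\report.csv")
```

Here `total` is what the long-path setting is concerned with, while `longest` is what matters for the limit on a single name discussed in this thread.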


Best

Tomas


[1] - 
https://blog.r-project.org/2023/03/07/path-length-limit-on-windows/index.html


Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-16 Thread Tomas Kalibera



On 8/16/23 13:22, Ivan Krylov wrote:
> On Wed, 16 Aug 2023 09:42:09 +0200
> Tomas Kalibera wrote:
>
>> Fixed in R-devel (84960). Please let me know if you see any problem
>> with the fix.
>
> Thank you for implementing the fix! I gave 叶月光 the link to the
> GitHub Action build of the r84960 installer.

Thanks and thanks for looking at the change.


> I'm worried that 叶月光 was seeing FindNextFileA fail for a different
> reason (all the examples given at the Capital of Statistics forum
> seemed to use less than 256/4 = 64 characters per file name...), but
> maybe this won't reappear with the switch to FindNextFileW. If this
> keeps happening, it might be worth producing a warning when
> FindNextFileW() fails with an unexpected GetLastError() value.


I've added a warning to R-devel when list.files() on Windows stops 
listing a directory due to an error.
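
From the caller's side, such a warning can be collected rather than just printed. A minimal sketch, assuming only that list.files() signals an ordinary R warning in this situation; list_files_checked() is a hypothetical helper and the exact warning text is not assumed:

```r
# Hypothetical helper: list a directory and collect any warnings raised
# while doing so, so an incomplete listing does not go unnoticed
list_files_checked <- function(dir) {
  warned <- character()
  files <- withCallingHandlers(
    list.files(dir),
    warning = function(w) {
      # Record the message and keep going instead of printing it
      warned <<- c(warned, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(files = files, warnings = warned)
}

# Demonstration on a small temporary directory with ordinary names
d <- tempfile("lf-demo-")
dir.create(d)
created <- file.create(file.path(d, c("a.txt", "b.txt")))
res <- list_files_checked(d)
```

A non-empty `res$warnings` would indicate that the listing may have stopped early and that `res$files` should not be treated as complete.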


There is probably not much more we can do unless there is a revised bug 
report of the original problem.



> fs::dir_ls() uses NtQueryDirectoryFile() and WideCharToMultiByte()
> instead of FindNextFileW() and wcstombs(), but maybe this shouldn't
> matter. In particular, both list.files() and fs::dir_ls() would fail
> given a file name that cannot be represented in UTF-8 (invalid UTF-16
> surrogate pairs?)


Right, R only supports file names that are valid strings. This assumption 
is present in many places in the code, so it is fine/consistent to have 
it here as well. The choice of opendir/readdir in R was probably motivated 
by minimization of platform-specific code.


Best
Tomas



Re: [Rd] Concerns with SVD -- and the Matrix Exponential

2023-08-16 Thread Lakshman, Aidan H
Hi Durga,

Just to add to previous comments--if you’re interested in the implementation of 
SVD in R, I’d recommend looking at the source code and/or sending general 
questions to the community slack, which may provide faster responses and help 
with double checking your work. Most of the perceived concerns you mention are 
related to the underlying LAPACK code--if you can verify that the code in 
LAPACK and/or R is incorrect, feel free to email r-devel with examples or 
discuss it on the slack.

The R implementation of SVD is available here, which essentially just wraps 
LAPACK code:
https://github.com/r-devel/r-svn/blob/b6394a83c0c12b12b6b1aceb05db0fd66227fd30/src/modules/lapack/Lapack.c#L111

The R method that invokes the C code is called here:
https://github.com/r-devel/r-svn/blob/b6394a83c0c12b12b6b1aceb05db0fd66227fd30/src/library/base/R/LAPACK.R#L19

The related LAPACK code is located here:
https://netlib.org/lapack/explore-html/d1/d7e/group__double_g_esing_gad8e0f1c83a78d3d4858eaaa88a1c5ab1.html

You can join the R-Contributors slack here:
https://contributor.r-project.org/slack

-Aidan

---
Aidan Lakshman (he/him)
Doctoral Fellow, Wright Lab
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
ah...@pitt.edu
(724) 612-9940



Re: [Rd] Concerns with SVD -- and the Matrix Exponential

2023-08-16 Thread Bill Dunlap
You wrote:
 Using singular value decomposition, any second-order tensor is
given as
  A = UΣVt
  where U and V are the orthogonal tensors, and Σ is the diagonal
matrix (Eigenvalue matrix).

  For a symmetric matrix, the orthogonal tensors are the same,
i.e., U=V.

Can you state your definition of the SVD and prove (or outline a proof of)
that last statement?
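
For reference, the matrix {{1,4},{4,1}} discussed earlier in this thread gives a quick numerical check of that statement. A minimal sketch using only base R; the variable names are illustrative:

```r
# Symmetric but indefinite matrix from the thread
A <- matrix(c(1, 4, 4, 1), nrow = 2)

ev <- eigen(A, symmetric = TRUE)$values  # eigenvalues, one of them negative
s  <- svd(A)                             # singular values are non-negative

# The decomposition reproduces A
recon <- s$u %*% diag(s$d) %*% t(s$v)

# U and V are not the same matrix for this A
same_uv <- isTRUE(all.equal(s$u, s$v))

# Squaring: direct product versus two SVD-based expressions
sq_direct <- A %*% A
sq_uv     <- s$u %*% diag(s$d^2) %*% t(s$v)  # only valid if U equals V
sq_vv     <- s$v %*% diag(s$d^2) %*% t(s$v)  # t(A) %*% A, equal to A %*% A here
```

U and V differ in the sign of the column belonging to the negative eigenvalue, so U D^2 V' does not reproduce A %*% A, while V D^2 V' does, because t(A) %*% A = V D^2 V' holds for any A and A is symmetric.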

-Bill

On Wed, Aug 16, 2023 at 3:47 AM Durga Prasad G me14d059 <
me14d...@smail.iitm.ac.in> wrote:

> Dear Martin, I am getting different responses from different officials of
> R-Software, but those statements are contradicting with the statements
> discussed in your email. Kindly go through the previous files and emails,
> and respond. I personally think, together we can fix the issue which is
> observed in SVD.
>
> Thanks and regards
> Durga Prasad
>
> On Tue, Aug 1, 2023 at 4:51 PM Lakshman, Aidan H  wrote:
>
> > Hi Durga,
> >
> > There’s an error in your calculations here. You mention that for the SVD
> > of a symmetric matrix, we must have U=V, but this is not a correct
> > statement. The unitary matrices are only equivalent if the matrix A is
> > positive semidefinite.
> >
> > In your example, you provide the matrix {{1,4},{4,1}}, which has
> > eigenvalues 5 and -3. This is not positive semidefinite, and thus there's
> > no requirement that the unitary matrices be equivalent.
> >
> > If you verify your example with something like wolfram alpha, you’ll find
> > that R’s solution is correct.
> >
> > -Aidan
> >
> > ---
> >
> > Aidan Lakshman (he/him) 
> >
> > Doctoral Fellow, Wright Lab 
> >
> > University of Pittsburgh School of Medicine
> >
> > Department of Biomedical Informatics
> >
> > ah...@pitt.edu
> >
> > (724) 612-9940
> >
> >
> >
> > --
> > *From:* R-devel on behalf of Durga Prasad G me14d059
> > *Sent:* Tuesday, August 1, 2023 4:18:20 AM
> > *To:* Martin Maechler; r-devel@r-project.org; profjcn...@gmail.com
> > *Subject:* Re: [Rd] Concerns with SVD -- and the Matrix Exponential
> >
> > Hi Martin, Thank you for your reply. The response and the links provided
> > by you helped to learn more. But I am not able to obtain the simple even
> > powers of a matrix: one simple case is the square of a matrix. The square
> > of the matrix using direct matrix multiplication operations and svd (A =
> > U D V') are different. Kindly check the attached file for the complete
> > explanation. I want to know which technique was used in building the svd
> > in R-Software. I want to discuss about svd if you schedule a meeting.
> >
> > Thanks and Regards
> > Durga Prasad
> >
> >
> > On Mon, Jul 17, 2023 at 2:13 PM Martin Maechler <
> > maech...@stat.math.ethz.ch>
> > wrote:
> >
> > > > J C Nash
> > > > on Sun, 16 Jul 2023 13:30:57 -0400 writes:
> > >
> > > > Better check your definitions of SVD -- there are several
> > > > forms, but all I am aware of (and I wrote a couple of the
> > > > codes in the early 1970s for the SVD) have positive
> > > > singular values.
> > >
> > > > JN
> > >
> > > Indeed.
> > >
> > > More generally, the decomposition A = U D V'
> > > (with diagonal D and orthogonal U,V)
> > > is not at all unique.
> > >
> > > There are not only many possible different choices of the sign
> > > of the diagonal entries, but also the *ordering* of the singular values
> > > is non unique.
> > > That's why R and 'Lapack', the world-standard for
> > >   computer/numerical linear algebra, and others I think,
> > > make the decomposition unique by requiring
> > > non-negative entries in D and have them *sorted* decreasingly.
> > >
> > > The latter is what the help page   help(svd)  always said
> > > (and you should have studied that before raising such concerns).
> > >
> > > -
> > >
> > > To your second point (in the document), the matrix exponential:
> > > It is less known, but still has been known among experts for
> > > many years (and I think even among students of a class on
> > > numerical linear algebra), that there are quite a
> > > few mathematically equivalent ways to compute the matrix exponential,
> > > *BUT* that most of these may be numerically disastrous, for several
> > > different reasons depending on the case.
> > >
> > > This has been known for close to 50 years now:
> > >
> > >  Cleve Moler and Charles Van Loan  (1978)
> > >  Nineteen Dubious Ways to Compute the Exponential of a Matrix
> > >  SIAM Review Vol. 20(4)
> > >
> > >
> > >  https://doi.org/10.1137/1020098

Re: [Rd] Concerns with SVD -- and the Matrix Exponential

2023-08-16 Thread Duncan Murdoch
Dear Durga, I think you have a basic misunderstanding of this mailing 
list.  The responses you have received are from users and volunteer 
developers.  There are no "officials of R-Software".  R is an open 
source project containing contributions from hundreds (maybe thousands) 
of people.


It's only natural that there will be some contradictions in the 
responses from those people.  It's up to you to read the responses and 
find the parts of them that are useful to you.  It's rude of you to ask 
one particular respondent to do that work for you.


Duncan Murdoch

On 16/08/2023 4:06 a.m., Durga Prasad G me14d059 wrote:

Dear Martin, I am getting different responses from different officials of
R-Software, but those statements are contradicting with the statements
discussed in your email. Kindly go through the previous files and emails,
and respond. I personally think, together we can fix the issue which is
observed in SVD.

Thanks and regards
Durga Prasad

On Tue, Aug 1, 2023 at 4:51 PM Lakshman, Aidan H  wrote:


Hi Durga,

There’s an error in your calculations here. You mention that for the SVD
of a symmetric matrix, we must have U=V, but this is not a correct
statement. The unitary matrices are only equivalent if the matrix A is
positive semidefinite.

In your example, you provide the matrix {{1,4},{4,1}}, which has
eigenvalues 5 and -3. This is not positive semidefinite, and thus there's
no requirement that the unitary matrices be equivalent.

If you verify your example with something like wolfram alpha, you’ll find
that R’s solution is correct.
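
The point above can be checked directly in R. The following is only a minimal sketch using base R's svd() and eigen() on the matrix {{1,4},{4,1}} from the thread; the variable names are chosen here for illustration.

```r
# Symmetric but indefinite matrix from the thread
A <- matrix(c(1, 4, 4, 1), nrow = 2)

s <- svd(A)                      # A = U D V' with non-negative d
e <- eigen(A, symmetric = TRUE)  # A = Q Lambda Q'

s$d        # singular values, approximately 5 and 3
e$values   # eigenvalues, approximately 5 and -3

# The SVD reproduces A ...
recon <- s$u %*% diag(s$d) %*% t(s$v)
all.equal(recon, A)

# ... but U and V are not the same matrix: the singular vector pair
# belonging to the negative eigenvalue differs by a sign.
isTRUE(all.equal(s$u, s$v))
all.equal(s$u[, 2], -s$v[, 2])
```

Because the singular values are required to be non-negative, the sign of the eigenvalue -3 has to be carried by one of the two singular vectors, which is why U and V cannot coincide here.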

-Aidan

---

Aidan Lakshman (he/him) 

Doctoral Fellow, Wright Lab 

University of Pittsburgh School of Medicine

Department of Biomedical Informatics

ah...@pitt.edu

(724) 612-9940



--
*From:* R-devel  on behalf of Durga Prasad
G me14d059 
*Sent:* Tuesday, August 1, 2023 4:18:20 AM
*To:* Martin Maechler ; r-devel@r-project.org
; profjcn...@gmail.com 
*Subject:* Re: [Rd] Concerns with SVD -- and the Matrix Exponential

Hi Martin, Thank you for your reply. The response and the links provided by
you helped to learn more. But I am not able to obtain the simple even
powers of a matrix: one simple case is the square of a matrix. The square
of the matrix using direct matrix multiplication operations and svd (A = U
D V') are different. Kindly check the attached file for the complete
explanation. I want to know which technique was used in building the svd in
R-Software. I want to discuss about svd if you schedule a meeting.
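
The attached file is not part of the archive, so what follows is only a guess at the kind of discrepancy described: it assumes the square was formed as U D^2 V', which equals A %*% A only when U = V. A minimal base-R sketch with the same example matrix:

```r
A <- matrix(c(1, 4, 4, 1), nrow = 2)
s <- svd(A)

direct <- A %*% A                                # plain matrix product

# For symmetric A, A %*% A = A'A = V D^2 V' (and also U D^2 U')
via_v  <- s$v %*% diag(s$d^2) %*% t(s$v)
via_u  <- s$u %*% diag(s$d^2) %*% t(s$u)

# Mixing U and V is only valid if U = V, which does not hold here
mixed  <- s$u %*% diag(s$d^2) %*% t(s$v)

all.equal(direct, via_v)           # agrees with the direct product
all.equal(direct, via_u)           # agrees with the direct product
isTRUE(all.equal(direct, mixed))   # does not agree
```

In general A %*% A = U D V' U D V', and the inner factor V' U only drops out when it is the identity.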

Thanks and Regards
Durga Prasad


On Mon, Jul 17, 2023 at 2:13 PM Martin Maechler <
maech...@stat.math.ethz.ch>
wrote:


J C Nash
 on Sun, 16 Jul 2023 13:30:57 -0400 writes:


 > Better check your definitions of SVD -- there are several
 > forms, but all I am aware of (and I wrote a couple of the
 > codes in the early 1970s for the SVD) have positive
 > singular values.

 > JN

Indeed.

More generally, the decomposition A = U D V'
(with diagonal D and orthogonal U,V)
is not at all unique.

There are not only many possible different choices of the sign
of the diagonal entries, but also the *ordering* of the singular values
is non unique.
That's why R and 'Lapack', the world-standard for
   computer/numerical linear algebra, and others I think,
make the decomposition unique by requiring
non-negative entries in D and have them *sorted* decreasingly.

The latter is what the help page   help(svd)  always said
(and you should have studied that before raising such concerns).
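
As an illustration of the non-uniqueness described above (a sketch only, in base R; the helper name flip is made up here): flipping the sign of a singular vector pair, or moving a sign into D, gives other valid factorizations A = U D V', while svd() returns the one with non-negative, decreasingly sorted d.

```r
A <- matrix(c(1, 4, 4, 1), nrow = 2)
s <- svd(A)

# Convention used by svd(): non-negative d, sorted decreasingly
all(s$d >= 0)
!is.unsorted(rev(s$d))

# Flip the sign of the second column of both U and V: still a valid SVD
flip <- diag(c(1, -1))
u2 <- s$u %*% flip
v2 <- s$v %*% flip
all.equal(u2 %*% diag(s$d) %*% t(v2), A)

# Move the sign into D instead: A = (U flip) (flip D) V',
# a valid factorization with a negative diagonal entry
d_signed <- s$d * c(1, -1)
all.equal(u2 %*% diag(d_signed) %*% t(s$v), A)
```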

-

To your second point (in the document), the matrix exponential:
It is less known, but still has been known among experts for
many years (and I think even among students of a class on
numerical linear algebra), that there are quite a
few mathematically equivalent ways to compute the matrix exponential,
*BUT* that most of these may be numerically disastrous, for several
different reasons depending on the case.
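
A small illustration of that point, not taken from the paper: the example matrix and the helper names expm_taylor, expm_sym and rel_err are assumptions made up for this sketch. It compares a truncated Taylor series with an eigendecomposition for a symmetric matrix whose eigenvalues are -29 and -31, using only base R.

```r
# Symmetric matrix with eigenvalues -29 and -31 (illustrative choice)
A <- matrix(c(-30, 1, 1, -30), nrow = 2)

# Naive Taylor series: sum over k of A^k / k!
expm_taylor <- function(M, n_terms = 200) {
  term  <- diag(nrow(M))
  total <- term
  for (k in seq_len(n_terms)) {
    term  <- term %*% M / k      # A^k / k! built up incrementally
    total <- total + term
  }
  total
}

# Eigendecomposition route, valid for symmetric matrices only
expm_sym <- function(M) {
  e <- eigen(M, symmetric = TRUE)
  e$vectors %*% diag(exp(e$values)) %*% t(e$vectors)
}

# Closed form for this particular A: exp(-30) * (cosh(1) I + sinh(1) J)
exact <- exp(-30) * matrix(c(cosh(1), sinh(1), sinh(1), cosh(1)), nrow = 2)
rel_err <- function(X) max(abs(X - exact)) / max(abs(exact))

rel_err(expm_sym(A))      # small
rel_err(expm_taylor(A))   # much larger: large alternating terms cancel badly
```

Both routes are mathematically equivalent for this matrix. The Taylor sum passes through intermediate terms many orders of magnitude larger than the result, so rounding errors swamp the answer.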

This has been known for close to 50 years now:

  Cleve Moler and Charles Van Loan  (1978)
  Nineteen Dubious Ways to Compute the Exponential of a Matrix
  SIAM Review Vol. 20(4)


https://doi.org/10.1137/1020098



Whereas that publication had been important and much cited at
the time, the same authors (known world experts in the field)
wrote a review of that review 25 years later which I think (and
hope) is even more widely cited  (in R's man/*.Rd syntax) :

   Cleve Moler and Char