Hi,

That’s curious. Isn’t the point of non-blocking connections that you
keep them open across multiple read calls? So why would it be useful
to default-open unz() as non-blocking only to close it again?

The most similar connection type to unz() seems to me to be gzfile()
which is opened as blocking by default [1] and is also special-cased
to avoid the readLines() push-back [2]. I wonder why it’s useful for
unz() not to behave the same? Especially considering that at least
readLines() from an open non-blocking unz() is just an error currently.

Best,

Mikko

[1]: https://github.com/wch/r-source/blob/69af5a3859b0a45432a01f118dbecf5d8831085b/src/main/connections.c#L2451
[2]: https://github.com/wch/r-source/blob/69af5a3859b0a45432a01f118dbecf5d8831085b/src/main/connections.c#L4163
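
For comparison, here is a minimal sketch of the gzfile() behaviour described above. It assumes a scratch file created with tempfile(); the names gz_path, out, seen and res are only illustrative. The incomplete final line should be accepted, with a warning, rather than pushed back.

```R
## Write "hello" with no trailing newline into a gzip-compressed temp file.
gz_path <- tempfile(fileext = ".gz")
out <- gzfile(gz_path, "w")
cat("hello", file = out)
close(out)

## Collect warnings instead of printing them, then let readLines() resume.
seen <- character()
res <- withCallingHandlers(
    readLines(gzfile(gz_path)),
    warning = function(w) {
        seen <<- c(seen, conditionMessage(w))
        invokeRestart("muffleWarning")
    }
)
print(res)   # the line that was read
print(seen)  # any warning messages raised while reading
```

Running the same steps against unz() on the R version discussed in this thread is what returns character(0), so the two connection types are easy to compare side by side.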

From: Iris Simmons <ikwsi...@gmail.com>
Sent: Friday, 25 October 2024 14:38
To: Marttila Mikko <mikko.martt...@orionpharma.com>
Cc: R help Mailing list <r-help@r-project.org>
Subject: Re: [R] readLines() and unz() and non-empty final line

Hi again,


The unz connection is non-blocking by default. I checked do_unz, which calls 
R_newunz, which calls init_con. The only place in any of those functions that 
sets 'blocking' is init_con, which sets it to FALSE:

https://github.com/wch/r-source/blob/0c26529e807a9b1dd65f7324958c17bf72e1de1a/src/main/connections.c#L713

I'll open an issue on R-bugzilla and see if they're willing to do something 
similar to 'file()'; that is, add a 'blocking' argument to unz. It's hard to 
say whether they would choose 'blocking = FALSE' for back compatibility or 
'blocking = TRUE' for consistency with 'file()'.


Regards,

Iris
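
A small sketch of the asymmetry mentioned above: it inspects the R-level signatures of the connection constructors to see which ones expose a 'blocking' argument. This only looks at formal arguments, not at the C-level default set in init_con, and the result for unz() may differ in R versions newer than the one discussed here.

```R
## Which connection constructors accept a 'blocking' argument?
has_blocking <- vapply(
    list(file = file, gzfile = gzfile, unz = unz),
    function(f) "blocking" %in% names(formals(f)),
    logical(1)
)
print(has_blocking)

## The documented default for file()
print(formals(file)$blocking)
```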

On Fri, Oct 25, 2024, 04:47 Marttila Mikko <mikko.martt...@orionpharma.com> wrote:

Thanks Iris, Bert, and Tim.

Whether unz() is blocking or not by default doesn’t seem to be documented. 
Indeed, thank you Iris for finding out that explicitly opening it as blocking 
would work. That made me wonder if it’s non-blocking by default then, which 
would have been surprising. However, explicitly opening it as non-blocking 
seems to lead to problems as well:

> local({
+   con <- unz("hello.zip", "hello.txt")
+   open(con, blocking = FALSE)
+   on.exit(close(con))
+   res <- readLines(con)
+   res
+ })
Error in readLines(con) : seek not enabled for this connection
Calls: local ... eval.parent -> eval -> eval -> eval -> eval -> readLines
Execution halted

So, the behaviour of unz() seems to differ depending on whether it was 
explicitly opened before being passed to readLines(). Should this be fixed or 
documented?

Best,

Mikko

From: Bert Gunter <bgunter.4...@gmail.com>
Sent: Thursday, 24 October 2024 18:13
To: Iris Simmons <ikwsi...@gmail.com>
Cc: Marttila Mikko <mikko.martt...@orionpharma.com>; r-help@r-project.org
Subject: Re: [R] readLines() and unz() and non-empty final line

But note:

> zip("hello.zip", "hello.txt")
updating: hello.txt (stored 0%)
> readChar(unz("hello.zip","hello.txt"),100)
[1] "hello"

I leave it to you and other wiser heads to figure out.

Cheers,
Bert
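
Building on the readChar() observation, one possible workaround is to bypass readLines() and read the member as raw bytes. The sketch below assumes the member is plain text without embedded NUL bytes; read_unz_lines and zip_path are hypothetical names, and the demo only runs if an external 'zip' program is available, as zip() requires.

```R
## Hypothetical helper: read a zip member as raw bytes, then split into lines.
read_unz_lines <- function(zipfile, member) {
    ## unzip(list = TRUE) reports the uncompressed size of each member
    info <- utils::unzip(zipfile, list = TRUE)
    size <- info$Length[info$Name == member]
    stopifnot(length(size) == 1L)
    con <- unz(zipfile, member)
    open(con, "rb")
    on.exit(close(con))
    bytes <- readBin(con, what = "raw", n = size)
    ## Accept both LF and CRLF line endings
    strsplit(rawToChar(bytes), "\r?\n")[[1]]
}

## Demo in a scratch directory
old_wd <- setwd(tempdir())
cat("hello", file = "hello.txt")
zip_path <- file.path(tempdir(), "hello.zip")
if (nzchar(Sys.which("zip"))) {
    utils::zip("hello.zip", "hello.txt")
    print(read_unz_lines(zip_path, "hello.txt"))
}
setwd(old_wd)
```

Because no line-based reading is involved, the blocking state of the connection should not affect whether the final line is returned.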

On Thu, Oct 24, 2024 at 8:57 AM Iris Simmons <ikwsi...@gmail.com> wrote:

Hi Mikko,


I tried running a few different things, and it seems as though
explicitly using `open()` and opening a blocking connection works.

```R
cat("hello", file = "hello.txt")
zip("hello.zip", "hello.txt")
local({
    conn <- unz("hello.zip", "hello.txt")
    on.exit(close(conn))
    ## you can use "r" instead of "rt"
    ##
    ## 'blocking = TRUE' is the default, so remove if desired
    open(conn, "rt", blocking = TRUE)
    readLines(conn)
})
```

A blocking connection might be undesirable for you, in which case
someone else might have a better solution.
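
If opening the unz() connection explicitly is undesirable, another option is to avoid it altogether: extract the member to a temporary directory and read it as an ordinary file. This is only a sketch; read_zip_lines and zip_path are hypothetical names, and the demo is skipped when no external 'zip' program is found.

```R
## Hypothetical helper: extract one member of a zip archive to a temporary
## directory and read it as a regular file.
read_zip_lines <- function(zipfile, member) {
    exdir <- tempfile("unzipped_")
    dir.create(exdir)
    on.exit(unlink(exdir, recursive = TRUE))
    ## unzip() returns the paths of the extracted files
    path <- utils::unzip(zipfile, files = member, exdir = exdir)
    ## warn = FALSE suppresses the "incomplete final line" warning
    readLines(path, warn = FALSE)
}

## Demo in a scratch directory
old_wd <- setwd(tempdir())
cat("hello", file = "hello.txt")
zip_path <- file.path(tempdir(), "hello.zip")
if (nzchar(Sys.which("zip"))) {
    utils::zip("hello.zip", "hello.txt")
    print(read_zip_lines(zip_path, "hello.txt"))
}
setwd(old_wd)
```

The cost is a temporary copy on disk, which may matter for large archive members.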

On Thu, Oct 24, 2024 at 10:58 AM Marttila Mikko via R-help <r-help@r-project.org> wrote:
>
> Dear list,
>
> I'm seeing a strange interaction with readLines() and unz() when reading
> a file without an empty final line. The final line gets dropped silently:
>
> > cat("hello", file = "hello.txt")
> > zip("hello.zip", "hello.txt")
>   adding: hello.txt (stored 0%)
> > readLines(unz("hello.zip", "hello.txt"))
> character(0)
>
> The documentation for readLines() says if the final line is incomplete for
> "non-blocking text-mode connections" the line is "pushed back, silently"
> but otherwise "accepted with a warning".
>
> My understanding is that the unz() here is blocking so the line should be
> accepted. Is that incorrect? If so, how would I go about reading such
> lines from a zip file?
>
> Best,
>
> Mikko
>
>
> This e-mail transmission may contain confidential or legally privileged 
> information that is intended only for the individual or entity named in the 
> e-mail address. If you are not the intended recipient, you are hereby 
> notified that any disclosure, copying, distribution, or reliance upon the 
> contents of this e-mail is strictly prohibited. If you have received this 
> e-mail transmission in error, please reply to the sender, so that they can 
> arrange for proper delivery, and then please delete the message from your 
> computer systems. Thank you.
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
