On Sun, Dec 24, 2023 at 8:46 PM arkiver <arki...@protonmail.com> wrote:
Thank you for your replies, Eric and Rich, and thank you for looking into this 
with me! I will reply to you both in this message (divided into sections due to 
length).

That actually isn't that helpful, because it means that I need to trim the 
message to respond.
R$: I completely agree with EKR.

My idea of a web archive format is that it should support not only the most 
widely used modern protocols, but also the earlier (obsolete) versions, as they 
may still be in use somewhere and we have to take them into account during 
archiving. Otherwise we would have to exclude certain data from being archived, 
which might make it more difficult to allow this data to be archived in the 
future, or create confusion when support for archiving it is eventually added.

Is your current archive fetching things from servers that only do SSLv2? Or is 
this a theoretical concern?


Somewhat central to a WARC record is the URI. It shows the location and 
connection over which data was received. It is, for example, also the main 
header in WARC records used to index and find information within these WARC 
files. For me, "tls://archive.org:443" would describe "data received over a TLS 
connection at archive.org:443".

But that is incomplete. It doesn’t tell you the IP address, v4 or v6. Given 
that your first message said you were concerned about the kind of response you 
got, I would expect knowing the exact IP address you reached to be important. 
Saying “archive.org” will give you what the DNS system (and its complicated 
interaction of resolvers and DNS-based filtering) thinks it is *now*. It does 
not tell you what it was at the time of the archive fetch. Of course, IP 
addresses move as well, so that’s not perfect either. I don’t know what would be.

Your proposal also doesn’t address which protocol was used to do the fetching. 
Maybe that information is stored in another part of the WARC file, but your 
description quoted above is still incomplete. What version of HTTP are you 
using? Or is it gopher? RealPlayer audio? H3? You cannot intuit that just from 
the “443”, and if you are concerned about SSLv2, presumably you also want dead 
formats like the first two.
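To make the point concrete, here is a minimal sketch using Python's standard urllib.parse. It shows the only connection details that can be recovered from a URI like "tls://archive.org:443": a scheme, a host name, and a port. There is no field for the IP address actually contacted, the IP version, or the application protocol spoken over the connection. The helper name describe() is hypothetical, chosen for illustration.

```python
from urllib.parse import urlsplit

def describe(uri: str) -> dict:
    """Return the connection details recoverable from the URI alone (hypothetical helper)."""
    parts = urlsplit(uri)
    # Only scheme, host name, and port are present; nothing records which
    # IP address was reached or which protocol (HTTP/1.1, H3, gopher, ...) was used.
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
    }

print(describe("tls://archive.org:443"))
# {'scheme': 'tls', 'host': 'archive.org', 'port': 443}
```

The same three fields come back for "https://archive.org:443"; the URI alone cannot distinguish the cases raised above.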

Well, the URI used to retrieve the data isn't "tls:" but rather "https:". In 
any case, it's not appropriate to register a generic "tls:" URI for this use 
case.

Exactly.
_______________________________________________
TLS mailing list
TLS@ietf.org
https://www.ietf.org/mailman/listinfo/tls
