Here is a good article that helped me understand what's going wrong:
http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html
Basically, Java is stuck at 16 bits per char due to legacy reasons. They
admit that for a new language, they would probably use 32 (or 24?) bits
per char.
\u
Glad we got to the bottom of that. That's quite a nasty compiler/language
bug I must say. Not even a warning. Still, Python crashes when trying to
print the name of a null character. It wouldn't surprise me if there are
other weird issues lurking. Would definitely sleep better with a more
restricte
Ok, I just fixed the String filtering so that it can handle SMP chars
and my implementation behaves exactly like in your modified testcase
quoted below.
Bitcoinj code available on this branch, in case we decide to change the
spec:
https://github.com/schildbach/bitcoinj/commits/bip38-normalize-con
Please excuse me. I had a more thorough look at the original problem and
found that the only problem with the original test case was that you
cannot specify codepoints from the SMP using \u in Java. I always tried
\u010400 but that doesn't work.
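For illustration, a minimal sketch of why \u010400 cannot work: a Java \u escape takes exactly four hex digits, so \u010400 is read as \u0104 followed by the literal text "00". A code point above U+FFFF has to be written as a UTF-16 surrogate pair instead. The helper names below are hypothetical and not part of bitcoinj.

```python
def utf16_code_units(cp):
    """Return the UTF-16 code units for one Unicode code point.

    Assumes cp is a valid scalar value (not itself a surrogate).
    """
    if cp < 0x10000:
        return [cp]                     # BMP: one 16-bit unit is enough
    offset = cp - 0x10000               # 20 bits are left to split
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return [high, low]


def java_escape(cp):
    """Spell a code point as the \\uXXXX escapes a Java source file needs."""
    return "".join("\\u%04X" % unit for unit in utf16_code_units(cp))


if __name__ == "__main__":
    for cp in (0x03D2, 0x10400, 0x1F4A9):
        print("U+%04X -> %s" % (cp, java_escape(cp)))
```

So DESERET CAPITAL LETTER LONG I would be written as two escapes in Java source, and PILE OF POO likewise.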
Here is a fix for bitcoinj. The test now passes.
ht
If I first remove \u0000, so the non-normalized passphrase is
"\u03D2\u0301\U00010400\U0001F4A9", and then NFC normalize it, it
becomes "\u03D3\U00010400\U0001F4A9"
UTF-8 encoded this is: 0xcf93f0909080f09f92a9 (not the same as what
you got, Andreas!)
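The normalization step can be reproduced with a short sketch using Python's standard unicodedata module. The function name is hypothetical; the two passphrases are the ones discussed in this thread, with and without the null code point.

```python
import unicodedata


def normalized_utf8_hex(passphrase):
    """NFC-normalize a passphrase and return its UTF-8 bytes as hex."""
    nfc = unicodedata.normalize("NFC", passphrase)
    return nfc.encode("utf-8").hex()


# Passphrase with the null code point removed.
WITHOUT_NULL = "\u03D2\u0301\U00010400\U0001F4A9"
# Same passphrase with the null code point kept after the accent.
WITH_NULL = "\u03D2\u0301\u0000\U00010400\U0001F4A9"

if __name__ == "__main__":
    print(normalized_utf8_hex(WITHOUT_NULL))
    print(normalized_utf8_hex(WITH_NULL))
```

U+03D2 followed by U+0301 composes to U+03D3 under NFC; the two astral characters are unaffected by normalization.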
Encoding private key: 5Jajm8eQ22H3pGWLEVCXyvN
On Wed, Jul 16, 2014 at 11:29 AM, Mike Hearn wrote:
> Yes sorry, you're right, the issue starts with the null code point. Python
> seems to have problems starting there too. It might work if we took that
> out.
Forbidding control characters, at least anything < 32 makes a lot of
sense to me. Carr
Damn, I just realized that I implement only the decoding side of BIP38.
So I cannot propose a complete test vector. Here is what I have:
Passphrase: ϓ␀𐐀💩 (\u03D2\u0301\u0000\U00010400\U0001F4A9; GREEK
UPSILON WITH HOOK, COMBINING ACUTE ACCENT, NULL, DESERET CAPITAL LETTER
LONG I, PILE OF POO)
P
I will change the bitcoinj implementation and propose a new test vector.
On 07/16/2014 11:29 AM, Mike Hearn wrote:
> Yes sorry, you're right, the issue starts with the null code point.
> Python seems to have problems starting there too. It might work if we
> took that out.
>
>
> On Wed, Jul 16
Yes sorry, you're right, the issue starts with the null code point. Python
seems to have problems starting there too. It might work if we took that
out.
On Wed, Jul 16, 2014 at 11:17 AM, Andreas Schildbach
wrote:
> Guys, you are always talking about the Unicode astral plane, but in fact
> it's a
Guys, you are always talking about the Unicode astral plane, but in fact
it's a plain old (ASCII) control character where this problem starts and
likely ends: \u0000.
Let's ban/filter ISO control characters and be done with it. Most
control characters will never be enterable by any keyboard into a
I'm all for fixing bugs, but I know from bitter experience that outside the
BMP dragons lurk. Browsers don't even expose Unicode APIs at all. You end
up needing to ship an entire pure-js implementation, which can be too large
for some use cases (too much time sunk on that issue in my last job).
I'
If the user creates a password on an iOS device with an astral
character and then can't enter that password on a JVM wallet, that
sucks. If JVMs really can't support Unicode NFC then that's a strong
case to limit the spec to the subset of unicode that all popular
platforms can support, but it sound
Yes, we know, Andreas' code is indeed doing normalisation.
However it appears the output bytes end up being different. What I get back
is:
cf9300*01*303430300166346139
vs
cf9300*f0*909080f09f92a9
from the spec.
I'm not sure why. It appears this is due to the character from the astral
planes.
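Decoding both byte strings shows where they diverge. This is only a sketch for inspecting the two values above (with the emphasis asterisks removed); the function name is hypothetical.

```python
# Bytes obtained from the bitcoinj test vs. bytes given in the spec.
GOT = bytes.fromhex("cf930001303430300166346139")
SPEC = bytes.fromhex("cf9300f0909080f09f92a9")


def describe(data):
    """Decode UTF-8 bytes and list each code point as U+XXXX."""
    return ["U+%04X" % ord(ch) for ch in data.decode("utf-8")]


if __name__ == "__main__":
    print(describe(GOT))
    print(describe(SPEC))
```

The first string decodes to U+03D3, U+0000, then U+0001 followed by the ASCII text "0400", and U+0001 followed by the ASCII text "f4a9". That is consistent with each eight-digit escape having been read as \u0001 plus four literal characters, though this is an inference from the bytes rather than something confirmed here.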
I was part of adding in that test vector, and I think it's a good test
vector since it is an extreme edge-case of the current definition: If the
BIP38 proposal allows any password that can be in UTF-8, NFC normalized
form, those characters cover the various edge cases (combining characters,
null ch
On whitespace: Security UX testing I've seen shows it is mentally
easier for some users to memorize and use longer passphrases, if they
are permitted spaces. I've not seen anything written on use of
tabs/NLs/FFs in passphrases.
I can see the logic of some systems, that convert \s+ into ' ' for
p
Can you provide the rationale for standard practice? For example, why
should whitespace be allowed? I regularly use trim() on any passphrase
(or other input, for that matter).
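To make the two whitespace policies mentioned here concrete, a small sketch follows. Neither is part of BIP38; the function names are hypothetical. Note that Python's strip() removes all Unicode whitespace, whereas Java's trim() removes only characters up to U+0020.

```python
import re


def collapse_whitespace(passphrase):
    """Replace every run of whitespace (\\s+) with a single space."""
    return re.sub(r"\s+", " ", passphrase)


def trim(passphrase):
    """Drop leading and trailing whitespace, keep interior spacing."""
    return passphrase.strip()


if __name__ == "__main__":
    raw = "  correct \t horse\n\nbattery  "
    print(repr(trim(raw)))
    print(repr(trim(collapse_whitespace(raw))))
```

Either policy changes the bytes fed into key derivation, so every implementation would have to apply exactly the same one.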
So what's the action point? Should we amend the spec to filter control
characters? That would get rid of the \u0000 problem.
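As a rough sketch of what such a filter could look like (an assumption, not what the spec or bitcoinj does): drop every character in Unicode general category Cc, which covers U+0000 to U+001F and U+007F to U+009F. The function name is hypothetical.

```python
import unicodedata


def strip_control_chars(passphrase):
    """Remove all characters whose Unicode general category is Cc."""
    return "".join(
        ch for ch in passphrase
        if unicodedata.category(ch) != "Cc"
    )


if __name__ == "__main__":
    vector = "\u03D3\u0000\U00010400\U0001F4A9"
    print(strip_control_chars(vector).encode("utf-8").hex())
```

Tab and newline are also in category Cc, so this filter removes them, while the ordinary space (category Zs) and the astral characters pass through unchanged.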
On 07/15
>
> Unicode guarantees that null-terminated strings still work.
UTF-8 guarantees that. Other encodings do not, you can have null bytes in
UTF-16 strings for example. Indeed most languages that use pascal-style
encodings internally allow null characters in strings, it's just not a good
idea to exp
Unicode guarantees that null-terminated strings still work. U+0000
terminates a Unicode (or C) string. strlen() gets the string byte
count. mbstowcs() gets the character count.
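A quick sketch of the byte-count versus character-count distinction, and of which encodings can contain zero bytes, using the test passphrase without its null code point. The function names are hypothetical.

```python
TEXT = "\u03D3\U00010400\U0001F4A9"


def contains_zero_byte(text, encoding):
    """True if the encoded form contains a 0x00 byte."""
    return 0 in text.encode(encoding)


def lengths(text):
    """Count the same string three ways."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-be")) // 2,
    }


if __name__ == "__main__":
    print(lengths(TEXT))
    print(contains_zero_byte(TEXT, "utf-8"))
    print(contains_zero_byte(TEXT, "utf-16-be"))
```

UTF-8 only produces a 0x00 byte for U+0000 itself, so null-terminated handling is safe for text without that code point. In UTF-16 even this null-free string contains a zero byte (the low surrogate DC00), and plain ASCII such as "A" encodes as 00 41.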
Whitespace can be problematic, but should be allowed. Control
characters should be filtered. Emoticons probably cann
I have a python implementation that seems to pass this test vector:
https://github.com/wozz/electrum/blob/bip38_import/lib/bip38.py#L299
On Jul 15, 2014, at 9:19 AM, Andreas Schildbach wrote:
> I think generally control-characters (such as \u0000) should be
> disallowed in passphrases. (Even
I think generally control-characters (such as \u0000) should be
disallowed in passphrases. (Even the use of whitespaces is very
questionable.)
I'm ok with allowing piles of poo. On mobile phones there are keyboards
just containing emoticons -- why not allow those? Assuming NFC works of
course.
O
I don't know for sure if the test vector is correct NFC form. But for what
it's worth, the Pile of Poo character is pretty easily accessible on the
iPhone and Android keyboards, and in this string it's already in NFC form
(f09f92a9 in the test result). I've certainly seen it in usernames around
t
[+cc aaron]
We recently added an implementation of BIP 38 (password protected private
keys) to bitcoinj. It came to my attention that the third test vector may
be broken. It gives a hex version of what the NFC normalised version of the
input string should be, but this does not match the results of