Re: Make Simple Things Hard to Figure out

default0 via Digitalmars-d-learn Mon, 21 Dec 2015 10:07:01 -0800

On Monday, 21 December 2015 at 16:20:18 UTC, Adam D. Ruppe wrote:

On Monday, 21 December 2015 at 13:51:57 UTC, default0 wrote:
The thing I was trying to do was dead simple: Receive a base64-encoded text via a query parameter.
So when I read this, I thought you might have missed another little fact... there's more than one base64.

I am aware of this and I used Base64URL in my code, as does my frontend :-) Glad you pointed it out though, I really did write my post as if I missed that fact.

Yup, normal Base64 encoding uses + and / as characters, which are special in URLs, so often (but not always!), base64 url encoding uses - and _ instead.
This isn't D specific, it is just part of the confusing mess that is the real world of computer data.
Normal base64 does work in urls, as long as it is properly urlencoded. (Got enough encoding yet?!)
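
To make the two alphabets concrete, here is a minimal sketch using std.base64 (not part of the original message). The two input bytes are an arbitrary choice, picked only because they force the encoder to use the last two symbols of its alphabet.

```d
import std.base64 : Base64, Base64URL, Base64Exception;
import std.exception : assertThrown;

unittest
{
    // Arbitrary sample bytes whose encoding needs symbols 62 and 63.
    ubyte[] data = [0xFB, 0xFF];

    // Standard alphabet: '+' and '/', both of which are special in URLs.
    assert(Base64.encode(data) == "+/8=");

    // URL-safe alphabet: '-' and '_' take their place.
    assert(Base64URL.encode(data) == "-_8=");

    // Each variant only decodes its own alphabet.
    assert(Base64URL.decode("-_8=") == data);
    assertThrown!Base64Exception(Base64.decode("-_8="));
}
```

Compiling with -unittest runs the checks; the last line shows why the sender and receiver have to agree on the variant.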


Oh you can keep going, I'm not that easily scared :D

My first instinct was to use google.
Tip I tell people at work too: yes, look for it yourself, but if you don't see an answer within a few minutes, go ahead and ask us, drop a quick question in the chatroom. D has one on IRC freenode called #d.

I don't have an IRC client set up since I rarely use that, plus an IRC is always kind of "out of the way". It's good to know, but if you're a beginner trying to learn about basics of a language, standalone tutorials and/or easy-to-understand documentation with examples are miles better :-)

There is a decode function, but I couldn't quite figure out what it did or how I was supposed to use it, if it did what I wanted it to - no examples.
std.utf.decode will take a few chars and decode them into a single wchar or dchar.
Take the character “ for example, the double curly quote that Microsoft Word likes to put in when you type " on your keyboard.
“ has several different encodings as bytes.

http://www.fileformat.info/info/unicode/char/201c/index.htm

UTF-8 (hex)     0xE2 0x80 0x9C (e2809c)
UTF-16 (hex)    0x201C (201c)
UTF-32 (hex)    0x0000201C (201c)
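
As a sketch of how that table maps onto D's string types (string holds UTF-8 chars, wstring UTF-16 wchars, dstring UTF-32 dchars), the lengths below count code units, not characters. The variable names are only illustrative.

```d
import std.string : representation;

unittest
{
    string  utf8  = "\u201c";   // stored as UTF-8
    wstring utf16 = "\u201c"w;  // stored as UTF-16
    dstring utf32 = "\u201c"d;  // stored as UTF-32

    // .length counts code units, so the same character has different lengths.
    assert(utf8.length == 3);
    assert(utf16.length == 1);
    assert(utf32.length == 1);

    // representation exposes the raw bytes behind the UTF-8 string.
    immutable(ubyte)[] expected = [0xE2, 0x80, 0x9C];
    assert(utf8.representation == expected);
    assert(utf16[0] == 0x201C);
}
```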


UTF-8 is char in D. That curly quote takes up three chars:

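import std.utf : decode;

// The three UTF-8 code units (chars) that encode U+201C.
char[] curlyQuote = [0xE2, 0x80, 0x9C];
size_t idx = 0; // decode advances idx past the code units it consumed
dchar curlyQuoteAsDchar = decode(curlyQuote[], idx);

assert(curlyQuoteAsDchar == '\u201c');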
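
The index argument is what lets decode walk a longer string one code point at a time. A small sketch of that loop follows; codePoints is a hypothetical helper name, not something from Phobos.

```d
import std.utf : decode;

/// Hypothetical helper: collects every code point of a UTF-8 string.
dchar[] codePoints(string s)
{
    dchar[] result;
    size_t idx = 0;
    while (idx < s.length)
    {
        // decode reads one code point starting at idx and moves idx
        // forward by however many chars that code point occupied.
        result ~= decode(s, idx);
    }
    return result;
}

unittest
{
    // 'a' (1 char) + curly quote (3 chars) + 'b' (1 char) = 5 chars, 3 code points.
    auto cps = codePoints("a\u201cb");
    assert(cps.length == 3);
    assert(cps[1] == '\u201c');
}
```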

Nice explanation, thanks. I wish the documentation could have taught me that information as clearly as you did :-)

There's one big exception though... the validate function.

http://dlang.org/phobos/std_utf.html#validate
That works on a whole string and validates the whole sequence of chars as being valid utf8, throwing an exception if it isn't. (Weird behavior btw, I think I would have preferred `isValid` returning bool, or `validate` taking bytes and returning chars - which would be exactly what you wanted - but it returns void and throws instead :( )
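
A bool-returning wrapper of the kind wished for here is only a few lines. This is a sketch; isValidUtf8 is a hypothetical name and not a Phobos function.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true if s is well-formed UTF-8, false otherwise.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // returns void, throws UTFException on bad input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

unittest
{
    assert(isValidUtf8("plain ASCII"));
    assert(isValidUtf8("\u201cquoted\u201d"));

    // The curly quote with its last code unit missing.
    char[] truncated = [0xE2, 0x80];
    assert(!isValidUtf8(truncated));
}
```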

Well, a ubyte[] isn't exactly an array of code-points, so just calling validate and casting is confusing (even though logical if you think about it for a second). Having an API like bool tryDecode(ubyte[], char[] outBuf) except more rangified and an analogous char[] decode(ubyte[]) (also rangified) would be much easier to understand (and I would argue use, too). The task I'm trying to do is explicitly not "casting this byte array to code points" but "decode this byte array into code points". That an implementation of this functionality may simply cast the original array is an implementation detail, so going for cast(string)ubytes in the first place is kind of counter-intuitive (since I did have some D exposure for a while I managed to figure that one out without too much of a hassle though).
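
For illustration, a non-rangified version of such a tryDecode could look like the sketch below. The name tryDecodeUtf8 and the out-parameter shape are assumptions made for this example; nothing like it exists in Phobos under that name.

```d
import std.utf : validate, UTFException;

/// Hypothetical API sketch: "decodes" bytes into a string by validating
/// them as UTF-8. The cast is an implementation detail hidden from callers.
bool tryDecodeUtf8(const(ubyte)[] bytes, out string text)
{
    auto chars = cast(const(char)[]) bytes; // reinterpret, no copy yet
    try
    {
        validate(chars);
    }
    catch (UTFException)
    {
        return false; // text keeps its default (null) value
    }
    text = chars.idup; // copy so the result does not alias the input
    return true;
}

unittest
{
    ubyte[] good = [0x68, 0x69];       // "hi"
    ubyte[] bad  = [0x68, 0xFF, 0x69]; // 0xFF never appears in UTF-8

    string text;
    assert(tryDecodeUtf8(good, text) && text == "hi");
    assert(!tryDecodeUtf8(bad, text) && text is null);
}
```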

This stuff btw is pretty confusing, there's an awful lot to know about text encoding, so don't feel bad if it makes very little sense to you. I spent like four pages in my book introducing unicode as part of the discussion on D strings... and still, that left out a lot of things too...

Text encoding in general makes sense to me - I don't usually have trouble dealing with it. It was just hard to navigate the information available on how to write the code to do the necessary things in D :-)

After that I moved on to std.string. It only had one function that seemed somewhat interesting - assumeUTF. After reading through the docs, it failed my criteria since it had no validation - as its name states, it simply assumes that whatever you give it is correctly encoded. I didn't expect much here anyways, it would have been an odd place to put this functionality.
Ooooh you're close though.

If you did

---
import std.base64, std.string, std.utf;

auto utf = assumeUTF(Base64.decode(it));
validate(utf);
---

you'd probably get what you wanted...
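
Spelled out for the query-parameter case, that idea might look like the following sketch. It assumes the parameter is padded Base64URL, uses a plain cast where the snippet above uses assumeUTF, and decodeTextParam is a hypothetical name.

```d
import std.base64 : Base64URL;
import std.exception : assertThrown;
import std.string : representation;
import std.utf : validate, UTFException;

/// Hypothetical helper: Base64URL text parameter -> validated UTF-8 string.
string decodeTextParam(const(char)[] param)
{
    ubyte[] raw = Base64URL.decode(param); // throws Base64Exception if malformed
    auto text = cast(char[]) raw;          // reinterpret the bytes as chars
    validate(text);                        // throws UTFException if not UTF-8
    return text.idup;
}

unittest
{
    // Round trip: encode some non-ASCII text, then decode it again.
    auto encoded = Base64URL.encode("grüße".representation);
    assert(decodeTextParam(encoded) == "grüße");

    // Well-formed base64 that does not contain UTF-8 is rejected.
    ubyte[] notUtf8 = [0xFF, 0xFE];
    assertThrown!UTFException(decodeTextParam(Base64URL.encode(notUtf8)));
}
```

Both failure modes surface as exceptions here, so a web handler would catch them in one place and answer with a client error.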

That plus some text explaining the details should be the answer to the SO question. http://stackoverflow.com/questions/34401744/convert-ubyte-to-string-in-d is where I asked. Would be awesome if you could respond there!

Really inconvenient. It then goes on to state that it supersedes std.utf.decode, but I don't remember reading any notice in std.utf.decode that it actually was superseded and I shouldn't even really bother trying to learn about it, weird but okay.
blargh I had to look at the source to understand what these actually did


That sounds painful @_@

EncodingScheme.create("UTF-8").isValid(decodedBase64) followed by a type-system-ignoring cast from ubyte[] to char[] (since I now know it is valid so this cast is fine). All in all, including the explicit error handling required by isValid this has taken about an hour of research and 7 lines of code.
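
The original seven lines are not shown in the thread, so the following is only a guess at their shape: a sketch of the std.encoding route, with bytesToString as a hypothetical name and a plain Exception standing in for whatever error handling was actually used.

```d
import std.encoding : EncodingScheme;
import std.exception : assertThrown;

/// Hypothetical helper: checks the bytes with std.encoding, then casts.
string bytesToString(const(ubyte)[] bytes)
{
    auto utf8 = EncodingScheme.create("UTF-8");
    if (!utf8.isValid(bytes))
        throw new Exception("input is not valid UTF-8");

    // The bytes are known to be valid now, so reinterpreting them is fine.
    return (cast(const(char)[]) bytes).idup;
}

unittest
{
    ubyte[] good = [0xE2, 0x80, 0x9C]; // the curly quote from earlier
    assert(bytesToString(good) == "\u201c");

    ubyte[] bad = [0x80]; // a continuation byte with nothing before it
    assertThrown!Exception(bytesToString(bad));
}
```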
yeah that works too
So with that in mind, any ideas to improve the situation (that do not require 500 man-decades of work)?
We need a lot more examples, and not just of individual functions. Examples on how to bring the functions together to do real world tasks.

Yup, lots of things in D require composition of different parts of std. This is not easy to learn or understand unless you are quite familiar with std - or have a heap of examples for lots of different tasks somewhere.
