...hit send by accident ..or BLACK SHOULDERED KITE
You'll see all three regularly, depending on context and writer. It's always the same bird. A natural search turns up all matches, regardless of capitalization differences. The hyphens are hard....I'm a huge, huge, huge fan of fuzzy string matching and n-grams, but that's an unrelated topic. On Mon, Mar 18, 2019 at 10:18 PM Morris de Oryx <morrisdeo...@gmail.com> wrote: > Sounds like I may have touched a nerve with some. If so, no offense > intended! > > There are cases where case-sensitivity is required or desirable, it would > be silly to argue otherwise. Where you have such cases, then case-sensitive > queries are great. Some RDBMS systems default to case-sensitive searches, > others default to case-blind searches. I'll take that to mean that either > choice has merit, and it's not foolish to prefer either. I need case-blind > searches vanishingly close to 100% of the time, but other people have > different conditions and may find that they almost never need case-blind > searches. I find this extreme hard to imagine, but find it easy to imagine > people who need case-sensitive searches quite often. > > What I've been thinking about most are user-driven searches on > user-oriented data. Users, at least any user I've ever had, don't want > case-sensitive searches. They also don't care about diacritical characters, > at least in English. I worked for a French company for many years. > Diacritical searches were not always preferred, but sometimes were. It > depends on your user community and their requirements and norms. Sometimes > it comes down to an individual user. Options are good! I was really just > trying to warn someone coming from a base-blind default about Postgres > behavior because, well, it hurts if you aren't expecting it. That doesn't > make Postgres wrong (it's not that kind of a choice), but it is important > to know about. > > I'm new to Postgres (only about a year in), and it's great. But I'm used > to a case-blind search as a default. And, honestly, I can *never* remember > a case when a user asked for a case-sensitive search. Ever. In 30+ years. > Maybe it's just me. Just kidding, it's not just me. If you're presenting > users with a search interface, you can find out by asking them. Or you can > AB test a search UI where there is the option of case-sensitive/blind > searching, but you randomly flip which is the default. For users, > case-sensitive searches are assumed. That's what Google does. Seriously, > Google === Search. It's not a hard test to run. If you find that with a > case-blind search, 30% of user tick the box to make it case-sensitive, then > you've got users that often do care about case-sensitive search. > > And since it seems to be unclear at a few places in the discussion above: *It > absolutely makes sense to store data in its original form* and to allow > for case-sensitive searches, when required. It would be very weird to store > > * Call me back Ishmael, I've gotta go...* > > and get back anything else, be that > > > * call me back ishmael, i've gotta go...* > > or > > * CALL ME BACK ISHMAEL, I'VE GOTTA GO...* > > As far as I understand it in Postgres, you can: > > * Use something like UPPER or LOWER in Every. Single. Search. > > * Fold text to one case consistently in an index to make searches > case-blind. > > * Use citext. > > * Teach and convince every user to enter and search for data > case-sensitively correctly Every. Single. Time. > > On that last point, good luck. Here's an example, I'm keen on birds. Do > you write it: > > Black-shouldered Kite > Black-Shouldered Kite > > > On Sun, Mar 17, 2019 at 8:46 AM Peter J. Holzer <hjp-pg...@hjp.at> wrote: > >> On 2019-03-16 14:00:34 -0600, Rob Sargent wrote: >> > What sort of content is in your field of type text? Certainly, >> in >> > English >> > prose, “rob” is different than “Rob” >> > >> > >> > I disagree. While the grammar for written English has rules when to >> > write "rob" and when to write "Rob", that distinction usually >> carries no >> > semantic difference. Consider: >> [...] >> > I don’t think it’s solely about the semantics. One might be >> contractually >> > obligated to always spell a name in some exact way including it >> capitalization. >> > For instance if referring to "Rob Sargent” as a quote or accreditation, >> then >> > it’s not okay to let a typo “rob Sargent” go through. >> >> 1) Such contracts might exist, but they are only binding to the signing >> parties, they don't affect what is commonly understood as "the English >> language". Everybody else will see it as an obvious typo and won't >> assume that this refers to some "rob Sargent" who is a different person >> than "Rob Sargent". >> >> 2) I don't think the OP was talking about spell-checking. And in any >> case spell-checking is more complicated than simply comparing strings >> byte by byte. >> >> hp >> >> >> -- >> _ | Peter J. Holzer | we build much bigger, better disasters now >> |_|_) | | because we have much more sophisticated >> | | | h...@hjp.at | management tools. >> __/ | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/> >> -----BEGIN PGP SIGNATURE----- >> >> iQIzBAABCAAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAlyNbrwACgkQ8g5IURL+ >> KF2mKhAAq8b9RLFBihsO+dQf3pxjLt3bWoD9mhgMEy8GvxADuxAF4dLo1SqfX2S0 >> aIZFEyQgMoNKWOaiylj7TYPiBRilfVSCxggisPsirUKLKpXr4Bw9oIGiPBiE+21m >> ajlaONOZNiaM9D+BFthFkPM0TcjR2FHTaXOch0HbFnPnDWMgEPwY9yyDeN8ZPCOn >> 7G002EB3wxHcnhoFm8jGO2E8SL9l0NLU6+CVlCPAenGI0H6gbFatHi1LKTmNe+PH >> d32Il7Pf4GE0o/k92X0Yil3oqMSU/vV08LSrMVGxUDVAz+cL0/W40V/3BthELi95 >> l7kGJRJfgic7LU7IK84yszxR1ZJt1aBLyFIUpEoEsPl6XuSQsTOm/sFt/1C0L8Io >> iLYNTnLh2CveKt/bldITAAPlHZ1rF92BUsAJRA2Wci5yv5V7orykDPgE8HzhB5U9 >> m1cHfjInqMi1ogGgSLw0sPH6ZDlT76p1H8RjjQQhD9eAFNYt29xFOUARyyWHVEut >> ZyWMIeHzWpLRr/h+5y6uJ+LhH4NVMONc3unE6fy1MgIZtIcmeFrbx8+7XGKmv/kD >> RY12aNOrCF85HdiIRd/ImXOsCWWiq7u7itZQrsqhQVRXNMnV/7M4Hlpe1rBeQ9V+ >> k2FXuFM/eEtyEO+KKYwct+1iYerTTKY6hyCQwBINVewe/Fohd+A= >> =8B/x >> -----END PGP SIGNATURE----- >> >