On Mon, 30 Nov 2020 at 22:15, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
> On Mon, 30 Nov 2020 at 14:14, Peter Eisentraut
> <peter.eisentr...@enterprisedb.com> wrote:
>
>> On 2020-11-29 18:36, Pavel Stehule wrote:
>> >
>> >     I don't really get the point of this function. There is AFAICT no
>> >     function to produce this escaped format, and it's not a recognized
>> >     interchange format. So under what circumstances would one need to
>> >     use this?
>> >
>> >
>> > Some corporate data can be in CSV format with escaped unicode
>> > characters. Without this function it is not possible to decode these
>> > files without external application.
>>
>> I would like some supporting documentation on this. So far we only have
>> one stackoverflow question, and then this implementation, and they are
>> not even the same format. My worry is that if there is not precise
>> specification, then people are going to want to add things in the
>> future, and there will be no way to analyze such requests in a
>> principled way.
>>
>>
> I checked this and it is "prefix backslash-u hex" used by Java,
> JavaScript or RTF -
> https://billposer.org/Software/ListOfRepresentations.html
>
> In some languages (Python), there is decoder "unicode-escape". Java has a
> method escapeJava, for conversion from unicode to ascii. I can imagine so
> these data are from Java systems exported to 8bit strings - so this
> implementation can be accepted as referential. This format is used by
> https://docs.oracle.com/javase/8/docs/technotes/tools/unix/native2ascii.html
> tool too.
>
> Postgres can decode this format too, and the patch is based on Postgres
> implementation. I just implemented a different interface.
>
> Currently decode function does only text->bytea transformation. Maybe a
> more generic function "decode_text" and "encode_text" for similar cases can
> be better (here we need text->text transformation). But it looks like
> overengineering now.
>
> Maybe we introduce new encoding "ascii" and we can implement new
> conversions "ascii_to_utf8" and "utf8_to_ascii". It looks like the most
> clean solution. What do you think about it?
>

A better name for the new encoding would be "unicode-escape" rather than
"ascii". We already use the "to_ascii" function for a different use case.

set client_encoding to unicode-escape;
copy tab from xxx;
...

But this doesn't help when only a few columns of the table are in
unicode-escape format.

> Regards
>
> Pavel
>
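
For illustration only, here is a minimal Python sketch of decoding the "prefix backslash-u hex" format discussed above. The function name decode_backslash_u is hypothetical and is not the interface of the patch. The sketch assumes that only \uXXXX escapes (exactly four hex digits) occur, that characters above U+FFFF are written as surrogate pairs the way Java does, and it does not treat a doubled backslash as an escaped backslash.

```python
import re

# One escape: a backslash, the letter "u", and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_backslash_u(text: str) -> str:
    """Replace every \\uXXXX escape in text with the character it names.

    Hypothetical helper for illustration; everything that is not a
    \\uXXXX escape (including non-ASCII text) is passed through unchanged.
    """
    # Step 1: turn each escape into the code unit it encodes.
    units = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Step 2: merge surrogate pairs into single characters by a round trip
    # through UTF-16. "surrogatepass" lets the encoder accept surrogate
    # code units; the strict decoder then rejects any unpaired surrogate.
    return units.encode("utf-16", "surrogatepass").decode("utf-16")


if __name__ == "__main__":
    print(decode_backslash_u(r"Pavel \u0160t\u011bp\u00e1nek"))
```

Because the function works text to text, it can be applied to individual fields of a CSV row, which is the case a session-wide client_encoding setting would not cover.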