Re: proposal: unescape_text function

Pavel Stehule Mon, 30 Nov 2020 13:57:34 -0800

On Mon, 30 Nov 2020 at 22:15, Pavel Stehule <pavel.steh...@gmail.com>
wrote:


>
>
> On Mon, 30 Nov 2020 at 14:14, Peter Eisentraut <
> peter.eisentr...@enterprisedb.com> wrote:
>
>> On 2020-11-29 18:36, Pavel Stehule wrote:
>> >
>> >     I don't really get the point of this function.  There is AFAICT no
>> >     function to produce this escaped format, and it's not a recognized
>> >     interchange format.  So under what circumstances would one need to
>> >     use this?
>> >
>> >
>> > Some corporate data can be in CSV format with escaped Unicode
>> > characters. Without this function, it is not possible to decode these
>> > files without an external application.
>>
>> I would like some supporting documentation on this.  So far we only have
>> one stackoverflow question, and then this implementation, and they are
>> not even the same format.  My worry is that if there is no precise
>> specification, then people are going to want to add things in the
>> future, and there will be no way to analyze such requests in a
>> principled way.
>>
>>
> I checked this, and it is the "prefix backslash-u hex" format used by
> Java, JavaScript, and RTF; see
> https://billposer.org/Software/ListOfRepresentations.html
>
> Some languages (e.g. Python) have a "unicode-escape" decoder. For Java,
> there is the escapeJava method (StringEscapeUtils in Apache Commons) for
> conversion from Unicode to ASCII. I can imagine that this data comes from
> Java systems and was exported to 8-bit strings, so this implementation
> can be accepted as a reference. The same format is also used by the
> native2ascii tool:
> https://docs.oracle.com/javase/8/docs/technotes/tools/unix/native2ascii.html
>
> Postgres can decode this format too, and the patch is based on the Postgres
> implementation; I just implemented a different interface.
>
> Currently, the decode function only does text->bytea transformation. Maybe
> more generic "decode_text" and "encode_text" functions for similar cases
> would be better (here we need a text->text transformation), but that looks
> like overengineering for now.
>
> Maybe we could introduce a new encoding "ascii" and implement new
> conversions "ascii_to_utf8" and "utf8_to_ascii". That looks like the
> cleanest solution. What do you think?
>
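The "prefix backslash-u hex" format discussed above can be illustrated with a minimal Python sketch. This is not the patch and not PostgreSQL's code: `escape_unicode` and `unescape_unicode` are hypothetical names. It assumes that characters outside the Basic Multilingual Plane are written as a UTF-16 surrogate pair (Java strings are UTF-16), and, as an extra assumption borrowed from the U&'' literal convention, that a doubled backslash stands for one literal backslash.

```python
import re

# One escape: either "\\" (assumed to mean a literal backslash) or
# "\u" followed by exactly four hex digits (one UTF-16 code unit).
_ESCAPE = re.compile(r"\\(?:(\\)|u([0-9a-fA-F]{4}))")


def escape_unicode(text: str) -> str:
    """Return an ASCII-only copy of text using \\uXXXX escapes."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")  # assumption: a literal backslash is doubled
        elif cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:
            # Outside the BMP: emit a UTF-16 surrogate pair.
            cp -= 0x10000
            out.append(f"\\u{0xD800 + (cp >> 10):04x}")
            out.append(f"\\u{0xDC00 + (cp & 0x3FF):04x}")
    return "".join(out)


def _replace(match):
    if match.group(1) is not None:
        return "\\"
    return chr(int(match.group(2), 16))


def unescape_unicode(text: str) -> str:
    """Decode \\uXXXX escapes, merging UTF-16 surrogate pairs."""
    # Step 1: each escape becomes the code unit it names, so a pair such as
    # \ud83d\ude00 temporarily becomes two lone surrogate characters.
    units = _ESCAPE.sub(_replace, text)
    # Step 2: round-trip through UTF-16 so surrogate pairs merge into one
    # code point; an unpaired surrogate raises UnicodeDecodeError.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
```

For example, `escape_unicode("žluťoučký kůň 😀")` returns `\u017elu\u0165ou\u010dk\u00fd k\u016f\u0148 \ud83d\ude00`, and `unescape_unicode` turns that back into the original string. The explicit UTF-16 round trip is there because Python's built-in `unicode_escape` codec decodes each `\uXXXX` on its own and leaves surrogate halves unpaired.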

A better name for the new encoding would be "unicode-escape" rather than
"ascii", since we already use the "to_ascii" function for a different use case.

set client_encoding to 'unicode-escape';
copy tab from xxx;
...

but it doesn't help when only a few columns from the table are in
unicode-escape format.
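The per-column case can be illustrated outside the database with a small Python sketch, i.e. the kind of external preprocessing step the proposed function is meant to make unnecessary. It assumes a CSV file with a header row in which only some columns (here a hypothetical `name` column) contain \uXXXX escapes; `unescape_columns`, `unescape`, and the column names are illustrative, not part of the patch.

```python
import csv
import io
import re
from typing import Dict, Iterable, List

# Java-style \uXXXX escape; in this sketch a backslash that does not start
# such an escape is kept as-is.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape(value: str) -> str:
    """Decode \\uXXXX escapes in one field, merging surrogate pairs."""
    units = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


def unescape_columns(csv_text: str, columns: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV with a header row and decode only the named columns."""
    wanted = set(columns)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: unescape(value) if name in wanted else value
         for name, value in row.items()}
        for row in reader
    ]
```

With the header `id,name,raw` and the row `1,\u017elu\u0165ou\u010dk\u00fd,\u00e1`, calling `unescape_columns(data, ["name"])` decodes `name` to `žluťoučký`, while `raw` keeps the literal text `\u00e1`. A client-side encoding conversion would instead apply to every column of the stream.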




> Regards
>
> Pavel
>
>
>
