support for DIN SPEC 91379 encoding

2022-03-27 Thread Marco Lechner
Hi,

Does anyone here know whether PostgreSQL supports the DIN SPEC 91379 encoding?

As far as I understand, it is a “new” encoding that supports all “EU
characters”. It is based on Unicode but is not compliant with UTF-8. As far
as I have read, there are a few characters in DIN SPEC 91379 that are not in
UTF-8. DIN SPEC 91379 seems to be a national specification (DE), but it is
based on a European law. I guess other similar international, or at least
European, compliant encodings should exist, or at least other national specs
that are compliant with the German DIN SPEC 91379.

i.A. Dr. Marco Lechner
Leiter Fachgebiet RN 1 │ Head RN 1

--
Bundesamt für Strahlenschutz │ Federal Office for Radiation Protection
Koordination Notfallschutzsysteme │ Coordination Emergency Systems │ RN 1
Rosastr. 9
D-79098 Freiburg

Tel.: +49 30 18333-6724
E-Mail: mlech...@bfs.de
www.bfs.de
🌐 Visit our website, follow us on Twitter and subscribe to our 📢 newsletter.
🔒 Information on data protection pursuant to Article 13 GDPR
💚 Print this e-mail? Better to spare the environment!

--
Note on attachments ending in .p7m/.p7c/.p7s or .asc/.asc.sig:
The .p7? and .asc files are harmless signature files (digital signatures).
In e-mail clients with S/MIME configuration (.p7?) or a PGP extension (.asc)
they are used to:
- verify the sender
- check whether the content was altered in transit over the Internet
The signature files can also be used to send an e-mail with encrypted content
to the sender of this signature. In e-mail clients without S/MIME
configuration or a PGP extension, the files appear as attachments and can be
ignored.



Re: support for DIN SPEC 91379 encoding

2022-03-27 Thread Ralf Schuchardt
Hi Marco,

On 27 Mar 2022, at 12:54, Marco Lechner wrote:

> Hi,
>
> Does anyone here know, if postgresql supports DIN SPEC 91379 encoding?
>
> As far as I understand it is a “new” encoding supporting all “EU characters”
> based on Unicode, but is not compliant to UTF-8.  As far as I did read, there
> are a few characters in DIN SPEC 91379 that are not within UTF-8.

Where did you read that DIN SPEC 91379 is incompatible with UTF-8?

The document „String.Latin+ 1.2: eine kommentierte und erweiterte Fassung
der DIN SPEC 91379. Inklusive einer umfangreichen Liste häufig gestellter
Fragen. Herausgegeben von der Fachgruppe String.Latin“, linked at
https://www.xoev.de/downloads-2316#StringLatin, says that the spec is a
strict subset of Unicode (E.1.6). It also mentions in E.1.4 that all Unicode
characters can be encoded in UTF-8. Therefore UTF-8 can be used to encode all
DIN SPEC 91379 characters.
On the other hand, UTF-8 strings may contain characters that are not included
in the DIN SPEC.
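
A minimal Python sketch of that point. The sample letters are an
illustrative assumption (Latin, Greek and Cyrillic letters of the kind the
spec is said to cover; they are not copied from its tables), and
utf8_round_trip is a hypothetical helper name. Each sample survives a UTF-8
round trip unchanged.

```python
# Sample letters chosen for illustration only; they are NOT copied from
# the DIN SPEC 91379 tables.
samples = ["ß", "Ł", "ő", "Ω", "Ж"]


def utf8_round_trip(text: str) -> bool:
    """Return True if text survives encoding to UTF-8 and decoding back."""
    return text.encode("utf-8").decode("utf-8") == text


for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the character, and its UTF-8 bytes in hex.
    print(f"U+{ord(ch):04X} {ch!r} -> {encoded.hex()} ok={utf8_round_trip(ch)}")
```

The reverse direction does not hold: a string that is valid UTF-8 can still
contain characters outside the spec, so a conformance check would need the
spec's own character list.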

Ralf

> As DIN SPEC 91379 seems to be a national specification (DE) it is based on a
> European law. I guess other similar international or at least European
> compliant encodings should exist or at least other national specs that are
> compliant with the German DIN SPEC 91379.




Re: support for DIN SPEC 91379 encoding

2022-03-27 Thread Alvaro Herrera
On 2022-Mar-27, Ralf Schuchardt wrote:

> where did you read, that this DIN SPEC 91379 norm is incompatible with UTF-8?
> 
> In the document „String.Latin+ 1.2: eine kommentierte und erweiterte
> Fassung der DIN SPEC 91379. Inklusive einer umfangreichen Liste häufig
> gestellter Fragen. Herausgegeben von der Fachgruppe String.Latin“
> linked here https://www.xoev.de/downloads-2316#StringLatin it is said,
> that the spec is a strict subset of unicode (E.1.6), and it is also
> mentioned in E.1.4, that in UTF-8 all unicode characters can be
> encoded. Therefore UTF-8 can be used to encode all DIN SPEC 91379
> characters.

So the remaining question is whether DIN SPEC 91379 requires an
implementation to support character U+0000.  If it does, then PostgreSQL
is not conformant, because that character is the only one in Unicode
that we don't support.  If U+0000 is not required, then PostgreSQL is
okay.
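
A small client-side sketch in Python of what that restriction means in
practice: checking a string for U+0000 (NUL) before sending it to a text
column. The function names are hypothetical and this is not PostgreSQL's
own validation code, only an illustration of the one character discussed
above.

```python
def find_nul(text: str) -> int:
    """Return the index of the first U+0000 in text, or -1 if there is none."""
    return text.find("\x00")


def safe_for_postgres_text(text: str) -> bool:
    """Return True if text contains no U+0000 character."""
    return find_nul(text) == -1


for candidate in ["Müller", "Smith\x00Jones"]:
    print(repr(candidate), safe_for_postgres_text(candidate))
```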

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: support for DIN SPEC 91379 encoding

2022-03-27 Thread Tom Lane
Alvaro Herrera  writes:
> On 2022-Mar-27, Ralf Schuchardt wrote:
>> linked here https://www.xoev.de/downloads-2316#StringLatin it is said,
>> that the spec is a strict subset of unicode (E.1.6), and it is also
>> mentioned in E.1.4, that in UTF-8 all unicode characters can be
>> encoded. Therefore UTF-8 can be used to encode all DIN SPEC 91379
>> characters.

> So the remaining question is whether DIN SPEC 91379 requires an
> implementation to support character U+0000.  If it does, then PostgreSQL
> is not conformant, because that character is the only one in Unicode
> that we don't support.  If U+0000 is not required, then PostgreSQL is
> okay.

Hmm ... UTF8 as defined in RFC3629/STD63 [1] does not allow "all unicode
characters to be encoded".  It disallows surrogate pairs (U+D800--U+DFFF)
and code points above U+10FFFF.  We follow that spec, so depending on what
DIN 91379 *actually* says, we might have additional reasons not to be in
compliance.  I don't read German unfortunately.
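
The two restrictions can be observed with Python's built-in UTF-8 codec,
which happens to enforce the same limits. This is only a sketch of the rule,
not PostgreSQL's validation code, and can_encode_utf8 is a hypothetical
helper name.

```python
def can_encode_utf8(code_point: int) -> bool:
    """Return True if the code point can be encoded as strict UTF-8."""
    try:
        # chr() raises ValueError above U+10FFFF; the strict UTF-8 codec
        # raises UnicodeEncodeError for surrogates U+D800..U+DFFF.
        chr(code_point).encode("utf-8")
    except (ValueError, UnicodeEncodeError):
        return False
    return True


for cp in (0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
    print(f"U+{cp:04X}: {can_encode_utf8(cp)}")
```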

regards, tom lane

[1] http://www.faqs.org/rfcs/rfc3629.html




Re: support for DIN SPEC 91379 encoding

2022-03-27 Thread Bzm@g
U+0000 is not part of DIN SPEC 91379.

--
Boris


> On 27.03.2022 at 19:47, Alvaro Herrera wrote:
> 
> On 2022-Mar-27, Ralf Schuchardt wrote:
> 
>> where did you read, that this DIN SPEC 91379 norm is incompatible with UTF-8?
>> 
>> In the document „String.Latin+ 1.2: eine kommentierte und erweiterte
>> Fassung der DIN SPEC 91379. Inklusive einer umfangreichen Liste häufig
>> gestellter Fragen. Herausgegeben von der Fachgruppe String.Latin“
>> linked here https://www.xoev.de/downloads-2316#StringLatin it is said,
>> that the spec is a strict subset of unicode (E.1.6), and it is also
>> mentioned in E.1.4, that in UTF-8 all unicode characters can be
>> encoded. Therefore UTF-8 can be used to encode all DIN SPEC 91379
>> characters.
> 
> So the remaining question is whether DIN SPEC 91379 requires an
> implementation to support character U+0000.  If it does, then PostgreSQL
> is not conformant, because that character is the only one in Unicode
> that we don't support.  If U+0000 is not required, then PostgreSQL is
> okay.





Performance issues on FK Triggers after replacing a primary column

2022-03-27 Thread Per Kaminsky
Hi there,

I recently stumbled upon a performance issue that I can't really understand.
The issue occurred when I (roughly) did the following without a commit in
between:

  *   Replaced the PK column of a table A that is referenced by a table B. I
removed the FKs from the referencing tables B and recreated them afterwards.
  *   Next, I worked in one of the referencing tables B, updating columns.
This takes an extremely long time: updating 1000 rows, for example, now needs
35-40 seconds.
  *   The "explain" output shows that the foreign key trigger in B referencing
A causes the slowdown.
  *   Re-creating the index in B on the column referencing A does not improve
performance.
  *   If I remove the FK to A from B again, the update time shrinks back to a
few milliseconds.

The question is: what causes the FK trigger to perform worse than recreating
the FK constraint? If executed on 100k or even 1m rows, the operation takes
hours or even days.

Thank you very much.
Sincerely, Per Kaminsky


Re: Performance issues on FK Triggers after replacing a primary column

2022-03-27 Thread Adrian Klaver

On 3/27/22 09:30, Per Kaminsky wrote:

> Hi there,
>
> I recently stumbled upon a performance issue that I can't really understand.
> The issue occurred when I (roughly) did the following without a commit in
> between:
>
>   * Replaced the PK column of a table A that is referenced by a table B.
>     I removed the FKs from the referencing tables B and recreated them
>     afterwards.
>   * Next, I worked in one of the referencing tables B, updating columns.
>     This takes an extremely long time: updating 1000 rows, for example,
>     now needs 35-40 seconds.
>   * The "explain" output shows that the foreign key trigger in B
>     referencing A causes the slowdown.


Post the query and the explain.

Also have you run vacuum and/or analyze on the tables involved?
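
As a rough sketch of what to collect, the Python helper below only builds the
SQL text; it does not connect to a database. The table names a and b, the
column names, and the function name are hypothetical placeholders, not taken
from the original report. EXPLAIN ANALYZE actually executes the statement, so
the sketch wraps it in BEGIN/ROLLBACK; VACUUM cannot run inside a transaction
block, so those statements come first.

```python
def diagnostic_statements(update_sql: str, tables: list[str]) -> list[str]:
    """Build the SQL statements to run, in order, for a first diagnosis."""
    # Refresh planner statistics and clean up dead rows on each table.
    statements = [f"VACUUM (ANALYZE) {table};" for table in tables]
    # Run the slow statement with timing, then undo its effects.
    statement = update_sql.strip().rstrip(";")
    statements += [
        "BEGIN;",
        f"EXPLAIN (ANALYZE, BUFFERS) {statement};",
        "ROLLBACK;",
    ]
    return statements


# Hypothetical example: tables "a" and "b" and the columns are placeholders.
for sql in diagnostic_statements(
    "UPDATE b SET note = 'x' WHERE id <= 1000;", ["a", "b"]
):
    print(sql)
```

The EXPLAIN ANALYZE output lists the time spent in each trigger, which is
where a slow foreign key check would show up.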


>   * Re-creating the index in B on the column referencing A does not
>     improve performance.
>   * If I remove the FK to A from B again, the update time shrinks back to
>     a few milliseconds.
>
> The question is: what causes the FK trigger to perform worse than
> recreating the FK constraint? If executed on 100k or even 1m rows, the
> operation takes hours or even days.
>
> Thank you very much.
> Sincerely, Per Kaminsky




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: support for DIN SPEC 91379 encoding

2022-03-27 Thread Peter J. Holzer
On 2022-03-27 14:06:25 -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > On 2022-Mar-27, Ralf Schuchardt wrote:
> >> linked here https://www.xoev.de/downloads-2316#StringLatin it is said,
> >> that the spec is a strict subset of unicode (E.1.6), and it is also
> >> mentioned in E.1.4, that in UTF-8 all unicode characters can be
> >> encoded. Therefore UTF-8 can be used to encode all DIN SPEC 91379
> >> characters.
> 
> > So the remaining question is whether DIN SPEC 91379 requires an
> > implementation to support character U+0000.  If it does, then PostgreSQL
> > is not conformant, because that character is the only one in Unicode
> > that we don't support.  If U+0000 is not required, then PostgreSQL is
> > okay.
> 
> Hmm ... UTF8 as defined in RFC3629/STD63 [1] does not allow "all unicode
> characters to be encoded".  It disallows surrogate pairs (U+D800--U+DFFF)
> and code points above U+10FFFF.

From section 2.4 Code Points and Characters of the Unicode Standard,
Version 14.0 - Core Specification:

| In the Unicode Standard, the codespace consists of the integers from 0
| to 10FFFF₁₆, comprising 1,114,112 code points available for
| assigning the repertoire of abstract characters.

So there are no characters above U+10FFFF.

Also,

| Not all assigned code points represent abstract characters; only
| Graphic, Format, Control and Private-use do. Surrogates and
| Noncharacters are assigned code points but are not assigned to
| abstract characters.

So Surrogates aren't characters either.

UTF-8 can indeed be used to encode "all unicode characters".
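
The arithmetic and the surrogate classification can be checked with Python's
standard unicodedata module. This is a sketch; the variable names are
illustrative.

```python
import unicodedata

CODESPACE_MAX = 0x10FFFF

# Code points run from 0 to 0x10FFFF inclusive.
total_code_points = CODESPACE_MAX + 1

# The surrogate block is U+D800..U+DFFF.
surrogates = range(0xD800, 0xE000)

# General category "Cs" means "Surrogate" in the Unicode Character Database.
surrogate_categories = {unicodedata.category(chr(cp)) for cp in surrogates}

print(total_code_points)        # 1114112
print(len(surrogates))          # 2048
print(surrogate_categories)     # {'Cs'}
```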

> We follow that spec, so depending on what DIN 91379 *actually* says,
> we might have additional reasons not to be in compliance.  I don't
> read German unfortunately.

It defines a minimal character set that IT systems which process personal
and company names in the EU must accept. Basically Latin, Greek and
Cyrillic letters, digits, and some symbols and punctuation.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"

