Invalid byte sequence when importing Oracle BLOB
Hi,

The source is an Oracle 12 db with this encoding:

    NLS_LANG = AMERICAN_AMERICA.AL32UTF8
    NLS_NCHAR = AL32UTF8

The destination is an (RDS) Postgresql 12.5 with encoding UTF8, and is being loaded through COPY commands generated by ora2pg.

The source table has a BLOB column (I think they are scanned images) which I'm loading into a Postgresql bytea column.

Seven times out of about 60M rows, I get this error:

    Psql:909242: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0xbf

Is there anything I can change on the Postgresql side to make these records import correctly?

--
Angular momentum makes the world go 'round.
Re: BRIN index on timestamptz
Isn't a btree subject to these effects too? So when I update the ENUMs for each timestamptz, are btree indexes less susceptible to them than BRIN indexes?

Thanks.

On Sat, Apr 24, 2021 at 9:05 PM Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com> wrote:

> > Why not use a btree index for the timestamptz column?
>
> There are some capabilities our team lacks. Because of that, autovacuum tuning mechanisms aren't considered at all. That may change in the future.
>
> I know about basic MVCC, though. BRIN was an option because the characteristics you describe match the requirements:
>
> 1. Only recent rows are updated. One or two ENUMs, 4-5 states per ENUM for each timestamptz.
> 2. The ENUMs are not indexed. Will that help too? That is probably an unrelated question.
>
> Btree may be the default option.
>
> Thanks.
>
> On Saturday, April 24, 2021, Michael Lewis wrote:
>
>> On Sat, Apr 24, 2021, 1:27 AM Mohan Radhakrishnan <radhakrishnan.mo...@gmail.com> wrote:
>>
>>> What's your question exactly? If you have confidence that correlation will remain high (insert-only table, or occasional cluster/repack with cluster is done), then BRIN can be a good fit. If you do updates and deletes, and new tuples (inserts and updates) come in and fill in the gaps left behind in early pages even though the timestamp is high, then correlation will go down and BRIN will no longer be a good fit.
>>>
>>> Note: timestamp *with* timezone is recommended.
>>>
>>> The timestamptz isn't deleted or updated. It is only inserted. Another ENUM column will be updated. It looks like I should use BRIN. We also have other history tables like this.
>>>
>>> Thanks.
>>
>> That's not a correct conclusion. Reply-all, btw.
>>
>> Updating any value in the row means a new version of the row is inserted and the old one is marked expired, to be cleaned up by vacuum once no transaction might still need that row version (tuple). Research a bit how MVCC is implemented in Postgres.
>>
>> If those updates only touch recent rows and autovacuum is tuned aggressively enough to keep the maintenance task under control, then the correlation may remain high, since only recent rows are being updated. If the updates go back a long way, then BRIN is unlikely to be a good fit. It might still work if the table fillfactor is lowered a bit and the enum is not indexed so you get mostly HOT (heap-only tuple) updates, but I wouldn't count on it.
>>
>> Why not use a btree index for the timestamptz column?
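A minimal sketch of the fillfactor/HOT idea mentioned above. The table name, column, and fillfactor value are assumptions for illustration, not taken from the thread:

    -- Hypothetical history table: a lower fillfactor leaves free space in each
    -- page so updates to the ENUM column can stay on the same page (HOT updates),
    -- which helps preserve the physical ordering a BRIN index depends on.
    -- (Affects newly written pages; existing data needs a rewrite, e.g. pg_repack,
    -- to pick it up.)
    ALTER TABLE history_events SET (fillfactor = 90);

    -- Verify that updates really are HOT: n_tup_hot_upd should track n_tup_upd closely.
    SELECT relname, n_tup_upd, n_tup_hot_upd
    FROM   pg_stat_user_tables
    WHERE  relname = 'history_events';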
Re: BRIN index on timestamptz
On 2021-04-26 17:23:49 +0530, Mohan Radhakrishnan wrote:
> Isn't a btree subject to these effects too? So when I update the ENUMs
> for each timestamptz, are btree indexes less susceptible to them than
> BRIN indexes?

A btree index contains one entry for each record, which points to that record. If you select a small range of values via a btree index, in the worst case you will have one random seek per row. This is not ideal, but it doesn't matter much if the number of records is low.

A BRIN index contains a minimum and maximum value per range of blocks. In the worst case (each block range contains a totally random sample of values) the minimum for each block range will be near the minimum of the whole table and the maximum of each block range will be near the maximum of the whole table, so when searching, the BRIN index will exclude very few block ranges.

So a BRIN index works best when each block range contains only a small range of indexed values. If you index on a timestamp, this works nicely if you either don't update rows at all after inserting them, or only update them for a short time relative to the total time spanned by the table. So if your table contains, say, records from the last year, and records are normally only updated for one or two days after being created, that would probably still work quite well. If a substantial number of records are still updated after a year, it probably won't work at all.

        hp

--
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"Story must make more sense than reality." -- Charles Stross, "Creative writing challenge!"
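A minimal sketch of the choice being discussed, using a hypothetical history_events table and created_at column; the pg_stats query is simply a way to check whether physical row order still tracks the timestamp before committing to BRIN:

    -- How well does the physical row order still follow the timestamp column?
    -- Values near 1.0 mean each block range holds a narrow slice of timestamps,
    -- which is exactly when BRIN pays off.
    SELECT tablename, attname, correlation
    FROM   pg_stats
    WHERE  tablename = 'history_events'
    AND    attname   = 'created_at';

    -- If correlation stays high, a BRIN index is tiny and effective:
    CREATE INDEX history_events_created_brin
        ON history_events USING brin (created_at);

    -- Otherwise a plain btree is the safer default:
    CREATE INDEX history_events_created_btree
        ON history_events (created_at);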
Re: Invalid byte sequence when importing Oracle BLOB
On 2021-04-26 06:49:18 -0500, Ron wrote:
> The destination is an (RDS) Postgresql 12.5 with encoding UTF8, and is being
> loaded through COPY commands generated by ora2pg.
>
> The source table has a BLOB column (I think they are scanned images) which
> I'm loading into a Postgresql bytea column.
>
> Seven times out of about 60M rows, I get this error:
> Psql:909242: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0xbf

Decoding UTF-8 doesn't make sense for a bytea column. What does that data look like in the file generated by ora2pg?

        hp

--
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"Story must make more sense than reality." -- Charles Stross, "Creative writing challenge!"
Re: Invalid byte sequence when importing Oracle BLOB
On 4/26/21 7:32 AM, Peter J. Holzer wrote:
> On 2021-04-26 06:49:18 -0500, Ron wrote:
>> The destination is an (RDS) Postgresql 12.5 with encoding UTF8, and is being
>> loaded through COPY commands generated by ora2pg.
>>
>> The source table has a BLOB column (I think they are scanned images) which
>> I'm loading into a Postgresql bytea column.
>>
>> Seven times out of about 60M rows, I get this error:
>> Psql:909242: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0xbf
>
> Decoding UTF-8 doesn't make sense for a bytea column. What does that data
> look like in the file generated by ora2pg?

I thought it was weird, too, but COPY has to read text, no?

Anyway, here are the first 8 lines (beware line wrapping) of the input file:

    SET client_encoding TO 'UTF8';
    SET synchronous_commit TO off;
    SET search_path = strans,public;
    TRUNCATE TABLE mv_response_attachment_old;
    COPY mv_response_attachment_old (response_attachement_id,binary_data,employer_response_id,attachment_id_code,file_type,attachment_desc,attachment_size,file_name,partition_date,prior_incident_id,part_date) FROM STDIN;
    1583201 \\x255044462d312e330d25e2e3cfd30d0a31362030206f...

It looks like a bog-standard COPY command, with which I've imported a lot of data.

--
Angular momentum makes the world go 'round.
Re: Invalid byte sequence when importing Oracle BLOB
On 2021-04-26 07:45:26 -0500, Ron wrote:
> On 4/26/21 7:32 AM, Peter J. Holzer wrote:
> > On 2021-04-26 06:49:18 -0500, Ron wrote:
> > > The destination is an (RDS) Postgresql 12.5 with encoding UTF8, and is being
> > > loaded through COPY commands generated by ora2pg.
> > >
> > > The source table has a BLOB column (I think they are scanned images) which
> > > I'm loading into a Postgresql bytea column.
> > >
> > > Seven times out of about 60M rows, I get this error:
> > > Psql:909242: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0xbf
> >
> > Decoding UTF-8 doesn't make sense for a bytea column. What does that data
> > look like in the file generated by ora2pg?
>
> I thought it was weird, too, but COPY has to read text, no?

Yes, but data for a bytea column would normally be encoded in hex or something like that ...

> COPY mv_response_attachment_old (response_attachement_id,binary_data,employer_response_id,attachment_id_code,file_type,attachment_desc,attachment_size,file_name,partition_date,prior_incident_id,part_date) FROM STDIN;
> 1583201 \\x255044462d312e330d25e2e3cfd30d0a31362030206f...

... Yes, like this. There are only hex digits (plus \ and x) in that column, nothing which would require decoding UTF-8.

My guess is that the error is actually in the data for another column. I'd try to identify the broken records and check whether they contain some other strange content.

        hp

--
Peter J. Holzer | h...@hjp.at | http://www.hjp.at/
"Story must make more sense than reality." -- Charles Stross, "Creative writing challenge!"
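A sketch of one way to follow that suggestion. The staging table, the choice of suspect columns, and the helper function are assumptions for illustration; note also that the "Psql:909242" prefix in the error already gives the line number in the dump file, so inspecting that line directly is the quickest first check.

    -- Load the suspect text columns as bytea in a throwaway staging table so COPY
    -- accepts any byte sequence, then flag rows whose bytes are not valid UTF-8.
    -- Column list shortened here; it must match the real COPY column list.
    CREATE TABLE mv_response_attachment_stage (
        response_attachement_id bigint,
        binary_data             bytea,
        attachment_desc         bytea,   -- normally text; kept raw on purpose
        file_name               bytea    -- normally text; kept raw on purpose
    );

    CREATE OR REPLACE FUNCTION is_valid_utf8(b bytea) RETURNS boolean
    LANGUAGE plpgsql IMMUTABLE AS $$
    BEGIN
        PERFORM convert_from(b, 'UTF8');   -- raises an error on invalid bytes
        RETURN true;
    EXCEPTION WHEN OTHERS THEN
        RETURN false;
    END;
    $$;

    -- After COPYing the same dump into the staging table:
    SELECT response_attachement_id, attachment_desc, file_name
    FROM   mv_response_attachment_stage
    WHERE  NOT is_valid_utf8(attachment_desc)
       OR  NOT is_valid_utf8(file_name);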
Re: Invalid byte sequence when importing Oracle BLOB
On Monday, April 26, 2021 at 06:49:18 a.m. -0500, Ron wrote:

> Hi,
>
> The source is an Oracle 12 db with this encoding:
> NLS_LANG = AMERICAN_AMERICA.AL32UTF8
> NLS_NCHAR = AL32UTF8
>
> The destination is an (RDS) Postgresql 12.5 with encoding UTF8, and is being
> loaded through COPY commands generated by ora2pg.
>
> The source table has a BLOB column (I think they are scanned images) which
> I'm loading into a Postgresql bytea column.
>
> Seven times out of about 60M rows, I get this error:
> Psql:909242: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0xbf

This error can't be caused by a BLOB or bytea column. Only char or varchar columns can cause (and did cause) such errors in our Oracle/Sybase to PostgreSQL migrations.

        matthias

--
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Re: impact of version upgrade on fts
Thanks, I'll check it out.

*"None of you has faith until he loves for his brother or his neighbor what he loves for himself."*

On Sun, Apr 25, 2021 at 2:32 PM Robert Treat wrote:

> On Sun, Apr 25, 2021 at 11:27 AM Adrian Klaver wrote:
> >
> > On 4/25/21 5:28 AM, Malik Rumi wrote:
> > > Greetings.
> > >
> > > I am about to do a long overdue upgrade to the latest version, which I
> > > believe is 13.2. However, I have full text search in my current install
> > > (9.4) and I am wondering if there are any special provisions I need to
> > > take to make sure that is not lost in the transition? If this would be
> > > true for any other packages, please advise and/or point me to the place
> > > in the docs where I can read up on this. I looked and did not see
> > > anything, which just proves I didn't know what to look for. Thanks.
> >
> > FTS used to be a separate extension (tsearch2) before Postgres version 8.3.
> > Since then it has been integrated into the core code, so it would not be
> > lost. That being said, it would be advisable to read the release notes for
> > 9.5 --> 13 to see what changed in the core code.
>
> You should also check the release notes / process of any "non-core"
> extensions you might be using; for example, PostGIS has had a number of
> changes and you'll need to upgrade the extension itself to work in the
> new version of Postgres. Specifics around that will also depend on how
> you intend to run your upgrade process.
>
> Robert Treat
> https://xzilla.net
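As a concrete illustration of that advice (the extension name is only an example; what actually needs updating depends on what is installed):

    -- After restoring into, or pg_upgrade-ing to, the new cluster, compare the
    -- extension versions the database records with what the new packages provide:
    SELECT e.extname, e.extversion AS installed, a.default_version AS available
    FROM   pg_extension e
    JOIN   pg_available_extensions a ON a.name = e.extname;

    -- Then bring each non-core extension up to the packaged version, e.g.:
    ALTER EXTENSION postgis UPDATE;   -- example name; repeat per extension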
Re: Invalid byte sequence when importing Oracle BLOB
On 4/26/21 7:58 AM, Peter J. Holzer wrote:
> On 2021-04-26 07:45:26 -0500, Ron wrote:
>> On 4/26/21 7:32 AM, Peter J. Holzer wrote:
>>> On 2021-04-26 06:49:18 -0500, Ron wrote:
>>>> The destination is an (RDS) Postgresql 12.5 with encoding UTF8, and is
>>>> being loaded through COPY commands generated by ora2pg.
>>>>
>>>> The source table has a BLOB column (I think they are scanned images) which
>>>> I'm loading into a Postgresql bytea column.
>>>>
>>>> Seven times out of about 60M rows, I get this error:
>>>> Psql:909242: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0xbf
>>>
>>> Decoding UTF-8 doesn't make sense for a bytea column. What does that data
>>> look like in the file generated by ora2pg?
>>
>> I thought it was weird, too, but COPY has to read text, no?
>
> Yes, but data for a bytea column would normally be encoded in hex or
> something like that ...
>
>> COPY mv_response_attachment_old (response_attachement_id,binary_data,employer_response_id,attachment_id_code,file_type,attachment_desc,attachment_size,file_name,partition_date,prior_incident_id,part_date) FROM STDIN;
>> 1583201 \\x255044462d312e330d25e2e3cfd30d0a31362030206f...
>
> ... Yes, like this. There are only hex digits (plus \ and x) in that column,
> nothing which would require decoding UTF-8.
>
> My guess is that the error is actually in the data for another column. I'd
> try to identify the broken records and check whether they contain some other
> strange content.

That's a good idea. Thanks.

--
Angular momentum makes the world go 'round.
Re: Invalid byte sequence when importing Oracle BLOB
On Mon, 2021-04-26 at 06:49 -0500, Ron wrote:
> The source is an Oracle 12 db with this encoding:
> NLS_LANG = AMERICAN_AMERICA.AL32UTF8
> NLS_NCHAR = AL32UTF8
>
> The destination is an (RDS) Postgresql 12.5 with encoding UTF8, and is being
> loaded through COPY commands generated by ora2pg.
>
> The source table has a BLOB column (I think they are scanned images) which
> I'm loading into a Postgresql bytea column.
>
> Seven times out of about 60M rows, I get this error:
> Psql:909242: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0xbf
>
> Is there anything I can change on the Postgresql side to make these records
> import correctly?

The "bytea" column is probably a red herring - this error message should be caused by a "text" or "varchar" or other string data column.

It is surprisingly easy to enter corrupt strings into Oracle - just set the client encoding to the same value as the server encoding, and it won't check the integrity of your strings.

If that is your problem, you must identify and fix the data in Oracle.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
DB size
Hi!

I've looked around, but could only find very old answers to this question, and maybe it has changed since then...

I'm struggling to identify the cause of the difference in size between the sum of all relations (via pg_total_relation_size) and pg_database_size:

    SELECT sum(pg_total_relation_size(relid)),
           pg_size_pretty(sum(pg_total_relation_size(relid)))
    FROM pg_catalog.pg_stat_all_tables;

         sum      | pg_size_pretty
    --------------+----------------
     518549716992 | 483 GB

    SELECT pg_database_size('mydb'),
           pg_size_pretty(pg_database_size('mydb'));

     pg_database_size | pg_size_pretty
    ------------------+----------------
         869150909087 | 809 GB

There are three databases in the cluster. Apart from 'mydb', the other two (one of them being the 'postgres' database) are about 8 MB each.

We run pg_repack weekly and recently had crashes related to the disk running out of space, so my guess is something got 'lost' during a repack.

What can I do to recover the wasted space?

Thanks!

Luis R. Weck
Re: DB size
On Mon, Apr 26, 2021 at 22:25, the sender wrote:
>
> Hi!
>
> I've looked around, but could only find very old answers to this question,
> and maybe it has changed since then...
>
> I'm struggling to identify the cause of the difference in size between the
> sum of all relations (via pg_total_relation_size) and pg_database_size:
>
> SELECT sum(pg_total_relation_size(relid)),
>        pg_size_pretty(sum(pg_total_relation_size(relid)))
> FROM pg_catalog.pg_stat_all_tables;
>
>      sum      | pg_size_pretty
> --------------+----------------
>  518549716992 | 483 GB
>
> SELECT pg_database_size('mydb'),
>        pg_size_pretty(pg_database_size('mydb'));
>
>  pg_database_size | pg_size_pretty
> ------------------+----------------
>      869150909087 | 809 GB
>
> There are three databases in the cluster. Apart from 'mydb', the other two
> (one of them being the 'postgres' database) are about 8 MB each.

Do you have any indexes in the mydb database?

> We run pg_repack weekly and recently had crashes related to the disk running
> out of space, so my guess is something got 'lost' during a repack.
>
> What can I do to recover the wasted space?
>
> Thanks!
>
> Luis R. Weck
Re: DB size
On 2021-Apr-26, luis.robe...@siscobra.com.br wrote:

> SELECT sum(pg_total_relation_size(relid)),
>        pg_size_pretty(sum(pg_total_relation_size(relid)))
> FROM pg_catalog.pg_stat_all_tables;
>
>      sum      | pg_size_pretty
> --------------+----------------
>  518549716992 | 483 GB
>
> SELECT pg_database_size('mydb'),
>        pg_size_pretty(pg_database_size('mydb'));
>
>  pg_database_size | pg_size_pretty
> ------------------+----------------
>      869150909087 | 809 GB
>
> There are three databases in the cluster. Apart from 'mydb', the other two
> (one of them being the 'postgres' database) are about 8 MB each.

I would guess that there are leftover files because of those crashes you mentioned. You can probably look for files in the database subdir in the data directory that do not appear in the pg_class.relfilenode listing for the database.

--
Álvaro Herrera                              39°49'30"S 73°17'W
really, I see PHP as like a strange amalgamation of C, Perl, Shell
inflex: you know that "amalgam" means "mixture with mercury",
        more or less, right? i.e., "deadly poison"
Re: DB size
----- Original message -----
> From: "Josef Šimánek"
> To: "luis.roberto"
> Cc: "pgsql-general"
> Sent: Monday, April 26, 2021 17:40:05
> Subject: Re: DB size

> Do you have any indexes in the mydb database?

Yes, I do. I believe pg_total_relation_size accounts for these. These are the results for my biggest table:

    SELECT pg_table_size('my_table'),
           pg_size_pretty(pg_table_size('my_table')) AS tb_pretty,
           pg_indexes_size('my_table'),
           pg_size_pretty(pg_indexes_size('my_table')) AS idx_pretty,
           pg_total_relation_size('my_table'),
           pg_size_pretty(pg_total_relation_size('my_table')) AS total_pretty;

     pg_table_size | tb_pretty | pg_indexes_size | idx_pretty | pg_total_relation_size | total_pretty
    ---------------+-----------+-----------------+------------+------------------------+--------------
       82016485376 | 76 GB     |     88119033856 | 82 GB      |           170135519232 | 158 GB

Luis R. Weck
Re: DB size
----- Original message -----
> From: "Alvaro Herrera"
> To: "luis.roberto"
> Cc: "pgsql-general"
> Sent: Monday, April 26, 2021 17:45:34
> Subject: Re: DB size

> I would guess that there are leftover files because of those crashes you
> mentioned. You can probably look for files in the database subdir in
> the data directory that do not appear in the pg_class.relfilenode
> listing for the database.

Thanks Alvaro!

That's what I read in an old thread, back in the 8.3 days... Can you point me in the right direction as to where should I look, and how do I know which files exactly are safe to remove?

Luis R. Weck
-1/0 virtualtransaction
Hi -

Does anyone have any pointers on what a virtualtransaction of '-1/0' means? I'm using SSI and an example is:

 locktype | database | relation | page  | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid |    mode    | granted | fastpath
----------+----------+----------+-------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------------+---------+----------
 page     |    16384 |   468238 | 19216 |       |            |               |         |       |          | -1/0               |     | SIReadLock | t       | f

This is incredibly hard to search for.

I see these for page, tuple and (perhaps a clue?) one relation, and I'm seeing the page and tuple locks accumulate over time. Eventually this results in an "out of shared memory" error.

Any help is appreciated.

Thanks,
Mike
Re: -1/0 virtualtransaction
Mike Beachy writes:
> Does anyone have any pointers on what a virtualtransaction of '-1/0' means?
> I'm using SSI and an example is:
>
>  locktype | database | relation | page  | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid |    mode    | granted | fastpath
> ----------+----------+----------+-------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------------+---------+----------
>  page     |    16384 |   468238 | 19216 |       |            |               |         |       |          | -1/0               |     | SIReadLock | t       | f

Hmm, that's an invalid VXID, which would ordinarily mean that nothing is holding the lock. There is a passing mention in mvcc.sgml that SIRead locks sometimes need to be held past transaction end, so maybe what you're looking at is such a lock that is no longer associated with a specific transaction. I have to disclaim knowing much of anything about the SSI code, though.

            regards, tom lane
Re: -1/0 virtualtransaction
On Mon, Apr 26, 2021 at 6:16 PM Tom Lane wrote:

> Hmm, that's an invalid VXID, which would ordinarily mean that nothing
> is holding the lock. There is a passing mention in mvcc.sgml that
> SIRead locks sometimes need to be held past transaction end, so maybe
> what you're looking at is such a lock that is no longer associated
> with a specific transaction. I have to disclaim knowing much of
> anything about the SSI code, though.

Hmm, yeah. I had seen discussion of this "held past transaction end" but didn't know what to make of it. The "Serializable Snapshot Isolation in PostgreSQL" paper (https://drkp.net/papers/ssi-vldb12.pdf) makes me think this is a reasonable line of inquiry.

Thanks,
Mike
About to find all foreign tables are being used by sproc and view and function
Hi Respected Team,

I need to find the foreign tables used in functions, sprocs, and views. How can I find all foreign tables being used by a sproc, view, or function?

Thanks & Regards,
Durgamahesh Manne
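A possible starting point, offered as a sketch rather than a definitive answer: view dependencies on foreign tables are recorded in the catalogs, while dependencies inside function and procedure bodies are not, so a text search of the source is the usual fallback. The search pattern below is illustrative only.

    -- Foreign tables referenced by views (dependencies recorded via pg_rewrite):
    SELECT DISTINCT v.oid::regclass  AS view_name,
                    ft.oid::regclass AS foreign_table
    FROM   pg_depend  d
    JOIN   pg_rewrite r  ON r.oid  = d.objid
    JOIN   pg_class   v  ON v.oid  = r.ev_class
    JOIN   pg_class   ft ON ft.oid = d.refobjid
    WHERE  d.classid    = 'pg_rewrite'::regclass
    AND    d.refclassid = 'pg_class'::regclass
    AND    ft.relkind   = 'f';

    -- Functions and procedures: table references inside the body are not tracked,
    -- so search the source text (the pattern is just an example):
    SELECT p.oid::regprocedure AS routine, p.prokind
    FROM   pg_proc p
    WHERE  p.prosrc ILIKE '%my_foreign_table%';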
Re: -1/0 virtualtransaction
On Mon, 2021-04-26 at 17:45 -0400, Mike Beachy wrote:
> Does anyone have any pointers on what a virtualtransaction of '-1/0' means?
>
> I'm using SSI and an example is:
>
>  locktype | database | relation | page  | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid |    mode    | granted | fastpath
> ----------+----------+----------+-------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------------+---------+----------
>  page     |    16384 |   468238 | 19216 |       |            |               |         |       |          | -1/0               |     | SIReadLock | t       | f
>
> This is incredibly hard to search for.
>
> I see these for page, tuple and (perhaps a clue?) one relation, and I'm
> seeing the page and tuple locks accumulate over time.
> Eventually this results in an "out of shared memory" error.

Not sure, but do you see prepared transactions in "pg_prepared_xacts"?

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
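A minimal sketch of the checks raised in this thread: prepared transactions and long-lived sessions can pin SIRead locks, and pg_locks shows how many predicate locks have accumulated. Only standard system views are used; whether raising max_pred_locks_per_transaction is the right remedy depends on the workload.

    -- Prepared transactions can keep predicate (SIRead) locks alive indefinitely:
    SELECT gid, prepared, owner, database
    FROM   pg_prepared_xacts
    ORDER  BY prepared;

    -- Long-running sessions also keep SIRead locks from being released:
    SELECT pid, state, xact_start, now() - xact_start AS xact_age
    FROM   pg_stat_activity
    WHERE  xact_start IS NOT NULL
    ORDER  BY xact_start;

    -- How many predicate locks exist, and at what granularity?
    SELECT locktype, count(*)
    FROM   pg_locks
    WHERE  mode = 'SIReadLock'
    GROUP  BY locktype;

    -- If the accumulation turns out to be legitimate, the "out of shared memory"
    -- errors are commonly addressed by raising max_pred_locks_per_transaction
    -- (a restart is required for that setting).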
Re: DB size
Hi,

On Mon, Apr 26, 2021 at 22:59, ... wrote:

> ----- Original message -----
> > From: "Alvaro Herrera"
> > To: "luis.roberto"
> > Cc: "pgsql-general"
> > Sent: Monday, April 26, 2021 17:45:34
> > Subject: Re: DB size
>
> > I would guess that there are leftover files because of those crashes you
> > mentioned. You can probably look for files in the database subdir in
> > the data directory that do not appear in the pg_class.relfilenode
> > listing for the database.
>
> Thanks Alvaro!
>
> That's what I read in an old thread, back in the 8.3 days... Can you point
> me in the right direction as to where should I look, and how do I know which
> files exactly are safe to remove?

pg_orphaned is probably what you need. See https://github.com/bdrouvot/pg_orphaned for details.

--
Guillaume.
Re: DB size
On Mon, 2021-04-26 at 16:45 -0400, Alvaro Herrera wrote:
> I would guess that there are leftover files because of those crashes you
> mentioned. You can probably look for files in the database subdir in
> the data directory that do not appear in the pg_class.relfilenode
> listing for the database.

Not all tables have their "relfilenode" set:

    SELECT relfilenode FROM pg_class WHERE relname = 'pg_class';

     relfilenode
    ═════════════
               0
    (1 row)

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
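Tying the last two replies together, a sketch of the catalog-side check. It uses pg_relation_filenode() instead of reading pg_class.relfilenode directly, to cover relations that report 0, and it only lists candidates; pg_ls_dir() needs appropriate privileges, and each file should be verified (for example with the pg_orphaned extension mentioned above) before anything is deleted.

    -- Files in the current database's directory whose names look like relation
    -- filenodes but which no relation claims. Candidates only; double-check
    -- before removing any file from the data directory.
    SELECT f.file
    FROM   pg_ls_dir('base/' || (SELECT oid
                                 FROM   pg_database
                                 WHERE  datname = current_database())) AS f(file)
    WHERE  f.file ~ '^[0-9]+$'                 -- main fork segments only
    AND    f.file NOT IN (
               SELECT pg_relation_filenode(c.oid::regclass)::text
               FROM   pg_class c
               WHERE  pg_relation_filenode(c.oid::regclass) IS NOT NULL
           );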