Working around, or with, bitmap heap scan?

2018-10-31 Thread James A. Robinson
Hello,

I'm newly exposed to a Postgres 9.4 database system, and am
trying to understand how I might optimize a query that is taking
a long time to return.

What I'm observing is that an uncached query takes much, much
longer to complete, sometimes minutes longer, when
enable_bitmapscan is true than when it is false.  Even after the
query result is cached, it is consistently about 300 ms slower
with enable_bitmapscan on (900 ms to return instead of 600 ms).

This is on an AWS RDS Postgres 9.4 instance, backed by SSD, with
work_mem set to 1 GB (I'm not sure if that is a default or if
someone set it to that value on purpose).  Autovacuum is on and
ANALYZE has been run against the db.
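(I suppose I could check pg_settings to see whether work_mem was
set explicitly, e.g.:

SELECT name, setting, source
FROM pg_settings
WHERE name IN ('work_mem', 'random_page_cost', 'enable_bitmapscan');

where the "source" column distinguishes 'default' from, say,
'configuration file'.)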

I'm hoping for some guidance from the experts here on whether I
might be able to change something to get the planner to select
the faster index scan that runs when enable_bitmapscan is false.
Or perhaps I should try to craft a new index to reduce the chance
that it needs the bitmap?
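For example, I was wondering whether something like this (index
name and column order are just my guess) might let the planner
satisfy the scan from the index alone, since it only appears to
need cs_id for the filter and resource_id for the join:

CREATE INDEX CONCURRENTLY cs_resource_by_cs_and_resource_idx
    ON cs_resource (cs_id, resource_id);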

I tried seeing whether reducing random_page_cost from the default
1.1 to 1.0 effected any change, and it did not.
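(For reference, I've been running these experiments
session-locally, along the lines of:

BEGIN;
SET LOCAL enable_bitmapscan = off;
SET LOCAL random_page_cost = 1.0;
EXPLAIN SELECT ...;  -- the query shown further below
ROLLBACK;

so nothing leaks into other sessions.)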

Here's the part of the EXPLAIN that appears to change when
enable_bitmapscan is true:

->  Hash Left Join  (cost=24696.39..594784.63 rows=11732 width=281)
      Hash Cond: ((cr.cs_id)::text = (cm.raw_cs_id)::text)
      ->  Hash Join  (cost=24680.76..594368.09 rows=11732 width=135)
            Hash Cond: (cr.resource_id = r_1.id)
            ->  Bitmap Heap Scan on cs_resource cr  (cost=6999.99..569213.13 rows=588549 width=27)
                  Recheck Cond: ((cs_id)::text = ANY ('{example.org,_zero_row_}'::text[]))
                  ->  Bitmap Index Scan on cs_resource_by_cs_idx  (cost=0.00..6852.86 rows=588549 width=0)
                        Index Cond: ((cs_id)::text = ANY ('{example.org,_zero_row_}'::text[]))
            ->  Hash  (cost=17576.61..17576.61 rows=8333 width=124)
                  ->  Hash Right Join  (cost=4132.63..17576.61 rows=8333 width=124)
                        Hash Cond: (pr.id = r_1.id)
                        ->  Seq Scan on pub_resource pr  (cost=0.00..8135.29 rows=418029 width=69)
                        ->  Hash  (cost=4028.47..4028.47 rows=8333 width=63)
                              ->  Index Scan using "idx_1" on resource r_1  (cost=0.42..4028.47 rows=8333 width=63)
                                    Index Cond: ((pl_id = 6) AND ((data_type_id)::text = 'DATABASE'::text))
      ->  Hash  (cost=12.50..12.50 rows=250 width=292)
            ->  Seq Scan on cs_mappings cm  (cost=0.00..12.50 rows=250 width=292)

compared to when enable_bitmapscan is false:

->  Hash Left Join  (cost=17509.48..648717.08 rows=11732 width=281)
      Hash Cond: (r_1.id = pr.id)
      ->  Hash Left Join  (cost=4148.83..635121.79 rows=11732 width=228)
            Hash Cond: ((cr.cs_id)::text = (cm.raw_cs_id)::text)
            ->  Hash Join  (cost=4133.20..634705.25 rows=11732 width=82)
                  Hash Cond: (cr.resource_id = r_1.id)
                  ->  Index Scan using "idx_2" on cs_resource cr  (cost=0.57..623098.44 rows=588549 width=27)
                        Index Cond: ((cs_id)::text = ANY ('{example.org,_zero_row_}'::text[]))
                  ->  Hash  (cost=4028.47..4028.47 rows=8333 width=63)
                        ->  Index Scan using "idx_1" on resource r_1  (cost=0.42..4028.47 rows=8333 width=63)
                              Index Cond: ((pl_id = 6) AND ((data_type_id)::text = 'DATABASE'::text))
            ->  Hash  (cost=12.50..12.50 rows=250 width=292)
                  ->  Seq Scan on cs_mappings cm  (cost=0.00..12.50 rows=250 width=292)
      ->  Hash  (cost=8135.29..8135.29 rows=418029 width=69)
            ->  Seq Scan on pub_resource pr  (cost=0.00..8135.29 rows=418029 width=69)

I'm not sure whether the following details are important, but
erring on the side of more detail, here's an example of the query
(which returns 183 rows):

> EXPLAIN SELECT
data_type_id, title,
pub_name,
pl_name,
pl_id,
rpt_month,
cs_dsp_name,
cs_id,
grp_id,
zpe,
sect_type,
sr_fed,
sr_reg,
r_v,
r_x_c,
prop_id
FROM report_view
WHERE pl_id IN (6)
AND data_type_id = 'DATABASE'
AND rpt_month between '2018-06-01' and '2018-06-30'
AND cs_id IN ( 'example.org' , '_zero_row_')
ORDER BY data_type_id ASC, CAST(title AS VARCHAR(256)) ASC, rpt_month ASC;

The underlying report_view that backs this query is... well, I'm
honestly having a very hard time wrapping my head around it (the
original developer is long gone).

I'll also note the tables it hits are massive, e.g., 265 GB for the
cs_resource_event table and 65 GB for the cs_resource table.
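(If you want to reproduce that kind of measurement,
pg_total_relation_size reports it, indexes and TOAST included:

SELECT pg_size_pretty(pg_total_relation_size('cs_resource'));

)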

> \d+ report_view
   View "public.report_view"
Column |   Type   | Modifiers |
Storage  | Description
---+--+---+--+-
 data_type_id  | character varying(32)|   | extended |
 title | text |   | extended |
 zpe   | integer  |   | plain|
 pub_name  | character varying|   | extended |
 pl_id | bigint 

Re: Copy data from DB2 (Linux) to PG

2018-11-01 Thread James A. Robinson
On Thu, Nov 1, 2018 at 10:28 AM Ravi Krishna  wrote:
>
> I have a project to develop a script/tool to copy data from DB2 to PG.  The 
> approach I am thinking of is
>
> 1. Export data from db2 into a text file with, say, pipe as the delimiter.
> 2. Load the data from the text file into PG using the COPY command.

I've never used it, but there is this in case it's helpful:
https://github.com/dalibo/db2topg/
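
For step 2, the load side can be as simple as something like this
(table name, path, and options are placeholders; a DEL-style
export with quoted strings would want the csv format so the
quoting is handled):

\copy target_table from '/path/to/export.txt' with (format csv, delimiter '|')

psql's \copy reads the file client-side, so it doesn't need to
live on the database server.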

> In order to make it faster I can parallelize the export and load, with up to 
> X tables running concurrently.
>
> Is there a tool with which I can avoid first exporting and then loading?  I 
> think one way I can do it is with pipes: export from db2 to a pipe and 
> simultaneously load it into PG using COPY.
>
> Any other tool for this job?

Is the DB2 side static or being actively updated?
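
On the pipe idea: if your client psql is 9.3 or newer, \copy can
read from a program, which would let you skip the intermediate
file entirely.  A sketch, where the quoted command is a
placeholder for whatever db2 invocation emits delimited rows:

\copy target_table from program 'db2-export-command' with (format csv, delimiter '|')

A named pipe (mkfifo) between the exporter and \copy gets you the
same effect if you'd rather keep the two sides separate.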



Re: Copy data from DB2 (Linux) to PG

2018-11-01 Thread James A. Robinson
On Thu, Nov 1, 2018 at 10:50 AM Ravi Krishna  wrote:
> [...] What I need is a constant refresh.
> We plan to use it daily to replicate data from db2 to pg.

Perhaps you've already considered and discarded the idea, but your use
case made me think back to when I was looking at AWS SCT as a way to
migrate ourselves off a legacy DB into MySQL.  It looks as though it
has support for converting from DB2 to Postgres as well, and if I
recall correctly it has facilities for ongoing replication.

https://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/CHAP_Source.DB2LUW.html



Re: Working around, or with, bitmap heap scan?

2018-11-02 Thread James A. Robinson
Thank you.  Do you need the entire output of 'EXPLAIN (ANALYZE,
BUFFERS) ...', or will just the sub-section do, as with the plain
'EXPLAIN ...' I initially e-mailed?



Re: Working around, or with, bitmap heap scan?

2018-11-02 Thread James A. Robinson
Well, so as to not spam the list here's a link to a folder with copies
of the output:

https://drive.google.com/open?id=1lSlx7UMUMNgRft2B3Rq2zc4WIRJLLOqU
On Fri, Nov 2, 2018 at 4:12 AM James A. Robinson  wrote:
>
> Thank you.  Do you need the entire output of 'EXPLAIN (ANALYZE,
> BUFFERS) ...', or will just the sub-section do, as with the plain
> 'EXPLAIN ...' I initially e-mailed?



Re: Working around, or with, bitmap heap scan?

2018-11-02 Thread James A. Robinson
On Fri, Nov 2, 2018 at 6:21 AM Laurenz Albe  wrote:
> I have no idea how to improve that, sorry.

Heh, thank you for looking.  From your response I've got a vision of a
medical drama where the doctor looks over some test results and sadly
informs the patient "I'm sorry, there's nothing more we can do." :)