comparison between 2 execution plans
Dear all,

Could you help me understand these two execution plans for the same query (query 3 of the TPC-H benchmark, www.tpc.org/tpch), executed in two different PostgreSQL environments, as described below?

Execution Plan 1:
- https://explain.depesz.com/s/Ughh
- PostgreSQL version 10.1 (default) with an index on l_shipdate (table lineitem)

Execution Plan 2:
- https://explain.depesz.com/s/7Zb7
- PostgreSQL version 9.5 (version with source code changed by me) with an index on l_orderkey (table lineitem)

Some doubts:
- What is the difference between GroupAggregate and Finalize GroupAggregate?
- Why do some algorithms show measurements on "Disk" and others on "Memory"? Example:
  - external merge  Disk: 52784kB
  - quicksort  Memory: 47770kB
- Why was one execution plan so much smaller than the other, considering that the query is the same and the data are the same?

--
select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue,
    o_orderdate,
    o_shippriority
from
    customer,
    orders,
    lineitem
where
    c_mktsegment = 'HOUSEHOLD'
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '1995-03-21'
    and l_shipdate > date '1995-03-21'
group by
    l_orderkey,
    o_orderdate,
    o_shippriority
order by
    revenue desc,
    o_orderdate
--

best regards
Neto
Re: comparison between 2 execution plans
Further information: these plans were generated by the EXPLAIN ANALYZE command, and the time for plan 1 (PostgreSQL 10.1) was 4.7 minutes, while plan 2 (PostgreSQL 9.5, changed) was 2.95 minutes.
Re: comparison between 2 execution plans
Dear Adrian,

2018-05-05 9:57 GMT-07:00 Adrian Klaver:
> On 05/05/2018 06:26 AM, Neto pr wrote:
> > Execution Plan 2:
> > - https://explain.depesz.com/s/7Zb7
> > - Postgresql version 9.5 (version with source code changed by me) with
> > index on l_orderkey (table lineitem).
>
> It might help if you explained what 'version with source code changed by
> me' means?

By "PostgreSQL with modified source code" I mean that I modified some internal cost functions (source code) and parameters in postgresql.conf, so that the DBMS can differentiate the cost of reads (random and sequential) from the cost of writes (random and sequential). This is because reading on SSDs is more than 400 times faster than on HDDs. This is for academic research that I am doing.

> Also the schema for the table lineitem from both instances might help shed
> light.

See the schema of the tables here:
https://docs.snowflake.net/manuals/_images/sample-data-tpch-schema.png

I am using the 40 GB scale factor, so the lineitem table has (40 * 6 million =) 240 million rows.

> Any reason why the index changed between instances?

Regards
Neto

> --
> Adrian Klaver
> adrian.kla...@aklaver.com
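For reference on the first doubt: a Finalize GroupAggregate (paired with a Partial GroupAggregate under a Gather or Gather Merge node) appears when PostgreSQL 9.6+ parallelizes the aggregation, so plan 1 (10.1) splits the work across workers, while plan 2 (9.5) runs a single-process GroupAggregate. On the second doubt, "quicksort Memory" vs "external merge Disk" depends on whether the sort fits in work_mem. A minimal sketch of how the same query flips between the two (table t and column x are placeholders, and the work_mem values are illustrative, not recommendations):

SET work_mem = '64kB';   -- too small: plan reports "Sort Method: external merge  Disk: ..."
EXPLAIN ANALYZE SELECT * FROM t ORDER BY x;

SET work_mem = '256MB';  -- big enough: plan reports "Sort Method: quicksort  Memory: ..."
EXPLAIN ANALYZE SELECT * FROM t ORDER BY x;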
Estimate time without running the query
Dear all,

Just one doubt: the EXPLAIN command only estimates the cost of executing a query; it does not estimate the execution time. I would like to know if there is some way to estimate the time without running the query.

Best Regards
[]`s Neto
Re: [External] RE: Estimate time without running the query
On Thu, Sep 13, 2018 at 18:49, Vijaykumar Jain wrote:
> explain analyze would *run* the query, and it can be dangerous if it is a
> DML statement like insert/update/delete 😊
>
> If you still want to go with explain analyze, you can do:
>
> begin;
> explain analyze ;
> rollback;

Dear all,

The problem is that using EXPLAIN ANALYZE I have to wait for the query to execute. I would like to estimate the time without having to wait for the query execution. Does anyone know how to estimate the time without waiting for the query to be executed?

Best regards
Neto

> thanks,
> Vijay
>
> *From:* Johnes Castro
> *Date:* Friday, September 14, 2018 at 3:12 AM
> *To:* Neto pr, PostgreSQL General <pgsql-gene...@postgresql.org>
> *Subject:* [External] RE: Estimate time without running the query
>
> Hi netoprbr,
>
> Use the command EXPLAIN ANALYZE.
>
> Best Regards.
> Johnes Castro
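Spelling out Vijaykumar's suggestion as a concrete session (the DELETE and its predicate are invented for illustration): EXPLAIN ANALYZE really executes the statement, so the transaction wrapper is what makes it safe for DML.

BEGIN;
EXPLAIN ANALYZE DELETE FROM lineitem WHERE l_shipdate < date '1993-01-01';
ROLLBACK;  -- undoes the rows the DELETE removed while being measured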
Re: [External] RE: Estimate time without running the query
On Thu, Sep 13, 2018 at 19:53, David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Thu, Sep 13, 2018 at 3:30 PM, Neto pr wrote:
>> The problem is that using EXPLAIN ANALYZE I have to wait for the query
>> to execute. I would like to estimate the time without having to wait
>> for the query execution.
>
> On the machine in question you have to experiment to obtain data to
> construct a formula to convert cost to time. Then, when using the
> function, remember that lots of things can play into individual
> executions taking more time (and sometimes less too, I suspect), such as
> locks, caching, and physical data locality.
>
> It seems more useful to log actual execution times and look for trends.
> If you are writing a query, odds are it needs to be run regardless of how
> efficient it may be - or used in a relative comparison to an alternate
> query.

Okay, David, but isn't there some SQL statement that returns a time estimate without having to execute the query?

> David J.
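A rough sketch of David's calibrate-cost-to-time idea, in SQL (the table name, the values, and the linear model are all hypothetical; locks, caching, and plan shape can make any individual run deviate widely, so treat the output as a crude estimate at best):

-- (planner total cost, measured ms) pairs collected from past EXPLAIN ANALYZE runs
CREATE TABLE plan_calibration (total_cost numeric, actual_ms numeric);
INSERT INTO plan_calibration VALUES (1000, 15), (50000, 700), (2000000, 31000);

-- average ms per cost unit; multiply it by the total cost a plain EXPLAIN reports
SELECT avg(actual_ms / total_cost) AS ms_per_cost_unit FROM plan_calibration;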
query causes connection termination
Dear all,

When executing a query, the database closes the connection. See the error reported by the DBeaver SQL tool:

- DBEAVER SQL tool -
An I/O error occurred while sending to the backend.
java.io.EOFException:
---

I tried to execute the query in psql but the same thing happens. The query is number 19 of the TPC-H benchmark.

--- psql client ---
tpch40gnorssd=# EXPLAIN (ANALYZE)
select sum(l_extendedprice * (1 - l_discount)) as revenue
from lineitem, part
where (
        part.p_partkey = lineitem.l_partkey
        and part.p_brand = 'Brand#54'
        and part.p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
        and lineitem.l_quantity >= 4 and lineitem.l_quantity <= 4 + 10
        and part.p_size between 1 and 5
        and lineitem.l_shipmode in ('AIR', 'AIR REG')
        and lineitem.l_shipinstruct = 'DELIVER IN PERSON'
)
or (
        part.p_partkey = lineitem.l_partkey
        and part.p_brand = 'Brand#51'
        and part.p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and lineitem.l_quantity >= 11 and lineitem.l_quantity <= 11 + 10
        and part.p_size between 1 and 10
        and lineitem.l_shipmode in ('AIR', 'AIR REG')
        and lineitem.l_shipinstruct = 'DELIVER IN PERSON'
)
or (
        part.p_partkey = lineitem.l_partkey
        and part.p_brand = 'Brand#21'
        and part.p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and lineitem.l_quantity >= 28 and lineitem.l_quantity <= 28 + 10
        and part.p_size between 1 and 15
        and lineitem.l_shipmode in ('AIR', 'AIR REG')
        and lineitem.l_shipinstruct = 'DELIVER IN PERSON'
);

server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

However, when executing a plain EXPLAIN of the query, no error is reported.
- EXPLAIN ONLY -
Finalize Aggregate  (cost=280394.81..280394.82 rows=1 width=32)
  ->  Gather  (cost=280394.59..280394.80 rows=2 width=32)
        Workers Planned: 2
        ->  Partial Aggregate  (cost=279394.59..279394.60 rows=1 width=32)
              ->  Nested Loop  (cost=29935.44..279381.95 rows=1685 width=12)
                    ->  Parallel Bitmap Heap Scan on part  (cost=29934.87..48103.87 rows=7853 width=30)
                          Recheck Cond: (((p_brand = 'Brand#54'::bpchar) AND (p_size <= 5) AND (p_size >= 1) AND (p_container = ANY ('{"SM CASE","SM BOX","SM PACK","SM PKG"}'::bpchar[]))) OR ((p_brand = 'Brand#51'::bpchar) AND (p_size <= 10) AND (p_size >= 1) AND (p_container = ANY ('{"MED BAG","MED BOX","MED PKG","MED PACK"}'::bpchar[]))) OR ((p_brand = 'Brand#21'::bpchar) AND (p_size <= 15) AND (p_size >= 1) AND (p_container = ANY ('{"LG CASE","LG BOX","LG PACK","LG PKG"}'::bpchar[]
                          ->  BitmapOr  (cost=29934.87..29934.87 rows=18861 width=0)
                                ->  BitmapAnd  (cost=9559.76..9559.76 rows=3140 width=0)
                                      ->  Bitmap Index Scan on idx_p_brand_p_size  (cost=0.00..508.37 rows=31035 width=0)
                                            Index Cond: ((p_brand = 'Brand#54'::bpchar) AND (p_size <= 5) AND (p_size >= 1))
                                      ->  Bitmap Index Scan on idx_p_containerpart000  (cost=0.00..9041.72 rows=809333 width=0)
                                            Index Cond: (p_container = ANY ('{"SM CASE","SM BOX","SM PACK","SM PKG"}'::bpchar[]))
                                ->  BitmapAnd  (cost=9837.67..9837.67 rows=6022 width=0)
                                      ->  Bitmap Index Scan on idx_p_brand_p_size  (cost=0.00..997.27 rows=60947 width=0)
                                            Index Cond: ((p_brand = 'Brand#51'::bpchar) AND (p_size <= 10) AND (p_size >= 1))
                                      ->  Bitmap Index Scan on idx_p_
Re: query causes connection termination
Just complementing: I use PostgreSQL version 10, but the postgresql.conf file has default settings. My server is a 2.8 GHz Xeon (4 cores) with SSD disks.
Re: query causes connection termination
Another fact: when executing the query without EXPLAIN ANALYZE, the result is normally returned after a few minutes. I do not understand why, when using the EXPLAIN ANALYZE command, the DBMS closes the connection. Does anyone have any tips on why this occurs?
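A first diagnostic step worth trying (a sketch, assuming superuser access): a connection that drops only under EXPLAIN ANALYZE of a heavy query often means the backend process died, so locate the server log and look for a crash or out-of-memory message at the failure time; on Linux, a backend killed by the OOM killer also typically leaves a "Killed process ... (postgres)" line in the kernel log.

SHOW data_directory;     -- where the cluster lives
SHOW log_directory;      -- log location, relative to data_directory unless absolute
SHOW logging_collector;  -- must be 'on' for the server to write its own log files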
why SSD is slower than HDD SAS 15K ?
Hello all,

Could someone help me analyze the two execution plans below (EXPLAIN ANALYZE used)? It is query 9 of the TPC-H benchmark [1]. I'm using two HP Intel Xeon 2.8 GHz 4-core servers with 8 GB of memory, running Debian 8 with the ext4 filesystem.

Server 1
- SAS HDD, 15,000 rpm, 320 GB (where both Debian and PostgreSQL are installed).

Server 2
- Samsung Evo SSD, 500 GB (where PostgreSQL is installed)
- SATA HDD, 7,500 rpm, 1 TB (where Debian is installed)

My DBMS parameters in postgresql.conf are the defaults, but on the SSD I changed random_page_cost = 1.0.

I do not understand why the query ran in half the time on the SAS HDD: on the HDD the query takes on average 12 minutes, while on the SSD it takes 26 minutes. I think the execution plan may be doing more write operations, and that is why the 15K rpm SAS HDD was faster. I checked that the temporary tablespace pg_default is on the SSD in server 2: running SHOW temp_tablespaces in psql returns empty, so it is in the default directory, where I installed the DBMS: /media/ssd500gb/opt/pgv101norssd/data.

Anyway, I always thought an SSD would be as fast or faster, but in this case and four more cases we have here, it lost badly to the HDDs. Any help in understanding this is welcome.

Best Regards
Neto

- Query execution time on SSD -
execution 1: 00:23:29
execution 2: 00:28:38
execution 3: 00:27:32
execution 4: 00:27:54
execution 5: 00:27:35
execution 6: 00:26:19
Average: 26 min 54 seconds

- Query execution time on HDD SAS 15K -
execution 1: 00:12:44
execution 2: 00:12:30
execution 3: 00:12:47
execution 4: 00:13:02
execution 5: 00:13:00
execution 6: 00:12:47
Average: 12 minutes 48 seconds

- EXECUTION PLAN (ANALYZE, BUFFERS) on SSD storage -
Finalize GroupAggregate  (cost=1588.33..15980046.69 rows=60150 width=66) (actual time=1569793.025..1573969.614 rows=175 loops=1)
  Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
  Buffers: shared hit=1237677 read=2399403, temp read=1186697 written=1183781
  ->  Gather Merge  (cost=1588.33..15977791.06 rows=120300 width=66) (actual time=1569767.662..1573968.933 rows=525 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=1237677 read=2399403, temp read=1186697 written=1183781
        ->  Partial GroupAggregate  (cost=15821228.31..15962905.44 rows=60150 width=66) (actual time=1547834.941..1552040.073 rows=175 loops=3)
              Group Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone))
              Buffers: shared hit=3522992 read=7371656, temp read=3551003 written=3542253
              ->  Sort  (cost=15821228.31..15838806.37 rows=7031225 width=57) (actual time=1547819.849..1548887.629 rows=4344197 loops=3)
                    Sort Key: nation.n_name, (date_part('year'::text, (orders.o_orderdate)::timestamp without time zone)) DESC
                    Sort Method: external merge  Disk: 321648kB
                    Buffers: shared hit=3522992 read=7371656, temp read=3551003 written=3542253
                    ->  Hash Join  (cost=4708859.28..14719466.13 rows=7031225 width=57) (actual time=1220169.593..1541279.300 rows=4344197 loops=3)
                          Hash Cond: (lineitem.l_suppkey = supplier.s_suppkey)
                          Buffers: shared hit=3522922 read=7371656, temp read=3220661 written=3211373
                          ->  Hash Join  (cost=4683017.71..14434606.65 rows=7071075 width=43) (actual time=1142575.564..1535092.395 rows=4344197 loops=3)
                                Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                                Buffers: shared hit=3503999 read=7362903, temp read=3114233 written=3104987
                                ->  Hash Join  (cost=1993687.71..11297331.33 rows=7071075 width=47) (actual time=275104.573..1213552.106 rows=4344197 loops=3)
                                      Hash Cond: ((lineitem.l_suppkey = partsupp.ps_suppkey) AND (lineitem.l_partkey = partsupp.ps_partkey))
                                      Buffers: shared hit=1478115 read=6073916, temp read=2369833 written=2366725
                                      ->  Hash Join  (cost=273201.71..9157213.44 rows=7071075 width=45) (actual time=24569.390..895992.716 rows=4344197 loops=3)
                                            Hash Cond: (lineitem.l_partkey = part.p_partkey)
                                            Buffers: shared hit=314284 read=5038767, temp read=1742656 written=1742614
                                            ->  Parallel Seq Scan on lineitem  (cost=0.00..5861333.20 rows=15120 width=41) (actual time=0.147..712469.002 rows=80004097 loops=3)
                                                  Buffers: shared hit=482 read=4860800
                                            ->  Hash  (cost=263921.00..2639
Re: why SSD is slower than HDD SAS 15K ?
2018-01-15 9:13 GMT-08:00 Vick Khera:
> Try random page cost 1.1. Way back when I started using SSD we had a
> discussion here and came to the conclusion that it should be ever so
> slightly higher than sequential page cost.

Very good tip. I'm now running the query with random_page_cost = 1.1, but note that there are no secondary indexes in my database. The test I am doing first runs the query without an index, and then creates an index to check the performance improvement. But the problem I reported is that executing the query without an index on the SAS HDD is much faster than the same query (without an index) on the SSD, which I found very strange. See below:

- Query execution time on SSD - Average: 26 min 54 seconds
- Query execution time on HDD SAS 15K - Average: 12 minutes 48 seconds

> It is very hard to read your query plans (maybe gmail is wrapping them
> funny or you need to use a fixed font on them or share them from
> https://explain.depesz.com), but they do look substantially different. My
> guess is that with the random page cost = sequential page cost you are
> tricking Pg into using more sequential scans than index searches.

The problem is that the plan is saved in the database: I have a Java application that executes a query 6 times and saves the result of EXPLAIN ANALYZE to a table in my database. When I retrieve the execution plan, it has lost its line breaks, unfortunately. I'm checking how to change the Java application; I posted a question on a Java forum because I do not know how to solve this other problem yet:

My question in the Java forum:
https://stackoverflow.com/questions/48267819/save-line-break-in-database-in-text-field
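For reference, one way to apply Vick's suggested value cluster-wide on PostgreSQL 9.4+ (it can also be set per session with a plain SET random_page_cost = 1.1):

ALTER SYSTEM SET random_page_cost = 1.1;  -- written to postgresql.auto.conf
SELECT pg_reload_conf();                  -- apply without restarting
SHOW random_page_cost;                    -- confirm the active value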
Re: why SSD is slower than HDD SAS 15K ?
Dear Merlin,

2018-01-15 11:16 GMT-08:00 Merlin Moncure:
> On Mon, Jan 15, 2018 at 7:38 AM, Neto pr wrote:
> > Anyway, I always thought that an SSD would be equal or faster, but in
> > this case and four more cases we have here, it lost a lot to the HDDs.
>
> Generally for reading data, yes, but you changed the query plan also.
> To get to the bottom of this let's get SSD performance numbers for
> both plans and HDD performance numbers for both plans. You're trying
> to measure device performance but are probably measuring the
> relative efficiencies of the generated plans.

Very good tip. I discovered that my SAS HDD has a 12 Gb/s transfer rate versus the SSD's 6 Gb/s; because of that, the difference in performance occurred. See below:

SSD: Samsung 500 GB SATA III 6 Gb/s - Model: 850 Evo
http://www.samsung.com/semiconductor/minisite/ssd/product/consumer/850evo/

HDD: HPE 300GB 12G SAS Part-Number: 737261-B21
https://h20195.www2.hpe.com/v2/GetPDF.aspx%2Fc04111744.pdf

For my experiment between HDD and SSD, I intend to abandon the SAS HDD and use a SATA HDD to compare against the SATA SSD. I will use your strategy of putting the OS and DBMS on the same disk when it is the SSD, and separating them on the HDD.

Best Regards
Neto

> merlin
Re: why SSD is slower than HDD SAS 15K ?
Dear NTPT,

2018-01-15 16:54 GMT-08:00 NTPT:
> I bet this is an SSD partition alignment problem. There is an erase block
> size of 3 MB and this should be taken into account when you partition the
> SSD drive, create a RAID, a filesystem, etc.

That is a good observation. I believe the block size was set by default when I formatted the drive. I use Debian 64-bit version 8, and all disks use the ext4 file system. What block size do you suggest for the SSD and for the HDD?

Neto
SSD filesystem aligned to DBMS
Hi all,

Sorry, I'm not sure this question is appropriate for this list, but I need to prepare the file system of an SSD disk in the way that was recommended to me, so that the SSD works optimally. I have this disk:

SSD: Samsung 500 GB SATA III 6 Gb/s - Model: 850 Evo
http://www.samsung.com/semiconductor/minisite/ssd/product/consumer/850evo/

One person on the list told me that the partition should be aligned to 3072, not the default 2048, to start on an erase block boundary, and that the fs block should be 8 kB.

Can you give me a hint about which program I could use to do this? I have already used fdisk, but I do not know how to do this in fdisk. I am using Linux Debian 8 (Jessie) 64-bit with the ext4 file system.

If you prefer, just reply to me, since the subject would not be about PostgreSQL itself. netopr...@gmail.com

Best Regards
Neto
Re: SSD filesystem aligned to DBMS
2018-01-16 8:50 GMT-08:00 Michael Loftis:
>
> On Tue, Jan 16, 2018 at 08:02 Scott Marlowe wrote:
>
>> On Tue, Jan 16, 2018 at 7:47 AM, Neto pr wrote:
>> > One person on the list told me that the partition should be aligned to
>> > 3072, not the default 2048, to start on an erase block boundary, and
>> > that the fs block should be 8 kB.
>>
>> fdisk is pretty old and can't handle larger disks. You can get a fair
>> bit of control over the process with parted, but it takes some getting
>> used to. As far as I know, Linux's ext4 has a maximum block size of
>> 4k. I can't imagine alignment matters to SSDs and I would take any
>> advice as such with a large grain of salt, and then if I had questions
>> about performance I'd test it to see. I'm willing to bet a couple
>> bucks it makes ZERO difference.
>
> Alignment definitely makes a difference for writes. It can also make a
> difference for random reads, since the underlying read may not line up
> to the hardware; add in read-ahead (at drive or OS level) and you're
> reading far more data in the drive than the OS asks for.
>
> Stupidly, a lot of this isn't published by a lot of SSD manufacturers,
> but it shows up in benchmarks.
>
> Another potential difference here with SAS vs SATA is the maximum queue
> depth supported by the protocol and drive.
>
> SSD drives also do internal housekeeping tasks for wear leveling on
> writing.
>
> I've seen SSD drives benchmark at 80-90 MB sequential read or write;
> change the alignment, and you'll get 400+ on the same drive with
> sequential reads (changing nothing else).
>
> A specific example:
> https://www.servethehome.com/ssd-alignment-quickly-benchmark-ssd/

Hi all,

Searching around, I found that in the past proper alignment required manual calculation and intervention when partitioning. Many of the common partitioning tools now handle partition alignment automatically. For example, on an already partitioned disk, you can use parted
(https://wiki.archlinux.org/index.php/GNU_Parted#Check_alignment) to verify the alignment of a partition on a device in Linux. This is an example I ran on my Samsung SSD 500 GB 850 Evo, see below:

--- BEGIN PARTED TOOL ---
root@hp2ml110deb: parted /dev/sdb
(parted) print list
Model: ATA Samsung SSD 850 (scsi)
Disk /dev/sdb: 500GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number  Start  End    Size   File system  Flags
 1      0.00B  500GB  500GB  ext4

Model: ATA MB1000GCWCV (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system     Name  Flags
 1      1049kB  538MB   537MB   fat32                 boot, esp
 2      538MB   992GB   991GB   ext4
 3      992GB   1000GB  8319MB  linux-swap(v1)

(parted) select /dev/sdb
Using /dev/sdb
(parted) align-check
alignment type(min/opt) [optimal]/minimal? opt
Partition number? 1
1 aligned
(parted)
--- END ---

Regards
Neto

>> > If you prefer, just reply to me, since the subject would not be about
>> > postgresql itself. netopr...@gmail.com
>>
>> No, this affects everybody who uses SSDs, so let's keep it on list if
>> we can.
>
> "Genius might be described as a supreme capacity for getting its
> possessors into trouble of all kinds."
> -- Samuel Butler
How to measure query time - with warm up and cached data
Hi all,

I need to know the actual execution time of a query, considering that the data is already cached. I also need to make sure that cached data from other queries is cleared. I believe that, in order to know the real time of a query, it is necessary to "warm up" the data so it is loaded into the cache.

Below are the steps a DBA suggested to me:

Step 1 - run ANALYZE on all tables involved before the test;
Step 2 - restart the DBMS (to clear the DBMS cache);
Step 3 - clear the OS cache;
Step 4 - execute the same query at least 5 times.

As the actual execution time of the query, I would take the time of the run that is the median among all of them. Example:

Execution 1: 07m 58s
Execution 2: 14m 51s
Execution 3: 17m 59s
Execution 4: 17m 55s
Execution 5: 17m 07s

In this case, to calculate the median, you must first order the executions by time:

Execution 1: 07m 58s
Execution 2: 14m 51s
Execution 5: 17m 07s
Execution 4: 17m 55s
Execution 3: 17m 59s

In this example the median would be execution 5 (17m 07s).

Could someone tell me if this is a good strategy? Since this is scientific work, a reference to any article or book on this subject would be very useful.

Best Regards
Neto
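If the run times are collected into a table, PostgreSQL can pick the median run directly; a small sketch, assuming a hypothetical table run_times(run int, secs numeric) holding the five measurements:

SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY secs) AS median_secs
FROM run_times;  -- percentile_disc returns an actually observed value, i.e. execution 5 above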
Re: How to measure query time - with warm up and cached data
2018-01-21 13:53 GMT-08:00 Peter J. Holzer:
> On 2018-01-21 12:45:54 -0800, Neto pr wrote:
> > I need to know the actual execution time of a query, considering that
> > the data is already cached. [...]
> >
> > Step 1 - run ANALYZE on all tables involved before the test;
> > Step 2 - restart the DBMS (to clear the DBMS cache);
> > Step 3 - erase the S.O. cache;
>
> Did you mean "OS cache" (operating system cache)?

Yes, the operating system cache. "S.O." is "Sistema Operacional" in Portuguese; it was a translation error!

To restart the DBMS and clear the OS cache, I execute these commands on Linux Debian 8:

/etc/init.d/pgsql stop
sync
echo 3 > /proc/sys/vm/drop_caches
/etc/init.d/pgsql start

> > Step 4 - execute the same query at least 5 times.
> >
> > As the actual execution time of the query, I would take the time of
> > the run that is the median among all of them.
>
> If you do this, clearing the caches before the tests will probably have
> little effect. The first query will fill the cache with the data needed
> for your query (possibly evicting other data) and the next 4 will work on
> the cached data.

Yes, I believe the first execution can be discarded, because the data is settling into the cache; ideally only the runs after the first one are considered.

> Whether the cache was empty or full before the first
> query will make little difference to the median, because the first query
> will almost certainly be discarded as an outlier.
>
> Flushing out caches is very useful if you want to measure performance
> without caches (e.g. if you want to determine what the performance
> impact of a server reboot is).
>
> > Example:
> >
> > Execution 1: 07m 58s
> > Execution 2: 14m 51s
> > Execution 3: 17m 59s
> > Execution 4: 17m 55s
> > Execution 5: 17m 07s
>
> Are these real measurements or did you make them up? They look weird.
> Normally the first run is by far the slowest, then the others are very
> similar, sometimes with a slight improvement (especially between the 2nd
> and 3rd). But in your case it is just the opposite.

Yes, they are real measurements, from TPC-H query 9. I cannot understand why, in several tests I have done here, the first execution runs faster, even without indexes and theoretically without cache. If someone wants to see the execution plans and other information, the worksheet with the results is at the following link:
https://sites.google.com/site/eletrolareshop/repository/Result_80gb-SSD-10_exec_v4.ods

I thought it was because my CPU frequency was varying, but I configured the BIOS to "OS Control" and set the CPU to "performance" mode in Linux Debian 8. See below:

user1@hp110deb8:~/Desktop$ cpufreq-info | grep 'current CPU fr'
  current CPU frequency is 2.80 GHz.
  current CPU frequency is 2.80 GHz.
  current CPU frequency is 2.80 GHz.
  current CPU frequency is 2.80 GHz.

Apparently the processor is not working at a variable frequency now. Any idea why the first execution can be faster in many cases?

Best Regards
Neto

> [cleardot]
> Sending Webbugs to a mailinglist?
>
> hp
>
> --
>    _  | Peter J. Holzer    | we build much bigger, better disasters now
> |_|_) |                    | because we have much more sophisticated
> | |   | h...@hjp.at         | management tools.
> __/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
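As a closing aside, the warm-up can also be made explicit instead of relying on a throwaway first run; a sketch using the pg_prewarm extension shipped with PostgreSQL 9.4+ (assuming it is installed), here against the TPC-H lineitem table:

CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('lineitem');  -- reads the whole table into shared_buffers, returns the block count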