Re: Why HDD performance is better than SSD in this case

2018-07-18 Thread Fabio Pardi
Hi Neto,

RAID 0 should never be used to store production data. It is never a good idea,
in my opinion.

The simple reason is that when you lose one disk, you lose everything.

If your goal is to benchmark the disk, go for a single disk.

If you want to be closer to a production setup, go for RAID 10, or pick a RAID
setup close to your needs and capabilities (more reads? more writes?
SSD? HDD? cache? ...?).

If you only have 2 disks, your only redundant choice is RAID 1.
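
To put rough numbers on the "lose one disk, lose everything" point, here is a minimal sketch. It assumes disks fail independently with the same annual failure probability. The 3% figure is purely illustrative, not a measured value, and rebuild windows and correlated failures are ignored.

```python
def raid0_loss_probability(p: float, disks: int = 2) -> float:
    """Probability of losing the array when ANY single disk failure is fatal."""
    return 1.0 - (1.0 - p) ** disks


def raid1_loss_probability(p: float, disks: int = 2) -> float:
    """Probability of losing the mirror, which requires ALL disks to fail."""
    return p ** disks


if __name__ == "__main__":
    p = 0.03  # hypothetical annual failure probability per disk
    print(f"RAID 0, 2 disks: {raid0_loss_probability(p):.4f}")
    print(f"RAID 1, 2 disks: {raid1_loss_probability(p):.4f}")
```

Under these assumptions, striping roughly doubles the chance of data loss compared with a single disk, while mirroring reduces it by orders of magnitude.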

regards,

fabio pardi


On 18/07/18 03:24, Neto pr wrote:
>
>> As side note: why to run a test on a setup you can never use on production?
>>
>> regards,
>>
>> fabio pardi
>>
>
> Can you explain why you said the following?
>
> "As side note: why to run a test on a setup you can never use on production?"
>
> Do you think that a RAID 0 configuration is rarely used for a DBMS?
> Which one do you think would be good? I am open to suggestions because I
> am in the middle of work for my postgraduate research, and I can change
> the environment to something more useful that is actually used in real
> production environments.
>
> Best Regards
> []`s Neto




Re: Why HDD performance is better than SSD in this case

2018-07-18 Thread Nicolas Charles

On 18/07/2018 at 03:16, Neto pr wrote:

2018-07-17 22:13 GMT-03:00 Neto pr :

2018-07-17 20:04 GMT-03:00 Mark Kirkwood :

Ok, so dropping the cache is good.

How are you ensuring that you have one test setup on the HDDs and one on the
SSDs? i.e., do you have 2 postgres instances? Or are you using one instance
with tablespaces to locate the relevant tables? In the 2nd case you
will get pollution of shared_buffers if you don't restart between the HDD
and SSD tests. If you have 2 instances then you need to carefully check that
the parameters are set the same (and probably shut the HDD instance down when
testing the SSD etc).


Dear Mark,
To ensure that the test is fair and uses the same configuration for both the
O.S. and the DBMS, my O.S. is installed on the SSD, and so is the DBMS.
I have only one DBMS instance and two databases:
- a database called tpch40gnorhdd with its tablespace on the HDD disk.
- a database called tpch40gnorssd with its tablespace on the SSD disk.
See below:

postgres=# \l
                                   List of databases
     Name      |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
---------------+----------+----------+-------------+-------------+-----------------------
 postgres      | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
               |          |          |             |             | postgres=CTc/postgres
 template1     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
               |          |          |             |             | postgres=CTc/postgres
 tpch40gnorhdd | user1    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 tpch40gnorssd | user1    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

postgres=#

After running 7 queries in the database tpch40gnorhdd, I restart the DBMS
(/etc/init.d/pg101norssd restart), drop the O.S. cache, and then run the
test on the database tpch40gnorssd.
Do you think there is pollution of shared_buffers in this case?
Why do you think having the O.S. on the SSD is bad? Could you explain further?
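
The reset step between the two runs could be scripted roughly as follows. This is only a sketch: the function name is made up for illustration, it has to run as root, and it assumes Linux, where writing 3 to /proc/sys/vm/drop_caches drops the page cache plus dentries and inodes.

```python
import os
import subprocess
from pathlib import Path


def reset_between_runs(restart_cmd, drop_caches_path="/proc/sys/vm/drop_caches"):
    """Restart the DBMS and drop the O.S. cache so the next run starts cold."""
    # Restarting the server empties shared_buffers.
    subprocess.run(restart_cmd, check=True)
    # Flush dirty pages to disk first, so that clean pages can be dropped.
    os.sync()
    # 3 = drop the page cache as well as dentries and inodes.
    Path(drop_caches_path).write_text("3\n")
```

With the init script mentioned above, the call would look like reset_between_runs(["/etc/init.d/pg101norssd", "restart"]).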

Best regards
[]`s Neto


One more piece of information, about the Samsung EVO SSD:

  Model: 850 Evo 500 GB SATA III 6Gb/s -
http://www.samsung.com/semiconductor/minisite/ssd/product/consumer/850evo/

As stated on this ML in January, the Samsung 850 Evo is not a particularly
fast SSD, and in particular its performance is not very consistent
(see https://www.anandtech.com/show/8747/samsung-ssd-850-evo-review/5
and https://www.anandtech.com/bench/product/1913). This is not a
product for professional usage, and you should not expect great
performance from it. As reported by these benchmarks, you can see a
write latency of about 35 ms under very intensive usage:

ATSB - The Destroyer (99th Percentile Write Latency), in microseconds,
lower is better: 34923

Even the average write latency of the Samsung 850 Evo is 3.3 ms under an
intensive workload.


Why are you using this type of SSD for your benchmark? What do you plan
to achieve?





I can see a couple of things in your setup that might pessimize the SSD
case:
- you have the OS on the SSD - if your tests make the system swap then this
will wreck the SSD result
- you have RAID 0 SSD... some of the cheaper ones slow down when you do this.
Maybe test with a single SSD

regards
Mark

On 18/07/18 01:04, Neto pr wrote (note snippage):


(echo 3 > /proc/sys/vm/drop_caches;

discs:
- 2 units of Samsung Evo SSD 500 GB (mounted on ZERO RAID)
- 2 SATA 7500 Krpm HDD units - 1TB (mounted on ZERO RAID)

- The Operating System and the Postgresql DBMS are installed on the SSD
disk.






Re: Why HDD performance is better than SSD in this case

2018-07-18 Thread Mark Kirkwood
Ok, so you are using 1 instance and tablespaces. Also I see you are 
restarting the instance between HDD and SSD tests, so all good there.


The point I made about having the OS on the SSDs is that if these 
tests make your system swap, and your swap device is on the SSDs (which 
it probably is by default), then swap activity will compete with db 
access activity for IOPS on your SSDs and spoil the results of your test 
(i.e., slow down your SSDs).


You can check this using top, sar or iostat to see *if* you are swapping 
during the tests.
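
Besides watching top, sar or iostat by hand, one way to check after the fact is to compare the kernel's cumulative swap counters before and after a run. The following is a minimal sketch, assuming Linux, where /proc/vmstat exposes the pswpin and pswpout counters. The function names are illustrative only.

```python
def parse_swap_counters(vmstat_text: str) -> dict:
    """Extract cumulative swap-in/swap-out page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swapped_during_test(before: str, after: str) -> bool:
    """True if any page was swapped in or out between the two snapshots."""
    b, a = parse_swap_counters(before), parse_swap_counters(after)
    return any(a.get(k, 0) > b.get(k, 0) for k in ("pswpin", "pswpout"))
```

Take one snapshot with open("/proc/vmstat").read() before the queries start and another after they finish, then pass both strings to swapped_during_test.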


Ideally you would design your setup to use 3 separate devices:

- one device (or array) for os, swap, tmp etc

- one device (HDD array) for your 'HDD' tablespace

- one device (SSD array) for your 'SSD' tablespace

regards

Mark


On 18/07/18 13:13, Neto pr wrote:


Dear Mark,
To ensure that the test is fair and uses the same configuration for both the
O.S. and the DBMS, my O.S. is installed on the SSD, and so is the DBMS.
I have only one DBMS instance and two databases:
- a database called tpch40gnorhdd with its tablespace on the HDD disk.
- a database called tpch40gnorssd with its tablespace on the SSD disk.
See below:

postgres=# \l
                                   List of databases
     Name      |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
---------------+----------+----------+-------------+-------------+-----------------------
 postgres      | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
               |          |          |             |             | postgres=CTc/postgres
 template1     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
               |          |          |             |             | postgres=CTc/postgres
 tpch40gnorhdd | user1    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 tpch40gnorssd | user1    | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

postgres=#

After running 7 queries in the database tpch40gnorhdd, I restart the DBMS
(/etc/init.d/pg101norssd restart), drop the O.S. cache, and then run the
test on the database tpch40gnorssd.
Do you think there is pollution of shared_buffers in this case?
Why do you think having the O.S. on the SSD is bad? Could you explain further?







Re: Why HDD performance is better than SSD in this case

2018-07-18 Thread George Neuner

On Wed, 18 Jul 2018 09:46:32 +0200, Fabio Pardi wrote:

> RAID 0 to store production data should never be used. Never a good
> idea, in my opinion.


RAID 0 by itself should never be used. Combined with other RAID
levels, it can boost performance without sacrificing reliability.

https://en.wikipedia.org/wiki/Nested_RAID_levels


Personally, I don't like RAID 0 + ? schemes because they use too many 
disks (with associated reliability issues).  The required performance 
usually can be achieved in other ways.  But YMMV.

George



Re: Why HDD performance is better than SSD in this case

2018-07-18 Thread Imre Samu
>Model: 850 Evo 500 GB SATA III 6Gb/s -

Please check the SSD's "DRIVE HEALTH STATUS" and the "S.M.A.R.T values of
specified disk", for example with the "smartctl" tool
( https://www.smartmontools.org/ ), using -x ("Show all information for device").
Expected output for a "Samsung SSD 850 EVO 500GB":
https://superuser.com/questions/1169810/smart-data-of-a-new-ssd
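
If you want to capture the report from a script, for instance to store it alongside the benchmark results, a minimal sketch could look like this. It assumes smartmontools is installed and that the script runs with enough privileges. The function names are illustrative, and the device path you pass in depends on your system.

```python
import subprocess


def build_smartctl_command(device: str) -> list:
    """Build the smartctl invocation; -x prints all information for the device."""
    return ["smartctl", "-x", device]


def read_smart_report(device: str) -> str:
    """Run smartctl and return its text output (usually needs root)."""
    # smartctl encodes status bits in its exit code, so a non-zero code does
    # not necessarily mean the command failed; hence no check=True here.
    result = subprocess.run(
        build_smartctl_command(device), capture_output=True, text=True
    )
    return result.stdout
```

For a RAID 0 pair, each member disk would need to be queried separately.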

Regards,
  Imre







Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

2018-07-18 Thread Robert Haas
On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund  wrote:
> FWIW, here's a rebased version of this patch. Could probably be polished
> further. One might argue that we should do a bit more wide ranging
> changes, to convert scanint8 and pg_atoi to be also unified. But it
> might also just be worthwhile to apply without those, given the
> performance benefit.

Wouldn't hurt to do that one too, but might be OK to just do this
much.  Questions:

1. Why the error message changes?  If there's a good reason, it should
be done as a separate commit, or at least well-documented in the
commit message.

2. Does the likely/unlikely stuff make a noticeable difference?

3. If this is a drop-in replacement for pg_atoi, why not just recode
pg_atoi this way -- or have it call this -- and leave the callers
unchanged?

4. Are we sure this is faster on all platforms, or could it work out
the other way on, say, BSD?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company