[PERFORM] Delete me

2006-01-31 Thread 2386429

Can you delete me from the mailing list, please?


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Luke Lonergan
Jim,

On 1/30/06 12:25 PM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

> Why divide by 2? A good raid controller should be able to send read
> requests to both drives out of the mirrored set to fully utilize the
> bandwidth. Of course, that probably won't come into play unless the OS
> decides that it's going to read-ahead fairly large chunks of the table
> at a time...

I've not seen one that does, nor would it work in the general case IMO.  In
RAID1 writes are duplicated and reads come from one of the copies.  You
could alternate read service requests to minimize rotational latency, but
you can't improve bandwidth.

- Luke   





Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Kevin

Luke Lonergan wrote:

Jim,

On 1/30/06 12:25 PM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

  

Why divide by 2? A good raid controller should be able to send read
requests to both drives out of the mirrored set to fully utilize the
bandwidth. Of course, that probably won't come into play unless the OS
decides that it's going to read-ahead fairly large chunks of the table
at a time...



I've not seen one that does, nor would it work in the general case IMO.  In
RAID1 writes are duplicated and reads come from one of the copies.  You
could alternate read service requests to minimize rotational latency, but
you can't improve bandwidth.

- Luke   

  
For Solaris's software raid, the default settings for raid-1 sets are:  
round-robin read, parallel write.   I assumed this meant it would give 
similar read performance to raid-0, but I've never benchmarked it.


-Kevin



Re: [PERFORM] Delete me

2006-01-31 Thread Richard Huxton

[EMAIL PROTECTED] wrote:

Can you delete me from the mailing list, please?


Go to the website.
Click "community"
Click "mailing lists"
On the left-hand side click "Subscribe"
Fill in the form, changing action to "unsubscribe"

HTH

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Jim C. Nasby
On Tue, Jan 31, 2006 at 09:00:30AM -0800, Luke Lonergan wrote:
> Jim,
> 
> On 1/30/06 12:25 PM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> 
> > Why divide by 2? A good raid controller should be able to send read
> > requests to both drives out of the mirrored set to fully utilize the
> > bandwidth. Of course, that probably won't come into play unless the OS
> > decides that it's going to read-ahead fairly large chunks of the table
> > at a time...
> 
> I've not seen one that does, nor would it work in the general case IMO.  In
> RAID1 writes are duplicated and reads come from one of the copies.  You
> could alternate read service requests to minimize rotational latency, but
> you can't improve bandwidth.

(BTW, I did some testing that seems to confirm this)

Why couldn't you double the bandwidth? If you're doing a largish read
you should be able to do something like have drive a read the first
track, drive b the second, etc. Of course that means that the controller
or OS would have to be able to stitch things back together.

As for software raid, I'm wondering how well that works if you can't use
a BBU to allow write caching/re-ordering...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Delete me

2006-01-31 Thread Jim C. Nasby
On Tue, Jan 31, 2006 at 06:33:14PM +, Richard Huxton wrote:
> [EMAIL PROTECTED] wrote:
> >Can you delete me from the mailing list, please?
> 
> Go to the website.
> Click "community"
> Click "mailing lists"
> On the left-hand side click "Subscribe"
> Fill in the form, changing action to "unsubscribe"

Or take a look at the header that's included with every single message
sent to the list...

List-Unsubscribe: 
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Luke Lonergan
Jeffrey,

On 1/31/06 12:03 PM, "Jeffrey W. Baker" <[EMAIL PROTECTED]> wrote:

> Then you've not seen Linux.

:-D

> Linux does balanced reads on software
> mirrors.  I'm not sure why you think this can't improve bandwidth.  It
> does improve streaming bandwidth as long as the platter STR is more than
> the bus STR.

... Prove it.

- Luke





Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Luke Lonergan
Jim,

On 1/31/06 11:21 AM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

> (BTW, I did some testing that seems to confirm this)
> 
> Why couldn't you double the bandwidth? If you're doing a largish read
> you should be able to do something like have drive a read the first
> track, drive b the second, etc. Of course that means that the controller
> or OS would have to be able to stitch things back together.

It's because your alternating reads are skipping in chunks across the
platter.  Disks work at their max internal rate when reading sequential
data, and the cache is often built to buffer a track-at-a-time, so
alternating pieces that are not contiguous has the effect of halving the max
internal sustained bandwidth of each drive - the total is equal to one
drive's sustained internal bandwidth.

This works differently for RAID0, where the chunks are allocated to each
drive and laid down contiguously on each, so that when they're read back,
each drive runs at its sustained sequential throughput.

The alternating technique in mirroring might improve rotational latency for
random seeking - a trick that Tandem exploited, but it won't improve
bandwidth.
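The claim above can be put in rough numbers (a back-of-envelope sketch with a hypothetical 60 MB/s drive, not a measurement):

```shell
str=60                   # one drive's sustained sequential rate, MB/s (assumed)
per_drive=$((str / 2))   # alternating non-contiguous chunks halve each drive's rate
total=$((2 * per_drive)) # two drives in the mirror
echo "RAID1 alternating total: ${total} MB/s (same as one drive)"
```

That is, whatever is gained by reading from both spindles is given back by breaking each spindle's sequential stream.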
 
> As for software raid, I'm wondering how well that works if you can't use
> a BBU to allow write caching/re-ordering...

Works great with standard OS write caching.

- Luke





Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread PFC



Linux does balanced reads on software
mirrors.  I'm not sure why you think this can't improve bandwidth.  It
does improve streaming bandwidth as long as the platter STR is more than
the bus STR.


... Prove it.



(I have a software RAID1 on this desktop machine)

	It's a lot faster than a single disk for random reads when more than 1  
thread hits the disk, because it distributes reads to both disks. Thus,  
applications start faster, and the machine is more reactive even when the  
disk is thrashing. Cron starting an "updatedb" is less painful. It's cool  
for desktop use (and of course it's more reliable).


	However large reads (dd-style) are just the same speed as 1 drive. I  
guess you'd need a humongous readahead in order to read from both disks.
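On Linux, the readahead in question can be inspected and raised per block device with blockdev (a sketch, assuming an md mirror at /dev/md0 and root access; the commented commands change live settings, so they are shown but not run):

```shell
# blockdev --getra /dev/md0        # current readahead, in 512-byte sectors
# blockdev --setra 16384 /dev/md0  # try a much larger readahead
ra_sectors=16384                   # hypothetical value passed to --setra above
echo "$((ra_sectors * 512 / 1024 / 1024)) MiB readahead"
```

Whether a larger readahead actually lets the md driver stream from both mirror halves is exactly the open question in this thread.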




Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Jim C. Nasby
On Tue, Jan 31, 2006 at 02:52:57PM -0800, Luke Lonergan wrote:
> It's because your alternating reads are skipping in chunks across the
> platter.  Disks work at their max internal rate when reading sequential
> data, and the cache is often built to buffer a track-at-a-time, so
> alternating pieces that are not contiguous has the effect of halving the max
> internal sustained bandwidth of each drive - the total is equal to one
> drive's sustained internal bandwidth.
> 
> This works differently for RAID0, where the chunks are allocated to each
> drive and laid down contiguously on each, so that when they're read back,
> each drive runs at its sustained sequential throughput.
> 
> The alternating technique in mirroring might improve rotational latency for
> random seeking - a trick that Tandem exploited, but it won't improve
> bandwidth.

Or just work in multiples of tracks, which would greatly reduce the
impact of delays from seeking.

> > As for software raid, I'm wondering how well that works if you can't use
> > a BBU to allow write caching/re-ordering...
> 
> Works great with standard OS write caching.

Well, the only problem with that is if the machine crashes for any
reason you risk having the database corrupted (or at best losing some
committed transactions).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Luke Lonergan
PFC,

On 1/31/06 3:11 PM, "PFC" <[EMAIL PROTECTED]> wrote:

>> ... Prove it.
>> 
> 
> (I have a software RAID1 on this desktop machine)
> 
> It's a lot faster than a single disk for random reads when more than 1
> thread hits the disk, because it distributes reads to both disks. Thus,
> applications start faster, and the machine is more reactive even when the
> disk is thrashing. Cron starting an "updatedb" is less painful. It's cool
> for desktop use (and of course it's more reliable).

Exactly - improved your random seeks.

> However large reads (dd-style) are just the same speed as 1 drive. I
> guess you'd need a humongous readahead in order to read from both disks.

Nope - won't help.

- Luke





Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Jeffrey W. Baker
On Tue, 2006-01-31 at 09:00 -0800, Luke Lonergan wrote:
> Jim,
> 
> On 1/30/06 12:25 PM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> 
> > Why divide by 2? A good raid controller should be able to send read
> > requests to both drives out of the mirrored set to fully utilize the
> > bandwidth. Of course, that probably won't come into play unless the OS
> > decides that it's going to read-ahead fairly large chunks of the table
> > at a time...
> 
> I've not seen one that does, nor would it work in the general case IMO.  In
> RAID1 writes are duplicated and reads come from one of the copies.  You
> could alternate read service requests to minimize rotational latency, but
> you can't improve bandwidth.

Then you've not seen Linux.  Linux does balanced reads on software
mirrors.  I'm not sure why you think this can't improve bandwidth.  It
does improve streaming bandwidth as long as the platter STR is more than
the bus STR.

-jwb



Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Luke Lonergan
Jim,

On 1/31/06 3:12 PM, "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:

>> The alternating technique in mirroring might improve rotational latency for
>> random seeking - a trick that Tandem exploited, but it won't improve
>> bandwidth.
> 
> Or just work in multiples of tracks, which would greatly reduce the
> impact of delays from seeking.

So, having rediscovered the facts underlying the age-old RAID10 versus RAID5
debate, we're back to the earlier points.

RAID10 is/was the best option when latency / random seek was the predominant
problem to be solved; RAID5/50 is best where read bandwidth is needed.
Modern developments in fast CPUs for write checksumming have made RAID5/50 a
viable alternative to RAID10 even when there are moderate write / random seek
workloads and fast read is needed.

>> 
>> Works great with standard OS write caching.
> 
> Well, the only problem with that is if the machine crashes for any
> reason you risk having the database corrupted (or at best losing some
> committed transactions).

So, do you routinely turn off Linux write caching?  If not, then there's no
difference.

- Luke





[PERFORM] partitioning and locking problems

2006-01-31 Thread Marc Morin
We have a large database system designed around partitioning.  Our
application is characterized by:
 
- terabytes of data
- billions of rows in dozens of base tables (and 100s of partitions)
- 24x7 insert load of new data that cannot be stopped, data is time
sensitive.
- periodic reports that can have long running queries with query times
measured in hours
 
We have 2 classes of "maintenance" activities that are causing us
problems:
- periodically we need to change an insert rule on a view to point to a
different partition.
- periodically we need to delete data that is no longer needed.
Performed via truncate.
 
Under both these circumstances (truncate and create / replace rule) the
locking behaviour of these commands can cause locking problems for us.
The scenario is best illustrated as a series of steps:
 

1- long running report is running on view
2- continuous inserters into view into a table via a rule
3- truncate or rule change occurs, taking an exclusive lock.
Must wait for #1 to finish.
4- new reports and inserters must now wait for #3.
5- now everyone is waiting for a single query in #1.   Results
in loss of insert data granularity (important for our application).

 
We would like to understand the implications of changing Postgres's
code/locking so that rule changes and truncate do not require locking out
select statements.
 
The following is a simplified schema to help illustrate the problem.
 

create table a_1
(
pkey int primary key
);
create table a_2
(
pkey int primary key
);
 
create view a as select * from a_1 union all select * from a_2;
 
create function change_rule(int) returns void as
'
begin
execute ''create or replace rule insert as on insert to a do
instead insert into a_''||$1||''(pkey) values(NEW.pkey)'';
end;
' language plpgsql;
 
-- change rule, execute something like the following
periodically
select change_rule(1);

 
We've looked at the code and the rule changes appear "easy" but we are
concerned about the required changes for truncate.
 
Thanks
Marc




[PERFORM] Storing Digital Video

2006-01-31 Thread Rodrigo Madera
I am concerned with performance issues involving the storage of DV on
a database.

I thought of some options; which would be the most advisable for speed?

1) Pack N frames inside a "container" and store the container to the db.
2) Store each frame in a separate record in the table "frames".
3) (type something here)

Thanks for the help,

Rodrigo



Re: [PERFORM] Storing Digital Video

2006-01-31 Thread Matt Davies | Postgresql List

Rodrigo Madera wrote:


I am concerned with performance issues involving the storage of DV on
a database.

I thought of some options; which would be the most advisable for speed?

1) Pack N frames inside a "container" and store the container to the db.
2) Store each frame in a separate record in the table "frames".
3) (type something here)

Thanks for the help,

 



My experience has been that this is a very bad idea. Many people want to 
store all sorts of data in a database, such as email messages, pictures, 
etc. The idea of a relational database is to perform queries against 
data. If you just need to store data, then store it on disk and 
use the database as the indexer of the data.


Keep in mind the larger the database the slower some operations become.

Unless you are operating on the frame data (which you either store as 
blobs or hex-encoded data) I'd recommend you store the data on a hard 
drive and let the database store meta data about the video such as path 
information, run time, author, etc...


We do this on an application storing close to a million images and the 
performance is impressive.
   1. we don't have to do any sort of data manipulation storing the 
data in or retrieving the data out of the database.
   2. our database is compact and extremely fast  - it is using the 
database for what it was designed for - relational queries.
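The path-plus-metadata pattern described above might look like this (a hypothetical sketch; the file path, "media" database, and "clips" table are illustrative, and the INSERT is left commented out since it needs a running server):

```shell
f=/video/archive/clip0001.dv   # hypothetical video file already on disk
# Store the path plus size (and run time, author, etc.) in the database;
# the frames themselves never enter PostgreSQL:
# psql media -c "INSERT INTO clips (path, bytes) VALUES ('$f', $(stat -c%s "$f"))"
echo "would index: $f"
```

Retrieval is then a cheap metadata query followed by an ordinary file read.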


My $0.02




Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Steinar H. Gunderson
On Tue, Jan 31, 2006 at 12:47:10PM -0800, Luke Lonergan wrote:
>> Linux does balanced reads on software
>> mirrors.  I'm not sure why you think this can't improve bandwidth.  It
>> does improve streaming bandwidth as long as the platter STR is more than
>> the bus STR.
> ... Prove it.

FWIW, this is on Ultra160 disks (Seagate 1rpm) on a dual Opteron running
Linux 2.6.14.3:

cassarossa:~# grep md1 /proc/mdstat 
md1 : active raid1 sdf6[1] sda6[0]
cassarossa:~# dd if=/dev/sda6 of=/dev/null bs=8k count=400000
[system at about 35% wait for I/O and 15% system, according to top]
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 54,488154 seconds (60137842 bytes/sec)
[system at about 45% wait for I/O and 7% system -- whoa?]
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 52,523771 seconds (62386990 bytes/sec)

I'm not sure if it _refutes_ the assertion that the Linux RAID-1 driver can
do balancing of sequential reads, but it certainly doesn't present very much
evidence in that direction. BTW, sda and sdf are on different channels of a
dual-channel (onboard, connected via PCI-X) Adaptec board, so I doubt the bus
is the limiting factor.

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Luke Lonergan
Steinar,

On 1/31/06 5:26 PM, "Steinar H. Gunderson" <[EMAIL PROTECTED]> wrote:

> cassarossa:~# grep md1 /proc/mdstat
> md1 : active raid1 sdf6[1] sda6[0]
> cassarossa:~# dd if=/dev/sda6 of=/dev/null bs=8k count=400000
> [system at about 35% wait for I/O and 15% system, according to top]
> 400000+0 records in
> 400000+0 records out
> 3276800000 bytes transferred in 54,488154 seconds (60137842 bytes/sec)
> [system at about 45% wait for I/O and 7% system -- whoa?]
> 400000+0 records in
> 400000+0 records out
> 3276800000 bytes transferred in 52,523771 seconds (62386990 bytes/sec)
> 
> I'm not sure if it _refutes_ the assertion that the Linux RAID-1 driver can
> do balancing of sequential reads, but it certainly doesn't present very much
> evidence in that direction. BTW, sda and sdf are on different channels of a
> dual-channel (onboard, connected via PCI-X) Adaptec board, so I doubt the bus
> is the limiting factor.

Yep - 2MB/s is noise.  Run a RAID0, you should get 120MB/s.

Incidentally, before this thread took a turn to RAID10 vs. RAID5, the
question of HW RAID adapter versus SW RAID was the focus.  I routinely see
numbers like 20MB/s coming from HW RAID adapters on Linux, so it's nice to
see someone post a decent number using SW RAID.

We're very happy with the 3Ware HW RAID adapters, but so far they're the
only ones.  (I have two Arecas, but I mistakenly ordered PCI-E so I can't
test them :-( )

- Luke





Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Jim C. Nasby
On Tue, Jan 31, 2006 at 03:19:38PM -0800, Luke Lonergan wrote:
> > Well, the only problem with that is if the machine crashes for any
> > reason you risk having the database corrupted (or at best losing some
> > committed transactions).
> 
> So, do you routinely turn off Linux write caching?  If not, then there's no
> difference.

My thought was about fsync on WAL; if you're doing much writing then
a good raid write cache with BBU will improve performance.
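For context, the knobs involved (a hypothetical postgresql.conf fragment for 8.x; with a BBU-backed controller cache the per-commit WAL flush becomes cheap without risking corruption):

```
fsync = on                # flush WAL to disk at every commit
wal_sync_method = fsync   # how the flush is performed
wal_buffers = 64          # in-memory WAL buffers (8 kB pages)
```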
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



[PERFORM] Sequential scan being used despite indexes

2006-01-31 Thread James Russell
Hi there,

I'm running a simple query with 2 inner joins (say A, B and C). Each of
the join columns has indexes. If I run queries that join just A and B,
or just B and C, postgres uses indexes. But if I run "A join B join C"
then the "B join C" part starts using a sequential scan and I can't
figure out why.

Here's the query, which basically retrieves all meta-data for all
messages in a given forum. The relationship is pretty simple. Forums
contain threads, which contain messages, which each have associated
meta-data:

SELECT message.message_id, message_meta_data.value
FROM thread
    JOIN message USING (thread_id)
    JOIN message_meta_data ON (message.message_id=message_meta_data.message_id)
WHERE thread.forum_id=123;

Explaining:
Hash Join  (cost=337.93..1267.54 rows=180 width=35)
  Hash Cond: ("outer".message_id = "inner".message_id)
  ->  Seq Scan on message_meta_data  (cost=0.00..739.19 rows=37719 width=30)
  ->  Hash  (cost=337.79..337.79 rows=57 width=13)
        ->  Nested Loop  (cost=0.00..337.79 rows=57 width=13)
              ->  Index Scan using thread_forum_id_idx on thread  (cost=0.00..41.61 rows=13 width=4)
                    Index Cond: (forum_id = 6)
              ->  Index Scan using message_thread_id_idx on message  (cost=0.00..22.72 rows=5 width=17)
                    Index Cond: ("outer".thread_id = message.thread_id)

As you can see, the message and message_meta_data tables use a Seq
Scan. The only way I can think of forcing it to use the Index Scan in
all cases would be to use two separate nested queries: The outer query
would retrieve the list of messages in the forum, and the inner query
would retrieve the list of metadata for an individual message.
Obviously I want to avoid having to do that if possible.

Any ideas?

Many thanks if you can help.

James


Re: [PERFORM] Sequential scan being used despite indexes

2006-01-31 Thread Joshua D. Drake




Explaining:
Hash Join  (cost=337.93..1267.54 rows=180 width=35)
Hash Cond: ("outer".message_id = "inner".message_id)
->  Seq Scan on message_meta_data  (cost=0.00..739.19 rows=37719 width=30)
->  Hash  (cost=337.79..337.79 rows=57 width=13)
->  Nested Loop  (cost=0.00..337.79 rows=57 width=13)
  ->  Index Scan using thread_forum_id_idx on thread 
(cost=0.00..41.61 rows=13 width=4)

 Index Cond: (forum_id = 6)
  ->  Index Scan using message_thread_id_idx on message 
(cost=0.00..22.72 rows=5 width=17)

 Index Cond: ("outer".thread_id = message.thread_id)

As you can see, the message and message_meta_data tables use a Seq 
Scan. The only way I can think of forcing it to use the Index Scan in 
all cases would be to use two separate nested queries: The outer query 
would retrieve the list of messages in the forum, and the inner query 
would retrieve the list of metadata for an individual message. 
Obviously I want to avoid having to do that if possible.


Any ideas?

What does explain analyze say?

Joshua D. Drake




Many thanks if you can help.

James



--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/




Re: [PERFORM] Sequential scan being used despite indexes

2006-01-31 Thread Michael Fuhr
On Tue, Jan 31, 2006 at 07:29:51PM -0800, Joshua D. Drake wrote:
> > Any ideas?
>
> What does explain analyze say?

Also, have the tables been vacuumed and analyzed?

-- 
Michael Fuhr



Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Jeffrey W. Baker
On Tue, 2006-01-31 at 12:47 -0800, Luke Lonergan wrote:
> Jeffrey,
> 
> On 1/31/06 12:03 PM, "Jeffrey W. Baker" <[EMAIL PROTECTED]> wrote:
> > Linux does balanced reads on software
> > mirrors.  I'm not sure why you think this can't improve bandwidth.  It
> > does improve streaming bandwidth as long as the platter STR is more than
> > the bus STR.
> 
> ... Prove it.

It's clear that Linux software RAID1, and by extension RAID10, does
balanced reads, and that these balanced reads double the bandwidth.  A
quick glance at the kernel source code, and a trivial test, proves the
point.

In this test, sdf and sdg are Seagate 15k.3 disks on a single channel of
an Adaptec 39320, but the enclosure, and therefore the bus, is capable
of only Ultra160 operation.

# grep md0 /proc/mdstat 
md0 : active raid1 sdf1[0] sdg1[1]

# dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=0  & 
  dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=400000
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 48.243362 seconds (67922298 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 48.375897 seconds (67736211 bytes/sec)

That's 136MB/sec, for those following along at home.  With only two
disks in a RAID1, you can nearly max out the SCSI bus.

# dd if=/dev/sdf1 of=/dev/null bs=8k count=400000 skip=0  & 
  dd if=/dev/sdf1 of=/dev/null bs=8k count=400000 skip=400000
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 190.413286 seconds (17208883 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 192.096232 seconds (17058117 bytes/sec)

That, on the other hand, is only 34MB/sec.  With two threads, the RAID1
is 296% faster.

# dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=0       & 
  dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=400000  & 
  dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=800000  & 
  dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=1200000 &
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 174.276585 seconds (18802296 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 181.581893 seconds (18045852 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 183.724243 seconds (17835425 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 184.209018 seconds (17788489 bytes/sec)

That's 71MB/sec with 4 threads...

# dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=0       & 
  dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=400000  & 
  dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=800000  & 
  dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=1200000 &
100000+0 records in
100000+0 records out
819200000 bytes transferred in 77.489210 seconds (10571794 bytes/sec)
100000+0 records in
100000+0 records out
819200000 bytes transferred in 87.628000 seconds (9348610 bytes/sec)
100000+0 records in
100000+0 records out
819200000 bytes transferred in 88.912989 seconds (9213502 bytes/sec)
100000+0 records in
100000+0 records out
819200000 bytes transferred in 90.238705 seconds (9078144 bytes/sec)

Only 36MB/sec for the single disk.  96% advantage for the RAID1.

# dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=0       & 
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=400000  & 
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=800000  & 
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=1200000 & 
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=1600000 & 
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=2000000 & 
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=2400000 & 
  dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=2800000 &
50000+0 records in
50000+0 records out
409600000 bytes transferred in 35.289648 seconds (11606803 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 42.653475 seconds (9602969 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 43.524714 seconds (9410745 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 45.151705 seconds (9071640 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 47.741845 seconds (8579476 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 48.600533 seconds (8427891 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 48.758726 seconds (8400548 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 49.679275 seconds (8244887 bytes/sec)

66MB/s with 8 threads.

# dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=0       & 
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=400000  & 
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=800000  & 
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=1200000 & 
  dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=1600000 & 
  dd if=/dev/sdf1 of=/dev/nu
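The two-thread comparison above checks out arithmetically (a quick sketch using the bytes/sec figures quoted from the dd runs):

```shell
# Aggregate the two-thread dd results and compare RAID1 vs. one disk.
raid1=$((67922298 + 67736211))   # md0, two concurrent readers (bytes/sec)
single=$((17208883 + 17058117))  # sdf1 alone, two concurrent readers
echo "RAID1 aggregate: $((raid1 / 1000000)) MB/s"
echo "single disk:     $((single / 1000000)) MB/s"
echo "advantage: $((100 * raid1 / single - 100))%"
```

Integer division reports 295%; the ~296% figure in the post is the same ratio (about 3.96x) with ordinary rounding.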

Re: [PERFORM] Sequential scan being used despite indexes

2006-01-31 Thread James Russell
[Sorry, my last reply didn't go to the list]

Reading about this issue further in the FAQ, it seems that I should
ensure that Postgres has adequate and accurate information about the
tables in question by regularly running VACUUM ANALYZE, something I don't do currently.



I disabled SeqScan as per the FAQ, and it indeed was a lot slower, so Postgres was making the right choice in this case.
Many thanks,

James


Re: [PERFORM] Sequential scan being used despite indexes

2006-01-31 Thread Christopher Kings-Lynne
Reading about this issue further in the FAQ, it seems that I should 
ensure that Postgres has adequate and accurate information about the 
tables in question by regularly running VACUUM ANALYZE, something I 
don't do currently.


Well then you'll get rubbish performance always in PostgreSQL...

I strongly suggest you run autovacuum if you don't really understand 
PostgreSQL vacuuming/analyzing.


Chris




Re: [PERFORM] Sequential scan being used despite indexes

2006-01-31 Thread Michael Fuhr
On Wed, Feb 01, 2006 at 01:33:08PM +0900, James Russell wrote:
> Reading about this issue further in the FAQ, it seems that I should ensure
> that Postgres has adequate and accurate information about the tables in
> question by regularly running VACUUM ANALYZE, something I don't do
> currently.

Many people use a cron job (or the equivalent) to run VACUUM ANALYZE
at regular intervals; some also use the pg_autovacuum daemon, which
is a contrib module in 8.0 and earlier and part of the backend as of
8.1.

How often to vacuum/analyze depends on usage.  Once per day is
commonly cited, but busy tables might need it more often than that.
Just recently somebody had a table that could have used vacuuming
every five minutes or less (all records were updated every 30
seconds); pg_autovacuum can be useful in such cases.
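A typical cron-based setup (a sketch; assumes the `postgres` system user can connect and that `vacuumdb` from the PostgreSQL client tools is in PATH):

```shell
# /etc/crontab entry: VACUUM ANALYZE every database at 03:00 daily
line='0 3 * * * postgres vacuumdb --all --analyze --quiet'
echo "$line"
```

Busy tables, as noted above, may need a much shorter interval than this, which is where pg_autovacuum's activity-driven scheduling helps.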

> I disabled SeqScan as per the FAQ, and it indeed was a lot slower so
> Postgres was making the right choice in this case.

The planner might be making the right choice given the statistics
it has, but it's possible that better statistics would lead to a
different plan, perhaps one where an index scan would be faster.

What happens if you run VACUUM ANALYZE on all the tables, then run
the query again with EXPLAIN ANALYZE?

-- 
Michael Fuhr



Re: [PERFORM] partitioning and locking problems

2006-01-31 Thread Tom Lane
"Marc Morin" <[EMAIL PROTECTED]> writes:
> We would like to understand the implications of changing Postgres's
> code/locking so that rule changes and truncate do not require locking
> out select statements.

It won't work...

regards, tom lane



Re: [PERFORM] Huge Data sets, simple queries

2006-01-31 Thread Luke Lonergan
Jeffrey,

On 1/31/06 8:09 PM, "Jeffrey W. Baker" <[EMAIL PROTECTED]> wrote:
>> ... Prove it.
> I think I've proved my point.  Software RAID1 read balancing provides
> 0%, 300%, 100%, and 100% speedup on 1, 2, 4, and 8 threads,
> respectively.  In the presence of random I/O, the results are even
> better.
> Anyone who thinks they have a single-threaded workload has not yet
> encountered the autovacuum daemon.

Good data - interesting case.  I presume from your results that you had to
make the I/Os non-overlapping (the "skip" option to dd) in order to get the
concurrent access to work.  Why the particular choice of offset - 3.2GB in
this case?

So - the bandwidth doubles in specific circumstances under concurrent
workloads - not relevant to "Huge Data sets, simple queries", but possibly
helpful for certain kinds of OLTP applications.

- Luke 


