[PERFORM] restoring to wrong encoding db

2004-09-03 Thread Vivek Khera
I was just copying a database that was in UNICODE encoding into a new
db for some testing.  I hadn't realized it was UNICODE and when it hit
some funky Chinese data (from some spam that came in...) it errored
out with a string too long for a varchar(255).

The dump was created on PG 7.4.3 with "pg_dump -Fc"

The db was created with "createdb rt3"

The restore was to PG 7.4.5 with "pg_restore --verbose -d rt3 rt3.dump"


Is there some way for the dump to notice that the encoding is wrong in
the db into which it is being restored?  Once I created the rt3 db
with encoding='UNICODE' it worked just fine.  Should there be some
kind of check like that?
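
A small Python sketch of why the restore could fail this way. It assumes the database made by plain "createdb rt3" used a single-byte encoding in which varchar(n) effectively counts bytes, while the UNICODE database counts characters. The sample string and the helper name fits() are invented for illustration and are not taken from the dump.

```python
# Illustration only: the sample text is invented, not taken from the dump.
VARCHAR_LIMIT = 255  # the varchar(255) column from the error message


def fits(text, byte_counting):
    """Return True if text fits in varchar(VARCHAR_LIMIT).

    byte_counting=True models a database that treats every byte as one
    character (the assumption made here for the default-encoding db).
    byte_counting=False models a UNICODE database that counts characters.
    """
    length = len(text.encode("utf-8")) if byte_counting else len(text)
    return length <= VARCHAR_LIMIT


subject = "垃圾邮件" * 30  # 120 Chinese characters, 3 bytes each in UTF-8

print(len(subject))                         # characters
print(len(subject.encode("utf-8")))         # bytes
print(fits(subject, byte_counting=False))   # UNICODE db
print(fits(subject, byte_counting=True))    # byte-counting db
```

The same 120-character value is 360 bytes long, so it passes the length check in one database and fails it in the other.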


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] fsync vs open_sync

2004-09-03 Thread Merlin Moncure
> > There is also the fact that NTFS is a very slow filesystem, and Linux is
> > a lot better than Windows for everything disk, caching and IO related.
> > Try to copy some files in NTFS and in ReiserFS...
>
> I'm not so sure I would agree with such a blanket generalization.  I find
> NTFS to be very fast, my main complaint is fragmentation issues...I bet
> NTFS is better than ext3 at most things (I do agree with you about the
> cache, though).

Ok, you were right.  I made some tests and NTFS is just not very good in the general 
case.  I've seen some benchmarks for Reiser4 that are just amazing.

Merlin



Re: [PERFORM] fsync vs open_sync

2004-09-03 Thread Pierre-Frédéric Caillaud
> > > There is also the fact that NTFS is a very slow filesystem, and Linux is
> > > a lot better than Windows for everything disk, caching and IO related.
> > > Try to copy some files in NTFS and in ReiserFS...
> >
> > I'm not so sure I would agree with such a blanket generalization.  I find
> > NTFS to be very fast, my main complaint is fragmentation issues...I bet
> > NTFS is better than ext3 at most things (I do agree with you about the
> > cache, though).
>
> Ok, you were right.  I made some tests and NTFS is just not very good in
> the general case.  I've seen some benchmarks for Reiser4 that are just
> amazing.

	As a matter of fact I was again amazed today.
	I was looking into a way to cache database queries for a website (not  
yet) written in Python. The purpose was to cache long queries like those  
used to render forum pages (which is the typical slow query, selecting  
from a big table where records are rather random and LIMIT is used to cut  
the result in pages).
	I wanted to save a serialized (python pickled) representation of the data  
to disk to avoid reissuing the query every time.
	In the end it took about 1 ms to load or save the data for a page with 40  
posts... then I wondered, how long does it take just to read or write the  
file?
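
A minimal sketch of the kind of pickle-to-disk query cache described above. The function names, the 60-second expiry and the fake query are assumptions made for illustration, not the code actually used for the website.

```python
import os
import pickle
import tempfile
import time


def save_page(path, rows):
    """Pickle rows to a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp, path)  # readers never see a half-written file


def load_page(path, max_age):
    """Return the cached rows, or None if the file is missing or too old."""
    try:
        if time.time() - os.path.getmtime(path) > max_age:
            return None
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return None


def get_page(path, run_query, max_age=60.0):
    """Serve from the cache file when possible, else run the query."""
    rows = load_page(path, max_age)
    if rows is None:
        rows = run_query()
        save_page(path, rows)
    return rows


# Hypothetical stand-in for the slow forum query (40 posts per page).
query_calls = []


def fake_forum_query():
    query_calls.append(1)
    return [{"id": i, "body": "post %d" % i} for i in range(40)]


cache_file = os.path.join(tempfile.mkdtemp(), "forum_page_1.pickle")
first = get_page(cache_file, fake_forum_query)    # runs the query
second = get_page(cache_file, fake_forum_query)   # served from disk
print(len(second), "posts,", len(query_calls), "query executed")
```

Writing to a temporary name and renaming is a design choice for this sketch, so that a concurrent reader gets either the old page or the new one.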

ReiserFS 3.6, Athlon XP 2.5G+, 512 MB DDR400
7200 RPM IDE drive with 8 MB cache
This would be considered a very underpowered server...
22 KB files, 1000 of them:
open(), read(), close(): 10,000 files/s
open(), write(), close(): 4,000 files/s
	This is quite far from database FS activity, but it's still amazing,  
although the disk doesn't even get used. Which is what I like in Linux.  
You can write thousands of files in one second and the HDD is still idle... then  
when it decides to flush, it all goes to disk in one burst.
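
A rough Python sketch of how such an open/read/close and open/write/close test could be run. The function name and the use of a temporary directory are assumptions. It mostly measures the OS cache rather than the disk, and the numbers will vary from machine to machine.

```python
import os
import tempfile
import time


def bench_small_files(n_files=1000, size=22 * 1024):
    """Return (writes_per_second, reads_per_second) for n_files small files."""
    payload = os.urandom(size)
    with tempfile.TemporaryDirectory() as d:
        paths = [os.path.join(d, "f%05d" % i) for i in range(n_files)]

        start = time.perf_counter()
        for p in paths:
            with open(p, "wb") as f:   # open(), write(), close()
                f.write(payload)
        write_s = max(time.perf_counter() - start, 1e-9)

        start = time.perf_counter()
        for p in paths:
            with open(p, "rb") as f:   # open(), read(), close()
                f.read()
        read_s = max(time.perf_counter() - start, 1e-9)

    return n_files / write_s, n_files / read_s


write_rate, read_rate = bench_small_files()
print("write: %.0f files/s, read: %.0f files/s" % (write_rate, read_rate))
```

No fsync() is called here, so the writes can sit in memory until the kernel decides to flush them, which matches the idle-disk behaviour described above.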

	I ran some benchmarks some time ago and found that what sets Linux apart  
from Windows in terms of filesystems is:
	- very high performance filesystems like ReiserFS
	This is the obvious part; although with a huge amount of data in  
small files accessed randomly, ReiserFS is faster but not 10x, maybe  
something like 2x NTFS. I trust Reiser4 to offer better performance, but  
not right now. Also ReiserFS lacks a defragmenter, and it gets slower  
after 1-2 years (compared to 1-2 weeks with NTFS this is still not that  
bad, but I'd like to defragment and I can't). Reiser4 will apparently fix  
that with a background defragmenter etc.

	- caching.
	Linux disk caching is amazing. When copying a large file to the same disk  
on Windows, the drive head moves back and forth a lot, like the OS can't  
decide between reading and writing. Linux, on the other hand, reads and  
writes in large chunks and loses a lot less time seeking. Even when reading  
two files at the same time, Linux reads ahead in large chunks (very little  
performance loss) whereas Windows seeks a lot. Read-ahead and write-back  
thus make it a lot more than 2x faster than NTFS for everyday tasks like  
copying files, backing up, making archives, grepping, serving files, etc...
	My Windows box was able to saturate a 100 Mbps Ethernet link while serving  
one large FTP file on the LAN (not that impressive, it's only 10 MB/s hey!).  
However, when several simultaneous clients were trying to download  
different files which were not in the disk cache, all hell broke loose:  
lots of seeking, and bandwidth dropped to 30 Mbits/s. Not enough  
read-ahead...
	The Linux box, serving FTP, with half the RAM (256 MB), had no problem  
pushing the 100 Mbits/s with something like 10 simultaneous connections.  
The amusing part is that I could not use the Windows box to test it  
because it would choke at such a "high" IO concurrency (writing 10  
MBytes/s to several files at once, my god).
	Of course the files which had been downloaded to the Windows box were cut  
into as many fragments as the number of disk seeks during the download...  
several hundred fragments each... my god...
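
To illustrate the point about chunk size, here is a user-space Python analogy. It is not how either kernel implements read-ahead or write-back, and the function name is made up. It copies a file and counts how many times the copy switches between reading and writing. On a single disk with no caching, each switch could cost a head seek.

```python
import os
import tempfile


def copy_in_chunks(src, dst, chunk_size):
    """Copy src to dst; return the number of read-then-write rounds."""
    rounds = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            rounds += 1
    return rounds


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "big.bin")
with open(src, "wb") as f:
    f.write(os.urandom(1024 * 1024))  # 1 MiB test file

small = copy_in_chunks(src, os.path.join(workdir, "copy_small.bin"), 4 * 1024)
large = copy_in_chunks(src, os.path.join(workdir, "copy_large.bin"), 256 * 1024)
print("4 KiB chunks:", small, "rounds; 256 KiB chunks:", large, "rounds")
```

With the larger chunk the same data moves in far fewer read/write alternations, which is the effect attributed to Linux above.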

	What amazes me is that it must just be some parameter somewhere and the  
Microsoft guys probably could have easily changed the read-ahead  
thresholds and time between seeks when in a multitasking environment, but  
they didn't. Why?

	Thus people are forced to buy high-RPM SCSI drives for their LAN servers  
when an IDE RAID, used with Linux, could push nearly a Gigabit...

	For databases, this is different, as we're concerned about large files,  
and fsync() times... but it seems reiserfs still wins over ext3 so...
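
Since this thread is about fsync versus open_sync, here is a small Python sketch of how the two approaches could be timed on a given filesystem. The function names, the 8 KB block (PostgreSQL's default block size) and the write count are choices made for this sketch. os.O_SYNC is not defined on every platform, so the code falls back to no flag there, in which case the second timing is not meaningful.

```python
import os
import tempfile
import time

BLOCK = b"x" * 8192   # 8 KB, PostgreSQL's default block size
N_WRITES = 20


def time_write_then_fsync(path):
    """Write each block normally, then force it to disk with fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(N_WRITES):
            os.write(fd, BLOCK)
            os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def time_open_sync(path):
    """Open the file with O_SYNC so each write() is itself synchronous."""
    sync_flag = getattr(os, "O_SYNC", 0)  # 0 where the flag does not exist
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | sync_flag, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(N_WRITES):
            os.write(fd, BLOCK)
        return time.perf_counter() - start
    finally:
        os.close(fd)


bench_dir = tempfile.mkdtemp()
fsync_path = os.path.join(bench_dir, "fsync.dat")
osync_path = os.path.join(bench_dir, "open_sync.dat")
fsync_time = time_write_then_fsync(fsync_path)
open_sync_time = time_open_sync(osync_path)
print("write+fsync: %.4f s, open_sync: %.4f s" % (fsync_time, open_sync_time))
```

Results depend heavily on the filesystem, the drive's write cache and the kernel, so this is only a starting point for comparing, say, reiserfs and ext3 on the same hardware.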

	About NTFS vs ext3: ext3 dies if you put a lot of files in the same  
directory. It's fast but still outperformed by ReiserFS.

	I saw XFS fry eight 7-harddisk RAID bays. The computer was rebooted with  
the Reset button a few times because a faulty SCSI cable in the eighth  
RAID bay was making it hang. The other 7 bays had no problem. When it came  
back up, all the bays were in mayhem. xfs_repair just vomited over itself  
and we got plenty of files with random data in them. Fortunately there was a  
ca