Re: [CentOS] Question about optimal filesystem with many small files.
2009/7/11 o :
>
>> You mentioned that the data can be retrieved from somewhere else. Is
>> some part of this filename a unique key?
>
> The real key is up to 1023 characters long and it's unique, but I have to trim
> to 256 characters, by this way is not unique unless I add the hash.
>

The fact that this 1023-character file name is unique is very nice. And no trimming is needed! I think you have two issues to deal with:

1) You have files with unique file names, unfortunately with length <= 1023 characters. Regarding file names and paths on Linux and ext3 you have:

file name length limit = 255 bytes
path length limit = 4096 bytes

If you try to store such a file directly, you will break the file name limit. But if you decompose the name into N chunks of at most 250 characters each, you will be able to preserve the file as a sequence of N - 1 nested folders plus a file whose name is the Nth chunk, residing in the (N-1)th folder. Via this decomposition you translate the unique 1023-character 'file name' into a unique 'file path' whose length stays below the path length limit.

2) You suffer performance degradation when the number of files in a folder goes beyond 1000. Filipe Brandenburger has suggested a slick scheme to overcome this problem that will work perfectly without a database:

quote start
$ echo -n example.txt | md5sum
e76faa0543e007be095bb52982802abe  -

Then say you take the first 4 digits of it to build the hash: e/7/6/f

Then you store file example.txt at: e/7/6/f/example.txt
quote end

Of course, "example.txt" might be a long file name: "exa . 1000 chars here .txt", so below the "hash tree" e/7/6/f you will store the file path structure described in 1).

As was suggested by Les Mikesell, squid and other products have already implemented similar strategies, and you might be able to use either the algorithm or directly the code that implements it. I would spend some time investigating squid's code. I think squid has to deal with exactly the same problem: caching the contents of resources whose URLs might be longer than 255 characters. If you use this approach, there is no need for a database to store hashes!

I did some tests on a CentOS 3 system with the following script:

=script start
#!/bin/bash
for a in a b c d e f g j; do
    f=""
    for i in `seq 1 250`; do
        f=$a$f
    done
    mkdir $f
    cd $f
done
pwd > some_file.txt
=script end

which creates a nested directory structure with a file in it. The total file path length is > 8 * 250. I had no problems accessing this file by its full path:

$ find ./ -name some\* -exec cat {} \; | wc -c
2026

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
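Editor's sketch of the scheme described in the message above, combining the four-level hash tree with the 250-character chunking. The function name hashed_path and the variable CHUNK are illustrative and do not come from the thread. The sketch assumes bash, md5sum, and names that contain no '/' characters.

```shell
#!/bin/bash
# Sketch only: hashed_path and CHUNK are illustrative names, not from the thread.
CHUNK=250   # chunk size used in the message; must stay below the file name limit

hashed_path() {
    local name=$1 hash path rest
    # md5sum prints "<32 hex digits>  -"; keep only the digest.
    hash=$(printf '%s' "$name" | md5sum | cut -d' ' -f1)
    # The first four hex digits become four directory levels.
    path="${hash:0:1}/${hash:1:1}/${hash:2:1}/${hash:3:1}"
    # Append the name in CHUNK-sized pieces; every piece but the last
    # becomes a directory, the last piece is the file name.
    rest=$name
    while [ "${#rest}" -gt "$CHUNK" ]; do
        path="$path/${rest:0:CHUNK}"
        rest=${rest:CHUNK}
    done
    printf '%s/%s\n' "$path" "$rest"
}

hashed_path example.txt
```

If the digest is the one quoted in the message, the final call prints e/7/6/f/example.txt. In a multibyte locale `${#rest}` counts characters rather than bytes, so running under LC_ALL=C is the safer assumption when names are not plain ASCII.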
Re: [CentOS] Question about optimal filesystem with many small files.
Thanks, using directories as file names is a great idea. Still, I'm not sure that it would solve my performance issue, as the bottleneck is the disk and not mysql. I just implemented the directory names based on the hash of the file, and the performance is a bit slower than before. This is the output of atop (15 sec. avg.):

PRC | sys 0.53s | user 5.43s | #proc 112 | #zombie 0 | #exit 0 |
CPU | sys 4% | user 54% | irq 2% | idle 208% | wait 131% |
cpu | sys 1% | user 24% | irq 1% | idle 54% | cpu001 w 20% |
cpu | sys 2% | user 15% | irq 1% | idle 31% | cpu002 w 52% |
cpu | sys 1% | user 8% | irq 0% | idle 52% | cpu003 w 38% |
cpu | sys 1% | user 7% | irq 0% | idle 71% | cpu000 w 21% |
CPL | avg1 10.58 | avg5 6.92 | avg15 4.66 | csw 19112 | intr 19135 |
MEM | tot 2.0G | free 49.8M | cache 157.4M | buff 116.8M | slab 122.7M |
SWP | tot 1.9G | free 1.2G | | vmcom 2.2G | vmlim 2.9G |
PAG | scan 1536 | stall 0 | | swin 9 | swout 0 |
DSK | sdb | busy 91% | read 884 | write 524 | avio 6 ms |
DSK | sda | busy 12% | read 201 | write 340 | avio 2 ms |
NET | transport | tcpi 8551 | tcpo 8204 | udpi 702 | udpo 718 |
NET | network | ipi 9264 | ipo 8946 | ipfrw 0 | deliv 9264 |
NET | eth0 5% | pcki 6859 | pcko 6541 | si 5526 Kbps | so 466 Kbps |
NET | lo | pcki 2405 | pcko 2405 | si 397 Kbps | so 397 Kbps |

sdb holds the cache and sda holds everything else, including the mysql db files. Note that I have a lot of disk reads on sdb, but I'm really only getting one file from disk for every 10 written, so my guess is that all the other reads are directory listings. As I'm using the hash for the directory names, (I think) this makes the Linux cache less effective, as the files are distributed more homogeneously and randomly among the directories.

The app is running a bit slower than when using the file name for the directory name, although I expect (not really sure) that it will be better as the number of files on disk grows (currently there are only 600k files out of 15M). My current performance is around 50 file I/Os per second.
Re: [CentOS] Update Issue
Ron Blizzard wrote:

Hello.

> Missing Dependency: libavcodec.so.51 is needed by package
> transcode-1.0.5-1.el5.rf.i386 (installed)
> Missing Dependency: libx264.so.55 is needed by package
> transcode-1.0.5-1.el5.rf.i386 (installed)

Ask in the rpmforge list. This has nothing to do with CentOS repo.

regards
Olaf
Re: [CentOS] Question about optimal filesystem with many small files.
> Thanks, using directories as file names is a great idea, anyway I'm not sure
> if that would solve my performance issue, as the bottleneck is the disk and
> not mysql.

The situation you described initially suffers from only one issue: too many files in one single directory. You are not the first to fight this; see qmail maildir, see squid, etc. The remedy is always one and the same: split the files into a tree folder structure. For a sample implementation, check out squid, BackupPC, etc.

> I just implemented the directories names based on the hash of the file and the
> performance is a bit slower than before. This is the output of atop (15 secs.
> avg.):
>
> PRC | sys 0.53s | user 5.43s | #proc 112 | #zombie 0 | #exit 0 |
> CPU | sys 4% | user 54% | irq 2% | idle 208% | wait 131% |
> cpu | sys 1% | user 24% | irq 1% | idle 54% | cpu001 w 20% |
> cpu | sys 2% | user 15% | irq 1% | idle 31% | cpu002 w 52% |
> cpu | sys 1% | user 8% | irq 0% | idle 52% | cpu003 w 38% |
> cpu | sys 1% | user 7% | irq 0% | idle 71% | cpu000 w 21% |
> CPL | avg1 10.58 | avg5 6.92 | avg15 4.66 | csw 19112 | intr 19135 |
> MEM | tot 2.0G | free 49.8M | cache 157.4M | buff 116.8M | slab 122.7M |
> SWP | tot 1.9G | free 1.2G | | vmcom 2.2G | vmlim 2.9G |

I am under the impression that you are swapping. Out of 2GB of RAM, you have just 157MB of cache and 116MB of buffers. What is eating the RAM? Why do you have about 0.7GB of swap in use? You need more memory for the file system cache.

> PAG | scan 1536 | stall 0 | | swin 9 | swout 0 |
> DSK | sdb | busy 91% | read 884 | write 524 | avio 6 ms |
> DSK | sda | busy 12% | read 201 | write 340 | avio 2 ms |
> NET | transport | tcpi 8551 | tcpo 8204 | udpi 702 | udpo 718 |
> NET | network | ipi 9264 | ipo 8946 | ipfrw 0 | deliv 9264 |
> NET | eth0 5% | pcki 6859 | pcko 6541 | si 5526 Kbps | so 466 Kbps |
> NET | lo | pcki 2405 | pcko 2405 | si 397 Kbps | so 397 Kbps |
>
> in sdb is the cache and in sda is all other stuff, including the mysql db
> files. Check that I have a lot of disk reads in sdb, but I'm really getting
> one file from disk for each 10 written, so my guess is that all other reads
> are directory listings. As I'm using the hash as directory names, (I think)
> this makes the linux cache slower, as the files are distributed in a more
> homogeneous and randomly way among the directories.

I think that the Linux file system cache is smart enough for this type of load. How many files per directory do you have?

> The app is running a bit slower than using the file name for directory name,
> although I expect (not really sure) that it will be better as the number of
> files on disk grows (currently there are only 600k files from 15M). My
> current performance is around 50 file i/o per second.

Something is wrong. Got to figure this out. Where did this RAM go?
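Editor's sketch of how the two questions in the message above (how many files per directory, and where the RAM went) might be answered from the shell. The helper names files_per_dir and top_memory_users are illustrative and do not come from the thread. The sketch assumes GNU find and the procps version of ps.

```shell
#!/bin/bash
# Sketch only: files_per_dir and top_memory_users are illustrative helper names.

# Print "<count> <directory>" for the fullest directories under $1
# (top $2 entries, default 10). Relies on GNU find's -printf.
files_per_dir() {
    find "$1" -type f -printf '%h\n' | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# List the processes holding the most resident memory (RSS, in KiB).
top_memory_users() {
    ps -eo rss,pid,comm --sort=-rss | head -n "${1:-10}"
}

# Only run when a cache directory is given on the command line.
if [ "$#" -gt 0 ]; then
    files_per_dir "$1"
    top_memory_users
fi
```

Run it with the cache directory as the only argument. A few directories with counts far above 1000 would suggest the tree is not spreading the files evenly.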
Re: [CentOS] vsftpd not able to log in
On Fri, Jul 10, 2009 at 3:17 PM, Eugene Vilensky wrote:
> Hi folks,
> I can't seem to log into my system via
> vsftpd. All other services using PAM are fine...Am I missing something simple?
>
> ftp> user
> (username) user
> 331 Please specify the password.
> Password:
> 530 Login incorrect.
>
> # getenforce
> Permissive
>
> here is the event in /var/log/audit/audit.log:
> type=USER_AUTH msg=audit(1247235151.569:9781): user pid=21052 uid=0 auid=0
> subj=root:system_r:ftpd_t:s0 msg='PAM: authentication acct="user" :
> exe="/usr/sbin/vsftpd" (hostname=hostname, addr=1.2.3.4, terminal=ftp
> res=failed)'
>
> cat /etc/pam.d/vsftpd
> #%PAM-1.0
> session optional pam_keyinit.so force revoke
> auth required pam_listfile.so item=user sense=deny file=/etc/vsftpd/ftpusers onerr=succeed
> auth required pam_shells.so
> auth include system-auth
> account include system-auth
> session include system-auth
> session required pam_loginuid.so
>
> # grep local /etc/vsftpd/vsftpd.conf
> local_enable=YES
> local_umask=022
> chroot_local_user=YES
>
> # getsebool -a | grep ftp
> allow_ftpd_anon_write --> off
> allow_ftpd_full_access --> off
> allow_ftpd_use_cifs --> off
> allow_ftpd_use_nfs --> off
> allow_tftp_anon_write --> off
> ftp_home_dir --> on
> ftpd_disable_trans --> off
> ftpd_is_daemon --> on
> httpd_enable_ftp_server --> off
> tftpd_disable_trans --> off
>

Is the user's shell listed in /etc/shells?
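Editor's sketch of the check suggested in the reply above. The quoted PAM stack includes pam_shells.so, which rejects users whose login shell is not listed in /etc/shells. The helper name shell_is_listed is illustrative, and the sketch assumes getent is available to look up the account.

```shell
#!/bin/bash
# Sketch only: shell_is_listed is an illustrative helper name.

# Succeeds if the shell path $1 appears as a whole line in the shells
# file $2 (defaults to /etc/shells, the file pam_shells consults).
shell_is_listed() {
    grep -qxF -- "$1" "${2:-/etc/shells}"
}

# Usage: pass the FTP user name as the first argument.
if [ "$#" -gt 0 ]; then
    shell=$(getent passwd "$1" | cut -d: -f7)
    if [ -n "$shell" ] && shell_is_listed "$shell"; then
        echo "$1: $shell is listed in /etc/shells"
    else
        echo "$1: shell '$shell' is not listed in /etc/shells"
    fi
fi
```

If the shell turns out to be missing, either add it to /etc/shells or give the account a listed shell. Which of the two is appropriate depends on local policy.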
Re: [CentOS] recent rsyslog package available for CentOS?
On 07/10/2009 09:00 PM, Eric B. wrote:
> I'm looking for a recent version of rsyslog. The yum repositories only show
> me a version that is 2.0.6. According to the www.rsyslog.com site, they are
> up to version 5 (dev), which means that I would think/assume that there
> would at least be v3 or v4 available somewhere.
>
> Does anyone know if/where I can find something more recent than 2.0.6?

I have been building and using much newer versions of rsyslog myself. Let me look at getting these into a slightly more public area.

--
Karanbir Singh : http://www.karan.org/ : 2522...@icq
Re: [CentOS] recent rsyslog package available for CentOS?
On Sat, 2009-07-11 at 13:34 +0100, Karanbir Singh wrote:
> I have been building and using myself much newer versions of rsyslog.
> Let me look at getting these into a slightly more public area.

I've been doing the same. Works great, minus maintaining the package myself, but that's not a disaster.

Regards,

Ranbir

--
Kanwar Ranbir Sandhu
Linux 2.6.27.25-170.2.72.fc10.x86_64 x86_64 GNU/Linux
09:52:40 up 9 days, 19:26, 5 users, load average: 1.61, 1.35, 1.23
Re: [CentOS] Question about optimal filesystem with many small files.
On Sat, 2009-07-11 at 00:01 +, o wrote:
> > You mentioned that the data can be retrieved from somewhere else. Is
> > some part of this filename a unique key?
>
> The real key is up to 1023 characters long and it's unique, but I have to trim
> to 256 characters, by this way is not unique unless I add the hash.
>
> > Do you have to track this
> > relationship anyway - or age/expire content?
>
> I have to track the long filename -> short file name relationship. Age is
> not relevant here.
>
> > I'd try to arrange things
> > so the most likely scenario would take the fewest operations. Perhaps a
> > mix of hash+filename would give direct access 99+% of the time and you
> > could move all copies of collisions to a different area.
>
> yes its a good idea, but at this point I don't want to add more complexity
> to my app, and having a separate area for collisions would make it more
> complex.
>
> > Then you could
> > keep the database mapping the full name to the hashed path but you'd
> > only have to consult it when the open() attempt fails.
>
> As the long filename is up to 1023 chars long i can't index it with mysql (it
> has a lower max limit). That's why I use the hash (which is indexed). What I
> do is keeping a list of just the md5 of the cached files in memory in my app,
> before going to mysql, I first check if it's in the list (really a RB-Tree).
---
It is 1024 chars long, which still won't help. MSSQL 2005 and up allows longer, if you're interested:
http://msdn.microsoft.com/en-us/library/ms143432.aspx
But that greatly depends on your data size; 900 bytes is the limit, but it can be exceeded.

You can use either one if you use a unique key ID for the index, mapping the file name to a unique short name. I would not store images in either one, as your SELECT LIKE and random queries will kill it. As much as I like DBs, I have to say the flat file system is the place for those.

John
Re: [CentOS] Question about optimal filesystem with many small files.
On Sat, 2009-07-11 at 11:48 -0400, JohnS wrote:
> On Sat, 2009-07-11 at 00:01 +, o wrote:
> > > You mentioned that the data can be retrieved from somewhere else. Is
> > > some part of this filename a unique key?
> >
> > The real key is up to 1023 characters long and it's unique, but I have to
> > trim to 256 characters, by this way is not unique unless I add the hash.
> >
> > > Do you have to track this
> > > relationship anyway - or age/expire content?
> >
> > I have to track the long filename -> short file name relationship. Age is
> > not relevant here.
> >
> > > I'd try to arrange things
> > > so the most likely scenario would take the fewest operations. Perhaps a
> > > mix of hash+filename would give direct access 99+% of the time and you
> > > could move all copies of collisions to a different area.
> >
> > yes its a good idea, but at this point I don't want to add more complexity
> > to my app, and having a separate area for collisions would make it more
> > complex.
> >
> > > Then you could
> > > keep the database mapping the full name to the hashed path but you'd
> > > only have to consult it when the open() attempt fails.
> >
> > As the long filename is up to 1023 chars long i can't index it with mysql
> > (it has a lower max limit). That's why I use the hash (which is indexed).
> > What I do is keeping a list of just the md5 of the cached files in memory
> > in my app, before going to mysql, I first check if it's in the list (really
> > a RB-Tree).
> ---
> It is 1024 chars long. Which won't still help. MSSQL 2005 and up is
> longer, if you're interested:
> http://msdn.microsoft.com/en-us/library/ms143432.aspx
> But that greatly depends on your data size 900 bytes is the limit but
> can be exceeded.
>
> You can use either one if you do a unique key id name for the index.
> File name to Unique short name. I would not store images in either one
> as your SELECT LIKE and Random will kill it. As much as I like DBs I
> have to say the flat file system is for those.
>
> John
---
Just a random thought on hashes via the DB that hardly anyone gives any thought to: using extended stored procedures, as in MSSQL, you can make your own hashes on the file insert.

USE master;
EXEC sp_addextendedproc 'your_md5', 'your_md5.dll'

Of course you will have to create your own .DLL to do the hashing. Then create your own functions:

SELECT dbo.your_md5('YourHash');

Direct:

EXEC master.dbo.your_md5 'YourHash'

However, I have no clue whether this is even doable in MySQL.

John
Re: [CentOS] Update Issue
On Sat, Jul 11, 2009 at 3:20 AM, Olaf Mueller wrote:
> Ron Blizzard wrote:
>
> Hello.
>
>> Missing Dependency: libavcodec.so.51 is needed by package
>> transcode-1.0.5-1.el5.rf.i386 (installed)
>> Missing Dependency: libx264.so.55 is needed by package
>> transcode-1.0.5-1.el5.rf.i386 (installed)
> Ask in the rpmforge list. This has nothing to do with CentOS repo.
>
> regards
> Olaf

Will do so. Thanks.

--
RonB -- Using CentOS 5.3
[CentOS] Firefox 3.5 Issues
I downloaded Firefox 3.5 from the M. Harris site and, for the most part, have had good luck with it. But I have also had hard crashes that take down CentOS, not just Firefox.

It happened to me twice on eBay (on the same page) -- and now I can replicate it as many times as I want by going to...

http://wiki.centos.org/Newsletter

...and choosing one of the two newsletters linked there.

I'm writing for a couple of reasons.

I'm curious to see if this is only my problem, or if others have experienced it or can replicate it.

And I was also wondering how I can remove Firefox 3.5 and its associated support files and get back to the regular repository.

Firefox 3.5, for the most part, is very stable and is snappier than 3.0.11 -- only two sites have pages that have crashed for me, but when it crashes it takes down my whole system and that's not good.

And, if someone could mention this on the forum (there is a thread about Firefox 3.5) I would appreciate it. I can't log on to the forums. I am supposed to ask for another password, but I can't remember what my original email address was (2 or 3 years ago).

Thanks for any pointers.

--
RonB -- Using CentOS 5.3
Re: [CentOS] Firefox 3.5 Issues
At Sat, 11 Jul 2009 18:23:18 -0500 CentOS mailing list wrote:
>
> I downloaded Firefox 3.5 from the M. Harris site and, for the most
> part, have had good luck with it. But I have also had hard crashes
> that take down CentOS, not just Firefox.
>
> It happened to me twice on eBay (on the same page) -- and now I can
> replicate it as many times as I want by going to...
>
> http://wiki.centos.org/Newsletter
>
> ...and choosing one of the two newsletters, linked there.
>
> I'm writing for a couple reasons.
>
> I'm curious to see if this is only my problem, or if other have
> experienced it or can replicate it.

I have heard about various problems people over in MS-Windows land have had with Firefox 3.5. It appears that FF 3.5 is not quite ready for production systems.

>
> And I was also wondering how I can remove Firefox 3.5 and its
> associated support files and get back to the regular repository.

rpm -hUv --oldpackage firefox-3.0.10-.i386.rpm

Were there other dependencies? If so, you may have to roll them back too.

>
> Firefox 3.5, for the most part, is very stable and is snappier than
> 3.0.11 -- only two sites have pages have crashed for me, but when it
> crashes it takes down my whole system and that's not good.
>
> And, if someone could mention this on the forum (there is a thread
> about Firefox 3.5) I would appreciate it. I can't log on to the
> forums. I am supposed to ask for another password, but I can't
> remember what my original email address was (2 or 3 years ago).
>
> Thanks for any pointers.
>

--
Robert Heller -- 978-544-6933
Deepwoods Software -- Download the Model Railroad System
http://www.deepsoft.com/ -- Binaries for Linux and MS-Windows
hel...@deepsoft.com -- http://www.deepsoft.com/ModelRailroadSystem/
Re: [CentOS] Firefox 3.5 Issues
Ron Blizzard wrote:
> I downloaded Firefox 3.5 from the M. Harris site and, for the most
> part, have had good luck with it. But I have also had hard crashes
> that take down CentOS, not just Firefox.
>
> It happened to me twice on eBay (on the same page) -- and now I can
> replicate it as many times as I want by going to...
>
> http://wiki.centos.org/Newsletter
>
> ...and choosing one of the two newsletters, linked there.
>
> I'm writing for a couple reasons.
>
> I'm curious to see if this is only my problem, or if other have
> experienced it or can replicate it.

Confirmed - also i386 mharris packaging.

Trying x86_64 (rebuild of his src.rpm) momentarily ...
Re: [CentOS] Firefox 3.5 Issues
Ron Blizzard wrote:
> I downloaded Firefox 3.5 from the M. Harris site and, for the most
> part, have had good luck with it. But I have also had hard crashes
> that take down CentOS, not just Firefox.
>
> It happened to me twice on eBay (on the same page) -- and now I can
> replicate it as many times as I want by going to...
>
> http://wiki.centos.org/Newsletter
>
> ...and choosing one of the two newsletters, linked there.
>
> I'm writing for a couple reasons.
>
> I'm curious to see if this is only my problem, or if other have
> experienced it or can replicate it.

My x86_64 build does not crash on the newsletters, interestingly enough.
Re: [CentOS] Firefox 3.5 Issues
Michael A. Peters wrote:
>> I'm writing for a couple reasons.
>>
>> I'm curious to see if this is only my problem, or if other have
>> experienced it or can replicate it.
>
> Confirmed - also i386 mharris packaging.

update - it didn't actually bring the OS down, it brought X11 down - though it looked like CentOS was down. Pressing the power button (thinkpad) resulted in X11 restarting followed by a clean shutdown.
Re: [CentOS] Firefox 3.5 Issues
Michael A. Peters wrote:
> Michael A. Peters wrote:
>
>>> I'm writing for a couple reasons.
>>>
>>> I'm curious to see if this is only my problem, or if other have
>>> experienced it or can replicate it.
>> Confirmed - also i386 mharris packaging.
>
> update - it didn't actually bring the OS down, it brought X11 down -
> though it looked like CentOS was down. Pressing the power button
> (thinkpad) resulted in X11 restarting followed by a clean shutdown.

My own i386 build of firefox also does not crash (built when I did the x86_64 build, but I never installed it until now).

The only difference in the spec file was that I added pcre-devel to BuildRequires before building it - but my build also has all the language packs. Not sure if that's the difference or not.
Re: [CentOS] Firefox 3.5 Issues
Ron Blizzard wrote:
> I downloaded Firefox 3.5 from the M. Harris site and, for the most
> part, have had good luck with it. But I have also had hard crashes
> that take down CentOS, not just Firefox.
>
> It happened to me twice on eBay (on the same page) -- and now I can
> replicate it as many times as I want by going to...
>
> http://wiki.centos.org/Newsletter
>
> ...and choosing one of the two newsletters, linked there.
>

FWIW (about $0.0002), Firefox 3.5 on Windows XP (32bit) doesn't crash on either newsletter.
Re: [CentOS] Is there an openssh security problem?
I think if you use double authentication (both keys and a password) and put your SSH server on a different port then you are doing the best you can. You hope to prevent a 0-day but you cannot fully protect yourself...

James

On Fri, Jul 10, 2009 at 7:06 PM, Rob Townley wrote:
> On Fri, Jul 10, 2009 at 9:33 AM, Peter Kjellstrom wrote:
> > On Friday 10 July 2009, Rob Kampen wrote:
> >> Coert Waagmeester wrote:
> > ...
> >> > it only allows one NEW connection to ssh per minute.
> >> >
> >> > That is also a good protection right?
> > ...
> >> Not really protection - rather a deterrent - it just makes it slower for
> >> the script kiddies that try brute force attacks
> >
> > Basically it's not so much about protection in the end as it is about keeping
> > your secure-log readable. Or maybe also a sense of being secure...
> >
> > It's always good to limit your exposure but you really have to weigh cost
> > against the win. Two examples:
> >
> > Limit from which hosts you can login to a server:
> >  Configuration cost: trivial setup (one iptables line)
> >  Additional cost: between no impact and some impact depending on your habits
> >  Positive effect: 99.9+% of all scans and login attempts are now gone
> >  Verdict: Clear win as long as the set of servers are easily identifiable
> >
> > Elaborate knocking/blocking setup:
> >  Configuration cost: significant (include keeping it up-to-date)
> >  Additional cost: setup of clients for knocking, use of -p XXX for new port
> >  Positive effect: "standard scans" will probably miss but not air tight
> >  Verdict: Harder to judge, I think it's often not worth it
> >
> > Other things worth looking into are, for example, access.conf (pam_access.so)
> > and ensuring that non-trivial passwords are used.
> >
> > my €0.02,
> >  Peter
>
> Virtual networks such as tinc-vpn.org or hamachi create an
> encrypted network only accessible to members of the virtual network.
> So if your server's virtual nic has an address of 5.4.3.2, then the
> only other host that may see your server would be your laptop with
> address 5.4.3.3. No other internet hosts would even see 5.4.3.2...
> It is like IPSec, but much easier.

--
http://www.goldwatches.com
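Editor's sketch of two of the iptables approaches mentioned in the quoted discussion above: limiting SSH to known hosts, and allowing only one NEW connection per minute. The function names, the list name SSH, and the address range 192.0.2.0/24 (a documentation range) are placeholders and do not come from the thread. By default the script only prints the commands. It assumes the state and recent match modules are available.

```shell
#!/bin/bash
# Sketch only. TRUSTED is a placeholder range; RUN=echo makes this a dry
# run that prints the commands instead of executing them.
TRUSTED=${TRUSTED-192.0.2.0/24}
RUN=${RUN-echo}

# Option 1: accept SSH from the trusted range only, drop the rest.
allow_trusted_only() {
    $RUN iptables -A INPUT -p tcp --dport 22 -s "$TRUSTED" -j ACCEPT
    $RUN iptables -A INPUT -p tcp --dport 22 -j DROP
}

# Option 2: record each NEW SSH connection per source address and drop
# a further one that arrives within 60 seconds of the previous attempt.
rate_limit_new() {
    $RUN iptables -A INPUT -p tcp --dport 22 -m state --state NEW \
        -m recent --name SSH --set
    $RUN iptables -A INPUT -p tcp --dport 22 -m state --state NEW \
        -m recent --name SSH --update --seconds 60 --hitcount 2 -j DROP
}

allow_trusted_only
rate_limit_new
```

The two options are alternatives rather than a combined ruleset: once everything outside the trusted range is dropped, the rate limit has nothing left to act on. To apply the rules instead of printing them, run the script as root with RUN set to an empty value.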