Mike, it sounds like what you need is the commercial version. Sorry to be a "sales guy" on the community list, but FYI, this is the kind of problem the commercial version was designed to address.
Mark

On 01/19/2011 08:10 PM, Mike Svoboda wrote:
> Sounds good to me with the feature request! But to solve the problem I am
> trying to accomplish....
>
> The only workaround I can think of would be to split my "client list" up into 4
> or 5 separate files, and then execute the copy_from against a file based
> upon a time-based class?
>
> Or does anyone else have a more elegant solution on how to grab data from
> hundreds of clients? This seems to be a scalability problem someone must
> have hit before.
>
> On 1/19/11 10:59 AM, "Seva Gluschenko" <seva.glusche...@gmail.com> wrote:
>
>> Mike,
>>
>> cf-agent holds the socket because of connection caching, which seems
>> quite reasonable for ordinary agent tasks. You've just put it into
>> a condition where caching does harm. If we were discussing the ideal model,
>> cf-agent would detect the maximum number of file descriptors available
>> and enforce cache expiration upon reaching some high watermark. Since
>> such a model would require too much effort to implement, a more
>> straightforward solution is to create a feature request for some sort
>> of keepalive => "false"; option in body copy_from.
>>
>> 2011/1/19 Mike Svoboda <msvob...@linkedin.com>:
>>> I've enabled my Cfengine infrastructure to perform 2 way data transfers. My
>>> clients are configured to run cf-serverd, so my Master Policy Server can
>>> log in to pull some files off of each machine. To accomplish this, I execute
>>> two policies.
>>>
>>> policy 1 extracts all the clients the MPS has seen from the lastseen
>>> database and dumps this info to a text file.
>>> policy 2 reads the text file, and instructs cf-agent on the MPS to loop
>>> through and pull down data from each "client machine" from
>>> /var/cfengine/outgoing.
>>>
>>> Here's policy 2 which executes on my MPS.
>>>
>>> bundle agent grab_client_cfreport_output
>>> {
>>> vars:
>>>   "host_array_size"
>>>     int => readstringarray("host_array",
>>>       "/export/apps/cfengine-client-data/active_clients.txt","#[^\n]*","[\n]",999999999,9999999);
>>>   "real_machine_name" slist => getindices("host_array");
>>>
>>> files:
>>>   # This transfers all reporting data from the clients to the Master Policy Server
>>>   "/export/apps/cfengine-client-data/$(real_machine_name)"
>>>     handle => "grab_client_data",
>>>     copy_from => remote_copy("/var/cfengine/outgoing","$(real_machine_name)"),
>>>     depth_search => recurse("inf"),
>>>     action => immediate;
>>> }
>>> #########################################################
>>> body copy_from remote_copy(sourcedir,sourceserver)
>>> {
>>>   source => "$(sourcedir)";
>>>   servers => { "$(sourceserver)" };
>>>   copy_backup => "false";
>>>   purge => "false";
>>>   trustkey => "true";
>>>   collapse_destination_dir => "true";
>>>   encrypt => "true";
>>> }
>>>
>>> When this executes, cf-agent holds open a socket for each client it connects
>>> to. It doesn't close the socket when it moves on to the next machine. The
>>> downside of this is that this master policy server has to reach out and
>>> grab data from 900 clients, which means I end up with a TON of open file
>>> descriptors with socket information.
>>>
>>> I've raised ulimit -n (open file descriptors) to 2048, but cf-agent doesn't
>>> seem very happy.
>>> # ulimit -a
>>> core file size        (blocks, -c) unlimited
>>> data seg size         (kbytes, -d) unlimited
>>> file size             (blocks, -f) unlimited
>>> open files                    (-n) 2048
>>> pipe size          (512 bytes, -p) 10
>>> stack size            (kbytes, -s) 10240
>>> cpu time             (seconds, -t) unlimited
>>> max user processes            (-u) 16357
>>> virtual memory        (kbytes, -v) unlimited
>>>
>>> Through about 500 client transfers, cf-agent is happy. Then I start
>>> hitting these messages below. The file transfer still succeeds, but it
>>> looks nasty.
>>>
>>> -> Updated /export/apps/cfengine-client-data/ela4-cs44.prod/monitor_summary.html from source /var/cfengine/outgoing/reports/monitor_summary.html on ela4-cs44.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-cs44.prod/performance.html from source /var/cfengine/outgoing/reports/performance.html on ela4-cs44.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-cs44.prod/lastseen.html from source /var/cfengine/outgoing/reports/lastseen.html on ela4-cs44.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-cs44.prod/classes.html from source /var/cfengine/outgoing/reports/classes.html on ela4-cs44.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-cs44.prod/class_notes from source /var/cfengine/outgoing/reports/class_notes on ela4-cs44.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-cs44.prod/audit.html from source /var/cfengine/outgoing/reports/audit.html on ela4-cs44.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be174.prod/monitor_summary.html from source /var/cfengine/outgoing/reports/monitor_summary.html on ela4-be174.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be174.prod/performance.html from source /var/cfengine/outgoing/reports/performance.html on ela4-be174.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be174.prod/lastseen.html from source /var/cfengine/outgoing/reports/lastseen.html on ela4-be174.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be174.prod/classes.html from source /var/cfengine/outgoing/reports/classes.html on ela4-be174.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be174.prod/class_notes from source /var/cfengine/outgoing/reports/class_notes on ela4-be174.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be174.prod/audit.html from source /var/cfengine/outgoing/reports/audit.html on ela4-be174.prod
>>> -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/monitor_summary.html
>>> -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/performance.html
>>> -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/lastseen.html
>>> -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/hashes.html
>>> -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/classes.html
>>> -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/class_notes
>>> -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/audit.html
>>> -> Copying from ela4-be520.prod:/var/cfengine/outgoing/cm.conf
>>> Couldn't find a public key (/var/cfengine/ppkeys/root-ela4-be298.prod.pub) - use cf-key to get one
>>> !!! System error for fopen: "Too many open files"
>>> -> Trusting server identity, promise to accept key from ela4-be298.prod=172.17.135.198
>>> -> Updated /export/apps/cfengine-client-data/ela4-be298.prod/monitor_summary.html from source /var/cfengine/outgoing/reports/monitor_summary.html on ela4-be298.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be298.prod/performance.html from source /var/cfengine/outgoing/reports/performance.html on ela4-be298.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be298.prod/lastseen.html from source /var/cfengine/outgoing/reports/lastseen.html on ela4-be298.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be298.prod/classes.html from source /var/cfengine/outgoing/reports/classes.html on ela4-be298.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be298.prod/audit.html from source /var/cfengine/outgoing/reports/audit.html on ela4-be298.prod
>>> Couldn't read file /var/cfengine/cfagent.ela4-41105-js01.prod.log for editing
>>> !!! System reports error for fopen: "Too many open files"
>>> Couldn't find a public key (/var/cfengine/ppkeys/root-ela4-be420.prod.pub) - use cf-key to get one
>>> !!! System error for fopen: "Too many open files"
>>> -> Trusting server identity, promise to accept key from ela4-be420.prod=172.17.137.192
>>> -> Updated /export/apps/cfengine-client-data/ela4-be420.prod/performance.html from source /var/cfengine/outgoing/reports/performance.html on ela4-be420.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be420.prod/lastseen.html from source /var/cfengine/outgoing/reports/lastseen.html on ela4-be420.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be420.prod/classes.html from source /var/cfengine/outgoing/reports/classes.html on ela4-be420.prod
>>> -> Updated /export/apps/cfengine-client-data/ela4-be420.prod/audit.html from source /var/cfengine/outgoing/reports/audit.html on ela4-be420.prod
>>> Couldn't read file /var/cfengine/cfagent.ela4-41105-js01.prod.log for editing
>>> !!! System reports error for fopen: "Too many open files"
>>> Couldn't find a public key (/var/cfengine/ppkeys/root-ela4-ss143.prod.pub) - use cf-key to get one
>>>
>>> Running a pfiles on cf-agent, here's all the open sockets I see. They
>>> aren't being released after every client transfer, so it piles up in
>>> cf-agent.
>>>
>>> $ pfiles 5610
>>> 5610: /var/cfengine/bin/cf-agent -I -K
>>>   Current rlimit: 2048 file descriptors
>>>    0: S_IFCHR mode:0620 dev:286,0 ino:12582918 uid:3378 gid:7 rdev:24,1
>>>       O_RDWR|O_NOCTTY|O_LARGEFILE
>>>       /devices/pseudo/pts@0:1
>>>    1: S_IFCHR mode:0620 dev:286,0 ino:12582918 uid:3378 gid:7 rdev:24,1
>>>       O_RDWR|O_NOCTTY|O_LARGEFILE
>>>       /devices/pseudo/pts@0:1
>>>    2: S_IFCHR mode:0620 dev:286,0 ino:12582918 uid:3378 gid:7 rdev:24,1
>>>       O_RDWR|O_NOCTTY|O_LARGEFILE
>>>       /devices/pseudo/pts@0:1
>>>    3: S_IFDOOR mode:0444 dev:295,0 ino:56 uid:0 gid:0 size:0
>>>       O_RDONLY|O_LARGEFILE FD_CLOEXEC door to nscd[215]
>>>       /var/run/name_service_door
>>>    4: S_IFREG mode:0644 dev:30,131 ino:8466 uid:0 gid:1 size:16384
>>>       O_RDWR|O_CREAT|O_LARGEFILE FD_CLOEXEC
>>>       /var/cfengine/cf_Audit.db
>>> ....
>>> ........
>>> ...
>>>  758: S_IFSOCK mode:0666 dev:293,0 ino:2553 uid:0 gid:0 size:0
>>>       O_RDWR
>>>         SOCK_STREAM
>>>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(232.193.0.0)
>>>         sockname: AF_INET 172.17.130.245 port: 41914
>>>         peername: AF_INET 172.17.137.174 port: 5308
>>>  759: S_IFSOCK mode:0666 dev:293,0 ino:17601 uid:0 gid:0 size:0
>>>       O_RDWR
>>>         SOCK_STREAM
>>>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(232.193.0.0)
>>>         sockname: AF_INET 172.17.130.245 port: 41915
>>>         peername: AF_INET 172.17.137.73 port: 5308
>>>  760: S_IFSOCK mode:0666 dev:293,0 ino:44388 uid:0 gid:0 size:0
>>>       O_RDWR
>>>         SOCK_STREAM
>>>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(232.193.0.0)
>>>         sockname: AF_INET 172.17.130.245 port: 41916
>>>         peername: AF_INET 172.17.138.159 port: 5308
>>>
>>> Anyways, is there a way to instruct cf-agent to close the socket when the
>>> copy_from is complete, or does anyone else have a better approach for what I
>>> am trying to accomplish?
>>>
>>> Thanks
>>> Mike
>>> _______________________________________________
>>> Help-cfengine mailing list
>>> Help-cfengine@cfengine.org
>>> https://cfengine.org/mailman/listinfo/help-cfengine
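
The workaround raised in the thread, splitting the client list into 4 or 5 separate files so that each cf-agent run only contacts one batch, might be sketched as below. This is only an illustration under assumptions, not part of the policies posted above: it assumes active_clients.txt holds one host name per line, and the function names and the ".N" batch-file suffix are made up for the example. With 900 clients and 5 batches, each run would touch 180 hosts, well under the 2048 descriptor limit, since the cached sockets go away when each cf-agent process exits.

```python
from typing import List


def split_into_batches(hosts: List[str], batch_count: int) -> List[List[str]]:
    """Distribute hosts round-robin across batch_count lists."""
    if batch_count < 1:
        raise ValueError("batch_count must be at least 1")
    batches: List[List[str]] = [[] for _ in range(batch_count)]
    for index, host in enumerate(hosts):
        batches[index % batch_count].append(host)
    return batches


def write_batches(hosts: List[str], batch_count: int, prefix: str) -> List[str]:
    """Write each batch to '<prefix>.<n>' (hypothetical naming) and return the paths."""
    paths = []
    for number, batch in enumerate(split_into_batches(hosts, batch_count)):
        path = f"{prefix}.{number}"
        with open(path, "w") as handle:
            handle.writelines(host + "\n" for host in batch)
        paths.append(path)
    return paths


def read_hosts(path: str) -> List[str]:
    """Read one host per line, skipping blank lines and '#' comments (assumed format)."""
    with open(path) as handle:
        lines = (line.strip() for line in handle)
        return [line for line in lines if line and not line.startswith("#")]
```

Each resulting file could then be read by its own copy of the policy, selected by a time-based class, as suggested in the thread.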