Mike,

It sounds like what you need is the commercial version. Sorry to be a
"sales guy" on the community list, but I point this out only as an FYI:
this is the kind of problem the commercial version was designed to address.

Mark

On 01/19/2011 08:10 PM, Mike Svoboda wrote:
> The feature request sounds good to me!  But to solve the problem I am
> working on right now....
> 
> The only workaround I can think of would be to split my "client list" up into 4
> or 5 separate files, and then execute the copy_from against one file at a
> time, selected by a time-based class?
> 
> Or does anyone else have a more elegant solution for grabbing data from
> hundreds of clients?  This seems to be a scalability problem someone must
> have hit before.
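
A minimal sketch of the split-the-list workaround described above, written in
Python rather than Cfengine policy. It assumes one hostname per line in the
client list; the function names and the active_clients.N.txt file naming are
hypothetical. Each batch file would then be processed by a separate cf-agent
run, so the sockets cached for one batch are released when that process exits.

```python
from pathlib import Path


def split_client_list(hosts, batches):
    """Distribute hosts round-robin into `batches` lists of near-equal size."""
    if batches < 1:
        raise ValueError("batches must be at least 1")
    groups = [[] for _ in range(batches)]
    for index, host in enumerate(hosts):
        groups[index % batches].append(host)
    return groups


def write_batches(source, out_dir, batches):
    """Read one hostname per line from `source` and write one file per batch."""
    lines = Path(source).read_text().splitlines()
    # Drop '#' comments and blank lines before batching.
    stripped = (line.split("#", 1)[0].strip() for line in lines)
    hosts = [host for host in stripped if host]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, group in enumerate(split_client_list(hosts, batches)):
        # Hypothetical naming scheme: active_clients.0.txt, active_clients.1.txt, ...
        path = out / f"active_clients.{number}.txt"
        path.write_text("".join(f"{host}\n" for host in group))
        paths.append(path)
    return paths
```

With 900 clients and 5 batches, each run would touch 180 hosts, which stays
well under the 2048 descriptor limit mentioned further down the thread.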
> 
> 
> 
> 
> On 1/19/11 10:59 AM, "Seva Gluschenko" <seva.glusche...@gmail.com> wrote:
> 
>> Mike,
>>
>> cf-agent holds the socket open because of connection caching, which seems
>> quite reasonable for ordinary agent tasks. You've just put it into
>> a situation where caching does harm. In an ideal model,
>> cf-agent would detect the maximum number of file descriptors available
>> and enforce cache expiration upon reaching some high watermark. Since
>> such a model would take too much effort to implement, a more
>> straightforward solution is to file a feature request for some sort
>> of keepalive => "false"; option in body copy_from.
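
The "ideal model" above can be sketched roughly as follows. This is Python for
illustration only, not how cf-agent is implemented; ConnectionCache,
descriptor_high_watermark and the 0.8 fraction are hypothetical names and
values. The cache reads the soft descriptor limit and, once it holds that
many connections, closes the least recently used one before opening another.

```python
import resource
from collections import OrderedDict


def descriptor_high_watermark(fraction=0.8, fallback=1024):
    """Return a connection budget derived from the soft RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        soft = fallback  # no finite limit reported; assume a conservative one
    return max(1, int(soft * fraction))


class ConnectionCache:
    """Keep connections open for reuse, closing the least recently used
    ones once the cache reaches its high watermark."""

    def __init__(self, connect, high_watermark=None):
        self._connect = connect  # callable: host -> object with .close()
        self._limit = high_watermark or descriptor_high_watermark()
        self._open = OrderedDict()

    def get(self, host):
        if host in self._open:
            self._open.move_to_end(host)  # mark as most recently used
            return self._open[host]
        # Expire old entries before opening a new connection.
        while len(self._open) >= self._limit:
            _old_host, old_conn = self._open.popitem(last=False)
            old_conn.close()
        conn = self._connect(host)
        self._open[host] = conn
        return conn

    def close_all(self):
        while self._open:
            _host, conn = self._open.popitem()
            conn.close()

    def __len__(self):
        return len(self._open)
```

Caching still pays off for an agent that talks to a few servers repeatedly;
the watermark only matters in a fan-out case like the one in this thread.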
>>
>> 2011/1/19 Mike Svoboda <msvob...@linkedin.com>:
>>> I've enabled my Cfengine infrastructure to perform two-way data transfers.  My
>>> clients are configured to run cf-serverd, so my Master Policy Server can
>>> log in to pull some files off of each machine.  To accomplish this, I execute
>>> two policies.
>>>
>>> policy 1 extracts all the clients the MPS has seen from the lastseen
>>> database and dumps this info to a text file.
>>> policy 2 reads the text file, and instructs cf-agent on the MPS to loop
>>> through and pull down data from each "client machine" from
>>> /var/cfengine/outgoing.
>>>
>>> Here's policy 2, which executes on my MPS.
>>>
>>> bundle agent grab_client_cfreport_output
>>> {
>>> vars:
>>>         "host_array_size"
>>>                 int => readstringarray("host_array","/export/apps/cfengine-client-data/active_clients.txt","#[^\n]*","[\n]",999999999,9999999);
>>>         "real_machine_name"     slist   =>      getindices("host_array");
>>>
>>> files:
>>>         # This transfers all reporting data from the clients to the Master Policy Server
>>>         "/export/apps/cfengine-client-data/$(real_machine_name)"
>>>                 handle                          =>      "grab_client_data",
>>>                 copy_from                       =>
>>>      remote_copy("/var/cfengine/outgoing","$(real_machine_name)"),
>>>                 depth_search                    =>      recurse("inf"),
>>>                 action                          =>      immediate;
>>> }
>>> #########################################################
>>> body copy_from remote_copy(sourcedir,sourceserver)
>>> {
>>>         source          =>      "$(sourcedir)";
>>>         servers         =>      { "$(sourceserver)" };
>>>         copy_backup     =>      "false";
>>>         purge           =>      "false";
>>>         trustkey        =>      "true";
>>>         collapse_destination_dir        =>      "true";
>>>         encrypt         =>      "true";
>>> }
>>>
>>>
>>>
>>>
>>> When this executes, cf-agent holds open a socket for each client it connects
>>> to.  It doesn't close the socket when it moves on to the next machine.  The
>>> downside is that this master policy server has to reach out and
>>> grab data from 900 clients, which means I end up with a TON of open file
>>> descriptors with socket information.
>>>
>>> I've raised ulimit -n (open file descriptors) to 2048, but cf-agent doesn't
>>> seem very happy.
>>> # ulimit -a
>>> core file size        (blocks, -c) unlimited
>>> data seg size         (kbytes, -d) unlimited
>>> file size             (blocks, -f) unlimited
>>> open files                    (-n) 2048
>>> pipe size          (512 bytes, -p) 10
>>> stack size            (kbytes, -s) 10240
>>> cpu time             (seconds, -t) unlimited
>>> max user processes            (-u) 16357
>>> virtual memory        (kbytes, -v) unlimited
>>>
>>>
>>>
>>> Through about 500 client transfers, cf-agent is happy.    Then I start
>>> hitting these messages below.  The file transfer still succeeds, but it
>>> looks nasty.
>>>
>>>
>>>  -> Updated
>>> /export/apps/cfengine-client-data/ela4-cs44.prod/monitor_summary.html from
>>> source /var/cfengine/outgoing/reports/monitor_summary.html on ela4-cs44.prod
>>>  -> Updated
>>> /export/apps/cfengine-client-data/ela4-cs44.prod/performance.html from
>>> source /var/cfengine/outgoing/reports/performance.html on ela4-cs44.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-cs44.prod/lastseen.html
>>> from source /var/cfengine/outgoing/reports/lastseen.html on ela4-cs44.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-cs44.prod/classes.html
>>> from source /var/cfengine/outgoing/reports/classes.html on ela4-cs44.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-cs44.prod/class_notes
>>> from source /var/cfengine/outgoing/reports/class_notes on ela4-cs44.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-cs44.prod/audit.html from
>>> source /var/cfengine/outgoing/reports/audit.html on ela4-cs44.prod
>>>  -> Updated
>>> /export/apps/cfengine-client-data/ela4-be174.prod/monitor_summary.html from
>>> source /var/cfengine/outgoing/reports/monitor_summary.html on
>>> ela4-be174.prod
>>>  -> Updated
>>> /export/apps/cfengine-client-data/ela4-be174.prod/performance.html from
>>> source /var/cfengine/outgoing/reports/performance.html on ela4-be174.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-be174.prod/lastseen.html
>>> from source /var/cfengine/outgoing/reports/lastseen.html on ela4-be174.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-be174.prod/classes.html
>>> from source /var/cfengine/outgoing/reports/classes.html on ela4-be174.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-be174.prod/class_notes
>>> from source /var/cfengine/outgoing/reports/class_notes on ela4-be174.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-be174.prod/audit.html
>>> from source /var/cfengine/outgoing/reports/audit.html on ela4-be174.prod
>>>  -> Copying from
>>> ela4-be520.prod:/var/cfengine/outgoing/reports/monitor_summary.html
>>>  -> Copying from
>>> ela4-be520.prod:/var/cfengine/outgoing/reports/performance.html
>>>  -> Copying from
>>> ela4-be520.prod:/var/cfengine/outgoing/reports/lastseen.html
>>>  -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/hashes.html
>>>  -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/classes.html
>>>  -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/class_notes
>>>  -> Copying from ela4-be520.prod:/var/cfengine/outgoing/reports/audit.html
>>>  -> Copying from ela4-be520.prod:/var/cfengine/outgoing/cm.conf
>>> Couldn't find a public key (/var/cfengine/ppkeys/root-ela4-be298.prod.pub) -
>>> use cf-key to get one
>>>  !!! System error for fopen: "Too many open files"
>>>  -> Trusting server identity, promise to accept key from
>>> ela4-be298.prod=172.17.135.198
>>>  -> Updated
>>> /export/apps/cfengine-client-data/ela4-be298.prod/monitor_summary.html from
>>> source /var/cfengine/outgoing/reports/monitor_summary.html on
>>> ela4-be298.prod
>>>  -> Updated
>>> /export/apps/cfengine-client-data/ela4-be298.prod/performance.html from
>>> source /var/cfengine/outgoing/reports/performance.html on ela4-be298.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-be298.prod/lastseen.html
>>> from source /var/cfengine/outgoing/reports/lastseen.html on ela4-be298.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-be298.prod/classes.html
>>> from source /var/cfengine/outgoing/reports/classes.html on ela4-be298.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-be298.prod/audit.html
>>> from source /var/cfengine/outgoing/reports/audit.html on ela4-be298.prod
>>> Couldn't read file /var/cfengine/cfagent.ela4-41105-js01.prod.log for
>>> editing
>>>  !!! System reports error for fopen: "Too many open files"
>>> Couldn't find a public key (/var/cfengine/ppkeys/root-ela4-be420.prod.pub) -
>>> use cf-key to get one
>>>  !!! System error for fopen: "Too many open files"
>>>  -> Trusting server identity, promise to accept key from
>>> ela4-be420.prod=172.17.137.192
>>>  -> Updated
>>> /export/apps/cfengine-client-data/ela4-be420.prod/performance.html from
>>> source /var/cfengine/outgoing/reports/performance.html on ela4-be420.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-be420.prod/lastseen.html
>>> from source /var/cfengine/outgoing/reports/lastseen.html on ela4-be420.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-be420.prod/classes.html
>>> from source /var/cfengine/outgoing/reports/classes.html on ela4-be420.prod
>>>  -> Updated /export/apps/cfengine-client-data/ela4-be420.prod/audit.html
>>> from source /var/cfengine/outgoing/reports/audit.html on ela4-be420.prod
>>> Couldn't read file /var/cfengine/cfagent.ela4-41105-js01.prod.log for
>>> editing
>>>  !!! System reports error for fopen: "Too many open files"
>>> Couldn't find a public key (/var/cfengine/ppkeys/root-ela4-ss143.prod.pub) -
>>> use cf-key to get one
>>>
>>>
>>> Running pfiles on cf-agent, here are all the open sockets I see.  They
>>> aren't being released after each client transfer, so they pile up in
>>> cf-agent.
>>>
>>>
>>> $ pfiles 5610
>>> 5610:   /var/cfengine/bin/cf-agent -I -K
>>>   Current rlimit: 2048 file descriptors
>>>    0: S_IFCHR mode:0620 dev:286,0 ino:12582918 uid:3378 gid:7 rdev:24,1
>>>       O_RDWR|O_NOCTTY|O_LARGEFILE
>>>       /devices/pseudo/pts@0:1
>>>    1: S_IFCHR mode:0620 dev:286,0 ino:12582918 uid:3378 gid:7 rdev:24,1
>>>       O_RDWR|O_NOCTTY|O_LARGEFILE
>>>       /devices/pseudo/pts@0:1
>>>    2: S_IFCHR mode:0620 dev:286,0 ino:12582918 uid:3378 gid:7 rdev:24,1
>>>       O_RDWR|O_NOCTTY|O_LARGEFILE
>>>       /devices/pseudo/pts@0:1
>>>    3: S_IFDOOR mode:0444 dev:295,0 ino:56 uid:0 gid:0 size:0
>>>       O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[215]
>>>       /var/run/name_service_door
>>>    4: S_IFREG mode:0644 dev:30,131 ino:8466 uid:0 gid:1 size:16384
>>>       O_RDWR|O_CREAT|O_LARGEFILE FD_CLOEXEC
>>>       /var/cfengine/cf_Audit.db
>>> ....
>>> ........
>>> ...
>>>  758: S_IFSOCK mode:0666 dev:293,0 ino:2553 uid:0 gid:0 size:0
>>>       O_RDWR
>>>         SOCK_STREAM
>>>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(232.193.0.0)
>>>         sockname: AF_INET 172.17.130.245  port: 41914
>>>         peername: AF_INET 172.17.137.174  port: 5308
>>>  759: S_IFSOCK mode:0666 dev:293,0 ino:17601 uid:0 gid:0 size:0
>>>       O_RDWR
>>>         SOCK_STREAM
>>>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(232.193.0.0)
>>>         sockname: AF_INET 172.17.130.245  port: 41915
>>>         peername: AF_INET 172.17.137.73  port: 5308
>>>  760: S_IFSOCK mode:0666 dev:293,0 ino:44388 uid:0 gid:0 size:0
>>>       O_RDWR
>>>         SOCK_STREAM
>>>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(232.193.0.0)
>>>         sockname: AF_INET 172.17.130.245  port: 41916
>>>         peername: AF_INET 172.17.138.159  port: 5308
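
To put a number on the pile-up, listings like the one above can be tallied per
peer. The sketch below is Python and assumes the "peername: AF_INET <address>
port: <port>" layout shown in this output; count_peers is a hypothetical
helper, and 5308 is simply the peer port that appears in the listing.

```python
import re
from collections import Counter

# Matches lines such as: "peername: AF_INET 172.17.137.174  port: 5308"
_PEER = re.compile(r"peername:\s+AF_INET\s+(\S+)\s+port:\s+(\d+)")


def count_peers(pfiles_output, port=5308):
    """Count open sockets per peer address for one remote port."""
    peers = Counter()
    for match in _PEER.finditer(pfiles_output):
        address, peer_port = match.group(1), int(match.group(2))
        if peer_port == port:
            peers[address] += 1
    return peers
```

Summing the counts gives the number of client connections still held open;
one entry per client that has already been processed points at the cache
rather than at the transfers themselves.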
>>>
>>>
>>>
>>> Anyways, is there a way to instruct cf-agent to close the socket when the
>>> copy_from is complete, or does anyone else have a better approach for what I
>>> am trying to accomplish?
>>>
>>> Thanks
>>> Mike
>>> _______________________________________________
>>> Help-cfengine mailing list
>>> Help-cfengine@cfengine.org
>>> https://cfengine.org/mailman/listinfo/help-cfengine
>>>
>>>
>>
>>
> 