Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Sean Carolan
>> *sigh*
>> awk is not "cut". What you want is
>> awk '{if (/[-\.0-9a-z][-\.0-9a-z]*.com/) { print $9;}}' | sort -u
I ended up using this construct in my code; this one fetches out servers that are having issues checking in with puppet:
awk '{if (/Could not find default node or by name with/) {

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread m . roth
Woodchuck wrote:
> On Thu, Jun 28, 2012 at 01:30:33PM -0500, Sean Carolan wrote:
>> This snippet of code pulls an array of hostnames from some log files.
>> It has to parse around 3GB of log files, so I'm keen on making it as
>> efficient as possible. Can you think of any way to optimize this to
>

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Woodchuck
On Thu, Jun 28, 2012 at 01:30:33PM -0500, Sean Carolan wrote:
> This snippet of code pulls an array of hostnames from some log files.
> It has to parse around 3GB of log files, so I'm keen on making it as
> efficient as possible. Can you think of any way to optimize this to
> run faster?
If the k

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Sean Carolan
> *sigh*
> awk is not "cut". What you want is
> awk '{if (/[-\.0-9a-z][-\.0-9a-z]*.com/) { print $9;}}' | sort -u
>
> No grep needed; awk looks for what you want *first* this way.
Thanks, Mark. This is cleaner code but it benchmarked slower than awk then grep.
real 3m35.550s
user 2m7.186s
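The comparison described in this message can be sketched roughly as follows. This is only an illustration: the exact "awk then grep" command is not shown in the snippet, so `variant_awk_then_grep` is a guess at its shape, the function names and the sample log are hypothetical, and the dot before `com` is escaped here although the quoted pattern leaves it unescaped.

```shell
# Hypothetical sample log with the hostname in field 9 (made-up data).
sample=$(mktemp)
printf '%s\n' \
  'f1 f2 f3 f4 f5 f6 f7 f8 web1.example.com rest' \
  'f1 f2 f3 f4 f5 f6 f7 f8 not-a-host rest' \
  'f1 f2 f3 f4 f5 f6 f7 f8 web2.example.com rest' \
  'f1 f2 f3 f4 f5 f6 f7 f8 web1.example.com rest' > "$sample"

# Variant A: awk alone matches the line and prints field 9.
variant_awk_only() {
    awk '/[-.0-9a-z][-.0-9a-z]*\.com/ { print $9 }' "$1" | sort -u
}

# Variant B (assumed shape): awk prints field 9, grep filters it afterwards.
variant_awk_then_grep() {
    awk '{ print $9 }' "$1" | grep -E '[-.0-9a-z][-.0-9a-z]*\.com' | sort -u
}

out_a=$(variant_awk_only "$sample")
out_b=$(variant_awk_then_grep "$sample")
printf '%s\n' "$out_a"
rm -f "$sample"
```

Note that variant A tests the whole line while variant B tests only field 9, so the two can differ on real logs. Checking that both give the same output on your own data, and then wrapping each call in `time`, is the only reliable way to tell which is faster there.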

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread m . roth
Sean Carolan wrote:
> Thank you Mark and Gordon. Since the hostnames I needed to collect
> are in the same field, at least in the lines of the file that are
> important. I ended up using suggestions from both of you, the code is
> like this now. The egrep is there to make sure whatever is in the

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Sean Carolan
Thank you Mark and Gordon. Since the hostnames I needed to collect are in the same field, at least in the lines of the file that are important, I ended up using suggestions from both of you; the code is like this now. The egrep is there to make sure whatever is in the 9th field looks like a doma

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Gordon Messmer
On 06/28/2012 12:15 PM, Gordon Messmer wrote:
> You have two major performance problems in this script. First, UTF-8
> processing is slow. Second, wildcards are EXTREMELY SLOW!
Naturally, you should test both on your own data. I'm amused to admit that I tested my own advice against my mail log

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread Gordon Messmer
On 06/28/2012 11:30 AM, Sean Carolan wrote:
> Can you think of any way to optimize this to run faster?
>
> HOSTS=()
> for host in $(grep -h -o "[-\.0-9a-z][-\.0-9a-z]*.com" ${TMPDIR}/* |
> sort | uniq); do
> HOSTS+=("$host")
> done
You have two major performance problems in this script. Firs
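The message is cut off before the suggested fix, so what follows is an assumption rather than Gordon's actual code. One common way to act on the UTF-8 point is to run grep and sort in the C locale, and `sort -u` can stand in for `sort | uniq`. The helper name `collect_hosts` and the sample files are hypothetical, and the dot before `com` is escaped here although the quoted pattern leaves it unescaped.

```shell
# Hypothetical helper: print the unique hostnames found in the files of a directory.
# LC_ALL=C makes grep and sort work on plain bytes instead of UTF-8 characters.
collect_hosts() {
    LC_ALL=C grep -h -o '[-.0-9a-z][-.0-9a-z]*\.com' "$1"/* | LC_ALL=C sort -u
}

# Small made-up data set to show the behaviour.
demo_dir=$(mktemp -d)
printf 'a web1.example.com b\nfoo web2.example.com\n' > "$demo_dir/one.log"
printf 'x web1.example.com y\n' > "$demo_dir/two.log"

hosts=$(collect_hosts "$demo_dir")
printf '%s\n' "$hosts"
rm -rf "$demo_dir"
```

In bash 4 or later the result could be read straight into the array with `mapfile -t HOSTS < <(collect_hosts "$TMPDIR")`, which avoids the `for` loop entirely. Whether this is faster on a given 3GB data set still needs to be measured.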

Re: [CentOS] Optimizing grep, sort, uniq for speed

2012-06-28 Thread m . roth
Sean Carolan wrote:
> This snippet of code pulls an array of hostnames from some log files.
> It has to parse around 3GB of log files, so I'm keen on making it as
> efficient as possible. Can you think of any way to optimize this to
> run faster?
>
> HOSTS=()
> for host in $(grep -h -o "[-\.0-9a-z