Re: python3 regex?

2016-09-10 Thread Doug OLeary
Hey, all;

Thanks for the replies - reading data in one slurp vs line by line was the 
issue.  In my perl programs, when reading files, I generally do it all in one 
swell foop and will probably end up doing so again in this case due to the 
layout of the text; but, that's my issue.

Thanks again.  I appreciate the tip.

Doug O'Leary
-- 
https://mail.python.org/mailman/listinfo/python-list


iterating over multi-line string

2016-09-11 Thread Doug OLeary
Hey;

I have a multi-line string that's the result of reading a file filled with 
'dirty' text.  I read the file in one swoop to make data cleanup a bit easier - 
getting rid of extraneous tabs, spaces, newlines, etc.  That part's done.

Now, I want to collect data in each section of the data.  Sections are started 
with a specific header and end when the next header is found.

^1\. Upgrade to the latest version of Apache HTTPD
^2\. Disable insecure TLS/SSL protocol support
^3\. Disable SSLv2, SSLv3, and TLS 1.0. The best solution is to only have TLS 
1.2 enabled
^4\. Disable HTTP TRACE Method for Apache
[[snip]]
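
A minimal sketch of the "section runs from one header to the next" idea, assuming every header starts with a number, a dot, and a space at the beginning of a line. The function name split_sections and the sample text are made up for illustration; they are not from the original script.

```python
import re

# Assumption: every section header looks like "<number>. " at the start of a line.
HEADER = re.compile(r'^\d+\. ')

def split_sections(data):
    """Return (header, body_lines) pairs; text before the first header is dropped."""
    sections = []
    for line in data.splitlines():
        if HEADER.match(line):
            sections.append((line, []))      # a new section starts here
        elif sections:
            sections[-1][1].append(line)     # line belongs to the current section
    return sections

sample = """junk before the first header
1. Upgrade to the latest version of Apache HTTPD
details for one
2. Disable insecure TLS/SSL protocol support
details for two
more details for two"""

for header, body in split_sections(sample):
    print(header, len(body))
```

Because lines are only collected once a header has been seen, the worthless text ahead of the first header needs no separate skip loop.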

There's something like 60 lines of worthless text before that first header line 
so I thought I'd skip through them with:

x=0  # Current index
hx=1 # human readable index
rgs = '^' + str(hx) + r'\. ' + monster['vulns'][x]
hdr = re.compile(rgs)
for l in data.splitlines():
  while not hdr.match(l):
    next(l)
  print(l)

which resulted in a TypeError stating that str is not an iterator.  More 
googling resulted in:

iterobj = iter(data.splitlines())

for l in iterobj:
  while not hdr.match(l):
    next(iterobj)
  print(l)

I'm hoping to see that first header; however, I'm getting another error:

Traceback (most recent call last):
  File "./testies.py", line 30, in <module>
    next(iterobj)
StopIteration

I'm not quite sure what that means... Does that mean I got to the end of data 
w/o finding my header?
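
For what it's worth, StopIteration is what next() raises once an iterator has nothing left to give. In the loop above, l is never reassigned inside the while, so if the current line is not the header the test stays true and next(iterobj) keeps consuming lines until the data runs out. A minimal sketch of one way to skip ahead, using itertools.dropwhile instead of the hand-rolled loop; the values of hdr and data here are made-up stand-ins:

```python
import re
from itertools import dropwhile

# Made-up stand-ins for the hdr pattern and data string in the post.
hdr = re.compile(r'^1\. Upgrade')
data = "noise\nmore noise\n1. Upgrade to the latest version of Apache HTTPD\nbody line"

# dropwhile discards lines while the predicate is true, then yields
# the first non-matching line (the header) and everything after it.
lines = dropwhile(lambda l: not hdr.match(l), data.splitlines())

# Giving next() a default avoids StopIteration when no header exists.
first = next(lines, None)
print(first)
```

After this, iterating over lines continues with the text following the header.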

Thanks for any hints/tips/suggestions.

Doug O'Leary


Re: iterating over multi-line string

2016-09-11 Thread Doug OLeary
Hey;

Never mind; I finally found the meaning of StopIteration.  I guess my 
google-fu is a bit weak this morning.

Thanks

Doug


more python3 regex?

2016-09-11 Thread Doug OLeary
Hey

This one seems like it should be easy but I'm not getting the expected results.

I have a chunk of data over which I can iterate line by line and print out the 
expected results:

  for l in q.findall(data):
#   if re.match(r'(Name|")', l):
#     continue
    print(l)

$ ./testies.py | wc -l
197

I would like to skip any line that starts with 'Name' or a double quote:

$ ./testies.py | perl -ne 'print if (m{^Name} || m{^"})'
  
Name IP Address,Site,
"",,7 of 64
Name,IP Address,Site,
"",,,8 of 64
Name,IP Address,Site,
"",,,9 of 64
Name,IP Address,Site,
"",,,10 of 64
Name,IP Address,Site,
"",,,11 of 64
Name IP Address,Site,

$ ./testies.py | perl -ne 'print unless (m{^Name} || m{^"})' | wc -l
186


When I run with the two lines uncommented, *everything* gets skipped:

$ ./testies.py  
$

Same thing when I use a pre-defined pattern object:

  skippers = re.compile(r'Name|"')
  for l in q.findall(data):
    if skippers.match(l):
      continue
    print(l)

Like I said, this seems like it should be pretty straightforward so I'm 
obviously missing something basic.  

Any hints/tips/suggestions gratefully accepted.

Doug O'Leary


Re: more python3 regex?

2016-09-11 Thread Doug OLeary
Hey, all;

The print suggestion was the key clue.  Turned out my loop was slurping the 
whole of data in one big line.  Searching for a line that begins with Name when 
it's in the middle of the string is... obviously not going to work so well.
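
A small sketch of that behaviour: match() only tests the very start of whatever string it is handed, so the same pattern behaves differently on the whole blob than on individual lines. The sample data is made up to resemble the report output.

```python
import re

# Made-up sample resembling the report output in the earlier post.
data = ('host1,10.0.0.1,siteA,\n'
        'Name,IP Address,Site,\n'
        '"",,,8 of 64\n'
        'host2,10.0.0.2,siteB,')

skippers = re.compile(r'Name|"')

# On the whole blob, match() looks at "host1..." and nothing else,
# so the "Name" line in the middle is never seen.
print(skippers.match(data))

# Applied per line, the same pattern filters as intended.
kept = [l for l in data.splitlines() if not skippers.match(l)]
print(kept)
```

The reverse case explains the empty output in the question: when the one big string itself starts with Name or a double quote, the single match skips everything at once.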

Took me a bit to get that working and, once I did, I realized I was on the 
wrong track altogether.  In perl, if possible, I will read a file entirely as 
manipulation of one large data structure is easier in some ways.  

Even with perl, though, that approach is the wrong one for this data.  While I 
learned lots, the key lesson is forcing data to match an algorithm works as 
well in python as it does in perl.  Go figure.

My 200+ line script that didn't work so well is now 63 lines, including 
comments... and works perfectly.  

Outstanding!  Thanks for putting up with noob questions.

Doug


xml parsing with lxml

2016-10-07 Thread Doug OLeary
Hey;

I'm trying to gather information from a number of weblogic configuration xml 
files using lxml.  I've found any number of tutorials on the web but they all 
seem to assume a knowledge that I apparently don't have... that, or I'm just 
being rock stupid today - that's a distinct possibility too.

The xml looks like:



  Domain1
  10.3.5.0
  
[[snipp]]

[[realm children snipped]

myrealm
  
  [[snip]]

[[snip]]

   [[snip]]


  [[snip]]
   [[snip]]
  byTime
  14
  02:00
  Info

[[snip]]
40024
true
snip]]

  [[children snipped]]

${hostname}
${hostname}
40022
javac

   [[children snipped]

   [[rest snipped]
  


The tutorials all start out well enough with:

$ python 
Python 3.5.2 (default, Aug 22 2016, 09:04:07) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> doc = etree.parse('config.xml')

Now what?  For instance, how do I list the top-level children of the root 
element?  In that partial list, it'd be name, domain-version, 
security-configuration, log, and server.  

For some reason, I'm not able to make the conceptual leap to get to the first 
step of those tutorials.

The end goal of this exercise is to programmatically identify weblogic clusters 
and their hosts.  

thanks

Doug O'Leary


Re: xml parsing with lxml

2016-10-07 Thread Doug OLeary
On Friday, October 7, 2016 at 3:21:43 PM UTC-5, John Gordon wrote:
> root = doc.getroot()
> for child in root:
>     print(child.tag)
> 

Excellent!  Thank you, sir!  That'll get me started.  

Appreciate the reply.
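
A self-contained sketch of the same idea, run against a made-up miniature config. It uses the standard library's xml.etree.ElementTree rather than lxml so it runs anywhere; getroot(), fromstring(), and iterating over an element are spelled the same way in lxml.etree. The namespace URI is a placeholder, not the real WebLogic one, and local_name is a hypothetical helper.

```python
import xml.etree.ElementTree as etree  # swapped in for lxml.etree in this sketch

# Made-up miniature config; the namespace URI is a placeholder.
xml_text = """<domain xmlns="urn:example:domain">
  <name>Domain1</name>
  <domain-version>10.3.5.0</domain-version>
  <server><name>ms1</name></server>
</domain>"""

root = etree.fromstring(xml_text)

def local_name(tag):
    """Strip the '{namespace}' prefix that etree puts in front of tag names."""
    return tag.rsplit('}', 1)[-1]

# Iterating over an element yields its direct children only.
children = [local_name(child.tag) for child in root]
print(children)
```

Without the helper, each tag prints with the namespace in braces in front of it, which is why the raw output of child.tag looks noisier than the element names in the file.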

Doug O'Leary


lxml and xpath(?)

2016-10-24 Thread Doug OLeary
Hey;

Reasonably new to python and incredibly new to xml much less trying to parse 
it. I need to identify cluster nodes from a series of weblogic xml 
configuration files. I've figured out how to get 75% of them; now, I'm going 
after the edge case and I'm unsure how to proceed.

Weblogic xml config files start with namespace definitions then a number of 
child elements some of which have children of their own.

The element that I'm interested in is <server> which will usually have a 
subelement called <listen-address> containing the hostname that I'm looking for.

Following the paradigm of "we love standards, we got lots of them", this model 
doesn't work everywhere. Where it doesn't work, I need to look for a subelement 
of <server> called <machine>. That element contains an alias which is expanded 
in a different root child, at the same level as <server>.

So, picture worth a 1000 words:


< [[ heinous namespace xml snipped ]] >
   [[text]]
   ...
   
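   <server>
      <name>EDIServices_MS1</name>
      ...
      <machine>EDIServices_MC1</machine>
      ...
   </server>
   <server>
      <name>EDIServices_MS2</name>
      ...
      <machine>EDIServices_MC2</machine>
      ...
   </server>
   <machine>
     <name>EDIServices_MC1</name>
     <node-manager>
       <name>EDIServices_MC1</name>
       <nm-type>SSL</nm-type>
       <listen-address>host001</listen-address>
       <listen-port>7001</listen-port>
     </node-manager>
   </machine>
   <machine>
     <name>EDIServices_MC2</name>
     <node-manager>
       <name>EDIServices_MC2</name>
       <listen-address>host002</listen-address>
       <listen-port>7001</listen-port>
     </node-manager>
   </machine>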


So, running it on 'normal' config, I get:

$ ./lxml configs/EntsvcSoa_Domain_config.xml  
EntsvcSoa_CS    => host003.myco.com
EntsvcSoa_CS   => host004.myco.com

Running it against the abi-normal config, I'm currently getting:

$ ./lxml configs/EDIServices_Domain_config.xml
EDIServices_CS => EDIServices_MC1
EDIServices_CS => EDIServices_MC2

Using the examples above, I would like to translate EDIServices_MC1 and 
EDIServices_MC2 to host001 and host002 respectively.

The primary loop is:

for server in root.findall('ns:server', namespaces):
  cs = server.find('ns:cluster', namespaces)
  if cs is None:
    continue
  # cluster_name = server.find('ns:cluster', namespaces).text
  cluster_name = cs.text
  listen_address = server.find('ns:listen-address', namespaces)
  server_name = listen_address.text
  if server_name is None:
    machine = server.find('ns:machine', namespaces)
    if machine is None:
      continue
    else:
      server_name = machine.text

  print("%-15s => %s" % (cluster_name, server_name))

(it's taken me days to write 12 lines of code... good thing I don't do this for 
a living :) )

Rephrased, I need to find the <listen-address> under the <machine> child whose 
name matches the name under the corresponding <server> child. From some of the 
examples on the web, I believe xpath might help but I've not been able to get 
even the simple examples working. Go figure, I just figured out what a 
namespace is...
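
One possible approach that avoids xpath predicates altogether: build a lookup table from the top-level machine elements first, then resolve each server's alias through it. This is only a sketch under assumptions. The XML is a made-up miniature, the namespace URI is a placeholder, the node-manager wrapper is assumed, and the standard library's xml.etree.ElementTree stands in for lxml.etree (the find/findall/findtext calls used here are spelled the same in both).

```python
import xml.etree.ElementTree as etree  # stand-in for lxml.etree in this sketch

# Made-up config: namespace URI and node-manager wrapper are assumptions.
xml_text = """<domain xmlns="urn:example:domain">
  <server>
    <name>EDIServices_MS1</name>
    <cluster>EDIServices_CS</cluster>
    <machine>EDIServices_MC1</machine>
  </server>
  <server>
    <name>EntsvcSoa_MS1</name>
    <cluster>EntsvcSoa_CS</cluster>
    <listen-address>host003.myco.com</listen-address>
  </server>
  <machine>
    <name>EDIServices_MC1</name>
    <node-manager>
      <name>EDIServices_MC1</name>
      <listen-address>host001</listen-address>
    </node-manager>
  </machine>
</domain>"""

namespaces = {'ns': 'urn:example:domain'}
root = etree.fromstring(xml_text)

# Map each top-level machine alias to the listen-address found anywhere below it.
hosts = {}
for machine in root.findall('ns:machine', namespaces):
    alias = machine.findtext('ns:name', namespaces=namespaces)
    hosts[alias] = machine.findtext('.//ns:listen-address', namespaces=namespaces)

results = []
for server in root.findall('ns:server', namespaces):
    cluster_name = server.findtext('ns:cluster', namespaces=namespaces)
    if cluster_name is None:
        continue
    # findtext returns None instead of raising when the element is missing.
    server_name = server.findtext('ns:listen-address', namespaces=namespaces)
    if not server_name:
        alias = server.findtext('ns:machine', namespaces=namespaces)
        server_name = hosts.get(alias, alias)  # fall back to the alias itself
    if server_name:
        results.append((cluster_name, server_name))
        print("%-15s => %s" % (cluster_name, server_name))
```

Using findtext also sidesteps a trap in the loop above: find() returns None when a server has no listen-address element at all, and calling .text on that raises AttributeError before the machine fallback is ever reached.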

Any hints/tips/suggestions greatly appreciated especially with complete noob 
tutorials for xpath.

Thanks for your time.

Doug O'Leary