Hi, list... I have written an application in Perl some time ago (I was young and needed the money) that parses multiple large text files containing nested data structures and allows the user to run quick queries on the data. (For the firewall admins among you: it's a parser and web-based query tool for CheckPoint firewall rulebases. The user can search for source and destination IPs and get the matching rules.)
The current application consists of two parts: (1) An importing process that reads and parses the large text files and writes the data in different PostgreSQL tables. (2) A web (CGI) interface that allows the user to query the collected data from the PostgreSQL database by different criteria. (I don't like PostgreSQL much due to the lack of decent tools like phpmyadmin. Pgadmin3 and Phppgadmin don't give me the feeling that I control the database. More the other way round. But PostgreSQL has a nice 'inet' data type that allows for quick matches in tables of IP addresses and networks.) However the information in the (relational) database was stored in a horribily artificial way. The SQL query is a 20-line monster with UNIONs and LEFT JOINs and negations. It's lightning fast (0.5 seconds to search over a 500 set consisting of complex rules) but neither the source code nor the database is easy to handle any more. And I'd like to have more flexibility in the kind of queries I run. So I'd like to trade the good speed by some readability and a simpler - more object-oriented - data structure. I'm currently thinking of different ways to handle that but would like to get some opinions about that: (a) See what sqlalchemy can do for me to handle the object-relational transformation and basically stay with PostgreSQL. (b) Parse the input files into one large nested Python data structure. Then write this structure to a file using "marshal" or "repr". Then I have a very clean source code like for rule in rules: for src in sources: if searched_for_src == src... (c) ...? What makes PostgreSQL less suited is the fact that CheckPoint rule bases can contain several complex objects: - Hosts (easy, they are just one IP address and can easily be compared) - Networks (nearly as easy - just see if the IP is part of the network) - Groups (slightly harder; can even be nested and contain other groups and hosts or networks) - IP ranges (10.0.0.50-10.5.25.100; not easy to parse either) I would even like to allow the users more complex queries like multiple search conditions. The query would be something like "show me all matching firewall rules where 10.0.0.5 matches the source column and 192.168.42.1 matches the destination column OR any rule where the group 'internal hosts' is mentioned in the destination column". It sounds like a database is the right job. But somehow a database is also not flexible enough. And the data is small enough (1 MB probably) that it can be read into memory. What do you think would be the right tool for the job? Thanks for sharing your thoughts. Christoph -- http://mail.python.org/mailman/listinfo/python-list