On Sun, 2008-11-30 at 02:51 +0200, Canol Gökel wrote:
> How can one write an expression to match always the most inner part? I
> couldn't write an expression like "match a non-greedy <p>.*</p> which
> does not have a <p> inside.
>
You can't write a regular expression to do this. And no, I'm not going to write an entire second-year university course in an email explaining why you can't. You will just have to take my word for it.

The data structure you're describing has unbounded nested contexts. This means that you can put a structure like <p>...</p> inside itself, and you can do this an unlimited number of times. The only way to correctly parse such structures is to use a finite-state automaton (FSA) with a push-down stack: the stack keeps track of which enclosing structures are still open, which a plain FSA, having only a fixed number of states, cannot do.

If you want to parse HTML, I suggest you use a module like HTML::TreeBuilder. If you want to parse XML, you should consider XML::Parser at the least. Nowadays, modules like XML::Twig, XML::DOM and XML::SAX are preferred, but they are built on top of XML::Parser, so you'll need to install it too.

If your data does not have a common definition, you could use modules like Parse::RecDescent to simplify the creation of the FSA.

But since you don't plan to use Perl, I suggest that you ask the experts in the language of your choice. They will be able to suggest the best way to solve your problem, along with the modules and libraries to help.

--
Just my 0.00000002 million dollars worth,
Shawn

The key to success is being too stupid to realize you can fail.
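
A minimal sketch of the push-down stack idea, written in Python purely for illustration (the thread does not say which language the poster uses). The function name innermost_paragraphs and the TAG pattern are hypothetical, not part of any library. A regular expression is used only to find individual tags; the nesting itself is tracked with an explicit stack. It assumes simple, well-behaved <p> tags, and real HTML should still go through a proper parser.

```python
import re

# Hypothetical tokenizer: matches an opening or closing <p> tag, nothing else.
TAG = re.compile(r"<(/?)p\b[^>]*>", re.IGNORECASE)

def innermost_paragraphs(text):
    """Return the contents of every <p>...</p> pair that has no <p> inside it."""
    stack = []    # one [content_start, has_child] entry per currently open <p>
    result = []
    for m in TAG.finditer(text):
        if not m.group(1):
            # Opening tag: mark the enclosing <p> (if any) as having a child,
            # then push a new entry for this one.
            if stack:
                stack[-1][1] = True
            stack.append([m.end(), False])
        elif stack:
            # Closing tag: it belongs to the most recently opened <p>.
            start, has_child = stack.pop()
            if not has_child:
                result.append(text[start:m.start()])
        # A stray </p> with nothing open is ignored.
    return result
```

For example, innermost_paragraphs("<p>outer <p>inner</p> text</p>") returns ['inner']. The stack depth grows with the nesting depth of the input, which is exactly the memory a fixed-size FSA lacks.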