Nov 11, 2014

Assuming $_ contains HTML, which of the following substitutions will remove all tags in it?

1. s/<.*>//g;
2. s/<.*?>//gs;
3. s/<\/?[A-Z]\w*(?:\s+[A-Z]\w*(?:\s*=\s*(?:(["']).*?\1|[\w-.]+))?)*\s*>//gsix;

You can't do that.
If it weren't for HTML comments, improperly formatted HTML, and tags with interesting data like <SCRIPT>, you could do this. Alas, you cannot. It takes a lot more smarts and, quite frankly, a real parser.
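To make the failure modes concrete, here is a minimal sketch that runs each substitution on a small input that trips it up. The helper names (strip1, strip2, strip3) and the sample strings are made up for illustration. In strip3 the character class is written as [\w.-] rather than [\w-.]; this is assumed to be an equivalent spelling that avoids a "False [] range" warning under `use warnings`.

```perl
use strict;
use warnings;

# Hypothetical helpers: each one applies one of the quiz substitutions
# to a copy of its argument and returns the result.
sub strip1 { my ($html) = @_; $html =~ s/<.*>//g;   return $html; }
sub strip2 { my ($html) = @_; $html =~ s/<.*?>//gs; return $html; }
sub strip3 {
    my ($html) = @_;
    # Same pattern as option 3, with [\w-.] respelled as [\w.-].
    $html =~ s/<\/?[A-Z]\w*(?:\s+[A-Z]\w*(?:\s*=\s*(?:(["']).*?\1|[\w.-]+))?)*\s*>//gsix;
    return $html;
}

# 1. Greedy .* runs from the first "<" to the last ">" on the line,
#    so the text between the tags disappears too.
print "[", strip1('<b>bold</b> and <i>italic</i>'), "]\n";   # prints "[]"

# 2. Non-greedy .*? stops at the first ">", even one inside a quoted
#    attribute value, and leaves half a tag behind.
print "[", strip2('<img alt="a > b"> caption'), "]\n";       # prints "[ b"> caption]"

# 3. The long pattern only matches things shaped like element tags,
#    so a comment is left untouched...
print "[", strip3('<!-- note --> text'), "]\n";              # prints "[<!-- note --> text]"

# ...and the body of a script element survives as if it were page text.
print "[", strip3('<script>if (a < b) go()</script>'), "]\n"; # prints "[if (a < b) go()]"
```

None of these inputs is exotic, which is the point: real pages contain all of them. A parser module from CPAN, HTML::Parser for example, is the usual way to get this right.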
