sloppycode.net
Link extraction/parsing
Extract links from the HREF attribute.
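This simple regular-expression-based Perl CGI script shows how to extract links from anchor tags. It reads an HTML file, collects every double-quoted href value (and src value), and prints the href values as an HTML page. Because the patterns only match double-quoted values, attributes written as href='...' or as unquoted href=... are not picked up.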
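#!/usr/local/bin/perl
use strict;
use warnings;
use CGI qw(:standard);

print header();

# HTML file to scan for links
my $file = "this.html";
print "file=$file<br>\n";

# Read the whole file into a single string
open(my $in, '<', $file) or die "Cannot open $file: $!";
my @lines = <$in>;
close($in);
my $text = join "", @lines;

# Capture every double-quoted src="..." and href="..." value (case-insensitive)
my @srcs  = ($text =~ m|src\s*=\s*"([^"]+)"|ig);
my @hrefs = ($text =~ m|href\s*=\s*"([^"]+)"|ig);

print "<P>list of href values<BR>\n";
my $count = 1;
foreach my $href (@hrefs) {
    print "$count $href<BR>\n";
    $count++;
}

#print "<P>list of src values<BR>\n";
#foreach my $src (@srcs) {
#    print "$count src=$src<BR>\n";
#    $count++;
#}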
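The same regular-expression approach can be wrapped in a reusable subroutine that also catches the single-quoted and unquoted attribute values the script misses. This is a minimal sketch, assuming a hypothetical helper named extract_attr and a small inline HTML sample instead of a file; it is still regex-based, so it will also match attributes inside comments or script blocks. For messy real-world HTML, a parser such as HTML::LinkExtor is more reliable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every value of attribute $attr found in $html.
# Matches double-quoted, single-quoted and unquoted values, case-insensitively.
sub extract_attr {
    my ($html, $attr) = @_;
    my @values;
    # \Q...\E quotes any regex metacharacters in the attribute name
    while ($html =~ m/\b\Q$attr\E\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/gi) {
        # Exactly one of the three capture groups is defined per match
        push @values, defined $1 ? $1 : defined $2 ? $2 : $3;
    }
    return @values;
}

# Small sample document using all three quoting styles
my $html = <<'HTML';
<a href="one.html">One</a>
<A HREF='two.html'>Two</A>
<a href=three.html>Three</a>
<img src="pic.png">
HTML

my @hrefs = extract_attr($html, 'href');
my @srcs  = extract_attr($html, 'src');
print "href: $_\n" for @hrefs;
print "src: $_\n"  for @srcs;
```

Run from the command line, this prints one.html, two.html and three.html as href values and pic.png as the only src value. In the CGI script, the two list assignments could be replaced by calls such as extract_attr($text, 'href').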