Monday, June 14, 2010

Digital Archaeology Tags

It's fairly clear that archaeology will move into the digital domain, with future researchers digging through hard discs, USB keys, floppy discs, tapes, EEPROMs and so on. I've often thought it would be nice to drop them a message every so often, or just leave a note apologizing for my appalling spelling and grammar. So I suggest the following tags, encoded as plain 8-bit ASCII wherever possible so that a simple text search will find them:

DIGIARCHNOTESTART

DIGIARCHNOTEEND

Dear future archaeologists: from now on I shall use these tags in my documents whenever I want to give you some background, point you towards more information, or just say hi, so grep away!
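Because the tags are plain ASCII, `grep -l DIGIARCHNOTESTART` will list the files that contain notes. To pull out the notes themselves, something like the following minimal Perl sketch should do. It assumes the tags are never nested and that each file fits comfortably in memory; the subroutine name `extract_notes` is purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every note found between DIGIARCHNOTESTART and DIGIARCHNOTEEND.
# Assumes tags are not nested (hypothetical helper, for illustration only).
sub extract_notes {
    my ($text) = @_;
    # Non-greedy (.*?) keeps several notes in one file separate;
    # /s lets "." match newlines so multi-line notes are captured.
    my @notes = $text =~ m/DIGIARCHNOTESTART(.*?)DIGIARCHNOTEEND/gs;
    # Trim leading and trailing whitespace from each note.
    s/^\s+|\s+$//g for @notes;
    return @notes;
}

# Scan each file named on the command line and print its notes.
for my $file (@ARGV) {
    open my $fh, '<', $file or do { warn "Cannot open $file: $!\n"; next };
    # Slurp the whole file so notes spanning several lines are found.
    my $contents = do { local $/; <$fh> } // '';
    close $fh;
    print "$file: $_\n" for extract_notes($contents);
}
```

Run it as `perl notes.pl *.txt` (the script name is arbitrary); each note is printed prefixed by the file it came from.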

Wednesday, June 2, 2010

Grab all hrefs from an HTML page whose link text contains "View"

Quick and dirty Perl program to grab all links from a web page whose anchor text contains "View". Pass it the path to a saved HTML file:

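#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

die "Usage: $0 file.html\n" unless @ARGV;

my $html = HTML::TreeBuilder->new();
$html->parse_file($ARGV[0]) or die "Cannot read $ARGV[0]: $!\n";

# Find every <a> element in the document
my @links = $html->look_down('_tag', 'a');

for my $link (@links) {
    my $target = $link->attr('href');
    next unless defined $target;    # skip anchors without an href

    # as_text collects all text inside the anchor, including nested tags
    my $str = $link->as_text;

    # print the href if the link text contains View
    if ($str =~ m/View/) {
        print $target . "\n";
    }
}

# Free the parse tree (TreeBuilder trees contain circular references)
$html->delete;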