Main features:
- Take a starting URL and drill down through Groups and Meetings to retrieve information
- Match groups with Mirule Organizations
- Match meetings with Mirule meetings
- Match people with Mirule people
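As a rough illustration of the matching features above, here is a minimal sketch that matches a scraped name against already-known records by normalized name, with a fuzzy fallback. Everything in it is an assumption made for illustration: the function names, the 0.85 cutoff, and the plain dictionary standing in for however Mirule actually stores organizations, meetings and people.

```python
import difflib
import unicodedata
from typing import Dict, Optional


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so near-identical names compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def match_name(scraped: str, known: Dict[str, int], cutoff: float = 0.85) -> Optional[int]:
    """Return the id of the best-matching known record, or None if nothing is close enough.

    `known` maps record names to ids; it is a hypothetical stand-in for the real data store.
    """
    index = {normalize(name): record_id for name, record_id in known.items()}
    key = normalize(scraped)
    # An exact match on the normalized name wins outright.
    if key in index:
        return index[key]
    # Otherwise fall back to the single closest name above the similarity cutoff.
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None
```

Returning None instead of guessing leaves unmatched names available for manual review, which is probably safer than creating a wrong link.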
DONE:
- Store web pages for later mining/reruns with better algorithms
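A minimal sketch of how storing pages for reruns could work, assuming a simple on-disk cache keyed by a hash of the URL. The names `cache_path` and `fetch_cached` and the directory layout are hypothetical and not taken from the actual spider.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable, Optional


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a stable file name inside the cache directory."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / digest


def fetch_cached(url: str, cache_dir: Path,
                 fetch: Optional[Callable[[str], bytes]] = None) -> bytes:
    """Return the page body, downloading it only if it is not already stored.

    `fetch` is an optional replacement downloader (useful for testing);
    by default the page is retrieved with urllib.
    """
    path = cache_path(cache_dir, url)
    if path.exists():
        # Later runs with better algorithms read the stored copy instead of the site.
        return path.read_bytes()
    if fetch is None:
        with urllib.request.urlopen(url) as response:
            body = response.read()
    else:
        body = fetch(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return body
```

With a cache like this, a rerun only touches the network for pages it has not seen before.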
I just ran the spider on the Sibbo web site for 2009 and downloaded all protocols + attachments (2.5 GB). Yesterday's 0.5 GB covered only the council data; the 2.5 GB includes all data.