<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Acumen Development &#187; scraping</title>
	<atom:link href="http://www.acumendevelopment.net/tag/scraping/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.acumendevelopment.net</link>
	<description>Software to inspire.</description>
	<lastBuildDate>Wed, 02 Jun 2010 16:39:25 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0-alpha</generator>
		<item>
		<title>Basic Scraping</title>
		<link>http://www.acumendevelopment.net/basic-scraping/</link>
		<comments>http://www.acumendevelopment.net/basic-scraping/#comments</comments>
		<pubDate>Sat, 22 Nov 2008 16:57:59 +0000</pubDate>
		<dc:creator>Leo Brown</dc:creator>
				<category><![CDATA[Development Processes]]></category>
		<category><![CDATA[Open Source]]></category>
		<category><![CDATA[mashups]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[scraping]]></category>

		<guid isPermaLink="false">http://www.acumendevelopment.net/?p=75</guid>
		<description><![CDATA[A short introduction to Web Page Scraping]]></description>
			<content:encoded><![CDATA[<p>While in production applications we all favour using an API, there are plenty of situations, such as &lsquo;<a href="http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)">Mashups</a>&rsquo; (I love how that term has been reappropriated from Jungle music), where you have to fall back on page scraping.</p>
<p>It&#8217;s occurred to me that these very easy techniques seem inaccessible to many people, so I thought I&#8217;d post a few bits and bobs about some basic scraping methods.</p>
<p>Here&#8217;s a bit of code I wrote that uses PHP&#8217;s <a href="http://php.net/domdocument">DOMDocument class</a> to parse an HTML page into a DOM tree, as if it were XML, and then pull out a single element by its <code>id</code>. In this case that element holds the incredibly useful current world population&#8230; fantastic!</p>
<pre class="brush: php;">
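&lt;?php
 // where to find population data...
 $location = array(
   'url' =&gt; 'http://www.census.gov/ipc/www/popclockworld.html',
   'id'  =&gt; 'worldnumber',
 );

 // fetch the raw HTML of the page
 $file = file_get_contents($location['url']);
 if ($file === false) {
   die('Could not fetch ' . $location['url']);
 }

 // real-world HTML is rarely valid, so collect libxml's parser
 // warnings internally instead of printing them
 libxml_use_internal_errors(true);
 $d = new DOMDocument();
 $d-&gt;loadHTML($file);
 libxml_clear_errors();

 // get and print current world population
 $e = $d-&gt;getElementById($location['id']);
 if ($e === null) {
   die('No element with id ' . $location['id']);
 }
 print trim($e-&gt;nodeValue);
?&gt;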
</pre>
<p>Sample output: 6,738,610,278</p>
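<p>To try the technique without depending on a live page, here is a minimal sketch that runs against an HTML string instead. The page fragment is made up for illustration, and <code>scrape_by_id()</code> is a hypothetical helper, not part of PHP; it simply wraps the same <code>loadHTML()</code> and <code>getElementById()</code> calls used above.</p>
<pre class="brush: php;">
&lt;?php
 // Hypothetical helper (not a PHP built-in): return the trimmed text of
 // the element with the given id attribute, or null if there is none.
 function scrape_by_id($html, $id)
 {
   // hide warnings about sloppy markup, remembering the old setting
   $previous = libxml_use_internal_errors(true);
   $d = new DOMDocument();
   $d-&gt;loadHTML($html);
   libxml_clear_errors();
   libxml_use_internal_errors($previous);

   $e = $d-&gt;getElementById($id);
   return $e === null ? null : trim($e-&gt;nodeValue);
 }

 // a made-up page fragment standing in for the real census page
 $html = '&lt;html&gt;&lt;body&gt;&lt;p&gt;World population: '
       . '&lt;span id="worldnumber"&gt; 6,738,610,278 &lt;/span&gt;&lt;/p&gt;'
       . '&lt;/body&gt;&lt;/html&gt;';

 print scrape_by_id($html, 'worldnumber'); // prints 6,738,610,278
?&gt;
</pre>
<p>Saving and restoring the previous <code>libxml_use_internal_errors()</code> setting keeps the helper from changing libxml error handling for the rest of the script.</p>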
]]></content:encoded>
			<wfw:commentRss>http://www.acumendevelopment.net/basic-scraping/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
