It appears that the news-sharing site Digg can no longer be scraped. Scraping is when a site uses a script to pull content from another site.
Scraping can be used for good or for bad. Here are two scenarios. In the first, a site goes around to various news sites and takes news articles without credit; it is questionable whether this would be acceptable even with credit. In the second, a site owner scrapes Amazon results and provides links back to Amazon for fulfilment. Here, there would be little reason for Amazon to complain.
A better alternative to scraping, and one that has certainly caught on, is for the site to provide an API such as a Web service or a feed. With an API there is an agreed and hopefully consistent format for the data, and the API itself makes clear which data the site wants other sites to use.
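To show what a consistent format buys you, here is a minimal sketch of reading items out of an RSS 2.0 feed. It is written in Python rather than the PHP our site uses, and the sample feed, the example.com links, and the function name parse_rss_items are all made up for illustration. Real feeds carry more fields per item.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, not taken from any real site.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>http://example.com/stories/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/stories/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    # RSS 2.0 places each entry in an <item> element under <channel>.
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items
```

Because the element names are fixed by the feed format, the same few lines should keep working when the site redesigns its pages, which is exactly where a scraper tends to break.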
In the past, the Tapoll site used PHP to read the Digg RSS feed. Now it appears that Digg is blocking HTTP requests from other scripts, most likely with a blanket block on non-Digg IP addresses. We get the message below when scraping or when trying to access the RSS feeds with the PHP file() function. The same call works fine with most other sites, such as Flickr and del.icio.us.
failed to open stream: HTTP request failed!
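That message does not say much about why the request failed. As a rough diagnostic, here is a sketch in Python (not the PHP we actually run) that fetches a URL and reports whether the server answered with an error status or gave no usable response at all. The function name fetch_feed and the default User-Agent string are hypothetical. We do not know how Digg decides what to block, so sending a User-Agent is just something to experiment with, not a confirmed fix.

```python
import urllib.error
import urllib.request


def fetch_feed(url, user_agent="example-feed-reader/0.1", timeout=10):
    """Fetch a URL and return (body, error).

    On success error is None; on a handled failure body is None.
    The default user_agent value is a made-up placeholder.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace"), None
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 403 or 503.
        return None, "HTTP {} {}".format(exc.code, exc.reason)
    except urllib.error.URLError as exc:
        # No usable response: DNS failure, refused connection, missing file, etc.
        return None, "request failed: {}".format(exc.reason)
```

Calling fetch_feed with the feed address and printing the second value would at least tell us whether we are being refused outright or never reaching the server.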
If anybody has a workaround or any further information on this, please let us know. Perhaps we should have just checked with Digg rather than making this attempt at sensationalism. But hey, I’m playing reporter!