I don't know if anyone except me will need this script, so I put it on the blog just so I don't lose it.
This very simple function analyzes the $_SERVER['HTTP_USER_AGENT'] variable and looks for a crawler signature. If the function finds a crawler, it returns its name; otherwise it returns false.
Usage examples:
– save it to a database and output it somewhere in the admin zone or on the site
– save it for indexing statistics and analyze it later
– use it for cloaking or doorways :) (I do not advise you to do it)
<?php
function crawlerDetect($USER_AGENT)
{
    // Each entry is array(signature to look for, crawler name to return).
    $crawlers = array(
        array('Google', 'Google'),
        array('msnbot', 'MSN'),
        array('Rambler', 'Rambler'),
        array('Yahoo', 'Yahoo'),
        array('AbachoBOT', 'AbachoBOT'),
        array('accoona', 'Accoona'),
        array('AcoiRobot', 'AcoiRobot'),
        array('ASPSeek', 'ASPSeek'),
        array('CrocCrawler', 'CrocCrawler'),
        array('Dumbot', 'Dumbot'),
        array('FAST-WebCrawler', 'FAST-WebCrawler'),
        array('GeonaBot', 'GeonaBot'),
        array('Gigabot', 'Gigabot'),
        array('Lycos', 'Lycos spider'),
        array('MSRBOT', 'MSRBOT'),
        array('Scooter', 'Altavista robot'),
        array('AltaVista', 'Altavista robot'),
        array('IDBot', 'ID-Search Bot'),
        array('eStyle', 'eStyle Bot'),
        array('Scrubby', 'Scrubby robot')
    );

    foreach ($crawlers as $c) {
        // stristr() is case-insensitive, so 'googlebot' matches 'Google'
        if (stristr($USER_AGENT, $c[0])) {
            return $c[1];
        }
    }

    return false;
}

// example
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$crawler = crawlerDetect($userAgent);
if ($crawler) {
    // it is a crawler, its name is in the $crawler variable
} else {
    // usual visitor
}
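For the statistics use case from the list above, here is a rough sketch of how the result of the detection could be counted. The helper name recordCrawlerHit and the $stats array are made up for this illustration; they are not part of the original script, and in real life you would probably store the counters in a database instead of an array.

```php
<?php
// Hypothetical helper (not part of the original script):
// adds one hit for $crawlerName to the $stats counters.
// $crawlerName is whatever the detection function returned:
// a crawler name, or false for a usual visitor.
function recordCrawlerHit(array $stats, $crawlerName)
{
    if ($crawlerName === false) {
        return $stats; // usual visitor, nothing to count
    }
    if (!isset($stats[$crawlerName])) {
        $stats[$crawlerName] = 0;
    }
    $stats[$crawlerName]++;

    return $stats;
}

// example: three crawler hits and one usual visitor
$stats = array();
$stats = recordCrawlerHit($stats, 'Google');
$stats = recordCrawlerHit($stats, 'Google');
$stats = recordCrawlerHit($stats, false);
$stats = recordCrawlerHit($stats, 'Yahoo');
print_r($stats);
```

The function returns the updated array instead of changing a global, so it is easy to swap the array for a database write later.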
UPDATE:
After reading this, I decided to update my code a bit. The change is related to using the function on a high-volume website.
<?php
// Map of crawler name => signature to look for in the user agent.
$crawlers = array(
    'Google' => 'Google',
    'MSN' => 'msnbot',
    'Rambler' => 'Rambler',
    'Yahoo' => 'Yahoo',
    'AbachoBOT' => 'AbachoBOT',
    'Accoona' => 'accoona',
    'AcoiRobot' => 'AcoiRobot',
    'ASPSeek' => 'ASPSeek',
    'CrocCrawler' => 'CrocCrawler',
    'Dumbot' => 'Dumbot',
    'FAST-WebCrawler' => 'FAST-WebCrawler',
    'GeonaBot' => 'GeonaBot',
    'Gigabot' => 'Gigabot',
    'Lycos spider' => 'Lycos',
    'MSRBOT' => 'MSRBOT',
    'Altavista robot' => 'Scooter',
    'AltaVista robot' => 'AltaVista',
    'ID-Search Bot' => 'IDBot',
    'eStyle Bot' => 'eStyle',
    'Scrubby robot' => 'Scrubby',
);

function crawlerDetect($USER_AGENT)
{
    // To regenerate the signature string used below, uncomment these lines.
    // It is better to keep it as a string than to call implode() every time.
    // global $crawlers;
    // $crawlers_agents = implode('|', $crawlers);
    $crawlers_agents = 'Google|msnbot|Rambler|Yahoo|AbachoBOT|accoona|AcoiRobot|ASPSeek|CrocCrawler|Dumbot|FAST-WebCrawler|GeonaBot|Gigabot|Lycos|MSRBOT|Scooter|AltaVista|IDBot|eStyle|Scrubby';

    // strpos() cannot search for several alternatives at once, so the
    // signatures are used as one case-insensitive regular expression.
    if (!preg_match('/' . $crawlers_agents . '/i', $USER_AGENT, $match)) {
        return false;
    }

    // crawler detected
    return true;

    // To return its name instead of true, replace the return above with:
    /*
    global $crawlers;
    return array_search(strtolower($match[0]), array_map('strtolower', $crawlers));
    */
}

// example
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$crawler = crawlerDetect($userAgent);
if ($crawler) {
    // it is a crawler ($crawler holds its name if you enabled the name lookup)
} else {
    // usual visitor
}
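If you would rather not keep a hand-written signature string in sync with the array, one possible variant is to build the pattern once per request and cache it. This is only a sketch under my own assumptions: the names crawlerPattern and isCrawler are hypothetical, and I have not measured whether this is faster than the hard-coded string on a busy site.

```php
<?php
// Hypothetical variant: build the regular expression once per request.
// The static variable keeps the pattern between calls, so later calls
// ignore any change to $signatures.
function crawlerPattern(array $signatures)
{
    static $pattern = null;
    if ($pattern === null) {
        // preg_quote() escapes characters that are special in a regex
        $quoted = array_map(function ($s) {
            return preg_quote($s, '/');
        }, $signatures);
        $pattern = '/' . implode('|', $quoted) . '/i';
    }

    return $pattern;
}

// Returns true if any signature occurs in the user agent (case-insensitive).
function isCrawler($userAgent, array $signatures)
{
    return preg_match(crawlerPattern($signatures), $userAgent) === 1;
}

// example with a shortened signature list
$signatures = array('Google', 'msnbot', 'FAST-WebCrawler');
var_dump(isCrawler('Mozilla/5.0 (compatible; Googlebot/2.1)', $signatures));
var_dump(isCrawler('Mozilla/5.0 (Windows NT 10.0) Firefox/120.0', $signatures));
```

The cached pattern lives only for the current request, so the array can still be edited freely between deployments.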