小編給大家分享一下PHP代碼如何實(shí)現(xiàn)爬蟲(chóng),相信大部分人都還不怎么了解,因此分享這篇文章給大家參考一下,希望大家閱讀完這篇文章后大有收獲,下面讓我們一起去了解一下吧!
在做網(wǎng)站、網(wǎng)站建設(shè)中從網(wǎng)站色彩、結(jié)構(gòu)布局、欄目設(shè)置、關(guān)鍵詞群組等細(xì)微處著手,突出企業(yè)的產(chǎn)品/服務(wù)/品牌,幫助企業(yè)鎖定精準(zhǔn)用戶,提高在線咨詢和轉(zhuǎn)化,使成都網(wǎng)站營(yíng)銷(xiāo)成為有效果、有回報(bào)的無(wú)錫營(yíng)銷(xiāo)推廣。創(chuàng)新互聯(lián)建站專(zhuān)業(yè)成都網(wǎng)站建設(shè)10多年了,客戶滿意度97.8%,歡迎成都創(chuàng)新互聯(lián)客戶聯(lián)系。實(shí)現(xiàn)爬蟲(chóng)記錄本文從創(chuàng)建crawler 數(shù)據(jù)庫(kù),robot.php記錄來(lái)訪的爬蟲(chóng)從而將信息插入數(shù)據(jù)庫(kù)crawler,然后從數(shù)據(jù)庫(kù)中就可以獲得所有的爬蟲(chóng)信息。實(shí)現(xiàn)代碼具體如下:
數(shù)據(jù)庫(kù)設(shè)計(jì)
create table crawler ( crawler_ID bigint() unsigned not null auto_increment primary key, crawler_category varchar() not null, crawler_date datetime not null default '-- ::', crawler_url varchar() not null, crawler_IP varchar() not null )default charset=utf;
以下文件 robot.php 記錄來(lái)訪的爬蟲(chóng),并將信息寫(xiě)入數(shù)據(jù)庫(kù):
<?php $ServerName = $_SERVER["SERVER_NAME"] ; $ServerPort = $_SERVER["SERVER_PORT"] ; $ScriptName = $_SERVER["SCRIPT_NAME"] ; $QueryString = $_SERVER["QUERY_STRING"]; $serverip = $_SERVER["REMOTE_ADDR"] ; $Url="http://".$ServerName; if ($ServerPort != "") { $Url = $Url.":".$ServerPort ; } $Url=$Url.$ScriptName; if ($QueryString !="") { $Url=$Url."?".$QueryString; } $GetLocationURL=$Url ; $agent = $_SERVER["HTTP_USER_AGENT"]; $agent=strtolower($agent); $Bot =""; if (strpos($agent,"bot")>-) { $Bot = "Other Crawler"; } if (strpos($agent,"googlebot")>-) { $Bot = "Google"; } if (strpos($agent,"mediapartners-google")>-) { $Bot = "Google Adsense"; } if (strpos($agent,"baiduspider")>-) { $Bot = "Baidu"; } if (strpos($agent,"sogou spider")>-) { $Bot = "Sogou"; } if (strpos($agent,"yahoo")>-) { $Bot = "Yahoo!"; } if (strpos($agent,"msn")>-) { $Bot = "MSN"; } if (strpos($agent,"ia_archiver")>-) { $Bot = "Alexa"; } if (strpos($agent,"iaarchiver")>-) { $Bot = "Alexa"; } if (strpos($agent,"sohu")>-) { $Bot = "Sohu"; } if (strpos($agent,"sqworm")>-) { $Bot = "AOL"; } if (strpos($agent,"yodaoBot")>-) { $Bot = "Yodao"; } if (strpos($agent,"iaskspider")>-) { $Bot = "Iask"; } require("./dbinfo.php"); date_default_timezone_set('PRC'); $shijian=date("Y-m-d h:i:s", time()); // 連接到 MySQL 服務(wù)器 $connection = mysql_connect ($host, $username, $password); if (!$connection) { die('Not connected : ' . mysql_error()); } // 設(shè)置活動(dòng)的 MySQL 數(shù)據(jù)庫(kù) $db_selected = mysql_select_db($database, $connection); if (!$db_selected) { die ('Can\'t use db : ' . mysql_error()); } // 向數(shù)據(jù)庫(kù)插入數(shù)據(jù) $query = "insert into crawler (crawler_category, crawler_date, crawler_url, crawler_IP) values ('$Bot','$shijian','$GetLocationURL','$serverip')"; $result = mysql_query($query); if (!$result) { die('Invalid query: ' . mysql_error()); } ?>
成功了,現(xiàn)在訪問(wèn)數(shù)據(jù)庫(kù)即可得知什么時(shí)候哪里的蜘蛛爬過(guò)你的什么頁(yè)面。
view sourceprint? <?php include './robot.php'; include '../library/page.Class.php'; $page = $_GET['page']; include '../library/conn_new.php'; $count = $mysql -> num_rows($mysql -> query("select * from crawler")); $pages = new PageClass($count,,$_GET['page'],$_SERVER['PHP_SELF'].'?page={page}'); $sql = "select * from crawler order by "; $sql .= "crawler_date desc limit ".$pages -> page_limit.",".$pages -> myde_size; $result = $mysql -> query($sql); ?> <table width=""> <thead> <tr> <td bgcolor="#CCFFFF"></td> <td bgcolor="#CCFFFF" align="center" >爬蟲(chóng)訪問(wèn)時(shí)間</td> <td bgcolor="#CCFFFF" align="center" >爬蟲(chóng)分類(lèi)</td> <td bgcolor="#CCFFFF" align="center" >爬蟲(chóng)IP</td> <td bgcolor="#CCFFFF" align="center" >爬蟲(chóng)訪問(wèn)的URL</td> </tr> </thead> <?php while($myrow = $mysql -> fetch_array($result)){ ?> <tr> <td width=""><img src="../images/topicnew.gif" /></td> <td width="" ><? echo $myrow["crawler_date"] ?></td> <td width="" ><? echo $myrow["crawler_category"] ?></td> <td width=""><? echo $myrow["crawler_IP"] ?></td> <td width=""><? echo $myrow["crawler_url"] ?></td> </tr> <?php } ?> </table> <?php echo $pages -> myde_write(); ?>
以上是“PHP代碼如何實(shí)現(xiàn)爬蟲(chóng)”這篇文章的所有內(nèi)容,感謝各位的閱讀!相信大家都有了一定的了解,希望分享的內(nèi)容對(duì)大家有所幫助,如果還想學(xué)習(xí)更多知識(shí),歡迎關(guān)注創(chuàng)新互聯(lián)行業(yè)資訊頻道!
本文名稱(chēng):PHP代碼如何實(shí)現(xiàn)爬蟲(chóng)-創(chuàng)新互聯(lián)
文章分享:http://aaarwkj.com/article40/ishho.html
成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián),為您提供網(wǎng)站改版、網(wǎng)站導(dǎo)航、網(wǎng)站收錄、手機(jī)網(wǎng)站建設(shè)、搜索引擎優(yōu)化、網(wǎng)站內(nèi)鏈
聲明:本網(wǎng)站發(fā)布的內(nèi)容(圖片、視頻和文字)以用戶投稿、用戶轉(zhuǎn)載內(nèi)容為主,如果涉及侵權(quán)請(qǐng)盡快告知,我們將會(huì)在第一時(shí)間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如需處理請(qǐng)聯(lián)系客服。電話:028-86922220;郵箱:631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載,或轉(zhuǎn)載時(shí)需注明來(lái)源: 創(chuàng)新互聯(lián)
猜你還喜歡下面的內(nèi)容