SS 7.0 scraper tweak with a 99% success rate: read this if your scrapes often fail

oyoy8629 · Posted 2009-10-13 13:54:42

Last edited by oyoy8629 on 2009-10-13 14:34

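I have only just started learning SS (SupeSite).
I ran into a lot of problems while scraping, which was very frustrating, especially because scrapes kept failing.
Later I stumbled on boss's post about fixing the supesite7 scraper that finishes as soon as it starts.
From that post I learned that SS fetches pages with the function file_get_contents, which in my experience only reads content reliably when the network connection is very fast.

So I switched the scraper over to curl.

List scraping

In /admin/admin_robots.php, add the following at line 1957: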

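function fileget($url){
    // Fetch $url with cURL and return the response body (false on failure).
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    // Send a browser-like User-Agent header.
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; )');
    curl_setopt($curl, CURLOPT_HEADER, 0);         // keep response headers out of the result
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($curl, CURLOPT_COOKIE, 'phphubei');
    $tmpInfo = curl_exec($curl);
    curl_close($curl);
    return $tmpInfo;
}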

At line 1962, the original code is:

if(!empty($url)) {
    if(function_exists('file_get_contents')) {
        @$text = file_get_contents($url);
    } else {
        @$carr = file($url);
        if(!empty($carr) && is_array($carr)) {
            $text = implode('',$carr);
        }
    }
}

I changed it to:

if(!empty($url)) {
    @$text = fileget($url);
}

Save the file and you are done.
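If you are not sure whether the curl extension is available on the host, the fileget replacement above could be made a little more defensive. The following is only a minimal sketch, not part of SupeSite: the name fileget_safe and the timeout values are assumptions chosen for illustration. It uses cURL with explicit timeouts when the extension is loaded and falls back to file_get_contents otherwise.

```php
<?php
// Hypothetical helper (not part of SupeSite): fetch a URL with cURL when the
// extension is loaded, otherwise fall back to file_get_contents().
// The function name and the timeout values are illustrative assumptions.
function fileget_safe($url, $timeout = 15) {
    if (empty($url)) {
        return false;
    }
    if (function_exists('curl_init')) {
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_HEADER, 0);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 5);   // seconds to wait for the connection
        curl_setopt($curl, CURLOPT_TIMEOUT, $timeout);   // seconds allowed for the whole request
        $text = curl_exec($curl);                        // string on success, false on failure
        curl_close($curl);
        return $text;
    }
    // No curl extension: use the same call SupeSite uses by default.
    return @file_get_contents($url);
}
```

With a helper like this, the edited line would read @$text = fileget_safe($url); and the scraper would keep working on hosts where curl has not been enabled.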

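Please note:
Servers often do not have curl enabled by default.
You need to enable the extension in php.ini.
To enable it on Windows:
1. In php.ini, remove the semicolon in front of extension=php_curl.dll.
2. If that does not work, copy php_curl.dll, libeay32.dll and ssleay32.dll into C:\WINDOWS\System32.

After making the change, remember to restart Apache.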
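After restarting Apache, it can help to confirm that PHP actually loaded the extension before editing the scraper. A minimal sketch, using the standard extension_loaded and curl_version functions; the function name curl_status is made up for this example:

```php
<?php
// Report whether the curl extension is loaded in this PHP installation.
// curl_status() is a hypothetical name used only for this example.
function curl_status() {
    if (!extension_loaded('curl')) {
        return 'curl extension is NOT loaded';
    }
    $info = curl_version(); // array describing the linked libcurl
    return 'curl extension is loaded, libcurl ' . $info['version'];
}

echo curl_status(), "\n";
```

Save it as a small .php file under the web root and open it in a browser, so that it reports on the same PHP configuration the scraper runs under.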

Be sure to back up your admin_robots.php first in case something goes wrong.
The modified file is attached below.

admin_robots.rar (17.55 KB, downloads: 225)
http://bbs.phphubei.com/thread-4600-1-1.html

lidq.jingwu · Posted 2009-10-13 14:05:04

Showing my support.

oyoy8629 · Posted 2009-10-13 14:32:25

Nobody else is supporting this?

白垩纪 · Posted 2009-10-14 21:40:31

I don't know how to use this... I'm on shared hosting, not a dedicated server.

littlehz · Posted 2009-10-15 09:22:37

Heh, one addition: to get cURL support on Linux, first install cURL itself.
From the command line:

yum install -y curl curl-devel

Then, when compiling and installing PHP, add the options --with-curl and --with-curlwrappers, for example:

./configure --prefix=/usr/local/webserver/php --with-config-file-path=/usr/local/webserver/php/etc --with-mysql=/usr/local/webserver/mysql --with-mysqli=/usr/local/webserver/mysql/bin/mysql_config --with-iconv-dir=/usr/local --with-freetype-dir --with-jpeg-dir --with-png-dir --with-zlib --with-libxml-dir=/usr --enable-xml --disable-rpath --enable-discard-path --enable-safe-mode --enable-bcmath --enable-shmop --enable-sysvsem --enable-inline-optimization --with-curl --with-curlwrappers --enable-mbregex --enable-fastcgi --enable-fpm --enable-force-cgi-redirect --enable-mbstring --with-mcrypt --with-gd --enable-gd-native-ttf --with-openssl --with-mhash --enable-pcntl --enable-sockets --with-ldap --with-ldap-sasl --with-xmlrpc --enable-zip --enable-soap --without-pear

After configuring, build and install:

make ZEND_EXTRA_LIBS='-liconv'
make install

oyoy8629 · Posted 2009-10-15 09:25:25

littlehz really is impressive.

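4667506 · Posted 2009-10-15 09:49:27

I have also run into incomplete scrapes before. Check the "scrape page encoding" setting.
It is best not to use that feature. I used it before and far too many scrapes failed.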

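oyoy8629 · Posted 2009-10-15 11:02:50

PHP spiders typically use curl; it is built for fetching remote pages.
The default file_get_contents and file calls were not designed for scraping in the first place,
so of course the failure rate is high.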
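For shared hosting where curl cannot be enabled, file_get_contents can still be given a timeout and a User-Agent through a stream context, which may reduce failures somewhat. This is only a sketch under that assumption, not something from the original fix: the name fileget_stream and the 15-second default are made up for illustration.

```php
<?php
// Hypothetical helper for hosts without curl: call file_get_contents() with
// an explicit HTTP timeout and User-Agent supplied through a stream context.
// The function name and default timeout are illustrative assumptions.
function fileget_stream($url, $timeout = 15) {
    $context = stream_context_create(array(
        'http' => array(
            'method'     => 'GET',
            'timeout'    => $timeout, // read timeout in seconds
            'user_agent' => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)',
        ),
    ));
    // Returns the body as a string, or false if the read fails.
    return @file_get_contents($url, false, $context);
}
```

It would be called the same way as fileget, for example @$text = fileget_stream($url);. It still requires allow_url_fopen to be enabled on the host.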

xukrl · Posted 2009-10-15 11:30:28

I never use the scraping feature, but I'll show my support anyway.
www.23xm.com

calllilei · Posted 2010-3-9 12:23:33

Showing my support, but after this change, will robots that are imported directly still work? Do the rules need to be changed?
