Discuz!官方免费开源建站系统

 找回密码
 立即注册

QQ登录

只需一步,快速开始

搜索

[求助] 【aixiagame首发教程】【Discuz屏蔽垃圾蜘蛛或机器人】以垃圾蜘蛛Bytespider为例

[复制链接]
qxyhuiyuan 发表于 2020-1-22 17:15:34 | 显示全部楼层 |阅读模式
本帖最后由 qxyhuiyuan 于 2020-1-30 19:19 编辑

这是写给不懂编程的新手站长看的


大量站长网站发现了Bytespider这个新型垃圾蜘蛛爬虫,大量的对网站进行抓取,其行为不压于持续的CC攻击,模拟移动手机用户大量抓取信息. GET信息的方式与爬虫类似, 但并未标注任何爬虫信息(站长们已经知道,这是某国内公司的,做新闻头条的). 请求速度经常导致服务器崩溃.
所以,我们就要采取防护措施,如何屏蔽搜索引擎机器人。


以DiscuzX 3.4 R20191201建站程序的来屏蔽垃圾蜘蛛为例子。


首先感谢@dgqjj 提供的思路:robots.txt+程序识别UA屏蔽(php代码屏蔽、nginx代码屏蔽、.htaccess代码屏蔽等等,任选其一)+屏蔽蜘蛛IP段
提供给小白们来解决这个垃圾蜘蛛问题,老白就不用看了,写的不一定好。


打开文件夹logs,查看里面的日志文件,发现了大量的垃圾蜘蛛Bytespider访问爬取,造成大量的流量消耗。


这时候我们就要对他采取必要的封禁行动,四步封死。
第一步:robots.txt规则封禁
文本中复制粘贴以下代码:

#
# robots.txt for Discuz! X3
#

User-agent: Googlebot
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Googlebot-Image
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Baiduspider-news
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Baiduspider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Baiduspider-image
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: baiduboxapp
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Sosospider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: bingbot
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: 360Spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: HaosouSpider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: yisouspider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: YoudaoBot
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Sogou Orion spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Sogou News Spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Sogou blog
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Sogou Spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Sogou spider2
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Sogou inst spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Sogou web spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: EasouSpider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: MSNBot
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*

User-agent: Bytespider
Disallow: /

User-Agent: *
Disallow: /

第二步:程序识别UA屏蔽(.htaccess代码屏蔽)跳转与阻拦
当然垃圾蜘蛛是不会遵守规则的,继续,文本中复制粘贴以下代码:

# 将 RewriteEngine 模式打开
RewriteEngine On
# 修改以下语句中的 /discuz 为您的论坛目录地址,如果程序放在根目录中,请将 /discuz 修改为 /
RewriteBase /
# Rewrite 系统规则请勿修改
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^topic-(.+)\.html$ portal.php?mod=topic&topic=$1&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^article-([0-9]+)-([0-9]+)\.html$ portal.php?mod=view&aid=$1&page=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^forum-(\w+)-([0-9]+)\.html$ forum.php?mod=forumdisplay&fid=$1&page=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^thread-([0-9]+)-([0-9]+)-([0-9]+)\.html$ forum.php?mod=viewthread&tid=$1&extra=page\%3D$3&page=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^group-([0-9]+)-([0-9]+)\.html$ forum.php?mod=group&fid=$1&page=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^space-(username|uid)-(.+)\.html$ home.php?mod=space&$1=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^blog-([0-9]+)-([0-9]+)\.html$ home.php?mod=space&uid=$1&do=blog&id=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^(fid|tid)-([0-9]+)\.html$ archiver/index.php?action=$1&value=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^([a-z]+[a-z0-9_]*)-([a-z0-9_\-]+)\.html$ plugin.php?id=$1:(去掉括号内容,文章变表情了)$2&%1
# 这里跳转其他地方,想把几个跳转就复制几个ua
RewriteCond %{HTTP_USER_AGENT} ^(.*)Bytespider$
RewriteRule ^(.*)$ 这里输入你想要把这垃圾扔哪儿去(最好输入字节跳动的官网,把让仍会回老家去,自己爬自己去)
# 这里进行阻拦ua,想阻拦几个就复制几个ua
RewriteCond %{HTTP_USER_AGENT} ^Bytespider [NC,OR]
RewriteCond %{HTTP_USER_AGNET} ^spbot [NC]
RewriteRule .* - [F,L]

第三步:程序识别UA屏蔽(php代码屏蔽)过滤
在入口处加这段php代码,然后引用。感谢@gaoyibobo 提供的代码
<?php
// 屏蔽过滤恶意爬虫蜘蛛机器人
$ua = $_SERVER['HTTP_USER_AGENT'];

$now_ua = array('Bytespider');

if(!$ua) {
        header("Content-type: text/html; charset=utf-8");
        die('请勿采集本站,如有疑问请联系客服!');
}else{
        foreach($now_ua as $value)
        if(strpos($ua,$value) !== false) {
                header("Content-type: text/html; charset=utf-8");
                die('请勿采集本站,如有疑问请联系客服!');
        }
}

?>

在目录/source/class/里建个robots.txt,保存后改成robots.php,之后引用到class_core.php

首行加引用php代码:include('robots.php');
如图:

第四步:屏蔽蜘蛛IP段(注意是垃圾蜘蛛)禁止访问
已知的垃圾蜘蛛Bytespider的IP段为以下(如果有新的,会逐渐更新):
220.243.136.*
220.243.135.*
110.249.202.*
110.249.201.*
111.225.148.*
111.225.149.*
60.8.123.*


这两个地方禁止垃圾蜘蛛的IP段。


最后,总结一下:
第一步:robots.txt规则屏蔽
第二步:程序识别UA屏蔽(.htaccess代码屏蔽)跳转与阻拦
第三步:程序识别UA屏蔽(php代码屏蔽)过滤
第四步:屏蔽蜘蛛IP段(注意是垃圾蜘蛛)禁止访问
如果有什么新方法、好方法,欢迎大家留言,给予遇到的人帮助,先就这样吧,祝大家新年快乐、大吉大利、万事顺心。


本帖子中包含更多资源

您需要 登录 才可以下载或查看,没有账号?立即注册

x
evilvoy 发表于 2020-1-24 02:03:45 | 显示全部楼层
qxyhuiyuan 发表于 2020-1-22 21:58
遵守规则的话,我还写4步,看都不看就废话,一边去,这不是写给你看的

是你自己没看清楚我的。


算了。
回复

使用道具 举报

jianjian566 发表于 2020-1-27 00:39:41 | 显示全部楼层
本帖最后由 jianjian566 于 2020-1-27 10:08 编辑

大哥 他爬你 是因为他也要做搜索引擎 叫头条搜索, 字节跳动 拥有 抖音 西瓜 火山  头条 内涵段子 等十几个产品  他觉得 牛逼了 就要干当年360要干的事,成立自己的搜索引擎, 他也是搜索引擎的蜘蛛, 实际上 目前爬的最频繁的是谷歌, 但个人建议 建了网站 就是让蜘蛛爬的,不然 没收录 没流量,与其屏蔽  不如多买点宽带,这样能赚的更多!
渔飞源钓鱼网
https://www.yufeiyuan.net/
回复

使用道具 举报

 楼主| qxyhuiyuan 发表于 2020-1-30 19:19:58 | 显示全部楼层
jianjian566 发表于 2020-1-27 00:39
大哥 他爬你 是因为他也要做搜索引擎 叫头条搜索, 字节跳动 拥有 抖音 西瓜 火山  头条 内涵段子 等十几个 ...

确实,不过太恶心就要禁止,头条也没流量
回复

使用道具 举报

您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

手机版|小黑屋|Discuz! 官方站 ( 皖ICP备16010102号 )star

GMT+8, 2024-11-25 07:17 , Processed in 0.021286 second(s), 3 queries , Gzip On, Redis On.

Powered by Discuz! X3.4

Copyright © 2001-2023, Tencent Cloud.

快速回复 返回顶部 返回列表