Last edited by littlehz on 2009-8-18 17:31
1. Changes for the GBK edition of SupeSite:
In batch.search.php, insert the following code on the line right after this one (around line 132):

    $urlplus = 'searchkey='.rawurlencode($searchkey).'&type='.rawurlencode($type);

Do not change this order; otherwise the segmentation results will not be accurate.

    // Replace ASCII, Chinese and full-width punctuation with spaces
    function clear_point($jiugui)
    {
        return str_replace(
            array(
                '~', '!', '@', '#', '$', '%', '^', '&', '*', ',', '.', '?', ';', ':', '/', '\'', '"', '[', ']', '{', '}',
                '！', '￥', '……', '…', '、', '，', '。', '？', '；', '：', '‘', '“', '”', '’', '【', '】',
                '～', '＠', '＃', '＄', '％', '＾', '＆', '＊', '．', '＜', '＞', '＇', '＂', '［', '］', '｛', '｝', '／', '＼', '　'
            ),
            ' ', // every match becomes a single space
            $jiugui
        );
    }
    // Send the keyword to the HTTPCWS segmentation service, which returns the words separated by spaces
    $segmented = @file_get_contents('http://www.littz.cn:1989/?w='.urlencode($searchkey));
    // If the service cannot be reached, keep the original keyword
    if ($segmented !== false) {
        $searchkey = $segmented;
    }
    $searchkey = clear_point($searchkey);
    // Collapse runs of whitespace into single spaces; $searchkey1 is used by the template change in section 3
    $searchkey1 = trim(preg_replace('/\s+/', ' ', $searchkey));
    // Turn spaces into %, the SQL LIKE wildcard, so the words can match separately
    $searchkey = str_replace(' ', '%', $searchkey1);
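The cleanup steps at the end of the code above can be tried out without contacting the remote server. The sketch below wraps them in one function and feeds it a string in the space-separated form that the code expects back from HTTPCWS. The function name `split_keyword` and the sample input are hypothetical, the punctuation list is shortened, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper mirroring the cleanup steps above (punctuation list shortened).
// Returns array(value shown in the search box, pattern used in the LIKE query).
function split_keyword($segmented)
{
    // Replace ASCII and full-width punctuation with spaces, as clear_point() does
    $punct = array(',', '.', '!', '?', '，', '。', '！', '？', '、', '　');
    $text = str_replace($punct, ' ', $segmented);
    // Collapse whitespace runs into single spaces and trim the ends
    $display = trim(preg_replace('/\s+/', ' ', $text));
    // Spaces become %, so each word can match separately in a LIKE query
    $pattern = str_replace(' ', '%', $display);
    return array($display, $pattern);
}

list($display, $pattern) = split_keyword("中文 分词 ， 搜索 ！");
echo $display, "\n"; // 中文 分词 搜索
echo $pattern, "\n"; // 中文%分词%搜索
```

The first value is what ends up in `$searchkey1`, the second what ends up in `$searchkey`.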
2. Changes for the UTF-8 edition of SupeSite:
In batch.search.php, insert the following code on the line right after this one (around line 132):

    $urlplus = 'searchkey='.rawurlencode($searchkey).'&type='.rawurlencode($type);

Do not change this order; otherwise the segmentation results will not be accurate.

    // Replace ASCII, Chinese and full-width punctuation with spaces
    function clear_point($jiugui)
    {
        return str_replace(
            array(
                '~', '!', '@', '#', '$', '%', '^', '&', '*', ',', '.', '?', ';', ':', '/', '\'', '"', '[', ']', '{', '}',
                '！', '￥', '……', '…', '、', '，', '。', '？', '；', '：', '‘', '“', '”', '’', '【', '】',
                '～', '＠', '＃', '＄', '％', '＾', '＆', '＊', '．', '＜', '＞', '＇', '＂', '［', '］', '｛', '｝', '／', '＼', '　'
            ),
            ' ', // every match becomes a single space
            $jiugui
        );
    }
    // HTTPCWS only accepts GBK, so convert the UTF-8 keyword first
    $searchkey_gbk = iconv("UTF-8", "GBK//IGNORE", $searchkey);
    // Send it to the HTTPCWS segmentation service, which returns the words separated by spaces
    $segmented = @file_get_contents("http://www.littz.cn:1989/?w=".urlencode($searchkey_gbk));
    // Convert the result back to UTF-8; if the service cannot be reached, keep the original keyword
    if ($segmented !== false) {
        $searchkey = iconv("GBK", "UTF-8//IGNORE", $segmented);
    }
    $searchkey = clear_point($searchkey);
    // Collapse runs of whitespace into single spaces; $searchkey1 is used by the template change in section 3
    $searchkey1 = trim(preg_replace('/\s+/', ' ', $searchkey));
    // Turn spaces into %, the SQL LIKE wildcard, so the words can match separately
    $searchkey = str_replace(' ', '%', $searchkey1);
Because HTTPCWS only accepts GBK-encoded text, UTF-8 keywords have to be converted to GBK, segmented, and then converted back to UTF-8.
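The round trip can be sketched as a small wrapper. `segment_utf8` and its `$segmenter` callback are hypothetical names: the callback stands in for the HTTP request to HTTPCWS and receives GBK bytes, so the test below can run without any server. It uses the same `iconv` calls as the UTF-8 code and needs PHP's iconv extension, which most builds enable.

```php
<?php
// Hypothetical wrapper: convert UTF-8 to GBK, segment, convert the result back.
// $segmenter stands in for the HTTPCWS request; it takes and returns GBK text,
// or false on failure.
function segment_utf8($utf8, $segmenter)
{
    $gbk = iconv('UTF-8', 'GBK//IGNORE', $utf8);
    $result = call_user_func($segmenter, $gbk);
    if ($result === false) {
        return $utf8; // segmentation failed: keep the original keyword
    }
    return iconv('GBK', 'UTF-8//IGNORE', $result);
}

// Stand-in segmenter that records what it receives and returns it unchanged
$seen = null;
$echo_segmenter = function ($gbk) use (&$seen) {
    $seen = $gbk;
    return $gbk;
};

$out = segment_utf8('中文分词', $echo_segmenter);
// Each of these Chinese characters takes 3 bytes in UTF-8 but 2 in GBK
echo strlen('中文分词'), ' bytes in UTF-8, ', strlen($seen), " bytes in GBK\n"; // 12 bytes in UTF-8, 8 bytes in GBK
echo $out, "\n"; // 中文分词
```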
3. Changes required for both the GBK and UTF-8 editions:
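In the default template, templates/default/site_search.html.php, around line 56, change

    <input type="text" class="input_tx" size="50" name="searchkey" value="$searchkey" />

to

    <input type="text" class="input_tx" size="50" name="searchkey" value="$searchkey1" />

so that the search box shows the segmented words separated by spaces instead of the %-joined string used in the query.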
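Additional notes:
Segmented search depends on the www.littz.cn server and on the network connection from the server hosting your SupeSite (SS) site to www.littz.cn. The www.littz.cn server is in Silicon Valley, USA (IP: 64.71.167.26), so it is affected by the state of the undersea cables; for example, the undersea cable failure on 17 August 2009 made access slow. The HTTPCWS interface itself segments Chinese text very quickly, so if you have the means, I recommend setting up your own HTTPCWS + Sphinx search server. I cannot guarantee that this service will stay up long-term, but I will certainly do my best to keep it running. It is offered as one way of solving the problem.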
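If you run your own HTTPCWS server, the request can be pointed at it and given a timeout, so that a slow or unreachable server falls back to the unsegmented keyword instead of stalling the search. This is a sketch under assumptions: the helper name `httpcws_segment`, the local endpoint `http://127.0.0.1:1985/?w=` and the 2-second timeout are placeholders for your own deployment, not values taken from the HTTPCWS documentation.

```php
<?php
// Hypothetical helper: query an HTTPCWS endpoint with a timeout and fall back
// to the original (GBK) keyword when the request fails or returns nothing.
// The default endpoint and timeout are placeholders; set them for your own server.
function httpcws_segment($gbk_keyword, $endpoint = 'http://127.0.0.1:1985/?w=', $timeout = 2.0)
{
    $context = stream_context_create(array(
        'http' => array('method' => 'GET', 'timeout' => $timeout),
    ));
    $result = @file_get_contents($endpoint . urlencode($gbk_keyword), false, $context);
    if ($result === false || trim($result) === '') {
        return $gbk_keyword;
    }
    return $result;
}
```

In batch.search.php, the `urlencode` and `file_get_contents` lines could then become a single call such as `$searchkey = httpcws_segment($searchkey);` (in the UTF-8 edition, pass the GBK-converted keyword and convert the result back).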