A C# function for filtering HTML code

2016-02-19 20:10

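Having some spare time, I wrote a set of C# regular expressions that delete the HTML tags from a page's source code. This is useful when scraping content and you need to strip the HTML out of it.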
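
At its core, the technique relies on one pattern, <[^>]*>: a "<", any run of characters other than ">", then the closing ">", so each match is exactly one tag. The minimal sketch below uses a hypothetical class TagStripper with a StripTags method. It removes tags but keeps the text inside <script> or <iframe> blocks, which is why the full function first deletes those blocks.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: removes every HTML tag with a single pattern.
public static class TagStripper
{
    // "<", then any characters except ">", then ">": one tag per match
    private static readonly Regex AnyTag = new Regex(@"<[^>]*>", RegexOptions.Compiled);

    public static string StripTags(string html)
    {
        // Replace each tag with nothing, leaving only the text between tags
        return AnyTag.Replace(html, "");
    }
}
```

For example, StripTags("<p>Hello <b>world</b></p>") returns "Hello world".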

The code is as follows:
using System.Text.RegularExpressions;

public string checkStr(string html)
{
    // Lazy quantifiers (+?, *?) stop each match at the nearest closing tag, not the last one in the input
    Regex regex1 = new Regex(@"<script[\s\S]+?</script *>", RegexOptions.IgnoreCase);
    Regex regex2 = new Regex(@" href *= *[^>]*?script *:", RegexOptions.IgnoreCase);
    Regex regex3 = new Regex(@"\son\w+\s*=", RegexOptions.IgnoreCase);
    Regex regex4 = new Regex(@"<iframe[\s\S]+?</iframe *>", RegexOptions.IgnoreCase);
    Regex regex5 = new Regex(@"<frameset[\s\S]+?</frameset *>", RegexOptions.IgnoreCase);
    Regex regex6 = new Regex(@"<img[^>]+>", RegexOptions.IgnoreCase);
    Regex regex7 = new Regex(@"</p>", RegexOptions.IgnoreCase);
    Regex regex8 = new Regex(@"<p>", RegexOptions.IgnoreCase);
    Regex regex9 = new Regex(@"<[^>]*>", RegexOptions.IgnoreCase);
    html = regex1.Replace(html, "");                  // remove <script>...</script> blocks
    html = regex2.Replace(html, "");                  // remove href=javascript: attributes (on <a> tags)
    html = regex3.Replace(html, " _disibledevent=");  // disable on... event handlers on other elements
    html = regex4.Replace(html, "");                  // remove <iframe>...</iframe> blocks
    html = regex5.Replace(html, "");                  // remove <frameset>...</frameset> blocks
    html = regex6.Replace(html, "");                  // remove <img> tags
    html = regex7.Replace(html, "");                  // remove </p>
    html = regex8.Replace(html, "");                  // remove <p>
    html = regex9.Replace(html, "");                  // remove every remaining tag
    html = html.Replace("&nbsp;", "");                // assumed target; String.Replace throws on an empty search string
    html = html.Replace("</strong>", "");
    html = html.Replace("<strong>", "");
    return html;
}
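
One detail matters when reusing the block-removal patterns (regex1, regex4 and regex5): with a greedy [\s\S]+ the match runs to the last closing tag in the whole input, so any ordinary text between two separate <script> blocks is deleted too, whereas the lazy form [\s\S]+? stops at the nearest closing tag. The sketch below compares the two on the same input; the class QuantifierDemo and its methods are hypothetical names used only for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo comparing greedy and lazy block-removal patterns.
public static class QuantifierDemo
{
    // Greedy: [\s\S]+ takes as much as possible, so the match ends at the LAST </script>
    public static string RemoveGreedy(string html) =>
        Regex.Replace(html, @"<script[\s\S]+</script *>", "", RegexOptions.IgnoreCase);

    // Lazy: [\s\S]+? takes as little as possible, so each match ends at the FIRST </script> after <script
    public static string RemoveLazy(string html) =>
        Regex.Replace(html, @"<script[\s\S]+?</script *>", "", RegexOptions.IgnoreCase);
}
```

With the input "a<script>x()</script>b<script>y()</script>c", RemoveGreedy returns "ac" (the "b" between the two blocks is lost), while RemoveLazy returns "abc".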

Source: http://www.tulaoshi.com/n/20160219/1622909.html
