Text to HTML Parser
<div align="center"> <table border="0" width="90%" class="outline"> <tr> <td width="50%" class="outline"><b>Download File</b></td> <td width="50%" class="outline"><b>SDK</b></td> </tr> <tr> <td width="50%" class="outline"><a href="../../file/textparser.zip" class="wbox">textparser.zip</a> (3 KB)</td> <td width="50%" class="outline">Beta1</td> </tr> </table> </div>
<p> <span class=wboxheado>Introduction</span><br> If you have developed web applications, you have probably noticed that when you display multi-line data from a database, you lose the spacing and formatting between the lines. In addition, some applications, such as forums, let users post HTML content directly, which can lead to serious problems. For example, a user can post an HTML image tag like <b><br> &lt;img src="http://myserver.com/mypic.jpg"&gt;</b> and when someone views this post, the actual image is displayed instead of the tag. In the same way, someone can post a link to a page containing malicious code, which makes every reader an easy target and has serious security implications. </p>
<p align="justify"><span class=wboxheado>Problem</span><br> The problem described above has two parts.<br> 1) <b>Formatting Problem:</b> In HTML, any run of white space between two characters is collapsed into a single white space automatically. The carriage return '\r' and line feed '\n' characters also have no effect on HTML formatting. As a result, a multi-line post is displayed as a single continuous line.</p>
<p align="justify">2) <b>HTML Content:</b> This can be either a problem or a boon, depending on the users of your application. When content from the database is displayed, the HTML engine of the client browser parses any HTML contained in the data. As a result, the tags are rendered as HTML instead of being displayed as text.</p>
<p align="justify"><span class=wboxheado>Solution</span><br> Both problems share a common solution: convert the text content from the database into the corresponding HTML.<br> 1) <b>Formatting Solution</b>: In HTML, <b>&amp;nbsp;</b> denotes a non-breaking space, which the browser does not collapse. So every pair of consecutive white spaces should be replaced by a single white space followed by <b>&amp;nbsp;</b>. <br> Also, every line terminator should be replaced by the break tag <b>&lt;br&gt;</b>, so that the next character starts on a new line. </p>
<p align="justify">2) <b>HTML Content:</b> The solution here is a bit trickier. In HTML, every valid tag is enclosed in the &lt; and &gt; brackets. So to neutralize all the HTML tags in a post, change the &lt; and &gt; characters to their HTML entities <b>&amp;lt;</b> and <b>&amp;gt;</b> respectively. One more change is needed: the double quotation mark <b>"</b> has to be changed into its HTML entity <b>&amp;quot;</b>. </p>
<p align="justify"><span class=wboxheado>Text to HTML parser</span><br> On the .NET platform the String object is immutable, i.e. once you create a String object you cannot change its contents. Since the parser performs a lot of string manipulation, I use the StringBuilder class from the System.Text namespace, which provides a mutable string object. For streaming access to the text, I use the StringReader and StringWriter classes from the System.IO namespace.</p>
<b>Example:</b> Original Post
<table border="0" class="outline" cellspacing="0" cellpadding="0"> <tbody> <tr> <td width="100%"> <p><textarea rows="5" name="S1" cols="61" class="wbox">Some sample text with lots of extra white spacing .. ...

and some text on a new line.

lastly the HTML textbox tag &lt;input type="text"&gt;</textarea></p> </td> </tr> </tbody> </table> <br>
<b>Example:</b> Normal post (see the problem!)
<table border="0" cellspacing="0" cellpadding="0" class="outline"> <tbody> <tr> <td width="100%">Some sample text with lots of extra white spacing .. ... and some text on a new line. lastly the HTML textbox tag <input type="text" size="20"></td> </tr> </tbody> </table> <br>
<b>Example</b>: Parsed text with HTML posting allowed (see the difference!)
<table border="0" class="outline" cellspacing="0" cellpadding="0"> <tbody> <tr> <td width="100%">Some sample text with lots of extra white spacing .. ...<br> <br> and some text on a new line.<br> <br> lastly the HTML textbox tag <input type="text" size="20"> </td> </tr> </tbody> </table>
<p><b>Example:</b> Parsed text with HTML posting disabled (exactly the same as posted!)</p>
<table border="0" class="outline" cellspacing="0" cellpadding="0"> <tbody> <tr> <td width="100%">Some sample text with lots of extra white spacing .. ...<br> <br> and some text on a new line.<br> <br> lastly the HTML textbox tag &lt;input type="text"&gt; </td> </tr> </tbody> </table>
<p><span class=wboxheado>Code</span><br> 1) <i><b>parsetext method</b>: the method that converts text into HTML</i></p>
<table border="0" width="100%" class="code" cellspacing="0" cellpadding="0"> <tr> <td width="100%">
<pre>public string parsetext(string text, bool allow)
{
    <span class=cmt>//Create a StringBuilder object from the string input parameter</span>
    StringBuilder sb = new StringBuilder(text);
    <span class=cmt>//Replace every pair of white spaces with a single white space and &amp;nbsp;</span>
    sb.Replace("  ", " &amp;nbsp;");
    <span class=cmt>//Check if HTML tags are not allowed</span>
    if (!allow)
    {
        <span class=cmt>//Convert the brackets into HTML equivalents</span>
        sb.Replace("&lt;", "&amp;lt;");
        sb.Replace("&gt;", "&amp;gt;");
        <span class=cmt>//Convert the double quote</span>
        sb.Replace("\"", "&amp;quot;");
    }
    <span class=cmt>//Create a StringReader from the processed string of the StringBuilder</span>
    StringReader sr = new StringReader(sb.ToString());
    StringWriter sw = new StringWriter();
    <span class=cmt>//Loop while a next character exists</span>
    while (sr.Peek() &gt; -1)
    {
        <span class=cmt>//Read a line from the string and store it in a temp variable</span>
        string temp = sr.ReadLine();
        <span class=cmt>//Write the line followed by the HTML break tag.
        //Note: Write appends to an internal StringBuilder object
        //that the StringWriter creates automatically</span>
        sw.Write(temp + "&lt;br&gt;");
    }
    <span class=cmt>//Return the final processed text</span>
    return sw.GetStringBuilder().ToString();
}</pre>
</td> </tr> </table>
<P> </P>
<P>2)<I> <b> textparser.aspx</b> - A sample consumer for the Text to HTML parser</I></P>
<table border="0" width="100%" cellspacing="0" cellpadding="0" class="code"> <tr> <td width="100%">
<pre>&lt;%@ Page Language="C#" %&gt;
&lt;%@ Import Namespace="System.Text" %&gt;
&lt;%@ Import Namespace="System.IO" %&gt;
&lt;html&gt;
&lt;head&gt;
&lt;script language="C#" runat=server&gt;
private void Post_Text(object sender, EventArgs e)
{
    <span class=cmt>//Check if there is some text inside the TextBox</span>
    if (mess.Text != "")
    {
        <span class=cmt>//Check if the option to parse the text is selected</span>
        if (parse.Checked)
        {
            <span class=cmt>//Check if the option to allow HTML content is selected</span>
            if (htmlpost.Checked)
            {
                <span class=cmt>//Call the parsetext method.
                //Pass the text content from the textbox and true so that
                //HTML tags do not get converted to text</span>
                postmess.Text = parsetext(mess.Text, true);
            }
            else
            {
                <span class=cmt>//Call the parsetext method.
                //Pass the text content from the textbox and false so that
                //HTML tags get converted to text</span>
                postmess.Text = parsetext(mess.Text, false);
            }
        }
        else
        {
            <span class=cmt>//Just post the text without any parsing</span>
            postmess.Text = mess.Text;
        }
    }
}

<span class=cmt>//Method to parse text into HTML</span>
public string parsetext(string text, bool allow)
{
    <span class=cmt>//Create a StringBuilder object from the string input parameter</span>
    StringBuilder sb = new StringBuilder(text);
    <span class=cmt>//Replace every pair of white spaces with a single white space and &amp;nbsp;</span>
    sb.Replace("  ", " &amp;nbsp;");
    <span class=cmt>//Check if HTML tags are not allowed</span>
    if (!allow)
    {
        <span class=cmt>//Convert the brackets into HTML equivalents</span>
        sb.Replace("&lt;", "&amp;lt;");
        sb.Replace("&gt;", "&amp;gt;");
        <span class=cmt>//Convert the double quote</span>
        sb.Replace("\"", "&amp;quot;");
    }
    <span class=cmt>//Create a StringReader from the processed string of the StringBuilder object</span>
    StringReader sr = new StringReader(sb.ToString());
    StringWriter sw = new StringWriter();
    <span class=cmt>//Loop while a next character exists</span>
    while (sr.Peek() &gt; -1)
    {
        <span class=cmt>//Read a line from the string and store it in a temp variable</span>
        string temp = sr.ReadLine();
        <span class=cmt>//Write the line followed by the HTML break tag.
        //Note: Write appends to an internal StringBuilder object
        //that the StringWriter creates automatically</span>
        sw.Write(temp + "&lt;br&gt;");
    }
    <span class=cmt>//Return the final processed text</span>
    return sw.GetStringBuilder().ToString();
}
&lt;/script&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;center&gt;
&lt;h3&gt;Welcome to Saurabh's Text to HTML Parser&lt;/h3&gt;
&lt;br&gt;
&lt;form runat=server&gt;
&lt;table border=1&gt;
  &lt;tr&gt;
    &lt;td valign=top&gt;Your message&lt;/td&gt;
    &lt;td&gt;&lt;asp:label text="&amp;nbsp;" id=postmess runat=server /&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td valign=top&gt;Enter Message&lt;/td&gt;
    &lt;td&gt;&lt;asp:textbox Columns="50" Rows="20" TextMode="MultiLine" id=mess runat=server /&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td colspan=2&gt;
      &lt;asp:checkbox id=parse text="Select to Parse the Text into HTML" runat=server /&gt;
      &lt;br&gt;
      &lt;asp:checkbox id=htmlpost text="Select to allow posting of HTML content" runat=server /&gt;
    &lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td colspan=2&gt;&lt;asp:button onClick="Post_Text" text="Post Text" runat=server /&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;/form&gt;
&lt;/center&gt;
&lt;/body&gt;
&lt;/html&gt;</pre>
</td> </tr> </table>
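<p align="justify">The substitutions described in the Solution section can also be applied in a single pass over the characters. The following is only a minimal sketch under a few assumptions, not the code shipped in textparser.zip: the class name TextToHtml, the method name Convert and the parameter name allowHtml are hypothetical, and a current C# compiler is assumed. Unlike parsetext, the sketch also escapes the ampersand when HTML is not allowed, so that entity text typed by a user is shown as typed, and it writes a break tag only where a line terminator actually occurs.</p>

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the original download.
public static class TextToHtml
{
    // Converts plain text to HTML in one pass.
    // allowHtml == true leaves markup characters untouched;
    // allowHtml == false escapes them so tags are displayed as text.
    // Example: Convert("a  <b>", false) returns "a &nbsp;&lt;b&gt;"
    public static string Convert(string text, bool allowHtml)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        StringBuilder sb = new StringBuilder(text.Length + 16);
        bool previousWasSpace = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            if (c == ' ')
            {
                // Every second space of a run becomes &nbsp; so the
                // browser cannot collapse the run into one space.
                sb.Append(previousWasSpace ? "&nbsp;" : " ");
                previousWasSpace = !previousWasSpace;
                continue;
            }
            previousWasSpace = false;

            if (c == '\r')
            {
                // Treat "\r\n" as one line terminator; a lone "\r" also counts.
                if (i + 1 < text.Length && text[i + 1] == '\n')
                {
                    i++;
                }
                sb.Append("<br>");
            }
            else if (c == '\n')
            {
                sb.Append("<br>");
            }
            else if (allowHtml)
            {
                sb.Append(c);
            }
            else if (c == '&')
            {
                sb.Append("&amp;");
            }
            else if (c == '<')
            {
                sb.Append("&lt;");
            }
            else if (c == '>')
            {
                sb.Append("&gt;");
            }
            else if (c == '"')
            {
                sb.Append("&quot;");
            }
            else
            {
                sb.Append(c);
            }
        }

        return sb.ToString();
    }
}
```

<p align="justify">In the sample page, a call such as postmess.Text = TextToHtml.Convert(mess.Text, htmlpost.Checked) could stand in for the call to parsetext, assuming the class is compiled into the page or referenced from it.</p>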