Exploring your first C# application - Hello World
<span class="wboxheado">Introduction</span><br> In the <a href="http://www.mastercsharp.com/article.aspx?ArticleID=89&&TopicID=4">previous article</a> you learned to compile a C# program. In this article you will walk through the source code written in that article, namely the HelloWorld program.<br> Later, you will skinny dip into the <i>Intermediate Language</i> (IL) code for the HelloWorld program, which will give you a greater understanding of the internal workings of the .NET Platform.<br> <br>
<span class="wboxheado">HelloWorld - Source Code</span><br> Below is the full source code of the HelloWorld program written in the previous article:<br>
<table cellpadding="1" cellspacing="2" width="100%" class="Code"> <tr> <td width="100%"><pre><span class="cmt">/*
  HelloWorld.cs - First C# Program
  Written by - Saurabh Nandu
  Compilation: csc HelloWorld.cs
*/</span>
public class HelloWorld
{
    public static void Main()
    {
        <span class="cmt">//Print Hello World</span>
        System.Console.WriteLine( "Hello World !" ) ;
    }
}</pre> </td></tr> </table> <br>
<span class="wboxheado">Code Components</span><br> Let's understand the source code:<br> <span class="wboxhead">Code Comments</span><br> The first few lines in the source code are code comments. Comments are simply additional text a programmer supplies to record meaningful information about the program. Adding comments to code is optional, but it is good practice, since over time people tend to forget important details about the program. 
Liberal use of comments helps tremendously when you want to understand a program later.<br> <br> When you compile the program, the compiler ignores all the comments you have added, so comments do not appear in the generated executable.<br> <br> Comments can be inserted anywhere in the source code. While writing applications, programmers often use comments to hide parts of a program from the compiler.<br> <br> C# supports 3 kinds of comments:<br> <br>
<span class="wboxhead">1) Delimited comments</span><br> Delimited comments start with <b>/*</b> and end with <b>*/</b>. This style of commenting will be familiar to C++ programmers. All the text between the delimiters is treated as a comment and ignored by the compiler. This kind of comment can span multiple lines.<br> The snippet below shows the delimited comment used in the HelloWorld program.<br>
<table cellpadding="1" cellspacing="2" width="100%" class="Code"> <tr> <td width="100%"><pre><span class="cmt">/*
  HelloWorld.cs - First C# Program
  Written by - Saurabh Nandu
  Compilation: csc HelloWorld.cs
*/</span></pre> </td> </tr> </table> <br>
<b>Note: </b><i>Don't forget to close this kind of comment with the */ delimiter, otherwise all the code below it will be treated as a comment and the compiler will report an error.</i><br> <br>
<span class="wboxhead">2) Single Line Comment</span><br> Single line comments are useful for commenting things in place. They start with <b>//</b> and end when the line ends. As the name suggests, they cover a single line; if you want several lines of such comments you need to prefix every line with <i>//</i>. <br> These comments are generally placed above or beside the code statement. The line below shows an example from the HelloWorld program.<br> <br> <span class="codetext">//Print Hello World</span><br> <br>
<span class="wboxhead">3) XML Documentation Comments</span><br> The third type of comment supported by C# is the XML documentation comment. 
This style of commenting is used to document the code while you write it. I will cover this style of comment in detail in a separate article.<p><span class="wboxheado">Class Definition</span><br> A class is a blueprint or template for an object. All the behavior and data for an object are packaged into the class. You can also think of the class as the basic building block of your application: all code is written within the context of a class.<br> The following lines from the HelloWorld code make up the class definition: </p>
<p> <span class="codetext">public class HelloWorld<br> {<br> ..<br> ..<br> }<br></span> <br> The keyword <b>class</b> is used to define a class in C#. The keyword <b>public</b> before the word class is the access modifier for the class, indicating that code in any assembly can use the class and create instances of it. <br> After the class definition, the scope operators, i.e. braces <b>{ }</b>, are used to demarcate the class. All the behavior and data of the class must be contained within the brace delimiters. <br> I will discuss Object Oriented Programming as well as classes in more detail in the following articles.</p>
<p> <span class="wboxheado">Main method - Entry point of an application.</span><br> The behavior of the object is broken down into various methods within the class. In my example I have one method, the <b>Main</b> method, defined as follows:<br> <br><span class="codetext" > public static void Main( )<br> {<br> ..<br> ..<br> }<br></span> <br> The definition of the <i>Main</i> method starts with its access modifier, i.e. <i>public</i>, hence any code can call this method. It's followed by the keyword <b>static</b>, which turns the method into a <i>class level </i>member. Hence you do not have to first create an object of the class to call the <i>Main</i> method. I understand things might seem a bit cloudy now, but they will become clearer as we move on. 
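<br> <br> To make the idea of a class level member a little more concrete, here is a minimal sketch. The Greeter and StaticDemo classes and their methods are hypothetical names invented for this illustration; they are not part of the HelloWorld program. The sketch only shows that a static method is called on the class itself, while a non-static (instance) method needs an object first.<br>

```csharp
using System;

// Hypothetical class, used only to contrast static and instance methods.
public class Greeter
{
    // Class level (static) method: called on the class itself.
    public static string StaticGreeting()
    {
        return "Hello from the class";
    }

    // Instance method: an object must be created with 'new' first.
    public string InstanceGreeting()
    {
        return "Hello from an object";
    }
}

public class StaticDemo
{
    public static void Main()
    {
        // No Greeter object is needed to call a static method.
        Console.WriteLine(Greeter.StaticGreeting());

        // An instance method can only be called through an object.
        Greeter g = new Greeter();
        Console.WriteLine(g.InstanceGreeting());
    }
}
```

Because <i>Main</i> is static, the runtime can call it in the same way as Greeter.StaticGreeting() above, without first creating an object of the class that contains it.<br>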
<br> Then there is the return type. After every method finishes execution it returns control to its caller, and every method must declare the type of value it returns. Since this method returns no value, the <b>void</b> keyword is used as its return type. Finally, there is the method name, Main, followed by a set of empty parentheses. Like classes, methods use <i>scope operators</i>, i.e. { }, to define the scope of the method. All the code of the method has to be placed within the method's scope operators.<br> <br> The Main method has a special significance in C#. When you launch an application, the runtime needs to know where (which class, which method) it should start executing the code. In C#, as in many other programming languages, there is a Main method which the runtime uses by default to start running the application. Hence the Main method is also known as the entry point of the application. <br> <br> <b>Note: </b><i>C / C++ / Java programmers: in C# the method is called Main with a capital M, unlike in those languages.</i><br> <br> <b>Note:</b><i> Unlike in C++, the Main method is not a global function; it has to be defined as a class member.</i> </p>
<p> <span class="wboxheado">Code Statement - Method Call</span><br> Within the Main method there is a single code statement, which performs the actual work of displaying the Hello World message on the console. This statement defines the behavior of the Main method.<br> <br> <span class="codetext">System.Console.WriteLine( "Hello World !" ) ;</span><br> <br> In the above line a call is made to the <b>WriteLine</b> method of the <b>Console</b> class, passing a string parameter to it. <br> <br> <b>System</b> is a <i>namespace</i>; its closest analogy in Java is the package. Namespaces are used to resolve name clashes, i.e. two classes from different libraries having the same name. 
A namespace is a logical grouping of related classes. Namespaces are discussed in detail in the following articles. <br> <br> <i>Console</i> is a class, just like the <b>HelloWorld</b> class, and it is defined in the <i>System</i> namespace. Within the <i>Console</i> class there is a class level <i>WriteLine</i> method that is called in the example. The dot (.) operator is used to reference a class belonging to a namespace as well as to reference a method within a class. <br> Since the <i>WriteLine</i> method is a class level method, like the <i>Main</i> method, it can be called directly without creating an instance of the <i>Console</i> class.<br> <br> A string parameter ("Hello World !") is passed to the <i>WriteLine</i> method, which prints it on the console screen followed by a line break. <br> Since the only statement in the HelloWorld application prints the message to the screen, the program ends automatically once the message is printed, as it has no more tasks to perform.<br> <br> <b>Note:</b> <i>String values are delimited by double quotes.<br> </i> <br> <b>Note:</b> <i>All statements in C# end with a semi-colon. If you forget to add a semi-colon after a C# statement the compiler will report an error.</i><br> <br> This ends the overview of the various components of the HelloWorld program.</p>
<p> <span class="wboxheado">Understanding the compilation and execution process of a C# application.</span><br> <span class="wboxhead">Compilation Process</span><br> In the previous article you learned to compile a C# source code file and execute it. Let's see what happens underneath when you compile a C# source code file. 
The diagram below represents the compilation process.</p> <p align="center"> <img border="0" src="../../img/exploringyourfirstapplication1.gif" width="346" height="171"><br> <b>Figure 1:</b> <i>C# Compilation Process</i></p>
<p align="left"> During the compilation of a source code file, the C# compiler (csc.exe) converts the C# code into <i>Microsoft Intermediate Language (MSIL)</i> code. It packages the MSIL code as a Win32 executable file with some extended features. The header table of the executable has been expanded to accommodate additional metadata about the assembly. Also, the code it contains is not native machine code but MSIL. <br> <br> <b>Note:</b> <i>MSIL and IL refer to the same thing: the intermediate code generated by the language compiler. Java programmers can compare it with Java bytecode.</i><br> <br>
<span class="wboxhead">Execution Process</span><br> The execution process starts when the user either double-clicks the executable assembly or runs it from the command prompt. First, the operating system loads the executable assembly. The C# compiler crafts the assembly in such a way that as soon as the OS loads it, control jumps to the Microsoft .NET Runtime, also known as the Common Language Runtime. The common language runtime has a <b>Just-In-Time (JIT) compiler</b> that compiles IL on demand. The JIT compiler inspects the IL within the assembly and identifies the parts that are required to run it. It converts only the required IL into native code. The native code then executes on the hardware, producing the final output. <br> <br> <b>Note: </b><i>Since C# executable assemblies contain IL code, they cannot execute without the .NET Runtime. 
Hence you cannot run your C# applications on client machines that do not have the .NET Framework installed.</i></p> <p align="center" > <img border="0" src="../../img/exploringyourfirstapplication2.gif" width="396" height="301"><br> <b>Figure 2:</b><i> C# Application Execution Process</i></p>
<p> <span class="wboxheado">Skinny Dipping in IL</span><br> There are a lot of articles available today that teach you the C# language. I was looking for ways to add some interesting learning beyond the language itself. To get an in-depth understanding of how Microsoft's C# compiler emits IL code, which is then consumed by the .NET runtime, it's necessary to dig into IL. Please be aware that the information in this section depends on the version of the C# compiler being used; Microsoft can at any time choose to optimize or change the IL emitted for the same source code. <br> <br> The goal of this section is not to teach you to write IL, but to show you some nifty things in IL as well as to reconfirm a few things we have discussed above. Digging into IL will give you a clearer understanding of the concepts. <br> <br> <b>Note:</b> <i>Exploring IL code can be an addictive hobby!! Caution is advised.</i><br> <br>
<span class="wboxhead">Intermediate Language (IL) - An Overview</span><br> As described in the compilation process section above, when C# source code is compiled by the C# compiler it's converted into an assembly containing IL code.<br> What makes this a very important step is that all managed languages, like VB.NET, JScript.NET, J# etc., follow the same process. That is, on compilation of any managed language the compiler produces an assembly with IL code. IL is a language in itself, which has been standardized by <a class="wbox" target="_blank" href="http://www.ecma-international.org/publications/standards/Ecma-335.htm">ECMA</a>. <br> The impact of this is that the .NET runtime can run any managed assembly without caring which language it was written in. 
IL abstracts the language specific details away from the .NET Runtime. Hence you could even design a new language and write a compiler for it that generates IL code, and everyone who has just the .NET Runtime installed would be able to execute your applications. <br> IL can also be considered the assembly level language of the .NET Runtime. Even though it's a bit cryptic, it's verbose enough that you can make sense of it. In fact, once you learn to read and understand IL you can explore the internals of the libraries provided by Microsoft.<br> <br>
<span class="wboxhead">IL Disassembler (ILDASM) - Tool to explore IL</span><br> In order to extract the IL code from a managed assembly you need to use the <b>ILDASM</b> tool. This tool, along with other development tools, can generally be found in the <i>C:\Program Files\Microsoft .NET\FrameworkSDK\v1.1\Bin</i> directory.<br> The tool has a lot of features. In this article it is only used to extract the IL code from the assembly.<br> <br>
<span class="wboxhead">Analyzing the HelloWorld.exe assembly</span><br> Let us extract the IL for the <i>HelloWorld.exe</i> assembly we generated in the previous article. Open the command prompt and navigate to the directory containing the <i>HelloWorld.exe</i> assembly (<i>c:\csharp</i>).<br> At the command prompt, give the following command to extract the IL from the assembly.<br> <br> <b>ildasm HelloWorld.exe /output:HelloWorld.il</b> </p> <p align="center"> <img border="0" src="../../img/exploringyourfirstapplication3.gif" width="469" height="211"><br> <b>Figure 3:</b> <i>Decompiling HelloWorld.exe</i></p>
<p > This command will extract the IL from the <i>HelloWorld.exe</i> assembly into the <b> HelloWorld.il</b> file. <br> <br> <b>Note:</b> <i>Please ensure you have compiled the HelloWorld.cs source code to produce the HelloWorld.exe assembly before you try to disassemble it.</i><br> <br> Open the <i>HelloWorld.il</i> file in Notepad (or any other text editor). 
Now don't get scared by all the code in that file.<br> Scan through the file till you find the <b>Class Member Definition</b> section. The snippet below displays the IL from this section.</p><p> <table cellpadding="1" cellspacing="2" width="100%" class="Code"> <tr> <td width="100%"><pre>
.class public auto ansi beforefieldinit HelloWorld
       extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    // Code size 11 (0xb)
    .maxstack 1
    IL_0000: ldstr "Hello World !"
    IL_0005: call void [mscorlib]System.Console::WriteLine(string)
    IL_000a: ret
  } // end of method HelloWorld::Main

  .method public hidebysig specialname rtspecialname
          instance void .ctor() cil managed
  {
    // Code size 7 (0x7)
    .maxstack 1
    IL_0000: ldarg.0
    IL_0001: call instance void [mscorlib]System.Object::.ctor()
    IL_0006: ret
  } // end of method HelloWorld::.ctor

} // end of class HelloWorld
</pre> </td></tr></table> </p><p><span class="wboxhead">Class Definition in IL</span><br> The <b>.class</b> IL keyword defines a class. Note that the access modifier <b>public</b> is also present in the class definition. The <b>extends</b> keyword denotes that the <i>HelloWorld</i> class inherits from the <b>System.Object</b> class. Every class in .NET implicitly inherits from the base class <b>Object</b>, defined in the <i>System</i> namespace, even if this is not explicitly mentioned (we will learn about inheritance in future articles). So even though the HelloWorld source code does not derive the HelloWorld class from any base class, it implicitly inherits from the <i>System.Object</i> class. The <b>[mscorlib]</b> prefix indicates that the <i>System.Object</i> class is implemented in the <i>mscorlib</i> assembly. This practice of clearly marking the assemblies from which classes are referenced helps make .NET assemblies self-describing. 
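<br> <br> The implicit inheritance described above can also be observed from C# itself. The following is a minimal sketch, not part of the original HelloWorld program: the InheritanceDemo class and its two helper methods are hypothetical names chosen for this illustration. It declares no base class, yet the runtime reports System.Object as its base type, and it can call ToString, a method it never defined but inherited from System.Object.<br>

```csharp
using System;

// Hypothetical class for illustration; note that no base class is declared.
public class InheritanceDemo
{
    // Returns the full name of the class this type extends.
    public static string BaseTypeName()
    {
        return typeof(InheritanceDemo).BaseType.FullName;
    }

    // Calls ToString, which is inherited from System.Object and by
    // default returns the name of the type.
    public static string InheritedToString()
    {
        InheritanceDemo demo = new InheritanceDemo();
        return demo.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BaseTypeName());      // prints "System.Object"
        Console.WriteLine(InheritedToString()); // prints "InheritanceDemo"
    }
}
```

This matches the <i>extends [mscorlib]System.Object</i> clause that the compiler emitted in the IL above.<br>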
<p><span class="wboxhead">Main Method in IL</span><br> As noted above, the <i>Main</i> method has a special significance in C#, since it is the first method that is called when the application starts. The snippet below shows the IL for the Main method. <p><table cellpadding="1" cellspacing="2" width="100%" class="Code"> <tr> <td width="100%"><pre>
.method public hidebysig static void Main() cil managed
{
  .entrypoint
  // Code size 11 (0xb)
  .maxstack 1
  IL_0000: ldstr "Hello World !"
  IL_0005: call void [mscorlib]System.Console::WriteLine(string)
  IL_000a: ret
} // end of method HelloWorld::Main</pre></td></tr></table></p> <br> The <b>.method</b> IL keyword defines a method. The access modifier <i>public</i>, the method type <i>static</i> and the method return type void can also be observed.<br> The most interesting IL keyword to note is <b>.entrypoint</b>. It's the <i>.entrypoint</i> keyword that tells the .NET runtime to start executing the assembly from this method.<br> <br> It's interesting to note that it's languages like C# and VB.NET that lay down the rule that an assembly can only start from a method called Main. At the IL level, any static method marked with the <i>.entrypoint</i> keyword will be called first when the application starts. <br> <br> <b>Note:</b> <i>There cannot be two .entrypoint keywords defined within a single assembly.</i><br> <br> <b>Note:</b> <i>All executables (*.exe) need to have an entry point. DLLs don't have or need an entry point.</i><br> <br> On line <i>IL_0000</i> the<b> ldstr</b> instruction pushes a reference to the string "Hello World !" onto the evaluation stack. <br> <br> Line <i>IL_0005</i> then uses the <b>call</b> instruction to call the <i>WriteLine</i> method, which takes the string reference from the stack as its argument. The <i> WriteLine</i> method belongs to the <i>System.Console</i> class in the <i>mscorlib</i> assembly; this is clearly expressed in the IL. <br> <br> Lastly, line <i>IL_000a</i> returns control to the caller. 
As mentioned earlier, all methods return after completing execution. <br> <br> <span class="wboxhead">Default Constructor in IL</span> <br> Another implicit declaration is that every class has a default constructor. A constructor is a special method that is called when an object is created. All the code needed to initialize an object before it can be used goes in the constructor. The default constructor for the HelloWorld class is shown below.<br> <p> <table cellpadding="1" cellspacing="2" width="100%" class="Code"> <tr> <td width="100%"><pre>
.method public hidebysig specialname rtspecialname
        instance void .ctor() cil managed
{
  // Code size 7 (0x7)
  .maxstack 1
  IL_0000: ldarg.0
  IL_0001: call instance void [mscorlib]System.Object::.ctor()
  IL_0006: ret
} // end of method HelloWorld::.ctor</pre></td></tr></table> </p><br> By default, the C# compiler adds a constructor to the IL code if your class does not define one. <br> Here <i>.method</i> defines a new method, but the method has the special name <i> .ctor</i>, indicating that it's the class's constructor.<br> <br> This method also returns no value, as indicated by the <i>void</i> return type. <br> Line <i>IL_0000</i> pushes the reference to the object being constructed onto the stack. On line <i>IL_0001</i> the default constructor of HelloWorld's base class, i.e. the <i>System.Object</i> class, is called on that object. 
Line <i>IL_0006</i> returns control after the method has finished executing.<br> <br> <b>Note:</b> <i>While scanning the HelloWorld.il file you may have noticed that the comments added in the source code do not appear in the IL code.</i><br> <br> I hope this insight into IL will help you understand the internals of C# better and provide a deeper understanding of why things have been implemented in a certain way.<br> <br>
<span class="wboxheado">C# Keywords Encountered</span><br> 1) <b>class</b> - Defines a class<br> 2) <b>public</b> - Access modifier for classes and methods<br> 3) <b>static</b> - Defines the method to be a class level member<br> 4) <b>void</b> - Used to indicate that a method returns no value<br> <br>
<span class="wboxheado">Points to Remember</span><br> 1) C# has multiple types of comments. Liberal use of comments is recommended. Comments are omitted by the C# compiler while compiling.<br> 2) A class is the blueprint / template of an object. All behavior and data are wrapped inside the class.<br> 3) The Main method is required in all executable applications and is the first method the .NET Runtime calls to start executing the application.<br> 4) The WriteLine method of the System.Console class is used to print messages to the console window.<br> 5) The C# compiler csc.exe converts the C# source code into an assembly containing IL code.<br> 6) When you execute a managed assembly, the JIT compiler within the .NET Runtime is invoked and compiles the necessary IL code from the assembly into native code.<br> 7) All managed language compilers produce an assembly containing IL code.<br> 8) ILDasm is the tool that can be used to disassemble a managed assembly into IL.<br><br>
<span class="wboxheado">Next Step</span><br> Read <a href="http://www.mastercsharp.com/article.aspx?ArticleID=92&&TopicID=4">this article</a>, which provides a quick overview of Object Oriented Programming (OOPS). 
<br> <br> <span class="wboxheado">Curious Minds</span><br> 1) Try to modify the HelloWorld program to print your own message to the screen. Remember to save the source code after each change and recompile the program for the changes to take effect.<br> 2) Try adding more comments to the C# source code.<br> 3) What happens when 2 or more classes in the same assembly have a Main method defined? Which method is used by the runtime while loading the application?<br> (Hint: Look at the C# compiler options documentation for the answer.)<br> 4) Explore the System.Console class and try using its different methods.<br>