Wednesday, July 17, 2013

Copy folders and files - example using LINQ

There are many examples available that copy the entire content of a folder to another folder while retaining its structure. Here I tried to do the same with a different flavor: I used LINQ, which is fairly new, and I haven't come across anyone who has used its power for this.

The concept is very simple, with the following steps:
  1. Get the list of all files with their complete paths.
  2. For each file, check whether the corresponding folder structure exists at the destination; if not, create the folders.
  3. Copy the file to the destination location.
I created a simple console application for reference; use it as per your requirement.

using System;
using System.IO;
using System.Linq;

namespace CopyFilesLINQ
{
    class Program
    {
        static void Main(string[] args)
        {
            string src = @"D:\Adobe_Photoshop_CS3";
            string dest = @"d:\TestCopy";
            string DD = string.Empty; // initialized so the compiler accepts the read inside the loop below
            String[] saFiles = Directory.GetFiles(src, "*.*", SearchOption.AllDirectories);
            double dlen = saFiles.Length, dcnt = 0.00;
            foreach (var file in saFiles)
            {
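                // Destination segments followed by the file's segments below the source root.
                // Skip() drops the source root by position, so folder names that repeat
                // anywhere in the path are preserved (assumes src has no trailing separator).
                string[] x = dest.Split(Path.DirectorySeparatorChar)
                                 .Concat(file.Split(Path.DirectorySeparatorChar)
                                             .Skip(src.Split(Path.DirectorySeparatorChar).Length))
                                 .ToArray();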
                for (int i = 0; i < (x.Length - 1); i++)
                {
                    if (i == 0)
                        DD = x[i];
                    else
                        DD = string.Format(@"{0}{1}{2}", DD, Path.DirectorySeparatorChar.ToString(), x[i]);
                    if (!Directory.Exists(DD))
                        Directory.CreateDirectory(DD);
                }
                try
                {
                    File.Copy(file, string.Join(Path.DirectorySeparatorChar.ToString(), x));
                }
                catch (Exception e)
                {
                    // record the exception in trace log or in a list to refer it later or display
                }
                dcnt++;
                
                Console.SetCursorPosition(1,0);
                Console.WriteLine(string.Format("Copy progress {0} %", ((dcnt / dlen) * 100).ToString("0.00")));
            }
            Console.SetCursorPosition(1, 0);
            Console.WriteLine("file Copy successful");
            Console.ReadLine();
        }
    }
}
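
The path-mapping step is the heart of this example, so it can help to isolate it in a small helper that is easy to try out on its own. The following is only a sketch: the names PathMapper and Map are made up for illustration, and it assumes that the file really lies under the source root and that neither root ends with a separator. The separator is passed in explicitly so the sample behaves the same on any platform.

```csharp
using System;
using System.Linq;

public static class PathMapper
{
    // Hypothetical helper: maps a source file path to its destination path.
    // The segments that belong to srcRoot are dropped by position and the
    // remaining segments are appended to destRoot.
    public static string Map(string srcRoot, string destRoot, string file, char sep)
    {
        int rootDepth = srcRoot.Split(sep).Length;
        var relative = file.Split(sep).Skip(rootDepth);
        return string.Join(sep.ToString(), destRoot.Split(sep).Concat(relative));
    }
}

public static class PathMapperDemo
{
    public static void Main()
    {
        // Folder names that repeat inside the path ("Dest", "Src") must survive.
        string mapped = PathMapper.Map(@"D:\Src", @"D:\Dest", @"D:\Src\Dest\Src\a.txt", '\\');
        Console.WriteLine(mapped);
    }
}
```

Working by position rather than by name keeps folders whose names happen to match a segment of the source or destination root.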

Sunday, June 30, 2013

DAO 3.6 - Repair and Compact Database

This is something I came across a while back when migrating an old VB6 application to .NET. Everything went fine except for some roadblocks with DAO. Typically in VB6 all the data objects are created using DAO, and migrating them to ADO.NET is simple except in one area: the RepairDatabase functionality, which has no equivalent in ADO.NET. I left that part as it was and imported Interop.DAO.dll (version 3.51) into the new .NET application. The RepairDatabase method is not present from DAO 3.6 onward; more on that later.

The issue I faced: although it worked well in the development environment, in production it threw the following error:
System.Runtime.InteropServices.COMException (0x80040112): Creating an instance of the COM component with CLSID {00000010-0000-0010-8000-00AA006D2EA4} from the IClassFactory failed due to the following error: 80040112.

I googled and got confused; nothing specific and nothing conclusive. So I started gathering information from different sources and tried to reach a conclusion...

What is DAO?
DAO (Data Access Objects) was the first object-oriented interface that exposed the Microsoft Jet database engine (used by Microsoft Access) and allowed Visual Basic developers to connect directly to Access tables, as well as to other databases through ODBC. DAO is best suited for single-system applications or for small, local deployments. Here is a good post to help you out.

The reason for this error is that Data Access Objects (DAO) is not properly registered. But when you try to register DAO 3.51 using regsvr32, it fails because there is no entry point. That is not an issue with DAO 3.6. There are some major differences between 3.5 and 3.6: the latter has been totally revamped, by design, to match Microsoft Jet 4.0.

As per the Microsoft recommendation
RDO and ADO can still be used in code from Visual Basic 2008, with some trivial modifications. However, Visual Basic 2008 does not support DAO and RDO data binding to controls, data controls, or RDO User connection. We recommend that if your applications contain DAO or RDO data binding you either leave them in Visual Basic 6.0 or upgrade the DAO or RDO data binding to ADO before upgrading your project to Visual Basic 2008, as ADO data binding is supported in Windows Forms. Information on how to upgrade DAO or RDO to ADO in Visual Basic 6.0 is available in the Visual Basic 6.0 Help.

Here is a surprise...
In Data Access Object (DAO) 3.6, the RepairDatabase method is no longer available or supported. This is by design to match Microsoft Jet 4.0. If you need this functionality, you can use the CompactDatabase method, which also repairs a Microsoft Jet database. Registering the DAO 3.6 DLL causes no issues. Compacting a Jet/ACE database first detects whether there are any problems in need of repair; if there are none, it skips the repair step and just compacts the file (rewriting data and index pages in contiguous data files, discarding unused data pages, updating all statistics and internal pointers, etc.).

If you compact an Access 97 database with JRO, it is converted to Access 2000 format: although JRO can read the Access 97 file, it no longer supports that format, and this is by design.

The following sample has two sections: repairing and compacting a database using DAO, and doing the same using JRO. You may need to include the DAO 3.6 DLLs in your application folder.
For that, add the following COM references:
  • Microsoft DAO 3.6 Object Library
  • Microsoft Jet and Replication Objects 2.6 Library
using System;
using System.IO;

namespace RepairDB
{
    class Program
    {
        static void Main(string[] args)
        {
            
            try
            {
                DAO.Database dd;
                DAO.DBEngine db = new DAO.DBEngine();
                dd = db.OpenDatabase(@"d:\KSDB1.mdb", null, null, ";pwd=KSTEST1");
                dd.Close();
                // Report success only when no exception was thrown.
                Console.WriteLine("Open and Close of database successful using DAO.");
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
            try
            {
                File.Delete(@"d:\KSDB1_Tmp.mdb");
                if (File.Exists(@"d:\KSDB1.mdb"))
                {
                    DAO.DBEngine db = new DAO.DBEngine();
                    db.CompactDatabase(@"d:\KSDB1.mdb", @"d:\KSDB1_Tmp.mdb", null, null, ";pwd=KSTEST1");
                    File.Delete(@"d:\KSDB1.mdb");
                    File.Move(@"d:\KSDB1_Tmp.mdb", @"d:\KSDB1.mdb");
                    // Report success only when compaction and the file swap completed.
                    Console.WriteLine("Compacting and Repair database successful using DAO.");
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
            try
            {
                File.Delete(@"d:\KSDB1_Tmp.mdb");
                if (File.Exists(@"d:\KSDB1.mdb"))
                {
                    JRO.JetEngine jj = new JRO.JetEngine();
                    jj.CompactDatabase(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=d:\KSDB1.mdb;Jet OLEDB:Database Password=KSTEST1",
                        @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=d:\KSDB1_Tmp.mdb;Jet OLEDB:Database Password=KSTEST1");
                    File.Delete(@"d:\KSDB1.mdb");
                    File.Move(@"d:\KSDB1_Tmp.mdb", @"d:\KSDB1.mdb");
                    // Report success only when compaction and the file swap completed.
                    Console.WriteLine("Compacting and Repair database successful using JRO.");
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
            Console.ReadLine();
        }
    }
}
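
Both sections above follow the same routine: compact into a temporary file, then swap it in for the original. Below is a minimal sketch of that routine as a reusable helper. It makes a few assumptions: the names DatabaseSwap and CompactAndSwap are invented for illustration, the compaction itself is passed in as a delegate (so the sample runs without DAO or JRO installed), and the original file is kept as a ".bak" copy until the swap has finished.

```csharp
using System;
using System.IO;

public static class DatabaseSwap
{
    // Hypothetical helper: runs 'compact' (source path, temp path) and replaces
    // the original file only if compaction produced the temp file.
    public static void CompactAndSwap(string dbPath, Action<string, string> compact)
    {
        string tempPath = dbPath + ".tmp";
        string backupPath = dbPath + ".bak";

        File.Delete(tempPath);          // does nothing if the file is absent
        compact(dbPath, tempPath);      // may throw; the original is untouched so far

        if (!File.Exists(tempPath))
            throw new IOException("Compaction did not produce " + tempPath);

        File.Delete(backupPath);
        File.Move(dbPath, backupPath);  // keep the original until the swap is done
        File.Move(tempPath, dbPath);
        File.Delete(backupPath);
    }
}

public static class DatabaseSwapDemo
{
    public static void Main()
    {
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string db = Path.Combine(dir, "sample.mdb");
        File.WriteAllText(db, "AAAA");

        // Stand-in for a real compaction call: writes a smaller file.
        DatabaseSwap.CompactAndSwap(db, (src, tmp) => File.WriteAllText(tmp, "A"));

        Console.WriteLine(File.ReadAllText(db));
        Directory.Delete(dir, true);
    }
}
```

With the DAO reference from this post, the delegate could wrap the same call used above, for example (src, tmp) => new DAO.DBEngine().CompactDatabase(src, tmp, null, null, ";pwd=KSTEST1").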

Further Reading
http://msdn.microsoft.com/en-us/library/office/aa164825(v=office.10).aspx
http://msdn.microsoft.com/en-us/library/aa984815(v=vs.71).aspx
http://msdn.microsoft.com/en-us/library/e80y5yhx(v=vs.110).aspx

Friday, June 7, 2013

Indexer Enumerator etc. etc. examples

It is simple to work with an array or list of strings. But when it comes to a list of complex class objects, it can become a nightmare. Some simply go on writing long, inefficient code for it, and with so much code and complexity, quality and scalability suffer.

Let's give it a second thought and understand how we can implement it efficiently with minimal effort.

What is an Indexer?

Defining an indexer allows you to create classes that act like "virtual arrays." Instances of that class can be accessed using the [] array access operator. Defining an indexer in C# is similar to defining operator [] in C++, but is considerably more flexible. For classes that encapsulate array- or collection-like functionality, using an indexer allows the users of that class to use the array syntax to access the class. An indexer is often used to implement a stack so that its contents may be accessed without item removal.

An indexer's simple syntax helps client applications access element groups as an array object member (type, class, or struct). An indexer provides an indirect method of inserting boundary checking logic. Due to its intuitive nature, an indexer improves code readability.
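
To make the boundary-checking point concrete, here is a small sketch of an indexer that validates the index before touching the underlying array. The class name NameList and the error message are invented for this illustration.

```csharp
using System;

public class NameList
{
    private readonly string[] names;

    public NameList(int size)
    {
        names = new string[size];
    }

    public int Length
    {
        get { return names.Length; }
    }

    // The indexer is the single place where every access is validated.
    public string this[int i]
    {
        get { CheckIndex(i); return names[i]; }
        set { CheckIndex(i); names[i] = value; }
    }

    private void CheckIndex(int i)
    {
        if (i < 0 || i >= names.Length)
            throw new ArgumentOutOfRangeException("i",
                "Index must be between 0 and " + (names.Length - 1));
    }
}

public static class NameListDemo
{
    public static void Main()
    {
        NameList list = new NameList(2);
        list[0] = "this 1";
        list[1] = "this 2";
        Console.WriteLine(list[0]);

        try
        {
            list[5] = "out of range";
        }
        catch (ArgumentOutOfRangeException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

Callers still use plain array syntax, while the class decides what happens on an invalid index.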

The code below shows several ways to use an indexer and a collection to work with complex class / struct objects simply. A class is a reference type, so its objects need to be created explicitly, whereas a struct is a value type and there is no need to create the object explicitly.

* First implementation
A simple Test1 class with string array variable and using Indexer exposing string array.

* Second implementation
Extending First implementation to make it two dimensional array.

* Third implementation
An array implemented with a struct. Because a struct is a value type, the implementation is as simple as a string array. Extending it is easy, so this is just a simple implementation.

* Fourth implementation
This is a simple array of class objects, but it is not that simple: after defining the length of the array, each element object still needs to be created. This is an overhead, but implemented in a better way it becomes simpler and more scalable.

* Fifth implementation
This extends the previous implementation, this time creating a new list class that implements the IEnumerable<> interface and GetEnumerator. It is more complicated than needed for a simple case, but for complicated class objects it keeps the calling code simple.

using System;
using System.Collections.Generic;

namespace Indexer
{
    class Program
    {
        static void Main(string[] args)
        {
            // First implementation
            // implementing class using indexer
            Console.WriteLine("implementing class using indexer");
            Test1 EE = new Test1();
            EE[0] = "this 1";
            EE[1] = "this 2";
            //foreach (var X1 in EE) is not possible here because Test1 does not expose a public GetEnumerator method
            Console.WriteLine("{0}", EE[0].ToString());
            Console.WriteLine("{0}", EE[1].ToString());

            // Second implementation
            // extending indexable class to build 2 dimentional array
            Console.WriteLine("extending indexable class to build 2 dimentional array");
            Test1[] XX = new Test1[2];
            XX[0] = new Test1();
            XX[1] = new Test1();
            XX[0][0] = "XXX";
            XX[0][1] = "YYY";
            XX[1][0] = "aaa";
            XX[1][1] = "bbb";
            foreach (var X1 in XX)
                Console.WriteLine(X1[0] + " ; " + X1[1]);

            // Third implementation
            // simple implementation of array of object using struct
            Console.WriteLine("simple implementation of array of abject using struct");
            objStruct[] ddd = new objStruct[2];
            ddd[0].iCnt = 1;
            ddd[0].sName = "name 1";
            ddd[1].iCnt = 2;
            ddd[1].sName = "name 2";
            foreach (var Xd in ddd)
                Console.WriteLine(Xd.iCnt.ToString() + " ; " + Xd.sName);
            
            // Fourth implementation
            // simple implementation of array of object using class
            Console.WriteLine("simple implementation of array of object using class");
            objClass[] sss = new objClass[2];
            sss[0] = new objClass();
            sss[1] = new objClass();
            sss[0].iCnt = 1;
            sss[0].sName = "name 1";
            sss[1].iCnt = 2;
            sss[1].sName = "name 2";
            foreach (var Xd in sss)
                Console.WriteLine(Xd.iCnt.ToString() + " ; " + Xd.sName);

            // Fifth implementation
            // extending class object to make it enumerable using IEnumerable interface
            Console.WriteLine("extending it enumerable using IEnumerable interface");
            objClassList ssX = new objClassList();
            ssX.Add(new objClass() { iCnt = 1, sName = "XXC" });
            ssX.Add(new objClass() { iCnt = 2, sName = "CCC" });
            ssX.Add(new objClass() { iCnt = 3, sName = "BBB" });
            foreach (var w in ssX)
                Console.WriteLine(w.iCnt + " # " + w.sName);
            Console.ReadLine();
        }
    }
    class Test1
    {
        private string[] names;
        public Test1()
        {
            this.names = new string[2];
        }
        public string this[int i]
        {
            get { return this.names[i]; }
            set { this.names[i] = value; }
        }
    }
    struct objStruct
    {
        public int iCnt;
        public string sName;
    }
    class objClass
    {
        public int iCnt;
        public string sName;

    }
    class objClassList : IEnumerable<objClass>
    {
        private List<objClass> objList = new List<objClass>();
        public int Count 
        { 
            get 
            { 
                return objList.Count; 
            } 
        }
        public objClass this[int index]
        {
            get
            {
                return objList[index];
            }
        }
        public void Add(objClass objX)
        {
            objList.Add(objX);
        }
        public void Remove(objClass objX)
        {
            objList.Remove(objX);
        }
        public IEnumerator<objClass> GetEnumerator()
        {
            return this.objList.GetEnumerator();
        }
        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }
    }
}

Console Output
implementing class using indexer                         
this 1                                                   
this 2                                                   
extending indexable class to build 2 dimentional array   
XXX ; YYY                                                
aaa ; bbb                                                
simple implementation of array of abject using struct    
1 ; name 1                                               
2 ; name 2                                               
simple implementation of array of object using class     
1 ; name 1                                               
2 ; name 2                                               
extending it enumerable using IEnumerable interface      
1 # XXC                                                  
2 # CCC                                                  
3 # BBB                                                  

Thursday, May 30, 2013

Decorator Pattern - with simple example

Design patterns are often hard to grasp at first. The Decorator pattern is one that, even though simple, is a bit tricky to understand. To make it simple, I have taken an example that calculates the selling price of 1 kg of sugar on its way from manufacturing to the buyer's hands, for sale in 2 different states. Assume the sugar is manufactured in state JKL.

Let's consider the following scenario for both states:
  1. Manufacturing and Packaging Cost of 1 kg sugar in JKL state = $ 15.00
  2. Transportation Cost = in state ABC is $ 25.00; in state XYZ is $ 23.00
  3. Retail Shop Profit = in state ABC is $ 23.00; in state XYZ is $ 31.22
  4. Sales Tax = in state ABC is 11 %; in state XYZ is 21 %
Now, let's implement this scenario using the Decorator design pattern and find the selling price. But before that, let's understand the idea behind this design pattern.

Wikipedia says: The decorator pattern can be used to extend (decorate) the functionality of a certain object statically, or in some cases at run-time, independently of other instances of the same class, provided some groundwork is done at design time. This is achieved by designing a new decorator class that wraps the original class.

Everything starts with a base class and a simple abstract method:
public abstract class Base
{
    public abstract double Cost();
}

Use the base class to implement the base object that returns the manufacturing cost of 1 kg of sugar in state JKL:
public class SugarBasePrice : Base
{
    public override double Cost()
    {
        Console.WriteLine("1 kg Sugar Base Price cost = 15.00");
        return 15.00;
    }
}

Now there are additional decorations, i.e. expenses, that need to be added. We also know that every now and then further taxes or expenses get added on top. To handle this without tampering with the existing implementation, we implement a decorator base class inherited from the product base class:
public abstract class AdditionalExpense : Base
{
    protected Base BaseObj { get; set; }
    public AdditionalExpense(Base b)
    {
        Console.WriteLine("AdditionalExpense constructor");
        BaseObj = b;
    }
}

Now, inherit the decorator base class and implement the different decorator classes for state ABC (refer to example given above).
public class TransportationCost2ABC : AdditionalExpense
{
    public TransportationCost2ABC(Base b)
        : base(b)
    {
        Console.WriteLine("TransportationCost2ABC constructor");
    }
    public override double Cost()
    {
        double d = 25.00 + BaseObj.Cost();
        Console.WriteLine("with Transportation Cost to ABC cost = " + d);
        return d;
    }
}
public class RetailShopProfitInABC : AdditionalExpense
{
    public RetailShopProfitInABC(Base b)
        : base(b)
    {
        Console.WriteLine("RetailShopProfitInABC constructor");
    }
    public override double Cost()
    {
        double d = 23.00 + BaseObj.Cost();
        Console.WriteLine("with Retail Shop Profit In ABC cost = " + d);
        return d;
    }
}
public class SalesTaxInABC : AdditionalExpense
{
    public SalesTaxInABC(Base b)
        : base(b)
    {
        Console.WriteLine("SalesTaxInABC constructor");
    }
    public override double Cost()
    {
        double d = BaseObj.Cost();
        d += (d * 0.11);
        Console.WriteLine("with Sales Tax In ABC cost = " + d);
        return d;
    }
}

Now, implement it for state XYZ (refer to example given above).
public class TransportationCost2XYZ : AdditionalExpense
{
    public TransportationCost2XYZ(Base b)
        : base(b)
    {
        Console.WriteLine("TransportationCost2XYZ constructor");
    }
    public override double Cost()
    {
        double d = 23.00 + BaseObj.Cost();
        Console.WriteLine("with Transportation Cost to XYZ cost = " + d);
        return d;
    }
}
public class RetailShopProfitInXYZ : AdditionalExpense
{
    public RetailShopProfitInXYZ(Base b)
        : base(b)
    {
        Console.WriteLine("RetailShopProfitInXYZ constructor");
    }
    public override double Cost()
    {
        double d = 31.22 + BaseObj.Cost();
        Console.WriteLine("with Retail Shop Profit In XYZ cost = " + d);
        return d;
    }
}
public class SalesTaxInXYZ : AdditionalExpense
{
    public SalesTaxInXYZ(Base b)
        : base(b)
    {
        Console.WriteLine("SalesTaxInXYZ constructor");
    }
    public override double Cost()
    {
        double d = BaseObj.Cost();
        d += (d * 0.21);
        Console.WriteLine("with Sales Tax In XYZ cost = " + d);
        return d;
    }
}

It may still not be clear how this implementation works. As you may have noticed in the code above, Console.WriteLine calls have been added to show when each constructor and Cost method fires. Following is the console application and its output. Observe the output carefully and you will see the order in which the constructors are invoked and the calculations performed:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace DecoratorPattern
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("********* State ABC Details ************");
            Base BB = new SugarBasePrice();
            BB = new TransportationCost2ABC(BB);
            BB = new RetailShopProfitInABC(BB);
            BB = new SalesTaxInABC(BB);
            Console.WriteLine("Sale Price of 1 Kg Sugar in ABC state = " + BB.Cost());
            Console.WriteLine("********* State XYZ Details ************");
            Base CC = new SugarBasePrice();
            CC = new TransportationCost2XYZ(CC);
            CC = new RetailShopProfitInXYZ(CC);
            CC = new SalesTaxInXYZ(CC);
            Console.WriteLine("Sale Price of 1 Kg Sugar in XYZ state = " + CC.Cost());
            Console.ReadLine();
        }
    }
}

Console Output
********* State ABC Details ************             
AdditionalExpense constructor                        
TransportationCost2ABC constructor                   
AdditionalExpense constructor                        
RetailShopProfitInABC constructor                    
AdditionalExpense constructor                        
SalesTaxInABC constructor                            
1 kg Sugar Base Price cost = 15.00                   
with Transportation Cost to ABC cost = 40            
with Retail Shop Profit In ABC cost = 63             
with Sales Tax In ABC cost = 69.93                   
Sale Price of 1 Kg Sugar in ABC state = 69.93        
********* State XYZ Details ************             
AdditionalExpense constructor                        
TransportationCost2XYZ constructor                   
AdditionalExpense constructor                        
RetailShopProfitInXYZ constructor                    
AdditionalExpense constructor                        
SalesTaxInXYZ constructor                            
1 kg Sugar Base Price cost = 15.00                   
with Transportation Cost to XYZ cost = 38            
with Retail Shop Profit In XYZ cost = 69.22          
with Sales Tax In XYZ cost = 83.7562                 
Sale Price of 1 Kg Sugar in XYZ state = 83.7562      
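
The six state-specific decorators differ only in their constants, so one possible variation is to pass the amounts in through the constructor. This is a sketch under a few assumptions rather than a replacement for the classes above: the names PriceComponent, BasePrice, FixedCharge, PercentageTax and SugarPricing are invented here, and decimal is used instead of double so that currency arithmetic stays exact.

```csharp
using System;

public abstract class PriceComponent
{
    public abstract decimal Cost();
}

public class BasePrice : PriceComponent
{
    private readonly decimal amount;
    public BasePrice(decimal amount) { this.amount = amount; }
    public override decimal Cost() { return amount; }
}

// Decorator that adds a fixed amount (transportation, retail profit, ...).
public class FixedCharge : PriceComponent
{
    private readonly PriceComponent inner;
    private readonly decimal amount;
    public FixedCharge(PriceComponent inner, decimal amount)
    {
        this.inner = inner;
        this.amount = amount;
    }
    public override decimal Cost() { return inner.Cost() + amount; }
}

// Decorator that adds a percentage of the wrapped cost (sales tax).
public class PercentageTax : PriceComponent
{
    private readonly PriceComponent inner;
    private readonly decimal rate;
    public PercentageTax(PriceComponent inner, decimal rate)
    {
        this.inner = inner;
        this.rate = rate;
    }
    public override decimal Cost()
    {
        decimal cost = inner.Cost();
        return cost + cost * rate;
    }
}

public static class SugarPricing
{
    // Wraps the base price in the same order as the example:
    // transportation, then retail profit, then sales tax.
    public static decimal SalePrice(decimal transport, decimal profit, decimal taxRate)
    {
        PriceComponent price = new BasePrice(15.00m);
        price = new FixedCharge(price, transport);
        price = new FixedCharge(price, profit);
        price = new PercentageTax(price, taxRate);
        return price.Cost();
    }

    public static void Main()
    {
        Console.WriteLine("ABC: " + SalePrice(25.00m, 23.00m, 0.11m));
        Console.WriteLine("XYZ: " + SalePrice(23.00m, 31.22m, 0.21m));
    }
}
```

A new state then needs only a different set of numbers, and a new kind of expense needs only one more decorator class.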

Tuesday, May 28, 2013

Finalize and Dispose - How it works

Initially I had some difficulty understanding how the Finalize and Dispose methods work and how they relate to garbage collection in .NET. Lately I have realized that many others have the same issue, even those with sound practical experience. The following implementation is a simple way to understand how it works.

First, let's understand what the Finalize and Dispose methods are and how they relate to each other.

As per MSDN Dispose Method
  • should only be used for objects that use unmanaged resources, such as FileStream.
  • should be implemented using IDisposable interface.
  • should release all the resources that it owns. It should also release all resources owned by its base types by calling its parent type's Dispose method. The parent type's Dispose method should release all resources that it owns and in turn call its parent type's Dispose method, propagating this pattern through the hierarchy of base types. To help ensure that resources are always cleaned up appropriately, a Dispose method should be callable multiple times without throwing an exception.
  • should call the GC.SuppressFinalize method so that the object does not end up in the finalization queue. Why? Because when Dispose is called, all managed and unmanaged resources have already been released, so the GC can reclaim the object directly without putting it through finalization; otherwise the finalizer would later run against resources that are already released, which can cause errors such as a NullReferenceException.
As per MSDN Finalize Method
  • is protected and therefore is accessible only through this class or through a derived class e.g. protected virtual void Finalize()
  • is called automatically after the object becomes inaccessible, or at shutdown of the application domain, unless the object has been exempted by a call to GC.SuppressFinalize. It releases all unmanaged resources before the object is destroyed.
  • time and order of execution of Finalizers cannot be predicted or pre-determined that's why you'll hear that the nature of finalization is "non-deterministic".
  • Destructors are the C# mechanism for performing cleanup operations. Destructors provide appropriate safeguards, such as automatically calling the base type's destructor. In C# code, Object.Finalize cannot be called or overridden directly. In VB.NET, however, the Finalize method can be overridden, because VB.NET does not have destructor syntax.
  • In practice, if you have implemented a Dispose method for a class that owns unmanaged resources, you should also implement a finalizer (destructor) as a fallback. If the caller invokes Dispose, the finalizer need not run; otherwise it will. The C# compiler translates a destructor into an override of System.Object.Finalize, which the garbage collector calls during finalization.
The following is a sample class that has a Dispose method and a destructor. Let's see how it works and when each is called; once you can visualize it, understanding it becomes easy.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace DisposableFinalizeExample
{
    class TestClass : IDisposable
    {
        public String sData;
        public TestClass()
        {
            sData = "Test Test Test";
            Console.WriteLine("Call Constructor");
        }
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
            Console.WriteLine("Call Dispose Method");
        }
        protected virtual void Dispose(bool bDispose)
        {
            if (bDispose)
            {
                ReleaseManagedResource();
            }
            ReleaseUnmanagedResource();
        }
        void ReleaseManagedResource()
        {
            Console.WriteLine("Release Manage Resource");
        }
        void ReleaseUnmanagedResource()
        {
            Console.WriteLine("Release Unmanage Resource");
        }
        ~TestClass()
        {
            Dispose(false);
            Console.WriteLine("Destructor called in Finalization");
            Console.ReadLine();
        }
    }
}

Now, here is a sample console implementation which has two sections
  • Section 1 - Efficient way to call Dispose method
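  • Section 2 - Calling the destructor during garbage collection. On execution you will observe that after the destructor is called the console application closes automatically, even though there is a Console.ReadLine() to pause execution. This is most likely because the runtime is already shutting down at that point and gives finalizers only a limited time to run.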

namespace DisposableFinalizeExample
{
    class Program
    {
        static void Main(string[] args)
        {
            // Section 1 - calling Dispose method automatically using "using"
            using (TestClass xx = new TestClass())
            {
                Console.WriteLine("2nd call " + xx.sData);
            }
            Console.ReadLine();
            // Section 2 - Calling the destructor, which is compiled into Finalize and called by the GC
            if (true)
            {
                TestClass xx = new TestClass();
                if (xx.sData == "Test Test Test")
                {
                    Console.WriteLine("2nd call " + xx.sData);
                }
            
            }
            Console.ReadLine();
        }
    }
}

Here is the console output
Call Constructor                                            
2nd call Test Test Test                                     
Release Manage Resource                                     
Release Unmanage Resource                                   
Call Dispose Method                                         
<<PRESS ENTER>>                                             
Call Constructor                                            
2nd call Test Test Test                                     
<<PRESS ENTER>>                                             
Release Unmanage Resource                                   
Destructor called in Finalization                           
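
The guidance quoted earlier says a Dispose method should be callable multiple times without throwing an exception. The sample class above does not guard against that, so here is a minimal sketch of one common way to do it, using a flag. The class name SafeResource and the ReleaseCount property are made up for this illustration; the counter merely stands in for releasing a real unmanaged resource.

```csharp
using System;

public class SafeResource : IDisposable
{
    private bool disposed;

    // Counts how often the (simulated) unmanaged resource was released.
    public int ReleaseCount { get; private set; }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed)
            return;             // second and later calls do nothing

        if (disposing)
        {
            // Release managed resources here (none in this sketch).
        }

        ReleaseCount++;         // stands in for releasing an unmanaged resource
        disposed = true;
    }

    public void Use()
    {
        if (disposed)
            throw new ObjectDisposedException("SafeResource");
        Console.WriteLine("Resource in use");
    }

    ~SafeResource()
    {
        Dispose(false);
    }
}

public static class SafeResourceDemo
{
    public static void Main()
    {
        SafeResource resource = new SafeResource();
        resource.Use();
        resource.Dispose();
        resource.Dispose();     // safe: the flag turns this into a no-op
        Console.WriteLine("Released " + resource.ReleaseCount + " time(s)");
    }
}
```

Throwing ObjectDisposedException from Use is a design choice; it makes accidental use after disposal visible instead of failing somewhere less obvious.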

Monday, May 20, 2013

Check Execution Time - Stopwatch

This is a quick reference on how to check execution time in C#. It may be helpful to others, but basically I am keeping it for my own reference, as I often end up googling it when I need it. The following code is a sample I implemented against the Zipping Assembly to test which implementation is faster.

You need to reference System.Diagnostics, but the implementation is quite simple.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Archive;
using System.Diagnostics;
using System.Threading;

namespace TestConsole
{
    class Program
    {
        static void Main(string[] args)
        {
            Stopwatch sw = new Stopwatch();
            sw.Start();
            ArchiveCreator ss = new WinBaseZipCreator(@"d:\testzip1.zip");
            ArchiveObj xx = ss.GetArchieve();
            xx.AddFile(@"d:\TEST1.txt");
            xx.AddFile(@"d:\TEST2.txt");
            xx.SaveArchive();
            sw.Stop();
            Console.WriteLine(sw.Elapsed);
            Thread.Sleep(5000);
            sw.Restart();
            ArchiveCreator sss = new IonicZipCreator(@"d:\testzip2.zip");
            ArchiveObj xxx = sss.GetArchieve();
            xxx.AddFile(@"d:\TEST1.txt");
            xxx.AddFile(@"d:\TEST2.txt");
            xxx.SaveArchive();
            sw.Stop();
            Console.WriteLine(sw.Elapsed);
            Thread.Sleep(5000);
            sw.Restart();
            ArchiveCreator ssss = new ZipShellCreator(@"d:\testzip3.zip");
            ArchiveObj xxxx = ssss.GetArchieve();
            xxxx.AddFile(@"d:\TEST1.txt");
            xxxx.AddFile(@"d:\TEST2.txt");
            xxxx.SaveArchive();
            sw.Stop();
            Console.WriteLine(sw.Elapsed);
            Console.ReadLine();
        }
    }
}
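
Since the start / stop / print sequence repeats for each implementation, it could be wrapped in a small helper. The following is a sketch only; the names Timing and Measure are invented, and Thread.Sleep stands in for the archive calls so the sample runs without the Archive assembly.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Timing
{
    // Runs the given action once and returns how long it took.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }
}

public static class TimingDemo
{
    public static void Main()
    {
        // Thread.Sleep is a placeholder for the code being measured.
        TimeSpan first = Timing.Measure(() => Thread.Sleep(50));
        TimeSpan second = Timing.Measure(() => Thread.Sleep(100));
        Console.WriteLine(first);
        Console.WriteLine(second);
    }
}
```

A single run is only a rough indication; for a fairer comparison each candidate would normally be measured several times.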

Sunday, May 19, 2013

Understanding Garbage Collection in .NET

Preface

Garbage collection and the CLR are a pain point for many. The following article is written from, and includes extracts taken from, the reference URLs (see end of article). I have tried to put all the information in context so that it can be understood as you read through, in other words, the questions that may be asked in an interview. 95 percent of interviewers do not go into depth, but it is always good to impress them.
Why GC? Because GC and its processes in the CLR cover a critical part of understanding the .NET environment and how it works. I once had an interviewer start with "What is GC in .NET?" and I ended up spending almost half an hour discussing it... funny?? But it’s true!!

Take your time and read through the article… don’t forget to keep googling if you don’t understand something.
This is a small contribution from my side for newcomers to our .NET fraternity.

All the best...

Overview

The .NET Framework's garbage collector manages the allocation and release of memory for your application. Each time you create a new object, the common language runtime allocates memory for the object from the managed heap. As long as address space is available in the managed heap, the runtime continues to allocate space for new objects. However, memory is not infinite. Eventually the garbage collector must perform a collection in order to free some memory. The garbage collector's optimizing engine determines the best time to perform a collection, based upon the allocations being made. When the garbage collector performs a collection, it checks for objects in the managed heap that are no longer being used by the application and performs the necessary operations to reclaim their memory.

In the common language runtime (CLR), the garbage collector serves as an automatic memory manager. It provides the following benefits:
  • Enables you to develop your application without having to free memory.
  • Allocates objects on the managed heap efficiently.
  • Reclaims objects that are no longer being used, clears their memory, and keeps the memory available for future allocations. Managed objects automatically get clean content to start with, so their constructors do not have to initialize every data field.
  • Provides memory safety by making sure that an object cannot use the content of another object.

Virtual Memory

  • Virtual memory can be in three states:
    • Free. The block of memory has no references to it and is available for allocation.
    • Reserved. The block of memory is available for your use and cannot be used for any other allocation request. However, you cannot store data to this memory block until it is committed.
    • Committed. The block of memory is assigned to physical storage.
  • Virtual address space can get fragmented. This means that there are free blocks, also known as holes, in the address space. When a virtual memory allocation is requested, the virtual memory manager has to find a single free block that is large enough to satisfy that allocation request. Even if you have 2 GB of free space, the allocation that requires 2 GB will be unsuccessful unless all of that space is in a single address block.
  • You can run out of memory if you run out of virtual address space to reserve or physical space to commit.

Stack and Heap in Depth

We need to understand the stack and the heap before we move forward with Garbage Collection.  There are three types of virtual space available to play with:
  1. Stack
  2. Managed Heap – for applications and assemblies that run under the .NET Framework umbrella; the GC is responsible for its memory management (in this section the term “heap” means “managed heap”)
  3. Heap – for unmanaged applications and assemblies outside the .NET Framework. We are not discussing it here.
The Stack is more or less responsible for keeping track of what's executing in our code.  The Heap is more or less responsible for keeping track of our objects. The Stack is self-maintaining, meaning that it basically takes care of its own memory management.  When the top item on the stack is no longer used, it's thrown out.  The Heap, on the other hand, has to worry about Garbage Collection, which deals with how to keep the Heap clean.

What goes on the Stack and Heap?
Four main types of things go in the Stack and Heap as our code is executing: Value Types, Reference Types, Pointers, and Instructions.
  • Value Type: Types that derive from System.ValueType. Examples: bool, byte, char, decimal, double, enum, float, int, long, sbyte, short, struct, uint, ulong and ushort.
  • Reference Type: Items declared with the types in this list are Reference Types and inherit from System.Object (without going through System.ValueType). Examples: class, interface, delegate, object and string.
  • Pointer: The third item to be put in our memory management scheme is a Reference to a Type. A Reference is often referred to as a Pointer.  We don't explicitly use Pointers; they are managed by the Common Language Runtime (CLR). A Pointer (or Reference) is different from a Reference Type: when we say something is a Reference Type, it means we access it through a Pointer.  A Pointer is a chunk of space in memory that points to another space in memory.  A Pointer takes up space just like anything else that we put in the Stack and Heap, and its value is either a memory address or null.
  • Instruction: When compiling to managed code, the compiler translates your source code into Microsoft intermediate language (MSIL), which is a CPU-independent set of instructions that can be efficiently converted to native code. MSIL includes instructions for loading, storing, initializing, and calling methods on objects, as well as instructions for arithmetic and logical operations, control flow, direct memory access, exception handling, and other operations. Before code can be run, MSIL must be converted to CPU-specific code, usually by a just-in-time (JIT) compiler. Because the common language runtime supplies one or more JIT compilers for each computer architecture it supports, the same set of MSIL can be JIT-compiled and run on any supported architecture.
Here are our two golden rules:
  1. A Reference Type always goes on the Heap - easy enough, right? 
  2. Value Types and Pointers always go where they were declared.  This is a little more complex and needs a bit more understanding of how the Stack works to figure out where items are declared.
The Stack, as we mentioned earlier, is responsible for keeping track of where each thread is during the execution of our code (or what's been called). This means that each thread has its own stack.
  • When our code makes a call to execute a method, the method's parameters are placed on the thread's stack.
  • Next, control (the thread executing the method) is passed to the method's instructions, which are reached through our type's method table; a JIT compilation is performed if this is the first time we are hitting the method.
  • As the method executes, memory for each local variable (for example, a local int named "result") is allocated on top of the stack. As a matter of fact, every time a Value Type is declared within the body of a method, it will be placed on the stack.
  • When the method finishes, all memory it allocated on the stack is cleaned up by moving a pointer back to the memory address where the method started, and we go down to the previous method on the stack.

Value Types are also sometimes placed on the Heap.  Remember the rule: Value Types always go where they were declared?  Well, if a Value Type is declared outside of a method, but inside a Reference Type, it will be placed within the Reference Type on the Heap. Similarly, consider a method whose return type is a user-defined class: the returned object is a Reference Type and is stored on the heap. After the method executes, its stack frame is cleared as described above, but the resulting object stays on the heap.
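To make the two golden rules concrete, here is a minimal sketch. The names Counter, StackHeapDemo, AddFive and MakeCounter are made up for illustration, and the comments describe the conceptual placement explained above (the JIT is free to optimize locals into registers).

```csharp
using System;

// Reference type: instances live on the managed heap.
class Counter
{
    // Value type declared inside a reference type: it is stored inside
    // the Counter object, and therefore on the heap.
    public int Count;
}

static class StackHeapDemo
{
    // pValue (parameter) and result (local) are value types declared
    // inside a method, so conceptually both live in this method's stack frame.
    public static int AddFive(int pValue)
    {
        int result;
        result = pValue + 5;
        return result;   // the frame is released when the method returns
    }

    // The reference "c" is a local, but the Counter object it points to
    // (including its int field) is on the heap and outlives the call.
    public static Counter MakeCounter(int start)
    {
        Counter c = new Counter();
        c.Count = start;
        return c;
    }

    static void Main()
    {
        Console.WriteLine(AddFive(10));            // 15
        Console.WriteLine(MakeCounter(3).Count);   // 3
    }
}
```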

* Once our program reaches a certain memory threshold and we need more Heap space, our GC will kick off.  The GC will stop all running threads (a FULL STOP), find all objects in the Heap that are not being accessed by the main program and delete them.  The GC will then reorganize all the objects left in the Heap to make space and adjust all the Pointers to these objects in both the Stack and the Heap.  As you can imagine, this can be quite expensive in terms of performance, so now you can see why it can be important to pay attention to what's in the Stack and Heap when trying to write high-performance code.

Parameters - When we make a method call here's what happens:
  1. Space is allocated for information needed for the execution of our method on the stack (called a Stack Frame). This includes the calling address (a pointer) which is basically a GOTO instruction so when the thread finishes running our method it knows where to go back to in order to continue execution.
  2. Our method parameters are copied over. This is what we want to look at more closely.
  3. Control is passed to the JIT'ted method and the thread starts executing code. Hence, we have another method represented by a stack frame on the "call stack".
Passing Value Type - When we pass a value type, space is allocated on the stack and the value is copied into that new space; no new heap object is created.

* If we have a very large value type (such as a big struct) and pass it to the stack, it can get very expensive in terms of space and processor cycles to copy it over each time. The stack does not have infinite space and just like filling a glass of water from the tap, it can overflow.

Passing Reference Type - This is similar to passing a value type by reference: no new object is created, and only the reference to the memory location is passed. Because no duplicate copy is made, passing by reference can also be the more efficient choice for a large value type such as a big struct.
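The difference between the passing styles can be seen in a small sketch. PointStruct, PointClass and the Change* methods are hypothetical names used only for this illustration.

```csharp
using System;

struct PointStruct { public int X; }   // value type
class PointClass { public int X; }     // reference type

static class PassingDemo
{
    // Receives a copy of the struct; the caller's value is untouched.
    public static void ChangeByValue(PointStruct p) { p.X = 99; }

    // Receives a reference to the caller's struct (no copy is made).
    public static void ChangeByRef(ref PointStruct p) { p.X = 99; }

    // Receives a copy of the reference; both point to the same heap object.
    public static void ChangeClass(PointClass p) { p.X = 99; }

    static void Main()
    {
        PointStruct s = new PointStruct();
        s.X = 1;
        ChangeByValue(s);
        Console.WriteLine(s.X);   // 1
        ChangeByRef(ref s);
        Console.WriteLine(s.X);   // 99

        PointClass c = new PointClass();
        c.X = 1;
        ChangeClass(c);
        Console.WriteLine(c.X);   // 99
    }
}
```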

Condition for Garbage Collection

Garbage collection occurs when one of the following conditions is true:
  • The system has low physical memory.
  • The memory that is used by allocated objects on the managed heap surpasses an acceptable threshold. This means that a threshold of acceptable memory usage has been exceeded on the managed heap. This threshold is continuously adjusted as the process runs.
  • The GC.Collect method is called. In almost all cases, you do not have to call this method, because the garbage collector runs continuously. This method is primarily used for unique situations and testing.
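The third condition can be observed from code. The following is a small sketch for testing purposes only (InducedCollectionDemo and ForceCollection are made-up names); it uses GC.CollectionCount to show that an induced collection really took place.

```csharp
using System;

static class InducedCollectionDemo
{
    // Returns how many generation 0 collections a forced GC.Collect() added.
    public static int ForceCollection()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();   // induces a collection of all generations
        return GC.CollectionCount(0) - before;
    }

    static void Main()
    {
        Console.WriteLine("Gen 0 collections added: " + ForceCollection());
    }
}
```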

What is the Managed Heap and how is it related to GC?

After the garbage collector is initialized by the CLR, it allocates a segment of memory to store and manage objects. This memory is called the managed heap, as opposed to a native heap in the operating system.

There is a managed heap for each managed process. All threads in the process allocate memory for objects on the same heap. To reserve memory, the garbage collector calls the Win32 VirtualAlloc function and reserves one segment of memory at a time for managed applications. The garbage collector also reserves segments as needed, and releases segments back to the operating system (after clearing them of any objects) by calling the Win32 VirtualFree function.

The fewer objects allocated on the heap, the less work the garbage collector has to do. When you allocate objects, do not use rounded-up values that exceed your needs, such as allocating an array of 32 bytes when you need only 15 bytes.

When a garbage collection is triggered, the garbage collector reclaims the memory that is occupied by dead objects. The reclaiming process compacts live objects so that they are moved together, and the dead space is removed, thereby making the heap smaller. This ensures that objects that are allocated together stay together on the managed heap, to preserve their locality. The GC also takes care of re-referencing the live objects and updating the root table. Because objects move during this re-referencing, any functionality that holds unsafe pointers to managed objects breaks unless those objects are pinned.

The intrusiveness (frequency and duration) of garbage collections is the result of the volume of allocations and the amount of survived memory on the managed heap.

The heap can be considered as the accumulation of two heaps: the large object heap and the small object heap. The large object heap contains objects that are 85,000 bytes and larger. Very large objects on the large object heap are usually arrays. It is rare for an instance object to be extremely large.
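As a quick way to see the two heaps, the sketch below asks the GC which generation a freshly allocated array is reported in. LargeObjectDemo and GenerationOfNewArray are hypothetical names, and the sizes are picked only to fall clearly below and above the 85,000-byte threshold. A small array is normally reported as generation 0, while an array on the large object heap is reported as generation 2.

```csharp
using System;

static class LargeObjectDemo
{
    // Allocates a byte array and returns the generation the GC reports for it.
    public static int GenerationOfNewArray(int sizeInBytes)
    {
        byte[] buffer = new byte[sizeInBytes];
        return GC.GetGeneration(buffer);
    }

    static void Main()
    {
        Console.WriteLine("1,000 bytes:   generation " + GenerationOfNewArray(1000));
        Console.WriteLine("100,000 bytes: generation " + GenerationOfNewArray(100000));
    }
}
```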

What are the Generations in GC?

The managed heap is organized into generations so it can handle long-lived and short-lived objects. Garbage collection primarily occurs with the reclamation of short-lived objects that typically occupy only a small part of the heap.

There are three generations of objects on the heap:
Generation 0 - This is the youngest generation and contains short-lived objects. An example of a short-lived object is a temporary variable. Garbage collection occurs most frequently in this generation.
Newly allocated objects form a new generation of objects and are implicitly generation 0 collections, unless they are large objects, in which case they go on the large object heap in a generation 2 collection. Most objects are reclaimed for garbage collection in generation 0 and do not survive to the next generation.

Generation 1 - This generation contains short-lived objects and serves as a buffer between short-lived objects and long-lived objects.

Generation 2 - This generation contains long-lived objects. An example of a long-lived object is an object in a server application that contains static data that is live for the duration of the process.
Garbage collections occur on specific generations as conditions warrant. Collecting a generation means collecting objects in that generation and all its younger generations. A generation 2 garbage collection is also known as a full garbage collection, because it reclaims all objects in all generations (that is, all objects in the managed heap).

Survival and Promotions in GC

Objects that are not reclaimed in a garbage collection are known as survivors, and are promoted to the next generation. Objects that survive a generation 0 garbage collection are promoted to generation 1; objects that survive a generation 1 garbage collection are promoted to generation 2; and objects that survive a generation 2 garbage collection remain in generation 2.
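Promotion can be watched with GC.GetGeneration. This is only a sketch (PromotionDemo and TrackGenerations are made-up names) and it induces collections with GC.Collect purely for demonstration; the exact numbers printed depend on the runtime, but a surviving object should never move to a younger generation.

```csharp
using System;

static class PromotionDemo
{
    // Records the generation of one live object before and after
    // two induced collections.
    public static int[] TrackGenerations()
    {
        object survivor = new object();
        int[] generations = new int[3];
        generations[0] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[1] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[2] = GC.GetGeneration(survivor);
        GC.KeepAlive(survivor);   // keep the strong reference alive until here
        return generations;
    }

    static void Main()
    {
        foreach (int generation in TrackGenerations())
            Console.WriteLine("generation " + generation);
    }
}
```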

When the garbage collector detects that the survival rate is high in a generation, it increases the threshold of allocations for that generation, so the next collection gets a substantial size of reclaimed memory. The CLR continually balances two priorities: not letting an application's working set get too big and not letting the garbage collection take too much time.

What are Ephemeral Generations and Segments in GC and how do they work?

Because objects in generations 0 and 1 are short-lived, these generations are known as the ephemeral generations. Ephemeral generations must be allocated in the memory segment that is known as the ephemeral segment.

Each new segment acquired by the garbage collector becomes the new ephemeral segment and contains the objects that survived a generation 0 garbage collection. The old ephemeral segment becomes the new generation 2 segment. The ephemeral segment can include generation 2 objects. Generation 2 objects can use multiple segments (as many as your process requires and memory allows for).

The amount of freed memory from an ephemeral garbage collection is limited to the size of the ephemeral segment. The amount of memory that is freed is proportional to the space that was occupied by the dead objects.

What happens during Garbage Collection?

A garbage collection has the following phases:
  • A marking phase that finds and creates a list of all live objects.
  • A relocating phase that updates the references to the objects that will be compacted.
  • A compacting phase that reclaims the space occupied by the dead objects and compacts the surviving objects. The compacting phase moves objects that have survived a garbage collection toward the older end of the segment.
Because generation 2 collections can occupy multiple segments, objects that are promoted into generation 2 can be moved into an older segment. Both generation 1 and generation 2 survivors can be moved to a different segment, because they are promoted to generation 2. The large object heap is not compacted, because copying large objects imposes a performance penalty.

The garbage collector uses the following information to determine whether objects are live:
  • Stack roots: Stack variables provided by the just-in-time (JIT) compiler and stack walker.
  • Garbage collection handles: Handles that point to managed objects and that can be allocated by user code or by the common language runtime.
  • Static data: Static objects in application domains that could be referencing other objects. Each application domain keeps track of its static objects.

* Before a garbage collection starts, all managed threads are suspended except for the thread that triggered the garbage collection.

How are Unmanaged Resources Managed in Garbage Collection?

If your managed objects reference unmanaged objects by using their native file handles, you have to explicitly free the unmanaged objects, because the garbage collector tracks memory only on the managed heap.

Users of your managed object may not dispose the native resources used by the object. To perform the clean-up, you can make your managed object finalizable. Finalization consists of clean-up actions that you execute when the object is no longer in use. When your managed object dies, it performs clean-up actions that are specified in its finalizer method.

When a finalizable object is discovered to be dead, its finalizer is put in a queue so that its clean-up actions are executed, but the object itself is promoted to the next generation. Therefore, you have to wait until the next garbage collection that occurs on that generation (which is not necessarily the next garbage collection) to determine whether the object has been reclaimed.
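A minimal sketch of a finalizable object follows, assuming a hypothetical NativeBuffer class that wraps a block of unmanaged memory. The finalizer is the safety net described above; the Dispose method lets callers free the memory early and, by calling GC.SuppressFinalize, spares the object the extra trip through the finalization queue.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper around a block of unmanaged memory.
class NativeBuffer : IDisposable
{
    private IntPtr _handle;

    public NativeBuffer(int bytes)
    {
        _handle = Marshal.AllocHGlobal(bytes);   // unmanaged allocation
    }

    public bool IsReleased { get { return _handle == IntPtr.Zero; } }

    // Deterministic clean-up called by the user of the object.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);   // the finalizer is no longer needed
    }

    // Finalizer: runs only if Dispose was never called.
    ~NativeBuffer()
    {
        Release();
    }

    private void Release()
    {
        if (_handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_handle);
            _handle = IntPtr.Zero;
        }
    }
}

static class FinalizerDemo
{
    static void Main()
    {
        NativeBuffer buffer = new NativeBuffer(1024);
        buffer.Dispose();
        Console.WriteLine(buffer.IsReleased);   // True
    }
}
```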

What is Workstation and Server Garbage Collection?

The garbage collector is self-tuning and can work in a wide variety of scenarios. The only option you can set is the type of garbage collection, based on the characteristics of the workload. The CLR provides the following types of garbage collection:
  • Workstation garbage collection, which is for all client workstations and stand-alone PCs. This is the default setting for the <gcServer> element in the runtime configuration schema.
    • Workstation garbage collection can be concurrent or non-concurrent. Concurrent garbage collection enables managed threads to continue operations during a garbage collection.
    • Starting with the .NET Framework 4, background garbage collection replaces concurrent garbage collection.
  • Server garbage collection, which is intended for server applications that need high throughput and scalability. Server garbage collection can be non-concurrent or background.

Configuring Garbage Collection; How?

You can use the <gcServer> element of the runtime configuration schema to specify the type of garbage collection you want the CLR to perform. When this element's enabled attribute is set to false (the default), the CLR performs workstation garbage collection. When you set the enabled attribute to true, the CLR performs server garbage collection.

Concurrent garbage collection is specified with the <gcConcurrent> element of the runtime configuration schema. The default setting is enabled. Concurrent garbage collection is available only for workstation garbage collection and has no effect on server garbage collection.

You can also specify server garbage collection with unmanaged hosting interfaces. Note that ASP.NET and SQL Server enable server garbage collection automatically if your application is hosted inside one of these environments.
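To check which mode a process actually ended up with, the GCSettings class can be queried at run time. The sketch below assumes the <gcServer> element sits under <runtime> in the application's configuration file, as shown in the comment; GcModeDemo and DescribeGcMode are made-up names.

```csharp
using System;
using System.Runtime;

// Assumed configuration file fragment for this sketch:
//   <configuration>
//     <runtime>
//       <gcServer enabled="true"/>
//     </runtime>
//   </configuration>
static class GcModeDemo
{
    // Describes the garbage collection mode the CLR is using for this process.
    public static string DescribeGcMode()
    {
        string kind = GCSettings.IsServerGC ? "server" : "workstation";
        return kind + " GC, latency mode " + GCSettings.LatencyMode;
    }

    static void Main()
    {
        Console.WriteLine(DescribeGcMode());
    }
}
```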

Comparing Workstation and Server Garbage Collection

Threading and performance considerations for workstation garbage collection:
  • The collection occurs on the user thread that triggered the garbage collection and remains at the same priority. Because user threads typically run at normal priority, the garbage collector (which runs on a normal priority thread) must compete with other threads for CPU time.
  • Threads that are running native code are not suspended.
  • Workstation garbage collection is always used on a computer that has only one processor, regardless of the <gcServer> setting. If you specify server garbage collection, the CLR uses workstation garbage collection with concurrency disabled.
Threading and performance considerations for server garbage collection:
  • The collection occurs on multiple dedicated threads that are running at THREAD_PRIORITY_HIGHEST priority level.
  • A dedicated thread to perform garbage collection and a heap are provided for each CPU, and the heaps are collected at the same time. Each heap contains a small object heap and a large object heap, and all heaps can be accessed by user code. Objects on different heaps can refer to each other.
  • Because multiple garbage collection threads work together, server garbage collection is faster than workstation garbage collection on the same size heap.
  • Server garbage collection often has larger size segments.
  • Server garbage collection can be resource-intensive. For example, if you have 12 processes running on a computer that has 4 processors, there will be 48 dedicated garbage collection threads if they are all using server garbage collection. In a high memory load situation, if all the processes start doing garbage collection, the garbage collector will have 48 threads to schedule.
* If you are running hundreds of instances of an application, consider using workstation garbage collection with concurrent garbage collection disabled. This will result in less context switching, which can improve performance.

Concurrent Garbage Collection

In workstation or server garbage collection, you can enable concurrent garbage collection, which enables threads to run concurrently with a dedicated thread that performs the garbage collection for most of the duration of the collection. This option affects only garbage collections in generation 2; generations 0 and 1 are always non-concurrent because they finish very fast.

Concurrent garbage collection enables interactive applications to be more responsive by minimizing pauses for a collection. Managed threads can continue to run most of the time while the concurrent garbage collection thread is running. This results in shorter pauses while a garbage collection is occurring.

To improve performance when several processes are running, disable concurrent garbage collection.
Concurrent garbage collection is performed on a dedicated thread. By default, the CLR runs workstation garbage collection with concurrent garbage collection enabled. This is true for single-processor and multi-processor computers.

Your ability to allocate small objects on the heap during a concurrent garbage collection is limited by the objects left on the ephemeral segment when a concurrent garbage collection starts. As soon as you reach the end of the segment, you will have to wait for the concurrent garbage collection to finish while managed threads that have to make small object allocations are suspended.
Concurrent garbage collection has a slightly bigger working set (compared with non-concurrent garbage collection), because you can allocate objects during concurrent collection. However, this can affect performance, because the objects that you allocate become part of your working set. Essentially, concurrent garbage collection trades some CPU and memory for shorter pauses.

What is Background Garbage Collection?

* Background garbage collection is available only in the .NET Framework 4 and later versions. In the .NET Framework 4, it is supported only for workstation garbage collection. Starting with the .NET Framework 4.5, background garbage collection is available for both workstation and server garbage collection.

In background garbage collection, ephemeral generations (0 and 1) are collected as needed while the collection of generation 2 is in progress. There is no setting for background garbage collection; it is automatically enabled with concurrent garbage collection. Background garbage collection is a replacement for concurrent garbage collection. As with concurrent garbage collection, background garbage collection is performed on a dedicated thread and is applicable only to generation 2 collections.

A collection on ephemeral generations during background garbage collection is known as foreground garbage collection. When foreground garbage collections occur, all managed threads are suspended.
When background garbage collection is in progress and you have allocated enough objects in generation 0, the CLR performs a generation 0 or generation 1 foreground garbage collection. The dedicated background garbage collection thread checks at frequent safe points to determine whether there is a request for foreground garbage collection. If there is, the background collection suspends itself so that foreground garbage collection can occur. After the foreground garbage collection is completed, the dedicated background garbage collection thread and user threads resume.

Background garbage collection removes allocation restrictions imposed by concurrent garbage collection, because ephemeral garbage collections can occur during background garbage collection. This means that background garbage collection can remove dead objects in ephemeral generations and can also expand the heap if needed during a generation 1 garbage collection.

Background Server Garbage Collection

Starting with the .NET Framework 4.5, background server garbage collection is the default mode for server garbage collection. To choose this mode, set the enabled attribute of the <gcServer> element to true in the runtime configuration schema. This mode functions similarly to background workstation garbage collection, described in the previous section, but there are a few differences. Background workstation garbage collection uses one dedicated background garbage collection thread, whereas background server garbage collection uses multiple threads, typically a dedicated thread for each logical processor. Unlike the workstation background garbage collection thread, these threads do not time out.

What are Weak References?

The garbage collector cannot collect an object in use by an application while the application's code can reach that object. The application is said to have a strong reference to the object.
A weak reference permits the garbage collector to collect the object while still allowing the application to access the object. A weak reference is valid only during the indeterminate amount of time until the object is collected when no strong references exist. When you use a weak reference, the application can still obtain a strong reference to the object, which prevents it from being collected. However, there is always the risk that the garbage collector will get to the object first before a strong reference is re-established.
Weak references are useful for objects that use a lot of memory, but can be recreated easily if they are reclaimed by garbage collection.
Suppose a tree view in a Windows Forms application displays a complex hierarchical choice of options to the user. If the underlying data is large, keeping the tree in memory is inefficient when the user is involved with something else in the application.
When the user switches away to another part of the application, you can use the WeakReference class to create a weak reference to the tree and destroy all strong references. When the user switches back to the tree, the application attempts to obtain a strong reference to the tree and, if successful, avoids reconstructing the tree.
To establish a weak reference with an object, you create a WeakReference using the instance of the object to be tracked. You then set the Target property to that object and set the original reference to the object to null. For a code example, see WeakReference in the class library.

You can create a short weak reference or a long weak reference:
  • Short - The target of a short weak reference becomes null when the object is reclaimed by garbage collection. The weak reference is itself a managed object, and is subject to garbage collection just like any other managed object. A short weak reference is the default constructor for WeakReference.
  • Long - A long weak reference is retained after the object's Finalize method has been called. This allows the object to be recreated, but the state of the object remains unpredictable. To use a long reference, specify true in the WeakReference constructor. If the object's type does not have a Finalize method, the short weak reference functionality applies and the weak reference is valid only until the target is collected, which can occur any time after the finalizer is run.

To establish a strong reference and use the object again, cast the Target property of a WeakReference to the type of the object. If the Target property returns null, the object was collected; otherwise, you can continue to use the object because the application has regained a strong reference to it.
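The tree view scenario can be sketched as follows, with a plain byte array standing in for the expensive tree. WeakReferenceDemo and GetOrRebuild are hypothetical names; the point is the pattern of trying Target first and rebuilding only when it returns null.

```csharp
using System;

static class WeakReferenceDemo
{
    // Returns the cached data if it is still alive, otherwise rebuilds it.
    public static byte[] GetOrRebuild(WeakReference cache, int size)
    {
        byte[] data = cache.Target as byte[];   // try to regain a strong reference
        if (data == null)
        {
            data = new byte[size];              // it was collected: recreate it
            cache.Target = data;
        }
        return data;
    }

    static void Main()
    {
        byte[] tree = new byte[1024];
        WeakReference cache = new WeakReference(tree);   // short weak reference
        tree = null;                                     // drop the strong reference

        byte[] again = GetOrRebuild(cache, 1024);
        Console.WriteLine(again.Length);                 // 1024
    }
}
```

Whether the array is reused or rebuilt depends on whether a collection ran in between, which is exactly the risk described above.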

What is Latency in GC and what are the Latency Modes?

To reclaim objects, the garbage collector must stop all the executing threads in an application. In some situations, such as when an application retrieves data or displays content, a full garbage collection can occur at a critical time and impede performance. You can adjust the intrusiveness of the garbage collector by setting the GCSettings.LatencyMode property to one of the GCLatencyMode values.

Latency refers to the time that the garbage collector intrudes in your application. During low latency periods, the garbage collector is more conservative and less intrusive in reclaiming objects. The GCLatencyMode enumeration provides two low latency settings:
  • LowLatency suppresses generation 2 collections and performs only generation 0 and 1 collections. It can be used only for short periods of time. Over longer periods, if the system is under memory pressure, the garbage collector will trigger a collection, which can briefly pause the application and disrupt a time-critical operation. This setting is available only for workstation garbage collection.
  • SustainedLowLatency suppresses foreground generation 2 collections and performs only generation 0, 1, and background generation 2 collections. It can be used for longer periods of time, and is available for both workstation and server garbage collection. This setting cannot be used if concurrent garbage collection is disabled.
During low latency periods, generation 2 collections are suppressed unless the following occurs:
  • The system receives a low memory notification from the operating system.
  • Your application code induces a collection by calling the GC.Collect method and specifying 2 for the generation parameter.
When you use LowLatency mode, consider the following guidelines:
  • Keep the period of time in low latency as short as possible.
  • Avoid allocating high amounts of memory during low latency periods. Low memory notifications can occur because garbage collection reclaims fewer objects.
  • While in the low latency mode, minimize the number of allocations you make, in particular allocations onto the Large Object Heap and pinned objects.
  • Be aware of threads that could be allocating. Because the LatencyMode property setting is process-wide, you could generate an OutOfMemoryException on any thread that may be allocating.
  • Wrap the low latency code in constrained execution regions (for more information, see Constrained Execution Regions).
  • You can force generation 2 collections during a low latency period by calling the GC.Collect(Int32, GCCollectionMode) method.
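The first guideline (keep the low latency period short) is usually implemented with try/finally so the previous mode is always restored. The sketch below shows only that part; LowLatencyDemo and RunWithLowLatency are made-up names, and the constrained execution region mentioned above is left out to keep the example small.

```csharp
using System;
using System.Runtime;

static class LowLatencyDemo
{
    // Runs the given work in LowLatency mode and always restores the old mode.
    public static void RunWithLowLatency(Action timeCriticalWork)
    {
        GCLatencyMode oldMode = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.LowLatency;
            timeCriticalWork();
        }
        finally
        {
            GCSettings.LatencyMode = oldMode;
        }
    }

    static void Main()
    {
        RunWithLowLatency(delegate
        {
            Console.WriteLine("during: " + GCSettings.LatencyMode);
        });
        Console.WriteLine("after:  " + GCSettings.LatencyMode);
    }
}
```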

Using "Using"

Garbage Collection always has an impact on performance because, as you have seen, it suspends other threads. We can’t avoid this in a simple implementation, but we can reduce the work the Garbage Collector has to do. One main reason why Garbage Collection takes time is that it must make sure it is not deleting any object that is still in use. At the same time, the Garbage Collector doesn’t guarantee 100 per cent efficiency: not every dead object is reclaimed in a given collection. One way to reduce the load on the Garbage Collector is the "using" statement. When we use a "using" statement, the Dispose method is called automatically at the end of its scope. Dispose releases the object's resources right away instead of leaving them for finalization, which reduces the load on the Garbage Collector. The "using" statement has one condition: the type must implement the Dispose method of the IDisposable interface.
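Here is a minimal sketch of the "using" statement with a hypothetical TrackedResource class. The Disposed flag exists only to make the automatic call to Dispose visible.

```csharp
using System;

// Hypothetical resource holder used only for illustration.
class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;              // release resources here
        GC.SuppressFinalize(this);    // nothing is left for a finalizer to do
    }
}

static class UsingDemo
{
    public static TrackedResource UseAndRelease()
    {
        TrackedResource resource = new TrackedResource();
        using (resource)
        {
            // work with the resource
        }   // Dispose() is called here, even if an exception is thrown
        return resource;
    }

    static void Main()
    {
        Console.WriteLine(UseAndRelease().Disposed);   // True
    }
}
```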

Now, what is still missing is a summarized flow of how GC works; let’s refresh it and conclude this article.


Garbage collection in .NET is done using tracing collection and specifically the CLR implements the Mark/Compact collector. This method consists of two phases as described below.

Phase I: Mark
When the garbage collector starts running, it makes the assumption that all objects in the heap are garbage. In other words, it assumes that none of the application’s roots refer to any objects in the heap.
  1. The GC identifies live object references or application roots.
  2. It starts walking the roots and building a graph of all objects reachable from the roots.
  3. If the GC attempts to add an object already present in the graph, then it stops walking down that path. This serves two purposes. First, it helps performance significantly since it doesn’t walk through a set of objects more than once. Second, it prevents infinite loops should you have any circular linked lists of objects. Thus cycles are handled properly.
Once all the roots have been checked, the garbage collector’s graph contains the set of all objects that are somehow reachable from the application’s roots; any objects that are not in the graph are not accessible by the application, and are therefore considered garbage.
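
The mark phase can be pictured with a toy model. The sketch below is not how the CLR is implemented; Node and MarkSketch are hypothetical types that only mimic the idea of walking from the roots and skipping objects already in the graph.

```csharp
using System;
using System.Collections.Generic;

// Toy heap object: a name plus outgoing references (hypothetical type).
sealed class Node
{
    public readonly string Name;
    public readonly List<Node> References = new List<Node>();
    public Node(string name) { Name = name; }
}

static class MarkSketch
{
    // Returns the set of nodes reachable from the given roots.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            // Already in the graph: stop walking this path.
            // This is what keeps cycles from looping forever.
            if (!marked.Add(current)) continue;
            foreach (Node child in current.References)
                pending.Push(child);
        }
        return marked; // anything not in this set is garbage
    }
}
```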

Phase II: Compact
Move all the live objects to the bottom of the heap, leaving free space at the top.
Phase II includes the following steps:
  1. The garbage collector now walks through the heap linearly, looking for contiguous blocks of garbage objects (now considered free space).
  2. The garbage collector then shifts the non-garbage objects down in memory, removing all of the gaps in the heap.
  3. Moving the objects in memory invalidates all pointers to the objects. So the garbage collector modifies the application’s roots so that the pointers point to the objects’ new locations.
  4. In addition, if any object contains a pointer to another object, the garbage collector is responsible for correcting these pointers as well.
After all the garbage has been identified, all the non-garbage has been compacted, and all the non-garbage pointers have been fixed-up, a pointer is positioned just after the last non-garbage object to indicate the position where the next object can be added.
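
The compact phase can be sketched with the same kind of toy model. Again, this is only an illustration under simplifying assumptions: the heap is an array of equally sized cells, each cell refers to at most one other cell by index, and Cell and CompactSketch are made-up names.

```csharp
using System;

// Toy heap cell (hypothetical): a liveness mark and the index of one
// referenced cell, or -1 when it refers to nothing.
sealed class Cell
{
    public readonly string Name;
    public bool Live;
    public int Ref = -1;
    public Cell(string name, bool live) { Name = name; Live = live; }
}

static class CompactSketch
{
    // Slides live cells to the bottom of the heap, fixes up roots and
    // references, and returns the index where the next object would go.
    // Assumes roots and references of live cells point only at live cells.
    public static int Compact(Cell[] heap, int[] roots)
    {
        // Pass 1: compute the new address of every live cell.
        var forward = new int[heap.Length];
        int next = 0;
        for (int i = 0; i < heap.Length; i++)
            forward[i] = (heap[i] != null && heap[i].Live) ? next++ : -1;

        // Pass 2: moving objects invalidates pointers, so update them.
        for (int r = 0; r < roots.Length; r++)
            roots[r] = forward[roots[r]];
        foreach (Cell cell in heap)
            if (cell != null && cell.Live && cell.Ref >= 0)
                cell.Ref = forward[cell.Ref];

        // Pass 3: shift live cells down and clear everything else.
        for (int i = 0; i < heap.Length; i++)
        {
            if (forward[i] < 0) { heap[i] = null; }
            else if (forward[i] != i) { heap[forward[i]] = heap[i]; heap[i] = null; }
        }
        return next;
    }
}
```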

Finalization
The .NET Framework’s garbage collection implicitly keeps track of the lifetime of the objects that an application creates, but it cannot do so for the unmanaged resources (e.g. a file, a window or a network connection) that those objects encapsulate.

Unmanaged resources must be explicitly released once the application has finished using them. The .NET Framework provides the Object.Finalize method: a method that the garbage collector must run on the object to clean up its unmanaged resources before reclaiming the memory used by the object. Since the Finalize method does nothing by default, it must be overridden if explicit clean-up is required. The potential existence of finalizers complicates the job of garbage collection in .NET by adding some extra steps before an object can be freed.
Whenever a new object that has a Finalize method is allocated on the heap, a pointer to the object is placed in an internal data structure called the finalization queue. When an object is not reachable, the garbage collector considers the object garbage. The garbage collector scans the finalization queue looking for pointers to these objects. When a pointer is found, it is removed from the finalization queue and appended to another internal data structure called the F-reachable queue, which makes the object reachable again, so it is no longer treated as garbage. At this point, the garbage collector has finished identifying garbage. The garbage collector compacts the reclaimable memory, and a special runtime thread empties the F-reachable queue, executing each object’s Finalize method.
The next time the garbage collector is invoked, it sees that the finalized objects are truly garbage, and the memory for those objects is then simply freed.
Thus, when an object requires finalization, it dies, then lives (is resurrected) and finally dies again. It is recommended to avoid the Finalize method unless it is required. Finalize methods increase memory pressure because the memory and resources used by the object are not released until two garbage collections have taken place. Also, since you have no control over the order in which Finalize methods are executed, they may lead to unpredictable results.
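
One common way to limit the cost of finalization is to pair a finalizer with Dispose and call GC.SuppressFinalize once the cleanup has been done, so the object does not have to survive an extra collection. The sketch below is a simplified version of that idea; NativeHandleHolder is a hypothetical type and the IntPtr value is only a stand-in for a real unmanaged handle.

```csharp
using System;

// Hypothetical wrapper around an unmanaged resource, for illustration only.
class NativeHandleHolder : IDisposable
{
    private IntPtr handle = new IntPtr(1); // stand-in for a real OS handle

    public bool IsReleased { get { return handle == IntPtr.Zero; } }

    public void Dispose()
    {
        Release();
        // Cleanup is done, so the runtime can skip the finalizer and the
        // object can be reclaimed in a single collection.
        GC.SuppressFinalize(this);
    }

    // Finalizer (overrides Object.Finalize): a safety net that runs only
    // if Dispose was never called.
    ~NativeHandleHolder()
    {
        Release();
    }

    private void Release()
    {
        if (handle != IntPtr.Zero)
        {
            // A real type would call the matching native release function here.
            handle = IntPtr.Zero;
        }
    }
}
```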

Garbage Collection Performance Optimizations
  • Weak references
  • Generations
Weak References
When an object has only a weak reference to it, the garbage collector is free to collect the object when it runs, for example because memory is needed; if the application later attempts to access the object, the access will fail. To access a weakly referenced object, the application must first obtain a strong reference to it. If the application obtains this strong reference before the garbage collector collects the object, then the GC cannot collect the object because a strong reference to it exists.

The managed heap contains two internal data structures whose sole purpose is to manage weak references:
  1. Short weak reference table - an object that is referenced only through a short weak reference is treated as collected as soon as it becomes unreachable, without waiting for its finalization method to run.
  2. Long weak reference table - the garbage collector collects an object pointed to by the long weak reference table only after determining that the object’s storage is reclaimable; that is, if the object has a Finalize method, the method has already been called and the object was not resurrected.
These two tables simply contain pointers to objects allocated within the managed heap. Initially, both tables are empty. When you create a WeakReference, an empty slot in one of the weak reference tables is located to hold the pointer to the target object; short weak references use the short weak reference table and long weak references use the long weak reference table.
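
The following sketch shows how short and long weak references are created and how a strong reference is obtained before use. WeakReferenceSketch is a made-up name; WeakReference, its trackResurrection constructor argument and the Target property are the real framework API. The example deliberately keeps the target alive, so it does not demonstrate an actual collection.

```csharp
using System;

static class WeakReferenceSketch
{
    // Returns true when the target could be recovered through the weak reference.
    public static bool Demo()
    {
        var data = new byte[1024];

        // Short weak reference (trackResurrection = false, the default).
        var shortRef = new WeakReference(data);
        // Long weak reference (trackResurrection = true).
        var longRef = new WeakReference(data, true);

        // To use the target, first obtain a strong reference, then check it.
        object strong = shortRef.Target;
        bool available = strong != null;

        Console.WriteLine("Short tracks resurrection: " + shortRef.TrackResurrection);
        Console.WriteLine("Long tracks resurrection:  " + longRef.TrackResurrection);

        // Keep 'data' alive up to this point so the result is predictable.
        GC.KeepAlive(data);
        return available && longRef.IsAlive;
    }
}
```

In real code the check on Target matters: if the collector has already reclaimed the object, Target is null and the application has to recreate the data.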

Generations
Since garbage collection cannot complete without stopping the entire program, it can cause arbitrarily long pauses at arbitrary times during the execution of the program. Garbage collection pauses can also prevent programs from responding to events quickly enough to satisfy the requirements of real-time systems.
One feature of the garbage collector that exists purely to improve performance is called generations. A generational garbage collector takes into account two facts that have been empirically observed in most programs in a variety of languages:

  1. Newly created objects tend to have short lives; these are the objects in Gen 0 and Gen 1.
  2. The older an object is, the longer it is likely to survive; these are the objects in Gen 2.
Thus, as objects “mature” (survive multiple garbage collections) in their current generation, they are moved to the next older generation. Generation 2 is the maximum generation supported by the runtime’s garbage collector. When future collections occur, any surviving objects currently in generation 2 simply stay in generation 2.

Thus, dividing the heap into generations of objects and collecting and compacting younger generation objects improves the efficiency of the basic underlying garbage collection algorithm by reclaiming a significant amount of space from the heap and also being faster than if the collector had examined the objects in all generations.
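
You can watch an object move through the generations with GC.GetGeneration. The sketch below forces collections with GC.Collect purely for demonstration (not something to do in production code); GenerationSketch and TrackGenerations are hypothetical names, and the exact generation numbers observed depend on the runtime and its configuration.

```csharp
using System;

static class GenerationSketch
{
    // Records the generation of one surviving object before and after
    // two forced collections.
    public static int[] TrackGenerations()
    {
        var survivor = new object();
        var seen = new int[3];

        seen[0] = GC.GetGeneration(survivor); // freshly allocated
        GC.Collect();
        seen[1] = GC.GetGeneration(survivor); // survived one collection
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor); // survived two collections

        Console.WriteLine("Max generation: " + GC.MaxGeneration);
        Console.WriteLine("Observed: " + string.Join(", ", seen));

        GC.KeepAlive(survivor); // make sure it really is alive throughout
        return seen;
    }
}
```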

Reference

http://msdn.microsoft.com/en-us/magazine/bb985010.aspx