Thursday, May 31, 2012

DataTable.Select() Vs DataView.RowFilter() Performance Test – Filtering Rows


In this post DataTable.Select() Vs DataView.RowFilter() Performance Test – Filtering Rows we, shall compare the performance of DataTable.Select() and DataView.RowFilter() in populating a GridView.


Asp.Net provides a number of ways to fill a GridView, like DataReader, DataTable, DataView etc, each one has its own advantages and disadvantages, we shall now see the difference in the execution time between DataTable.Select() and DataView.RowFilter()

The configuration of the system used to perform the evaluation is as follows.

OS
Windows XP Professional 2002 SP3
Processor
Pentium® D 2.66 GHz
RAM
3 GB




The performance test was carried out for 50iterations with both DataTable and DataView; the results of the test are as follows.

Total Records in Table: 3000
Matching Records: 643
DataTable.Select() Average time:       50.26 Milliseconds
DataView.RowFilter() Average time:   43.88  Milliseconds





Inference
DataView.RowFilter() performs better than DataTable.Select() when the DataView is sorted based on the column to be filtered, this is because when the DataView is sorted, an index is created based on the sorted column, hence it is advisable to SORT a dataview and use the same for filtering especially when we want to perform multiple filters based on the same column.

The code used to perform the test is as follows.



protected void Page_Load(object sender, EventArgs ea)
{
    string strConn = ConfigurationManager.ConnectionStrings["EmployeesConnectionString"].ToString();
    objConn = new SqlConnection(strConn);
    strQuery = "SELECT * FROM Employee";
    objCmd = new SqlCommand(strQuery, objConn);

    objDA = new SqlDataAdapter(objCmd);
    dsEmployee = new DataSet();
    objDA.Fill(dsEmployee, "dtEmployee");
    //
    DataColumn[] arrPrimaryKey = new DataColumn[1];
    arrPrimaryKey[0] = dsEmployee.Tables["dtEmployee"].Columns["ID"];
    dsEmployee.Tables["dtEmployee"].PrimaryKey = arrPrimaryKey;
    //
    dvEmployee = new DataView(dsEmployee.Tables["dtEmployee"]);
    dvEmployee.Sort = "Name";
    //
    Response.Write("Test Block 1
"
);
    for (int i = 0; i < 50; i++)
    {
        TestBlock1();
    }
    //
    Response.Write("

Test Block 2
"
);
    for (int i = 0; i < 50; i++)
    {
        TestBlock2();
    } 
}
//
private void TestBlock1()
{
    timer = new Stopwatch();
    timer.Start();
    //
    dvEmployee.RowFilter = "Name='John'";
    grdEmployees.DataSource = dvEmployee;
    grdEmployees.DataBind();
    //
    timer.Stop();
    //
    Response.Write(timer.ElapsedMilliseconds.ToString() + "
"
);
}
//
private void TestBlock2()
{
    timer = new Stopwatch();
    timer.Start();
    //
    DataRow[] drEmp = dsEmployee.Tables["dtEmployee"].Select("Name='John'");
    //
    grdEmployees.DataSource = drEmp.CopyToDataTable();
    grdEmployees.DataBind();
    timer.Stop();
    //
    Response.Write(timer.ElapsedMilliseconds.ToString() + "
"
);
}


That’s it we have seen the performance difference between the DataTable.Select() and the DataView.RowFilter() in loading a GridView.

Related Posts


Search Flipkart Products:
Flipkart.com

7 comments:

Thomas said...

You indeed point to a common problem - DataTable.Select(), esp. when invoked within a loop, can really screw performance on large DataTables. However, I think you are also making some assumptions which are not quite correct:

- Judging from the chart, I don't see how DataView.RowFilter() is significantly faster than DataTable.Select(). With different runtime complexities, the chart lines should differ a lot more.
- Measuring execution time of something like 50ms with StopWatch can be error-prone, as its minimum resolution typically is around 15ms. Why not just do thousands of lookups on a DataTable containing 100,000 rows, and measure the overall time? This will also show how performance develops with larger amounts of data.
- Why include CopyToDataTable() / DataBind() within the StopWatch scope? This will distort results.
- The real benefit of indexed lookup comes when MULTIPLE lookups happen on ONE existing index. Actually, building up an index and looking up only one value necessarily is slower than just scanning the whole table once for the value.
- As your filter-criteria value most likely is going to change, instead of applying DataView.RowFilter, you should probably use DataView.Sort (invoked ONCE to create the index) in combination with DataView.FindRows() (invoked N-times on the same DataView to lookup different index values). This article explains it in greater detail. Or, if you prefer HashMaps to TreeMaps, create your own Dictionary for fast lookup.

HTH!

Kind regards!

Thomas said...

I should correct one of the above statements, the .NET StopWatch class actually does support high-resolution timers depending on the underlying operating system. High-resolution data can be calculated by invoking StopWatch.ElapsedTicks / StopWatch.Frequency.

Prakash B said...

Hi Thomas,

Thanks for your Comments.

In this case, there is no significant difference between DataView.RowFilter() and DataTable.Select()

The DataView.RowFilter() comes in handly when you want to perform multiple searched based on the same column, The index is created initially and will be re-used for the successive searches.

Thomas said...

Hello Prakash,

thanks for your reply, interesting! Are you sure the index is re-used when the RowFilter contains the same columns? This seems technically possible, but the MSDN documentation kind of indicates that this might not the case (although it's not 100% clear):

"If you want to return the results of a particular query on the data, as opposed to providing a dynamic view of a subset of the data, use the Find or FindRows methods of the DataView to achieve best performance rather than setting the RowFilter property. Setting the RowFilter property rebuilds the index for the data, adding overhead to your application and decreasing performance. The RowFilter property is best used in a data-bound application where a bound control displays filtered results."

From: http://msdn.microsoft.com/en-us/library/13wb36xf(v=vs.100).aspx

Cheers!

Prakash B said...

Hi Thomas,

The MSDN link says

"The Find and FindRows methods leverage the current index without requiring the index to be rebuilt."

This holds good, when we want to filter data based on the existing index, in this case the Find/FindRows will have a better performance, but if we want to search the table based on some other column which does not have an index, then better to use the RowFilter, since it will create the index initially and later re-use the index for the successive searches, therby improving Performance.

Once you set the RowFilter for the first time, the index is created, then use the same View without resetting the RowFilter.

Thanks
B. Prakash.

priya said...

Excellent blog, I wish to share your post with my folks circle. It’s really helped me a lot, so keep sharing post like this
Selenium training in Chennai
Selenium training in Bangalore
Selenium training in Pune
Selenium Online training

Softgen Infotech said...

Wonderful thanks for sharing an amazing idea. keep it...

Looking for Data Warehousing Training in Bangalore, learn from Softgen Infotech provide Data Warehousing Training on online training and classroom training. Join today!