Wednesday, May 25, 2011

Remove duplicate rows in SQL Server


Sometime we face problem of duplicate records in SQL server database. In this scenario, we need to remove duplicate rows.

There are many ways available to remove duplicate rows from the database like
  • Using temporary table – In this you have to transfer data into temporary table which have identity column so you can identify record uniquely and remove duplicate rows. But it gives more loads on SQL if you have lots of data in your table.
  • Using adding Identity column into existing database

All above options gives very heavy load to SQL server. You can remove duplicate rows using following method:
You just have to create in-memory table which just create schema of your table and add new row number column. Using row number column, you can remove duplicate rows.

With [Temp_Table_Name] as
(
Select RN = row_number() over (PARTITION by col1, col2, col3 order by col1, col2, col3), * from [Table_Name]
)

delete from [Temp_Table_Name] where RN > 1

The above script add new column of row number in your table. This is just in-memory change. You can remove duplicate rows which actually remove from your table.

Here I used Row_Number function with “Order by” and “Partition by”.

No comments:

Post a Comment