I answered this question on LinkedIn:

Deduping a database from scratch. How would you go about it?

Our customer database has quite a few duplicate records in it (both businesses and contacts). How would you go about the initial dedupe? What steps would you put in place to ensure records aren’t duplicated in the future? Is there any tech to help do this? Thanks!

My answer:

Each CRM has third-party tools that help with the mechanics of this.

You should examine:

1) Which matching criteria (and how many) do you believe are the most consistent?
2) Do you want the computer to automatically replace older records with the newest one?
3) What do you do with the “leftovers”? For example, if two records for the same person have different phone numbers and you replace one, how do you record the “leftover”?
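For questions 2 and 3, here is a minimal Python sketch of one possible policy, not the only one. It assumes each record is a dict with an `id` and an `updated_at` date (hypothetical field names; your CRM will have its own). The newer record wins any conflict, blanks are filled from the older record, and each conflicting older value is kept in a `leftovers` list instead of being thrown away.

```python
from datetime import date


def merge_records(a: dict, b: dict) -> dict:
    """Merge two records already judged to be the same person.

    The record with the later 'updated_at' (hypothetical field) wins any
    conflicting field; the losing values are saved in a 'leftovers' list.
    """
    newer, older = (a, b) if a["updated_at"] >= b["updated_at"] else (b, a)
    merged = dict(newer)  # start from a copy of the newer record
    leftovers = []
    for field, old_value in older.items():
        if field in ("id", "updated_at"):
            continue  # bookkeeping fields, not contact data
        new_value = merged.get(field)
        if new_value in (None, ""):
            merged[field] = old_value  # fill a blank field from the older record
        elif old_value not in (None, "") and old_value != new_value:
            # Conflict: keep the newer value, but remember the older one.
            leftovers.append({"field": field, "value": old_value, "from_id": older["id"]})
    merged["leftovers"] = leftovers
    return merged


old = {"id": 1, "name": "Ann Lee", "phone": "555-0100", "email": "", "updated_at": date(2020, 1, 5)}
new = {"id": 2, "name": "Ann Lee", "phone": "555-0199", "email": "ann@acme.com", "updated_at": date(2023, 6, 1)}
merged = merge_records(old, new)
print(merged["phone"], merged["leftovers"])
```

Here the merged record keeps the newer phone number and the older record's e-mail gap is filled from the newer one, while the old number survives as a leftover tagged with the id it came from, so nobody has to guess later where it went.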

How do you eliminate duplicates completely? Hmm… that’s a tough one.

As a start, you could match on e-mail address. Of course, what if a client gives a personal rather than a business address on one record but not on the other? Company and contact name are also good starting points.
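A minimal sketch of those two match keys, assuming simple string fields. `email_key` and `name_company_key` are hypothetical helpers, and the list of company suffixes they strip is illustrative, not complete. Lowercasing the whole address is a common practical choice, although strictly speaking the part before the @ can be case-sensitive.

```python
import re


def email_key(email: str) -> str:
    """Normalize an e-mail address for matching: trim whitespace, lowercase."""
    return email.strip().lower()


def _clean(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def name_company_key(first: str, last: str, company: str) -> str:
    """Fallback match key from contact name plus company name.

    Strips a few common legal suffixes (an illustrative, incomplete list)
    so that "Acme Inc." and "ACME" produce the same key.
    """
    company_clean = re.sub(r"\b(inc|llc|ltd|corp)\b", "", _clean(company))
    return "|".join([_clean(first), _clean(last), " ".join(company_clean.split())])


work = {"email": "John.Smith@Acme.com ", "first": "John", "last": "Smith", "company": "Acme Inc."}
home = {"email": "jsmith1975@gmail.com", "first": "john", "last": "SMITH", "company": "ACME"}

print(email_key(work["email"]) == email_key(home["email"]))  # False: different addresses
print(name_company_key(work["first"], work["last"], work["company"])
      == name_company_key(home["first"], home["last"], home["company"]))  # True
```

The two John Smith records fail the e-mail match (business vs. personal address), but the name-plus-company key still lines them up, which is why it helps to have more than one criterion.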

I believe you have to set a few criteria (say, three) and flag a possible duplicate whenever any one of them meets the duplicate standard; then let the user decide.