Data cleansing is the process of detecting and correcting or removing corrupt, incomplete, inaccurate or duplicate records from a database. Working closely with you, we can create a unique solution to help cleanse your data.
- The actual process of data cleansing may involve removing spelling mistakes or validating and correcting values against a known list of valid entries. The validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records).
- Some data cleansing solutions clean data by cross-checking it against already validated data.
- Data cleansing may also involve activities such as harmonization and standardization of data. Harmonization means, for example, expanding short codes (St, Ltd, etc.) to the actual words (Street, Limited). Standardization of data means eliminating duplicate and repeated entries.
- Data cleansing differs from data validation in that validation means data is checked on entry and rejected if it does not comply. Once cleansing is completed, proper validation is essential to ensure future cleansing is not required.
- CDE also offers data enhancement: making data more complete by adding related information, for example appending to each address the phone numbers related to that address.
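
To make the steps above concrete, here is a minimal Python sketch of harmonization, strict validation, fuzzy correction and de-duplication. It is an illustration only, not a description of any particular cleansing product: the abbreviation map, the list of known cities, the five-digit postal code rule and all function names are assumptions chosen for the example.

```python
import re
from difflib import get_close_matches

# Hypothetical abbreviation map, for illustration only.
ABBREVIATIONS = {"st": "Street", "ltd": "Limited"}

# Hypothetical list of already validated city names.
KNOWN_CITIES = ["Springfield", "Riverton", "Fairview"]

# Deliberately simplified postal code rule (assumption): exactly five digits.
POSTAL_CODE = re.compile(r"^\d{5}$")


def harmonize(text):
    """Expand known short codes (St, Ltd) to full words.

    Simplification: trailing "." or "," on a replaced word is dropped,
    and context is ignored (so "St" is always read as "Street").
    """
    words = []
    for word in text.split():
        key = word.rstrip(".,").lower()
        words.append(ABBREVIATIONS.get(key, word))
    return " ".join(words)


def is_valid(record):
    """Strict validation: accept only records with a valid postal code."""
    return bool(POSTAL_CODE.match(record.get("postal_code") or ""))


def fuzzy_correct(value, known, cutoff=0.8):
    """Return the closest known value, or the input unchanged if none is close."""
    matches = get_close_matches(value, known, n=1, cutoff=cutoff)
    return matches[0] if matches else value


def cleanse(records):
    """Validate, harmonize, fuzzy-correct and de-duplicate a list of records."""
    cleaned, seen = [], set()
    for record in records:
        if not is_valid(record):
            continue  # strict rule: reject the record outright
        rec = dict(record)  # work on a copy; leave the input untouched
        rec["address"] = harmonize(rec["address"])
        rec["city"] = fuzzy_correct(rec["city"], KNOWN_CITIES)
        key = (rec["address"].lower(), rec["city"].lower(), rec["postal_code"])
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append(rec)
    return cleaned


records = [
    {"address": "12 High St", "city": "Springfeld", "postal_code": "12345"},
    {"address": "12 High Street", "city": "Springfield", "postal_code": "12345"},
    {"address": "5 Mill Rd", "city": "Fairview", "postal_code": "n/a"},
]
result = cleanse(records)
```

In this sketch the first two records collapse into one once the short code and the misspelled city are corrected, and the third is rejected by the strict postal code rule. A real cleansing job would need country-specific postal code rules and a reviewed reference list rather than the toy values used here.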