  • David Sage

Deduplication: Streamlining Databases and Legal Discovery


Deduplication is the process of identifying and removing duplicate records or data from a database. It's a crucial step in maintaining data integrity and optimizing storage capacity. In legal discovery, deduplication is used to streamline the review process and reduce costs. Let's take a closer look at how deduplication works and its benefits.

Deduplication in Databases

In databases, deduplication prevents the same data from being stored more than once, which would otherwise lead to inconsistencies, errors, and wasted storage space. It works by comparing records, often by a key or a hash of their contents, and eliminating those that are identical. There are two main types of deduplication:

Inline deduplication: this type of deduplication occurs in real time, as data is being added to the database. The system checks whether the incoming data already exists in the database and, if it is a duplicate, discards it instead of storing a second copy.

Offline deduplication: this type of deduplication occurs periodically, on a schedule, or as needed. The system scans the database for duplicates and removes them.
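The two approaches can be illustrated with a minimal Python sketch. It is not tied to any particular database product: the names `record_key`, `InlineDedupStore`, and `offline_dedup` are hypothetical, and the rule that two records are "identical" when their fields match after trimming whitespace and lowercasing is an assumption made purely for illustration.

```python
import hashlib


def record_key(record):
    """Hash a normalized form of the record so equal records share one key.

    Assumption: values are compared after trimming whitespace and lowercasing.
    """
    normalized = "|".join(
        f"{field}={str(value).strip().lower()}"
        for field, value in sorted(record.items())
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class InlineDedupStore:
    """Inline deduplication: every insert is checked before it is stored."""

    def __init__(self):
        self._rows = {}

    def insert(self, record):
        key = record_key(record)
        if key in self._rows:
            return False  # duplicate: discard instead of storing a second copy
        self._rows[key] = record
        return True

    def rows(self):
        return list(self._rows.values())


def offline_dedup(rows):
    """Offline deduplication: scan already-stored rows, keep the first of each."""
    seen = set()
    unique = []
    for row in rows:
        key = record_key(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

The inline version pays a small lookup cost on every write, while the offline version lets duplicates accumulate until the next scan. Which trade-off is better depends on the workload.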

Deduplication can significantly reduce storage costs, improve database performance, and enhance data quality.

Deduplication in Legal Discovery

In legal discovery, deduplication is a crucial step in the review process. During legal discovery, multiple copies of the same document or email can be collected from various sources, such as custodians, backups, or archives. Reviewing each copy individually can be time-consuming and costly.

Deduplication in legal discovery involves identifying and removing duplicate copies of documents or emails. This process reduces the number of documents that need to be reviewed and allows legal teams to focus on relevant documents. Deduplication also supports consistent handling of privileged or confidential documents: a decision made on one copy can be applied to all of its duplicates, which lowers the risk that a withheld document is produced to the opposing party through an overlooked copy.

Deduplication can save time and reduce costs during legal discovery. However, it's important to use reliable and defensible deduplication methods to ensure accuracy and completeness.
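One common way to make deduplication repeatable is to compare cryptographic hashes of file contents and to keep a record of every copy that was suppressed. The sketch below assumes documents arrive as hypothetical `(custodian, path, content_bytes)` tuples. The function name `dedupe_collection` and the choice of SHA-256 are illustrative and do not describe any specific review platform.

```python
import hashlib
from collections import defaultdict


def dedupe_collection(documents):
    """Keep the first copy of each document and log every suppressed copy.

    documents: iterable of (custodian, path, content_bytes) tuples (assumed shape).
    Returns (kept, suppressed), both keyed by the SHA-256 digest of the content.
    """
    kept = {}
    suppressed = defaultdict(list)
    for custodian, path, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        if digest in kept:
            # Record where the duplicate came from so the decision can be audited.
            suppressed[digest].append((custodian, path))
        else:
            kept[digest] = (custodian, path)
    return kept, dict(suppressed)
```

Note that this only catches byte-for-byte identical files. Two copies of the same email that differ in headers or attachment encoding would hash differently, so real workflows typically need additional rules on top of a sketch like this.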


Deduplication is an essential process in databases and legal discovery. It can help optimize storage capacity, enhance data quality, and streamline the review process. By understanding how deduplication works and its benefits, organizations can improve their data management practices and reduce costs.

The concept of deduplication can be extended to include the removal of files or data points that are functionally identical but stored in technically different formats:

A batch of discovery documents may include several digital photographs whose contents and composition are identical but that are stored in multiple file formats. For example, newer Apple devices store images by default in the ‘HEIC’ format, a relatively recent image format that generally takes up less space than an equivalent ‘JPG’ image. During retrieval and processing of these HEIC images, a duplicate conversion to JPG may occur, resulting in two separate files whose contents are actually the same. If the images are particularly large and/or numerous, deduplicating in this manner

