Remove duplicate documents from MongoDB by keys

Background

To ensure there are no duplicate documents, we can create a unique index.
However, creating the index fails with an error when the collection already contains duplicate entries:

MongoDB cannot create a unique index on the specified index field(s) if the collection already contains data that would violate the unique constraint for the index.
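
To tell this failure apart from other errors, one option is to check the error code. The sketch below assumes the driver exposes the server error code on a `code` property of the thrown error, and that a unique-constraint violation is reported as code 11000. `isDuplicateKeyError` is a hypothetical helper name, not part of the driver.

```typescript
// Server error code MongoDB uses for unique-constraint violations.
const DUPLICATE_KEY_ERROR_CODE = 11000;

// Hypothetical helper: returns true when the thrown value looks like a
// duplicate key error. Assumes the error object carries a numeric `code`.
function isDuplicateKeyError(error: unknown): boolean {
  return (
    typeof error === 'object' &&
    error !== null &&
    (error as { code?: unknown }).code === DUPLICATE_KEY_ERROR_CODE
  );
}
```

Wrapping the `createIndex` call in a `try`/`catch` and testing the caught value with this helper would let us run the cleanup only when duplicates are the actual cause.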

Remove the duplicates by keys

const collection = db.collection('MyCollection');
const operations: Promise<any>[] = [];
console.time('aggregation');

await collection
  .aggregate([
    {
      // Group documents that share the same key1/key2 pair and
      // collect the _id of every document in each group.
      $group: {
        _id: {
          key1: '$key1',
          key2: '$key2',
        },
        dups: {
          $push: '$_id',
        },
        count: {
          $sum: 1,
        },
      },
    },
    {
      // Keep only the groups that contain more than one document.
      $match: {
        _id: {
          $ne: null,
        },
        count: {
          $gt: 1,
        },
      },
    },
  ])
  .forEach((doc) => {
    console.log(doc);
    // Keep the first document of each group and queue the rest for deletion.
    doc.dups.slice(1).forEach((duplicateId: string) => {
      operations.push(collection.deleteOne({ _id: duplicateId }));
    });
  });

console.timeEnd('aggregation');

// Wait for all queued deletions to finish.
console.time('remove duplicate');
await Promise.all(operations);
console.timeEnd('remove duplicate');
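
When there are many duplicates, issuing one `deleteOne` per document can mean a lot of round trips. A possible alternative, sketched below, is to collect the ids first and delete them in batches. The helper names `idsToRemove` and `chunk`, the `DuplicateGroup` type, and the batch size of 1000 are assumptions made for illustration; they are not part of the original script.

```typescript
// Shape of one document produced by the $group/$match pipeline above.
interface DuplicateGroup<Id> {
  dups: Id[];
}

// Keep the first _id of every group and return all the others.
function idsToRemove<Id>(groups: DuplicateGroup<Id>[]): Id[] {
  const ids: Id[] = [];
  for (const group of groups) {
    for (const id of group.dups.slice(1)) {
      ids.push(id);
    }
  }
  return ids;
}

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Possible usage, assuming `groups` holds the aggregation results:
//
// for (const batch of chunk(idsToRemove(groups), 1000)) {
//   await collection.deleteMany({ _id: { $in: batch } });
// }
```

Each `deleteMany` call with `$in` removes a whole batch in one request, which trades the parallel `deleteOne` calls for fewer, larger operations.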

Create the unique index

await collection.createIndex({ key1: 1, key2: 1 }, { unique: true, name: 'unique_index' });