Finding Duplicate Documents in MongoDB
Recently I needed to create a new unique index on a MongoDB collection. However, the collection contained duplicate data, so I got the following error:
E11000 duplicate key error collection: db.collection index: index_name dup key: { key: "duplicate value" }
The error message lists only the first duplicated value, but there may be more duplicates. Instead of fixing them one by one, I used the aggregation pipeline to find them all so I could remove them.
You can connect to the database using the command line utility or whatever tool you prefer and use the following aggregation pipeline:
db.Demo.aggregate([
  // Group by the key and count the number of documents that share it
  {
    $group: {
      // To group by multiple fields, use: _id: { a: "$FirstName", b: "$LastName" }
      _id: "$Nickname",
      count: { $sum: 1 }
    }
  },
  // Keep only the groups with more than 1 item, which means that at least 2 documents have the same key
  {
    $match: {
      count: { $gt: 1 }
    }
  }
])
This pipeline outputs one document per duplicated key, with the key in _id and the number of matching documents in count.
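To give an idea of the shape of the result without a database at hand, here is a minimal sketch that emulates the two stages in plain JavaScript. The sample documents and the helper name findDuplicateKeys are hypothetical and only serve the illustration; the real work is still done by the aggregation pipeline on the server.

```javascript
// Hypothetical sample documents (assumption: not taken from a real collection).
const docs = [
  { _id: 1, Nickname: "alice" },
  { _id: 2, Nickname: "bob" },
  { _id: 3, Nickname: "alice" },
  { _id: 4, Nickname: "carol" },
  { _id: 5, Nickname: "bob" },
  { _id: 6, Nickname: "alice" },
];

// Plain JavaScript emulation of
//   { $group: { _id: "$<field>", count: { $sum: 1 } } }
// followed by
//   { $match: { count: { $gt: 1 } } }
function findDuplicateKeys(documents, field) {
  // Count how many documents share each value of the field
  const counts = new Map();
  for (const doc of documents) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }

  // Keep only the keys that appear more than once
  const result = [];
  for (const [key, count] of counts) {
    if (count > 1) {
      result.push({ _id: key, count });
    }
  }
  return result;
}

const duplicates = findDuplicateKeys(docs, "Nickname");
console.log(duplicates);
// [ { _id: 'alice', count: 3 }, { _id: 'bob', count: 2 } ]
```

With the real pipeline the documents have the same two fields, but the order in which the groups come back is not guaranteed.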
If you prefer something more visual, you can use MongoDB Compass and build the same aggregation pipeline there.
You can now remove the duplicate documents and add the unique index.
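How you remove the duplicates depends on your data, so take the following as a sketch under assumptions rather than a recipe. It assumes that keeping one arbitrary document per key and deleting the others is acceptable, which may not be true for your collection. The helper idsToDelete is a hypothetical name; the collection Demo and the field Nickname come from the pipeline above. The pipeline is extended with $push to collect the _id of every document in each group.

```javascript
// Same pipeline as before, but each group also collects the _id of its documents
const pipeline = [
  {
    $group: {
      _id: "$Nickname",
      ids: { $push: "$_id" },
      count: { $sum: 1 }
    }
  },
  { $match: { count: { $gt: 1 } } }
];

// Given the groups returned by the pipeline, list the _id values to delete.
// Assumption: the first _id of each group is kept. Which document comes first
// is arbitrary unless you sort before grouping.
function idsToDelete(groups) {
  return groups.flatMap((group) => group.ids.slice(1));
}

// This part only runs in the MongoDB shell, where the global `db` exists
if (typeof db !== "undefined") {
  const groups = db.Demo.aggregate(pipeline).toArray();

  // Delete every duplicate except the one kept per key
  db.Demo.deleteMany({ _id: { $in: idsToDelete(groups) } });

  // The unique index can now be created without the E11000 error
  db.Demo.createIndex({ Nickname: 1 }, { unique: true });
}
```

Before running the deletion on real data, I would print the result of idsToDelete(groups) and check a few documents by hand, or work on a backup first.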