Find and delete duplicate files in Bash

When I need to find duplicate files on my Linux server and delete them, I use the following procedure.

md5sum

This command calculates and prints the MD5 checksum of a file. If two files have identical contents, they will have the same hash.

To get the md5sum of a file, simply run:


# md5sum example.php
312e9f7d1d6600989f0d1ac8c72f1de7  example.php

In the above, 312e9f7d1d6600989f0d1ac8c72f1de7 is the md5 hash of the example.php file.
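As a quick check that identical contents really do produce identical checksums, here is a throwaway demo (a.txt and b.txt are made-up names, created just for this illustration):


# echo "hello" > a.txt
# cp a.txt b.txt
# md5sum a.txt b.txt
b1946ac92492d2347c6235b4d2611184  a.txt
b1946ac92492d2347c6235b4d2611184  b.txt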

Now, find all files with the same md5 hash as example.php, and store their filenames in a file.


# find /home/ -type f -exec md5sum {} + | grep 312e9f7d1d6600989f0d1ac8c72f1de7 | awk '{ print $2 }' > duplicates.txt

With the above command, we find every file whose md5 hash is 312e9f7d1d6600989f0d1ac8c72f1de7 and write the second column of md5sum's output (the filename) to a file called duplicates.txt.
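One caveat: awk's $2 prints only the first whitespace-separated word, so the command above mangles paths that contain spaces. A space-safe sketch of the same step, matching the hash in awk and stripping the leading hash plus md5sum's two-space separator instead of selecting a column:


# find /home/ -type f -exec md5sum {} + | awk '$1 == "312e9f7d1d6600989f0d1ac8c72f1de7" { sub(/^[^ ]+  /, ""); print }' > duplicates.txt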

Now we loop through duplicates.txt and delete each file one by one. Review the list first and remove the copy you want to keep, since every file it lists, including the original, will be deleted.


while IFS= read -r f; do   # one filename per line; quoting protects paths with spaces
  rm -f "$f"
done < duplicates.txt
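If you don't know in advance which hash to search for, you can list every duplicated file in one pass by sorting the checksums and letting uniq compare only the 32-character hash column. A sketch, assuming GNU coreutils (uniq's -w and -D flags are GNU extensions):


# find /home/ -type f -exec md5sum {} + | sort | uniq -w32 -D

Every line this prints belongs to a group of files sharing the same md5 hash, so you can choose which copies to delete.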
