October, 2016 - Damchey's Blog

When I need to find and delete duplicate files on my Linux server, I go through the following procedure.

md5sum

This command calculates and outputs the MD5 checksum of a file. Two files with identical contents always have the same hash, so a matching hash is a strong sign that the files are duplicates.

To get the MD5 checksum of a file, simply do the following:


# md5sum example.php

312e9f7d1d6600989f0d1ac8c72f1de7  example.php

In the above, 312e9f7d1d6600989f0d1ac8c72f1de7 is the md5 hash of the example.php file.
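To see the "same contents, same hash" idea in action, here is a small sketch that works in a throwaway directory. The file names a.txt, b.txt and c.txt and the variable names are made up for illustration; nothing outside the temporary directory is touched.

```shell
# Work in a scratch directory so no real files are involved.
workdir=$(mktemp -d)

printf 'hello\n'  > "$workdir/a.txt"
cp "$workdir/a.txt" "$workdir/b.txt"    # identical copy of a.txt
printf 'hello!\n' > "$workdir/c.txt"    # different contents

# md5sum prints "<hash>  <filename>"; the hash is the first 32 characters.
hash_a=$(md5sum "$workdir/a.txt" | cut -c1-32)
hash_b=$(md5sum "$workdir/b.txt" | cut -c1-32)
hash_c=$(md5sum "$workdir/c.txt" | cut -c1-32)

if [ "$hash_a" = "$hash_b" ]; then echo "a.txt and b.txt have the same hash"; fi
if [ "$hash_a" != "$hash_c" ]; then echo "a.txt and c.txt have different hashes"; fi

# Clean up the scratch directory.
rm -rf "$workdir"
```

The file name plays no part in the hash; only the contents do, which is why the copy matches even though it has a different name.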

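Now find all files with that same MD5 hash and store their filenames in a file.


# find /home/ -type f -exec md5sum {} + | grep '^312e9f7d1d6600989f0d1ac8c72f1de7' | cut -c35- > duplicates.txt

The above command hashes every file under /home/, keeps the lines that start with the hash 312e9f7d1d6600989f0d1ac8c72f1de7, and writes the filename part of each line (everything from the 35th character on, after the 32-character hash and its two-character separator) to a file called duplicates.txt. Taking the filename by character position rather than by whitespace-separated column keeps filenames that contain spaces intact. Note that if example.php itself lives under /home/, it will appear in the list too.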
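A slightly more careful variant of this search could be wrapped in a function. This is only a sketch: the name find_duplicates_of and its two arguments (a reference file and a directory to search) are my own invention, and it assumes GNU md5sum, find and cmp. It leaves the reference file out of the results and confirms each hash match with cmp, since two different files can in principle share an MD5 hash.

```shell
# Hypothetical helper: print every file under DIR whose contents are
# identical to REF, leaving REF itself out of the list.
find_duplicates_of() {
    local ref="$1" dir="$2" ref_hash
    # Hash via stdin so the output is just "<hash>  -".
    ref_hash=$(md5sum < "$ref" | cut -c1-32)

    find "$dir" -type f -exec md5sum {} + |
        grep "^$ref_hash" |
        cut -c35- |
        while IFS= read -r candidate; do
            # Skip the reference file itself (or a hard link to it).
            if [ "$candidate" -ef "$ref" ]; then
                continue
            fi
            # Confirm the match byte for byte before reporting it.
            if cmp -s -- "$ref" "$candidate"; then
                printf '%s\n' "$candidate"
            fi
        done
}
```

It could be used as `find_duplicates_of example.php /home/ > duplicates.txt`. GNU md5sum escapes filenames that contain backslashes or newlines and starts those lines with a backslash, so such files are silently skipped by this sketch.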

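Now we loop through the duplicates.txt file and delete each file one by one. Every file listed will be removed, so review the list first and take out the copy you want to keep.


while IFS= read -r f;
do rm -f -- "$f";
done < duplicates.txt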
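The procedure above deals with one known hash at a time. When I don't know in advance which files are duplicated, the same find and md5sum output can be sorted and grouped by hash instead. The sketch below assumes GNU coreutils, because the -w and --all-repeated options of uniq are GNU extensions; the function name list_duplicate_groups is made up for illustration. It only lists files and deletes nothing.

```shell
# Hypothetical helper: print groups of files under DIR that share an MD5 hash.
# Each group is separated from the next by a blank line.
list_duplicate_groups() {
    find "$1" -type f -exec md5sum {} + |
        LC_ALL=C sort |                        # identical hashes become adjacent lines
        uniq -w32 --all-repeated=separate      # compare only the 32-character hash
}
```

Running `list_duplicate_groups /home/` prints each set of same-hash files together, so I can decide which copy in each group to keep before deleting anything.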


Find and delete duplicate files in Bash