Can anyone recommend a good, safe duplicate file finder please? After many years I think I should clean up some file storage systems containing many trips and old downloads from SkyDemon.
My quick recipe for that is:
find /path/to/directory -type f -print0 | xargs -0 md5sum | sort
I then eyeball the first column of the output; duplicates are usually rather obvious. If there are too many to eyeball, then some sh/awk/perl can easily parse the above and print only the names of the duplicates (see the awk sketch below).
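For instance, a minimal awk sketch, assuming GNU md5sum's output format of a 32-character hash, two spaces, then the filename; it prints the second and later copies of each hash:

find /path/to/directory -type f -print0 | xargs -0 md5sum | sort |
    awk 'seen[$1]++ { print substr($0, 35) }'   # filename starts at column 35

If you want to see whole duplicate groups rather than just the extra copies, GNU uniq can do it too: pipe the sorted md5sum output through "uniq -w32 -D" to print every line whose first 32 characters (the hash) are repeated.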
Why not use -exec instead?
find /path/to/dir -type f -exec md5sum {} \; | sort
@Fenland_Flyer which operating system do you need it for?
alioth wrote:
Why not use -exec instead?
The candid, honest answer is habit: I have the pipe/xargs construction in my “automatic thought”.
The smart-ass answer is that it is a bit more efficient (less CPU time), since it spawns fewer /usr/bin/md5sum processes. find’s “-exec … \;” calls md5sum separately on each file, while the xargs construction calls md5sum on groups of files. To which one could answer that -exec supports “+” instead of “\;”, but I’m not sure whether that construct is safe against the OS maximum command line length (i.e. whether it will try to build a command line bigger than the OS maximum), which xargs is known to handle.
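For reference, the batched form would look like this:

find /path/to/dir -type f -exec md5sum {} + | sort

(As it happens, POSIX requires the “+” form of -exec to keep each invocation within the system’s {ARG_MAX} limit, so in that respect it should be just as safe as xargs.)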
Funny how everybody assumes you are running Unix.
It’s like asking about sorting text strings and somebody posts an algorithm in COBOL 66.
Peter wrote:
It’s like asking about sorting text strings and somebody posts an algorithm in COBOL 66.
Whereas assuming Windows is like posting an algorithm in Fortran II?
Peter wrote:
Funny how everybody assumes you are running Unix.
It works just as well on Linux. And on MacOS if you install md5sum.
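If you’d rather not install anything on MacOS, the stock BSD md5 tool can mimic md5sum; a sketch, assuming its -r flag, which reverses the output to md5sum’s hash-then-filename order:

find /path/to/directory -type f -print0 | xargs -0 md5 -r | sort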
Those are all Unix-based.
Peter wrote:
Those are all Unix-based.
MacOS (or Darwin, actually) is by quite a long shot based on BSD Unix. Linux is a separate system, although it has a high degree of compatibility with Unix. Anyway, if you include Linux and Darwin as being “Unix-based”, then I don’t understand the comparison to COBOL 66.