Duplicate file finder

Can anyone recommend a good, safe duplicate file finder please? After many years I think I should clean up some file storage systems containing many trips and old downloads from SkyDemon.

UK, United Kingdom

my quick recipe for that is:

find /path/to/directory -type f -print0 | xargs -0 md5sum | sort

I then eyeball the first column of the output; duplicates are usually rather obvious. If there are too many to eyeball, then some sh/awk/perl should easily parse the above and print out only the names of duplicates.
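The sh/awk step mentioned above could look something like the following. This is only a minimal sketch: `find_dupes` is a made-up helper name, and it assumes GNU-style `md5sum` output (32 hex digits, two separator characters, then the file name) and file names without newlines or backslashes, which `md5sum` would otherwise escape.

```shell
# find_dupes DIR: print the names of files under DIR whose MD5 checksums
# match, one group per block, with groups separated by a blank line.
# (Hypothetical helper name; not from the thread.)
find_dupes() {
    find "$1" -type f -print0 | xargs -0 md5sum | sort | awk '
        {
            hash = $1
            # The name starts at column 35: 32 hash characters plus 2 separators.
            name = substr($0, 35)
            if (hash == prev_hash) {
                # Same checksum as the previous line: print the first
                # member of the group once, then every later member.
                if (!printed) { print prev_name; printed = 1 }
                print name
            } else {
                # New checksum: close the previous group, if one was printed.
                if (printed) print ""
                printed = 0
            }
            prev_hash = hash
            prev_name = name
        }'
}
```

Usage would be `find_dupes /path/to/directory`. Matching checksums are a strong hint rather than proof, so comparing candidates with `cmp` before deleting anything seems prudent.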

ELLX

Why not use -exec instead?

find /path/to/dir -type f -exec md5sum {} \; | sort

Andreas IOM

@Fenland_Flyer which operating system do you need it for?

EGTR

alioth wrote:

Why not use -exec instead?

The candid, honest answer is habit: I have the pipe/xargs construction in my “automatic thought”.

The smart-ass answer is that it is a bit more efficient (less CPU time), since it spawns fewer /usr/bin/md5sum processes. The “-exec ... \;” form of find calls md5sum separately on each file, while the xargs construction calls md5sum on groups of files. To which one could answer that -exec supports \+ instead of \; but I’m not sure whether that construct is safe against OS maximum command-line lengths (that is, whether it might try to construct a command line bigger than the OS maximum), which xargs is.
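To make the process-count difference concrete, here is a small sketch that counts how often the command is started under each style. The function names are invented for illustration, and `sh -c 'echo run'` merely stands in for md5sum so that each invocation prints exactly one line. On the open question above: as far as I know, POSIX specifies that `-exec ... {} +` groups path names into sets that stay within the system's ARG_MAX limit, much as xargs does, and the `+` does not need a backslash.

```shell
# Each function prints how many times the stand-in command was started.
# (Function names are hypothetical; "sh -c 'echo run'" replaces md5sum.)

# One process per file: the classic "-exec ... \;" form.
runs_per_file() {
    find "$1" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' '
}

# Batched: find appends as many file names as fit on one command line.
runs_batched() {
    find "$1" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' '
}

# Batched by xargs, as in the original recipe.
runs_xargs() {
    find "$1" -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' '
}
```

On a directory holding a handful of files, the first function should report one run per file and the other two a single run each; the batched counts only rise above one when the file names no longer fit on one command line.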

ELLX

Funny how everybody assumes you are running unix

It’s like asking about sorting text strings and somebody posts an algorithm in COBOL 66.

Administrator
Shoreham EGKA, United Kingdom

It’s like asking about sorting text strings and somebody posts an algorithm in COBOL 66

Whereas assuming Windows is like posting an algorithm in Fortran II?

LFMD, France

Peter wrote:

Funny how everybody assumes you are running unix

It works just as well on Linux. And on MacOS if you install md5sum.
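For macOS without md5sum installed, one possible workaround is the stock BSD `md5` tool, whose `-r` option puts the hash first as md5sum does. The wrapper below is a sketch under that assumption; `hash_files` is a made-up name, and the macOS branch relies on `md5 -r` behaving as described in its man page.

```shell
# hash_files DIR: print "<hash> <name>" lines for every file under DIR,
# using md5sum where available and BSD/macOS "md5 -r" otherwise.
# (Hypothetical helper name; not from the thread.)
hash_files() {
    if command -v md5sum >/dev/null 2>&1; then
        find "$1" -type f -exec md5sum {} +
    elif command -v md5 >/dev/null 2>&1; then
        # -r reverses the output so the hash comes first, like md5sum.
        find "$1" -type f -exec md5 -r {} +
    else
        echo "hash_files: no md5sum or md5 command found" >&2
        return 1
    fi
}
```

Piping the result through `sort` then gives the same eyeball-friendly listing as the original recipe, although the spacing between hash and name differs slightly between the two tools.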

ESKC (Uppsala/Sundbro), Sweden

Those are all unix based.

Administrator
Shoreham EGKA, United Kingdom

Peter wrote:

Those are all unix based.

MacOS (or Darwin, actually) is to a very large extent based on BSD Unix. Linux is a separate system, although it has a high degree of compatibility with Unix. Anyway, if you include Linux and Darwin as being “Unix-based” then I don’t understand the comparison to COBOL 66.

ESKC (Uppsala/Sundbro), Sweden
19 Posts