Can anyone recommend a good, safe duplicate file finder please? After many years I think I should clean up some file storage systems containing many trips and old downloads from SkyDemon.
My quick recipe for that is:
find /path/to/directory -type f -print0 | xargs -0 md5sum | sort
I then eyeball the first column of the output; duplicates are usually rather obvious. If there are too many to eyeball, then some sh/awk/perl can easily parse the above and print only the names of the duplicates (see the awk sketch below).
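For instance, a minimal awk sketch, assuming GNU md5sum's output format of a 32-character hash, two spaces, then the filename; it prints the second and later copies of each hash:

find /path/to/directory -type f -print0 | xargs -0 md5sum | sort |
    awk 'seen[$1]++ { print substr($0, 35) }'   # filename starts at column 35

If you want to see whole duplicate groups rather than just the extra copies, GNU uniq can do it too: pipe the sorted md5sum output through "uniq -w32 -D" to print every line whose first 32 characters (the hash) are repeated.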
Why not use -exec instead?
find /path/to/dir -type f -exec md5sum {} \; | sort
@Fenland_Flyer which operating system do you need it for?
alioth wrote:
Why not use -exec instead?
The candid, honest answer is habit: I have the pipe/xargs construction in my “automatic thought”.
The smart-ass answer is that it is a bit more efficient (less CPU time), since it spawns fewer /usr/bin/md5sum processes. find’s “-exec … \;” calls md5sum separately on each file, while the xargs construction calls md5sum on groups of files. To which one could answer that -exec supports “+” instead of “\;”, but I’m not sure whether that construct is safe against the OS maximum command line length (i.e. whether it will try to build a command line bigger than the OS maximum), which xargs is known to handle.
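For reference, the batched form would look like this:

find /path/to/dir -type f -exec md5sum {} + | sort

(As it happens, POSIX requires the “+” form of -exec to keep each invocation within the system’s {ARG_MAX} limit, so in that respect it should be just as safe as xargs.)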
Funny how everybody assumes you are running Unix.
It’s like asking about sorting text strings and somebody posts an algorithm in COBOL 66.
Peter wrote:
It’s like asking about sorting text strings and somebody posts an algorithm in COBOL 66.
Whereas assuming Windows is like posting an algorithm in Fortran II?
Peter wrote:
Funny how everybody assumes you are running Unix.
It works just as well on Linux. And on MacOS if you install md5sum.
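If you’d rather not install anything on MacOS, the stock BSD md5 tool can mimic md5sum; a sketch, assuming its -r flag, which reverses the output to md5sum’s hash-then-filename order:

find /path/to/directory -type f -print0 | xargs -0 md5 -r | sort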
Those are all Unix-based.
Peter wrote:
Those are all Unix-based.
MacOS (or Darwin, actually) is by quite a long shot based on BSD Unix. Linux is a separate system, although it has a high degree of compatibility with Unix. Anyway, if you include Linux and Darwin as being “Unix-based”, then I don’t understand the comparison to COBOL 66.