RAK's box
2015-03-05

Duplicate-files.sh

Version 1.1

Duplicate-files is a Linux shell script (written for dash) that finds and lists duplicate files in given directory trees. It is released under the GPL version 3.

Contents

How to install
Manual
Change log

How to install

Download duplicate-files.sh version 1.1. The archive contains three files.

To use the script from a folder, for example ~/scripts, move the archive duplicate-files_1.1.tar.gz to the folder, change to that folder, extract the package, and make the script executable:

mv duplicate-files_1.1.tar.gz ~/scripts/.
cd ~/scripts
tar -zxvf duplicate-files_1.1.tar.gz
chmod +x duplicate-files.sh

Now you can test running the script:

./duplicate-files.sh -H

Manual

Description

Duplicate-files finds and lists duplicate files from one or several directory trees. The paths to the roots of the trees are given as arguments to duplicate-files.

Directory names and file names may include spaces: duplicate-files works correctly when it encounters such names while traversing a tree. However, a path given as a parameter to the script cannot include spaces; that causes an error. The Bugs section below lists workarounds.

Duplicate-files compares files of the same size to find sets of duplicates. Each set is printed on one line as soon as it is found: the file size followed by the path/filenames, separated by spaces. The operation continues until all sets of duplicate files have been listed. This behavior can be changed with options (described below).
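The size-first idea can be sketched in a few lines of POSIX shell. This is only an illustration under stated assumptions, not the code of duplicate-files.sh: the function name find_dups is hypothetical, it relies on mktemp being available, it prints one line per matching pair rather than one line per set, and it does not handle file names that contain newlines or begin with spaces.

```shell
# Illustrative sketch only; not taken from duplicate-files.sh.
# find_dups ROOT...: print "size fileA fileB" for each file that is
# byte-identical to an earlier file of the same size.
find_dups() {
    find "$@" -type f -exec sh -c '
        for f do
            # wc -c gives the size in bytes; the arithmetic strips padding
            size=$(wc -c < "$f")
            printf "%s %s\n" "$(($size))" "$f"
        done' sh {} + |
    sort -n |
    {
        prev_size=
        group=$(mktemp)          # paths already seen with the current size
        while read -r size path; do
            if [ "$size" != "$prev_size" ]; then
                : > "$group"     # new size: start a new group
                prev_size=$size
            fi
            # Compare only against files of the same size
            while IFS= read -r other; do
                if cmp -s "$other" "$path"; then
                    printf "%s %s %s\n" "$size" "$other" "$path"
                    break
                fi
            done < "$group"
            printf "%s\n" "$path" >> "$group"
        done
        rm -f "$group"
    }
}
```

Grouping by size first means that cmp only runs on files that could possibly be identical, which keeps the number of content comparisons small.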

If the directory trees contain zero-length files (there may be many), they are all listed first, because all zero-length files are duplicates of each other.
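To see in advance how many zero-length files a tree holds, something like the following may help. It is a separate sketch using the standard find utility, not part of duplicate-files; the function name list_empty is hypothetical.

```shell
# list_empty ROOT...: print every zero-length regular file under the roots.
# find -size 0 matches files that occupy zero 512-byte units, i.e. empty files.
list_empty() {
    find "$@" -type f -size 0
}
```

For example, `list_empty ~ | wc -l` would count the empty files under the home directory.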

Usage

duplicate-files --help
duplicate-files --manual
duplicate-files --version
duplicate-files [options] path [path [path...]]

Options

Most options have two alternative forms: either a dash with a single option letter, or two dashes with a long option name.

-h or --help
Print a help text and exit.
-H or --manual
Display the manual page, file duplicate-files.1.gz, and exit. Note that the file must be stored in the same directory as the script duplicate-files.sh.
-p or --pause
Pause after printing each set of duplicate files, and continue when the user presses Enter.
-s or --no-size
Don't print the file size with the duplicate file names.
-v or --verbose
Print more details of what is being done.
--version
Print version information and exit.

Exit status

0
Duplicate files were found, or option --help, --manual, or --version was used.
1
No duplicate files were found.
2
There was an error.
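A calling script can branch on these values. The helper below is a minimal sketch; describe_status is a hypothetical name, and only the three status codes documented above are assumed.

```shell
# describe_status CODE: translate a duplicate-files exit status into text.
describe_status() {
    case $1 in
        0) echo "duplicates found (or help, manual or version was shown)" ;;
        1) echo "no duplicates found" ;;
        2) echo "error" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Possible use, assuming the script is installed in ~/scripts:
#   ~/scripts/duplicate-files.sh dirA dirB > listing.txt
#   describe_status $?
```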

Bugs

Duplicate-files stops with an error if a path containing spaces is given as a parameter. According to some limited testing, the script works correctly when it encounters a directory or path name containing spaces during operation, as long as such a name is not given as a parameter. There are ways to work around the problem:

  1. Change to the directory whose name caused the problem. Then you can give "." as the path parameter, which contains no spaces.
  2. Remove the spaces from the parameter string by starting one or several directory levels up.
  3. Remove the spaces by renaming the directory.
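The first workaround can be wrapped in a small helper so that the caller's working directory stays unchanged. This is a sketch only; run_from_dir is a hypothetical name and the directory in the usage comment is an invented example.

```shell
# run_from_dir DIR COMMAND [ARGS...]: run COMMAND with "." as its last
# argument from inside DIR. The subshell keeps the caller's working
# directory unchanged. (Hypothetical helper, not part of duplicate-files.)
run_from_dir() {
    dir=$1
    shift
    ( cd "$dir" && "$@" . )
}

# Possible use, assuming the script is installed in ~/scripts:
#   run_from_dir "$HOME/My Documents" ~/scripts/duplicate-files.sh --no-size
```

Because the path is quoted when passed to cd, the spaces never reach duplicate-files as a parameter; the script only sees ".".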

Duplicate-files was written and tested with dash (Debian Almquist shell), which is currently the Debian default /bin/sh. Whether and how duplicate-files works in other shells is untested.

If path/filenames include spaces, the listing may be ambiguous, because a space also separates the files on each line.

An effort has been made to make duplicate-files operate correctly with path/filenames containing special characters, but only a few tests with special characters have been done. Because of this incomplete testing, it is quite possible that duplicate-files operates incorrectly with some special characters in filenames or pathnames.

Examples

All of the following examples assume that duplicate-files.sh and duplicate-files.1.gz are stored in ~/scripts/.

~/scripts/duplicate-files.sh ~
Print all duplicate files starting from the home directory. The listing is ordered by file size, which starts each line. The rest of each line lists the duplicate files of that size:
12345 path0/file0 path1/file1
654 path2/file2 path3/file3 path4/file4 path5/file5
654 path6/file6 path7/file7
9988772 path8/file8 path9/file9
~/scripts/duplicate-files.sh -ps . ../dir3
Print all duplicate files in two directory trees, the current directory and directory ../dir3. Do not print file sizes. Pause after each printed line.
~/scripts/duplicate-files.sh --pause --verbose dirA dirB dirC
Print detailed information while searching for duplicate files in three directory trees, dirA, dirB and dirC. Pause after each printed set of duplicate files.
~/scripts/duplicate-files.sh -H
Display the manual.
man -l ~/scripts/duplicate-files.1.gz
Another way to display the manual.

Copyright

Copyright (C) 2015 Risto A. Karola
License GPLv3: GNU GPL version 3. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

Change log

Version 1.1 2015-03-05 Risto Karola
Sometimes (rarely) duplicate-files failed to list a duplicate file. This was caused by the command "sort", which did not always sort the lines as I expected. Adding option "-n" to the "sort" command fixed the problem. As an extra bonus, the duplicate files listing is now ordered by file size.
Version 1.0 2015-01-28 Risto Karola
The initial release.
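The effect of "sort -n" mentioned in the version 1.1 entry can be illustrated with the sizes from the manual's example listing. The change log does not say exactly which lines were mis-sorted, so this only shows the general difference between text and numeric ordering; the variable names are hypothetical and LC_ALL=C is set to make the text ordering predictable.

```shell
# Sizes taken from the example listing in the manual; paths are placeholders.
sizes='12345 path0/file0
654 path2/file2
9988772 path8/file8'

# Plain sort compares lines as text, so "12345" sorts before "654".
text_order=$(printf '%s\n' "$sizes" | LC_ALL=C sort)

# sort -n compares the leading number, so 654 comes first.
numeric_order=$(printf '%s\n' "$sizes" | LC_ALL=C sort -n)

printf 'text order:\n%s\n\nnumeric order:\n%s\n' "$text_order" "$numeric_order"
```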