Duplicate-files is a Linux (dash) shell script that finds and lists
duplicate files in given directory trees. It is released under GPL version 3.
How to install
Version 1.1. This archive has three files:
- duplicate-files.sh - the script
- duplicate-files.1.gz - the manual file; keep it in the same folder as the script
- COPYING.txt - the GPL license file
To use the script from a folder, for example ~/scripts, move the archive
to the folder, change to that folder, extract the package, and make the
script executable:
mv duplicate-files_1.1.tar.gz ~/scripts/.
cd ~/scripts
tar -zxvf duplicate-files_1.1.tar.gz
chmod +x duplicate-files.sh
Now you can test running the script:
Duplicate-files finds and lists duplicate files from one or several
directory trees. The paths to the roots of the trees are given as arguments.
Directory names and file names may include spaces; duplicate-files works
correctly when it encounters such names while operating.
However, a path string given as a parameter to the script cannot include
spaces; that causes an error. Section Bugs below lists workarounds.
Duplicate-files compares files of the same size to find sets of duplicates.
Each set is printed on one line as soon as it is found:
the file size followed by the path/filenames, separated by spaces.
The operation continues until all sets of duplicate files have been listed.
It is possible to alter this operation with options (described below).
If the directory trees have zero-length files (they may be numerous), they
are all listed first, because all zero-length files are duplicates of each
other.
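The idea described above can be sketched in a few lines of POSIX shell.
This is not the code of duplicate-files.sh, only a minimal illustration under
stated assumptions: the function name list_dupes is hypothetical, path names
are assumed to contain no whitespace, and two files are treated as duplicates
when their size and cksum checksum both match (a careful tool would confirm
each match with a byte-by-byte comparison such as cmp).

```shell
#!/bin/sh
# Hypothetical sketch; NOT the implementation of duplicate-files.sh.
# list_dupes DIR...  prints one line per duplicate set: "size path path ..."
# Assumption: no whitespace in path names (awk splits fields on blanks).
list_dupes() {
    # cksum prints "checksum size path" for every regular file found.
    find "$@" -type f -exec cksum {} + |
    # Order by size (numeric), then checksum, so candidates become adjacent
    # and zero-length files come first.
    sort -k2,2n -k1,1n |
    awk '
        { key = $2 " " $1 }                    # size + checksum
        key == prev { line = line " " $3; n++; next }
        {
            if (n > 1) print line              # flush the previous set
            prev = key; line = $2 " " $3; n = 1
        }
        END { if (n > 1) print line }
    '
}
```

Calling, for example, list_dupes dirA dirB would print each set on one line,
smallest file size first.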
duplicate-files [options] path [path [path...]]
Most options have two alternative forms: either a dash with a single option
letter, or two dashes with a long option name.
- -h or --help
- Print a help text and exit.
- -H or --manual
- Display the manual page, file duplicate-files.1.gz, and exit.
Note that the file must be stored in the same directory as the script.
- -p or --pause
- Pause after printing each set of duplicate files. Continue when
the user presses Enter.
- -s or --no-size
- Don't print the file size with the duplicate file names.
- -v or --verbose
- Print more details of what is being done.
- Print version information and exit.
- Duplicate files were found, or option --help, --manual, or --version
was given.
- No duplicate files were found.
- There was an error.
Duplicate-files stops with an error if a path string that includes spaces is
given as a parameter. According to some limited testing, the script works
correctly if, during operation, it encounters a directory or path name that
includes spaces, as long as such a name is not given as a parameter.
There are ways to work around the problem:
- Change to the directory whose name caused the problem.
Then you can type "." as the path parameter; it no longer includes spaces.
- Remove the spaces from the parameter string by starting one or several
directory levels up.
- Remove the spaces by renaming the directory.
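The first workaround can be wrapped in a small helper. The sketch below is
only an illustration: the function name scan_here is hypothetical and not
part of the package. It changes to the given directory in a subshell, so the
caller's working directory is untouched, and passes "." as the path
parameter.

```shell
#!/bin/sh
# Hypothetical helper; not part of the duplicate-files package.
# scan_here DIR COMMAND [ARGS...]
# Runs COMMAND ARGS... with "." as the last argument from inside DIR.
scan_here() {
    dir=$1
    shift
    # The parentheses start a subshell: the cd does not affect the caller.
    ( cd "$dir" && "$@" . )
}
```

For example, scan_here "$HOME/My Documents" ~/scripts/duplicate-files.sh -p
would run the script with options -p and path "." inside that directory.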
Duplicate-files was written and tested with dash (Debian Almquist shell),
which is the Debian default /bin/sh for the time being.
Whether and how duplicate-files operates in other shells is untested.
If path/filenames include spaces, the listing may be ambiguous,
because a space also separates the files on each line.
An effort has been made to make duplicate-files operate correctly with
path/filenames that contain special characters, but only a few tests with
special characters have been done.
Because of incomplete testing, it is quite possible that duplicate-files
operates incorrectly with some special characters in filenames or pathnames.
All of the following examples assume that duplicate-files.sh and
duplicate-files.1.gz are stored in ~/scripts/.
- ~/scripts/duplicate-files.sh ~
- Print all duplicate files starting from the home directory.
The listing is ordered according to the file size, which starts each line.
The rest of each line lists the duplicate files of that size:
654 path2/file2 path3/file3 path4/file4 path5/file5
654 path6/file6 path7/file7
12345 path0/file0 path1/file1
9988772 path8/file8 path9/file9
- ~/scripts/duplicate-files.sh -ps . ../dir3
- Print all duplicate files in two directory trees, the current
directory and directory ../dir3. Do not print file sizes.
Pause after each printed line.
- ~/scripts/duplicate-files.sh --pause --verbose dirA dirB dirC
- Print detailed information about the search for duplicate files in three
directory trees, dirA, dirB and dirC. Pause after each printed set of
duplicate files.
- ~/scripts/duplicate-files.sh -H
- Display the manual.
- man -l ~/scripts/duplicate-files.1.gz
- Another way to display the manual.
Copyright (C) 2015 Risto A. Karola
License GPLv3: GNU GPL version 3.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
2015-03-05 Risto Karola
Sometimes (rarely) duplicate-files failed to list a duplicate file.
This was caused by command "sort", which did not always sort the lines
as I expected.
Added option "-n" to command "sort", which fixed the problem. As a
bonus, the duplicate files listing is now ordered according to the file size.
2015-01-28 Risto Karola
- The initial release.
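The effect of the "-n" option mentioned in the 2015-03-05 entry can be seen
with a few made-up lines. This is only an illustration of how sort treats
leading numbers, not a reproduction of the original bug; the variable names
and sample paths are hypothetical.

```shell
#!/bin/sh
# Sample lines in the "size path path" format (made-up data).
listing='12345 path0/file0 path1/file1
654 path2/file2 path3/file3
9988772 path8/file8 path9/file9'

# Plain sort compares lines as text, so "12345" sorts before "654".
as_text=$(printf '%s\n' "$listing" | LC_ALL=C sort)

# sort -n compares the leading number, so the smallest size comes first.
by_size=$(printf '%s\n' "$listing" | LC_ALL=C sort -n)

printf 'sort:\n%s\n\nsort -n:\n%s\n' "$as_text" "$by_size"
```

With plain sort, lines for equal sizes are still adjacent in this sample, but
the order does not follow the numeric file size.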