RAK:n loota
2015-03-05

Duplicate-files.sh

Version 1.1

Duplicate-files is a Linux (dash) shell script that finds and lists duplicate files in the given directories. It is released under the GPL version 3 license.

Contents

Installation
Manual
Changes

Installation

Download duplicate-files.sh version 1.1. The package contains three files.

To use the script from, for example, the ~/scripts directory, move the package duplicate-files_1.1.tar.gz into that directory, change into it, unpack the package, and make the script executable:

mv duplicate-files_1.1.tar.gz ~/scripts/.
cd ~/scripts
tar -zxvf duplicate-files_1.1.tar.gz
chmod +x duplicate-files.sh

Now you can test the script:

./duplicate-files.sh -H

Manual

The manual is available only in English:

Description

Duplicate-files finds and lists duplicate files from one or several directory trees. The paths to the roots of the trees are given as arguments to duplicate-files.

Directory and file names may include spaces: duplicate-files works correctly when it encounters such names while traversing a tree. However, a path string given as a parameter to the script cannot include spaces; that causes an error. The Bugs section below lists workarounds.

Duplicate-files compares files of the same size to find sets of duplicates. Each set is printed on one line as soon as it is found: the file size followed by the path/filenames, all separated by spaces. The operation continues until all sets of duplicate files have been listed.

For example:

12345 path0/file0 path1/file1
654 path2/file2 path3/file3 path4/file4 path5/file5
654 path6/file6 path7/file7
9988772 path8/file8 path9/file9

It is possible to alter this operation with options (described below).

If the directory trees contain zero-length files (they may be numerous), they are all listed first: all zero-length files are duplicates of each other.
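The size-first comparison described in this section can be illustrated with a short POSIX shell sketch. This is not the code of duplicate-files.sh. The function name `list_duplicate_pairs`, the use of `wc -c` and `cmp -s`, and the restriction to the files directly inside one directory are all assumptions made for illustration. It also reports pairs rather than complete sets.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the actual duplicate-files.sh implementation:
# print "size fileA fileB" for each pair of identical regular files
# found directly inside the directory given as $1.
list_duplicate_pairs() {
    dir=$1
    for a in "$dir"/*; do
        [ -f "$a" ] || continue
        size_a=$(( $(wc -c < "$a") ))
        past_a=no
        for b in "$dir"/*; do
            # Only look at files listed after $a, so each pair is visited once.
            if [ "$b" = "$a" ]; then
                past_a=yes
                continue
            fi
            [ "$past_a" = yes ] || continue
            [ -f "$b" ] || continue
            size_b=$(( $(wc -c < "$b") ))
            # Cheap check first: files of different sizes cannot be duplicates.
            if [ "$size_a" -eq "$size_b" ] && cmp -s "$a" "$b"; then
                printf '%s %s %s\n' "$size_a" "$a" "$b"
            fi
        done
    done
    return 0
}
```

Comparing sizes before contents means the byte-by-byte comparison with `cmp` runs only for files that could possibly be identical.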

Usage

duplicate-files --help
duplicate-files --manual
duplicate-files --version
duplicate-files [options] path [path [path...]]

Options

Most options have two alternative forms: either a dash with a single option letter, or two dashes with a long option name.

-h or --help
Print a help text and exit.
-H or --manual
Display the manual page, file duplicate-files.1.gz, and exit. Note that the file must be stored in the same directory as the script duplicate-files.sh.
-p or --pause
Pause after printing each set of duplicate files. Continue when the user presses Enter.
-s or --no-size
Don't print the file size with the duplicate file names.
-v or --verbose
Print more details of what is being done.
--version
Print version information and exit.

Exit status

0
Duplicate files were found, or option --help, --manual, or --version was used.
1
No duplicate files were found.
2
There was an error.
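
A calling script can branch on these exit codes. The following is a minimal sketch; the helper name `describe_status` and its messages are hypothetical and not part of duplicate-files.

```shell
#!/bin/sh
# Hypothetical helper: translate a duplicate-files exit status into a message.
describe_status() {
    case "$1" in
        0) echo "duplicates found (or help, manual, or version was shown)" ;;
        1) echo "no duplicates found" ;;
        2) echo "error" ;;
        *) echo "unexpected status: $1" ;;
    esac
}

# Intended use (assumes the script is installed in ~/scripts):
#   ~/scripts/duplicate-files.sh dirA dirB > duplicates.txt
#   describe_status "$?"
```

Note that status 0 alone does not distinguish a successful search from a --help, --manual, or --version call.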

Bugs

Duplicate-files stops with an error if a path string that includes spaces is given as a parameter. According to some limited testing, the script works correctly if, during operation, it encounters a directory or path name that includes spaces, as long as such a name is not given as a parameter. There are ways to work around the problem:

  1. Change to the directory whose name caused the problem. Then you can type "." as the path parameter, which no longer includes spaces.
  2. Remove the spaces from the parameter string by starting one or several directory levels up.
  3. Remove the spaces by renaming the directory.
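
The first workaround can be wrapped in a small helper so that the caller's working directory is left untouched. This is only a sketch: the function name `run_from_dir` and the directory name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper for workaround 1: run a command from inside a directory
# whose name may contain spaces, passing "." as the final path argument.
run_from_dir() {
    dir=$1
    shift
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$dir" && "$@" . )
}

# Intended use (assumes the script is installed in ~/scripts):
#   run_from_dir "$HOME/My Photos" "$HOME/scripts/duplicate-files.sh" --no-size
```

Any options placed after the command name are passed through unchanged, and "." is appended as the path.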

duplicate-files was written and tested with dash (Debian Almquist shell), which is the Debian default /bin/sh for the time being. Whether and how duplicate-files operates in other shells is untested.

If path or file names include spaces, the listing may be ambiguous, because a space also separates the file names on each line.

An effort has been made to make duplicate-files operate correctly with path/filenames that contain special characters, but only a few tests with such characters have been done. Because of this incomplete testing, it is quite possible that duplicate-files operates incorrectly with some special characters in file names or path names.

Examples

All of the following examples assume that duplicate-files.sh and duplicate-files.1.gz are stored in ~/scripts/.

~/scripts/duplicate-files.sh ~
Print all duplicate files starting from the home directory.
~/scripts/duplicate-files.sh -ps . ../dir3
Print all duplicate files in two directory trees, the current directory and directory ../dir3. Do not print file sizes. Pause after each printed line.
~/scripts/duplicate-files.sh --pause --verbose dirA dirB dirC
Print detailed information while searching for duplicate files in three directory trees, dirA, dirB and dirC. Pause after each printed set of duplicate files.
~/scripts/duplicate-files.sh -H
Display the manual.
man -l ~/scripts/duplicate-files.1.gz
Another way to display the manual.

Copyright

Copyright (C) 2015 Risto A. Karola
License GPLv3: GNU GPL version 3. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

Changes

Version 1.1 2015-03-05 Risto Karola
Sometimes (rarely) duplicate-files failed to list a duplicate file. This was caused by the command "sort", which did not always sort the lines as I expected. Adding the option "-n" to "sort" fixed the problem. As an extra bonus, the listing of duplicate files is now ordered by file size.
Version 1.0 2015-01-28 Risto Karola
The initial release.
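
The difference that the "-n" option in version 1.1 makes can be seen with a few sample numbers. This sketch only demonstrates the general behavior of sort. It does not reproduce the original bug, whose exact cause is not described here, and the helper name `sample_sizes` and its values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: three file sizes, one per line.
sample_sizes() {
    printf '9\n100\n10\n'
}

# Without -n, sort compares the lines as text, character by character.
# LC_ALL=C is set only to make the text ordering reproducible.
sample_sizes | LC_ALL=C sort
# prints 10, 100, 9 (one per line)

# With -n, sort compares the numbers by value.
sample_sizes | LC_ALL=C sort -n
# prints 9, 10, 100 (one per line)
```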