Managing photos using CLI tools
By Andrius Miasnikovas
A little while ago I finally got around to bringing order to the mess called my photo library. The photos were scattered across multiple machines, disks and SD cards. First things first - I moved all the files to one place so they’d be easier to work with. In retrospect, I’m glad I did this cleanup on my NVMe disk - it saved me quite a bit of time.
DISCLAIMER
Some of the tools discussed here can delete or overwrite files you didn’t intend to touch when
used incorrectly. If you don’t feel comfortable with them, I suggest creating a temporary folder
with some sample files for practice, and moving on to more sensitive files only once you’ve got
the hang of it. And even then - backups are your best friend.
Managing duplicates
Since I’m a responsible computer user, I do have “backups”… well, they’re more like copies of
unsorted files on multiple disks that I can no longer keep track of. So once I got all the files
onto a single disk, the first thing to address was the duplication issue. There’s a great CLI
tool called fdupes that you can install using your package manager:
sudo pacman -S fdupes # Arch Linux
sudo apt install fdupes # Ubuntu
This tool can scan your files, find duplicates, report the disk space you’re wasting on them and even remove them for you. And it doesn’t just compare dates or filenames: when it sees files of the same size, regardless of their names, it compares MD5 hashes and follows up with a byte-by-byte comparison, which makes it much more reliable at identifying real duplicates.
Basic usage is pretty simple: you just give it a directory to work on and pass in a few params that tell it to check directories recursively and summarize the results
fdupes -mr photos/
this will get you an idea of what kind of disk space savings you’re looking at
11 duplicate files (in 7 sets), occupying 63.2 kilobytes
If you’d rather see the actual duplicate files, do this
fdupes -rS photos/
you’ll see which files have duplicates and where (though this might be a very long list)
10402 bytes each:
./.vim/colors/desert.vim
./.vim/colors/desert2.vim
./.vim/colors/desert3.vim
29 bytes each:
./.vim/UltiSnips/notes.snippets
./.vim/UltiSnips/text.snippets
A couple of useful parameters are -n and -A, which exclude zero-length and hidden files
respectively. Now, let’s say you’re ready to delete those duplicates: simply add a -d
parameter like so
fdupes -rd photos/
this tool will stop and prompt you for which file you want to preserve before deleting anything. It can be all you need if you don’t have too many duplicates and would like fine-grained control over what you remove. Remember that if there are several duplicates of the same file, you will be asked only once to choose the one that will be spared.
Now, this is all well and good, but in my case there were just too many duplicates for me to sit
around and mash Enter, so I added another parameter: -N. Be CAREFUL with this one and remember
to make backups when working with data you can’t afford to lose.
fdupes -rdN photos/
This will not ask you anything. It will scan the directories, find duplicates, choose the first file to preserve for each duplicate group and delete the others.
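If you also want to skip hidden and zero-length files, the flags can be combined. Something like the sketch below should do it - again, -N deletes without asking, so only run it with a backup in place.
# recurse (-r), skip hidden (-A) and zero-length (-n) files,
# delete duplicates (-d) without prompting (-N), keeping the first file of each set
fdupes -rAndN photos/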
Dealing with iFiles
Some of the photos I have were not in the good old JPEG format, but rather .HEIC files from an
iPhone. I wanted to convert those to JPEGs because support for HEIC is still rather limited and
the format itself doesn’t seem to be getting a lot of traction, compared to the WebP format by
Google. Both of these formats want to be the replacement for how we store photos. They boast
better compression than JPEG. While some of the claims are true, in practice neither format is
widespread and both are still lacking support in common imaging software. Google is pushing its
WebP standard in their browser, Android OS, etc., but the HEIC format seems to be found only on
iDevices (as far as I know). So, I searched around and found something called libheif - it’s
technically just a library, but it ships so-called example programs like heif-convert which can
convert this format to JPEG. So, let’s install it!
sudo pacman -S libheif # Arch Linux
sudo apt install libheif-examples # Ubuntu
If you’re using another distro, try searching for heif-related packages, or you could try to
build it from source - https://github.com/strukturag/libheif
Once you have the tools, the usage is very straightforward: just specify the input file and the
output file. You can control the JPEG output quality with the -q XX parameter, where XX is a
number between 0 and 100. I would recommend using 90, as it seems to be the perfect balance
between perceivable quality and file size. There is one caveat - this tool accepts only one file
at a time, so we can’t pass it multiple files to convert like *.HEIC. But that’s easily fixable
with just a few lines of shell script:
for fname in *.HEIC
do
    # note: this produces names like IMG_0001.HEIC.jpg; use "${fname%.HEIC}.jpg"
    # instead of "$fname.jpg" if you'd rather drop the original extension
    heif-convert -q 90 "$fname" "$fname.jpg"
done
I’m making an assumption here that the current working directory contains the HEIC
files.
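If the HEIC files are scattered across subdirectories instead, the same idea can be combined with find. A rough sketch (it keeps the same naming scheme of appending .jpg to the full name):
# convert every .HEIC under the current directory, one file at a time
find . -iname '*.heic' -exec sh -c 'heif-convert -q 90 "$1" "$1.jpg"' _ {} \;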
Unifying file extensions
As I mentioned, I gathered my files from different machines. Because cameras name files using a
counter (which can be reset), I was facing an interesting issue: some files had the same filename
but contained completely different photos. And some files had uppercase extensions, while others
had lowercase ones. Even if I didn’t find this annoying, it posed a risk of potential data loss
if the files were copied over to a Windows file system. So I decided to stick to lowercase
extensions and rename all the offending files. That’s where another great tool comes into play,
and you probably don’t even need to install it, as it’s part of util-linux on most distros. Its
simplicity can be seen even in its name - rename. In typical UNIX fashion, it’s a very quiet
tool, so using the -v parameter is suggested to make it more verbose and get a better feel for
what it’s about to do. Besides the files you’re renaming, it takes an expression - the part of
the filename that you want to change - and the replacement string. If you’re not sure you got the
pattern and replacement right, you can add the -n option, which prevents the renaming from
actually taking place. In combination with the verbose option, this allows you to safely test
your replacement patterns before removing the -n option and performing the action.
rename -vn JPG jpg *.JPG
As you can see, it’s easy to rename things in the current directory by feeding the tool multiple
files like in the example above. But the tool itself does not support recursion, so for my
use-case of multiple, poorly structured directories the solution was to find all the offending
files and apply rename to each of them.
find . -name '*.JPG' -exec rename -vo '.JPG' '_1.JPG' {} \;
find . -name '*.JPG' -exec rename -vo 'JPG' 'jpg' {} \;
Note that I use the case-sensitive -name test to find the files with uppercase extensions. I also
added the -o parameter to rename, which means it will not overwrite anything: if the resulting
filename already exists in that directory, the file is simply skipped (safety first!). The reason
I do two passes for each file is to first rename them to something that hopefully won’t clash
with other files in the same directory.
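Before running the two passes for real, the same commands can be tried with -n added to rename for a dry run, something like:
# dry run: print what the first pass would rename without touching anything
find . -name '*.JPG' -exec rename -nvo '.JPG' '_1.JPG' {} \;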
Sorting photos
Now that all the cleanup is done, I was finally able to get to the actual sorting of files and putting them in appropriate directories. The idea was simple enough - have all the photos in directories according to the year they were taken in, and another layer of directories specifying the month they were taken in (I prefer numeric values of the months). So the final result should look something like the one below.
$ tree -L 2
.
├── 2008
│   ├── 02
│   ├── 03
│   ├── 05
│   ├── 06
│   └── 12
├── 2009
│   ├── 04
│   ├── 05
│   ├── 06
│   ├── 07
...
The first thing I needed was to extract the EXIF metadata from the files, which contains the date
when each photo was taken. I found that exiv2 is a nice little tool that does exactly what I need.
sudo pacman -S exiv2 # Arch Linux
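On Ubuntu the package should be available under the same name:
sudo apt install exiv2 # Ubuntu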
Its output is very conveniently laid out for grepping and further processing.
$ exiv2 IMG_20190101_141739.jpg
File name       : IMG_20190101_141739.jpg
File size       : 2921643 Bytes
MIME type       : image/jpeg
Image size      : 4048 x 3036
Camera make     : Google
Camera model    : Pixel
Image timestamp : 2019:01:01 14:17:39
...
Now I had to create the directory structure and place each file in the correct directory. I
didn’t want to do two passes on a large number of files, so I wrote a script that generates
another script containing those operations, which I then executed in separate stages as you’ll
see later on. The script exifdata.sh looked like this.
#!/usr/bin/env bash
# bash rather than plain sh, because 'set -o pipefail' is not POSIX
# (remember to chmod +x this script so find can execute it)
set -euo pipefail

BASE="../sorted"

# pull the date out of the exiv2 summary; '|| true' keeps set -e / pipefail
# from aborting the script when a file has no timestamp
T=$(exiv2 "$1" 2>/dev/null | tr '\0' '\n' | grep 'Image timestamp' | cut -d ' ' -f 4 || true)
if [ -n "$T" ]
then
    YEAR=$(echo "$T" | cut -d ':' -f 1)
    MONTH=$(echo "$T" | cut -d ':' -f 2)
    if [ -n "$YEAR" ] && [ -n "$MONTH" ]
    then
        # emit the commands instead of running them; they get collected into complete.sh
        echo "mkdir -p $BASE/$YEAR/$MONTH"
        echo "mv -n \"$1\" $BASE/$YEAR/$MONTH"
    else
        echo "Could not parse date for $1 : $T" >&2
    fi
else
    echo "Could not extract info for $1" >&2
fi
Then I applied this script for each file.
find . -iname '*.jpg' -exec ./exifdata.sh "{}" \; > complete.sh
This created a large file which contained directory creation and file moving commands. But
obviously, a lot of the directory creation commands are duplicates, because a bunch of photos
were taken in the same year and month. So instead of executing this whole script, I extracted
only the mkdir commands, sorted them and removed duplicates like so:
grep mkdir complete.sh | sort | uniq > dirs.sh
grep -v mkdir complete.sh > move.sh
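After the split, the two files contain lines roughly like these (the filename here is taken from the exiv2 example above):
# dirs.sh
mkdir -p ../sorted/2019/01
# move.sh
mv -n "./IMG_20190101_141739.jpg" ../sorted/2019/01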
The final step is to review and execute these two files. First, the dirs.sh script creates the
required directory structure in a directory one level above where all the files reside and where
the script is run. The second one performs the move operation on the files. For the file-moving
step I included the -n parameter, which prevents files from being overwritten. And that’s it! I
now have a proper structure for all my photos and no duplicates.
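For reference, running the two generated scripts (after a quick review of both) is just:
sh dirs.sh # create the year/month directory tree under ../sorted
sh move.sh # move each photo into its directory; mv -n skips files whose target name already exists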
Throughout this whole process I tried to use parameters that help keep the data intact, not
overwrite anything and only remove things that are 100% duplicates. But as with all things on a
computer - it will do what you tell it, not what you want. So I cannot overstress the importance
of having a backup if you’re applying these steps to files that are important to you.
That’s how I did it. There are of course other ways of achieving the same result, or ways to do some of those steps more efficiently. Leave a comment if you want to share such ideas with others.