Managing photos using CLI tools
By Andrius Miasnikovas
A little while ago I finally got around to bringing order to the mess called my photo library. The photos were scattered across multiple machines, disks and SD cards. First things first - I moved all the files to one place so they’d be easier to work with. In retrospect, I’m glad I did this cleanup on my NVMe disk - it saved me quite a bit of time.
DISCLAIMER
Some of the tools discussed here can delete or overwrite files you didn’t intend to touch when
used incorrectly. If you don’t feel comfortable with them, I suggest creating a temporary folder
with some sample files for practice, and moving on to more sensitive files only once you’ve got
the hang of it. And even then - backups are your best friend.
Managing duplicates
Since I’m a responsible computer user, I do have “backups”… well, they’re more like copies of
unsorted files on multiple disks that I can no longer keep track of. So once I got all the files
onto a single disk, the first thing to address was the duplication issue. There’s a great CLI
tool called fdupes that you can install using your package manager:
sudo pacman -S fdupes # Arch Linux
sudo apt install fdupes # Ubuntu
This tool can scan your files, find duplicates, report the disk space you’re wasting on them and even remove them for you. And it doesn’t just compare dates or filenames: when it sees files of the same size, regardless of their names, it compares MD5 hashes and follows up with a byte-by-byte comparison, which makes it much more reliable at identifying real duplicates.
Basic usage is pretty simple: you just give it a directory to work on and pass in a few params that tell it to check directories recursively and summarize the results
fdupes -mr photos/
this will get you an idea of what kind of disk space savings you’re looking at
11 duplicate files (in 7 sets), occupying 63.2 kilobytes
If you’d rather see the actual duplicate files, do this
fdupes -rS photos/
you’ll see which files have duplicates and where (though this might be a very long list)
10402 bytes each:
./.vim/colors/desert.vim
./.vim/colors/desert2.vim
./.vim/colors/desert3.vim
29 bytes each:
./.vim/UltiSnips/notes.snippets
./.vim/UltiSnips/text.snippets
A couple of useful parameters are -n and -A, which exclude zero-length and hidden files
respectively. Now, let’s say you’re ready to delete those duplicates: simply add a -d
parameter like so
fdupes -rd photos/
this tool will stop and prompt you for which file you want to preserve before deleting anything. It can be all you need if you don’t have too many duplicates and would like fine-grained control over what you remove. Remember that if there are several duplicates of the same file, you will be asked only once to choose the one that will be spared.
Now, this is all well and good, but in my case there were just too many duplicates for me to sit
around and mash Enter, so I added another parameter: -N. Be CAREFUL with this one and remember
to make backups when working with data you can’t afford to lose.
fdupes -rdN photos/
This will not ask you anything. It will scan the directories, find duplicates, choose the first file to preserve for each duplicate group and delete the others.
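If you also want to skip hidden and zero-length files, the flags can be combined. Something like the sketch below should do it - again, -N deletes without asking, so only run it with a backup in place.
# recurse (-r), skip hidden (-A) and zero-length (-n) files,
# delete duplicates (-d) without prompting (-N), keeping the first file of each set
fdupes -rAndN photos/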
Dealing with iFiles
Some of the photos I have were not in the good old JPEG format, but rather .HEIC files from an
iPhone. I wanted to convert those to JPEGs because support for HEIC is still rather limited and
the format itself doesn’t seem to be getting a lot of traction, compared to the WebP format by
Google. Both of these formats want to be the replacement for how we store photos. They boast
better compression than JPEG. While some of the claims are true, in practice neither format is
widespread and both are still lacking support in common imaging software. Google is pushing its
WebP standard in their browser, Android OS, etc., but the HEIC format seems to be found only on
iDevices (as far as I know). So, I searched around and found something called libheif - it’s
technically just a library, but it ships so-called example programs like heif-convert which can
convert this format to JPEG. So, let’s install it!
sudo pacman -S libheif # Arch Linux
sudo apt install libheif-examples # Ubuntu
If you’re using another distro, try searching for heif-related packages, or you could try to
build it from source - https://github.com/strukturag/libheif
Once you have the tools, the usage is very straightforward: just specify the input file and the
output file. You can control the JPEG output quality with the -q XX parameter, where XX is a
number between 0 and 100. I would recommend using 90, as it seems to be the perfect balance
between perceivable quality and file size. There is one caveat - this tool accepts only one file
at a time, so we can’t pass it multiple files to convert like *.HEIC. But that’s easily fixable
with just a few lines of shell script:
for fname in *.HEIC
do
    # note: this produces names like IMG_0001.HEIC.jpg; use "${fname%.HEIC}.jpg"
    # instead of "$fname.jpg" if you'd rather drop the original extension
    heif-convert -q 90 "$fname" "$fname.jpg"
done
I’m making an assumption here that the current working directory contains the HEIC
files.
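If the HEIC files are scattered across subdirectories instead, the same idea can be combined with find. A rough sketch (it keeps the same naming scheme of appending .jpg to the full name):
# convert every .HEIC under the current directory, one file at a time
find . -iname '*.heic' -exec sh -c 'heif-convert -q 90 "$1" "$1.jpg"' _ {} \;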
Unifying file extensions
As I mentioned, I gathered my files from different machines. Because cameras name files using a
counter (which can be reset), I was facing an interesting issue: some files had the same filename
but contained completely different photos. And some files had uppercase extensions, while others
had lowercase ones. Even if I didn’t find this annoying, it posed a risk of potential data loss
if the files were copied over to a Windows file system. So I decided to stick to lowercase
extensions and rename all the offending files. That’s where another great tool comes into play,
and you probably don’t even need to install it, as it’s part of util-linux on most distros. Its
simplicity can be seen even in its name - rename. In typical UNIX fashion, it’s a very quiet
tool, so using the -v parameter is suggested to make it more verbose and get a better feel for
what it’s about to do. Besides the files you’re renaming, it takes an expression - the part of
the filename that you want to change - and the replacement string. If you’re not sure you got the
pattern and replacement right, you can add the -n option, which prevents the renaming from
actually taking place. In combination with the verbose option, this allows you to safely test
your replacement patterns before removing the -n option and performing the action.
rename -vn JPG jpg *.JPG
As you can see, it’s easy to rename things in the current directory by feeding the tool multiple
files like in the example above. But the tool itself does not support recursion, so for my
use-case of multiple, poorly structured directories the solution was to find all the offending
files and apply rename to each of them.
find . -name '*.JPG' -exec rename -vo '.JPG' '_1.JPG' {} \;
find . -name '*.JPG' -exec rename -vo 'JPG' 'jpg' {} \;
Note that I use the case-sensitive -name test to find the files with uppercase extensions. I also
added the -o parameter to rename, which means it will not overwrite anything: if the resulting
filename already exists in that directory, the file is simply skipped (safety first!). The reason
I do two passes for each file is to first rename them to something that hopefully won’t clash
with other files in the same directory.
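Before running the two passes for real, the same commands can be tried with -n added to rename for a dry run, something like:
# dry run: print what the first pass would rename without touching anything
find . -name '*.JPG' -exec rename -nvo '.JPG' '_1.JPG' {} \;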
Sorting photos
Now that all the cleanup is done, I was finally able to get to the actual sorting of files and putting them in appropriate directories. The idea was simple enough - have all the photos in directories according to the year they were taken in, and another layer of directories specifying the month they were taken in (I prefer numeric values of the months). So the final result should look something like the one below.
$ tree -L 2
.
├── 2008
│   ├── 02
│   ├── 03
│   ├── 05
│   ├── 06
│   └── 12
├── 2009
│   ├── 04
│   ├── 05
│   ├── 06
│   ├── 07
...
The first thing I needed was to extract the EXIF metadata from the files, which contains the date
when each photo was taken. I found that exiv2 is a nice little tool that does exactly what I need.
sudo pacman -S exiv2 # Arch Linux
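On Ubuntu the package should be available under the same name:
sudo apt install exiv2 # Ubuntu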
Its output is very conveniently laid out for grepping and further processing.
$ exiv2 IMG_20190101_141739.jpg
File name       : IMG_20190101_141739.jpg
File size       : 2921643 Bytes
MIME type       : image/jpeg
Image size      : 4048 x 3036
Camera make     : Google
Camera model    : Pixel
Image timestamp : 2019:01:01 14:17:39
...
Now I had to create the directory structure and place each file in the correct directory. I
didn’t want to do two passes on a large number of files, so I wrote a script that generates
another script containing those operations, which I then executed in separate stages as you’ll
see later on. The script exifdata.sh looked like this.
#!/usr/bin/env bash
# bash rather than plain sh, because 'set -o pipefail' is not POSIX
# (remember to chmod +x this script so find can execute it)
set -euo pipefail

BASE="../sorted"

# pull the date out of the exiv2 summary; '|| true' keeps set -e / pipefail
# from aborting the script when a file has no timestamp
T=$(exiv2 "$1" 2>/dev/null | tr '\0' '\n' | grep 'Image timestamp' | cut -d ' ' -f 4 || true)
if [ -n "$T" ]
then
    YEAR=$(echo "$T" | cut -d ':' -f 1)
    MONTH=$(echo "$T" | cut -d ':' -f 2)
    if [ -n "$YEAR" ] && [ -n "$MONTH" ]
    then
        # emit the commands instead of running them; they get collected into complete.sh
        echo "mkdir -p $BASE/$YEAR/$MONTH"
        echo "mv -n \"$1\" $BASE/$YEAR/$MONTH"
    else
        echo "Could not parse date for $1 : $T" >&2
    fi
else
    echo "Could not extract info for $1" >&2
fi
Then I applied this script for each file.
find . -iname '*.jpg' -exec ./exifdata.sh "{}" \; > complete.sh
This created a large file which contained directory creation and file moving commands. But
obviously, a lot of the directory creation commands are duplicates, because a bunch of photos
were taken in the same year and month. So instead of executing this whole script, I extracted
only the mkdir commands, sorted them and removed duplicates like so:
grep mkdir complete.sh | sort | uniq > dirs.sh
grep -v mkdir complete.sh > move.sh
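After the split, the two files contain lines roughly like these (the filename here is taken from the exiv2 example above):
# dirs.sh
mkdir -p ../sorted/2019/01
# move.sh
mv -n "./IMG_20190101_141739.jpg" ../sorted/2019/01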
The final step is to review and execute these two files. First, the dirs.sh script creates the
required directory structure in a directory one level above where all the files reside and where
the script is run. The second one performs the move operation on the files. For the file-moving
step I included the -n parameter, which prevents files from being overwritten. And that’s it! I
now have a proper structure for all my photos and no duplicates.
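For reference, running the two generated scripts (after a quick review of both) is just:
sh dirs.sh # create the year/month directory tree under ../sorted
sh move.sh # move each photo into its directory; mv -n skips files whose target name already exists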
Throughout this whole process I tried to use parameters that help keep the data intact, not
overwrite anything and only remove things that are 100% duplicates. But as with all things on a
computer - it will do what you tell it, not what you want. So I cannot overstress the importance
of having a backup if you’re applying these steps to files that are important to you.
That’s how I did it. There are of course other ways of achieving the same result, or ways to do some of those steps more efficiently. Leave a comment if you want to share such ideas with others.