Email Attachment Detacher
By Andrius Miasnikovas
Recently I was cleaning out my GMail mailbox. Yes, yes, I know they give you lots of space and you can even buy more if you need it, but I wanted to leave fewer old emails hanging around. Let’s call it “spring cleaning”. Not all old emails are useless, though; some of them I actually wanted to keep and archive offline. For the most part things went smoothly: I made good use of the Google Takeout service by downloading labeled emails as separate chunks. Be aware that quotas apply. If I’m not mistaken, you get 3 archives per day and a total of 7 per week. You might also get temporarily blocked, like I did, for “unusually high activity”. That’s not a big deal, as it only suspends access for an hour or two and then you can get back at it. The actual issue I ran into was downloading a few old email attachments. I didn’t need the emails themselves, just the attachments, and I had already hit my Takeout limit. Unfortunately these attachments were archives containing .bat, .jar and similar executables, which GMail tags as potentially dangerous and won’t let you download.
This is strange and quite annoying, since I know exactly what these files are: I was the one who sent them. So I had to come up with a way to retrieve my stuff. Luckily, GMail lets you access the raw email. Just click the down-arrow menu where the reply, forward and print options are located and choose “Show original”. This opens an intermediate screen with a summary of the email and a link named “Download original”, which delivers the goods. The result looks something like this:
MIME-Version: 1.0
Received: by 11.104.26.5 with HTTP; Mon, 2 Nov 2009 06:46:38 -0800 (PST)
Date: Mon, 2 Nov 2009 16:46:38 +0200
Delivered-To: [email protected]
Message-ID: <[email protected]>
Subject: uct
From: Sender <[email protected]>
To: Recipient <[email protected]>
Content-Type: multipart/mixed; boundary=001636c5b322e9c65b0489647146

--001636c5b322e9c65b0489647146
Content-Type: text/plain; charset=ISO-8859-1

--001636c5b322e9c65b0489647146
Content-Type: application/octet-stream; name="uct.zip"
Content-Disposition: attachment; filename="uct.zip"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_g1jcgjo20

UEsDBAoAAAAAACGickq85iwZCgAAAAoAAAAHABwAYWJjLnR4dFVUCQADnXnNWK15zVh1eAsAAQT1
AQAABBQAAABzb21lIHRleHQKUEsBAh4DCgAAAAAAIaJySrzmLBkKAAAACgAAAAcAGAAAAAAAAAAA
AKSBAAAAAGFiYy50eHRVVAUAA515zVh1eAsAAQT1AQAABBQAAABQSwUGAAAAAAEAAQBNAAAASwAA
AAAA
--001636c5b322e9c65b0489647146--
To tell you the truth, I wasn’t quite sure what this format is called, but it’s definitely not mbox. It looks like the raw message as it is sent over SMTP, i.e. an RFC 5322 message with MIME parts. Under Windows such a file would usually have an .eml extension, but GMail simply saves it as a text file. A quick Google search on how to extract the attachment inside didn’t reveal anything useful. The format is fairly self-explanatory, though, so why not write my own tool to extract these attachments in batches? A fun micro project with some actual benefit.
I wrote it in Go, since it’s my language of choice for command-line utilities that shouldn’t have any runtime dependencies, and there’s the added benefit of simple cross-compilation. After finishing the tool I decided to put it up on GitHub and called it Detacher. I don’t expect a lot of people to have problems that this method solves, but maybe it will prove useful to someone. And of course there’s always the satisfaction of properly finishing something, in this case uploading the project with a release and pre-built binaries for folks who don’t have Go installed.
Here’s a simple use case. Say you have the original email files downloaded into your emails/originals directory:
andrius@graybox$ tree emails
emails
└── originals
    └── demo_7z.txt
Simply run detacher and point -base at this directory:
andrius@graybox$ detacher -base emails/originals
Scanning for attachments...
demo_7z.txt
That’s it. You now have a new directory called detached alongside the original one, containing all your attachments:
andrius@graybox$ tree emails
emails
├── detached
│   └── f_ggsflwsc0_demo.7z
└── originals
    └── demo_7z.txt
I haven’t tested it with very large files; it should work, though it might be a bit slower. I did test it with multiple attachments in a single email and it handled them correctly. You might also have noticed the prefix on the detached file names: it’s the attachment id the email assigns internally (the X-Attachment-Id header in the raw message). I decided to use it as a prefix because files with the same name might appear in different emails, and the prefix prevents conflicts and overwriting. That’s it, happy detaching!
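The naming scheme can be sketched as a tiny helper. outputName is a hypothetical function; the id_filename format matches the f_ggsflwsc0_demo.7z listing above, while the sanitizing (keeping only the last path element of the file name and replacing separators in the id) is an added assumption, since attachment names come from the email and shouldn’t be trusted to stay inside the output directory.

```go
package main

import (
	"path/filepath"
	"strings"
)

// outputName prefixes the attachment's file name with its attachment id,
// so two emails that both carry "demo.7z" produce two distinct files.
func outputName(attachmentID, filename string) string {
	// Assumption: sanitize the id so it can't introduce path separators.
	id := strings.Map(func(r rune) rune {
		if r == '/' || r == '\\' {
			return '_'
		}
		return r
	}, strings.TrimSpace(attachmentID))

	// Keep only the last path element of the (untrusted) file name.
	name := filepath.Base(filename)
	if name == "." || name == ".." || name == string(filepath.Separator) {
		name = "attachment" // hypothetical fallback for empty or odd names
	}
	if id == "" {
		return name
	}
	return id + "_" + name
}
```

For example, outputName("f_ggsflwsc0", "demo.7z") yields f_ggsflwsc0_demo.7z, the name seen in the detached directory above, while the same file name under a different id gets a different output name.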