Going paperless: step 1

In an effort to leave a (slightly) smaller footprint on this world, I’ve been minimizing the amount of paper I receive in the mail every day by requesting digital versions of everything from invoices to tax statements to payment slips. Right now I receive most of these by email as PDF files, put them in a folder on my NAS and never touch them again until I need one. That’s when I spend hours and hours trying to locate what I need. I mean... search only gets you so far when it concerns images or password-protected PDFs.

This post describes how to at least get rid of those pesky password-protected PDFs. In the follow-ups to this post, I’ll dive deeper into my efforts to go truly paperless.

Pesky Passworded PDFs and how to get rid of them

Some companies insist on sending you a password-protected PDF. The problem is that you need the password before the file becomes searchable and, what’s worse, the password may change over time (or you forget it), leaving you with useless files. Of course I could just open the PDF, enter the password, print the file to another PDF and save that to my NAS. Or I could print it on paper and put it in a physical folder for safekeeping. But seeing as I want to reduce paper and I hate doing manual labor, I just couldn’t resist automating this. So this is how it went:

  • Every 24 hours, an Azure Logic App checks a subfolder in my Exchange Online mailbox for new mails with PDF attachments
  • If it finds any emails matching those criteria, it pipes the attachments into an Azure Function
  • The Function unlocks the PDF with a password stored in Azure Key Vault and removes the protection
  • It then returns a file stream to my Azure Logic App, which safely stores it as a PDF in OneDrive

The OneDrive folder is then synced with my Synology NAS overnight and there we go: password removed.
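The flow above can be sketched in a few lines. This is only an illustration in Python, not the .NET code of the actual Function: `Attachment`, `process_mail`, and the `remove_password` and `store` callables are hypothetical stand-ins for the Logic App connectors and the Function.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Attachment:
    """Hypothetical stand-in for one email attachment."""
    name: str
    content: bytes


def process_mail(
    attachments: Iterable[Attachment],
    password: str,
    remove_password: Callable[[bytes, str], bytes],
    store: Callable[[str, bytes], None],
) -> List[str]:
    """Unlock every PDF attachment and hand the result to storage.

    Returns the names of the files that were stored.
    """
    stored = []
    for attachment in attachments:
        # Only PDFs are of interest; skip images, signatures and the like.
        if not attachment.name.lower().endswith(".pdf"):
            continue
        unlocked = remove_password(attachment.content, password)
        store(attachment.name, unlocked)
        stored.append(attachment.name)
    return stored


if __name__ == "__main__":
    # Fake collaborators so the sketch runs without Azure or a PDF library.
    saved = {}
    names = process_mail(
        [Attachment("invoice.PDF", b"locked"), Attachment("logo.png", b"img")],
        password="s3cret",
        remove_password=lambda data, pw: b"unlocked",
        store=saved.__setitem__,
    )
    print(names)  # ['invoice.PDF']
```

Passing the decryption and storage steps in as callables mirrors the split between the Function (decryption) and the OneDrive connector (storage), and keeps the loop itself trivial to test.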

If only it were that easy

The idea was simple enough. After finding a library that didn’t need a commercial license and was also .NET Core compatible, iText 7 (MIND YOU: in commercial software you’ll probably want to buy a license because of the AGPL attached to this library), I started coding my Function, which was easy enough. But when deploying, I ran into a few snags. The first was that no attachments arrived in my Function for some reason, and the second was making my password accessible from within the Function.

Attachments and Logic Apps’ Exchange Connector

Azure Logic Apps is an awesome way to quickly create an application from standard functionality, and it also lets you call custom code through a Function connector. In this application I use a few standard connectors and add a custom Function. My trigger is a standard Office 365 connector that periodically checks a subfolder in my mailbox. Some conditional logic then checks the email that triggered the Logic App, and if the condition is met, a foreach loop pushes every attachment on the email through my custom Function, which removes the password. Finally, a standard OneDrive connector stores the file in my OneDrive account.

It took a whole 2 minutes to create the Logic App, add the logic and wire it all together. Unfortunately it didn’t work. In fact, I wasn’t getting any attachments into my Function and I couldn’t for the life of me figure out why. But I’ll help you out here and say what I’m almost ashamed to say: the ‘Include Attachments’ dropdown actually means ‘Do you want to do something with these attachments in the following step?’ and you will probably want to answer this with a firm ‘yes’ :-)
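On the receiving side it helps to fail loudly when a message claims to have attachments but none arrive. Below is a minimal Python sketch (again, not the .NET Function itself). It assumes the trigger output is a dictionary with `HasAttachment`, `Attachments`, `Name` and base64-encoded `ContentBytes` fields; treat those field names as assumptions and check them against the run history of your own Logic App. `extract_pdfs` is a made-up helper name.

```python
import base64
from typing import List, Tuple


def extract_pdfs(message: dict) -> List[Tuple[str, bytes]]:
    """Return (name, raw bytes) for every PDF attachment in a message.

    Field names are assumptions about the connector's output shape.
    """
    attachments = message.get("Attachments") or []
    if message.get("HasAttachment") and not attachments:
        # The symptom described above: the mail has attachments,
        # but the trigger was not told to include them.
        raise ValueError(
            "message has attachments but none were delivered; "
            "is 'Include Attachments' set to Yes on the trigger?"
        )
    pdfs = []
    for item in attachments:
        name = item.get("Name", "")
        if name.lower().endswith(".pdf"):
            pdfs.append((name, base64.b64decode(item["ContentBytes"])))
    return pdfs
```

A check like this turns a silent "nothing happened" into an error that shows up in the run history.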

After this, it was on to the next step: securing my secrets.

Azure Functions, MSI, Key Vault and VNet integration

For a while now it has been possible to use Managed Identities in Azure. This is a great solution when you don’t want to specify a username/password (a secret) to access your secrets, because that kind of defeats the point or at the very least complicates things. So there really was no choice other than to store my secrets in a Key Vault and then use an MSI (Managed Service Identity) to access the proper secrets. The best thing is that you can even reference these secrets from within a Function by merely using a specially crafted environment variable; this feature is called ‘Key Vault references’. You simply create an environment variable called something like ‘MyPDFPassword’ and as its value you use @Microsoft.KeyVault({referenceString}), where {referenceString} is the location of the secret in your vault.
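A reference string typically points at the URI of the secret. The helper below is a small Python sketch of that format; `key_vault_reference`, `myvault` and `pdf-password` are made-up names for illustration, and the `SecretUri=` form is the one described in the App Service documentation on Key Vault references.

```python
def key_vault_reference(vault_name: str, secret_name: str, version: str = "") -> str:
    """Build the app-setting value for a Key Vault reference.

    Leaving `version` empty keeps the trailing slash, which refers to
    the latest version of the secret.
    """
    secret_uri = (
        f"https://{vault_name}.vault.azure.net/secrets/{secret_name}/{version}"
    )
    return f"@Microsoft.KeyVault(SecretUri={secret_uri})"


if __name__ == "__main__":
    # Hypothetical vault and secret names.
    print(key_vault_reference("myvault", "pdf-password"))
    # @Microsoft.KeyVault(SecretUri=https://myvault.vault.azure.net/secrets/pdf-password/)
```

Inside the Function, the resolved secret is then read like any other environment variable, here under the name ‘MyPDFPassword’.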

Now all you need to do is grant the MSI you’re using for the Function permission to access the secret in your vault, and don’t firewall your Key Vault. Wait... what did you say? Isn’t it a good thing to make sure only Azure services can access my resources? Well yes, dear reader, generally it is, but not when your service isn’t supported yet! Even though Azure App Service is mentioned there, and even though Azure Functions may run in ‘sort of an App Service’, do not make the same mistake I did and tick the ‘Selected networks’ box, or you’ll spend quite some time figuring out why your Function gets the name of the environment variable that you’re using to reference rather than the value of the secret it is supposed to reference.
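A cheap guard in the Function can shorten that debugging session considerably. The Python sketch below assumes that an unresolved reference shows up either as the literal `@Microsoft.KeyVault(...)` text or as the setting's own name; both checks are assumptions about the failure mode, and `read_secret_setting` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def read_secret_setting(
    name: str, environ: Optional[Mapping[str, str]] = None
) -> str:
    """Read an app setting and refuse values that look unresolved."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if value is None:
        raise KeyError(f"app setting {name!r} is not defined")
    # Assumed failure modes: the raw reference text, or the setting name itself.
    if value.startswith("@Microsoft.KeyVault(") or value == name:
        raise RuntimeError(
            f"app setting {name!r} was not resolved to a secret; "
            "check the identity's access and the vault's network settings"
        )
    return value
```

Failing at startup with a clear message is a lot friendlier than trying to decrypt a PDF with the wrong string.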

After you do NOT check that box, you can happily access the passwords for your PDFs and can even version the secret, which might come in handy in case you ever need that old password.

One step closer

This solution has been running for a few months now, removing those pesky passwords from my PDFs, and it’s even quite cheap. In fact, it costs me a whopping €0.01 per month :-) I’m not sure I’d make this investment as a company, but at least I’ve learned some more and got to play around with some of the more recent concepts in Azure. The code for the Function can be found on GitHub. Over the next few months I’ll continue writing about my efforts to go paperless, which include my first steps into the world of AI (more specifically Machine Learning) to classify documents. And by the end of the year, I hope to be completely paperless.
