Error reporting on Linux via Gmail for automated tasks

Have a critical cron’d automated task? You’d like to be notified if something fails? With the ubiquity of smartphones, you can notice an error right away and take action.

Wow, someone wrote to me! It’s from someone named “mdadm”. Must be spam again!

Computers sending emails for various purposes is nothing new. I have a couple of critical cron jobs on my home computer; syncing the family photos to my remote server, backing up the said remote server to my local computer, etc. These are all tasks that are defined in the daily crontab, and without a proper or any alerting system, something can go wrong and you can really find yourself in a bind if it turns out the backup procedure died months ago because the ssh key changed or something. You can either check the backup or automated task every single day to make sure nothing went wrong, or you can setup a robust alerting system that will send you an email if something goes wrong. This is not the only use case, Mdadm can also send you an email if a disk drops from a RAID array etc.

Setting up a Gmail relay system with Postfix

Installing and managing an email service is difficult, and you have to contend with all sorts of issues, is your server blacklisted, do you have the appropriate SPF records, is your IP reverse resolvable to the domain name etc, etc. Most of these requirements are difficult or impossible with a simple home computer behind a router without an FQDN. With the relay, you’ll be able to send an email without having to worry if it’ll end up in spam, or not delivered at all as it will be sent from a real Gmail account. Luckily, it’s extremely simple to set it up:

  • Create a Gmail account.
  • Allow “less secure” apps access your new gmail account. Don’t be fooled by how they’re calling it, we’ll be having full encryption for email transfer.
  • Setup Postfix.

I’ll keep the Postfix related setup high level only:

  • Install Postfix with your package manager and select “Internet site”
  • Edit /etc/postfix/sasl_passwd and add:
[smtp.gmail.com]:587    username@gmail.com:PASSWORD
  • Chmod /etc/postfix/sasl_passwd to 600
  • At the end of /etc/postfix/main.cf add:
relayhost = [smtp.gmail.com]:587 # the relayhost variable is empty by default, just fill in the rest
smtp_use_tls = yes
smtp_sasl_auth_enable = yes
smtp_sasl_security_options =
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_tls_CAfile = /etc/ssl/certs/ca-certificates.crt
  • Use postmap to hash and compile the contents of the sasl_password file:
# postmap /etc/postfix/sasl_passwd
  • Restart the postfix service

Your computer should now be able to send emails. Test with a little bit of here document magic:

$ mail -s "Testing email" youremail@example.com << EOF
Testing email :)
EOF

If everything went fine, you should be getting the email promptly from your new gmail account. I haven’t tried with other email providers, but it should all be pretty much the same.

Usage example

Now that you have a working relay, it’s time to put it to good use. Here is a simple template script with two key functions that can be sourced through Bash so you can use it within other scripts without having to copy & paste it around.

#!/bin/bash

# Global variables
NAME=$(basename "$0")
LOG=/var/log/"$NAME".log
EMAIL=youremail@whatever.com
LOCKFILE=/tmp/"$NAME".lock
HOST=$(hostname -s)

# All STDERR is appended to $LOG
exec 2>>$LOG

# An alert function if the locking fails
function lock_failure {
  mail -s "Instance of "$0" is already running on $HOST" $EMAIL << EOF
Instance of "$0" already running on $HOST. Locking failed.
EOF
  exit 1
}

# An alert function if something goes wrong in the main procedure
function failure_alert {
  mail -s "An error has occured with "$0" on $HOST" $EMAIL << EOF
An error has occured with "$0" on $HOST. Procedure failed. Please check "$LOG"
EOF
  exit 1
}


function procedure {
  # If file locking with FD 9 fails, lock_failure is invoked
  (
    flock -n 9 || lock_failure
    (
      # The entire procedure is started in a subshell with set -e. If a command fails
      # the subshell will return a non-zero exit status and will trigger failure_alert
      set -e
      date >> $LOG
      command 1
      command 2
      [...]
    )

    if [ $? != 0 ]; then
      failure_alert
    fi

  ) 9>$LOCKFILE
}

function main {
  procedure
}

main

flock(1) is used to make sure there is only one instance of the script running, and we’re checking the exit status of the commands. If you don’t need instance locking, you can simply forego the lock_failure function. The actual work is contained in another subshell which is terminated if any of the commands in the chain fail and sends an email advising you to check $LOG.

Conclusion

A lot of Linux services like Mdadm or S.M.A.R.T. have a feature to send emails if something goes wrong. For example, I set it up to send me an email if a drive fails inside my RAID 1 array, I just had to enter my email address in a variable called MAILADDR in the mdadm.conf file. A couple of days later, I received an email at 7 AM; ooooh someone emailed me. I had a rude awakening, it was Mr. Mdadm saying that I have a degraded array. It turned out to be the SATA cabling that was at fault, but still. This could have gone unnoticed for who knows how long and if the other disk from the RAID 1 failed later on, I could have had serious data loss. If you want to keep your data long term you can’t take any chances, you need to know if your RAID has blown up and not rely on yourself to check it out periodically, you won’t, you can’t, that’s why we automate.

Be careful when you write these programs. If your script is buggy and starts sending a lot of emails at once for no good reason, Gmail will block your ass faster than you can say “Linux rules!” If you’re blocked by Gmail, you might miss an important email from your computer.

Leave a Reply