Jan 18, 2015

An exception occurred: 'PG::ConnectionBad: PQconsumeInput() SSL connection has been closed unexpectedly' on Heroku

tl;dr: Always use at least a standard database plan for production apps on Heroku.

Recently I came across some unexpected exceptions in an app running on Heroku. Others have had a similar experience.

# It starts with the connection being unexpectedly closed
An exception occurred: 'PG::ConnectionBad: PQconsumeInput() SSL connection has been closed unexpectedly'
# Followed by a bunch of these
An exception occurred: 'PG::ConnectionBad: PQsocket() can't get socket descriptor'

It turns out that the hobby tier for databases on Heroku has "Unannounced maintenances and database upgrades". Heroku responded, basically saying: `The hobby plans have their connections closed sometimes. ActiveRecord should deal with it via a connection pool. If not, you'll have to reconnect.` The exceptions I saw were in a long-running background job (in this case a RabbitMQ consumer), where ActiveRecord was not handling the reconnect. :-(

If you get these exceptions, the first thing to do is to restart your server, which you can do by running this command.

# Assuming your git remote is named production
heroku restart -r production

The best long-term fix is to upgrade your Heroku database to at least the standard-0 plan (currently $50/mth). This will prevent connections from being unexpectedly closed. It will also help performance.

If you really don't want to upgrade, you can try reconnecting in your code:

begin
  # Do the action you want to do here.
rescue ActiveRecord::StatementInvalid => e
  # Make sure someone finds out!
  # ExceptionNotifier.notify_exception(e)
  # Rails.logger.error { "[insert details here] : #{e}" }

  # Now, try to reconnect
  ActiveRecord::Base.connection.reconnect!
  # Don't forget - you need to make sure to retry the original action somehow.
end
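
As for retrying the original action, one option is a small wrapper that reconnects and then runs the block again. The following is only a sketch under some assumptions: `with_reconnect` is a hypothetical helper (it is not part of Rails), the reconnect step is passed in as a lambda, and the demo at the bottom uses `IOError` as a stand-in so it runs without a database.

```ruby
# Hypothetical helper (not part of Rails): run a block, and if it raises one of
# the listed errors, call the supplied reconnect hook and run the block again.
def with_reconnect(reconnect:, on: [StandardError], attempts: 2)
  tries = 0
  begin
    yield
  rescue *on => e
    tries += 1
    raise if tries >= attempts # out of attempts: let the error propagate
    reconnect.call(e)          # e.g. reconnect the database connection
    retry                      # re-run the begin block, i.e. the original action
  end
end

# Stand-in demo that runs without Rails: the first call fails, the second succeeds.
calls = 0
result = with_reconnect(reconnect: ->(e) { puts "reconnecting after: #{e.message}" },
                        on: [IOError]) do
  calls += 1
  raise IOError, "connection closed" if calls == 1
  "done after #{calls} calls"
end
puts result
```

In a Rails app the call might look like `with_reconnect(reconnect: ->(_e) { ActiveRecord::Base.connection.reconnect! }, on: [ActiveRecord::StatementInvalid]) { handle(message) }`, where `handle` and `message` are placeholders for whatever your consumer does. Retrying is only safe if the action can be run twice without side effects, so check that before wrapping it.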

On a final note - there is no way to tell when this might happen. This particular app has both a production and a staging server, configured in the same way. These exceptions have occurred multiple times on different days on the production server, but have never happened on staging.