PDF OCR via React, Django REST Framework, and Heroku, Part 5: Deploy to Heroku

Joseph Cardenas
9 min read · Aug 3, 2020

Now that our front and back ends work locally and our prerequisites are taken care of, we can step up the difficulty by deploying them to an online service provider. We'll be using Heroku for this, as it is relatively simple to use (assuming you set things up correctly to begin with 😄).

Deployment Prerequisites

Now that we have everything working as we need it to locally, we need to prepare to deploy our apps to Heroku so our work can OCR documents from anywhere. We are releasing our local application into the wild where anyone can interact with it, so we need to make sure the app is safe and secure.

Preparing our Django app

At the very least, we have to make sure the DEBUG setting in our settings.py file is set to False. Remember all those long, helpful error messages Django showed you when something went wrong in your application? They expose sensitive information we can't have just anyone looking at.

Running the python manage.py check --deploy command lists further security settings we should change to keep our app safe after deployment. Let's run that now.

Uh oh . . .

Well, it looks like we have some work to do. Our app will still run with these warnings outstanding, but it's good practice to make it as secure as we can. In a nutshell, I configured my security settings as follows:

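# Security settings for deployment
# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = os.environ.get('DJANGO_SECRET_KEY', 'reference-project-secret-key')
# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = False
ALLOWED_HOSTS = ['.herokuapp.com']
CSRF_COOKIE_SECURE = True  # a real boolean, not the string 'True'
SECURE_REFERRER_POLICY = 'origin'
SECURE_SSL_REDIRECT = True
SESSION_COOKIE_SECURE = True
CORS_ORIGIN_ALLOW_ALL = False
# django-cors-headers silently ignores an unknown CORS_WHITELIST setting and does not
# accept wildcards in CORS_ORIGIN_WHITELIST, so match Heroku subdomains with a regex
CORS_ORIGIN_REGEX_WHITELIST = [r'^https://[\w-]+\.herokuapp\.com$']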
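Hard-coding DEBUG = False means flipping it back by hand every time you work locally. One common alternative, sketched below under the assumption of a hypothetical DJANGO_DEBUG environment variable (this project doesn't define one), is to read the flag from the environment so that production stays safe by default:

```python
import os


def env_bool(name, default=False):
    """Read a boolean flag from the environment.

    Only a small set of explicit "truthy" strings count as True, so an unset
    or misspelled value falls back to the (safe) default.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# In settings.py: DEBUG stays False on Heroku unless DJANGO_DEBUG is set there.
DEBUG = env_bool("DJANGO_DEBUG")
```

Locally you would then export DJANGO_DEBUG=true in your shell, while leaving it unset on Heroku.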

Creating the requirements.txt, Procfile, and runtime.txt

Heroku needs a requirements.txt file, a Pipfile, or a setup.py file in order to see what our dependencies are. Since we're using Poetry to manage our requirements, we'll run poetry run pip freeze > requirements.txt from our top-level directory. Opening the new file, we see all the packages we added via Poetry along with their own dependencies (version numbers may differ depending on when you're reading this):

asgiref==3.2.10
cffi==1.14.0
chardet==3.0.4
coloredlogs==14.0
dj-database-url==0.5.0
Django==3.0.8
django-cors-headers==3.4.0
django-heroku==0.3.1
djangorestframework==3.11.0
ghostscript==0.6
gunicorn==20.0.4
humanfriendly==8.2
img2pdf==0.3.6
lxml==4.5.2
ocrmypdf==10.2.0
pdfminer.six==20200517
pikepdf==1.17.2
Pillow==7.2.0
pluggy==0.13.1
psycopg2==2.8.5
pycparser==2.20
pycryptodome==3.9.8
pyreadline==2.1
pytesseract==0.3.4
pytz==2020.1
reportlab==3.5.44
sortedcontainers==2.2.2
sqlparse==0.3.1
tqdm==4.48.0
whitenoise==5.1.0
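Because pip freeze captures whatever happens to be in the virtual environment, it can be worth a quick sanity check that the packages the deployment actually depends on made it into the file. Here's a minimal sketch; the REQUIRED set is my own assumption based on the settings and Procfile in this article, not an official list:

```python
import re

# Packages this deployment relies on at runtime (an assumption, adjust to taste):
# gunicorn serves the app; django-heroku, psycopg2 and whitenoise wire it into Heroku.
REQUIRED = {"django", "djangorestframework", "django-cors-headers",
            "django-heroku", "gunicorn", "psycopg2", "whitenoise", "ocrmypdf"}


def pinned_packages(text):
    """Return {normalized_name: version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        # PEP 503 style normalization: lowercase, runs of -_. become "-"
        pins[re.sub(r"[-_.]+", "-", name.strip()).lower()] = version.strip()
    return pins


def missing_requirements(text, required=REQUIRED):
    """List required packages that are not pinned in the given requirements text."""
    pins = pinned_packages(text)
    return sorted(pkg for pkg in required if pkg not in pins)
```

Calling missing_requirements(open("requirements.txt").read()) should return an empty list if everything is pinned.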

Now that we have our requirements.txt, we'll move on to creating our Procfile. Again within the top-level directory, create a file named Procfile with no file extension. Inside it, add this single line:

web: gunicorn tutorial_backend.wsgi --log-file -
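Each Procfile line follows the pattern <process type>: <command>, and web is the process type that receives HTTP traffic from Heroku's router. A small, hypothetical checker like the one below parses a Procfile that way and flags a missing web entry or non-ASCII characters, such as a smart dash that sneaks in when a command is copied from a blog post:

```python
import re

# A process type is a name made of letters, digits, '_' or '-', then a colon
# followed by the command to run.
PROC_LINE = re.compile(r"^([A-Za-z0-9_-]+):\s*(.+)$")


def parse_procfile(text):
    """Map each process type to its command; raise ValueError on problems."""
    processes = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are harmless
        match = PROC_LINE.match(line.strip())
        if match is None:
            raise ValueError(f"line {lineno} is not 'type: command': {line!r}")
        command = match.group(2).strip()
        if not command.isascii():
            raise ValueError(f"line {lineno} contains non-ASCII characters (smart dashes?)")
        processes[match.group(1)] = command
    if "web" not in processes:
        raise ValueError("no 'web' process type, so Heroku won't route HTTP to the app")
    return processes
```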

By default, Heroku will use Python 3.6 as our runtime, so if we use a different version we need a runtime.txt file in the top-level directory containing nothing more than the Python version used for the project, in this case:

python-3.8.3
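If you'd rather not type the version by hand, a tiny sketch like this one writes runtime.txt from the interpreter you're currently running. It assumes your local patch version is one Heroku actually supports; Heroku only offers specific releases, so check their list if the build complains:

```python
import sys
from pathlib import Path


def runtime_string(version_info=sys.version_info):
    """Format an interpreter version the way runtime.txt expects, e.g. python-3.8.3."""
    return f"python-{version_info[0]}.{version_info[1]}.{version_info[2]}"


def write_runtime_txt(directory="."):
    """Write runtime.txt into `directory`, matching the local interpreter."""
    path = Path(directory) / "runtime.txt"
    path.write_text(runtime_string())
    return path
```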

Next, we'll modify our settings.py file to make use of the django-heroku package we installed earlier. At the top of settings.py, add import django_heroku, then go to the bottom of the file and add the following:

# Activate Django-Heroku.
django_heroku.settings(locals())
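Passing locals() works because, at module level, locals() is the settings module's own namespace, so django_heroku.settings() can overwrite entries such as DATABASES in place. Among other things, it turns Heroku's DATABASE_URL into Django's DATABASES format with the dj-database-url package. The stand-in below is only a rough, standard-library sketch of that translation; the function names are hypothetical and it skips options the real packages handle (SSL, connection age, other database engines):

```python
from urllib.parse import unquote, urlparse


def database_from_url(url):
    """Translate a postgres:// URL into a Django DATABASES['default'] dict."""
    parsed = urlparse(url)
    if parsed.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parsed.path.lstrip("/"),
        "USER": unquote(parsed.username or ""),
        "PASSWORD": unquote(parsed.password or ""),  # URL-encoded in DATABASE_URL
        "HOST": parsed.hostname or "",
        "PORT": parsed.port or "",
    }


def apply_database_url(settings_namespace, environ):
    """Mutate the settings namespace dict directly, the same pattern
    django_heroku.settings(locals()) relies on."""
    if "DATABASE_URL" in environ:
        settings_namespace["DATABASES"] = {
            "default": database_from_url(environ["DATABASE_URL"])
        }
```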

Before we move on, let’s make a new branch and push our code.

Set up a Heroku account/Download Heroku CLI

First, head over to the Heroku site and set up a free account (you can sort this out on your own). Once that's done, create a new app, starting with the back end.

We'll title our back end application predictably, in a way that makes its function clear, such as "tutorial-backend". Note that Heroku app names can't contain underscores. (The name "tutorial-backend" was already taken, apparently.)

You do not need to add your new app to a pipeline. After creating the new application, you'll be greeted with a screen guiding you through using the Heroku CLI (Command Line Interface). These instructions are simple and straightforward, so I'll leave them to you. After pushing your code to Heroku, go back to the "Deploy" tab on your app's page. Under the "Connect to GitHub" button, connect to your repo either by typing in your project's repo name or by using the dropdown menu. Once connected, scroll down a bit and click the "Enable Automatic Deploys" button. This way, every time we push an update to our GitHub repo, Heroku automatically redeploys our app. Beyond that, the CLI gives us more fine-grained tools than the website alone.

One example of how the Heroku CLI is more powerful than the dashboard is heroku run bash, which opens a bash shell in a one-off dyno running your app's code, so you can run Linux commands on Heroku itself.

After you’re both connected to GitHub and have enabled automatic deploys, you’ll be ready to move onto the next steps.

Deploy Our First App

If we click the "Open app" button in the upper right of our app's toolbar (or visit https://tutorial-backend.herokuapp.com/), we can see whether our first deployment worked.

Oh crap!

Well, our app deployed, but it clearly doesn't run because of a missing dependency. Our Aptfile with the Tesseract library was meant to prevent exactly this, but on its own it didn't work the way we intended. We need to include more software to make our back end app work.

Adding Buildpacks

We need buildpacks set for our application for it to deploy, let alone run after deployment (as we just saw, an app can deploy successfully and still throw errors when it runs 😅). A key buildpack is heroku/python; without it, Heroku won't even know what language your application uses. If it's not already there, you can add it by going to the "Settings" section of your Heroku project and clicking the Python icon in the "Buildpacks" section. Via the Heroku command line, the equivalent is heroku buildpacks:add heroku/python.

In your Heroku app’s Settings page, click the right-hand side “Add buildpack” button to specify a Python buildpack.
Defining a Python buildpack for the tutorial-backend app.

Now that our language buildpack is set, we can move on to loading the other software we need. Go back to your app's Settings page and add two more buildpacks. Because the OCR capability of our back end requires a wide variety of dependencies, these next two buildpacks ensure we keep that capability once the application is deployed.

Add the following two buildpacks, bringing your total number of buildpacks to three:

https://github.com/heroku/heroku-buildpack-apt
https://github.com/pathwaysmedical/heroku-buildpack-tesseract
heroku/python, heroku-buildpack-apt, and the pathwaysmedical tesseract buildpack are what you should have as buildpacks

The order of these buildpacks matters, so make sure to have them in this order:

heroku/python
https://github.com/heroku/heroku-buildpack-apt
https://github.com/pathwaysmedical/heroku-buildpack-tesseract
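If you'd rather script this than click through the dashboard, the Heroku CLI's buildpacks:add command takes an --index flag that pins each buildpack's position. The sketch below just builds those command lines in the order listed above; the app name you pass in is a placeholder, and nothing runs unless you set dry_run=False. It assumes you start from an empty list (heroku buildpacks:clear), since adding a buildpack that's already present fails:

```python
import shlex
import subprocess

# Order matters: Heroku runs buildpacks top to bottom.
BUILDPACKS = [
    "heroku/python",
    "https://github.com/heroku/heroku-buildpack-apt",
    "https://github.com/pathwaysmedical/heroku-buildpack-tesseract",
]


def buildpack_commands(app_name, buildpacks=BUILDPACKS):
    """Return one `heroku buildpacks:add` command per buildpack, indexed from 1."""
    return [
        ["heroku", "buildpacks:add", "--index", str(i), url, "--app", app_name]
        for i, url in enumerate(buildpacks, start=1)
    ]


def add_buildpacks(app_name, dry_run=True):
    """Print each command and, unless dry_run is True, actually run it."""
    for cmd in buildpack_commands(app_name):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```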

The heroku-buildpack-apt buildpack is important because it requires us to create an Aptfile in our top-level directory. The Aptfile is how you tell Heroku to install system packages beyond the Python packages listed in your requirements.txt file. In this case, our Aptfile contains the following four items:

tesseract-ocr
tesseract-ocr-eng
libpng-dev
libtesseract-dev

Using the heroku-buildpack-apt buildpack without a corresponding Aptfile will result in your tutorial-backend app not properly working even after other deployment requirements are set.

Adding Config Vars

We’re almost there, but now we need to add some variables to further configure our deployed back end.

Config Vars

Because our OCR app needs the Tesseract library to actually do the OCR, we made an Aptfile to bring in those extra libraries. To get our OCR back end working after deployment across a variety of environments, we need to change another setting. Go back to the Settings section in Heroku and click Reveal Config Vars.

We should end up with three Config Vars: DATABASE_URL (already there), SECRET_KEY, and TESSDATA_PREFIX.

The DATABASE_URL should already be supplied for you, so next we need to dip back into our command line and type heroku run bash. This starts a bash shell for our deployed app on Heroku and lets us search for the TESSDATA_PREFIX value we need. Type find . -iname tessdata and a couple of paths should come up. One of them ought to look something like ./.apt/usr/share/tesseract-ocr/4.00/tessdata, so add a new config var with the key TESSDATA_PREFIX and that path as its value. You should now have one config var holding the DATABASE_URL and another holding the just-discovered TESSDATA_PREFIX. Heroku's documentation on configuration variables covers them in more detail.
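To double-check the result from inside the deployed app (for example via heroku run python), a hypothetical helper like the one below reports whether the tesseract binary is on PATH and whether TESSDATA_PREFIX points at a directory holding the English language data. It assumes the Tesseract 4.x convention used here, where TESSDATA_PREFIX is the tessdata directory itself and the tesseract-ocr-eng package provides eng.traineddata:

```python
import os
import shutil
from pathlib import Path


def tesseract_status(environ=os.environ, language="eng"):
    """Report whether the pieces the OCR back end needs are visible.

    - the `tesseract` binary should be on PATH
    - TESSDATA_PREFIX should point at a directory holding <language>.traineddata
    """
    prefix = environ.get("TESSDATA_PREFIX")
    traineddata = Path(prefix) / f"{language}.traineddata" if prefix else None
    return {
        "tesseract_binary": shutil.which("tesseract"),  # None if not found
        "tessdata_prefix": prefix,
        "language_data_found": bool(traineddata and traineddata.is_file()),
    }
```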

Remember back in the first article where we hid our Django SECRET_KEY as an environment variable? Our Django project still needs some sort of secret key to work properly on Heroku, so our last config var is SECRET_KEY, set to the default 'backend-heroku-secret-key' we defined earlier. With all of the pieces in place, push to your GitHub repo to redeploy your app. Once it's deployed, launch the app and you should see a nice, fresh Django REST landing page:
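One caveat: if that default string is committed in settings.py, anyone who can read the repo can read the key. A safer variation, sketched here with Python's standard secrets module, is to generate a random value once and use it as the SECRET_KEY config var instead (Django itself also ships django.core.management.utils.get_random_secret_key for the same job):

```python
import secrets
import string

# Mirrors the character set of Django's get_random_secret_key.
ALPHABET = string.ascii_lowercase + string.digits + "!@#$%^&*(-_=+)"


def generate_secret_key(length=50):
    """Return a cryptographically random key suitable for Django's SECRET_KEY."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_secret_key())
```

You could then set it with heroku config:set SECRET_KEY='<generated value>', using single quotes so your shell doesn't interpret characters like $ or !.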

We got it!

Great! Now let’s test our deployed back end with Postman to see if it works. This process will work exactly like it did before except our URL will be that of our deployed Heroku app rather than the localhost URL. You ought to get a successful response and see the result when you reload your deployed REST back end.
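If you'd like a scriptable alternative to Postman, the sketch below builds the same kind of multipart/form-data POST with only the standard library. The field names (title, content, and the file field) and the endpoint URL are placeholders, so match them to your own serializer and route:

```python
import urllib.request
import uuid


def build_upload_request(url, fields, file_field, filename, file_bytes):
    """Build a multipart/form-data POST similar to what Postman sends.

    `fields` holds plain text form fields; the PDF bytes go in `file_field`.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode() + str(value).encode() + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Sending it is then a matter of urllib.request.urlopen(request) against your deployed URL and checking the response status.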

Prepping our React frontend

Now that our back end app is deployed and working, we can move on to the front end. You'll create the front end app on Heroku the same way we did for the back end, so I'll trust you to take care of that on your own.

Organize your lockfiles

Like Python's requirements.txt file, React projects have files that keep track of dependencies. We'll be using a package-lock.json file for this project, so any other lockfiles in the project (such as a yarn.lock) ought to be deleted.
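Heroku's Node.js build tooling can pick a different package manager, or refuse to build, when it finds more than one kind of lockfile, so a quick check before pushing may save a failed deploy. This is a minimal sketch; the list of conflicting filenames is my assumption about which other package managers you might have used:

```python
from pathlib import Path

# Lockfiles from other package managers that would compete with package-lock.json.
CONFLICTING_LOCKFILES = ("yarn.lock", "pnpm-lock.yaml")


def stray_lockfiles(project_dir="."):
    """Return the names of conflicting lockfiles present in the project root."""
    root = Path(project_dir)
    return [name for name in CONFLICTING_LOCKFILES if (root / name).exists()]
```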

Load the Buildpack

Just like for our back end, we need to load a buildpack for our front end. You can either add it from the Settings page or run heroku buildpacks:add <buildpack URL> in your CLI. In this case we're adding https://github.com/mars/create-react-app-buildpack.

Change the URL the Front End Looks At

Now that our back end has moved, we need to change the URL our front end communicates with. The line reading let url = 'localhost:8000'; needs to be changed to read:

let url = 'https://tutorial-backend1.herokuapp.com/';

Testing the Front End

Let’s test things out! Start up both your front and back end apps and then, like we did before when they were hosted locally, send a file to the back end. Make sure to give your title and content something descriptive to make it clear you sent the request from the front end hosted on Heroku. Now reload your back end and you ought to see a successful response like the following:

Picture of a successful POST request sent from our front end, where both apps are on Heroku.
Your apps should now be able to talk to each other successfully!

We’re Not Done Yet

What I've shown you so far is just the bare minimum. You could manually check whether the PDF sent to the Django app has actually been OCR'd, but this is something we'd like to see displayed on the front end itself. There is also a lot of PDF manipulation you can do beyond making text readable. These are all planned updates to this project, so stay tuned.

Next up in this series is how to upload your files to an Amazon S3 bucket and display those files along with their title and description.
