PDF OCR via React, Django REST Framework, and Heroku, Part 5: Deploy to Heroku
Now that we have our front and back ends working locally and our prerequisites taken care of, we can step up the difficulty by deploying them to an online service provider. We’ll be using Heroku for this, as it is relatively simple to use (assuming you set things up correctly to begin with 😄).
Deployment Prerequisites
Now that we have everything working as we need it to locally, we need to prepare to deploy our apps to Heroku so our app can OCR documents from anywhere. We are releasing our local application into the wild where anyone can interact with it, so we need to make sure the app is safe and secure.
Preparing our Django app
The least we have to do is make sure the DEBUG setting in our settings.py file is set to False. Remember all those long, helpful error messages Django gave you if there was an error in your application? Those give out sensitive information we can’t have just anyone looking at.
If we run the python manage.py check --deploy command, we’ll see more security settings to change to make sure our app is safe post-deployment. Let’s run that now.
Uh oh . . .
Well, it looks like we have some work to do. Now, our app will still work even if we’re told we have some security vulnerabilities. That being said, it’s good practice to get our app as secure as we can. In a nutshell, I configured my security settings as follows:
# Security settings for deployment
# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = os.environ.get('DJANGO_SECRET_KEY', 'reference-project-secret-key')

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = False

ALLOWED_HOSTS = ['.herokuapp.com']
CSRF_COOKIE_SECURE = True
SECURE_REFERRER_POLICY = 'origin'
SECURE_SSL_REDIRECT = True
SESSION_COOKIE_SECURE = True
CORS_ORIGIN_ALLOW_ALL = False
CORS_WHITELIST = ['*.herokuapp.com']
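One thing to keep in mind if you later decide to read flags like these from environment variables: environment values always arrive as strings, and any non-empty string (including 'False') is truthy in Python. A small helper can make the conversion explicit. The sketch below is only an illustration; env_bool and the DJANGO_DEBUG variable name are hypothetical and not part of Django or this project.

```python
import os


def env_bool(name, default=False):
    """Read an environment variable and interpret it as a boolean.

    Hypothetical helper for illustration; not part of Django.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Assumed variable name; any name you set in your environment would work.
DEBUG = env_bool("DJANGO_DEBUG", default=False)
```

With this in place, setting DJANGO_DEBUG to "False" really does turn debugging off, which a bare bool() on the string would not.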
Creating the requirements.txt, Procfile, and runtime.txt
Heroku needs either a requirements.txt file, a Pipfile, or a setup.py file in order to see what our dependencies are. As we are using Poetry to manage our requirements, we’ll need to type poetry run pip freeze > requirements.txt within our top-level directory. Opening our new file, we see all the packages we added via Poetry and their own dependencies (version numbers may differ depending on when you’re reading this):
asgiref==3.2.10
cffi==1.14.0
chardet==3.0.4
coloredlogs==14.0
dj-database-url==0.5.0
Django==3.0.8
django-cors-headers==3.4.0
django-heroku==0.3.1
djangorestframework==3.11.0
ghostscript==0.6
gunicorn==20.0.4
humanfriendly==8.2
img2pdf==0.3.6
lxml==4.5.2
ocrmypdf==10.2.0
pdfminer.six==20200517
pikepdf==1.17.2
Pillow==7.2.0
pluggy==0.13.1
psycopg2==2.8.5
pycparser==2.20
pycryptodome==3.9.8
pyreadline==2.1
pytesseract==0.3.4
pytz==2020.1
reportlab==3.5.44
sortedcontainers==2.2.2
sqlparse==0.3.1
tqdm==4.48.0
whitenoise==5.1.0
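Because Heroku installs exactly what this file lists, it can be worth a quick check that every line is pinned to an exact version. The following is a rough sketch, assuming the simple name==version format shown above; unpinned_requirements is a hypothetical helper, and it does not handle extras, environment markers, or URLs.

```python
def unpinned_requirements(text):
    """Return the requirement lines that are not pinned with '=='."""
    unpinned = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        if not (name and sep and version):
            unpinned.append(line)
    return unpinned


# Made-up sample: the last entry has no version pin.
sample = "Django==3.0.8\ngunicorn==20.0.4\nwhitenoise\n"
print(unpinned_requirements(sample))  # ['whitenoise']
```

To check the real file, pass in the contents of requirements.txt instead of the sample string.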
Now that we have our requirements.txt, we’ll move on to creating our Procfile. Again within the top-level directory, create a file titled only Procfile without a file extension. Within that file, add a single line:
web: gunicorn tutorial_backend.wsgi --log-file -
By default, Heroku will use Python 3.6 as our runtime, so if we use something other than that, we need a runtime.txt file containing nothing more than the Python version used for this project, in this case python-3.8.3.
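If you’d rather not type the version by hand, a few lines of Python can produce the same string from whichever interpreter you run them with. This is only a sketch: runtime_string is a hypothetical helper, and it assumes the python-X.Y.Z format shown above is what Heroku expects for your stack.

```python
import sys


def runtime_string(version_info=sys.version_info):
    """Build a runtime.txt line such as 'python-3.8.3' from a version tuple."""
    major, minor, micro = version_info[:3]
    return f"python-{major}.{minor}.{micro}"


# Prints the line for the interpreter running this script; redirect it into
# runtime.txt if it matches the version you develop with.
print(runtime_string())
```

Run it inside the project’s Poetry environment so the reported version is the one your app actually uses.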
Next we’ll need to modify our settings.py file to make use of the django-heroku package we initially installed. At the top of settings.py, add import django_heroku and then go to the bottom of the file and add the following:
# Activate Django-Heroku.
django_heroku.settings(locals())
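As I understand it, one of the things django-heroku does here is read the DATABASE_URL config var and turn it into Django’s DATABASES setting, relying on the dj-database-url package listed in our requirements. To give a feel for what that involves, here is a simplified sketch using only the standard library. It is not either package’s actual implementation, parse_database_url is a hypothetical name, and the URL is made up.

```python
from urllib.parse import urlparse


def parse_database_url(url):
    """Split a postgres:// style URL into Django-like connection fields."""
    parts = urlparse(url)
    return {
        "NAME": parts.path.lstrip("/"),
        "USER": parts.username,
        "PASSWORD": parts.password,
        "HOST": parts.hostname,
        "PORT": parts.port,
    }


# Made-up credentials, in the general shape of a database URL.
example = "postgres://user:secret@db.example.com:5432/mydb"
print(parse_database_url(example))
```

The point is simply that a single config var carries everything the database connection needs, which is why we never hard-code those values in settings.py.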
Before we move on, let’s make a new branch and push our code.
Set up a Heroku account/Download Heroku CLI
First, you need to head over to the Heroku site and set up a free account (you can sort this out on your own). After that gets taken care of, you’ll create a new app, starting with the back end.
We’ll title our back end application predictably:
You do not need to add your new app to a pipeline. After creating the new application, you’ll be greeted with a screen guiding you through using the Heroku CLI (Command Line Interface). These instructions are pretty simple and straightforward, so I’ll leave them to you. After pushing your code to Heroku, go back to the “Deploy” screen on your app’s page. Under the “GitHub — Connect to GitHub” button, connect to your repo either by typing in your project’s repo name or by using the dropdown menu. After successfully connecting to GitHub, scroll down a bit and hit the “Enable Automatic Deploys” button. This way, every time we update our application and push it to our GitHub repo, it automatically updates our app on Heroku. In addition to this, access to the CLI gives us access to more fine-grained tools than the website alone.
One example of how the Heroku CLI is more powerful than the dashboard is the capacity to enter a bash shell via typing
heroku run bash
which will give you the power to run Linux commands inside the Heroku server your app lives in.
Deploy Our First App
If we hit the “Open app” button in the upper right of our app’s toolbar (or visit the https://tutorial-backend.herokuapp.com/ URL), we can test to see if our first deployment worked.
Well, our app deployed but doesn’t run because of a missing dependency. Mitigating problems like this is why we added our Aptfile with the Tesseract library, but that didn’t work like we intended. We need to include more software to make our back end app work.
Adding Buildpacks
We need to have buildpacks set for our application for things to deploy, let alone run post deployment (as you can clearly deploy an app and also have errors when you run the app 😅). A key buildpack is the heroku/python buildpack. Without this buildpack, Heroku won’t even know what language you are trying to use in your application. If it’s not already there, you can add this simply by going to the “Settings” section of your Heroku project and clicking the Python icon in the “Buildpacks” section. You can also do this via the Heroku command line.
Now that our language buildpack is set, we can move on to loading the other files we need. Go back to your app’s settings page and add two more buildpacks. Because the OCR capability of our back end requires a wide variety of dependencies, these next two buildpacks are going to ensure we maintain that capability while the application is deployed.
Add the following two buildpacks, bringing your total number of buildpacks to three:
https://github.com/heroku/heroku-buildpack-apt
https://github.com/pathwaysmedical/heroku-buildpack-tesseract
The order of these buildpacks matters, so make sure to have them in this order:
heroku/python
https://github.com/heroku/heroku-buildpack-apt
https://github.com/pathwaysmedical/heroku-buildpack-tesseract
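If you prefer the CLI to the dashboard, the same ordering can be expressed with heroku buildpacks:add and its --index option, which takes a 1-based position. The sketch below only builds and prints the command strings rather than running them; buildpack_commands is a hypothetical helper, and tutorial-backend is assumed to be your app’s name.

```python
BUILDPACKS = [
    "heroku/python",
    "https://github.com/heroku/heroku-buildpack-apt",
    "https://github.com/pathwaysmedical/heroku-buildpack-tesseract",
]


def buildpack_commands(app, buildpacks):
    """Return one 'heroku buildpacks:add' command per buildpack, in order."""
    return [
        f"heroku buildpacks:add --index {position} {buildpack} --app {app}"
        for position, buildpack in enumerate(buildpacks, start=1)
    ]


# Assumed app name; replace it with whatever you called your back end.
for command in buildpack_commands("tutorial-backend", BUILDPACKS):
    print(command)
```

Skip any buildpack that is already set on the app, and run heroku buildpacks afterwards to confirm the order.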
The new buildpack heroku-buildpack-apt is important, as it requires us to create an Aptfile in our top-level directory. The Aptfile is how you tell Heroku you want more software packages installed than are specified in your requirements.txt file. In this case, our Aptfile is going to contain the following four items:
tesseract-ocr
tesseract-ocr-eng
libpng-dev
libtesseract-dev
Using the heroku-buildpack-apt buildpack without a corresponding Aptfile will result in your tutorial-backend app not properly working even after other deployment requirements are set.
Adding Config Vars
We’re almost there, but now we need to add some variables to further configure our deployed back end.
Config Vars
Because our OCR app needs access to the Tesseract library to actually do the OCR, we made an Aptfile to bring in those extra libraries. To get our OCR back end to actually work post-deployment in a variety of environments, we need to change another setting. Go back to the Settings section in Heroku and click Reveal Config Vars.
Now, the DATABASE_URL should already be supplied for you, so now we need to dip back into our command line and type heroku run bash. This will start up a bash shell within our app as it’s deployed on Heroku and allow us to search for the TESSDATA_PREFIX value we need. Type find . -iname tessdata and there should be two lines of values that come up. There ought to be a line stating something like ./.apt/usr/share/tesseract-ocr/4.00/tessdata, so copy and paste that value into the second line of your “Config Vars” so that the result is TESSDATA_PREFIX ./.apt/usr/share/tesseract-ocr/4.00/tessdata. So now you should have one line with the DATABASE_URL and another with the just-discovered TESSDATA_PREFIX value. Heroku’s documentation has more on configuration variables.
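If you want to run the same search from Python (for example in a startup check), os.walk can stand in for find. This is a minimal sketch; find_tessdata_dirs is a hypothetical helper, and the match is case-insensitive in the same way as find’s -iname test.

```python
import os


def find_tessdata_dirs(root="."):
    """Return paths of directories named 'tessdata' (any case) under root."""
    matches = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for dirname in dirnames:
            if dirname.lower() == "tessdata":
                matches.append(os.path.join(dirpath, dirname))
    return sorted(matches)
```

Calling find_tessdata_dirs(".") from the app’s root directory should list the same kind of paths the shell search printed.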
Remember back in the first article where we hid our Django SECRET_KEY as an environment variable? We still need to have some sort of secret key for our Django project to work properly with Heroku, so our last config variable is going to be the default ‘backend-heroku-secret-key’ we defined earlier. So now that we have all of the pieces in place, make sure to push to your GitHub repo to re-deploy your app. Once deployed, launch the app and you should see a nice, fresh Django REST landing page:
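Rather than reusing a placeholder, you may want to generate a fresh random value for this config var. Here is one way to do that with Python’s standard secrets module; generate_secret_key is a hypothetical helper name, and Django also ships its own get_random_secret_key in django.core.management.utils if you’d rather use that.

```python
import secrets


def generate_secret_key(nbytes=50):
    """Return a URL-safe random string built from nbytes of randomness."""
    return secrets.token_urlsafe(nbytes)


# Paste the printed value into the DJANGO_SECRET_KEY config var.
print(generate_secret_key())
```

Each run prints a different value, so generate it once and store it in Heroku rather than in your repo.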
Great! Now let’s test our deployed back end with Postman to see if it works. This process will work exactly like it did before except our URL will be that of our deployed Heroku app rather than the localhost URL. You ought to get a successful response and see the result when you reload your deployed REST back end.
Prepping our React frontend
Now that we have our back end app deployed and working, we can move on to the front end. You’ll start the front end app on Heroku the same way we started it for our back end, so I’ll trust you to get that taken care of on your own.
Organize your lockfiles
Like Python’s requirements.txt file, React programs also have files meant to keep track of dependencies. We’ll be using a package-lock.json file for this project, so any other lockfiles in the project ought to be deleted.
Load the Buildpack
Just like for our back end, we need to load a buildpack for our front end. You can either load this buildpack from the Settings page or type the command heroku buildpacks:add BUILDPACK_NAME into your CLI. In this case we’re adding https://github.com/mars/create-react-app-buildpack.
Change the URL the Front End Looks At
Now that our back end has moved, we need to edit the URL our front end communicates with. The line reading let url = 'localhost:8000'; needs to be changed to read
let url = 'https://tutorial-backend1.herokuapp.com/';
Testing the Front End
Let’s test things out! Start up both your front and back end apps and then, like we did before when they were hosted locally, send a file to the back end. Make sure to give your title and content something descriptive to make it clear you sent the request from the front end hosted on Heroku. Now reload your back end and you ought to see a successful response like the following:
We’re Not Done Yet
What I’ve shown you so far is just the bare minimum. Of course you could manually check to see if the PDF sent to the Django app has actually been OCR’d, but this is something we’d like to see displayed on the front end itself. There is also a lot of PDF manipulation you can do other than just making text readable. All of these are future updates to this project, so stay tuned.
Next up in this series is how to upload your files to an Amazon S3 bucket and display those files along with their title and description.