All your websites are belong to us (with a reverse proxy)
I have got into the habit of using a single domain as an umbrella for multiple websites. In short, this means we can benefit from different software stacks hosted on different platforms.
As a bonus, we can use a custom domain on GitHub Pages with our own SSL certificate.
§tl;dr
Apache httpd ProxyPass and ProxyPassReverse are our best friends to mount an external URL (and its descendants) onto a folder of our very own domain.
§Why use a reverse proxy?
We are in 2016 and mentioning a reverse proxy in a conversation sounds odd, like something dating from a different century. But hey, who cares?
I use reverse proxies for various reasons:
- to isolate components
  Instead of putting everything into an ever larger monolithic website, we can manage the parts as different git repositories, each with its own build process (like /photography and /talks on this website).
- to manage different stacks
  In the case of a conference with a yearly edition or so, we can iterate over the software stack and change it according to our needs, without having to upgrade legacy editions or keep maintaining them because we feel obliged to.
- to provide a transparent experience to our users
  We can host content in different places and still provide a coherent experience to users, without them noticing how spread out our infrastructure is.
- to upgrade individual components
  One by one rather than the entire stack, which makes life easier in terms of QA scope. Obviously, we can fall into the microservices trap: the more individual projects we have, the more scattered our attention and efforts become.
- to run Docker containers or web applications
  We can avoid exposing them directly on port 80 or 443, although this is not the point of this article as we are rather focused on proxying external content.
It is a good way to hide complex, purpose-built components behind a single and apparently unified domain. This is for example how websites like the BBC feel like one website whereas they are in reality composed of dozens and dozens of different websites developed by independent teams.
§How does it work?
An HTTP request directed to our hosting provider will usually look like the following examples:
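As a rough sketch of this default situation, assuming a hypothetical www.example.com host and a hypothetical /var/www/example document root (neither comes from the original setup), requests map straight onto folders on disk:

```apache
# Hypothetical host name and paths, for illustration only.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example"
</VirtualHost>

# GET /         is served from /var/www/example/
# GET /cheese/  is served from /var/www/example/cheese/
# GET /doc/     is served from /var/www/example/doc/
```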
By default we assume the folders /cheese and /doc are contained in the same directory as the root of the website.
Let’s say we have decided to opt for a white-label content provider for one part of the website and moved another part to a static website hosted on GitHub Pages. The above example would evolve into:
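The new mapping could look something like the sketch below. The backend host names (cheese.provider.example and example-user.github.io) are made up for illustration; only the idea of sending each folder to a different origin matters here.

```apache
# Hypothetical mapping, written as comments; backend host names are assumptions.
# GET /         is still served from /var/www/example/
# GET /cheese/  is fetched from https://cheese.provider.example/
# GET /doc/     is fetched from https://example-user.github.io/doc/
```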
It should be clear enough, so let’s dive a bit deeper into how to achieve this.
§Configuring ProxyPass
The configuration of a reverse proxy mainly relies on a declaration of Apache ProxyPass for each path (and its descendants) we would like to host elsewhere:
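A minimal sketch of such declarations follows, assuming mod_proxy and mod_proxy_http are loaded. The host names are hypothetical, and SSLProxyEngine is included only because these made-up backends are reached over https:

```apache
# Hypothetical host names; requires mod_proxy and mod_proxy_http
# (plus mod_ssl, since the backends below use https).
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example"

    # Allow the proxy to open TLS connections to the backends.
    SSLProxyEngine On

    # Keep trailing slashes consistent between the local path and the target URL.
    ProxyPass "/cheese/" "https://cheese.provider.example/"
    ProxyPass "/doc/"    "https://example-user.github.io/doc/"
</VirtualHost>
```

Everything that is not matched by a ProxyPass line keeps being served from the DocumentRoot.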
And that’s pretty much it!
§Configuring Proxy directives
I found the Apache ProxyPass documentation to be quite clear actually (or maybe I spent too much time reading it). We can exclude folders from the proxying or match only specific patterns with ProxyPassMatch. I guess all we need is a use case before starting to use them 😊.
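To give an idea of what such a use case might look like, here is a small sketch with hypothetical paths and host names: one folder is excluded from proxying with the special "!" target, and ProxyPassMatch proxies only the URLs matching a regular expression.

```apache
# Exclusions must be declared before the broader rule they carve a hole into.
# /doc/private/ stays local (hypothetical folder).
ProxyPass "/doc/private/" "!"
ProxyPass "/doc/" "https://example-user.github.io/doc/"

# Proxy only PNG files under /images/; $1 is the captured group.
ProxyPassMatch "^/images/(.*\.png)$" "https://static.provider.example/images/$1"
```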
§ProxyPreserveHost
This setting influences which Host HTTP header our VirtualHost proxy server sends to the backend.
With ProxyPreserveHost On:
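A sketch of the effect, using the same hypothetical host names as assumptions: the Host header received from the visitor is passed unchanged to the backend.

```apache
ProxyPreserveHost On
ProxyPass "/doc/" "https://example-user.github.io/doc/"

# Request from the client:  GET /doc/index.html   Host: www.example.com
# Request to the backend:   GET /doc/index.html   Host: www.example.com
```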
With ProxyPreserveHost Off:
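With the same hypothetical setup, the Host header sent to the backend is taken from the ProxyPass target instead:

```apache
ProxyPreserveHost Off
ProxyPass "/doc/" "https://example-user.github.io/doc/"

# Request from the client:  GET /doc/index.html   Host: www.example.com
# Request to the backend:   GET /doc/index.html   Host: example-user.github.io
```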
So in general we will want to have it set to Off, which is also the default, especially when the backend relies on the Host header to decide which website to serve (as GitHub Pages does).
§ProxyPassReverse
This reverse directive tells Apache how to treat location headers emitted by the backend of the proxy.
In other words, if the backend emits headers like Location and Content-Location, our proxy will rewrite them to match our VirtualHost.
Without ProxyPassReverse:
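As an illustration with hypothetical host names and a made-up redirect, the backend's Location header reaches the visitor untouched:

```apache
ProxyPass "/doc/" "https://example-user.github.io/doc/"

# Backend response:  301  Location: https://example-user.github.io/doc/guide/
# Client receives:   301  Location: https://example-user.github.io/doc/guide/
# The browser follows the redirect and leaves www.example.com.
```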
With ProxyPassReverse:
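With the same hypothetical redirect, and assuming the front end is served over plain HTTP, the header is rewritten so the visitor stays on our domain:

```apache
ProxyPass        "/doc/" "https://example-user.github.io/doc/"
ProxyPassReverse "/doc/" "https://example-user.github.io/doc/"

# Backend response:  301  Location: https://example-user.github.io/doc/guide/
# Client receives:   301  Location: http://www.example.com/doc/guide/
```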
§ProxyPassReverseCookieDomain
This is exactly the same principle as ProxyPassReverse, but it rewrites the domain contained in any Set-Cookie header emitted by the backend.
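A short sketch, with hypothetical domains and a made-up cookie: the first argument is the backend's domain, the second is the public one.

```apache
ProxyPassReverseCookieDomain "cheese.provider.example" "www.example.com"

# Backend response:  Set-Cookie: session=abc123; Domain=cheese.provider.example; Path=/
# Client receives:   Set-Cookie: session=abc123; Domain=www.example.com; Path=/
```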
§SSLProxyEngine
This one enables the proxy module to talk to backends over SSL/TLS. We could definitely proxy HTTP to HTTPS or, better, HTTPS to HTTP, to secure insecure parts of our website. Or to secure them… with a different SSL certificate.
And that’s precisely one advantage of using a reverse proxy in front of GitHub Pages: we get to use our custom domain with our own certificate.
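Put together, such a setup might look like the sketch below. The host names and certificate paths are hypothetical; the point is that SSLEngine covers the connection with visitors while SSLProxyEngine covers the connection to the backend.

```apache
# Hypothetical host names and file paths, for illustration only.
<VirtualHost *:443>
    ServerName www.example.com

    # Our own certificate, presented to visitors.
    SSLEngine On
    SSLCertificateFile    "/etc/ssl/certs/www.example.com.crt"
    SSLCertificateKeyFile "/etc/ssl/private/www.example.com.key"

    # Allow mod_proxy to open TLS connections to the backend.
    SSLProxyEngine On

    ProxyPass        "/doc/" "https://example-user.github.io/doc/"
    ProxyPassReverse "/doc/" "https://example-user.github.io/doc/"
</VirtualHost>
```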
§Reverse Proxy over HTTPS
GitHub serves every GitHub Pages website over HTTPS if it has been created after June 15th 2016. So we will have to make sure our server can talk over SSL with GitHub by enabling mod_ssl.
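In the Apache configuration, loading the module usually boils down to a single line. The module path below is an assumption: it is relative to ServerRoot and varies between distributions.

```apache
# Path to the shared object is an assumption; check the local installation.
LoadModule ssl_module modules/mod_ssl.so
```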
As an alternative, we can also enable mod_ssl globally from the command line (on Debian or Ubuntu this is typically done with a2enmod ssl, followed by an Apache restart).
By doing so, we do not need to write the LoadModule line.
If we cannot enable mod_ssl, well, we are screwed, so the best option is to raise a support ticket with our hosting provider to ask whether there is a way to enable it.
§Conclusion
Is the reverse proxy technique limited to Apache httpd? Of course not:
- Nginx has proxy_pass and a good reverse proxy tutorial;
- Varnish is more powerful, but its documentation is slightly harder to dive into;
- Node.js express-http-proxy package will help mount proxy routes in our Express application.
I personally use it to proxy authentication at the app level and query internal APIs.