How to fix poor web design and other annoyances by transparently applying XSLT stylesheets to pages you visit using an nginx forward proxy.
TL;DR: There once was a long preface here explaining how tech sucks and that this article is about using nginx and XSLT to cut the crap out of your favourite websites. You only need to know the latter. Moving on...
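Important note: We're not using nginx as a reverse proxy, we're using it as a "regular" HTTP proxy, "regular" as in the kind you set in your browser's network options. This is called a "forward proxy": your browser sends it the requests for every site you visit, and nginx fetches the pages on your behalf. A reverse proxy, on the other hand, sits in front of specific web servers and answers requests for them. If you google for how to use nginx as a proxy, virtually all hits will tell you how to use it as a reverse proxy. That's not what we're doing here. Now that we've cleared that up, let's get started!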
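You can see the practical difference in the request line. When a browser talks to a forward proxy, it puts the full URL in the request target (HTTP/1.1 calls this the absolute form). When it talks to a site directly, it sends only the path. The small Python sketch below just builds both request heads as strings and sends nothing over the network. The function name is made up for illustration:

```python
def request_head(host: str, path: str, via_proxy: bool) -> str:
    """Build the start of a GET request as a browser would send it (illustrative helper)."""
    # A forward proxy needs the whole URL so it knows where to fetch the page from;
    # an origin server only needs the path.
    target = f"http://{host}{path}" if via_proxy else path
    return f"GET {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"

print(request_head("xkcd.com", "/353/", via_proxy=False).splitlines()[0])  # GET /353/ HTTP/1.1
print(request_head("xkcd.com", "/353/", via_proxy=True).splitlines()[0])   # GET http://xkcd.com/353/ HTTP/1.1
```

In both cases the browser also sends a Host header, which is what nginx's $http_host variable picks up in the proxy configuration.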
First things first: we need to install nginx. If you're on a sane operating system, you could just use its package manager. The catch is that, as of this writing, nginx's XSLT module can only handle well-formed XML as input. Sadly, well-formed XML is rather scarce on the net, because web browsers are lenient about enforcing XML rules. However, nginx's XSLT module is built on libxml2 and libxslt, and libxml2 includes a parser that can handle typical HTML as input as well.
Fortunately, there is a patchset for this. It has been discussed on the nginx mailing list, so with a bit of luck it'll be included in the main release sometime soon. For now, we can just grab the nginx fork with this patch applied. Note that you only need to do this if you care about XSLT filtering and want to be able to filter HTML and shoddy XHTML. If you know you'll only be filtering proper XML, you can just grab a binary package or the regular distribution tarball.
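If you're not sure whether the sites you care about serve well-formed XML, you can get a rough answer before compiling anything. The minimal sketch below uses Python's standard xml.etree.ElementTree parser, which I chose for illustration; nginx itself uses libxml2. The helper name is hypothetical. The standard parser doesn't fetch external DTDs, so XHTML that relies on DTD-defined entities such as &nbsp; may be reported as not well-formed:

```python
from typing import Union
import xml.etree.ElementTree as ET

def is_well_formed_xml(markup: Union[str, bytes]) -> bool:
    """Return True if markup parses as well-formed XML (a rough, hypothetical check)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

proper_xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>ok<br /></p></body></html>'
typical_html = "<html><body><p>unclosed paragraph<br></body></html>"
print(is_well_formed_xml(proper_xhtml))  # True
print(is_well_formed_xml(typical_html))  # False: <br> and <p> are never closed
```

For a real page, save its source and pass in the file's raw bytes (open(path, "rb").read()). That lets the parser honour the encoding the page declares.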
Assuming you will compile from source, open a terminal, cd to the appropriate directory and use the typical compilation procedure:
$ ./configure --prefix=$HOME --with-http_xslt_module
$ make
$ make install
You may need to use su or sudo for the last step, depending on which --prefix you chose. Nginx will work fine from your home directory, however.
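If you chose to use a binary package, follow the installation instructions that come with it. Make sure the package was built with the XSLT module: nginx -V prints the configure arguments, which should include --with-http_xslt_module.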
Open the nginx configuration file in a text editor. If you went with the $HOME directory above, then you should edit the file $HOME/conf/nginx.conf. If you used a binary installer from your distribution, it's probably in /etc/nginx/nginx.conf or similar.
$ vim $HOME/conf/nginx.conf
Substitute your favourite text editor for vim. Also, as above, you may need to use sudo if nginx is installed anywhere but your home directory.
You need to either replace the "normal" http server section or create a new one, depending on whether you'd also like to use nginx as a regular web server. The following snippet, which should go in your http section, will create a normal web proxy on port 8080:
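server {
    listen 8080;
    location / {
        # proxy_pass uses variables, so nginx resolves the target host at request time
        resolver 8.8.8.8;
        # forward the request to whatever host the browser asked for
        proxy_pass http://$http_host$uri$is_args$args;
    }
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root html;
    }
}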
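The proxy_pass line rebuilds the URL the browser asked for from nginx variables:

- $http_host is the Host header.
- $uri is the path.
- $args is the query string.
- $is_args expands to "?" only when there is a query string.

The hypothetical Python helper below mimics that string composition. It ignores nginx's own URI normalisation:

```python
def upstream_url(http_host: str, uri: str, args: str) -> str:
    """Mimic http://$http_host$uri$is_args$args from the config (illustrative only)."""
    is_args = "?" if args else ""  # $is_args is "?" if the request has arguments, else empty
    return f"http://{http_host}{uri}{is_args}{args}"

print(upstream_url("google.com", "/search", "q=nginx"))  # http://google.com/search?q=nginx
print(upstream_url("xkcd.com", "/353/", ""))              # http://xkcd.com/353/
```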
For a plain http proxy, your configuration file should now look something like this:
worker_processes 1;
events {
    worker_connections 1024;
}
http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;
    gzip on;
    server {
        listen 8080;
        location / {
            resolver 8.8.8.8;
            proxy_pass http://$http_host$uri$is_args$args;
        }
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
    }
}
Fairly simple so far, right? Save the file and start nginx ($HOME/sbin/nginx if you used the prefix above). Then check that it's working by setting your browser's HTTP proxy to localhost, port 8080, and surfing to, e.g., google.com. You can confirm that your browser was using nginx by opening the logs/access.log file, which should list the requests your browser sent. If it doesn't work, check logs/error.log to see why. Make sure it's working before heading to the next part.
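If you'd rather test from a script than by clicking around, you can point Python's standard urllib at the proxy the same way you point your browser. The minimal sketch below assumes nginx is listening on localhost:8080 as configured above. The helper name is made up:

```python
import urllib.request

PROXY = "http://localhost:8080"  # assumed: the listen port from the server section above

def make_proxied_opener(proxy: str = PROXY) -> urllib.request.OpenerDirector:
    """Build an opener that sends plain-HTTP requests through the forward proxy."""
    # Only the "http" scheme is mapped, since this setup only proxies plain HTTP.
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)
```

With nginx running, make_proxied_opener().open("http://xkcd.com/", timeout=10) should return the page, and the request should show up in logs/access.log just like your browser's requests do.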
Now that your shiny new proxy is working, let's get to the fun part! (Cue evil laughter right about here)
To apply your own XSLT stylesheet to pages on your favourite website, you proceed very much as you would when adding a new virtual host to nginx running as a web server: all you need to do is define a new server section with a specific host name. I'm going to use xkcd.com as an example here, because it's one of my favourite web comics:
server {
    listen 8080;
    server_name xkcd.com www.xkcd.com;
    # map /.css/<name> to css/<name>.css in the local directory
    location /.css {
        alias /path/to/local/directory/css;
        try_files $uri.css =404;
    }
    # everything else on xkcd.com is proxied unchanged
    location / {
        resolver 8.8.8.8;
        proxy_set_header Accept-Encoding "";
        proxy_pass http://xkcd.com$uri$is_args$args;
    }
    # the front page and numbered comic pages go through the XSLT stylesheet
    location ~ ^/([0-9]+/)?$ {
        resolver 8.8.8.8;
        proxy_set_header Accept-Encoding "";
        proxy_pass http://xkcd.com$uri$is_args$args;
        xslt_types application/xhtml+xml text/html;
        xslt_stylesheet /path/to/local/directory/xslt/xkcd.com.xslt;
    }
}
This new server section goes right next to your previous server section, still in the http section. Make sure the listen port in both match up.
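It helps to know how nginx decides which server section handles a request. It compares the Host header against each server_name on that port. If nothing matches, it falls back to the default server, which is the first server section listed for that port unless you mark another one with default_server. The Python model below is deliberately simplified: it handles exact names only, while real nginx also handles wildcard and regex names. The labels for the two sections in this article are hypothetical:

```python
# (label, server_names) in the order the server sections appear in nginx.conf
SERVERS = [
    ("plain proxy", []),                            # first section on :8080, so it's the default
    ("xkcd filter", ["xkcd.com", "www.xkcd.com"]),
]

def pick_server(host_header: str) -> str:
    """Return the label of the server section that would handle this Host header."""
    host = host_header.split(":", 1)[0].lower()  # drop any port; host names are case-insensitive
    for label, names in SERVERS:
        if host in names:
            return label
    return SERVERS[0][0]  # no server_name matched: use the default server

for host in ["xkcd.com", "WWW.xkcd.com:80", "google.com"]:
    print(host, "->", pick_server(host))
```

This is also why the order matters. If the xkcd.com section came first, it would become the default server. Requests for every other site would then be proxied to xkcd.com, because its proxy_pass is hard-coded.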
In this example, we're doing two things. First, we inject a /.css directory into the site, which we serve from a local directory. Second, we apply an XSLT stylesheet to all the URLs on xkcd.com that contain comics, i.e. the front page and the numbered comic pages. The proxy_set_header Accept-Encoding ""; lines ask xkcd.com for an uncompressed response, since the XSLT filter can't parse a gzip-compressed body. xkcd.com is, fortunately, serving proper XHTML. However, it does so with the text/html MIME type, which is why we use xslt_types to add text/html to the MIME types to filter. If it were serving plain HTML or broken XHTML, we would use the following instruction right before xslt_types:
xslt_html_parser on;
This would require the patched version of nginx we compiled from scratch earlier, and this is why I recommended using that version.
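The regex location ^/([0-9]+/)?$ decides which requests get filtered. It matches the front page and numbered comic pages. Everything else, such as archives, images and our injected /.css files, goes through the other two locations. nginx checks regex locations after finding the longest matching prefix location and prefers the first regex that matches. That's why / and /353/ end up in the XSLT location even though location / matches them too. Below is a quick Python check of the pattern; Python's re and nginx's PCRE behave the same on this simple expression. The helper name is hypothetical:

```python
import re

# The same pattern as the nginx location ~ ^/([0-9]+/)?$
COMIC_PAGE = re.compile(r"^/([0-9]+/)?$")

def goes_through_xslt(uri: str) -> bool:
    """True if nginx would route this URI to the XSLT location (illustrative helper)."""
    return COMIC_PAGE.search(uri) is not None

for uri in ["/", "/353/", "/353", "/archive/", "/.css/xkcd.com"]:
    print(f"{uri:16} {'XSLT' if goes_through_xslt(uri) else 'unfiltered'}")
```

Note that /353 without the trailing slash doesn't match, so it would be proxied unfiltered.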
As an example, we could use the following xkcd.com.xslt (in the xslt/ directory):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    version="1.0"
    exclude-result-prefixes="xhtml">
  <xsl:output method="xml" version="1.0" encoding="UTF-8"
      doctype-public="-//W3C//DTD XHTML 1.1//EN"
      doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"
      indent="no"
      media-type="application/xhtml+xml" />
  <xsl:strip-space elements="*" />
  <xsl:preserve-space elements="xhtml:pre" />
  <!-- Identity transform: copy every attribute and node unless a more specific template matches -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
  <!-- Empty templates: whatever they match is left out of the output -->
  <xsl:template match="xhtml:form">
  </xsl:template>
  <xsl:template match="xhtml:div[@id='bottom']">
  </xsl:template>
  <xsl:template match="xhtml:div[@id='topContainer']">
  </xsl:template>
  <xsl:template match="xhtml:div[@id='transcript']">
  </xsl:template>
  <xsl:template match="xhtml:meta[@http-equiv]">
  </xsl:template>
  <xsl:template match="xhtml:link[@rel='shortcut icon']">
  </xsl:template>
  <xsl:template match="xhtml:link[@rel='icon']">
  </xsl:template>
  <xsl:template match="xhtml:link[@rel='apple-touch-icon-precomposed']">
  </xsl:template>
  <!-- Replace the site's stylesheet with our local one, served via /.css -->
  <xsl:template match="xhtml:link[@rel='stylesheet']">
    <link rel='stylesheet' type='text/css' href='/.css/xkcd.com' />
  </xsl:template>
  <xsl:template match="xhtml:link[@type='application/atom+xml']">
  </xsl:template>
  <xsl:template match="xhtml:ul[@class='comicNav'][position()>1]">
  </xsl:template>
  <xsl:template match="xhtml:br">
  </xsl:template>
  <xsl:template match="text()[preceding-sibling::xhtml:br]">
  </xsl:template>
  <xsl:template match="comment()">
  </xsl:template>
  <xsl:template match="xhtml:script">
  </xsl:template>
  <!-- Turn the comic title into a proper heading -->
  <xsl:template match="xhtml:div[@id='ctitle']">
    <h1><xsl:apply-templates select="node()" /></h1>
  </xsl:template>
  <!-- Unwrap the container, keeping its contents -->
  <xsl:template match="xhtml:div[@id='middleContainer']">
    <xsl:apply-templates select="node()" />
  </xsl:template>
  <xsl:template match="xhtml:a/@accesskey">
  </xsl:template>
  <!-- Rebuild lists without their original attributes -->
  <xsl:template match="xhtml:ul">
    <ul><xsl:apply-templates select="node()" /></ul>
  </xsl:template>
  <!-- Rebuild the comic as an image with its title text in a blockquote -->
  <xsl:template match="xhtml:div[@id='comic']">
    <xsl:choose>
      <xsl:when test="xhtml:noscript">
        <p>
          <a><xsl:copy-of select="xhtml:noscript/@href"/>
            <img><xsl:copy-of select="xhtml:noscript/xhtml:img/@src"/>
              <xsl:copy-of select="xhtml:noscript/xhtml:img/@alt"/></img>
          </a>
          <blockquote>
            <p><xsl:value-of select="string(xhtml:noscript/xhtml:img/@title)"/></p>
          </blockquote>
        </p>
      </xsl:when>
      <xsl:when test="xhtml:a">
        <p>
          <a><xsl:copy-of select="xhtml:a/@href"/>
            <img><xsl:copy-of select="xhtml:a/xhtml:img/@src"/>
              <xsl:copy-of select="xhtml:a/xhtml:img/@alt"/></img>
          </a>
          <blockquote>
            <p><xsl:value-of select="string(xhtml:a/xhtml:img/@title)"/></p>
          </blockquote>
        </p>
      </xsl:when>
      <xsl:otherwise>
        <p>
          <img><xsl:copy-of select="xhtml:img/@src"/>
            <xsl:copy-of select="xhtml:img/@alt"/></img>
          <blockquote><p>
            <xsl:value-of select="string(xhtml:img/@title)"/>
          </p></blockquote>
        </p>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
... and the following xkcd.com.css (in the css/ directory):
html
{
    background: #96A8C8;
    font-size: 16px;
    font-variant: small-caps;
    font-family: Lucida, Helvetica, sans-serif;
    font-weight: 500;
    text-decoration: none;
    padding: 10px;
}
body
{
    width: 780px;
    margin: 0 auto;
    background: white;
    border-style: solid;
    border-width: 1.5px;
    border-color: #071419;
    border-radius: 12px;
    padding: 10px 0;
    text-align: center;
}
ul
{
    padding: 0;
    list-style-type: none;
}
ul li
{
    display: inline;
}
ul li a
{
    background-color: #6E7B91;
    color: #FFF;
    border: 1.5px solid #333;
    font-size: 16px;
    font-weight: 600;
    padding: 1.5px 12px;
    margin: 0 4px;
    text-decoration: none;
    border-radius: 3px;
    box-shadow: 0 0 5px 0 gray;
}
ul a:hover
{
    background-color: #FFF;
    color: #6E7B91;
    box-shadow: none;
}
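The XSLT stylesheet above is built on the classic identity transform. The @*|node() template copies everything, and each empty template (xhtml:form, xhtml:script, xhtml:div[@id='bottom'] and so on) overrides it for the nodes it matches, so those nodes drop out of the output. The code below is a rough Python analogue of that "copy everything except" pattern using ElementTree. It only illustrates the idea; nginx does the real work with libxslt. The helper names and the choice of which templates to mirror are hypothetical:

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

def should_drop(el: ET.Element) -> bool:
    """Stand-in for a few of the stylesheet's empty templates (hypothetical selection)."""
    if el.tag in (XHTML + "script", XHTML + "form"):
        return True
    return el.tag == XHTML + "div" and el.get("id") == "bottom"

def strip(parent: ET.Element) -> None:
    """Keep everything by default, like the identity template, but drop matched elements."""
    previous = None  # last child we kept
    for child in list(parent):
        if should_drop(child):
            # Text after a dropped element is a separate node in XSLT and survives there,
            # but ElementTree stores it on the element (.tail), so move it before removing.
            tail = child.tail or ""
            if previous is None:
                parent.text = (parent.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
        else:
            strip(child)
            previous = child

ET.register_namespace("", "http://www.w3.org/1999/xhtml")
page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p>comic</p><script>track()</script>kept text<div id="bottom">ads</div>'
    '</body></html>'
)
strip(page)
print(ET.tostring(page, encoding="unicode"))
```

The empty-template approach scales nicely: removing another annoyance means adding one more match, not rewriting the copy logic.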
Now that you have nginx all set up, save the files and restart nginx (or reload it with $HOME/sbin/nginx -s reload), then visit xkcd.com. The example stylesheet should transform the page much like this:
[Screenshot: xkcd.com as served by the site] ... becomes ... [Screenshot: xkcd.com after the XSLT transformation]
Now, admittedly, xkcd.com doesn't have any advertisements that need removing, and there is a mobile version of xkcd.com that looks pretty much the same as the output of our XSLT. This is just an example to demonstrate the technique. Remember: XSLT is Turing-complete, so in principle you can perform any computable transformation you can think of.
You should now be able to apply a custom XSLT stylesheet to any website you like. Congratulations! Now you can actually do something about it if the web designer of a page you need to use regularly should be taken behind the barn to meet a nice, friendly neighbourhood bullet... *cough*
About the only limitation at this point is that you can't transform HTTPS websites. That traffic is encrypted end to end between your browser and the site, so the proxy can't read or rewrite it. Fortunately, not too many of the annoying sites use HTTPS, mwahaha.
I may put up a demonstration web proxy on my server, I'm still undecided about that. Either way, enjoy and spread the news! And if you happen to write some awesome stylesheets for some annoying pages, do tell and I may put 'em up right here.
Last Modified: 2012-06-13T16:03:00Z
Written by Maggie Danger (EffinMaggie on Twitter).