Saturday, August 2, 2014

Modify web content with Apache's mod_substitute and mod_headers

http://www.openlogic.com/wazi/bid/351267/modify-web-content-with-apaches-mod_substitute-and-mod_headers

Ever heard of mod_substitute or mod_headers? These two Apache modules give you additional control over the content Apache serves. They can be useful in creating a staging environment, fixing unsupported web applications, or just adding custom HTTP headers for troubleshooting and monitoring.

Modifying content with mod_substitute

Mod_substitute allows you to modify the web content Apache serves to clients after all web code has been executed and all other Apache directives have been processed. It lets you replace strings without touching the web code. It works both on content generated by Apache itself (such as static pages and server-side scripts) and on forwarded content when Apache acts as a proxy.
Mod_substitute is part of the default Apache installation in most Linux distributions, including CentOS and Ubuntu. In CentOS mod_substitute is enabled by default, but in Ubuntu you have to enable it with the command a2enmod substitute. You can confirm that mod_substitute is loaded on your server with the command apachectl -t -D DUMP_MODULES | grep substitute_module. If mod_substitute is enabled, the output includes the module name.
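For reference, the lines that load these modules typically look like the following on CentOS, with paths relative to ServerRoot. On Ubuntu, a2enmod enables an equivalent LoadModule line shipped in /etc/apache2/mods-available/substitute.load. The exact file holding these lines varies by release, so treat this as a sketch. In Apache 2.4 the AddOutputFilterByType directive used below is provided by mod_filter, so that module must be loaded as well.

```apache
# Load mod_substitute (path relative to ServerRoot, as on CentOS)
LoadModule substitute_module modules/mod_substitute.so

# Apache 2.4 only: AddOutputFilterByType is provided by mod_filter
LoadModule filter_module modules/mod_filter.so
```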
Mod_substitute can be used per location context within a given Apache instance. This means you can apply its rules either to a whole site (<Location />) or to a directory and all of its subdirectories (<Location /somedirectory>). You can add mod_substitute directives either to the global Apache context in the main configuration file (/etc/httpd/conf/httpd.conf on CentOS) or to a specific virtual host.
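Here is a minimal sketch of that scoping. It assumes a hypothetical virtual host for test.example.org whose DocumentRoot is /var/www/test, and both values are placeholders. The substitution applies only to /somedirectory and everything beneath it, so pages elsewhere on the site are served unchanged. The two directives inside the Location block are the same ones explained in the example that follows.

```apache
<VirtualHost *:80>
    ServerName test.example.org
    # Placeholder document root for this sketch
    DocumentRoot /var/www/test

    # Rules apply to /somedirectory and all of its subdirectories only
    <Location /somedirectory>
        AddOutputFilterByType SUBSTITUTE text/html
        Substitute s/www\.example\.org/test.example.org/i
    </Location>
</VirtualHost>
```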
Here is an example of a mod_substitute directive that changes a URL for a production site from www.example.org to that of a staging site at test.example.org. You can use this to create a staging environment:

AddOutputFilterByType SUBSTITUTE text/html
Substitute s/www\.example\.org/test.example.org/i

The first directive, AddOutputFilterByType SUBSTITUTE text/html, applies the SUBSTITUTE output filter to responses with the text/html content type. The Substitute directive then uses a regular expression to search for a string (www\.example\.org) and replace it with a different string (test.example.org). The i flag makes the search case-insensitive. Another flag you might find useful, n, makes mod_substitute treat the search pattern as a fixed string instead of a regular expression.
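For comparison, here is a sketch of the same substitution written with the n flag. Because the pattern is treated as a literal string, the dots no longer need escaping. The second Substitute line uses | as the delimiter, which avoids escaping the slashes in full URLs. Quoting the argument is optional here, but it becomes necessary once the pattern or replacement contains spaces.

```apache
AddOutputFilterByType SUBSTITUTE text/html
# n: treat the pattern as a literal string; i: match case-insensitively
Substitute "s/www.example.org/test.example.org/ni"
# Any delimiter character may follow the s; | avoids escaping slashes
Substitute "s|http://www.example.org/|http://test.example.org/|ni"
```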
Mod_substitute can replace text, links, and even HTTP headers. This is useful if you want a staging site for an application such as WordPress or Joomla that is configured with your production FQDN (www.example.org). An application configured to work at one FQDN often does not work properly under another: links break and images fail to load because they still point at the production FQDN. Rewriting that FQDN in the served content with mod_substitute resolves this problem.
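Mod_substitute also filters proxied responses. One possible way to build such a staging front end is to proxy test.example.org to the application and rewrite the production FQDN on the way out. This sketch assumes mod_proxy, mod_proxy_http, and mod_headers are loaded, and backend.example.org is a hypothetical application server. The RequestHeader line asks the backend for an uncompressed response, because mod_substitute cannot find strings inside gzip-compressed content.

```apache
<VirtualHost *:80>
    ServerName test.example.org

    # Ask the backend not to compress, so SUBSTITUTE sees plain HTML
    RequestHeader unset Accept-Encoding

    # backend.example.org is a hypothetical application server
    ProxyPass / http://backend.example.org/
    ProxyPassReverse / http://backend.example.org/

    # Rewrite production links in the proxied HTML
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute s/www\.example\.org/test.example.org/i
</VirtualHost>
```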
Substituting HTTP headers works as described only with Apache 2.2, which is still widely considered the most stable and production-ready version. In Apache 2.4, the code above will not make a substitution in the HTTP headers because of core functionality changes in the Apache web server software. If you want to make substitutions in the headers you have to use mod_headers.

Modifying HTTP headers with mod_headers

Being able to modify HTTP headers lets you control response parameters such as redirect targets and create custom HTTP headers. Mod_headers saves you from having to reconfigure header information in unsupported web applications or in staging environments. It also lets you add custom headers or remove unwanted ones. Custom headers are useful in many situations. For instance, in a multi-node, load-balanced Apache environment they let you identify which node serves each request.
Like mod_substitute, mod_headers comes with the default Apache installations in most Linux distributions. While it's enabled by default in CentOS, you have to enable it in Ubuntu with the command a2enmod headers.
As mentioned above, since Apache 2.4 you can no longer manage HTTP headers with mod_substitute, so you should use mod_headers instead. In Apache 2.2, however, mod_substitute is the better choice for changing headers such as redirects, even though mod_headers is also available there.
To change HTTP redirects from www.example.org to test.example.org with mod_headers, use the following directive in either the global Apache configuration or in a vhost context:
Header edit Location ^http://www\.example\.org http://test.example.org
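Below is a slightly more general sketch, placed in the staging virtual host (that layout is an assumption). The regular expression captures the scheme, so both http:// and https:// redirects are rewritten and keep their original scheme. If a redirect is generated by Apache itself rather than by the application, the Location header may be stored in a different internal table. In that case the always condition may be needed for the edit to take effect.

```apache
<VirtualHost *:80>
    ServerName test.example.org

    # $1 is the captured scheme (http or https)
    Header edit Location ^(https?)://www\.example\.org $1://test.example.org

    # Possible variant for redirects issued by Apache itself:
    # Header always edit Location ^(https?)://www\.example\.org $1://test.example.org
</VirtualHost>
```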
To test whether this directive worked, you could create a PHP file that redirects to the production site, for example:

<?php header("Location: http://www.example.org/"); ?>

When you access this file with your browser, you should be redirected to http://test.example.org instead of http://www.example.org if your mod_headers rule works correctly in Apache 2.4. For Apache 2.2, you should use mod_substitute instead.
With mod_headers you can also remove headers or add custom ones. This can be of use, for example, when you have several load-balanced web servers and for troubleshooting reasons you wish to identify which one is handling your request. You can do that by adding the directive Header add Node Node1 to add a custom HTTP header called Node and identify a given server as Node1. You would set the Node value to Node2 for the second balanced web server, and so on.
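Here is a minimal sketch for a two-node setup, where each line goes into the configuration of a different server. It uses Header set, which replaces any existing Node header. Header add, as used above, appends another one, so set avoids duplicate values if the directive ends up applied twice, for example both globally and in a vhost. The always condition is an optional choice here: it also attaches the header to error responses, which is often when you most want to know which node answered.

```apache
# In the configuration of the first balanced server
Header always set Node Node1

# In the configuration of the second balanced server
Header always set Node Node2
```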
To verify that these header settings work as intended, you can use a browser extension that displays HTTP header information, such as HTTP Headers for Google Chrome. Alternatively, from the Linux command line, you can use lynx with its -head option to view all headers for a page: lynx -head -dump http://example.org.
Mod_headers can also work with dynamic variables. For example, you can add a header that reports how long it takes Apache to serve a request. To do that, use the directive Header set Loadtime "%D". It creates a new header called Loadtime containing the request duration in microseconds, prefixed with D=. You could use this header to monitor web server performance by extending Nagios with a custom check.
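As a sketch of a variant, you can add %t to the same header. It reports the time the request was received, in microseconds since the epoch. Mod_headers prefixes the two values with D= and t= respectively, so a monitoring check would need to strip those prefixes before comparing numbers. The header name Loadtime is kept from the text above.

```apache
# Header value has the form: D=<duration in microseconds> t=<receive time in microseconds since epoch>
Header set Loadtime "%D %t"
```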
However, not all headers can be modified with mod_headers. For example, the Server header, which specifies the HTTP server name (Apache) and its version, cannot be changed this way. If you want to modify that header, you should instead use ModSecurity and its SecServerSignature setting, as described in the Wazi article on how to protect and audit your web server with ModSecurity.
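If you only want to stop advertising the version, the core ServerTokens directive is enough: ServerTokens Prod reduces the header to "Server: Apache". Renaming the server requires ModSecurity. Its documentation states that SecServerSignature works only when ServerTokens is set to Full. Here is a sketch, with MyServer as a placeholder name. Note that ServerTokens is accepted only in the main server configuration, not inside a virtual host.

```apache
# Option 1 (core Apache only): hide version details, giving "Server: Apache"
# ServerTokens Prod

# Option 2 (ModSecurity 2.x): replace the server name entirely
ServerTokens Full
<IfModule security2_module>
    SecServerSignature "MyServer"
</IfModule>
```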
As you can see, mod_substitute and mod_headers are simple to use but powerful and extremely useful. Excellent modules such as these are among the reasons Apache continues to be the preferred web server.
