Monday, May 7, 2012

Making Apache and Tomcat Work Together

http://olex.openlogic.com/wazi/2012/making-apache-and-tomcat-work-together


Last year I was approached by a Swiss customer looking for help. Starting with an already working Tomcat application, the company wanted to configure the Apache HTTP Server in front of it as a reverse proxy, virtual host broker, and URI translator.
Why use Apache as a front-end server? It might seem more convenient to keep things simple, letting a back-end server like Tomcat serve clients directly – and in some cases it is, depending on the type of back-end server and the specific requirements of the project.
If your main server software is secure and fast enough for your environment and provides all the features you need, you probably don’t need a proxy layer. It usually doesn’t make sense to run Apache in front of lighttpd, for example, especially considering that a layer of proxying adds performance overhead and maintenance work.
Yet in a lot of cases you will be running a specialized back-end server to provide functionality that Apache isn’t capable of. You might deploy Tomcat, for example, so you can run Java Servlets. In these cases it is often beneficial to pay the performance and complexity costs of adding Apache to the mix.
Apache is a secure, flexible, and fast general-purpose HTTP server. It comes with a huge variety of modules that provide functionality for all kinds of special purposes, from LDAP authentication to request compression. Its configuration is straightforward, using the traditional near-flat Unix configuration syntax instead of XML. Specialized back-end servers are usually slower than Apache when it comes to serving static files such as images or office software files.

Choosing a Communication Protocol

There are several ways to make Apache and Tomcat talk to each other, of which three are most popular:
  • mod_proxy is the most basic way to make Apache pass requests on to a back-end server and relay its responses back to the client, an arrangement called reverse proxying. It works not only with Tomcat but with any back-end server that speaks HTTP.
  • mod_proxy_ajp – The Apache JServ Protocol (AJP) is a simple binary packet format that offers greater speed compared to plain HTTP(S) back-end communication. mod_proxy_ajp adds support for the AJP protocol to mod_proxy. It is part of Apache’s default distribution.
  • mod_jk is another way to make Apache talk to a Servlet back-end server, and the recommended way to make Apache talk to Tomcat, but it is more complex to configure than the other two due to its flexibility. mod_jk is maintained by the Tomcat community.
Obsolete modules that are no longer maintained and thus not recommended for Apache-Tomcat communication include mod_webapp (also called warp), mod_jserv, and jk2.
One good way to proceed is to set up everything using mod_proxy first; configuration is simple and all communication is clear-text HTTP. If you need additional speed later, you can add mod_proxy_ajp to the mix within minutes. If you want to take advantage of mod_jk’s performance, stability, and flexibility, you can invest more time to install and configure it properly.
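To give an idea of how small that later switch is, the sketch below shows the same top-level mapping once over HTTP and once over AJP, using the ProxyPass directive covered in the next section. The module paths and the AJP port are assumptions: 8009 is Tomcat's conventional AJP connector port, and the connector has to be enabled on the Tomcat side.

```apache
# Module paths vary by distribution; these are the stock module names.
LoadModule proxy_module      modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule proxy_ajp_module  modules/mod_proxy_ajp.so

# Variant 1: plain HTTP to Tomcat's HTTP connector.
ProxyPass / http://localhost:8080/

# Variant 2: AJP to Tomcat's AJP connector. Use this line instead of the one above.
# Port 8009 is an assumption (Tomcat's conventional AJP port).
# ProxyPass / ajp://localhost:8009/
```

Only the URL scheme and port change between the two variants, which is why starting with HTTP and moving to AJP later costs so little.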

Basic Setup

Let’s assume that Tomcat is running on local port 8080, serving our application, and can be accessed directly. Apache is running on port 80 with all modules in the standard distribution available, including mod_proxy and mod_rewrite.
As long as you stay with standard HTTP proxying using mod_proxy and don’t switch to AJP, any back-end server is fine for initial testing of the proxying chain. But at some point you need to switch to your actual Tomcat setup, since it will have its own requirements that you need to cater to using proxying and rewrite rules.
With that in mind, the basic starting point for our two-way proxying calls for configuration settings that look like this:
ProxyPass / http://localhost:8080/
ProxyPassReverse / http://localhost:8080/
These configuration lines, and the ones that follow, are snippets that go into Apache’s global server context or a virtual host section. The first line above relays all client requests for / and anything below it to a back-end server running on port 8080 using HTTP. The second line handles the reverse direction: it rewrites the Location, Content-Location, and URI headers of back-end redirect responses so that they point at the proxy instead of the back end. Try this first to make sure that everything works as intended on a basic level.
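For orientation, here is a minimal sketch of where those two lines might sit in a virtual host. The host name www.example.com and the module paths are placeholders, not part of the original project; many distributions load the modules through their own mechanism instead of explicit LoadModule lines.

```apache
# Module paths are distribution-specific.
LoadModule proxy_module      modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

<VirtualHost *:80>
    # Placeholder host name.
    ServerName www.example.com

    # We only want reverse proxying, so keep forward proxying disabled
    # (this is also the default).
    ProxyRequests Off

    # Relay everything under / to Tomcat and fix up redirect headers
    # on the way back.
    ProxyPass        / http://localhost:8080/
    ProxyPassReverse / http://localhost:8080/
</VirtualHost>
```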

Path Translation and Exceptions

Let’s add a common requirement. We’re still going to serve our application to the client at the top level /, but the corresponding back-end path will be /application/. Servlet applications running under Tomcat are often set up in this way to provide multiple applications in one server instance. Our setup now looks like this:
ProxyPass / http://localhost:8080/application/
ProxyPassReverse / http://localhost:8080/application/
This small change has larger implications than it might seem at first glance, because your back-end application is not aware of this translation and still hands out its usual paths. In some cases the application can be adjusted to match the new paths; often this isn’t possible, or changing all paths in CSS and JS files would be too cumbersome and error-prone.
In this case we need to find all the spots that need additional proxying rules. Broken images or CSS files show up plainly enough, and Firebug or similar tools can help you spot broken script files. Other issues may come up when AJAX requests are made to a non-existent location, so it’s important to do a lot of testing and verification.
For each spot that needs additional proxying rules you can either add a rule for mod_rewrite to rewrite it or add mod_proxy directives. In our particular project we decided to use the latter approach to keep special proxy mappings apart from other rewrite rules.
Let’s presume that our back-end application refers to some static image files served by a second application running in Tomcat at the location /application2/images/, and another set of static JavaScript files at /application2/js/. We don’t want to expose the whole tree under /application2/ for security reasons, so we use tighter mappings as follows:
ProxyPass /application2/images http://localhost:8080/application2/images
ProxyPassReverse /application2/images http://localhost:8080/application2/images
ProxyPass /application2/js http://localhost:8080/application2/js
ProxyPassReverse /application2/js http://localhost:8080/application2/js
The first matching proxying rule terminates the mod_proxy decision process, so we have to add those special case lines in front of the previous directive block.
Proxying exceptions provide another way to influence the proxy decision process. Suppose that our Apache instance serves a couple of static PDFs at /productpresentation/; we don’t need or want to relay those requests to Tomcat. A special form of the ProxyPass directive lets us specify locations that must not be proxied:
ProxyPass /productpresentation !
Again, this needs to go in front of our other rules.
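Putting the pieces from this section together, the ordering might look like the sketch below. It only rearranges the directives already shown, from most specific to least specific.

```apache
# 1. Exceptions first: Apache serves these PDFs itself.
ProxyPass /productpresentation !

# 2. Narrow mappings next: expose only the images and js trees of application2.
ProxyPass        /application2/images http://localhost:8080/application2/images
ProxyPassReverse /application2/images http://localhost:8080/application2/images
ProxyPass        /application2/js     http://localhost:8080/application2/js
ProxyPassReverse /application2/js     http://localhost:8080/application2/js

# 3. Catch-all last: everything else goes to the main application.
ProxyPass        / http://localhost:8080/application/
ProxyPassReverse / http://localhost:8080/application/
```

If the catch-all came first, it would match every request and the rules below it would never be consulted.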

Cookie Support

Chances are that your application uses cookies to track client sessions across requests. Cookies include a hostname and a path to let the client know when to send the cookie back as part of a request. Due to our path rewrites, however, the cookie paths are no longer valid and need to be rewritten as well. Your application may act in funny ways, such as keeping you stuck on one page, if its cookies are not being sent back, especially if user login tracking is involved.
You can determine the cookies sent by your back-end application in a variety of ways; browser inspection tools or cookie-specific extensions let you see them, but a cURL request will do fine as well.
The cookie path is set by the application. Once more we might not be able to change the back-end application’s behavior, but mod_proxy can be instructed to rewrite these paths as well so the client will send the cookie later as desired:
ProxyPassReverseCookiePath /application/ /
The first argument is a match specification for a cookie path that has to be rewritten. The second argument specifies what the result needs to look like after rewriting.
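To make the effect concrete, here is a sketch of what the rewrite does to a response header. JSESSIONID is Tomcat's default session cookie name; the cookie value and the exact path form are assumptions for illustration, so check the real Set-Cookie header your application sends.

```apache
# Header as sent by Tomcat (illustrative value):
#   Set-Cookie: JSESSIONID=0123456789ABCDEF; Path=/application/
#
# Header as relayed to the client once the directive below is active:
#   Set-Cookie: JSESSIONID=0123456789ABCDEF; Path=/
ProxyPassReverseCookiePath /application/ /
```

If your back end emits the path without the trailing slash (/application), the first argument may need to be written in that form to match.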
You might also have issues with the cookie domain. A similar directive, ProxyPassReverseCookieDomain, adjusts this. Suppose your Apache instance takes requests for the virtual host publicvhost.com, but your back-end application thinks its hostname is backendhostname.local. The directive takes the internal domain first and the public domain second, so the syntax would be:
ProxyPassReverseCookieDomain backendhostname.local publicvhost.com

Error Handling

Another common requirement is a unified look among error pages that doesn’t give away the back-end server software’s name and version. To accomplish this we use the ProxyErrorOverride directive, telling Apache to let its own error-handling mechanism take over:
ProxyErrorOverride On
With this directive we can use the full range of error handling directives. A very basic example for quick testing:
ErrorDocument 404 "Nothing here!"
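Beyond a quick test you will probably want real error pages. The sketch below assumes error override is switched on and that static pages live under a hypothetical /errors/ directory in Apache's document root; the file names are placeholders too. The exclusion matters because a catch-all proxy rule would otherwise likely relay the request for the error page itself to Tomcat.

```apache
# Hypothetical location of static error pages served by Apache itself.
# Like the other exceptions, this has to precede the catch-all "ProxyPass /" rule.
ProxyPass /errors !

# Map back-end error statuses to local pages (placeholder file names).
ErrorDocument 404 /errors/404.html
ErrorDocument 500 /errors/500.html
ErrorDocument 503 /errors/503.html
```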

Enter mod_rewrite

We used a couple of mod_rewrite rules in the project I was working on. Those rules tend to be highly project-specific, so we won’t go through all of them. The most important thing to know is how mod_rewrite interacts with Tomcat proxying.
From the detailed flow diagram of Apache rewrite processing you can see that rewrite rules are applied early in the request handling process. The order of proxying rules among themselves matters, as does the order of rewrite rules among themselves, but it doesn’t matter where we place the proxying and rewrite blocks in relation to each other.
Let’s say we have a rewrite map generated by some other process that needs to be applied. We can implement this using the following rewrite directive block anywhere in the section where our proxying rules reside:
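   RewriteEngine On
   # Built-in lowercasing map; not referenced by the rule below.
   RewriteMap lcmap int:tolower
   # Plain-text lookup table of "key value" pairs, one per line.
   RewriteMap shorturls txt:/var/www/rewrite_map.txt
   # Redirect single-segment paths such as /foo to the target stored under key "/foo".
   RewriteRule ^/([^/]+)$ ${shorturls:/$1} [R,L]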
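The map file itself is a plain text file with one whitespace-separated key and value per line. The entries below are invented for illustration; the actual file in our project was generated by another process. Note that the keys carry a leading slash because the rule looks up /$1.

```apache
# Hypothetical contents of /var/www/rewrite_map.txt:
#
#   /promo    http://www.example.com/campaigns/spring/
#   /jobs     http://www.example.com/company/careers/
#
# With those entries in place, a request for /promo is answered with a
# redirect to the first target above.
RewriteMap  shorturls txt:/var/www/rewrite_map.txt
RewriteRule ^/([^/]+)$ ${shorturls:/$1} [R,L]
```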
What we discussed here barely scratches the surface of what you can do with Apache as a reverse proxy for Tomcat, but we did cover the most important building blocks to start with, which may save you precious hours of initial setup. Once you have this setup running, you can spend more time on the gritty bits of your project-specific setup, such as load balancing or complex address rewriting and redirection.
