I've always considered the reverse proxy capability of Apache httpd as one of the real (hidden) gems of the web server. Of course, httpd has a lot of gems: multiple MPMs; a plethora of content creation and rewriting capabilities; dynamically loadable modules; performance and concurrency easily matching its peers; in-depth Lua, Perl and PHP support; and, of course, the vast number of external, 3rd party modules out there. But, for me, the reverse proxy has always been one of its crowning achievements.
So even though Apache httpd excels at delivering both static and dynamic content, an extremely common use-case is for httpd to be used as a "simple" reverse proxy (aka: "gateway"). In this scenario, httpd acts basically like a switch, accepting requests but handing those requests off to servers on the backend. Those backend servers ("origin servers") are where the content really lives, or is created, but the outside never sees them; never even knows they exist. As far as the Web is concerned, that front end gateway is the sole server.
The advantages of this setup are numerous and obvious: the implementation provides for high-availability, load balancing, failover, reliability, etc... but only if the gateway web server, the reverse proxy itself, has that capability. Fortunately, Apache httpd does. It has all that and more.
Because it is such a common use-case, and because this capability is so vital to the design and architecture of enterprise web infrastructure, including Cloud setups, I've focused a lot on adding features and improvements to httpd's reverse proxy. Working with the other committers on the httpd project, we have not only added load balancing (and it has been there for quite a while), but also included various load balancing methods, with the ability to add your own implementation very easily. With the balancer-manager, the devops admin not only gets a view into the current state of the reverse proxy, but can also dynamically change various reverse proxy parameters on-the-fly, with these changes surviving a server restart. The reverse proxy supports not only HTTP, but also FastCGI, Websockets, AJP and others. And just recently, I finished work on something that has been on my TODO list for a while, and something people have wanted for a while as well: Dynamic Health Checking.
In normal situations, before httpd sends a request to the backend origin server, it checks to see if it is still "alive" and able to handle the request. Now this is great but it would be even better if, in parallel, httpd was also checking if those backend servers were alive or not, independent of requests being passed to them. In other words, not only static health checks but also dynamic checks as well.
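To make the idea of a pluggable load balancing method a bit more concrete, here is a minimal Python sketch of a weighted request-counting scheme. It is an illustration only, not httpd's C implementation: the `Worker` class, its `factor` and `status` fields, and the `pick_worker` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    factor: int      # relative share of requests this worker should receive
    status: int = 0  # running "urgency" score; higher means more overdue


def pick_worker(workers):
    """Pick the next worker using weighted request counting."""
    if not workers:
        raise ValueError("no workers configured")
    total = 0
    best = None
    for w in workers:
        # Every candidate becomes more "urgent" in proportion to its weight.
        w.status += w.factor
        total += w.factor
        if best is None or w.status > best.status:
            best = w
    # The chosen worker pays back the whole round, so it waits its turn again.
    best.status -= total
    return best


# Usage: worker "a" is weighted three times heavier than worker "b".
workers = [Worker("a", 3), Worker("b", 1)]
picks = [pick_worker(workers).name for _ in range(8)]
counts = {name: picks.count(name) for name in ("a", "b")}
```

Over any full cycle, each worker is picked in proportion to its weight, and the picks are interleaved rather than sent in bursts.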
Well, now Apache httpd can do exactly that.
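As a rough illustration of what "dynamic" checking means, the Python sketch below probes backends on a timer, independent of any client requests. It is not how httpd implements the feature; the `HealthChecker` class, the `tcp_probe` helper, and the `fails_to_down` / `passes_to_up` thresholds are assumptions made up for this example.

```python
import socket
import threading


def tcp_probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class HealthChecker:
    """Tracks backend health independently of request traffic."""

    def __init__(self, backends, probe=tcp_probe, fails_to_down=2, passes_to_up=2):
        self.backends = list(backends)  # (host, port) tuples
        self.probe = probe
        self.fails_to_down = fails_to_down
        self.passes_to_up = passes_to_up
        self.healthy = {b: True for b in self.backends}
        # Consecutive probe results that contradict the current state.
        self._streak = {b: 0 for b in self.backends}
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def check_once(self):
        """Probe every backend once and flip state after enough evidence."""
        for b in self.backends:
            ok = self.probe(*b)
            with self._lock:
                if ok == self.healthy[b]:
                    self._streak[b] = 0
                    continue
                self._streak[b] += 1
                needed = self.passes_to_up if ok else self.fails_to_down
                if self._streak[b] >= needed:
                    self.healthy[b] = ok
                    self._streak[b] = 0

    def start(self, interval=5.0):
        """Run check_once() every `interval` seconds in a daemon thread."""
        def loop():
            while not self._stop.wait(interval):
                self.check_once()
        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()

    def usable(self):
        """Backends the balancer may currently send requests to."""
        with self._lock:
            return [b for b in self.backends if self.healthy[b]]
```

Requiring several consecutive failures (or passes) before changing state is a design choice in this sketch: it keeps one dropped probe from taking a healthy backend out of rotation.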
Right now this capability exists just on the trunk branch of the server, but I anticipate it being fast-tracked for backport to the 2.4.x branch. There are also some additional features I'd like to add in, such as better interaction with the balancer-manager, before it is backported. But before too long, the Apache httpd reverse proxy will have this capability and be even better than it is now, and continue to be even better than its peers, whether they are Open Core or commercial or truly Open Source.
Try it out! And if you are interested in helping develop Apache httpd, jump in and join the fun. Unlike other web server projects, contributor and commit privs can be obtained by anyone, not just specially picked people, like "employees" or stuff.
OK, so my house is basically the official address for the Apache Software Foundation. If you check out the address for the ASF on various documents, as well as our D&B listing, it'll refer back to my old homestead. This is because back when we founded the ASF, out of the Apache Group, we needed to list some address on our records, and since I had basically just moved, my address seemed the most stable. And then, after serving initially as EVP and Secretary, those roles kind of cemented my address as the canonical location for the ASF.
Over the years this has resulted in a variety of interesting situations, such as random people coming up to ask for job openings, taking pictures of "Apache HQ" and even numerous visits by the FBI as they investigate various web-related issues. But this also means that I receive a lot of postal mail directed to the ASF as well, and these are just as interesting. There are the numerous requests for help, letters demanding to know why this "Apache stuff" is on their devices and demands that we stop installing our "junk" on their computers, and the occasional "Thank You" letter from someone truly appreciative of what we do.
And very rarely, a letter which calls me to action.
I received such a letter this week.
The letter was from Jamal Reid, an inmate at the State Correctional Institution at Dallas in Pennsylvania. While serving his time, he is trying to learn programming, and the selection of books available to him is woefully out of date. He sees programming, and a career as an Application Developer, as his ticket out from a criminal life. He wants to learn and he wants to be a viable candidate, with the necessary skills, for a job in programming once he completes his sentence. All he asks is that, if possible, we "donate a few books" to help.
Well, I already have a box of books ready to ship out. But I am reaching out to everyone who follows me, or also believes in the power of community within the Open Source eco-system, to also donate what books you can. During this time of year, one of the most cherished stories is A Christmas Carol, a tale of redemption. Let's share the potential of that redemption.
His address is in the letter linked to above, and I strongly encourage you to read it. I also include it here to make it even easier.
State Correctional Institution
1000 Follies Rd
Dallas, PA 18612
This may come off as "sour grapes", or being defensive, but that's not the case. Instead, it's to show how, even now, when we should know better, technological decisions are sometimes still based on FUD or misinformation, and sometimes even due to "viral" marketing. Instead of making a choice based on "what is best for us", decisions are being made on "what is everybody else doing" or, even worse, "what's the new hotness." Sometimes, being the Old One ain't easy, since you are seen as past your prime.
Last week, I presented at ApacheCon NA 2015, and one of my sessions was "What's New In Apache httpd 2.4." The title itself was somewhat ironic, since 2.4 has been out for a coupla years already. But in the presentation, I address this up front: the reason why the talk was (and is) still needed is that people aren't even looking at httpd anymore. And they really should be.
Now don't get me wrong, this isn't about "market share" or anything as trivial as that. Although we want as many people to use Apache projects as possible, we are much more focused on creating quality software, and not so much on being a "leader" in that software space. And we are well aware that there are use cases where httpd is not applicable, or not the best solution, and that's great too.
First of all, there is still this persistent claim that httpd is slow... of course, part of the question is "slow how?" If you are concerned about request/response time, then httpd allows you to optimize for that, and provides the Prefork MPM. Sure, this MPM is more heavyweight, but it also provides the lowest latency and fastest transaction times possible. If instead you are concerned about concurrency, then the Event MPM is for you. The Event MPM in 2.4 has performance equal to other performance-focused Web Servers, such as Apache Traffic Server and NGINX. Even so, whenever you hear people talk about httpd, the first thing they will say is that "Apache is slow, whereas 'foo' was built for speed."
There is also the claim that httpd is too feature-full (or bloated)... Well, I guess we could say "guilty as charged." One of the design goals of httpd is to be a performant, general web-server. So it includes lots of features that one would want, or need, at the web-server layer. So yes, it includes caching, and proxy capability, and in-line content filtering, and authentication/authorization, and TLS/SSL (frontend and reverse-proxy), and in-server language support, etc... But if you don't need any of these capabilities, you simply don't load the modules in; the module design allows you to pick and choose what capabilities you do, or don't want, which means that your httpd instance is specific to your needs. If you want a web-server with all the bells and whistles, great. But if you want just a barebones, fast reverse proxy, you can have that too. Of course, I won't go into the irony of httpd being "blasted" for being bloated, while the hotness-of-the-day is praised for adding features that httpd has had for years. *grin*
Finally, we get to the main point: that Apache httpd is old. After all, we just celebrated our 20th anniversary; httpd is "old and busted", Foo is "new hotness"(*). Well, again, guilty as charged. The Apache httpd project is old, but httpd itself isn't. It is designed for the needs of today's, and tomorrow's, web: Async/event-driven design (if required), dynamic reverse-proxying with advanced load-balancing (and end-to-end TLS/SSL support), run-time dynamic configuration, Lua support (module development and runtime), etc...
You know what else is old? Linux.
Makes you think, huh?
The reverse proxy feature of Apache httpd is an area which I like to particularly hack on. It's always been, imo, one of the killer features of httpd, but it is especially useful in the Cloud. Being able to dynamically allocate, enable/disable and load balance between back-ends is a MAJOR capability lacking in almost all other "generic" (and even "special-purpose") web servers. And the functionality of the reverse proxy in Apache 2.4 is pretty much state-of-the-art.
But there's a problem, which is not unique to Apache itself. In general, load balancers such as Apache make their load balancing determination based on values they calculate about the back-ends. Sure, Apache has a pretty nice set of load-balancing algorithms, but it still implies that the front-end (the reverse proxy) is responsible for predicting the load on the back-end. This is certainly not optimal.
Red Hat's mod_cluster works around this by creating a "unique" communication channel between Apache and JBoss, which allows JBoss to tell Apache some basic and useful loading parameters on the JBoss server itself. This is great, but limited (and hardly universal). What we really need, imo, is a general, universally agreed-upon method of the back-end sending server load data to the front-end. Something that all back-ends can easily generate (and enable/disable, of course) and something all reverse-proxies can easily parse and use. In other words, we need some sort of unofficial standard (which could eventually become a de-facto standard).
Using the assumption that starting simple is best, what I've been looking at is adding a simple HTTP Response header to Apache (via mod_headers) which returns some subset of load parameters, the idea being that when the back-end sends the actual response, part of the meta-data it returns is some measure of its health and/or load. The front-end can then tuck that data away and use it in its load-balancing determination (while, of course, ensuring that the header is not forwarded to the client). Currently, in the trunk version of httpd (but hoping to backport it to 2.4), I have traditional Unix-type load-average and the percentage of how "idle" and "busy" the web-server is. But is that enough info? Or is that too much? How much data should the front-end want or need? Maybe a single agreed-upon value (ala "load average") is best... maybe not. These are the kinds of questions to answer.
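Here is a small Python sketch of what the front-end side of that idea could look like. Everything specific in it is an assumption for illustration: the header name `X-Backend-Load`, its `key=value; key=value` format, and the function names are hypothetical, not an agreed-upon standard or anything httpd ships.

```python
HEADER = "X-Backend-Load"  # hypothetical header name, not a standard


def parse_load(value):
    """Parse 'load=0.42; idle=80; busy=20' into floats, skipping bad fields."""
    metrics = {}
    for part in value.split(";"):
        key, sep, raw = part.strip().partition("=")
        if not sep:
            continue  # no '=' in this field, so ignore it
        try:
            metrics[key.strip().lower()] = float(raw)
        except ValueError:
            continue  # non-numeric value, so ignore it
    return metrics


def absorb_response(backend, headers, table):
    """Record the backend's reported load; return headers safe to forward."""
    forwarded = {}
    for name, value in headers.items():
        if name.lower() == HEADER.lower():
            table[backend] = parse_load(value)  # tuck the data away
        else:
            forwarded[name] = value  # the load header never reaches the client
    return forwarded


def least_busy(table):
    """Pick the backend reporting the lowest busy percentage.

    A backend that reported no 'busy' value is treated as fully busy.
    """
    return min(table, key=lambda b: table[b].get("busy", 100.0))


# Usage: two backends report their load alongside normal responses.
table = {}
client_headers = absorb_response(
    "app1",
    {"Content-Type": "text/html", "X-Backend-Load": "load=0.42; idle=80; busy=20"},
    table,
)
absorb_response("app2", {"X-Backend-Load": "load=1.5; idle=10; busy=90"}, table)
choice = least_busy(table)
```

Ranking on a single field ("busy") is only one option; which value, or combination of values, the front-end should actually rank on is exactly the open question raised above.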
Knowing that Apache httpd powers between 57% and 89% of the web (based on which surveys you use and/or trust), it seems logical to discuss this, and work out the particulars, on the Apache httpd development mailing list. I think the time for something like this "universal web-server load" value is long overdue. If interested, join the fun on email@example.com.
There are some topics which are just too expansive for a simple tweet. This is one of them.
Lately, there have been quite a few posts proclaiming the assumed decrease in the viability and "reason" for the Apache Software Foundation. It's always fashionable to lump all FOSS foundations, and related entities (such as Github), into one combined group and pick out the "winners" and "losers" and those whose stars are rising and those whose glory days are fading away. With the hubbub around DVCS and git/Github, people look at the ASF, and our measured approach to incorporating git into our workflow policies, and declare that since we have not drunk the Kool-Aid, the ASF's days are done.
But all this misses the point about what the ASF is, and who we are, and why we are. I hope this blog post will clarify some things.