Saturday, February 16, 2013

Using Node.js to write RESTful web services

http://www.openlogic.com/wazi/bid/267042/using-nodejs-to-write-restful-web-services


If you want to build lightweight, fast REST APIs, provide server support for AJAX-intensive Rich Internet Applications such as Gmail, or develop scalable systems that can support thousands of concurrent users with ease, give Node.js a try. Node, as its developers often call it for short, is a server-side, JavaScript-based web server built on Google's high-performance V8 engine and geared toward receiving and answering HTTP requests. You do all your Node coding in JavaScript, meaning that if you are already proficient with that language for client-side programming, you'll feel at home using it for your server-side development. Of course, Node goes beyond client-side JavaScript capabilities; it also provides database access, file access, process forking, and everything else necessary for web serving duties.
Node is a single-threaded, event-driven server; requests are handled one at a time by a single common process, but any system calls you make run in the background, in parallel, firing events when done, so they don't block the server's ability to process other requests. This is similar to the nginx server model, and different from, say, Apache's multi-process model, in which each request is serviced by a different process. With multiple processes it doesn't matter if one request takes a long time, because it won't block the others, but the high RAM requirements of each process restrict the maximum number of possible connections. In Node, you have to be careful to write code that runs asynchronously, so that while a request is being handled, other requests can come in. With this approach, Node can handle thousands of simultaneous connections without commensurate RAM requirements, because the waiting events require little extra memory.

Our goals

One of the preferred use cases for Node is web services, so we'll revisit the sample application from my Apache and RESTful Web Services article, but instead of working with Apache, PHP, and XML, we'll go with Node, JavaScript, and JSON (JavaScript Object Notation). We'll provide the same countries/regions/cities web services, but we'll serve JSON instead of XML – though you can certainly produce XML output; see the "Enhancements" sidebar. We'll also point out some advantages and disadvantages of working this way, so you can make an informed decision about whether you want to use Node for your own work.
Callbacks, continuations, and closures
In Node.js, you create callback functions and pass them as parameters to other functions, which execute them when they finish their own processing. You'll probably have to brush up on closures, especially because you use them to implement Node's continuation-passing style. There are many ways of writing continuations; I wrote the countries service in two different ways to show alternatives. Also note the getCountries() method in the countries service, which illustrates another problem: wanting to use the original value of a variable. Closures give you access to the variable itself, not to its value at the moment the closure was created, so when you access a loop variable from inside a callback, you always get its final value. You can use a partial application technique to handle this; check the code to see how.
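The following snippet is not from the services code; it's a minimal sketch of the loop-variable problem and of the partial application fix, using setTimeout simply to stand in for an asynchronous call.

// Problem: every callback closes over the same "i", so by the time the
// timers fire, "i" already holds its final value (3) in all of them.
for (var i = 0; i < 3; i++) {
  setTimeout(function() { console.log("wrong:", i); }, 10);   // prints 3, 3, 3
}

// Fix: partially apply the current value of the loop variable; makeLogger()
// returns a callback with that value frozen inside its own closure.
function makeLogger(value) {
  return function() { console.log("right:", value); };        // prints 0, 1, 2
}
for (var j = 0; j < 3; j++) {
  setTimeout(makeLogger(j), 10);
}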
Before we start, let's look at how Node processes a request:
  1. A client requests a service.
  2. Node receives the request.
  3. Routing code (which you write) analyzes the request and calls an appropriate function.
  4. The function does some work, possibly calling MySQL to get some data.
  5. While MySQL does its thing, in parallel Node services other waiting requests, if present.
  6. When MySQL is done, a callback function is inserted in the Node queue, to be processed when possible.
  7. At some time in the future, Node executes that callback, which does some extra work and sends results back to the client.
It is said that in Node, "everything runs in parallel except your code." Most API calls are asynchronous, meaning that when you call an API, you provide a callback so that when the routine is done, it will call your code to continue processing. Node isn't really well suited for CPU-intensive chores (though there are some coding workarounds you could use), because they would block all other requests from being processed: a "DIY-DOS" (Do It Yourself Denial of Service) attack! Finally, you have to work at a lower level than with, say, Apache and PHP; you must deal with HTTP on your own.
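As a tiny illustration of this style (not related to the services code, and using a hypothetical file name), reading a file with the standard fs module means handing it a callback instead of waiting for a return value:

var fs = require("fs");

// The call returns immediately; Node reads the file in the background
// and runs the callback when the contents (or an error) are available.
fs.readFile("some_file.txt", "utf8", function(err, contents) {
  if (err) {
    console.error("Could not read the file:", err);
    return;
  }
  console.log("File contents:", contents);
});

console.log("This line runs before the file contents are printed.");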

Programming the server

To get started with Node, we first have to decide how to organize the project. Node provides modules as a convenient way of packaging code in independent parcels. Our router_rest.js module contains the main code of our server, routing calls to the countries_service.js, regions_service.js, and cities_service.js modules. In addition, we have an auxiliar.js module with functions common to all services. You should check out all the modules from my GitHub repository and examine them as we go along.
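As a reference for the pseudocode that follows, a service module only has to export the functions the router will call. The shape below is a guess for illustration, not the actual contents of countries_service.js; the get/del/putPost names are assumptions, not the real API of the modules in the repository.

// countries_service.js (hypothetical shape):
exports.get = function(pathParts, params, callback) {
  // query MySQL, then call callback(status, statusText, headers, result)
};
exports.del = function(pathParts, params, callback) {
  // DELETE logic
};
exports.putPost = function(method, pathParts, params, callback) {
  // shared PUT/POST logic
};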
The main server code resembles the following pseudocode:
// Define variables to access each service. Note the usage of
// node.js modules via the "require(...)" calls.

var services = {
  countries: require("./countries_service"),
  regions: require("./regions_service"),
  cities: require("./cities_service")
}

function sendResults(res, params, status, statusText, headers, result) {
  // Send a HTTP response, with the given "status" code and a
  // "statusText" explanation, a "Connection:close" header so the
  // connection won't be kept alive, a "Content-Type:application/json"
  // header to specify the result type, some optional additional headers,
  // and a result (if any) in JSON or JSONP format.
}

function routeCall(req, res, body) {
  // Analyze the URL and the request body (if present) and the
  // pathname itself to get query parameters. Check whether the result
  // will be in JSON or JSONP format by seeing if a "callback" parameter
  // is given.
  //
  // If a "_method" parameter was included, let it override the
  // actual HTTP method.
  //
  // Analyze the URL to decide what service to call, and if
  // present in the "services" array above, dispatch the call to
  // it, with a callback pointing to the "sendResults" function;
  // otherwise, send back a 404 error.
}

process.on('uncaughtException', function(err) {
  // Provide for unexpected exceptions, so the server won't crash
  // Usually, report the error.
})

require("http").createServer(function (req, res) {
  // This is the main function, that actually acts as a server
  // We'll listen on port 8888, just not to interfere with Apache
  //
  // For PUT/POST methods, wait until the complete request body
  // has been read, and then call routeCall.
  // For GET/DELETE methods, call routeCall directly
  }
}).listen(8888)
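To make the routing step more concrete, here is a bare-bones sketch of what routeCall could look like. It is not the actual router_rest.js code: JSONP and the _method override are left out, it relies on the services object and sendResults function from the pseudocode above, and the service functions (get, del, putPost) are the hypothetical ones sketched earlier.

var url = require("url");

function routeCall(req, res, body) {
  // Parse the URL; the "true" argument also parses the query string.
  var parsed = url.parse(req.url, true);
  var parts = parsed.pathname.split("/");   // "/countries/UY" -> ["", "countries", "UY"]
  var params = parsed.query;
  params.body = body;

  var service = services[parts[1]];
  if (!service) {
    // Unknown service: answer with a 404 right away.
    return sendResults(res, params, 404, "NOT FOUND", {}, null);
  }

  // The continuation each service calls when its work is done.
  var done = function(status, statusText, headers, result) {
    sendResults(res, params, status, statusText, headers || {}, result);
  };

  if (req.method === "GET") {
    service.get(parts.slice(2), params, done);
  } else if (req.method === "DELETE") {
    service.del(parts.slice(2), params, done);
  } else {
    service.putPost(req.method, parts.slice(2), params, done);
  }
}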
Getting the PUT/POST body requires coding that PHP programmers, used to the $_POST and $_REQUEST variables, will never have had to write. You have to provide a callback for the "data" event, which fires every time the server gets another chunk of the HTTP request body, and another callback for the "end" event, which fires when you have all the body contents and can move on to routing the service request. For bodyless GET/DELETE methods, you can route the request directly. The code below summarizes this:
if (req.method==="PUT" || req.method==="POST") {
  var body = "";
  req.on("data", function(data){ body += data; });
  req.on("end", function(){ return routeCall(req, res, body); });
} else
  routeCall(req, res, "");
Sending results is just a matter of using the writeHead() and end() methods; the actual code also provides for extra headers, and for outputting JSONP, but let's leave those details out, as they don't add much.
var headers = {};
headers["Connection"] = "close";
headers["Content-Type"] = "application/json";
res.writeHead(status, statusText, headers);
res.end(JSON.stringify(result));
Finally, error handling is a must; without it, any error would crash your server along with all its connections. A domain module is in development to ease error handling, but it isn't stable yet. When you encounter an error, at a minimum you should provide some logging. Even though the client won't get any answer (and will eventually get a timeout error), at least the server will keep running.
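A minimal version of the uncaughtException handler from the pseudocode above could be as simple as this; logging to the console is just a stand-in for whatever logging mechanism you actually use.

process.on("uncaughtException", function(err) {
  // Log the error (and its stack trace, if available) for later diagnosis;
  // the failed request gets no answer, but the server keeps running.
  console.error("Unexpected error:", err.stack || err);
});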

Programming the services

Each service will have three functions: one for GET calls, another for DELETE, and a third for both PUT and POST requests, which share almost the same logic. All the services are quite similar: fire off some MySQL statements, then process the results. You access MySQL tables in continuation style to avoid blocking other requests; you specify the statement to be executed and provide a callback to be run when the result of the query is ready and your processing can continue. For example, to get all countries matching some conditions, I wrote something like the following:
aux.conn.query("SELECT * FROM countries WHERE " + someCondition,
  function (err, rows) {
    if (err)
      // there was an error; report it to the client

    if (rows.length == 0) // no data found?
      return callback(404, "NOT FOUND");

    var result = [];
    for (var i in rows)
      // build up result[i] with data from rows[i]

    return callback(200, "OK", {}, result);
  });
Enhancements
You can achieve a lot with plain Node.js, but many available modules can help you write more concise code. For example, you might build your XML output string by string, but you're probably better off with a module such as xmlbuilder. Other modules, such as xml2js and xml-stream, can help accept XML input.
You can also simplify routing tasks with modules such as restify, which is more targeted to REST web services than the more commonly known express module, or connect, upon which express is built, or journey.
In the area of security, many modules can handle authentication, such as Passport, everyauth, and connect-auth. Cryptography is available with the Crypto module, standard with Node.
For scalability, cluster provides a way to work with multicore machines, with several features such as graceful or hard shutdowns and restarting the server, and it automatically spawns one worker per CPU for optimal performance. Note, however, that this module is still marked experimental, so plan for possible changes.
MySQL queries return an array with all rows; there are also options to process the results one row at a time. If no countries are found, a 404 status code is appropriate; otherwise, loop through the rows to build up a "result" array to send back to the client. If an error occurs, you have to handle it and send a 500 status code to the client.
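For instance, recent versions of the node-mysql driver offer a streaming interface that fires an event per row; the sketch below is not part of the services code and assumes that interface is available on the same aux.conn connection.

var query = aux.conn.query("SELECT * FROM countries");
var result = [];

query
  .on("error", function(err) {
    // handle the error and answer with a 500 status
  })
  .on("result", function(row) {
    // process one row at a time instead of collecting them all at once
    result.push(row);
  })
  .on("end", function() {
    // every row has been seen; answer the client here
  });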
Things get more complicated if you want to return all regions of a country; see the source code for that. Basically, you have to execute several MySQL queries, all of which update the "result" object in their corresponding callbacks. In order to know when you are done (remember, callbacks fire asynchronously, at some point in the future, and their order of execution cannot be guaranteed), you have to count how many queries have finished and send the answer only when the count reaches the expected number. You can see this design pattern in the countries and regions services, and in the sketch that follows.
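Here is a bare-bones version of that counting pattern. It is not the actual services code: the queries, the column names, and the hard-coded 'UY' country are made up for illustration, and it reuses the aux.conn connection and callback continuation from the earlier snippet.

var queries = [
  "SELECT COUNT(*) AS cnt FROM regions WHERE countryCode = 'UY'",
  "SELECT COUNT(*) AS cnt FROM cities WHERE countryCode = 'UY'"
];
var result = {};
var pending = queries.length;

queries.forEach(function(sql, i) {
  aux.conn.query(sql, function(err, rows) {
    if (err) {
      // A real implementation would also guard against answering twice.
      return callback(500, "INTERNAL SERVER ERROR");
    }
    result["query" + i] = rows[0].cnt;
    pending--;                       // one more query is done...
    if (pending === 0) {             // ...answer only when all have finished
      return callback(200, "OK", {}, result);
    }
  });
});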
DELETE requests are simple: try to delete the appropriate records and see whether an error is produced. (If your data model doesn't include foreign keys, you have to check other tables before attempting to delete anything.) If you get no error, the request was successful, so send back a 204 status; otherwise, analyze what happened and provide a reasonable result code and status to the client: either "404 NOT FOUND" or, if foreign keys disallow the request, "403 CANNOT DELETE REGIONS WITH CITIES".
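A simplified sketch of the regions DELETE logic might look like this; the affectedRows check, the parameter names, and the way the error is interpreted are assumptions, not the actual code.

aux.conn.query("DELETE FROM regions WHERE country = ? AND region = ?",
  [countryCode, regionCode],
  function(err, result) {
    if (err) {
      // The real code would inspect the error; here we assume a foreign
      // key violation, meaning the region still has cities.
      return callback(403, "CANNOT DELETE REGIONS WITH CITIES");
    }
    if (result.affectedRows === 0) {
      // Nothing matched the given keys, so there was nothing to delete.
      return callback(404, "NOT FOUND");
    }
    return callback(204, "NO CONTENT");
  });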
PUT and POST requests can be more complex, because they can involve INSERT or UPDATE commands. Also, in some cases the primary key won't be given, and will instead be determined by MySQL; use "result.insertId" to get the created key. If you insert a record, you should return "201 CREATED" plus an extra "Location" header with the RESTful URI of the newly created row. On error, return a 409 status with an appropriate explanation. The actual code is just an INSERT attempt, possibly followed by an UPDATE statement, so we won't show it here.

In conclusion

Node.js is in active development, and not yet at the official 1.0 level. Browsing the online documentation shows some sections marked as experimental or unstable, which implies recently introduced features and ongoing API changes that can make life harder for you. But Node is stable enough to be in use at companies such as Dow Jones, eBay, Hewlett-Packard, LinkedIn, Microsoft, Mozilla, Walmart, and Yahoo! Consider all its pros and cons to decide whether to deploy it. For suitable uses, Node lets you take advantage of your JavaScript experience to write RESTful services with ease.
