Debug Servlets, or 'HTTP Won; Use It'

I’m speaking at Strata about this stuff in February, so this and a few other posts are by way of preparation for that talk.

I learned this trick at Google, and I’ve used it in every system I’ve helped build since. Expose, on an HTTP server inside of your process, as much useful information as possible. Do this as you go, to make development and debugging easier. Reap rewards in production. In distributed systems, HTTP is doubly-important: you’re constantly jumping between machines, and HTTP is critically easier than SSH’ing over to read a file.

It hardly needs re-iterating, but I’ll re-iterate why HTTP is your friend here: everyone’s got an HTTP client. Everyone understands how HTTP flows through their networks and firewalls. You’re only requiring your users to memorize one URL, something they’re already pretty good at.

The good news is that you have to do very little: you can instrument much of your system essentially for free.
I’m going to go through a bunch of things that you ought to expose, pointing out examples from open source systems that I know well through my work at Cloudera. Though this post is Java-biased, this trick works everywhere. (Hue, which is python-based, does this too.)

What to expose?

Configuration

Hadoop daemons exposes the configuration that they’re using over /conf. Hadoop’s configuration is tricky, because there are defaults in code, defaults in XML files inside of jars, and multiple XML files that are loaded in a specific order. So, Hadoop’s config servlet not only tells you what the runtime values are, but where they came from. Believe you me, I’ve won several bets with this.

"Example conf servlet screenshot"

Logs

The following snippet will pick out your log4j file and return it as a string. (Embedding it into your HTTP server is left as an exercise.) Hadoop does this by exposing its logs directory with a static servlet, which works too. You can get at logs manually, of course, but they’re easy to get to in the browser.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
import java.io.IOException;
import java.io.File;
import java.io.FileInputStream;
import java.util.Enumeration;
import org.apache.log4j.Logger;
import com.google.common.io.ByteStreams;

  public String getTailOfServerLog() throws IOException {
    Enumeration<?> e = Logger.getRootLogger().getAllAppenders();
    while (e.hasMoreElements()) {
      Object a = e.nextElement();
      if (a instanceof FileAppender) {
        FileAppender fa = (FileAppender) a;
        String filename = fa.getFile();
        File file = new File(filename);
        long length = file.length();
        FileInputStream is = new FileInputStream(file);
        String result = "";
        try {
          is.skip(Math.max(0, length - 50 * 1024));
          result = new String(ByteStreams.toByteArray(is), "UTF-8");
          return result;
        } finally {
          is.close();
        }
      }
    }
    return "No logs found.";
  }

Log Configuration

Changing the log levels dynamically in log4j turns out to be possible. Embed org.apache.hadoop.log.LogLevel$Servlet. LogLevelServlet in your code. (I’ve seen slightly better, with drop-downs for the available loggers, but Hadoop doesn’t.) You can also do this via JMX, but the easiest way to do that is visualvm, and then you need to set up X11 forwarding, and at this point it was easier to just restart the server.

JMX

The JVM exposes tons of information, including GC metadata, classpaths, operating system process data, and more. If you visit /jmx on a Hadoop daemon, it’ll dump out the JMX data as JSON, via JmxJsonServlet. Here’s how it looks:

"Example jmx json servlet"

The traditional way of exposing JMX is to set some JVM flags, and then connecting remotely with jconsole, jvisualvm (with the MBeans plugin), or jmxsh, jmxterm, cmdline-jmxclient, twiddle, etc.

1
2
3
 -Dcom.sun.management.jmxremote.port = 9999
 -Dcom.sun.management.jmxremote.authenticate = false
 -Dcom.sun.management.jmxremote.ssl = false

Most of these tools require you to configure JMX up front.

Full-fledged JMX lets you not just read instrumentation information about the JVM, but also change it. That’s really poor form: it’s one thing to expose counters, and it’s another entirely to allow various write operations. JmxJsonServlet exposes the good bits.

Look for a future post on how to get at the JMX metrics from the command line, without having set up either JmxJsonServlet or the JMX JVM parameters ahead of time.

If you want to expose some JMX metrics yourself, here’s a small example. In typically verbose Java fashion, you need both the interface and the class. Be careful not to take locks in your JMX code. It’s not fun when your metrics stop working in a deadlock.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/**
 * Simple JMX Example.
 *
 * Looks something like this when eventually converted to JSON:
 *  {
 *    "name" : "demo:type=DemoMXBean",
 *    "modelerType" : "jmx$Demo",
 *    "One" : 1
 *  }
*/
public class jmx {
  public interface DemoMXBean {
    int getOne();
  }
  public static class Demo implements DemoMXBean {
    @Override
    public int getOne() {
      return 1;
    }
  }
  public static void main(String[] args) throws Exception {
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    if (mbs == null) {
      System.err.println("Failed to find MBeanServer");
      return;
    }
    mbs.registerMBean(new Demo(), new ObjectName("demo:type=DemoMXBean"));
    while (true) {
      Thread.sleep(1000);
    }
  }
}

Don’t be tempted to use JMX for actual management. If you want RPCs to be able to remotely manage your system, do that however you’re doing RPCs. I’m glad that the JVM is well-instrumented, but the whole thing is otherwise clunky.

Versions

Versions. Expose the version and build number/hash of your system. While you’re at it, expose server time too, since timezones are tricky.

Metrics

Your application has its own metrics. Perhaps that’s database latency. Perhaps that’s the number of records cleaned up in the last phase. Perhaps it’s the number of peers connected to it. Perhaps it’s your birthday. Make it easy for your system’s developers to add these as they go.

Checkout Yammer’s Metrics system here. Hadoop’s Metrics2 and Metrics work fine too. You can use the JMX snippet above to expose metrics directly with JMX.

The important thing is to start exposing metrics.

Stack Dumps

Especially for systems that have background tasks or are prone to deadlocks, being able to access the stack dump easily is key. /stacks is Hadoop’s. "Example stacks servlet"

Poor Man’s Profiler, Thread Time, ehCache

I haven’t open sourced these, but, if you’re too lazy to hook up a real profiler, you can do “Poor Man’s Profiling.” You can use a bit of ridiculous awk and jstack to do it at the shell. Or you can hook up a servlet and do the aggregation inside of the process.

Similarly, you can ask Java for the total CPU time used by various threads, to get a sense of where your CPU is going.

If you use ehCache, you can hook up a servlet to dump its contents, which is overwhelming, but useful for tracking doing cache coherence issues.

Application-specific Status

Pretty much every daemon in Hadoop exposes some daemon-specific data. HBase and HDFS are great examples here, showing, for example, information about how various regions are doing, and what transaction the journal is at. Solr does the same thing.

"HBase servlet" "Namenode" "Solr"

Pre-built Solutions

JavaMelody is the best pre-built solution I know. In addition to showing much of the conten’t we’ve already talked about, it keeps history using JRobin and presents graphs. One of JavaMelody’s strenghts is that it can hook up to your JDBC and web servlets, and show you statistics for those, without your having had to lift a finger. It’s LGPL if that makes a difference for you.

"Java Melody"

JAMon, not to be confused with the templating engine, works along the same lines, but I’ve found JavaMelody much more useful.

NewRelic is hosted. If that works for you, folks rave about ‘em.

Jolokia is a newish library bridging JMX and HTTP. It can be installed as a JVM agent (and as a plugin to several other things), and they’ve got some cool integration with Cubism.js.

"Jolokia"

A Quick Note on Security

All of the systems I’ve discussed are not connected to the Internet at large. There’s a trade-off between exposing information and appropriately paranoid security. Assess your risks appropriately.

That’s all…

A plea for distributed systems developers—as you’re building your systems, expose runtime information over HTTP.

Let me know if you’ve got other stand-by debugging tools in this vein.