Virtualization All the Way Down; Nesting Dolls; Static Linking

Let’s imagine my Mac running VMware Fusion. Inside that, I’m running Ubuntu. In there, I’ve got a Docker container running a CentOS 6.4 base image. Once there, I’ve used virtualenv to create a Python environment with my favorite version of Flask. (Or maybe I chose Java, then Tomcat, running multiple WARs!) (Or maybe AWS, then Docker, then Node.)

"Flask in Docker in VMware on Mac"

Look at all the ways we can distribute and run software these days:

Layer | Examples | Build Target Set
OS Image | PXE | Hardware Architecture (x86_64)
VM Image | VMware, AMI, OpenStack, Vagrant | Virtualization Platform
OS Packages | RPM, deb packages | OS Version
OS-level containers | Docker, LXC, Solaris Zones | Containerization Platform
Language runtime configuration | CLASSPATH for Java, virtualenv for Python, rvm for Ruby | Language runtime version
Language VM manipulation | Java application servers (e.g., Tomcat) hosting multiple applications; OSGi | Container Platform
Static Linking | gcc -static; Go binaries | OS-architecture pair (e.g., linux_arm)
Browser | Web Apps | Browser Version and Platform
PaaS | Google App Engine, Heroku | PaaS Platform

The Build Target Set column suggests what you have to vary as you build. For example, if you’re building OS Packages, then you need to build one package per supported OS Version. If you’re building VMs, you have to build one VM per supported hypervisor. If you’re building web sites, you have to test with every browser version.

A few observations:

All this, of course, reminds us of static linking. (Static linking may be a third rail of computer programming. See Rob Pike, for example.) Shipping an AMI or a VM with all of your bits baked in is isomorphic to shipping a statically linked binary. (The isomorphism is hammered home in this paper about MirageOS, which smooshes the app and the OS into a single thing.)

When we ship software, we depend on some platform (be it x86, the JVM, Python, the browser), some local state (the shared libraries that happen to be installed, which is a source of potential trouble), and the bits we actually package up (be it on a website or a CD). We’re now moving those boundaries around aggressively and in different ways. Most of the mechanisms in the table above are trying to squish that potentially odious middle state into nothing, but it’s notable how different a JVM-based approach (somewhat OS-independent) is from a Docker-based approach (where the OS is part of the distribution).

Thanks to @henryr for commenting on an earlier draft and pointing me towards MirageOS.

Slides: Tips and Tricks for Debugging Distributed Systems

I talked about debugging distributed systems at Strata today. Thank you to all those who came! Slides:

Quick Preview: Tips and Tricks for Debugging Distributed Systems

I’m talking on Wednesday at Strata about Tips and Tricks for Debugging Distributed Systems. You should come check it out.

As a preview, let’s talk about two pretty pictures.

Network Visualization

"Network Visualization"

I’m running some typical distributed systems (HDFS, MapReduce, Impala, HBase, ZooKeeper) on a small, seven-node cluster. The diagram above shows individual processes and the TCP connections they’ve established to one another. Some processes are “masters,” and they end up talking to many other processes.

This diagram gets to the bottom of what makes this stuff hard (and interesting). Everything is interconnected and dependent on other pieces, so if one of the pieces breaks or disappears, the others have to adjust. For the most part, software like HDFS and MapReduce is designed to deal with that gracefully, but when the corner cases come to visit, referring to and understanding how the processes communicate is key.

Looking for Outliers in Log Rates

"Log Rates"

In systems that have a lot of actors playing similar roles (e.g., DataNodes in HDFS or TaskTrackers in MapReduce), you can smell out an issue just by looking at how fast the log files are growing. In the picture above, there’s one DataNode that’s logging at a faster rate than the others. That’s worth following up on.

See you at Strata!

Come to the talk, where we’ll cover several more tricks for diagnosing distributed systems. Swing by Ballroom CD at 4:50 on Wednesday, 2/27. I’ll be holding office hours in the Expo Hall, Table B, at 10:10am Thursday.

Debug Servlets, or ‘HTTP Won; Use It’

I’m speaking at Strata about this stuff in February, so this and a few other posts are by way of preparation for that talk.

I learned this trick at Google, and I’ve used it in every system I’ve helped build since. Expose, on an HTTP server inside of your process, as much useful information as possible. Do this as you go, to make development and debugging easier, and reap the rewards in production. In distributed systems, HTTP is doubly important: you’re constantly jumping between machines, and hitting a URL is far easier than SSHing over to read a file.

It hardly needs reiterating, but I’ll reiterate why HTTP is your friend here: everyone’s got an HTTP client. Everyone understands how HTTP flows through their networks and firewalls. And you’re only requiring your users to memorize one URL, something they’re already pretty good at.

The good news is that you have to do very little: you can instrument much of your system essentially for free.
I’m going to go through a bunch of things that you ought to expose, pointing out examples from open source systems that I know well through my work at Cloudera. Though this post is Java-biased, this trick works everywhere. (Hue, which is Python-based, does this too.)
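To give a sense of how little code this takes, here’s a minimal sketch using the JDK’s built-in com.sun.net.httpserver; the class name and the /status path are mine, not from any particular project.

import java.io.OutputStream;
import java.net.InetSocketAddress;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

public class DebugServer {
  public static void main(String[] args) throws Exception {
    HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
    server.createContext("/status", new HttpHandler() {
      public void handle(HttpExchange exchange) throws java.io.IOException {
        // Replace "OK" with whatever runtime information you want to expose.
        byte[] body = "OK\n".getBytes("UTF-8");
        exchange.getResponseHeaders().set("Content-Type", "text/plain");
        exchange.sendResponseHeaders(200, body.length);
        OutputStream os = exchange.getResponseBody();
        os.write(body);
        os.close();
      }
    });
    server.start();  // serves http://localhost:8080/status
  }
}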

What to expose?

Configuration

Hadoop daemons expose the configuration they’re using at /conf. Hadoop’s configuration is tricky, because there are defaults in code, defaults in XML files inside of jars, and multiple XML files that are loaded in a specific order. So Hadoop’s config servlet tells you not only what the runtime values are, but also where they came from. Believe you me, I’ve won several bets with this.

"Example conf servlet screenshot"

Logs

The following snippet will find your log4j log file and return its tail (the last 50KB) as a string. (Embedding it into your HTTP server is left as an exercise.) Hadoop does this by exposing its logs directory with a static servlet, which works too. You can always get at logs manually, of course, but it’s much nicer when they’re a click away in the browser.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Enumeration;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import com.google.common.io.ByteStreams;

  public String getTailOfServerLog() throws IOException {
    // Find the first file-backed appender attached to the root logger.
    Enumeration<?> e = Logger.getRootLogger().getAllAppenders();
    while (e.hasMoreElements()) {
      Object a = e.nextElement();
      if (a instanceof FileAppender) {
        FileAppender fa = (FileAppender) a;
        File file = new File(fa.getFile());
        FileInputStream is = new FileInputStream(file);
        try {
          // Seek to (at most) the last 50KB of the file, then read to the end.
          is.skip(Math.max(0, file.length() - 50 * 1024));
          return new String(ByteStreams.toByteArray(is), "UTF-8");
        } finally {
          is.close();
        }
      }
    }
    return "No logs found.";
  }

Log Configuration

Changing log levels dynamically in log4j turns out to be possible. Embed org.apache.hadoop.log.LogLevel$Servlet in your code. (I’ve seen slightly better versions, with drop-downs for the available loggers, but Hadoop’s doesn’t have that.) You can also do this via JMX, but the easiest way to do that is visualvm, and then you need to set up X11 forwarding, and at that point it’s easier to just restart the server.
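For reference, the underlying log4j 1.x call is tiny. Here’s a sketch, with a hypothetical helper name you’d wire up to a servlet parameter:

import org.apache.log4j.Level;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

  // Hypothetical helper: call this from a servlet to flip levels at runtime.
  public static void setLogLevel(String loggerName, String level) {
    Logger logger = (loggerName == null || loggerName.isEmpty())
        ? LogManager.getRootLogger()
        : LogManager.getLogger(loggerName);
    logger.setLevel(Level.toLevel(level));  // unparseable levels fall back to DEBUG
  }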

JMX

The JVM exposes tons of information, including GC metadata, classpaths, operating system process data, and more. If you visit /jmx on a Hadoop daemon, it’ll dump out the JMX data as JSON, via JmxJsonServlet. Here’s how it looks:

"Example jmx json servlet"

The traditional way of exposing JMX is to set some JVM flags and then connect remotely with jconsole, jvisualvm (with the MBeans plugin), or one of jmxsh, jmxterm, cmdline-jmxclient, twiddle, etc.:

-Dcom.sun.management.jmxremote.port=9999
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

Most of these tools require you to configure JMX up front.

Full-fledged JMX lets you not just read instrumentation information about the JVM, but also change it. Exposing that remotely is really poor form: it’s one thing to expose counters, and another thing entirely to allow arbitrary write operations. JmxJsonServlet exposes just the good bits.

Look for a future post on how to get at the JMX metrics from the command line, without having set up either JmxJsonServlet or the JMX JVM parameters ahead of time.

If you want to expose some JMX metrics yourself, here’s a small example. In typically verbose Java fashion, you need both the interface and the class. Be careful not to take locks in your JMX code. It’s not fun when your metrics stop working in a deadlock.

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/**
 * Simple JMX example.
 *
 * Looks something like this when eventually converted to JSON:
 *  {
 *    "name" : "demo:type=DemoMXBean",
 *    "modelerType" : "jmx$Demo",
 *    "One" : 1
 *  }
 */
public class jmx {
  // The management interface: every getter here becomes a readable JMX attribute.
  public interface DemoMXBean {
    int getOne();
  }
  public static class Demo implements DemoMXBean {
    @Override
    public int getOne() {
      return 1;
    }
  }
  public static void main(String[] args) throws Exception {
    // getPlatformMBeanServer() is documented never to return null.
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    mbs.registerMBean(new Demo(), new ObjectName("demo:type=DemoMXBean"));
    // Sleep forever so there's a live process to inspect.
    while (true) {
      Thread.sleep(1000);
    }
  }
}

Don’t be tempted to use JMX for actual management. If you want to be able to manage your system remotely, do that however you’re doing your other RPCs. I’m glad that the JVM is well-instrumented, but the whole thing is otherwise clunky.

Versions

Expose the version and build number/hash of your system. While you’re at it, expose the server’s time too, since timezones are tricky.
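Here’s a sketch; the build.properties file and its keys are hypothetical, and your build tool would have to generate them (e.g., via Maven resource filtering):

import java.io.InputStream;
import java.util.Date;
import java.util.Properties;

public class VersionInfo {
  // build.properties and its keys are hypothetical; have your build write them.
  public static String render() throws Exception {
    Properties p = new Properties();
    InputStream in = VersionInfo.class.getResourceAsStream("/build.properties");
    if (in != null) {
      try {
        p.load(in);
      } finally {
        in.close();
      }
    }
    return "version: " + p.getProperty("version", "unknown") + "\n"
        + "build: " + p.getProperty("git.hash", "unknown") + "\n"
        + "server time: " + new Date() + "\n";
  }
}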

Metrics

Your application has its own metrics. Perhaps that’s database latency. Perhaps that’s the number of records cleaned up in the last phase. Perhaps it’s the number of peers connected to it. Perhaps it’s your birthday. Make it easy for your system’s developers to add these as they go.

Check out Yammer’s Metrics system. Hadoop’s Metrics2 and Metrics work fine too. You can use the JMX snippet above to expose metrics directly with JMX.
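For instance, here’s a request counter wired up in the same style as the Demo bean above (the names are mine):

import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.ObjectName;

public class RequestCounter {
  // MXBean naming convention: interface = implementing class's name + "MXBean".
  public interface RequestsMXBean {
    long getRequests();
  }
  public static class Requests implements RequestsMXBean {
    private final AtomicLong count = new AtomicLong();
    public void mark() { count.incrementAndGet(); }  // call from your request path
    @Override
    public long getRequests() { return count.get(); }
  }
  public static void main(String[] args) throws Exception {
    Requests requests = new Requests();
    ManagementFactory.getPlatformMBeanServer()
        .registerMBean(requests, new ObjectName("demo:type=Requests"));
    requests.mark();
    Thread.sleep(Long.MAX_VALUE);  // keep the process alive for inspection
  }
}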

The important thing is to start exposing metrics.

Stack Dumps

Especially for systems that have background tasks or are prone to deadlocks, being able to access a stack dump easily is key. /stacks is Hadoop’s.

"Example stacks servlet"
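If you’re rolling your own, a minimal version is easy to write against the standard Thread API. A sketch:

import java.util.Map;

  // A rough sketch of what a /stacks-style endpoint can return.
  public static String dumpStacks() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e
        : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append(t.getName()).append(" (").append(t.getState()).append(")\n");
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }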

Poor Man’s Profiler, Thread Time, ehCache

I haven’t open sourced these, but if you’re too lazy to hook up a real profiler, you can do “Poor Man’s Profiling”: use a bit of ridiculous awk and jstack to do it at the shell, or hook up a servlet and do the aggregation inside the process.

Similarly, you can ask Java for the total CPU time used by various threads, to get a sense of where your CPU is going.
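That’s ThreadMXBean’s job. A sketch (getThreadCpuTime returns nanoseconds, or -1 if the JVM doesn’t support it):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

  public static void printThreadCpuTimes() {
    ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
    for (long id : tmx.getAllThreadIds()) {
      ThreadInfo info = tmx.getThreadInfo(id);
      long cpuNanos = tmx.getThreadCpuTime(id);  // -1 if unsupported or thread died
      if (info != null && cpuNanos >= 0) {
        System.out.printf("%-40s %8d ms%n", info.getThreadName(),
            cpuNanos / 1000000);
      }
    }
  }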

If you use ehCache, you can hook up a servlet to dump its contents, which is overwhelming, but useful for tracking down cache coherence issues.

Application-specific Status

Pretty much every daemon in Hadoop exposes some daemon-specific data. HBase and HDFS are great examples here, showing, for example, information about how various regions are doing, and what transaction the journal is at. Solr does the same thing.

"HBase servlet" "Namenode" "Solr"

Pre-built Solutions

JavaMelody is the best pre-built solution I know. In addition to showing much of the content we’ve already talked about, it keeps history using JRobin and presents graphs. One of JavaMelody’s strengths is that it can hook up to your JDBC connections and web servlets, and show you statistics for those, without your having had to lift a finger. It’s LGPL, if that makes a difference for you.

"Java Melody"

JAMon, not to be confused with the templating engine, works along the same lines, but I’ve found JavaMelody much more useful.

NewRelic is hosted. If that works for you, folks rave about ‘em.

Jolokia is a newish library bridging JMX and HTTP. It can be installed as a JVM agent (and as a plugin to several other things), and they’ve got some cool integration with Cubism.js.

"Jolokia"

A Quick Note on Security

All of the systems I’ve discussed assume they’re not exposed to the Internet at large. There’s a trade-off between exposing information and appropriately paranoid security. Assess your risks accordingly.

That’s all…

A plea to distributed systems developers: as you’re building your systems, expose runtime information over HTTP.

Let me know if you’ve got other stand-by debugging tools in this vein.