Let’s imagine my Mac running VMware Fusion.
Inside that, I’m running Ubuntu. In there, I’ve got
a Docker container running a CentOS 6.4 base image.
Once there, I’ve used virtualenv to create a Python
environment with my favorite version of Flask.
(Or maybe I chose Java, then Tomcat, running
multiple WARs!) (Or maybe AWS, then Docker,
then node.)
Look at all the ways we can distribute and run software these days:
Layer                          | Examples                                                 | Build Target Set
-------------------------------|----------------------------------------------------------|-------------------------------
OS Image                       | PXE                                                      | Hardware Architecture (x86_64)
VM Image                       | VMware, AMI, OpenStack, vagrant                          | Virtualization Platform
OS Packages                    | RPM, deb packages                                        | OS Version
OS-level containers            | Docker, LXC, Solaris Zones                               | Containerization Platform
Language runtime configuration | CLASSPATH for Java, virtualenv for python, rvm for Ruby  | Language runtime version
The Build Target Set column suggests what you have to vary as you build. For example,
if you’re building OS Packages, then you need to build one package per supported OS Version.
If you’re building VMs, you have to build one VM per supported hypervisor. If you’re
building web sites, you have to test with every browser version.
A few observations:
The Java application server/OSGI stuff is silliness. I can’t imagine a world where you’re better served by one JVM running two applications rather than two JVMs running two applications.
The virtualenv stuff is extremely frustrating and fragile. I’ve got
Stockholm syndrome, so I think Java is relatively straightforward with
$CLASSPATH, but of course it’s still nutty with its wildcard-handling,
order dependencies, and obtuse load ordering.
It turns out that
every runtime needs its own, separate package/dependency/compatibility story.
(C is the exception that proves the rule: the OS package manager
is the package/dependency story for dynamically linked C
binaries.) The impedance mismatch between runtimes is too great
to use RPMs for Java applications, Ruby gems, or Python packages.
And, oh, what’s the browser doing there in the bottom?
Isn’t it interesting that the client side of a web app,
including jQuery, Bootstrap, and all those other goodies,
gets shipped (modulo caching) to its users every time! And
different tabs can use different versions of jQuery.
Web apps are also interesting for their interaction
with the time axis. If you’ve got a
Tamagotchi pet,
it comes shipped with software that never updates.
If you’re using a web browser, the software you’re
interacting with is constantly updating without your knowledge.
All this, of course, reminds us of static linking.
(Static linking may be a third rail of
computer programming. See Rob Pike, for
example.)
Shipping an AMI or a VM with all of your bits baked
in is isomorphic to shipping a statically linked
binary. (The isomorphism is hammered home in
this paper about
MirageOS which smooshes the app
and the OS into a single thing.)
When we ship software, we depend on some platform (be it x86, the JVM, Python,
the browser), some local state (the shared libraries that happen
to be installed, which is a source of potential trouble),
and the bits we actually package up (be it on a website or
a CD). We’re now moving the boundaries around aggressively
and in different ways. Most of the mechanisms
in the table above are trying to squish into nothing the potentially
odious middle state, but it’s notable how different
a JVM-based approach (somewhat OS independent) is from
a Docker-based approach (the OS is part of the distribution).
Thanks to @henryr for commenting
on an earlier draft and pointing me towards MirageOS.
As a preview, let’s talk about two pretty pictures.
Network Visualization
I’m running some typical distributed systems (HDFS, MapReduce, Impala, HBase,
Zookeeper) on a small, seven-node cluster. The diagram above has individual
processes and the TCP connections they’ve established to each other.
Some processes are “masters” and they end up talking to many other processes.
This diagram gets to the bottom of what makes this stuff hard (and
interesting). Everything is interconnected and dependent on other
pieces. So if one of the pieces breaks away, the others have
to adjust. For the most part, software like HDFS and MapReduce is
designed to deal with that gracefully, but when the corner cases
come to visit, referring to and understanding how the processes
communicate is key.
Looking for Outliers in Log Rates
In systems which have a lot of actors playing similar roles (e.g., Datanodes in
HDFS or TaskTrackers in MapReduce), you can smell out an issue
just by looking at how fast the log files are growing. In the picture
above, there’s one datanode that’s logging at a faster rate than
the others. That’s something worth following up on.
See you at Strata!
Come to the talk, where we’ll cover several
more tricks for diagnosing distributed systems.
Swing by Ballroom CD at 4:50 on Wednesday, 2/27. I’ll
be having office hours in the Expo Hall, Table B,
at 10:10am Thursday.
I’m speaking at Strata about this stuff
in February, so this and a few other posts are by way of preparation for that talk.
I learned this trick at Google, and I’ve used it in every system I’ve helped
build since. Expose, on an HTTP server inside of your process, as much useful
information as possible. Do this as you go, to make development and debugging
easier. Reap rewards in production. In distributed systems, HTTP is
doubly important: you’re constantly jumping between machines, and hitting
a URL is vastly easier than SSH’ing over to read a file.
It hardly needs reiterating, but I’ll reiterate why HTTP is
your friend here: everyone’s got an HTTP client. Everyone
understands how HTTP flows through their networks and firewalls.
You’re only requiring your users to memorize one URL, something
they’re already pretty good at.
The good news is that you have to do very little: you can
instrument much of your system essentially for free.
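If you’re on the JVM and don’t already have a servlet container lying around, even the JDK’s built-in com.sun.net.httpserver will do. Here’s a minimal sketch; the /version path, class name, and payload are all made up:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Sketch: a debug HTTP endpoint embedded directly in your process. */
public class DebugServer {
  public static HttpServer start(int port) throws Exception {
    // Port 0 asks the OS for any free port.
    HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    server.createContext("/version", exchange -> {
      byte[] body = "my-daemon 1.2.3 (build deadbeef)\n".getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream os = exchange.getResponseBody()) {
        os.write(body);
      }
    });
    server.start();
    return server;
  }
}
```

From there, every new debug page is just another createContext() call.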
I’m going to go through a bunch of things that you ought to
expose, pointing out examples from open source systems
that I know well through my work at Cloudera.
Though this post is Java-biased, this trick works everywhere.
(Hue,
which is python-based,
does this too.)
What to expose?
Configuration
Hadoop daemons expose the configuration
that they’re using over /conf. Hadoop’s configuration
is tricky, because there are defaults in code, defaults in
XML files inside of jars, and multiple XML files that
are loaded in a specific order. So,
Hadoop’s config
servlet
not only tells you what the runtime
values are, but where they came from. Believe you me,
I’ve won several bets with this.
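This isn’t how Hadoop’s servlet is actually implemented; it’s just a sketch of the provenance-tracking idea, with made-up names: record where each value came from as you load it, and dump both when asked.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: a config map that remembers where each value came from. */
public class TracedConf {
  // key -> {value, source}
  private final Map<String, String[]> entries = new LinkedHashMap<>();

  public void set(String key, String value, String source) {
    entries.put(key, new String[] {value, source});
  }

  /** Render key = value (from source), one per line, for a /conf page. */
  public String dump() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String[]> e : entries.entrySet()) {
      sb.append(e.getKey()).append(" = ").append(e.getValue()[0])
        .append("   (from ").append(e.getValue()[1]).append(")\n");
    }
    return sb.toString();
  }
}
```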
Logs
The following snippet will pick out your log4j file and return it
as a string. (Embedding it into your HTTP server is left as an exercise.)
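Something along these lines does the trick with the log4j 1.x API (the class name is mine; it grabs the first file-backed appender it finds on the root logger):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Enumeration;
import org.apache.log4j.Appender;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;

public class LogFetcher {
  /** Return the contents of the first file-backed log4j appender, if any. */
  public static String currentLogContents() throws IOException {
    Enumeration<?> appenders = Logger.getRootLogger().getAllAppenders();
    while (appenders.hasMoreElements()) {
      Appender a = (Appender) appenders.nextElement();
      if (a instanceof FileAppender) {
        String path = ((FileAppender) a).getFile();
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
      }
    }
    return "(no FileAppender configured)";
  }
}
```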
Hadoop does this by exposing its logs directory
with a static servlet, which works too.
You can get at logs manually, of course, but they’re easy to
get to in the browser.
Changing log levels dynamically in log4j turns out to be possible. Embed
org.apache.hadoop.log.LogLevel$Servlet in your code. (I’ve seen slightly
better versions, with drop-downs for the available loggers, but Hadoop’s
doesn’t have that.)
You can also do this via JMX,
but the easiest way to do that is visualvm, and then you need
to set up X11 forwarding, and at that point it’s easier
to just restart the server.
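For the record, the servlet isn’t doing anything magical; setting a level at runtime is a one-liner in log4j 1.x (class and logger names here are mine):

```java
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class LogLevels {
  /** Set a logger's level at runtime; this is essentially all the servlet does. */
  public static void set(String logger, String level) {
    // Level.toLevel falls back to DEBUG for unrecognized strings.
    Logger.getLogger(logger).setLevel(Level.toLevel(level));
  }
}
```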
JMX
The JVM exposes tons of information, including
GC metadata, classpaths, operating system process data, and more.
If you visit /jmx on a Hadoop daemon, it’ll dump out the
JMX data as JSON, via JmxJsonServlet.
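The output looks roughly like this; this is an abridged, from-memory reconstruction, so don’t lean on the exact field names:

```json
{
  "beans" : [ {
    "name" : "java.lang:type=Memory",
    "modelerType" : "sun.management.MemoryImpl",
    "HeapMemoryUsage" : {
      "committed" : 1048576000,
      "init" : 1073741824,
      "max" : 1048576000,
      "used" : 296747888
    },
    "ObjectPendingFinalizationCount" : 0
  } ]
}
```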
The traditional way of exposing JMX is to set some JVM flags
and then connect remotely with jconsole,
jvisualvm (with the MBeans plugin), or
jmxsh, jmxterm, cmdline-jmxclient, twiddle, etc.
Most of these tools require you to configure JMX
up front.
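For reference, the flags in question are the standard com.sun.management ones (the jar name here is made up, and note that disabling auth and SSL like this is only sane on a trusted network):

```shell
java \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9999 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -jar my-daemon.jar
```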
Full-fledged JMX lets you not just read instrumentation information about the
JVM, but also change it. That’s really poor form: it’s one thing to expose
counters, and it’s another entirely to allow various write operations.
JmxJsonServlet exposes the good bits.
Look for a future post on how to get at the JMX metrics
from the command line, without having set up either
JmxJsonServlet or the JMX JVM parameters ahead of time.
If you want to expose some JMX metrics yourself, here’s
a small example. In typically verbose Java fashion, you need
both the interface and the class. Be careful not
to take locks in your JMX code. It’s not fun
when your metrics stop working in a deadlock.
```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/**
 * Simple JMX Example.
 *
 * Looks something like this when eventually converted to JSON:
 * {
 *   "name" : "demo:type=DemoMXBean",
 *   "modelerType" : "jmx$Demo",
 *   "One" : 1
 * }
 */
public class jmx {
  public interface DemoMXBean {
    int getOne();
  }

  public static class Demo implements DemoMXBean {
    @Override
    public int getOne() {
      return 1;
    }
  }

  public static void main(String[] args) throws Exception {
    MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
    if (mbs == null) {
      System.err.println("Failed to find MBeanServer");
      return;
    }
    mbs.registerMBean(new Demo(), new ObjectName("demo:type=DemoMXBean"));
    while (true) {
      Thread.sleep(1000);
    }
  }
}
```
Don’t be tempted to use JMX for actual management. If you want
RPCs to be able to remotely manage your system, do that however
you’re doing RPCs. I’m glad that the JVM is well-instrumented,
but the whole thing is otherwise clunky.
Versions
Expose the version and build number/hash
of your system. While you’re at it, expose server
time too, since timezones are tricky.
Metrics
Your application has its own metrics. Perhaps
that’s database latency. Perhaps that’s the number
of records cleaned up in the last phase. Perhaps
it’s the number of peers connected to it. Perhaps
it’s your birthday. Make it easy for your system’s
developers to add these as they go.
Check out Yammer’s Metrics system
here. Hadoop’s Metrics2 and Metrics work fine too. You can use the
JMX snippet above to expose metrics directly with JMX.
The important thing is to start exposing metrics.
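Here’s a tiny self-contained version of that, along the lines of the JMX snippet above: an application counter exposed as an MXBean. All the names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.ObjectName;

public class Requests {
  /** The management interface; the MXBean suffix is what JMX keys off of. */
  public interface RequestsMXBean {
    long getRequestCount();
  }

  public static class Impl implements RequestsMXBean {
    private final AtomicLong count = new AtomicLong();

    /** Call this from your request path. */
    public void mark() {
      count.incrementAndGet();
    }

    @Override
    public long getRequestCount() {
      return count.get();
    }
  }

  /** Register the counter with the platform MBean server and hand it back. */
  public static Impl register() throws Exception {
    Impl impl = new Impl();
    ManagementFactory.getPlatformMBeanServer()
        .registerMBean(impl, new ObjectName("demo:type=Requests"));
    return impl;
  }
}
```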
Stack Dumps
Especially for systems that have background
tasks or are prone to deadlocks, being able to access
the stack dump easily is key. /stacks
is Hadoop’s.
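A sketch of what such a handler can return, using nothing but the JDK (the class name is mine):

```java
import java.util.Map;

public class StackDump {
  /** Render every thread and its frames, jstack-style. */
  public static String dump() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      sb.append('"').append(e.getKey().getName()).append("\":\n");
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }
}
```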
Poor Man’s Profiler, Thread Time, ehCache
I haven’t open sourced these, but, if you’re
too lazy to hook up a real profiler, you can
do “Poor Man’s Profiling.” You can use
a bit of ridiculous awk and
jstack to do it at the shell.
Or you can hook up a servlet and do the aggregation inside
of the process.
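A sketch of that in-process aggregation, again with just the JDK (class name mine): sample every thread’s top-of-stack frame repeatedly and count repeats; the hottest frames float to the top.

```java
import java.util.HashMap;
import java.util.Map;

public class FrameCounter {
  /** Sample top-of-stack frames a few times and count how often each shows up. */
  public static Map<String, Integer> sampleTopFrames(int samples, long sleepMs)
      throws InterruptedException {
    Map<String, Integer> counts = new HashMap<>();
    for (int i = 0; i < samples; i++) {
      for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
        if (stack.length > 0) {
          counts.merge(stack[0].toString(), 1, Integer::sum);
        }
      }
      Thread.sleep(sleepMs);
    }
    return counts;
  }
}
```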
Similarly, you can ask Java for the total CPU
time used by various threads, to get a sense of
where your CPU is going.
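That’s ThreadMXBean; a sketch (class name mine):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadTimes {
  /** One line of CPU time per thread; crude, but enough to spot a hot thread. */
  public static String report() {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    StringBuilder sb = new StringBuilder();
    for (long id : bean.getAllThreadIds()) {
      long nanos = bean.getThreadCpuTime(id); // -1 if unsupported or disabled
      sb.append(String.format("thread %d: %.1f ms cpu%n", id, nanos / 1e6));
    }
    return sb.toString();
  }
}
```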
If you use ehCache, you can hook up a servlet
to dump its contents, which is overwhelming,
but useful for tracking down cache coherence
issues.
Application-specific Status
Pretty much every daemon in Hadoop exposes some daemon-specific
data. HBase and HDFS are great examples here, showing, for example,
information about how various regions are doing, and what transaction
the journal is at. Solr does the same thing.
Pre-built Solutions
JavaMelody is the best pre-built
solution I know. In addition to showing much of the content we’ve already
talked about, it keeps history using JRobin and presents graphs. One of
JavaMelody’s strengths is that it can hook up to your JDBC and web servlets,
and show you statistics for those, without your having had to lift a finger.
It’s LGPL if that makes a difference for you.
JAMon, not to be confused
with the templating engine, works
along the same lines, but I’ve found JavaMelody much more useful.
NewRelic is hosted. If that works
for you, folks rave about ‘em.
Jolokia is a newish library bridging
JMX and HTTP. It can be installed as a JVM agent (and
as a plugin to several other things), and they’ve got some cool
integration with Cubism.js.
A Quick Note on Security
None of the systems I’ve discussed are connected to the
Internet at large. There’s a trade-off between
exposing information and appropriately paranoid security.
Assess your risks appropriately.
That’s all…
A plea for distributed systems developers—as you’re
building your systems, expose runtime information over HTTP.
Let me know if you’ve got other stand-by debugging tools
in this vein.