Getting started with a TensorFlow surgery classifier with TensorBoard data viz

Originally published at Opensource.com. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The most challenging part of deep learning is labeling, as you’ll see in part one of this two-part series, Learn how to classify images with TensorFlow. Proper training is critical to effective future classification, and for training to work, we need lots of accurately labeled data. In part one, I skipped over this challenge by downloading 3,000 prelabeled images. I then showed you how to use this labeled data to train your classifier with TensorFlow. In this part we’ll train with a new data set, and I’ll introduce the TensorBoard suite of data visualization tools to make it easier to understand, debug, and optimize our TensorFlow code.

Given my work as VP of engineering and compliance at healthcare technology company C-SATS, I was eager to build a classifier for something related to surgery. Suturing seemed like a great place to start. It is immediately useful, and I know how to recognize it. It is useful because, for example, if a machine can see when suturing is occurring, it can automatically identify the step (phase) of a surgical procedure where suturing takes place, e.g. anastomosis. And I can recognize it because the needle and thread of a surgical suture are distinct, even to my layperson’s eyes.

My goal was to train a machine to identify suturing in medical videos.

I have access to billions of frames of non-identifiable surgical video, many of which contain suturing. But I’m back to the labeling problem. Luckily, C-SATS has an army of experienced annotators who are experts at doing exactly this. My source data were video files and annotations in JSON.

The annotations look like this:

        [
            {
                "annotations": [
                    {
                        "endSeconds": 2115.215,
                        "label": "suturing",
                        "startSeconds": 2319.541
                    },
                    {
                        "endSeconds": 2976.301,
                        "label": "suturing",
                        "startSeconds": 2528.884
                    }
                ],
                "durationSeconds": 2975,
                "videoId": 5
            },
            {
                "annotations": [
                // ...etc...

I wrote a Python script to use the JSON annotations to decide which frames to grab from the .mp4 video files; ffmpeg does the actual grabbing. I decided to grab at most one frame per second, then I divided the total number of video seconds by four to get 10k seconds (10k frames). After I figured out which seconds to grab, I ran a quick test to see whether a particular second was inside or outside a segment annotated as suturing (isWithinSuturingSegment() in the code below). Here's the script:

# Grab frames from videos with ffmpeg. Use multiple cores.
# Minimum resolution is 1 second--this is a shortcut to get fewer frames.
# (C)2017 Adam Monsen. License: AGPL v3 or later.
import json
import subprocess
from multiprocessing import Pool
import os

frameList = []

def isWithinSuturingSegment(annotations, timepointSeconds):
    for annotation in annotations:
        startSeconds = annotation['startSeconds']
        endSeconds = annotation['endSeconds']
        if timepointSeconds > startSeconds and timepointSeconds < endSeconds:
            return True
    return False

with open('available-suturing-segments.json') as f:
    j = json.load(f)
    for video in j:
        videoId = video['videoId']
        videoDuration = video['durationSeconds']
        # generate many ffmpeg frame-grabbing commands
        start = 1
        stop = videoDuration
        step = 4 # Reduce to grab more frames
        for timepointSeconds in xrange(start, stop, step):
            inputFilename = '/home/adam/Downloads/suturing-videos/{}.mp4'.format(videoId)
            outputFilename = '{}-{}.jpg'.format(videoId, timepointSeconds)
            if isWithinSuturingSegment(video['annotations'], timepointSeconds):
                outputFilename = 'suturing/{}'.format(outputFilename)
            else:
                outputFilename = 'not-suturing/{}'.format(outputFilename)
            outputFilename = '/home/adam/local/{}'.format(outputFilename)
            commandString = 'ffmpeg -loglevel quiet -ss {} -i {} -frames:v 1 {}'.format(
                timepointSeconds, inputFilename, outputFilename)
            frameList.append({
                'outputFilename': outputFilename,
                'commandString': commandString,
            })

def grabFrame(f):
    if os.path.isfile(f['outputFilename']):
        print 'already completed {}'.format(f['outputFilename'])
    else:
        print 'processing {}'.format(f['outputFilename'])
        subprocess.check_call(f['commandString'], shell=True)

p = Pool(4) # for my 4-core laptop
p.map(grabFrame, frameList)

Now we're ready to retrain the model, exactly as before.

To use this script to snip out 10k frames took me about 10 minutes, then an hour or so to retrain Inception to recognize suturing at 90% accuracy. I did spot checks with new data that wasn’t from the training set, and every frame I tried was correctly identified (mean confidence score: 88%, median confidence score: 91%).

Here are my spot checks. (WARNING: Contains links to images of blood and guts.)

Image                 Not suturing score   Suturing score
Not-Suturing-01.jpg   0.71053              0.28947
Not-Suturing-02.jpg   0.94890              0.05110
Not-Suturing-03.jpg   0.99825              0.00175
Suturing-01.jpg       0.08392              0.91608
Suturing-02.jpg       0.08851              0.91149
Suturing-03.jpg       0.18495              0.81505
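For what it's worth, the mean and median quoted above can be reproduced from this table with a few lines of Python, using the score the model assigned to the correct class for each frame:

```python
from statistics import mean, median

# Confidence the model assigned to the *correct* class in each spot check.
correct_class_scores = [
    0.71053, 0.94890, 0.99825,  # "not suturing" scores of the not-suturing frames
    0.91608, 0.91149, 0.81505,  # "suturing" scores of the suturing frames
]

print('mean:   {:.0%}'.format(mean(correct_class_scores)))    # 88%
print('median: {:.0%}'.format(median(correct_class_scores)))  # 91%
```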

How to use TensorBoard

Visualizing what's happening under the hood, and communicating it to others, is at least as hard with deep learning as it is in any other kind of software. TensorBoard to the rescue! The retraining script from part one automatically generates the files TensorBoard uses to generate graphs representing what happened during retraining.

To set up TensorBoard, run the following inside the container after running the retraining script.

pip install tensorboard
tensorboard --logdir /tmp/retrain_logs

Watch the output and open the printed URL in a browser.

Starting TensorBoard 41 on port 6006
(You can navigate to http://localhost:6006)

You’ll see something like this:

I hope this will help; if not, you’ll at least have something cool to show. During retraining, I found it helpful to see under the “SCALARS” tab how accuracy increases while cross-entropy decreases as we perform more training steps. This is what we want.
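That inverse relationship is exactly what cross-entropy predicts: it is the negative log of the probability the model assigns to the correct class. A tiny pure-Python illustration (the probabilities here are made up for demonstration, not TensorBoard output):

```python
import math

def cross_entropy(p_correct):
    # Cross-entropy for a single example: -log of the probability
    # the model assigns to the true class. Lower is better.
    return -math.log(p_correct)

# As the model grows more confident in the right answer (accuracy rises),
# cross-entropy falls toward zero.
for p in (0.25, 0.5, 0.9, 0.99):
    print('p = {:.2f}  cross-entropy = {:.4f}'.format(p, cross_entropy(p)))
```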

Learn more

If you'd like to learn more, explore the resources and references I used in writing this series.

If you’d like to chat about this topic, please drop by the ##tfadam topical channel on Freenode IRC. You can also email me or leave a comment below.

This series would never have happened without expert help from Eva Monsen, Brian C. Lane, Rob Smith, Alex Simes, VM Brasseur, Bri Hatch, Rikki Endsley, and a team of all-star editors.

Learn how to classify images with TensorFlow

Originally published at Opensource.com. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Recent advancements in deep learning algorithms and hardware performance have enabled researchers and companies to make giant strides in areas such as image recognition, speech recognition, recommendation engines, and machine translation. Six years ago, the first superhuman performance in visual pattern recognition was achieved. Two years ago, the Google Brain team unleashed TensorFlow, deftly slinging applied deep learning to the masses. TensorFlow is outpacing many complex tools used for deep learning.

With TensorFlow, you’ll gain access to complex features with vast power. The keystone of its power is TensorFlow’s ease of use.

In a two-part series, I’ll explain how to quickly create a convolutional neural network for practical image recognition. The computation steps are embarrassingly parallel and can be deployed to perform frame-by-frame video analysis and extended for temporal-aware video analysis.

This series cuts directly to the most compelling material. A basic understanding of the command line and Python is all you need to play along from home. It aims to get you started quickly and inspire you to create your own amazing projects. I won’t dive into the depths of how TensorFlow works, but I’ll provide plenty of additional references if you’re hungry for more. All the libraries and tools in this series are free/libre/open source software.

How it works

Our goal in this tutorial is to take a novel image that falls into a category we’ve trained and run it through a command that will tell us in which category the image fits. We’ll follow these steps:

[Figure: a directed graph from label to train to classify]

  1. Labeling is the process of curating training data. For flowers, images of daisies are dragged into the “daisies” folder, roses into the “roses” folder, and so on, for as many different flowers as desired. If we never label ferns, the classifier will never return “ferns.” This requires many examples of each type, so it is an important and time-consuming process. (We will use pre-labeled data to start, which will make this much quicker.)
  2. Training is when we feed the labeled data (images) to the model. A tool will grab a random batch of images, use the model to guess what type of flower is in each, test the accuracy of the guesses, and repeat until most of the training data is used. The last batch of unused images is used to calculate the accuracy of the trained model.
  3. Classification is using the model on novel images. For example, input: IMG207.JPG, output: daisies. This is the fastest and easiest step and is cheap to scale.
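The labeling step above boils down to a directory-per-category layout, where the folder names become the only answers the classifier can ever give. Here's a hypothetical miniature of that layout in Python (made-up filenames, not the real data set):

```python
import os
import tempfile

# Labeling = curating a directory per category, mirroring the
# flower_photos layout used later in this series (hypothetical files).
root = tempfile.mkdtemp()
for label, filenames in [('daisy', ['IMG201.jpg', 'IMG202.jpg']),
                         ('roses', ['IMG207.jpg'])]:
    os.makedirs(os.path.join(root, label))
    for name in filenames:
        open(os.path.join(root, label, name), 'w').close()

# No "ferns" folder means the classifier can never answer "ferns".
for label in sorted(os.listdir(root)):
    count = len(os.listdir(os.path.join(root, label)))
    print('{}: {} examples'.format(label, count))
```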

Training and classification

In this tutorial, we’ll train an image classifier to recognize different types of flowers. Deep learning requires a lot of training data, so we’ll need lots of sorted flower images. Thankfully, another kind soul has done an awesome job of collecting and sorting images, so we’ll use this sorted data set with a clever script that will take an existing, fully trained image classification model and retrain the last layers of the model to do just what we want. This technique is called transfer learning.

The model we’re retraining is called Inception v3, originally specified in the December 2015 paper “Rethinking the Inception Architecture for Computer Vision.”

Inception doesn’t know how to tell a tulip from a daisy until we do this training, which takes about 20 minutes. This is the “learning” part of deep learning.


Step one to machine sentience: Install Docker on your platform of choice.

The first and only dependency is Docker. This is the case in many TensorFlow tutorials (which should indicate this is a reasonable way to start). I also prefer this method of installing TensorFlow because it keeps your host (laptop or desktop) clean by not installing a bunch of dependencies.

Bootstrap TensorFlow

With Docker installed, we’re ready to fire up a TensorFlow container for training and classification. Create a working directory somewhere on your hard drive with 2 gigabytes of free space. Create a subdirectory called local and note the full path to that directory.

docker run -v /path/to/local:/notebooks/local --rm -it --name tensorflow tensorflow/tensorflow:nightly /bin/bash

Here’s a breakdown of that command.

  • -v /path/to/local:/notebooks/local mounts the local directory you just created to a convenient place in the container. If using RHEL, Fedora, or another SELinux-enabled system, append :Z to this to allow the container to access the directory.
  • --rm tells Docker to delete the container when we’re done.
  • -it attaches our input and output to make the container interactive.
  • --name tensorflow gives our container the name tensorflow instead of sneaky_chowderhead or whatever random name Docker might pick for us.
  • tensorflow/tensorflow:nightly says run the nightly image of tensorflow/tensorflow from Docker Hub (a public image repository) instead of latest (by default, the most recently built/available image). We are using nightly instead of latest because (at the time of writing) latest contains a bug that breaks TensorBoard, a data visualization tool we'll find handy later.
  • /bin/bash says don’t run the default command; run a Bash shell instead.

Train the model

Inside the container, run these commands to download and sanity check the training data.

curl -O
echo 'db6b71d5d3afff90302ee17fd1fefc11d57f243f  flower_photos.tgz' | sha1sum -c

If you don't see the message flower_photos.tgz: OK, you don't have the correct file. If the above curl or sha1sum steps fail, manually download and explode the training data tarball (SHA-1 checksum: db6b71d5d3afff90302ee17fd1fefc11d57f243f) in the local directory on your host.
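If sha1sum isn't available on your host, the same integrity check can be done with Python's standard library. This is a sketch; the chunked read just avoids loading the whole tarball into memory:

```python
import hashlib

def sha1_of_file(path):
    # Stream the file in 8 KB chunks so large tarballs
    # don't need to fit in memory.
    h = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = 'db6b71d5d3afff90302ee17fd1fefc11d57f243f'
# print(sha1_of_file('flower_photos.tgz') == EXPECTED)
```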

Now put the training data in place, then download and sanity check the retraining script.

mv flower_photos.tgz local/
cd local
curl -O
echo 'a74361beb4f763dc2d0101cfe87b672ceae6e2f5  retrain.py' | sha1sum -c

Look for confirmation that the retraining script has the correct contents. You should see OK.

Finally, it’s time to learn! Run the retraining script.

python retrain.py --image_dir flower_photos --output_graph output_graph.pb --output_labels output_labels.txt

If you encounter this error, ignore it:

TypeError: not all arguments converted during string formatting
Logged from file, line 82

As retraining proceeds, the training images are automatically separated into batches of training, test, and validation data sets.
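That three-way split can be sketched in a few lines of plain Python. This is a simplified, hypothetical version (the real retraining script assigns each image to a set by hashing its filename, so the split stays stable across runs), but the idea is the same:

```python
import random

def split_dataset(filenames, validation_pct=10, testing_pct=10, seed=42):
    # Shuffle, then carve off validation and testing slices;
    # everything left over is used for training.
    rng = random.Random(seed)
    files = list(filenames)
    rng.shuffle(files)
    n = len(files)
    n_val = n * validation_pct // 100
    n_test = n * testing_pct // 100
    return {
        'validation': files[:n_val],
        'testing': files[n_val:n_val + n_test],
        'training': files[n_val + n_test:],
    }

splits = split_dataset(['img{:03d}.jpg'.format(i) for i in range(100)])
print({k: len(v) for k, v in splits.items()})
# {'validation': 10, 'testing': 10, 'training': 80}
```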

In the output, we’re hoping for high “Train accuracy” and “Validation accuracy” and low “Cross entropy.” See How to retrain Inception’s final layer for new categories for a detailed explanation of these terms. Expect training to take around 30 minutes on modern hardware.

Pay attention to the last line of output in your console:

INFO:tensorflow:Final test accuracy = 89.1% (N=340)

This says we’ve got a model that will, nine times out of 10, correctly guess which one of five possible flower types is shown in a given image. Your accuracy will likely differ because of randomness injected into the training process.


With one more small script, we can feed new flower images to the model and it’ll output its guesses. This is image classification.

Save the following as classify.py in the local directory on your host:

import tensorflow as tf, sys

image_path = sys.argv[1]
graph_path = 'output_graph.pb'
labels_path = 'output_labels.txt'

# Read in the image_data
image_data = tf.gfile.FastGFile(image_path, 'rb').read()

# Loads label file, strips off carriage return
label_lines = [line.rstrip() for line
    in tf.gfile.GFile(labels_path)]

# Unpersists graph from file
with tf.gfile.FastGFile(graph_path, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')

# Feed the image_data as input to the graph and get first prediction
with tf.Session() as sess:
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor,
        {'DecodeJpeg/contents:0': image_data})
    # Sort to show labels of first prediction in order of confidence
    top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
    for node_id in top_k:
        human_string = label_lines[node_id]
        score = predictions[0][node_id]
        print('%s (score = %.5f)' % (human_string, score))

To test your own image, save it as test.jpg in your local directory and run (in the container) python classify.py test.jpg. The output will look something like this:

sunflowers (score = 0.78311)
daisy (score = 0.20722)
dandelion (score = 0.00605)
tulips (score = 0.00289)
roses (score = 0.00073)

The numbers indicate confidence. The model is 78.311% sure the flower in the image is a sunflower. A higher score indicates a more likely match. Note that there can be only one match; multi-label classification requires a different approach.
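A quick check on the sample output above: because these scores come from a softmax layer, they form a probability distribution over the five labels, which is why single-label output falls out naturally:

```python
# Scores from the sample run above. A softmax output is a probability
# distribution, so the five scores sum to 1 (within rounding).
scores = {
    'sunflowers': 0.78311,
    'daisy': 0.20722,
    'dandelion': 0.00605,
    'tulips': 0.00289,
    'roses': 0.00073,
}
print(round(sum(scores.values()), 5))  # 1.0
print(max(scores, key=scores.get))     # sunflowers -- the single best match
```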

For more detail, view this great line-by-line explanation of the classifier script.

The graph-loading code in the original classifier script was broken, so I applied the graph_def = tf.GraphDef(), etc. graph-loading code you see above.

With zero rocket science and a handful of code, we’ve created a decent flower image classifier that can process about five images per second on an off-the-shelf laptop computer.

In the second part of this series, we’ll use this information to train a different image classifier, then take a look under the hood with TensorBoard. If you want to try out TensorBoard, keep this container running by making sure docker run isn’t terminated.

Free Software Claus is Coming to Town

I help organize a conference for Free Software enthusiasts called SeaGL. This year I'm proud to report that Shauna Gordon-McKeon and Richard Stallman (aka "RMS") are keynote speakers.

I first invited RMS to Seattle 13 years ago, and finally in 2015 it all came together. In his words:

My talks are not technical. The topics of free software, copyright vs community, and digital inclusion deal with ethical/political issues that concern all users of computers.

So please do come on down to Seattle Central College on October 23rd and 24th, 2015 for SeaGL!

3 Reasons Why You Should Never Use Enlocked

Enlocked advertises easy, secure email. Sounds good to me! My current solution (Thunderbird+Enigmail) works, barely, but it is a big pain in the tukhus. I’d go for something better. Heck, I’d pay for it. And Enlocked is free!

I gave their Chrome plugin a try. Installation was a breeze and it worked exactly as advertised. It integrates almost seamlessly into GMail (when replying, quoted text is still encrypted, but they’ll probably fix that soon). It really was friendly enough for anyone! But I’m not dusting off the old blog just to tell you that. No ma’am.

Unicorn and Cow (and sentry)

1. They encrypt and decrypt using their own key.

If you’ve ever spent the not-insignificant time to learn and use PGP yourself, you’ll know that one point of going through all the trouble is complete, end-to-end encryption. You don’t have to trust your email handlers. Any of them. And there can be many! So, uh, you just never give your private key to anyone, ok? Everyone gets their own keys (there are plenty for everyone, and they’re free!). That’s the way PGP works.

I should say that I'm not positive Enlocked uses their own key. It could just be a key they generate using some secret they securely get from you via OpenID or something fancy like that (even so, they're free to brute force your secret day and night since they have the key). But without knowing for sure, you might as well assume it's their key and they can decrypt your messages anytime they darn well please. Or if someone forces them to decrypt messages (like a government, or someone with lots of power or money), same result.

2. They encrypt and decrypt on their servers.

From their How it Works page:

The systems at Enlocked only have access to your messages for the short time we are encrypting or decrypting them, and then our software instantly removes any copies.

This is really more of the first reason (no end-to-end encryption), but it’s just another place where their inevitable security breach could occur.

3. Their software is closed-source.

If you know me you know I’m a Free² Software zealot, so you expect this kind of propaganda from me. But transparency is really important where the actual encryption and decryption takes place. They must at least make their client-side code available for review.

Sorry Enlocked, nobody serious about security will adopt your software until you address these issues.

Disclaimer: I’m no security expert. But Bruce Schneier is. If you really want to get schooled on security, read anything he’s written. For instance: Secrets and Lies: Digital Security in a Networked World.

Does the FSF need better top-down social skills?

Larry Cafiero and Joe Brockmeier are two big voices for technological freedom. They're both pretty fired up about RMS's f-you epitaph for Jobs.

Generally you want the figurehead of a public foundation to be, uh, attractive. Intellectually, maybe even physically. Right? Not only does the cause itself have to make sense, these people need to attract other people to their cause. And they usually “say the right things”, smile, wear a suit, whatever. But I always thought these requirements only applied to other causes (besides Free Software).

Certainly RMS lacking those traits didn’t keep me from FLOSS. I heard about RMS and the proprietary printer a while back, and that’s all it took to get me hooked on FLOSS. I could identify immediately because I write software, and proprietary code is a pain. His cause just makes sense, even if he doesn’t. But I’ve been justifying his abnormal behavior because, well, he started something new! Something important. He knew it was important, and dedicated his life to this thing that many, many folks never even know exists. Something that affects all our lives, every day, more and more. Software must support our Freedom, or we are not free.

So he won me over, but I’m a nerd. I’m used to eccentrics in my field. Truth wins, period. And I still don’t know if it matters if RMS is a polished, smiley, public-friendly dude or not. Would Free Software be farther along today if RMS were kinder, more respectful, or somehow a better “public figure”? Would DRM have never been allowed to exist? Would the government pass laws that software for implanted medical devices be Free?

simple AJAX/JSP example: sum of two numbers

It’s been a while since I’ve done any front-end Web programming, so when Eva proposed a friendly challenge to quickly create a simple AJAX calculator, I gladly accepted. It took her about 20 minutes on an ASP.NET stack, and took me… *cough* …a couple of hours using JSP.

The challenge was fun because I played with and gained respect for jQuery and the Eclipse WTP. I think it took me longer than Eva because I first looked for tiny AJAX examples in Ruby on Rails and Django. After a couple of aborted attempts, I decided to use JSP after finding this nice example.

I’m sharing my result since I wasn’t able to find one quite as succinct. You can throw the war file in a Tomcat “webapps” directory or import it into Eclipse (ideally the Java EE version with WTP) to hack it. The WTP even has a nifty HTML WYSIWYG design view.

Mifos in the Google Summer of Code 2009

Mifos has been accepted for the Google Summer of Code 2009! Working on Mifos has been my full-time job since October of 2007. The Google Summer of Code is an awesome program funded by Google wherein students get paid to work on FLOSS. Yay!

If you’re an eligible and interested student, check out our ideas page, hop on IRC during US/Pacific business hours, ask away on the mailing list, download the code, try building it, etc. and we’ll get you signed up!

Elegant Lead Sheets are Back!

As the holidays are fast approaching, many musicians will be called forth to back a multitude of sing-alongs. Be prepared! Musicians that care memorize or use sheet music, and nerdy musicians love Chordie!

Chordie turns text files with embedded chord names into beautiful, staffless PostScript lead sheets.
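The input is ChordPro-style text: chord names in square brackets inline with the lyrics, plus {directives} for metadata. A hypothetical fragment:

```
{title: Silent Night}
{subtitle: Trad., arranged for sing-alongs}

[C]Silent night, [G7]holy [C]night
[F]All is calm, [C]all is bright
```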

Chordie is a fork of Chord, and is written in under 5,000 lines of K&R C. Chordie currently runs only on *NIX-like operating systems, but there may be ports to other operating systems someday.

UPDATE: Chordii is the new name for this project.

Warm Fuzzies for WordPress

WordPress is an excellent example of a well-run Free Software project.
Public pages on coding standards, WordPress IRC, and reporting bugs are clear, comprehensive, and elegant. Installation and upgrading are straightforward: explained in both “5-minute” and “detailed” forms. They use Trac, an awesome bug tracking system.

I can’t say much about the source except to say that what I’ve seen is simple and coherent. Database interaction, for instance, is abstracted in a way that makes sense. Ohloh has some more useful details and statistics.