Tag Archives: programming

LFNW 2016 command-line talk follow-up materials

Here are follow-up materials for my talk at LFNW 2016.

video

notes

2016-04-23 LinuxfestNW

The Command Line 

Adam - published author, speaker, co-founder of SeaGL
notes by James Zeringue

Thank you blug/btc for hosting! 

The command line is a great place to leverage others' knowledge and hard work, which they have already shared!

        Programs: 
                                at, rename, locate

                                Highly customizable automation at the tip of your fingers.

The Command line as a tank, which is given away for free:

	"Save your money! Take one of our free tanks!"

	Station wagon buyer: "I don't know how to maintain a tank!"

	"volunteers will fix it for you for free while you sleep!"

	Buyer: "stay away from our home!!"


CMD	j dotfiles	(change dir and list contents)
	screenkey
	Ctrl-r		Reverse interactive command history search

SHELL	fish		friendly interactive shell
		fish automatically lists completion possibilities 
		automagic syntax highlighting on the command line
		advanced tab-completion with list of possible matches w/ descriptions
		web-based config

CMD	alt-.		argument history, repeat to go back

	globs		groups of files expanded by the shell

	watch		run a command repeatedly and monitor its output at intervals
			highlights output that changed between runs (with -d)

	bash-completion-prompt
	dirjump	(j)	maintains index of visited paths, then performs a smart autocomplete when using j command

"What is the difference between python and shell version?"
A: The shell is an interactive programming language, while python is more of a programming language.

cmd	locate		find files by index (always uses pre-cached index, contrast with find which reads filenames in real-time)
		updatedb	(rebuilds the index that locate searches)

cmd	pv		progress meter (see pipes)

cmd	progress	detects running commands such as cp, mv and dd and shows their progress

cmd	toilet		(colorful banners in the console)

cmd	nmap		find hosts on network, much more
		* masscan has Adam's favorite README on github ( https://github.com/robertdavidgraham/masscan )

vim
---
	text editor, standard in any *nix system
			enhanced status line capabilities
		tight integration with git
		Syntastic enables inline error checking on save with line highlighting
		fugitive
	multiple windows
	folds - hide/show information in sections
	'snippets' - "ultisnips+youcompleteme"
			templated auto complete and code snippet plugins
	(xkcd about vim users spending time editing vimrc files)

HOWTO - make this environment portable?
	automate it! host all dotfiles and rcfiles and download them on-demand
		check github for a management solution

screen
tmux	terminal multiplexer
		sits between you and the command line
		allows multiple connections to the same terminal for collaboration

You should VERSION-CONTROL your plaintext!
	"indispensable" tool
	github, others

how to upgrade MongoDB 2.6 to 3.x on Ubuntu

sudo mv /etc/apt/sources.list.d/mongodb* /tmp/
echo "deb http://repo.mongodb.org/apt/ubuntu "$(lsb_release -sc)"/mongodb-org/3.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.0.list
sudo apt-get update && sudo apt-get install -y mongodb-org

I also had to fix my replica set in the MongoDB shell (necessary for Meteor oplog tailing):

var a = {"_id" : "rs0", "version" : 1,"members" : [{"_id" : 1, "host" : "localhost:27017"}]};
rs.reconfig(a, {force:true});

UPDATE 2015-10-01: Alas, one of my coworkers found even all the above wasn’t enough–he had to blow away his old MongoDB install.

sudo mv /var/lib/mongodb /tmp
sudo apt-get purge mongodb-org-server
sudo apt-get install -y mongodb-org-server

We also use one-member replica sets in dev (Meteor uses the oplog), so edit /etc/mongodb.conf and include something like replSet=rs0, then restart mongo (sudo service mongodb restart). Finally, initialize the replica set:

var a = {"_id" : "rs0", "version" : 1,"members" : [{"_id" : 1, "host" : "localhost:27017"}]};
rs.initiate(a);

It appears that collections can be restored by simply copying files like blah.0, blah.1, blah.2 and blah.ns from /tmp/mongodb to /var/lib/mongodb while the MongoDB server is stopped.

Oplog: a tail of wonder and woe

First, your TL;DR:

  1. Stress test your Meteor app.
  2. Oplog tailing may be less efficient for particular workloads.

Background

My work involves using crowdsourcing to assess and improve technical skill. We’re focusing on improving basic technical skills of surgeons first because, no surprise here, it matters. A more skilled surgeon means patients with fewer complications. Being healthy, not dying. Good stuff.

One way we gather data is a survey app where crowdworkers watch a short video of real human surgery and answer simple questions about what they saw. For example:

  • Do both hands work well together?
  • Is the surgeon efficient?
  • Are they rough or gentle?

Turns out the crowd nails this! Think of it this way: most anyone can recognize standout performers on the basketball court or playing a piano, even if they’re not an expert at either. Minimal training and this “gut feel” are all we need to objectively measure basic technical skill.

Meteor

So, a survey app. Watch a video, answer a few questions. Pretty straightforward. We built one in-house. Meteor was a great choice here. Rapid development, easy deployment, JavaScript everywhere, decent Node.js stack out of the box, all that.

And of course we used oplog tailing right from the start because much of what we read about oplog tailing made it sound like it was the only way to go. Sure, you’ll want oplog tailing for realtime (<10sec delayed) data when you have multiple apps connecting to the same MongoDB database. But if you don’t need that, you may not need it at all, and you may not want it.

Traffic pattern

Our traffic is very bursty. We publish a HIT on Amazon Mechanical Turk. Within minutes, the crowd is upon our survey app. Our app generally does fine, but folks complained of very slow survey completion times when we started hitting somewhere around 80 DDP(?) sessions in Kadira. Each DDP session in our survey app should equate to one simultaneous active user (hereafter “user”).

Here’s what we want to know:

  1. Why does our app slow down when it does?
  2. Can it scale [linearly]?
  3. Are there any small code or configuration changes we could make to get a lot more performance out of the existing hardware?

Spoilers:

  1. Meteor pegs the CPU when oplog tailing is enabled.
  2. Yes, if we disable oplog tailing.
  3. Yes, disabling oplog tailing and clustering our app.

Stress test

We created a stress test to get a better feel for the performance characteristics of our app.

The test uses nightwatch to emulate a turker completing a survey. Load the survey app, click radio buttons, enter comments, and throw in a few random waits. Many threads of the nightwatch test are spawned and charge on in parallel. The machine running nightwatch needs to be pretty beefy. I preferred a browser-based stress test because I noticed client-server interactions amplified the amount and frequency of DDP traffic (hello Mr. Reactivity). It was also easier to write and run nightwatch than to pick the exact DDP traffic to send.
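
For readers who have not seen a nightwatch test, here is a rough sketch of what one simulated turker might look like. This is not our actual test: the URL, the CSS selectors, the comment text, and the wait bounds are all placeholders I made up for illustration. Only the chained browser commands (url, waitForElementVisible, click, setValue, pause, end) are standard nightwatch.

```javascript
// Hypothetical values: the URL, selectors, and wait bounds below are
// placeholders, not taken from the real survey app.
const SURVEY_URL = 'http://localhost:3000/survey';

// Random think-time between minMs and maxMs (inclusive), in milliseconds.
function randomWaitMs(minMs, maxMs) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Nightwatch test module: each key is a test name, each value receives
// the nightwatch browser object, whose commands chain.
const surveyTest = {
  'turker completes a survey': function (browser) {
    browser
      .url(SURVEY_URL)                          // load the survey app
      .waitForElementVisible('body', 10000)
      .pause(randomWaitMs(500, 3000))           // "watch the video"
      .click('input[type="radio"]')             // answer a question
      .pause(randomWaitMs(500, 3000))
      .setValue('textarea', 'Smooth, efficient movements.') // enter a comment
      .click('button[type="submit"]')
      .end();
  }
};

// Nightwatch loads test modules through module.exports.
if (typeof module !== 'undefined') {
  module.exports = surveyTest;
}
```

How many copies run in parallel is left to whatever launches the tests; this sketch only covers a single simulated user.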

Notes on our app:

  • We use mup to deploy to Ubuntu EC2 servers on AWS.
  • Tested configuration uses one mup-deployed Meteor app.
  • The app connects to a local MongoDB server running a standalone one-member replica set (just to get the oplog).
    • I also tested with Modulus, scaled to one 512mb servo in us-east-1a. Non-enterprise Modulus runs with oplog tailing disabled, and the app connects to MongoDB on a host other than localhost.
  • Our app uses iron:router.
  • Our app doesn’t need to be reactive. Surveyees work in isolation. But this is how we wrote the app, so that’s what I tested.

Results

I ran a series of stress tests. Ramp up traffic, capture metrics, change code and/or server configuration, repeat. Here are the results.

Takeaways:

  • Each row in the spreadsheet represents one test.
  • Every test ran for 5 minutes.
  • When one “user” completes a survey, another one begins (so the number of users is kept more or less constant during every 5-minute test).
  • There are lots of notes and Kadira screenshots in the results spreadsheet. For the Kadira screenshots, the relevant data is on the rightmost side of the graphs.
  • I think Kadira session counts are high. Maybe it isn’t counting disconnects, maybe DDP timeouts take a while, or maybe the nightwatch test disconnects slowly.
  • Row 3. At 40 users, the CPU is pegged. Add any more users and it takes too long for them to complete a survey.
  • Row 5. Notice how doubling the cores does not double the number of test passes (less than linear scaling along this dimension).
  • Row 6. Ouch, we’re really not scaling! Might need to investigate the efficiency of meteorhacks:cluster.
  • Row 7. Oplog tailing is disabled for this and all future tests. MongoDB CPU load is roughly doubled from the 40-user, 1-core, oplog-tailing-enabled test.
  • Row 9. Too much for one core: 6.5% of the tests failed.
  • Row 11. This is what we want to see! 2x cores, 2x users, 2x passes. In other words, with oplog tailing disabled and double the number of cores, we supported double the number of users and doubled test passes.
  • I should have also tested 160 users, 4 cores, oplog disabled. I didn’t. Live with it.
  • Disabling oplog tailing seemed to allow the processing load to shift more to MongoDB. MongoDB appeared to be able to handle same more… gracefully.
  • I didn’t get very far with Modulus. I’m very interested in their offering, but I just couldn’t get users (test runs) through our app fast enough to make further testing worthwhile.
  • A DNS issue prevented capturing Kadira status while running on Modulus.
  • cluster lives up to its promise—adding cores and spreading load.
  • I don’t think we’re moving much data, but any reactivity comes at a price at scale (even our so far little bitty scale).
  • Our survey app could and should be modified to use much less reactivity since, as I mentioned earlier, it is unnecessary.
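
The "2x cores, 2x passes" comparison above can be boiled down to one number. Here is a small sketch of that arithmetic. The function name and the pass counts are made up for illustration; they are not taken from the results spreadsheet.

```javascript
// Scaling efficiency: how much of the added hardware turned into added
// throughput. 1.0 means linear scaling; below 1.0 means sub-linear.
function scalingEfficiency(base, scaled) {
  const coreRatio = scaled.cores / base.cores;
  const passRatio = scaled.passes / base.passes;
  return passRatio / coreRatio;
}

// Made-up numbers for illustration only (not from the results spreadsheet).
const oneCore = { cores: 1, passes: 100 };
const twoCoresLinear = { cores: 2, passes: 200 };
const twoCoresSubLinear = { cores: 2, passes: 150 };

console.log(scalingEfficiency(oneCore, twoCoresLinear));    // 1
console.log(scalingEfficiency(oneCore, twoCoresSubLinear)); // 0.75
```

A value near 1 is the Row 11 situation; a value well below 1 is what Rows 5 and 6 looked like.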

Server-side profiles

This is somewhat of an addendum, but I figured it might be useful.

Here’s what the Meteor Node.js process does when 10 users are hitting our survey app running on one core.

Oplog tailing enabled:

[Figure: pie chart of the server profile with oplog tailing]

Oplog tailing disabled:

[Figure: pie chart of the server profile without oplog tailing]

Takeaways:

  • Note that these pie charts only show %CPU usage. CPU and network are the primary resources our app uses, so this is fine.
  • The profile data for each slice (when you drill down) are very low-level. It’s hard to make any quick conclusions (or I just need more practice reading these).
  • When oplog tailing is enabled, the Cursor.fetch slice is about twice as big, and none of the methods causing that CPU load are ours. Perhaps this is the oplog “tailing” in action?
  • When oplog tailing is disabled, drilling into Cursor.fetch shows us exactly what specific methods of ours are causing CPU load. Even if oplog tailing is more efficient, this level of introspection was priceless. We need this until we learn to better debug patterns in our code that lead to more CPU when oplog tailing is enabled.
  • The giant ~30% slice of “Other” is a bit of a bummer. What’s going on in there? Low-level/native Node.js operations like the MongoDB driver doing its thing? Sending/receiving network traffic?
  • Kadira monitoring isn’t free CPU-wise, but it is worth it.
  • What should these pie charts look like in a well-optimized application under load? Perhaps the biggest slice should belong to “Other”?

Feedback/questions/comments/corrections welcome! I’d especially love to hear about your experiences scaling Meteor.

protips: Tiny Tiny RSS on Bluehost

The Bluehost hosting account must be configured to use a recent version of PHP. After creating a subdomain, I had to delete the .htaccess file to make sure the latest version of PHP was used.

I used the periodic cron method to update my feeds. I used the “twice daily” common schedule. Here’s my command:

/usr/php/54/usr/bin/php-cli $HOME/public_html/www.example.com/tt-rss/update.php --feeds --quiet

The explicit path is required because tt-rss needs a recent version of php meant for the cli (for example, with register_argc_argv enabled).

Cloudflare might muck up the JavaScript or CSS or slow things down. I disabled it.

My Hadoop/MapReduce article in Linux Journal

I’m proud that LJ accepted my Hadoop/MapReduce article for the April 2013 issue! If you’re new to MapReduce and are interested in learning about same, this article is for you.

 

I’ll also be presenting a talk based on the article at LinuxFest Northwest 2013.

Web Framework Flavor of the Month

I’ve been playing with Meteor a bit lately. It’s a “kitchen sink” system for writing web apps, complete with a database (MongoDB), server-side (Node.js), and client-side stuff. It’s all JavaScript.

It’s pretty fun for little experiments. I can imagine certain kinds of websites it would be good for (web-based chat, HTML5 games, collaborative editors, and one-webpage apps — same stuff I think vanilla Node.js excels at) and some it would not (mobile, CRUD with an RDBMS). I’m wondering if it would/should work well with larger web apps.

I’m afraid of JavaScript, but I think it’s finally time for me to overcome that fear. What better way to do so than to use JavaScript everywhere (database, server, client, APIs)?!

Meteor isn’t the only game around, it’s just the one I’ve looked at.

You are NOT a Software Engineer!

I enjoyed You are NOT a Software Engineer! by Chris Aitchison. It’s a fun analogy. Writing software certainly does feel more like something roughly planned and growing organically or evolving rather than something perfectly specified and executed. And I think this is OK.

Another thing we coders often forget: we are also authors. We write code for humans (others and our future selves) to read. I want you to be stoked when you read what I write! And coding is writing.

Wanted: Simple 2D Game Framework

I want to write a simple kid’s game. It would show something like “A B _ D”, then speak “What letter is missing?”. If you hit the “c” key, it would say “Congratulations!”. If you hit any other key, “Try again!”.

Anyone have pointers on game-creation frameworks? I’m looking for something cross-platform and very high-level. I want to be able to write and play the game in a few hours max.

These look hopeful: Scratch, LÖVE, Racket, Alice, Pygame.

I want this crawl version to be as simple as possible. Eventually I might want to add score tracking and animations.
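
For what it's worth, the game logic itself is tiny. Here is a minimal sketch in plain JavaScript, with no framework, graphics, or speech. The function names are my own invention, and the console output stands in for whatever a real framework would draw or say.

```javascript
// Build one round: hide the letter at missingIndex.
function makeRound(letters, missingIndex) {
  const shown = letters.map((letter, i) => (i === missingIndex ? '_' : letter));
  return { prompt: shown.join(' '), answer: letters[missingIndex] };
}

// Compare a key press with the hidden letter, ignoring case.
function respond(round, key) {
  return key.toLowerCase() === round.answer.toLowerCase()
    ? 'Congratulations!'
    : 'Try again!';
}

const round = makeRound(['A', 'B', 'C', 'D'], 2);
console.log(round.prompt);        // A B _ D
console.log(respond(round, 'c')); // Congratulations!
console.log(respond(round, 'x')); // Try again!
```

The hard part is everything around this: drawing the letters, speaking the prompt, and reading key presses, which is what I want the framework for.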

I could also create a web-based game that would work, say, in a web browser on an iPad, but this smells a bit more complex than I’m hoping for right now.

List largest MongoDB collections

I wanted to know the top five largest collections in my MongoDB database in terms of document count. This JavaScript gets the job done.

// config
var dbname = 'dev_bv';
var measure = 'count'; // or 'size'
var numTopCollections = 5;
 
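// Insert this collection into topCollections (kept sorted, largest first),
// then trim the list to numTopCollections entries.
function updateTopCollections(collection, stats, topCollections) {
    var thisCollectionObj = {
        'name' : collection,
        'count' : stats.count,
        'size' : stats.size
    };
    var inserted = false;
    for(var i = 0; i < topCollections.length; i++){
        if (stats[measure] > topCollections[i][measure]) {
            topCollections.splice(i, 0, thisCollectionObj);
            inserted = true;
            break;
        }
    }
    // Append only if it was not spliced in above; otherwise it is listed twice.
    if (!inserted && topCollections.length < numTopCollections) {
        topCollections.push(thisCollectionObj);
    }
    if (topCollections.length > numTopCollections) {
        topCollections.pop();
    }
}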
 
db = db.getSiblingDB(dbname);
var collections = db.getCollectionNames();
var topCollections = [];
 
for(var i = 0; i < collections.length; i++){
    if (collections[i].match(/^system/)) {
        continue;
    }
    // getCollection() also handles names that are not valid identifiers (dots, hyphens)
    var stats = db.getCollection(collections[i]).stats();
    updateTopCollections(collections[i], stats, topCollections);
}
 
printjson (topCollections);

Save it to a file, edit variables in the config section, and execute like so:

mongo --quiet topCollections.js

Here’s a gist of same: https://gist.github.com/4150940
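
If you already have the stats for every collection in an array, the same top-N selection can be sketched with sort and slice. This is an alternative to the incremental insertion in the script, not what the script does. The function name and the sample numbers are made up, and it uses arrow functions, so I'm assuming Node.js or a newer shell.

```javascript
// Return the n largest entries by the given measure ('count' or 'size').
function topCollections(allStats, measure, n) {
  return allStats
    .slice()                                   // copy so the input is untouched
    .sort((a, b) => b[measure] - a[measure])   // largest first
    .slice(0, n);
}

// Made-up stats standing in for what stats() reports per collection.
const sample = [
  { name: 'users', count: 120, size: 48000 },
  { name: 'logs', count: 9000, size: 720000 },
  { name: 'surveys', count: 450, size: 990000 }
];

console.log(topCollections(sample, 'count', 2).map(c => c.name)); // [ 'logs', 'surveys' ]
console.log(topCollections(sample, 'size', 2).map(c => c.name));  // [ 'surveys', 'logs' ]
```

Sorting everything is more work than keeping a running top five, but for the number of collections in a typical database it makes no practical difference.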