Sysadmin Sunday 81

May 27, 2012

This is Sysadmin Sunday, a post of interesting links from throughout the previous week.

Subscribe to our RSS feed and follow us on Twitter for interesting links throughout the week.

Goodbye global lock – MongoDB 2.0 vs 2.2

May 23, 2012

Perhaps the most oft-cited problem with MongoDB is the infamous global lock. In general terms, this means that the entire server is locked when you perform a write operation. This sounds bad, but its real-world impact in production is often blown out of proportion. Locking has improved with each version, and MongoDB 2.0 includes significant improvements to how the server yields during these kinds of operations.

Indeed, in our own setup we used to throttle inserts through Memcached before they went into MongoDB 1.8 but, having upgraded to 2.0, we were able to eliminate the throttling and insert directly into MongoDB. This improvement is illustrated well by some benchmarks from the end of last year comparing v1.8 to 2.0.

FaultingWrites

Nevertheless, a major focus for the upcoming 2.2 release has been removing the global lock and introducing database level locking as an initial step towards collection level locking and potentially even more granular concurrency in future releases.

There are 2 parts to the improvements in v2.2:

  • Elimination of the global reader/writer lock – database level locks as the first step.
  • PageFaultException architecture – yield lock on page fault.

The first is true database level locking, but benefiting from it may require some architecture changes to your application: if you use one large collection it makes little difference because you’re still writing to a single database. The second, however, improves concurrency within a single collection, and it is this that will likely provide immediate benefits for users upgrading.
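To make the difference in lock granularity concrete, here is a toy model in Python. It is only a sketch, not how mongod is implemented: `LockRegistry` and `can_write_concurrently` are hypothetical names, and each "database" is just a string key. It shows why a single large database gains nothing from database level locks while separate databases do.

```python
import threading
from collections import defaultdict


class LockRegistry:
    """Toy model: hands out the lock a write to `db_name` must hold."""

    def __init__(self, per_database):
        self.per_database = per_database
        self._global_lock = threading.Lock()
        # One lock per database name, created on first use.
        self._db_locks = defaultdict(threading.Lock)

    def lock_for(self, db_name):
        if self.per_database:
            return self._db_locks[db_name]
        # Global mode: every database shares the same lock.
        return self._global_lock


def can_write_concurrently(registry, db_a, db_b):
    """True if a write to db_b could start while a write to db_a holds its lock."""
    with registry.lock_for(db_a):
        other = registry.lock_for(db_b)
        acquired = other.acquire(blocking=False)
        if acquired:
            other.release()
        return acquired
```

With `per_database=False` the second write is always blocked. With `per_database=True` it is blocked only when both writes target the same database, which is the "one large collection" case described above.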

Dwight Merriman, CEO of 10gen and one of the original authors of MongoDB, gave a good talk at MongoSF about the internals of these changes, so it is worth watching the video, which explains both of these points.

10gen do not provide official benchmarks because they tend to be irrelevant to real world usage. For your own purposes you should use something like benchRun to see how your queries will be affected by upgrades. That said, benchmarks can be useful in certain situations, such as demonstrating these kinds of differences between versions.
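If you just want a rough feel for the idea before reaching for benchRun, a generic timing loop is enough. The sketch below is not benchRun and does not talk to MongoDB at all: `ops_per_second` is a hypothetical helper, and `operation` stands in for whatever single query you want to measure against each version.

```python
import time


def ops_per_second(operation, duration_s=1.0, clock=time.perf_counter):
    """Call `operation` repeatedly for about `duration_s` seconds; return ops/s."""
    start = clock()
    count = 0
    while True:
        operation()
        count += 1
        elapsed = clock() - start
        if elapsed >= duration_s:
            return count / elapsed
```

Running the same `operation` against two server versions on the same hardware and data set gives a like-for-like comparison. The absolute numbers mean little outside your own workload.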

Rick Copeland did some excellent benchmarks to look at the improvements between v1.8 and 2.0 so I decided to run them against MongoDB 2.1(.1) as well as 2.0 and 1.8. Remember that v2.1 is the development version which will turn into the 2.2 stable release.

Comparing v1.8, v2.0 and v2.1

I used exactly the same code that Rick used in his original benchmarks: I launched his AMI on Amazon EC2 (m1.large), set up the same database and ran the benchmarks for faulting reads and writes. I didn’t run the non-faulting benchmarks because the major changes in 2.2 are to do with how page faults and locks are handled, and reading from or writing to memory isn’t going to show any interesting differences – it’ll always be fast! And when you get to the large data volumes MongoDB is supposed to be good at, you’re unlikely to have all your data and indexes in RAM; instead you’ll be relying on keeping just the working set in memory.

MongoDB Faulting Ops/s

In the above graph I’m essentially reproducing Rick’s results then adding the MongoDB 2.1 tests. The difference is significant – there is no dropoff in performance regardless of the number of faulting writes against reads. The reason for this is that the global lock is completely gone, as illustrated by the graph below.

MongoDB Faulting Lock

Here, I took Rick’s experiment further and collected statistics from mongostat in order to understand what was happening on the mongod. Throughout the entire run, the time spent in the global lock was 0%.

Conclusions

The global lock is gone in MongoDB 2.2 which offers major improvements if you use many databases, but the real impact for anyone upgrading is how yielding works with the PageFaultException improvements. This is because of the way MongoDB will detect the page fault and touch the page before the mutation has occurred during the write.
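The yield-on-page-fault idea can be sketched as a toy model. This is my own illustration under stated assumptions, not MongoDB’s code: `ToyStore`, `PageFault` and the notion of a "page" as a plain integer are all hypothetical. The point is that residency is checked before any data is changed, the lock is released when the fault is raised, the slow load happens without the lock, and the write is then retried.

```python
import threading


class PageFault(Exception):
    """Raised when a write needs a page that is not resident in memory."""

    def __init__(self, page):
        super().__init__(page)
        self.page = page


class ToyStore:
    def __init__(self):
        self.lock = threading.Lock()
        self.resident = set()  # pages currently "in RAM"
        self.data = {}
        self.faults = 0

    def _load_page(self, page):
        # Stand-in for a slow disk read; called without holding the lock.
        self.resident.add(page)

    def _mutate(self, page, key, value):
        # Check residency first, so nothing is changed if we fault.
        if page not in self.resident:
            raise PageFault(page)
        self.data[key] = value

    def write(self, page, key, value):
        while True:
            try:
                with self.lock:
                    self._mutate(page, key, value)
                return
            except PageFault as fault:
                # The `with` block has already released the lock here,
                # so other writers are not held up by the slow load.
                self.faults += 1
                self._load_page(fault.page)
```

A first write to a cold page faults once and then succeeds on retry. Later writes to the same page go straight through.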

The first graph shows that MongoDB is able to maintain consistent performance with no drop off during the tests.

Since Rick’s code just runs queries against a single database, these benchmarks show only the improvements from the second change, the PageFaultException architecture, which is probably what most people upgrading will be interested in. It would also be interesting to benchmark activity spread across multiple databases on each version to see how that has improved.

Sysadmin Sunday 80

May 20, 2012

This is Sysadmin Sunday, a post of interesting links from throughout the previous week.

Subscribe to our RSS feed and follow us on Twitter for interesting links throughout the week.

Sysadmin Sunday 79

May 13, 2012

This is Sysadmin Sunday, a post of interesting links from throughout the previous week.

Subscribe to our RSS feed and follow us on Twitter for interesting links throughout the week.

Removing Memcached because it’s too slow

May 11, 2012

We’ll shortly be deploying some changes to the Server Density codebase to remove Memcached as a component in the system. We currently use it for 2 purposes:

  1. UI caching: the initial load of your account data (e.g. server lists, alert lists and user lists) is taken directly from the MongoDB database and then cached until you make a change to the data, at which point we invalidate the cache.
  2. Throttling: the performance impact of the global lock in MongoDB 1.8 was such that we couldn’t insert our monitoring postback data directly into MongoDB – it had to be inserted into Memcached first then throttled into MongoDB via a few processor daemons (as opposed to larger numbers of web clients).
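Both uses can be sketched in a few lines of Python. This is a simplified in-process illustration, not our production code: `InvalidatingCache` and `ThrottledWriter` are hypothetical names, a dict and a deque stand in for Memcached, and the `load` and `insert_batch` callables stand in for the real MongoDB calls.

```python
from collections import deque


class InvalidatingCache:
    """Read from the database once, then reuse until a change invalidates it."""

    def __init__(self, load):
        self._load = load  # hypothetical: key -> value read from the database
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._load(key)
        return self._cache[key]

    def invalidate(self, key):
        # Called whenever the underlying data is changed.
        self._cache.pop(key, None)


class ThrottledWriter:
    """Buffer incoming items and drain them into the database in bounded batches."""

    def __init__(self, insert_batch, batch_size):
        self._insert_batch = insert_batch  # hypothetical: list of items -> None
        self._batch_size = batch_size
        self._buffer = deque()

    def submit(self, item):
        # Cheap for the many web clients: nothing touches the database here.
        self._buffer.append(item)

    def drain_once(self):
        """Write up to `batch_size` buffered items; return how many were written."""
        batch = []
        while self._buffer and len(batch) < self._batch_size:
            batch.append(self._buffer.popleft())
        if batch:
            self._insert_batch(batch)
        return len(batch)
```

In the real setup the buffer lives in Memcached and `drain_once` is what each processor daemon would run in a loop, so only a handful of writers ever hit the database at once.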

Performance map

This has worked well for over a year now but with the release of MongoDB 2.0, the impact of the global lock is significantly reduced because of much smarter yielding. This is only set to get better with database level locking in 2.2 and further concurrency improvements in future releases.

We’ve already removed throttling from other aspects of our codebase but our performance metrics show that we’re now finally able to remove Memcached completely, because directly accessing MongoDB is significantly faster. Indeed, our average database query response time is 0.43ms compared to 24.2ms from Memcached.

Database throughput

Response time

We have a number of MongoDB clusters and these figures are for our primary data store where all application data lives (separate from our historical time series data). There are 2 shards, each made up of 4 data nodes, with 2 per data centre (Washington DC and San Jose in the US). They are dedicated servers running Ubuntu 10.04 LTS with 8GB RAM, Intel Xeon Sandy Bridge quad core 3.4GHz CPUs and 100GB SSDs for the MongoDB data files, connected to a 2Gbps internal network.

Nodes

Removing Memcached as a component simplifies our system even further so our core technology stack will only consist of Apache, PHP, Python, MongoDB and Ubuntu. This eliminates the need for Memcached itself running on a separate cluster, the Moxi proxy to handle failover, additional monitoring for another component and a different scaling profile. Getting memcached libraries for PHP and Python is also a pain if you want to use officially supported packages (through Ubuntu LTS) especially when you want to use later releases. And we can get rid of that extra 24ms of response time.

Sysadmin Sunday 78

May 6, 2012

This is Sysadmin Sunday, a post of interesting links from throughout the previous week.

Subscribe to our RSS feed and follow us on Twitter for interesting links throughout the week.

Native to web iOS apps, or there and back again

May 4, 2012

Smaug

This is an interesting tale of startups, resource allocation, the allure of write once/deploy anywhere and the reality that native is always better if you want a properly native looking app that performs well. On 16 July 2009 (a time between the Dawn of Færie and the Dominion of Men) we submitted the first version of our iPhone app to Apple. This was just a few months after the founding of Server Density in April 2009 and quickly became a major selling point for our hosted server monitoring tool, with push notifications particularly popular.

The app was written by my co-founder, Harry Wincup, who used the project to learn Objective C and iPhone development for the first time, which included writing our own custom graphing engine.

iPhone v1

The main issue we faced was allocating time to work on improving the app. Since it was just the two of us in the whole company, after the app was released we moved onto other areas of development, although we did release several bug fixes in the following months. Harry was also our only designer and frontend coder and was overworked on new features for our main web UI. We hired a backend engineer in Dec 2009 and our team then remained just 3 until around March 2011 when we closed an angel round and started hiring more engineers.

Since most of our development was using web technologies – backend PHP and Python and frontend CSS, JS, etc – our team included more people with those abilities but we didn’t have the resources to hire a dedicated iOS engineer and so kept the app mostly in maintenance mode. With the fast development of tools like Phonegap in 2011, we started playing around with them to see if it’d make sense to create our apps in HTML and CSS so that anyone in the team could work on them. And in Nov 2011 we released v2 of the app rewritten using Phonegap.

The idea was to make it easy to add new features and deploy to both Android and iOS from the same codebase. Indeed, that worked well as several engineers were able to work on the app alongside Harry. However, there were many quirks we had to resolve during development including implementing our own custom view stack controller to make the app appear as native as possible.

iOS v2

Then this year a new member of the team, Rob Elkin, joined us to work on an internal project but, as he also had extensive iOS development experience, he set about rewriting the iOS app natively to see how quickly it could be done. We had found a number of problems after releasing v2: several bugs within the HTML/CSS/JS implementation and, the biggest issue, performance.

The current iOS app works nicely if you have a small number of alerts, devices and services. However, it lags horribly once you get over a certain number. This would’ve been fine back in 2009 but we’re now seeing much larger deployments and not only do we need a responsive UI, we need to improve the management of items on screen. This is easy to do with a native UITableView because you can include search, indexes and it’s much easier to optimise for speed even with custom elements. It is possible to create custom plugins natively but then why not just implement it all natively in the first place (some good technical points about this here).

Unoptimised v2 iPad app

Having reimplemented the app internally with great results, we decided to run this as a proper project. Rob was joined by one of our designers, Daniele, and time was allocated to work full time on v3 of the app and also to build a proper iPad version. The existing app supports the iPad but it’s really just scaled up, whereas we wanted a properly customised iPad UI.

We also thought it’d be easier to use our existing graphing within Phonegap because it’s already JS based, but in reality native Objective C libraries are now significantly better and run optimised for the iOS hardware. We removed graphing in our v2 release but it was a big part of our roadmap for the next releases, so although it’s not going to be in the upcoming v3 release, it is planned for 3.1 and is next on the todo list, especially for the iPad.

This isn’t to say that you should never use Phonegap, but that different priorities apply at different times. If you have the budget for dedicated platform engineers then my opinion is that you should always implement a native solution.

So what’s new in Server Density for iOS v3?

v3 has been reimplemented natively so is significantly faster and more responsive, especially with larger numbers of items in the list views. We’ve also focused more on alerting to make it easier to see open and recent alerts, and their current statuses. And we have a fully optimised iPad UI. We specifically built it to be similar to the existing app to avoid jarring changes so most of the improvements are behind the scenes. Screenshots show it best.

We expect to submit the app to Apple next week.

v1 = Native
v2 = Phonegap (HTML, CSS, JS)
v3 = Native

Server list app comparison

Device list app comparison

Alerts list app comparison

Services list comparison

New iPad app devices

New iPad app alerts
