Posted by & filed under HowTo, MongoDB Basics.

For those of you new to using MongoDB, MongoDB space usage can seem quite confusing.  In this article, I will explain how MongoDB allocates space and how to interpret the space usage information in our ObjectRocket dashboard to make judgements about when you need to compact your instance or add a shard to grow the space available to your instance.


First, let’s start off with a brand new Medium instance consisting of a single 5 GB shard.  I am going to populate this instance with some test data in a database named “ocean”.  Here’s what the space usage for this instance looks like after adding some test data and creating a few indexes (for the purposes of this article, I deliberately added additional indexes that I knew would be fairly large relative to the test data set):

Example space usage after populating with test data and indexes

How is it that 315 MiB of data and 254 MiB of indexes means we are using 2.1 GiB out of our 5 GiB shard?  To explain, let’s begin with how MongoDB stores data on disk as a series of extents. Because our ObjectRocket instances run with the smallfiles option, the first extent is allocated as 16 MB. These extents double in size until they reach 512 MB, after which every extent is allocated as a 512 MB file. So our example “ocean” database has a file structure as follows:

# ls -lh ocean/
total 1.5G
-rw------- 1 mongodb mongodb  16M Aug 20 22:30 ocean.0
-rw------- 1 mongodb mongodb  32M Aug 20 20:44 ocean.1
-rw------- 1 mongodb mongodb  64M Aug 20 22:23 ocean.2
-rw------- 1 mongodb mongodb 128M Aug 20 22:30 ocean.3
-rw------- 1 mongodb mongodb 256M Aug 20 22:30 ocean.4
-rw------- 1 mongodb mongodb 512M Aug 20 22:30 ocean.5
-rw------- 1 mongodb mongodb 512M Aug 20 22:30 ocean.6
-rw------- 1 mongodb mongodb  16M Aug 20 22:30 ocean.ns
drwxr-xr-x 2 mongodb mongodb 4.0K Aug 20 22:30 _tmp

These extents store both the data and indexes for our database. With MongoDB, as soon as any data is written to an extent, the next logical extent is allocated. Thus, with the above structure, ocean.6 likely has no data at the moment, but has been pre-allocated for when ocean.5 becomes full. As soon as any data is written to ocean.6, a new 512 MB extent, ocean.7, will again be pre-allocated. When data is deleted from a MongoDB database, the space is not released until you compact — so over time, these data files can become fragmented as data is deleted (or if a document outgrows its original storage location because additional keys are added). A compaction defragments these data files because during a compaction, the data is replicated from another member of the replica set and the data files are recreated from scratch.

An additional 16 MB file, ocean.ns, stores the namespace. This same pattern occurs for each database on a MongoDB instance. Besides our “ocean” database, there are two additional system databases on our shard: “admin” and “local”. The “admin” database stores the user information for all database users (prior to 2.6.x, this database was used only for admin users). Even though the admin database is small, we still have a 16 MB extent, a pre-allocated 32 MB extent, and a 16 MB namespace file for this database.

The second system database is the “local” database. Each shard we offer at ObjectRocket is a three-member replica set. In order to keep these replicas in sync, MongoDB maintains a log, called the oplog, of each update. This is kept in sync on each replica and is used to track the changes that need to be made on the secondary replicas. This oplog exists as a capped collection within the “local” database. At ObjectRocket we configure the size of the oplog to generally be 10% of shard size — in the case of our 5 GB shard, the oplog is configured as 500 MB. Thus the “local” database consists of a 16 MB extent, a 512 MB extent, and a 16 MB namespace file.
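As a rough cross-check of the allocation pattern described so far, here is a minimal Python sketch that adds up the files for the three databases. The function names are hypothetical, and the file counts are taken from the listings and descriptions in this article rather than read from a live instance.

```python
NAMESPACE_FILE_MIB = 16  # one .ns file per database, as described above


def data_file_sizes_mib(file_count, first_mib=16, cap_mib=512):
    """Sizes of successive files: start at first_mib and double up to cap_mib."""
    sizes = []
    size = first_mib
    for _ in range(file_count):
        sizes.append(size)
        size = min(size * 2, cap_mib)
    return sizes


def database_footprint_mib(file_sizes_mib):
    """Total on-disk size of one database: its files plus the namespace file."""
    return sum(file_sizes_mib) + NAMESPACE_FILE_MIB


ocean = database_footprint_mib(data_file_sizes_mib(7))  # ocean.0 through ocean.6
admin = database_footprint_mib(data_file_sizes_mib(2))  # 16 MB + pre-allocated 32 MB
local = database_footprint_mib([16, 512])               # sizes as described for "local"

total_mib = ocean + admin + local
print(ocean, admin, local, total_mib, round(total_mib / 1024, 2))
# prints: 1536 64 544 2144 2.09
```

The total of roughly 2.1 GiB lines up with the figure from the dashboard, and it does not include the journal.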

Finally, our example shard contains one more housekeeping area, the journal. The journal is a set of one to three files that are approximately 128 MB each in size. Whenever a write occurs, MongoDB first writes the update sequentially to the journal. Then periodically a background thread flushes these updates to the actual data files (the extents I mentioned previously), typically once every 60 seconds. The reason for this double-write is that writing sequentially to the journal is often much faster than the seeking necessary to write to the actual data files. By writing the changes immediately to the journal, MongoDB can ensure data recovery in the event of a crash without requiring every write to wait until the change has been written to the data files. In the case of our current primary replica, I see we have two journal files active:

# ls -lh journal/
total 273M
-rw------- 1 mongodb mongodb 149M Aug 20 22:26 j._1
-rw------- 1 mongodb mongodb 124M Aug 20 22:30 j._2

MongoDB rotates these files automatically depending on the frequency of updates versus the frequency of background flushes to disk.

So now that I’ve covered how MongoDB uses disk space, how does this correspond to what is shown in the space usage bar from the ObjectRocket dashboard that I showed earlier?

  • NS value, 48 MB: the sum of the three 16 MB namespace files for the three databases I mentioned (ocean, admin, and local).
  • Data value, 315 MiB: the sum of the value reported for dataSize in db.stats() for all databases (including system databases).
  • Index value, 253.9 MiB: the sum of the value reported for indexSize in db.stats() for all databases (including system databases).
  • Storage value, 687.2 MiB: the sum of data plus indexes for all databases plus any unreclaimed space from deletes.
  • Total File value, 2.0 GiB: how much disk we are using in total on the primary replica. Beyond the space covered by the Storage value and the NS value, this also includes any preallocated extents but not the space used by the journal.

Given these metrics, we can make some simple calculations to determine whether this instance is fragmented enough to need a compaction.  To calculate the space likely lost due to fragmentation, use the following:

100% – (Data + Indexes) / Storage

In the case of our example instance, this works out to 17% (100% – (315 MiB Data + 253.9 MiB Index) / 687.2 MiB Storage = 17%).  I would recommend compacting your instance when the fragmentation approaches 20%.
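The fragmentation estimate can be written as a small helper. This is only a sketch in Python: the function name is hypothetical, and the inputs are the dashboard values quoted in this article.

```python
def fragmentation_pct(data_mib, index_mib, storage_mib):
    """Share of the Storage value not accounted for by data plus indexes."""
    if storage_mib <= 0:
        raise ValueError("storage_mib must be positive")
    return (1 - (data_mib + index_mib) / storage_mib) * 100


frag = fragmentation_pct(315, 253.9, 687.2)
print(f"{frag:.1f}%")  # prints: 17.2%
```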

Another calculation we can do is whether we need to add a shard to this instance based on our overall space usage.  To calculate your overall space usage do the following:

(Total File / (Plan Size * number of shards)) * 100%

For our example instance, this works out to 40% ((2 GiB / (5 GiB * 1 shard)) * 100% = 40%).  We generally recommend adding a shard when overall space usage approaches 80%.  If you notice your space usage reaching 80%, contact Support and we can help you add a shard to your instance.
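The same calculation as a short Python sketch, with the 80% guideline expressed as a check. The function name and the threshold constant are illustrative, not part of any ObjectRocket tooling.

```python
ADD_SHARD_THRESHOLD_PCT = 80  # guideline from this article


def overall_usage_pct(total_file_gib, plan_size_gib, shard_count):
    """Total File as a percentage of the space provisioned across all shards."""
    return total_file_gib / (plan_size_gib * shard_count) * 100


usage = overall_usage_pct(2.0, 5, 1)
needs_shard = usage >= ADD_SHARD_THRESHOLD_PCT
print(f"{usage:.0f}% used, add a shard: {needs_shard}")
# prints: 40% used, add a shard: False
```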

Jeff Tharp

Customer Data Engineer at ObjectRocket, enjoys big dogs and supporting Big Data after dark.


Posted by & filed under Customer Success Stories, HowTo, Performance.

Appboy is the world’s leading marketing automation platform for mobile apps. We collect billions of data points each month by tracking what users are doing in our customers’ mobile apps and allowing them to target users for emails, push notifications and in-app messages based on their behavior or demographics. MongoDB powers most of our database stack, and we host dozens of shards across multiple clusters at ObjectRocket.


One common performance optimization strategy with MongoDB is to use short field names in documents. That is, instead of creating a document that looks like this:

{first_name: "Jon", last_name: "Hyman"}

use shorter field names so that the document might look like:

{fn: "Jon", ln: "Hyman"}

Since MongoDB doesn’t have a concept of columns or predefined schemas, this structure is advantageous because field names are duplicated on every document in the database. If you have one million documents that each have a “first_name” field on them, you’re storing that string a million times. This leads to more space per document, which ultimately impacts how many documents can fit in memory and, at large scale, may slightly impact performance, as MongoDB has to map documents into memory as it reads them.


In addition to collecting event data, Appboy also lets our customers store what we call “custom attributes” on each of their users. As an example, a sports app might want to store a user’s “Favorite Player,” while a magazine or newspaper app might store whether or not a customer is an “Annual Subscriber.” At Appboy, we have a document for each end user of an app that we track, and on it we store those custom attributes alongside fields such as their first or last name. To save space and improve performance, we shorten the field names of everything we store on the document. For the fields we know in advance (such as first name, email, gender, etc.) we can do our own aliasing (e.g., “fn” means “first name”), but we can’t predict the names of custom attributes that our customers will record. If a customer decided to make a custom attribute named “supercalifragilisticexpialidocious,” we don’t want to store that on all their documents.


To solve this, we tokenize the custom attribute field names using what we call a “name store.” Effectively, it’s a document in MongoDB that maps values such as “Favorite Player” to a unique, predictable, very short string. We can generate this map using only MongoDB’s atomic operators.


The name store document schema is extremely basic: there is one document for each customer, and each document only has one array field named “list.” The idea is that the array will contain all the values for the custom attributes and the index of a given string will be its token. So if we want to translate “Favorite Player” into a short, predictable field name, we simply check “list” to see where it is in the array. If it is not there, we can issue an atomic push to add the element to the end of the array:

db.custom_attribute_name_stores.update({_id: X, list: {$ne: "Favorite Player"}}, {$push: {list: "Favorite Player"}})

We then reload the document and determine the index. Ideally, we would have used $addToSet, but $addToSet does not guarantee ordering, whereas $push is documented to append to the end by default.


So at this point, we can translate something like “Favorite Player” into an integer value. Say that value is 1. Then our user document would look like:

{fn: "Jon", ln: "Hyman", custom: {"1": "LeBron James"}}

Field names are short and tidy! One great side effect of this is that we don’t have to worry about our customers using characters that MongoDB can’t support without escaping, such as dollar signs or periods.
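To make the translation concrete, here is a minimal Python sketch that rewrites a user’s custom attributes using the position of each name in the “list” array. The helper name and the contents of the example list are made up for illustration, and the sketch assumes every attribute name is already present in the list.

```python
def tokenize_custom_attributes(custom_attributes, name_list):
    """Replace each attribute name with its index in name_list, as a string key."""
    return {
        str(name_list.index(name)): value
        for name, value in custom_attributes.items()
    }


# Hypothetical contents of one customer's name store "list" array.
name_list = ["Season Ticket Holder", "Favorite Player"]

user = {
    "fn": "Jon",
    "ln": "Hyman",
    "custom": tokenize_custom_attributes(
        {"Favorite Player": "LeBron James"}, name_list
    ),
}
print(user)
# prints: {'fn': 'Jon', 'ln': 'Hyman', 'custom': {'1': 'LeBron James'}}
```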


Now, you might be thinking that MongoDB cautions against constantly growing documents and that our name store document can grow unbounded. In practice, we have extended our implementation slightly so we can store more than one document per customer. This lets us put a reasonable cap on how many array elements we allow before generating a new document. The best part is that we can still do all this atomically using only MongoDB! To achieve this, we add another field to each document called “least_value.” The “least_value” field represents how many elements have been added to previous documents before this one was created. So if we see a document with “least_value” 100 and a “list” of [“Season Ticket Holder”, “Favorite Player”], then the token value for “Favorite Player” is 101 (we’re using zero-based indexing). In this example, we are only storing 100 values in the “list” array before creating a new document. Now, when inserting, we modify the push slightly to operate on the document with the highest “least_value” value, and also ensure that “list.99” does not exist (meaning that there is nothing in index 99 in the “list” array). If an element already exists at that index, the push operation will do nothing. In that case, we know we need to create a new name store document with a “least_value” equal to the total number of elements that exist across all the documents. Using an atomic findAndModify, we can create the new document if it does not exist, fetch it back, and then retry the $push.
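The capped scheme can be sketched in plain Python. This is an in-memory, single-threaded stand-in, not Appboy’s implementation: it keeps the documents in a Python list and makes no attempt to reproduce the atomicity that the MongoDB operators provide. The class and method names are hypothetical.

```python
class ChunkedNameStore:
    """In-memory model of name store documents with a cap per document."""

    def __init__(self, cap=100):
        self.cap = cap
        self.docs = [{"least_value": 0, "list": []}]

    def lookup(self, name):
        """Return the token for name, or None if it has not been stored yet."""
        for doc in self.docs:
            if name in doc["list"]:
                return doc["least_value"] + doc["list"].index(name)
        return None

    def token_for(self, name):
        token = self.lookup(name)
        if token is not None:
            return token
        # Operate on the document with the highest least_value.
        newest = max(self.docs, key=lambda d: d["least_value"])
        if len(newest["list"]) >= self.cap:
            # An element exists at index cap - 1, so the push would do nothing.
            # Start a new document whose least_value is the total stored so far.
            total = sum(len(d["list"]) for d in self.docs)
            newest = {"least_value": total, "list": []}
            self.docs.append(newest)
        newest["list"].append(name)
        return newest["least_value"] + len(newest["list"]) - 1


store = ChunkedNameStore(cap=2)  # tiny cap so the rollover is easy to see
tokens = [store.token_for(n) for n in ["A", "B", "C", "C"]]
print(tokens, len(store.docs))  # prints: [0, 1, 2, 2] 2
```

Because a token is the sum of “least_value” and the position in “list,” it never changes once assigned, which is what makes caching the tokens safe.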


If our customer has more than just a few custom attributes, reading back all the name store documents to translate from values to tokens can be expensive in terms of bandwidth and processing. However, since the token value of a given field is always the same once it has been computed, we cache the tokens to speed up the translation.


We’ve applied the “name store token” paradigm in various parts of our application to cut down on field name sizes while continuing to use a flexible schema. It can also be helpful for values. Let’s say that a radio station app stores a custom attribute that is an array of the top 50 performing artists that a user listens to. Instead of having an array with 50 strings in it, we can tokenize the artist names and store an array of 50 integers on the user instead. Querying users who like a certain artist now involves two token lookups: one for the field name and one for the value. But since we cache the translation from value to token, we can use a multi-get in our cache layer to maintain a single round-trip to the cache when translating any number of values.


This optimization certainly adds some indirection and complexity, but when you store hundreds of millions of users like we do at Appboy, it’s a worthwhile optimization. We’ve saved hundreds of gigabytes of expensive SSD space through this trick.


Want to learn more? I’ll be discussing devops at Appboy during the Rackspace Solve NYC Conference on Sept 18th at the Cipriani.


Jon Hyman

Jon Hyman is the Cofounder and CIO of Appboy, Inc. -- Appboy is the leading platform for marketing automation for apps. The company’s suite of services empowers mobile brands to manage the customer lifecycle beyond the download.


Posted by & filed under Company, Features, ObjectRocket Features.

Today, we’re excited to announce a new addition to the ObjectRocket platform: ObjectRocket for Redis. We love Redis at ObjectRocket. Redis is built for high performance and has versatile data structures and great documentation, allowing developers to easily integrate it into highly scalable application stacks. We use it internally, and so do many of our customers, who have been pushing us hard to release a Redis Database-as-a-Service offering.
We built the service with many of the core features that customers have come to expect from ObjectRocket for MongoDB:

  • All instances are highly available with automatic failover of the Redis master to a replica in the event of a master node failure.
  • We built ObjectRocket for Redis on our own high performance infrastructure using containers to eliminate the noisy neighbor problems of traditional hardware virtualization and make Redis run as fast as possible. Also, we control the entire stack so we have more room for innovation and this also gives us far greater control if there is a problem.
  • ACLs – we embrace a secure-by-default approach at ObjectRocket and require network Access Control List (ACL) entries for every instance.
  • Free backups – we take snapshots of your data to insure you against data loss.

Customers can also focus on their business while feeling secure in the knowledge that ObjectRocket for Redis is backed by Redis Specialists 24/7/365. This is not just marketing speak: Rackspace, our parent company, also owns RedisToGo, and we have vast experience managing and supporting over 42,000 running Redis instances.

Redis is currently available in our Virginia region, and will become available in more regions throughout August. ObjectRocket for Redis servers have high bandwidth and directly peer with networks like AWS so we’re only a few milliseconds away from your app servers, no matter where they run.

Over the coming months we will continue to release new features and functionality of the product. As always, please don’t be shy about giving us feedback.

Matthew Barker

Matthew Barker is a Product Manager on the Database team at Rackspace -- Overseeing ObjectRocket for Redis, a high performance, highly available & fully managed Redis datastore service & RedisToGo, a leading managed Redis datastore service.


Posted by & filed under HowTo.

At MongoDB World last month, MongoDB founder and CTO Eliot Horowitz announced support for pluggable storage engines, scheduled for the 2.8 release. This is exciting stuff: MongoDB users will be able to choose a storage engine that best suits their workload, and because the API is planned to fully support all MongoDB features, they won’t have to give up any of the functionality they currently enjoy. Not only that, but nodes in the same replica set will be able to use different storage engines, enabling all sorts of interesting configurations for varying needs.

The great thing about MongoDB being fully open source is that we don’t have to wait until 2.8 is actually released to play around with these very experimental features. The entirety of the MongoDB source code can be cloned from github and compiled to include any experimental features currently being worked on.

In the example below I’ll show you how to build mongo with the RocksDB example storage engine presented at MongoDB World.

Starting from a freshly installed CentOS 6.5 cloud instance, we’ll grab the basic dependencies:

yum groupinstall 'Development Tools'; yum install git glibc-devel scons

Next we’ll get the MongoDB source code from GitHub:

git clone

Now all that’s left is to compile the source with RocksDB support enabled:

scons --rocksdb=ROCKSDB mongo mongod

Or speed it up by using the -j option to specify the number of parallel jobs. If you plan to dedicate the system you’re compiling on to the build for the time being, a good rule of thumb is the number of cores in your machine plus one. Mine looked like:

scons -j 17 --rocksdb=ROCKSDB mongo mongod
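If you would rather not count cores by hand, the “cores plus one” rule of thumb can be computed. The following Python sketch only assembles the command line shown above. The helper name is hypothetical, and it assumes the same scons flags and targets used in this post.

```python
import os


def scons_build_command(targets=("mongo", "mongod")):
    """Build the scons command with one more parallel job than CPU cores."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    jobs = cores + 1
    return ["scons", "-j", str(jobs), "--rocksdb=ROCKSDB", *targets]


print(" ".join(scons_build_command()))
```

On a 16-core machine this prints the same command as above. The list form can be passed directly to subprocess.run from the source directory.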

It’s worth noting that pluggable storage engine support and the RocksDB engine are completely experimental at this point, so there’s a good chance you’ll encounter errors and be unable to compile from master. That’s to be expected at this stage. If you’d like to keep an eye on how things are progressing, the MongoDB dev mailing list is a good place to start.

Once the compile has finished you’ll want to start up a mongod process using the new --storageEngine parameter:

./mongod --storageEngine rocksExperiment

And finally you can test everything by connecting and inserting a simple document, then using db.stats(). You should see RocksDB statistics piped back to you if everything has gone as planned.

As you can see it’s fairly simple to get up and running with experimental features enabled. I’m very excited to see the pluggable storage engine code progress and see more new engines announced as we get closer to the 2.8 release.

Masen Marshall

Technical Lead at ObjectRocket, Chief Helper, Hacker, and Writer.


Posted by & filed under HowTo.

MongoDB Inc. has introduced lots of great new enterprise features with release 2.6 of MongoDB. However, one thing still absent is a desktop application to manage your database. Introducing Robomongo, the cross-platform and open source MongoDB management tool. With the following instructions you’ll see how easy it is to integrate Robomongo with your ObjectRocket MongoDB instance.

Let’s get started! First we’re going to need to note down some details from the ObjectRocket control panel:

  • Database connect string (note that the port is different for SSL vs non-SSL connections)
  • Database username & password

Download and install Robomongo for your OS of choice (at the time of writing the most current version is 0.8.4, which is the release I’m basing these instructions on).

Now open Robomongo. Initially you’ll be greeted with the MongoDB Connections box. Click the Create link in the top left of the screen.

MongoDB Connections screen


After clicking the Create link above, you’ll see the following Connection Setting screen. I’ve named my instance ObjectRocket but you may want to use more specific naming if you have several databases.


In the Address field, enter the database connect string you noted down earlier. Remember that if you intend to connect via SSL, the target port will be different. Usually this is your <plain text port> + 10000, so for my example the plain text port is 23042 and the SSL port is 33042.


Connection Settings


Now select the authentication tab and add the user credentials you noted down earlier.

Authentication Tab

If you prefer to use SSL, select the SSL tab at the top and tick Use SSL Protocol. ObjectRocket doesn’t currently support SSL Certificates so disregard that box.

SSL Settings

Now press Test to confirm the settings are correct. If everything works you should see a Diagnostic message box similar to the one below.


Diagnostic Message

Press Save to store your connection. Congratulations, you have successfully connected a great desktop MongoDB management application to your ObjectRocket instance!

But what if you’re using strict ACLs and you work from several locations or your home broadband does not have a static IP? You will have to keep adding your local (changing) public IP address to your instance ACLs in the ObjectRocket control panel before you can work with Robomongo.

Another method is to configure Robomongo to connect to your instance via a (Linux) server with a static IP (for example: one of your application servers, or a cloud server created to act as a proxy) using an SSH tunnel. The following instructions will guide you through the process.


First create yourself a user on a Linux server that has a static public IP. If this is not a server that is already allowed access via your ACL rule set, then remember to add this server’s IP address to your instance ACLs.


Generate an SSH public/private key pair and install the public part on the Linux server that will be our proxy host. An excellent article on how to configure SSH keys can be found here.


Now configure Robomongo to use our SSH proxy host and key.

SSH Connection Settings

Test your connection again. If the test completes without error, press Save to store your connection settings. You have successfully configured Robomongo to access your ObjectRocket instance via a proxy host over SSH.
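For comparison, a similar port forward can be set up by hand with the standard OpenSSH client. The Python sketch below only assembles such a command. Every host name, user name, and path in it is a placeholder, the port is the plain text port from my example, and it is not a description of how Robomongo implements its SSH support internally.

```python
def build_tunnel_command(proxy_user, proxy_host, key_path,
                         mongo_host, mongo_port, local_port):
    """Assemble an ssh command that forwards local_port to the MongoDB host."""
    forward = f"{local_port}:{mongo_host}:{mongo_port}"
    return [
        "ssh",
        "-N",               # no remote command, just forward ports
        "-i", key_path,     # private key matching the installed public key
        "-L", forward,      # local port forward through the proxy host
        f"{proxy_user}@{proxy_host}",
    ]


# All values here are placeholders for illustration only.
cmd = build_tunnel_command(
    "deploy", "proxy.example.com", "/home/me/.ssh/id_rsa",
    "mongo.example.com", 23042, 27017,
)
print(" ".join(cmd))
```

With a tunnel like this running, a client on your machine would connect to localhost on the chosen local port instead of connecting to the instance directly.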




Posted by & filed under HowTo.

JSONStudio and ObjectRocket, A match made in Java.


If you have ever worked with MySQL then you have probably used tools like phpMyAdmin or MySQL Workbench to interface with the database and run ad-hoc queries or generate reports.  These tools have been around for a long time and have matured to become valuable for day-to-day interaction with MySQL.  If you have ever searched for similar products for MongoDB then you should definitely take a look at JSONStudio by jSonar Inc.  It is a web-based front end for interacting with any MongoDB implementation and offers features like query generation, reporting and even data visualization.  JSONStudio is not just one tool but actually a suite of many different tools under one unified dashboard, and I must say its list of features is impressive.  The best part about this suite of tools is that it interfaces seamlessly with any ObjectRocket MongoDB instance.


To get started, head on over to the jSonar website and download the free evaluation copy of the tool.  I installed the version for Mac OS X, but if you have Linux or Windows, those packages are listed as well.  The installation guide can be found by hovering over Resources in the navigation bar and selecting Guide.  This will take you to the documentation for the current version of the software.


Once you have the software installed and the web service up and running you should get to a screen that looks something like this:


JSONStudio connection page

All the details you need to hook this up to an ObjectRocket instance can be found in the ObjectRocket Control Panel.  First log in to the Control Panel with your ObjectRocket username and password.  Once authenticated you should see a list of instances like so:



I am going to be connecting to my JSONStudio instance and looking specifically at my JSONTest database.  To get those connection details I first will click on my JSONStudio instance and then select the JSONTest database in the Databases section of my Instance Details page:

I then will need the SSL Connect String and a username from the Users section:


With the connection details in hand we can now connect JSONStudio to the instance.  Fill in the relevant information into the login page like so:

Since I am connecting over SSL I need to check the use SSL box as this passes the correct flag to the driver under the hood to make a secure connection.  I also chose the Secondary Preferred option so that my search queries will favor secondaries instead of the primary.  This can help with performance if the primary is under a heavy write load, but be aware, as mentioned in the MongoDB documentation, reading from secondaries can return stale data in certain circumstances.  Another thing to note is I selected to save the information I just entered such that I can quickly connect back another time.  When you save the datasource it does not save the password, so you will have to type that every time.


Once you hit Login you should see a screen very similar to this:



That should get you started with using JSONStudio by jSonar Inc. with your ObjectRocket MongoDB Instance.  If you run into any issues connecting to your instance, please contact Support and we will be more than happy to help you get connected.  Happy querying!

Posted by & filed under ObjectRocket Features.


For a number of months ObjectRocket has had a handful of customers helping our team develop integration with New Relic. Offering a suite of software analytics products, New Relic helps their customers gain actionable, real-time business insights from the billions of metrics their software is producing, including user click streams, mobile activity, end user experiences and transactions.

Today, we’re excited to announce the availability of ObjectRocket’s MongoDB plugin on the New Relic Platform, giving New Relic users increased visibility into their metrics from the ObjectRocket MongoDB service.



This is the first in a suite of integrations and tools that help increase the ability for customers to peer deeper into the ObjectRocket platform. The plugin is a zero-install plugin—all you need to do is drop your New Relic account key into the ObjectRocket UI, and data will automatically start flowing into New Relic. The plugin is account wide, so each of your instances will start sending data once your account key is set.

So what data does the plugin expose? Well, here is a list:

serverStatus: opcounters.insert
serverStatus: opcounters.query
serverStatus: opcounters.update
serverStatus: opcounters.delete
serverStatus: opcounters.getmore
serverStatus: opcounters.command
serverStatus: connections.current
serverStatus: connections.available
serverStatus: locks.*
serverStatus: network.bytesIn
serverStatus: network.bytesOut
serverStatus: network.numRequests
serverStatus: cursors.totalOpen
serverStatus: cursors.timedOut
serverStatus: asserts.*
serverStatus: globalLock.*
summated for each db:

Installing the plugin

Installation is very simple.

  1. Get your New Relic account key here.
  2. Drop it into ObjectRocket here. (Be patient, it could take a few minutes.)
  3. Click on the tab in New Relic named ‘ObjRocket’.

Of course, you will need accounts for both New Relic and ObjectRocket to make this all happen. Happy graphing, you data nerds, you!

Do you have a metric or class of metrics you want exposed? Hit us up, we would love to hear from you.


Kenny has 15 years of experience with various database platforms behind some of the busiest websites in the world. He has had roles as Architect, Director, Manager, Developer, and DBA. He was a key member of the early teams that scaled Paypal and then eBay, and most recently was an early adopter of MongoDB using it for various large projects at Shutterfly. He has been an active MongoDB community member, speaker, MongoDB evangelist, and now Mongo Master.


Posted by & filed under Features.

We have a new look and feel for our Control Panel.

A number of weeks ago we decided based on customer feedback and our own wishlist to re-write our user interface for our web control panel from the ground up. We wanted to ensure we gave our customers a clean and simple control interface for the ObjectRocket service.


The goal of this project was to simply convert our existing UI over to the new UI. However, there were a couple of items we couldn’t resist fixing. One of them was how we represent space usage. MongoDB has a multi-part storage design, and we wanted to more accurately represent how an instance’s storage usage is broken down.

The core of these changes is tied to an internal Rackspace project that enables small teams and projects to quickly and easily incorporate the experience, and iterate quickly. We have been working internally with this very talented team to be the first Rackspace company to use this new UI framework. We couldn’t be more excited, and look forward to helping to push the project forward.

Some highlights of the new Control Panel are:

  • Consistent flow: Pages are organized in a logical drill down manner, and consistently implemented.
  • Space usage indicator: Graphical space usage breakdown across a cluster.
  • Cluster balancer indicator: Graphical shard balance indicator.
  • Dashboard location: Dashboards are now accessed from the instances menu and renamed to ‘Statistics’.
  • Flyouts with help on many pages.

We will be rolling out our new interface this week across the board. We hope you enjoy the new user interface. Please don’t be shy about giving us feedback. If you aren’t already running your MongoDB database on ObjectRocket, sign up to check it out.


Kenny has 15 years of experience with various database platforms behind some of the busiest websites in the world. He has had roles as Architect, Director, Manager, Developer, and DBA. He was a key member of the early teams that scaled Paypal and then eBay, and most recently was an early adopter of MongoDB using it for various large projects at Shutterfly. He has been an active MongoDB community member, speaker, MongoDB evangelist, and now Mongo Master.


Posted by & filed under Uncategorized.

We are excited to announce Automated Online Compaction on the ObjectRocket platform for MongoDB.

Automated Online Compaction allows MongoDB instances to be compacted online and in the background on the ObjectRocket platform. The application will only experience a replica set election in order to start using the newly compacted slave. Without this feature, applications experience extended downtime when a collection is compacted or when a database is repaired.

Compactions can be scheduled, and windows defined for when the final stepDown() takes place. Users can turn on the feature and not have to worry about MongoDB fragmenting over time. The instance is kept in a nice tidy form and it’s all automated.

All databases fragment over time, some worse than others depending on the underlying design. In a generic sense, fragmentation occurs when deletes create spaces that new or updated data can’t reuse. MongoDB fragments just like most popular databases. Even when using usePowerOf2Sizes, we found that we spent a large amount of our DBA time working to keep customers’ databases compact. We felt that if we charge based on disk space footprint, it’s only right to help customers keep that footprint tidy. But the stock commands didn’t work for us because they require service interruptions. Anything we built had to be automated, online, work in parallel at scale, and present a minimal impact to customers.

To this end, we have been working over the last few months to build this feature, and had to release a couple other components in order to make this possible. First, we built a component that allows the user to specify a window when they would like a stepDown() to be performed. Then we needed to build out a complete state machine for MongoDB replica sets. In order for this feature to work properly, our code needed to understand the state of all replica sets in a cluster at any given point in time, understand failures, and understand how to recover. We also needed a scheduler component to allow the scheduled stepDown and compactions. We needed to ensure we took into account backups being run, balancer activity, and overall availability impact to the cluster.

With that work done, we could then build a component that performs complete compactions of a replica set in the background, almost entirely transparently to the user and the calling application. The new feature is called Automated Online Compaction for MongoDB.

Here is how it works:

  1. User requests a compaction manually or they have set a compaction schedule.
  2. Per shard, a SECONDARY is selected for compaction, and compaction starts.
  3. Repeat for all remaining SECONDARY replicas.
  4. Wait until all shards are done.
  5. Wait for the stepDown window; an election then takes place and the PRIMARY becomes a SECONDARY. The new PRIMARY is already compact at this point.
  6. Finish up by compacting the previous PRIMARY.
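
The ordering in the steps above can be sketched as a small planning function. This is only an illustration of the sequence, not the platform's actual code: the function name, the step labels, and the shape of the `shards` argument are all hypothetical, and the real system works on shards in parallel rather than producing a flat list.

```python
def compaction_plan(shards):
    """Return the ordered steps for compacting every shard of an instance.

    `shards` maps a shard name to a dict with a "primary" host and a list of
    "secondaries". All names and step labels here are hypothetical.
    """
    steps = []

    # Steps 2-4: compact every SECONDARY on every shard first.
    for shard, members in shards.items():
        for host in members["secondaries"]:
            steps.append(("compact", shard, host))

    # Step 5: wait for the instance's stepDown window, then step down each
    # PRIMARY so an already compacted SECONDARY is elected in its place.
    steps.append(("await_stepdown_window", None, None))
    for shard, members in shards.items():
        steps.append(("step_down", shard, members["primary"]))

    # Step 6: the previous PRIMARY is now a SECONDARY and is compacted last.
    for shard, members in shards.items():
        steps.append(("compact", shard, members["primary"]))

    return steps


if __name__ == "__main__":
    cluster = {"shard0": {"primary": "myserver3:27017",
                          "secondaries": ["myserver1:27017", "myserver2:27017"]}}
    for step in compaction_plan(cluster):
        print(step)
```

The point of the ordering is that the only moment the application can notice is the `step_down` step, and by then every candidate for election has already been compacted.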

It should be reiterated that while the compaction is done in the background on a SECONDARY, an election must take place in order to rotate it to PRIMARY. Users must be aware of this fact and design their client code to handle elections, which is best practice anyway. Greg has some good thoughts about designing for elections in client code.
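
As a rough illustration of what designing for elections can look like, here is a minimal retry helper with exponential backoff. It is a generic sketch rather than anything ObjectRocket ships: `with_retry` and its parameters are hypothetical names. With PyMongo you would typically pass `pymongo.errors.AutoReconnect` as the retriable exception; the built-in `ConnectionError` is used as the default here only so the example runs without a driver installed.

```python
import time


def with_retry(operation, retriable=(ConnectionError,), attempts=5,
               base_delay=0.5, sleep=time.sleep):
    """Call `operation()` and retry it if it fails during an election.

    `retriable` is the tuple of exception types treated as transient.
    The delay doubles after each failed attempt (0.5s, 1s, 2s, ...).
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retriable:
            # Out of attempts: surface the error to the caller.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_query():
        # Simulates two failures while the replica set elects a new PRIMARY.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("no primary available")
        return "ok"

    print(with_retry(flaky_query, base_delay=0.01))
```

Blindly retrying is only safe for operations that are idempotent; a write that may already have been applied before the connection dropped needs more care.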

Under the covers, here is what we are doing:

We track the stepDown window (if defined) for the instance:

"stepdown_window" : {
    "scheduled" : true,
    "end" : ISODate("2014-04-24T14:16:00Z"),
    "ran_in_window" : false,
    "enabled" : true,
    "start" : ISODate("2014-04-23T14:16:00Z"),
    "weekly" : true
}

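To make the window concrete, here is a small sketch of how a scheduler might decide whether a given moment falls inside a stepdown window like the one above. The field semantics are my assumption, not documented behavior: I treat "enabled" as an on/off switch and "weekly" as meaning the start/end pair repeats every seven days, and I ignore "scheduled" and "ran_in_window" because the post does not say how they are used. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def in_stepdown_window(window, now):
    """Return True if `now` falls inside the (possibly weekly) window.

    `window` is a dict with "enabled", "weekly", "start" and "end" keys,
    where "start" and "end" are datetime objects. Semantics are assumed.
    """
    if not window.get("enabled"):
        return False

    start, end = window["start"], window["end"]
    if window.get("weekly") and now > end:
        # Shift the window forward by the number of whole weeks elapsed
        # since the original start, then test against the shifted window.
        weeks = (now - start) // timedelta(weeks=1)
        shift = timedelta(weeks=weeks)
        start, end = start + shift, end + shift

    return start <= now <= end


if __name__ == "__main__":
    window = {
        "enabled": True,
        "weekly": True,
        "start": datetime(2014, 4, 23, 14, 16),
        "end": datetime(2014, 4, 24, 14, 16),
    }
    print(in_stepdown_window(window, datetime(2014, 4, 30, 15, 0)))
```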
We keep track of each replica set member and its state, updating the metadata so we know the state of every slave. A member’s state is “syncing” while it is being compacted and “compressed” once it has completed.

"compression" : {
    "state" : "compressing",
    "balancer_stopped_by_check" : false,
    "updated" : "2014-04-23 20:57:49.449538",
    "shards" : [
        {
            "state" : "compressing",
            "updated" : "2014-04-23 20:57:02.126299",
            "shardstr" : "fee6e6bfe55024e4ae92983d776ecd56/myserver1:27017,myserver2:27017,myserver3:27017",
            "members" : [
                {
                    "state" : "syncing",
                    "updated" : "2014-04-23 20:57:49.449521",
                    "name" : "myserver1:27017"
                }
            ],
            "updated_ts" : 1398286622
        }
    ]
}

Once all the SECONDARY slaves are {"state" : "compressed"}, we wait for the scheduled stepDown window:

"state" : "awaiting_stepdown",
"members" : [
    {
        "state" : "compressed",
        "updated" : "2014-04-23 21:07:03.627440",
        "name" : "myserver1:27017"
    },
    {
        "state" : "compressed",
        "updated" : "2014-04-23 21:27:01.384174",
        "name" : "myserver2:27017"
    }
]

And lastly, we compact the remaining SECONDARY (the previous PRIMARY):

"members" : [
    {
        "state" : "compressed",
        "updated" : "2014-04-23 21:12:01.467046",
        "name" : "myserver1:27017"
    },
    {
        "state" : "compressed",
        "updated" : "2014-04-23 21:32:01.334504",
        "name" : "myserver2:27017"
    },
    {
        "state" : "compressed",
        "updated" : "2014-04-23 22:07:01.419483",
        "name" : "myserver3:27017"
    }
]

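The progression shown in these metadata excerpts can be summarised with a small helper that derives a shard's overall state from its member records. This is a simplified guess at the logic, not the real state machine: the function name is hypothetical, it assumes the one member left uncompacted is the PRIMARY, and the final "compressed" label for a whole shard is my assumption (the post only shows "compressing" and "awaiting_stepdown" at the shard level).

```python
def shard_state(all_hosts, members):
    """Derive a shard-level state from per-member compaction records.

    `all_hosts` lists every host in the replica set (as in "shardstr");
    `members` is the list of {"name", "state"} records seen so far.
    """
    compressed = {m["name"] for m in members if m["state"] == "compressed"}
    remaining = len(set(all_hosts) - compressed)

    if remaining == 0:
        return "compressed"         # assumed label: every member is done
    if remaining == 1:
        return "awaiting_stepdown"  # only the PRIMARY is left (assumed)
    return "compressing"            # SECONDARY members still in progress


if __name__ == "__main__":
    hosts = ["myserver1:27017", "myserver2:27017", "myserver3:27017"]
    records = [
        {"name": "myserver1:27017", "state": "compressed"},
        {"name": "myserver2:27017", "state": "compressed"},
    ]
    print(shard_state(hosts, records))
```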
In order to get started with Automated Online Compaction, navigate to your instances view and select the compaction button, then schedule a stepdown window on the settings page. You can optionally choose to run the compaction weekly as well. Additional information can be found in the documentation here and here as well.



Posted by & filed under Company.

Calling Engineers, Developers and DBAs! We are building something amazing; come help! Hit us up at:

