Arm

Thoughts from a grizzled developer on converting from relational databases to NoSQL (MongoDB) (Part#1)

October 14, 2015

I'm not sure what qualifies as "grizzled", but I've worked with relational databases for most of my 23 year career in tech. I've worked primarily as a designer and developer but have also worked for short periods as an acting DBA. I've worked with most platforms including DB2, Oracle, MySQL, Postgres, Sybase and SQL Server. The first reference book on DB2 that I used was written by Date & Codd, two pioneers of the relational model.

I want to describe our experience of migrating the Watercooler Analytics platform from a relational model (SQL Server) to a NoSQL model (MongoDB). I'm going to write this over a few posts since it's a long story that will hopefully help a few others embarking on the same journey. 

First, I want to summarize our experience. The migration was quite difficult but well worth the effort. In retrospect, it's a bit odd that so much of the tech world depends on a database architecture (relational) that's been with us since the late 60s. I would also say that I would be very reluctant to ever build another application using a relational database. These posts won't be an academic comparison of relational vs. NoSQL, but rather a story of why we switched and why it's a pragmatic option.

In Part#1 of this post, I want to describe why you might want to consider a NoSQL database. To me, the best database for a task is one that can model the data in the form closest to its "natural state". For example, imagine trying to model the human body --- the body has a head, torso, arms and legs. That's a start, but let's take it down a level --- an arm has a hand, a forearm and an upper arm. Now, let's go deeper --- a hand has fingers, fingers have ligaments, tendons and bones. You get the point. Now imagine, trying to model this in a spreadsheet or a relational database. It's not an easy task because the data is not naturally broken into rows and columns. Of course, one can break it into rows and columns but the problem is that there is no abstraction of the complexity --- every table and every column has equal "weight".

For the same reason that object oriented programming is useful since one can encapsulate (hide) complexity, the same applies to NoSQL databases. One is able to easily create a hierarchy of data that abstracts the complexity. From an object-oriented programming perspective, this is pure bliss since you're able to map the objects to the data without the classic "impedance" mismatch that occurs when mapping objects to relational databases. 

One of the issues we struggled with during our first efforts with NoSQL was deciding when to create new collections versus "embedding" attributes within existing collections. The rules we used were that we always chose embedding as the first alternative since, again, it was the most natural way to model the data. However, from a practical performance perspective, we do not embed a set of attributes if we feel the data volume is going to grow at a rate much faster than the parent attributes. For example, we might embed all sorts of attributes within a User collection that describes the user, but we would stop short of embedding the User in the Customer collection since we expect users to grow at a much faster rate than customers.

So, the primary reason you should consider NoSQL is that it is generally superior for managing complexity and, in most cases, for modelling real world data. Of course there are situations where data may lend itself better to the relational model. What I'm arguing is that NoSQL should be the default option for new applications and only in exceptional cases should relational databases be selected.

I'll close off this post with a pet-peeve. Nothing bothers me more than when I see developers using object-data mappers (ODMs) such as Mongoid to effectively recreate a relational database in a NoSQL database. This completely defeats the purpose of both types of database. You get none of the encapsulation benefits of NoSQL and your performance will be terrible since you're still relying on multiple joins which are not optimal in a NoSQL world.

In Part#2, I'll cover what we've learned about performance, data redundancy and tools.


Share this post:    


Recent Posts

Lead roi

The DMO Guide to: Getting Higher ROI from your Sales Leads

Sales managers run into questions on conversion all the time, ...

Love travel 12 1920

Analytics: For DMOs

Analytics is all about searching for patterns in data and, ...

69046a6b25db0c21ed636e7df739fd1b xl

Analytics: For Convention Centres

Analytics is all about searching for patterns in data and, ...

Min case study

Case Study - Meet Minneapolis

Consistently one the best DMOs in North America, Meet Minneapolis ...

Watercooler guy

The Watercooler Analytics Manifesto

When we set out to build the Watercooler Analytics platform, ...