Basics of NoSQL - MongoDB
NoSQL databases and their deployments are on the rise. We wrote this article to dispel doubts about NoSQL and MongoDB use cases. It is an objective comparison of NoSQL and SQL technologies, meant to clarify the unclear cases.
Advantages of NoSQL
NoSQL databases were created to address the limitations of traditional relational database technology. What does that mean? They offer improvements and features that cannot be found in relational databases and that developers cannot implement on top of them themselves. So when should you consider a NoSQL deployment, and when is it worth changing your database system?
The most common advantages of NoSQL in specific deployments:
- Big Data - when you handle massive amounts of data, NoSQL (in our case MongoDB) fits your solution well.
- Dynamic development - when requirements change frequently and quickly, you need a fast, flexible database, and NoSQL is a good match.
- Open Source - most NoSQL databases are free and open source, so you are not obliged to buy a license for your solution or project.
- Expandable - you can scale your solution or project easily and implement an efficient architecture instead of a traditional relational database architecture.
In brief, as we mentioned in our earlier articles, NoSQL databases offer:
- Less complexity
- Lower operating costs
- Easier, less costly scaling
- Better flexibility and agility
Limits of NoSQL
Unfortunately, NoSQL is a relatively new technology. That is why the market offers more tools for relational databases than for NoSQL. We mean tools for querying, but also for migrating data between databases, managing backups and so on. This is due to the young age of most platforms. Of course, it is only a temporary problem. As NoSQL projects grow, we expect their tooling to grow with them, so this limitation should ease in the near future.
Security is something everybody wants, but most people don't know how to achieve full safety in their solutions and applications. To be honest, it is hard to achieve. In theory, security problems can appear in any technology, and SQL databases probably have security issues too. So why do we list it as a limit of NoSQL? Because NoSQL has become very popular, there is a whole bunch of new non-relational databases, created by big companies but also by one-man armies. Smaller and less known databases can be more vulnerable. For business use, we suggest deploying only well-known, mature solutions with a vendor behind them.
Schema flexibility can be a problem
One of the peculiarities of NoSQL systems is that they do not require a schema. In practice, it is the programmer who decides the data structure at the moment of saving it. So there is no place where it is written how the data is structured or what it means. Even if you could recreate a database model from the data with some automated tool, you lose something that traditional applications have by default. Moreover, what if a bug occurs? We know the code can be wrong. A traditional RDBMS enforces its schema, so if you swap some fields or use the wrong field format, it protects you from inconsistency. With NoSQL there is, by default, no help from the database: without a defined schema there is no information about how data should be saved, and nobody can say whether the data is wrong or not. The worst side effect is that this puts a lot of power and a lot of responsibility in the hands of the developer, who often does not know the whole process or structure.
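To make the swapped-fields scenario concrete, here is a minimal sketch of the kind of check a schemaless store leaves to application code. The `INVOICE_SCHEMA` layout and the `validate` helper are hypothetical names invented for this illustration; they are not part of MongoDB or of the original benchmark.

```python
# Hypothetical schema for an "invoice" document: field name -> expected type.
# This is an assumption for illustration, not something the database provides.
INVOICE_SCHEMA = {"customer": str, "total": float, "items": list}


def validate(doc, schema):
    """Return a list of problems found in doc; an empty list means it conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(
                f"{field}: expected {expected.__name__}, "
                f"got {type(doc[field]).__name__}"
            )
    for field in doc:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


good = {"customer": "ACME", "total": 120.5, "items": [{"sku": "A1", "qty": 2}]}
# A typical bug: two fields swapped by mistake in the saving code.
bad = {"customer": 120.5, "total": "ACME", "items": []}

print(validate(good, INVOICE_SCHEMA))
print(validate(bad, INVOICE_SCHEMA))
```

A relational database would reject the second document at insert time; here it is rejected only if the application remembers to call the check before saving.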
Moreover, even if you know today what is saved where, do you think you will remember everything next month? And next year? Not all projects are under continuous development; a business application can stay as-is for years before it needs changes. Besides, companies often commission a project from an external supplier, so this must be taken into account to ensure an easy handover at the end of the project, for example by asking for accurate documentation of how the data is structured and what each field or collection means. The last problem related to schema flexibility is that not every team member stays on the project for its whole life, so turnover is critical in small teams where not all members have full knowledge of the data structure or where there is no adequate documentation.
When you save a lot of nested data inside single documents, you may lose analytic features like “SUM”, “COUNT” and so on. The bad thing is that this may not be a problem during initial development, but someone could ask for a report later, and what do you do then? It is hard to change the data structure once the database is filled, and doing so can have many unpredictable effects due to the lack of a well-defined data structure. Analytics is a hard point for NoSQL.
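The difference can be sketched with the Python standard library alone. The invoice data below is made up for illustration, and SQLite stands in for "a relational database"; neither comes from the original benchmark.

```python
import sqlite3

# Hypothetical data: two invoices, each with nested line items.
invoices = [
    {"customer": "ACME",
     "items": [{"sku": "A1", "amount": 10.0}, {"sku": "B2", "amount": 5.0}]},
    {"customer": "Initech",
     "items": [{"sku": "A1", "amount": 7.5}]},
]

# Document style: the application walks every document and every nested item.
doc_total = sum(item["amount"] for inv in invoices for item in inv["items"])
doc_count = sum(len(inv["items"]) for inv in invoices)

# Relational style: the same rows in a flat table, aggregated by the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice_item (customer TEXT, sku TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoice_item VALUES (?, ?, ?)",
    [(inv["customer"], it["sku"], it["amount"])
     for inv in invoices for it in inv["items"]],
)
sql_total, sql_count = conn.execute(
    "SELECT SUM(amount), COUNT(*) FROM invoice_item"
).fetchone()

print(doc_total, doc_count, sql_total, sql_count)
```

Both paths give the same numbers, but in the nested case the traversal logic lives in application code and has to know the document layout.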
Moreover, while there are many commercial tools you can connect to a traditional database to handle analytics, support for NoSQL systems is limited.
Another possible solution is to replicate some sort of "relationship" between unstructured data inside the NoSQL database, for example by creating many collections and linking objects to one another. If you plan to follow this path to enable analytics and reporting, keep in mind that it can slow performance down to a level comparable with standard SQL systems.
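As a rough illustration of linking collections by hand, the sketch below keeps two hypothetical collections as plain Python lists and resolves the reference in application code. The `authors` and `posts` names and fields are assumptions made for this example.

```python
# Two hypothetical "collections" held as lists of documents.
authors = [{"_id": 1, "name": "Ann"}, {"_id": 2, "name": "Bob"}]
posts = [
    {"_id": 10, "author_id": 1, "views": 40},
    {"_id": 11, "author_id": 1, "views": 60},
    {"_id": 12, "author_id": 2, "views": 25},
]


def views_per_author(authors, posts):
    """Resolve each post's author reference by hand and total the views."""
    names = {a["_id"]: a["name"] for a in authors}  # lookup table by id
    totals = {}
    for post in posts:
        name = names[post["author_id"]]  # the "join" happens here
        totals[name] = totals.get(name, 0) + post["views"]
    return totals


print(views_per_author(authors, posts))
```

Every linked collection adds another lookup like this one, which is where the extra cost mentioned above comes from.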
It is important to specify how the comparisons are made. First of all, I needed to place both solutions in the same conditions. This means, for example, using the same hardware and the same level of tuning. So I installed MongoDB (latest version) and SQL Server Express on the same machine. Because we are not interested in performance inside the database itself, I built my benchmark in C# code based on the standard framework.
Apart from these two ways of saving data, everything is shared (entities, logic, data generation) to keep the comparison fair.
The operations we will compare:
- mass insert
- transaction (or better, in NoSQL case, transaction simulation)
Mass operation on a single entity
This benchmark consists of a big set of objects to insert in as little time as possible. The test is repeated with a growing number of items to show how performance scales in both systems. It measures execution time in ms and targets a single table or collection.
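The shape of such a test can be sketched as follows. This is not the original C# benchmark: it uses an in-memory SQLite database from the Python standard library as a stand-in, and the `item` table and the batch sizes are assumptions chosen for illustration.

```python
import sqlite3
import time


def time_mass_insert(n):
    """Insert n rows into a fresh table; return (rows stored, elapsed ms)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    rows = [(i, f"item-{i}") for i in range(n)]  # data generated before timing
    start = time.perf_counter()
    with conn:  # one commit for the whole batch
        conn.executemany("INSERT INTO item VALUES (?, ?)", rows)
    elapsed_ms = (time.perf_counter() - start) * 1000
    count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
    conn.close()
    return count, elapsed_ms


# Repeat with a growing number of items to see how the time scales.
for n in (1_000, 10_000, 100_000):
    count, ms = time_mass_insert(n)
    print(f"{count} items: {ms:.1f} ms")
```

Generating the rows before starting the clock keeps data generation out of the measurement, in line with the idea that everything except the save path is shared.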
This benchmark focuses on querying. We distinguish the following patterns:
- CASE 1 get one entity using primary key: this pattern fetches a single entity from the database using its unique identifier
- CASE 2 full scan with fail: you are looking for a deleted element and the database has to scan the whole index before replying “no”.
- CASE 3 paged query: a complex query with some filters and one ordering condition, where you want to take just one page of data.
I created several benchmarks simulating different ratios of the patterns above. For example, the first benchmark assumes 5% of queries of kind 1, 70% of kind 2 and 25% of kind 3. Each benchmark measures execution time in ms and targets a single table or collection.
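One way to build a query mix like the 5% / 70% / 25% example is weighted random sampling. The sketch below only generates the sequence of query kinds. The pattern names, the seed and the workload size are assumptions, and the original benchmark may have built its mix differently.

```python
import random
from collections import Counter

# Weights from the example above: 5% by key, 70% misses, 25% paged queries.
PATTERN_WEIGHTS = {"case1_by_key": 5, "case2_miss": 70, "case3_paged": 25}


def build_workload(n_queries, weights, seed=42):
    """Return a reproducible list of query kinds drawn with the given weights."""
    rng = random.Random(seed)  # fixed seed so both databases see the same mix
    return rng.choices(list(weights), weights=list(weights.values()), k=n_queries)


workload = build_workload(10_000, PATTERN_WEIGHTS)
mix = Counter(workload)
for kind in PATTERN_WEIGHTS:
    print(kind, mix[kind])
```

Replaying the same seeded workload against each database keeps the comparison fair, since both systems answer exactly the same sequence of queries.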
You can find all the code used to perform these tests on GitHub.
|CASE 1||CASE 2||CASE 3|
The first test is on a “small” data set of about 2,500,000 rows.
The second test is on a “bigger” data set of about 5M rows.
This benchmark highlights a big performance improvement for queries over an index, but when MongoDB is used to read sets of data the gain is smaller and stays stable as the data grows.
We know that transactions are mostly unsupported in the NoSQL world. We also understand that giving up transactions can benefit performance, so the question is: how much do I gain from this? I built this benchmark to compare inserting, in a transaction, one master row related to many children. The benchmark focuses on execution time, expressed in ms.
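For readers unfamiliar with the SQL side of this test, here is a minimal sketch of a transactional master-detail insert. It uses SQLite from the Python standard library rather than SQL Server, and the `master` and `detail` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE detail (id INTEGER PRIMARY KEY, "
    "master_id INTEGER NOT NULL, amount REAL NOT NULL)"
)


def insert_master_with_children(conn, name, amounts):
    """Insert one master row and its children atomically; True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cur = conn.execute("INSERT INTO master (name) VALUES (?)", (name,))
            conn.executemany(
                "INSERT INTO detail (master_id, amount) VALUES (?, ?)",
                [(cur.lastrowid, amount) for amount in amounts],
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = insert_master_with_children(conn, "order-1", [10.0, 20.0])
# The None amount violates NOT NULL, so the whole second insert is rolled back.
failed = insert_master_with_children(conn, "order-2", [5.0, None])

print(ok, failed)
print(conn.execute("SELECT COUNT(*) FROM master").fetchone()[0])
print(conn.execute("SELECT COUNT(*) FROM detail").fetchone()[0])
```

Without a transaction, the failed call would leave the `order-2` master row and its first child behind, and this is the kind of cleanup a transaction simulation on the NoSQL side has to handle itself.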
|#transactions||SQL (ms)||NoSQL (ms)|
This benchmark focuses on analytics. Suppose we have a categorized master-detail data model, where you want to:
- export: a full join over the whole data tree
- report: sum all items per category for every category, e.g. give the invoice amount for each customer
- KPI: sum all master totals by summing detail subtotals
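On the SQL side, the three operations above could look roughly like this. The customer / invoice / invoice_line model and its data are assumptions made for illustration, run here against an in-memory SQLite database instead of the SQL Server instance used in the benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE invoice_line (invoice_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'ACME'), (2, 'Initech');
INSERT INTO invoice VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO invoice_line VALUES (10, 40.0), (10, 10.0), (11, 50.0), (12, 25.0);
""")

# The same inner join over the whole data tree is reused by all three queries.
JOIN = """
FROM customer c
JOIN invoice i ON i.customer_id = c.id
JOIN invoice_line l ON l.invoice_id = i.id
"""

# export: every row of the joined tree
export = conn.execute(
    f"SELECT c.name, i.id, l.amount {JOIN} ORDER BY i.id, l.amount"
).fetchall()

# report: invoice amount per customer
report = conn.execute(
    f"SELECT c.name, SUM(l.amount) {JOIN} GROUP BY c.name ORDER BY c.name"
).fetchall()

# KPI: grand total obtained by summing the detail amounts
kpi = conn.execute(f"SELECT SUM(l.amount) {JOIN}").fetchone()[0]

print(len(export), report, kpi)
```

In a document store, the same results depend on how the master and detail data were nested or linked when they were saved.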
On a base of 4M rows after the inner join:
|test||SQL (ms)||NoSQL (ms)|
Remember that NoSQL databases are not a replacement for SQL databases. They are an alternative for cases where relational databases are inefficient. There is no single best solution; each can be better in different projects. Is the market ready for NoSQL? That is a question for developers. Most of them have spent years learning to think about data in a relational way. How can they change their mindset in a few moments? It is not easy when they work on many projects at the same time, some of them SQL and others non-relational. Still, this is not a problem that should rule out a NoSQL deployment.
For e-commerce stores we especially recommend NoSQL databases. Just check our GrandNode performance comparison with other e-commerce solutions.
This article uses parts of an article about NoSQL by Daniele Fontani, published on codeproject.com and available under the Creative Commons Attribution 3.0 Unported License here.