Data Storage: Cloud vs Private Blockchain


A less widely known fact about blockchain is that, in addition to being the technology underlying cryptocurrencies and smart contracts, it can be used simply to store data. The issue is important, because the choice of storage has implications for both system performance and security. Currently, applications and software rely on either the cloud or an in-house client/server architecture. To these we can now add blockchain, although it is not yet seen as a conventional storage technology. As blockchain becomes better known and more widely used, it is likely that there will be a set of use cases where it becomes the default architecture for developing applications. It is therefore worth examining how blockchain compares with the existing data storage solutions, in order to better evaluate where these use cases can arise.


The differences between in-house storage and the cloud are smaller than the differences between either of them and blockchain. They revolve mostly around the cost of investing in one's own servers rather than paying a licensing fee. There are also the security ramifications of outsourcing one's data and business processes to a third party. From the point of view of a comparison with blockchain, however, these differences are minimal. For our present purposes, the main feature defining both the cloud and in-house storage is centralization. Data travels from the individual user to a central database, which is managed by an administrator who can modify the data at will and who controls user permissions. A blockchain, by contrast, is immutable and decentralized: data can only be added (not updated or removed), and a copy of the database must be stored on each node in the network. Instead of one centralized data repository queried by individual users, each user actually stores a copy of the database. Before proceeding further, it is important to note that we are considering only private blockchains here, meaning those that control access to the network. This is because, in the context of data storage, we are generally speaking of one or several organizations that want to run an application on top of a blockchain architecture and do not need the blockchain to be publicly accessible. This removes several constraints that make adding records more cumbersome, such as the computationally expensive calculations used to validate transactions (proof of work).
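The two properties described above, append-only records and a full copy on every node, can be illustrated with a minimal sketch. This is a toy model rather than the design of any particular blockchain platform: the names `Block`, `Ledger`, and `Network` are hypothetical, there is no consensus or networking, and "broadcast" simply appends the same record to every local copy.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One record, linked to its predecessor by that block's hash."""
    index: int
    data: str
    prev_hash: str

    @property
    def hash(self) -> str:
        payload = json.dumps([self.index, self.data, self.prev_hash])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only chain: there is deliberately no update or delete method."""

    def __init__(self):
        self._blocks = [Block(0, "genesis", "0" * 64)]

    def append(self, data: str) -> Block:
        last = self._blocks[-1]
        block = Block(last.index + 1, data, last.hash)
        self._blocks.append(block)
        return block

    def blocks(self):
        return tuple(self._blocks)

    def is_valid(self) -> bool:
        # Every block must point at the hash of the block before it.
        pairs = zip(self._blocks, self._blocks[1:])
        return all(
            cur.prev_hash == prev.hash and cur.index == prev.index + 1
            for prev, cur in pairs
        )


class Network:
    """Hypothetical private network in which every node keeps a full copy."""

    def __init__(self, node_names):
        self.nodes = {name: Ledger() for name in node_names}

    def broadcast(self, data: str) -> None:
        for ledger in self.nodes.values():
            ledger.append(data)


if __name__ == "__main__":
    net = Network(["tablet", "phone", "laptop"])
    net.broadcast("sale: 12.50")
    net.broadcast("sale: 3.00")
    print(all(ledger.is_valid() for ledger in net.nodes.values()))
```

Because each block embeds the hash of the previous one, rewriting an old record on one node breaks that node's chain and makes it disagree with the other copies, which is what makes the stored data tamper-evident in this sketch.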


The obvious strength of a centralized database is its performance: thousands of transactions per second can be processed, and the storage capacity is, for practical purposes, almost unlimited. As stated earlier, we are considering only private blockchains here, so the time needed to add and validate transactions is much less of a concern than it is with public blockchains. The main limitation of a blockchain is therefore the size of the ledger. If the entire ledger needs to be stored on each node, then the ledger must be limited in size, or else the device hosting the node will not have enough storage to hold it. As of June 2019, the Bitcoin blockchain had exceeded 200 GB in size.
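To put the per-node constraint in concrete terms, a back-of-the-envelope estimate can help. The sketch below is only arithmetic under stated assumptions: the record size, daily record count, and device capacity are illustrative numbers, not measurements, and the function names are hypothetical.

```python
def ledger_size_bytes(records_per_day: int, bytes_per_record: int, days: int) -> int:
    """Total ledger size after `days`, assuming a constant write rate."""
    return records_per_day * bytes_per_record * days


def days_until_full(capacity_bytes: int, records_per_day: int, bytes_per_record: int) -> int:
    """Whole days before the ledger exceeds the storage set aside on one node."""
    daily_growth = records_per_day * bytes_per_record
    if daily_growth <= 0:
        raise ValueError("daily growth must be positive")
    return capacity_bytes // daily_growth


if __name__ == "__main__":
    # Assumed workload: 500 records per day at 1,000 bytes each.
    print(ledger_size_bytes(500, 1_000, 365))            # 182500000 bytes per year
    # Assumed budget: 32 GB of storage reserved on each device.
    print(days_until_full(32_000_000_000, 500, 1_000))   # 64000 days
```

Under these assumed figures the ledger grows by under 0.2 GB per year, so the per-node limit would bind only for workloads several orders of magnitude larger.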

Limited storage capacity is clearly a major constraint. There are many areas where it would rule out any use of blockchain, because the required storage capacity far exceeds anything a blockchain can currently provide. In this category we can think of payment processing or other types of financial information, like pricing data from a stock exchange. It is critical to note, however, that huge data volumes are far from a universal need. It is easy to think of businesses that work with much smaller amounts of data: any type of small-scale retailer, restaurant, or manufacturer. In fact, businesses without large data storage needs probably outnumber those with them. Those of us who work a lot with technology and computing spend so much time discussing the potential of groundbreaking technologies that we tend to lose sight of the real world. What does the business actually need?


For small businesses, technological requirements and constraints are very different. One typical issue is the lack of any substantial IT hardware and of familiarity with technology. This means that on-premises storage is not a realistic option, which leaves the business with the cloud, often accessed from mobile devices. The cloud still remains a centralized topology: there still needs to be an administrator for the database. Who would that be in a small business? It may seem a silly question, given the small scale of the tasks involved, but in practice it is a substantial issue. Is it the owner of the business? They typically have enough on their plate already, without having to worry about adding and deleting user profiles or supervising different types of permissions. A private blockchain would enable permissioning at the level of the device, which is much more straightforward and practical. There could be a few devices (like tablets) in the shop, plus a few more devices chosen by the owner (like their own phone and computer, and those of a few family members) that serve as additional nodes in the blockchain. As a result, the owner has complete ownership of the data through their private blockchain and does not need to carry out administrative tasks. The owner does not need to worry about falsification of the data, nor about errors in assigning privileges or access.
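One way to picture device-level permissioning is a registry of enrolled devices, where a record is accepted only if it comes from a device the owner has enrolled. The sketch below is a simplification under stated assumptions: `DeviceRegistry` and `sign` are hypothetical names, and it uses a shared secret per device with an HMAC tag, whereas production systems typically rely on public-key credentials.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    """Tag a record with the device's secret key (HMAC-SHA256)."""
    return hmac.new(key, message, hashlib.sha256).digest()


class DeviceRegistry:
    """Hypothetical allowlist: the owner enrolls each device once at setup."""

    def __init__(self):
        self._keys = {}

    def enroll(self, device_id: str) -> bytes:
        # The returned key would be stored on the device itself.
        key = secrets.token_bytes(32)
        self._keys[device_id] = key
        return key

    def is_authorized(self, device_id: str, message: bytes, tag: bytes) -> bool:
        key = self._keys.get(device_id)
        if key is None:
            return False  # device was never enrolled
        return hmac.compare_digest(sign(key, message), tag)


if __name__ == "__main__":
    registry = DeviceRegistry()
    tablet_key = registry.enroll("shop-tablet-1")
    record = b"sale: 12.50"
    print(registry.is_authorized("shop-tablet-1", record, sign(tablet_key, record)))
    print(registry.is_authorized("unknown-phone", record, sign(tablet_key, record)))
```

In this model the owner's only administrative act is choosing which devices to enroll; there are no user profiles or per-user permissions to maintain afterwards.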

The reasoning above applies to private blockchains. If we were to extend the comparison to public blockchains, a host of other advantages (but also difficulties) would arise. The trustless nature of a public blockchain gives independent organizations a means of validating data without each of them having to conduct its own reconciliation. As for the difficulties, the current scaling issues may be addressed by innovations such as sharding.