
Web 3.0 and Data Analytics – Part 3

Blog | November 24, 2022 | By Sachet Kashyap

Let’s start with the question: what is the use of data analytics in blockchain?

First and foremost, blockchains contain an unprecedented research corpus of financial transactions. Bitcoin’s blockchain alone contains more than 280 gigabytes of data. This data is interesting and insightful for both scientific analysis and commercial applications, for example to study user behaviour or answer economic questions. Available commercial blockchain analysis tools are often tailored towards specific use cases, such as law enforcement investigations or insights for cryptocurrency traders. However, there is a lack of fast, general-purpose tools suited for scientific analysis. More recently, a lot of technologies and tools have been built to address this gap.

When we talk about data analytics in conjunction with blockchains, there are two aspects we need to understand:

  • Reading data from existing blockchains and doing analytics on top of it
  • Building a new database using the blockchain’s distributed storage concept

These two aspects have separate applications: the former tries to leverage traditional analytical tools, whereas the latter introduces a totally new database architecture.

Since we understand how a blockchain works with the help of Bitcoin, which I covered in Part 2 of this series, we know that a blockchain is a peer-to-peer system that stores data distributed across multiple systems. The data it stores can be anything, depending on the purpose of that blockchain. For example, Bitcoin stores transactional data generated by transfers of its native token. Similarly, a blockchain can store real estate transactions, acting as an immutable proof of ownership. Blockchains can also store supply chain data such as manufacturing and shipping logs, and they can be a store of notary documents, identity documents, pictures, etc. Blockchains can store data either on-chain or off-chain; we usually store data off-chain if the stored object is large. With off-chain storage, the blockchain stores only the hash of the data, which acts as a proof, or the address of the data, e.g., Our data resides at IP:
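To make the off-chain pattern concrete, here is a minimal Python sketch, assuming SHA-256 as the hash function: the large object stays off-chain, only its hash is recorded on-chain, and anyone can later check that the off-chain copy has not been altered. The `fingerprint` and `verify` helpers and the record layout are hypothetical, for illustration only.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest that would be recorded on-chain."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, onchain_hash: str) -> bool:
    """Check that the off-chain data still matches the hash stored on-chain."""
    return fingerprint(data) == onchain_hash

# A document kept off-chain (e.g. a scanned deed); only its hash goes on-chain.
deed = b"Parcel 42 is owned by Alice"
# Hypothetical on-chain record: the hash plus a placeholder for where the data lives.
onchain_record = {"doc_hash": fingerprint(deed), "location": "hypothetical-storage-address"}

print(verify(deed, onchain_record["doc_hash"]))                             # True
print(verify(b"Parcel 42 is owned by Mallory", onchain_record["doc_hash"]))  # False
```

Because only 32 bytes of hash go on-chain, the chain stays small while still acting as a tamper-evident proof for an arbitrarily large object.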

Blockchain Databases:

Blockchains can only achieve hundreds or, at most, thousands of transactions per second. Inserting data is easy, but querying it is costly. Bitcoin writes blocks into .dat files, while newer blockchains such as Ethereum use LevelDB and other solutions. So, what are some of the storage alternatives? We can host the blockchain data in a traditional database, e.g., PostgreSQL, or use graph databases for transaction networks, e.g., Neo4J. A third option is to combine the blockchain’s storage layer with database capabilities built on top of it, as in a Blockchain DB.
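Graph databases suit transaction networks because questions like "where did these funds go?" are path traversals. The sketch below is plain Python rather than Neo4J, and the addresses and amounts are made up; it only shows the kind of breadth-first traversal a graph query would perform to trace funds a few hops away from a starting address.

```python
from collections import defaultdict, deque

# Hypothetical transfers: (sender, receiver, amount). In practice these would be
# extracted from raw blocks and loaded into a graph database.
transfers = [
    ("A", "B", 5.0),
    ("B", "C", 2.0),
    ("B", "D", 3.0),
    ("D", "E", 1.0),
    ("X", "Y", 4.0),
]

def build_graph(edges):
    """Adjacency list: address -> list of (receiver, amount)."""
    graph = defaultdict(list)
    for sender, receiver, amount in edges:
        graph[sender].append((receiver, amount))
    return graph

def trace_funds(graph, source, max_hops=3):
    """Breadth-first search returning every address reachable from `source`
    within `max_hops` transfers, mapped to its hop distance."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        addr = queue.popleft()
        if seen[addr] == max_hops:
            continue  # do not expand beyond the hop limit
        for receiver, _amount in graph[addr]:
            if receiver not in seen:
                seen[receiver] = seen[addr] + 1
                queue.append(receiver)
    return seen

graph = build_graph(transfers)
print(trace_funds(graph, "A"))  # {'A': 0, 'B': 1, 'C': 2, 'D': 2, 'E': 3}
```

Addresses X and Y never appear in the result because no transfer path connects them to A.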

A typical Blockchain DB Network

A Blockchain DB leverages blockchains as its storage layer and introduces a database layer on top. It extends blockchains with classical data management techniques, e.g., sharding, as well as a standardized query interface. (Sharding is a method for distributing a single dataset across multiple databases, which can then be stored on multiple machines. This allows larger datasets to be split into smaller chunks and stored on multiple data nodes, increasing the total storage capacity of the system.)
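Here is a minimal Python sketch of one simple sharding scheme, hash-based partitioning. It rests on my own assumptions (SHA-256 of the record key, modulo the shard count) and is not how any particular Blockchain DB shards its data; the transaction IDs are hypothetical.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash (SHA-256),
    so every node computes the same placement for the same key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def distribute(keys, num_shards):
    """Group keys into per-shard buckets."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards

tx_ids = [f"tx-{n}" for n in range(10)]  # hypothetical transaction IDs
for shard, members in distribute(tx_ids, 3).items():
    print(shard, members)
```

A stable hash is used instead of Python's built-in `hash()`, whose value for strings changes between processes. One trade-off of plain modulo placement is that changing the number of shards relocates most keys; consistent hashing is a common way to reduce that movement.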

BigchainDB is an example of a blockchain-based database. BigchainDB’s design starts with a distributed database and, through a set of innovations, adds blockchain characteristics: decentralized control, immutability, and the creation and movement of digital assets. A blockchain-based database combines a traditional database with a distributed database: data is transacted and recorded via a Database Interface (also known as a Compute Interface) supported by multiple layers of blockchains. The database itself is shared in the form of an encrypted, immutable ledger, which makes the information open to everyone.
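BigchainDB models assets through CREATE and TRANSFER transactions. The toy ledger below is not BigchainDB's API; it is a hypothetical Python sketch of the ideas in the paragraph above: asset creation, asset movement, and immutability via hash chaining. It deliberately leaves out cryptographic signatures and consensus, so "ownership" here is only a name check.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class AssetLedger:
    """Toy append-only ledger of CREATE and TRANSFER events (hypothetical;
    no signatures, no consensus)."""
    log: list = field(default_factory=list)
    owners: dict = field(default_factory=dict)

    def _append(self, event: dict) -> str:
        # Chain each event to the previous one, so editing a past entry
        # changes its hash and breaks the link to every later entry.
        prev = self.log[-1]["hash"] if self.log else "0" * 64
        body = json.dumps({**event, "prev": prev}, sort_keys=True)
        entry = {**event, "prev": prev, "hash": hashlib.sha256(body.encode()).hexdigest()}
        self.log.append(entry)
        return entry["hash"]

    def create(self, asset_id: str, owner: str) -> str:
        if asset_id in self.owners:
            raise ValueError(f"asset {asset_id} already exists")
        self.owners[asset_id] = owner
        return self._append({"op": "CREATE", "asset": asset_id, "to": owner})

    def transfer(self, asset_id: str, sender: str, receiver: str) -> str:
        # Only the current owner may move an asset.
        if self.owners.get(asset_id) != sender:
            raise PermissionError(f"{sender} does not own {asset_id}")
        self.owners[asset_id] = receiver
        return self._append({"op": "TRANSFER", "asset": asset_id,
                             "from": sender, "to": receiver})

    def verify(self) -> bool:
        """Recompute every hash; return False if any entry was tampered with."""
        prev = "0" * 64
        for entry in self.log:
            event = {k: v for k, v in entry.items() if k != "hash"}
            if event["prev"] != prev:
                return False
            body = json.dumps(event, sort_keys=True)
            if hashlib.sha256(body.encode()).hexdigest() != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

ledger = AssetLedger()
ledger.create("bike-001", "alice")
ledger.transfer("bike-001", "alice", "bob")
print(ledger.owners["bike-001"])  # bob
print(ledger.verify())            # True
ledger.log[0]["to"] = "mallory"   # tamper with history
print(ledger.verify())            # False
```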

In practice, a blockchain has essentially no querying abilities compared with a traditional database, and when the number of nodes doubles, network traffic quadruples with no improvement in throughput, latency, or capacity. To overcome these shortcomings, taking a traditional database and adding blockchain features to it is more feasible. That is how the concept of the blockchain-based database came into existence: it consists of multiple member clouds riding on two primary layers, the Database Interface and the Blockchain Anchoring layer. The idea is to complement the functionality and features of SQL and NoSQL databases with blockchain properties: data immutability, integrity assurance, decentralized control, Byzantine fault tolerance, and transaction traceability.
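How the Blockchain Anchoring layer works is not specified here. One common approach, sketched below in Python purely as an assumption, is for the database layer to batch recent writes and compute a Merkle root over them, while the anchoring layer writes only that 32-byte root to a blockchain. Any later change to a row produces a different root. The rows and function names are hypothetical.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(records: list[bytes]) -> bytes:
    """Compute a Merkle root over a batch of database records.
    When a level has an odd number of nodes, the last node is duplicated
    (a common convention)."""
    if not records:
        raise ValueError("cannot anchor an empty batch")
    level = [sha256(r) for r in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Database layer: rows committed since the last anchor (hypothetical data).
batch = [b"row1:alice:100", b"row2:bob:250", b"row3:carol:75"]
anchor = merkle_root(batch).hex()  # only this value would be written on-chain

# Later audit: a modified row no longer matches the anchored root.
tampered = [b"row1:alice:100", b"row2:bob:999", b"row3:carol:75"]
print(merkle_root(tampered).hex() == anchor)  # False
```

Anchoring a single root per batch keeps on-chain writes small and infrequent, which sidesteps the throughput limits of the chain while still making the database's history tamper-evident.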

Below are some examples of blockchain-based databases:

Blockchain Query Models:

Many newer query models are being built to read and query blockchain data.

For example, there is an SQL-like language for querying Ethereum known as EQL (Ethereum Query Language), which allows users to retrieve information from the blockchain by writing SQL-like queries.
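The sketch below is not EQL and does not use its syntax. It uses standard SQL through Python's built-in `sqlite3` module over a hypothetical `blocks` table to illustrate the kind of SQL-style question such languages let you ask of block data. The block numbers, placeholder hashes, and gas values are made up.

```python
import sqlite3

# Hypothetical block headers; a real pipeline would extract these from an
# Ethereum node before loading them. Hashes are placeholders.
blocks = [
    (15_000_000, "0xaaa", 1_655_000_000, 12_000_000),
    (15_000_001, "0xbbb", 1_655_000_013, 29_500_000),
    (15_000_002, "0xccc", 1_655_000_025, 8_000_000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blocks (number INTEGER PRIMARY KEY, hash TEXT, "
    "timestamp INTEGER, gas_used INTEGER)"
)
conn.executemany("INSERT INTO blocks VALUES (?, ?, ?, ?)", blocks)

# SQL-style question: which blocks in a range used the most gas?
rows = conn.execute(
    """
    SELECT number, gas_used
    FROM blocks
    WHERE number BETWEEN ? AND ?
    ORDER BY gas_used DESC
    LIMIT 2
    """,
    (15_000_000, 15_000_002),
).fetchall()
print(rows)  # [(15000001, 29500000), (15000000, 12000000)]
```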

EQL Block Query Example


There are a lot of blockchain start-ups and tools available in the market. Companies like Santiment and IncaDigital are monetizing blockchain data and providing users with timely insights in the form of visualizations.

Trading firms use a combination of on-chain data and social media data to find the most liquidity at the best prices while also managing the risk involved in trading on centralized and decentralized exchanges. Start-ups like these provide trading firms with market data to analyse price and liquidity, blockchain data to add alpha to trading strategies, and social media data to detect when an exchange goes down, has trade execution problems, or is hacked.

Regulators and exchanges use this data for cross-market surveillance and KYC to analyse dozens of venues for fraudulent trading patterns and identify the location of traders based on their social media use.

Special Operations Command uses this data in its Counter-WMD (Weapons of Mass Destruction) efforts to track purchases of dual-use materials and machinery on the dark web that may be used to build nuclear weapons.


The importance and relevance of blockchain analytics will continue to grow as more businesses and corporations adopt blockchain technology. It allows us to see beyond artificial price movements or incomplete financial information and helps us understand what is actually happening on the blockchain. These data analysis techniques are necessary for ensuring effective regulation, minimizing errors, and maintaining the overall well-being of the system.

About the Author
Data Storyteller and Business Intelligence professional with experience in supporting large scale Data & Analytics initiatives.
Sachet Kashyap | BI Technical Lead - Analytics | USEReady