PPIO Code Talks is an open platform for high-quality presentations and discussions on blockchain technology with the aim of engaging the community and spreading ideas. The following is an adaptation of a presentation given on November 26 by Xianbo Yang, CTO of Rensenergy, on the subject of what underlies the development of a blockchain platform.
What Is Blockchain?
It’s literally block + chain.
- A block is a unit where data is stored; a chain keeps the blocks connected
- In a blockchain system, data cannot be deleted or modified after being linked together
- The essence of blockchain is an immutable ledger. Note: I didn’t add “shared” or “distributed” here. This definition by itself does not cover how the ledger is generated or how it is used, which will be discussed in detail later.
- Data is stored in the form of a blockchain. When we solve a business-related problem, we can often (but not always) use blockchain technology
- Blockchain is not a new technological invention; rather, it is an innovation in how existing technology and data storage methods are used, and it can solve some existing problems
The Application of Blockchain
People in the blockchain industry are very lucky: as long as a business involves storing data online, you can address its needs with blockchain. Of course, not all businesses need blockchain; whether to use it typically depends on the business’s pain point.
We are still in the early stage of blockchain, and we are very lucky to be able to do many things at this stage.
In fact, blockchain storage can be realized in the form of a database or a file. For example, where does your data live? Whether you choose a media database or a relational database, the key is how you store the ledger. Existing enterprises have a large number of legacy business systems, most of which are based on relational databases.
In this scenario, you may choose to back your blockchain solution with a relational database, so that your ledger can easily be read by most people.
Types of Blockchains
- Public Chain
- Consortium Chain
- Private Chain
A private chain is operated or owned by a private entity that controls the chain. The question is, is a private chain a single node or multi-node? If it is a single node, can it be made into a blockchain?
A single node has no other nodes to share and synchronize a ledger with. There is only one ledger, and no one else can supervise it. In this case, you need to publish your hash value in real time. When someone wants to check whether there is any problem with your data, you can take out your hash value, take out the data in your ledger, and use a Merkle tree to prove that the data is actually in your ledger.
- On your official website, you might broadcast data daily or offer real-time information
- Or you can publish the hash value of a certain company in the newspaper every day
- Or you can write them on the Ethereum chain to prove consistency and fairness
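The Merkle tree proof mentioned above can be sketched roughly as follows. This is a minimal illustration, assuming SHA-256 and a simple "duplicate the last hash on odd levels" rule; the function names (`merkle_levels`, `merkle_proof`, `verify`) are made up for this example and are not taken from any particular platform.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """Return every level of the tree, from the hashed leaves up to the root."""
    level = [sha256(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: odd levels are padded by repeating the last hash.
            level = level + [level[-1]]
            levels[-1] = level
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels, index):
    """Collect the sibling hashes needed to rebuild the root from one leaf."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a record and its proof, then compare."""
    digest = sha256(leaf)
    for sibling, sibling_is_left in proof:
        digest = sha256(sibling + digest) if sibling_is_left else sha256(digest + sibling)
    return digest == root


records = [b"tx-1", b"tx-2", b"tx-3"]
levels = merkle_levels(records)
root = levels[-1][0]            # this is the value you would publish
proof = merkle_proof(levels, 2)
```

Only the root needs to be published (on a website, in a newspaper, or on another chain). Anyone holding a record and its proof can then call `verify` without seeing the rest of the ledger.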
When we talk about blockchain platforms, we often hear about certain protocols. This isn’t unlike when we look at open-source code and see things like Go or Java.
In fact, a blockchain protocol must define what a blockchain looks like. When developing a blockchain platform, a blockchain protocol is the rule of conduct for a blockchain platform. What does a blockchain protocol typically include?
- How transactions are handled and validated
- Who the block producers are
- How nodes interact
- How data is broadcast and how ledger data is synchronized
- The programming interface of the application
Note: There are internal interfaces (which are interfaces between nodes), and external interfaces. If other blockchains want to call upon your blockchain’s data, then an external programming interface is needed.
The most well-known blockchain protocols are
The following are all the modules needed for building a blockchain protocol.
Account Model: When you design a blockchain protocol, what should the account model include? What are the rules for generating your account name? What is your account type? Are you UTXO-based or balance-based?
Encryption Algorithm: What encryption algorithm is used on the platform?
Data Architecture: Each block is required to have a block header. What should be placed in the block header?
For example: the hash of the previous block, some random values, the hash of the current block, the current height, and so on. Then there is the transaction data structure. All data on a blockchain is generated through transactions, and a transaction doesn’t only refer to something monetary. If I send you a message, that is also a type of transaction.
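As a rough sketch of the block header described above, the fields could be modelled like this. The field names and the JSON-plus-SHA-256 hashing are assumptions made for illustration. Note that this sketch computes the current block's hash on demand instead of storing it in the header.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BlockHeader:
    height: int       # position of the block in the chain
    prev_hash: str    # hash of the previous block header
    tx_root: str      # digest summarising the block's transactions
    nonce: int        # the "random value" mentioned above
    timestamp: int

    def hash(self) -> str:
        encoded = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()


def tx_digest(transactions) -> str:
    """Hypothetical helper: one digest over all transactions in a block."""
    encoded = json.dumps(transactions, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def chain_is_valid(headers) -> bool:
    """Each header must point at the hash of the header before it."""
    return all(
        cur.prev_hash == prev.hash() and cur.height == prev.height + 1
        for prev, cur in zip(headers, headers[1:])
    )


# A non-monetary transaction: sending a message.
message_tx = {"type": "message", "from": "alice", "to": "bob", "body": "hello"}
genesis = BlockHeader(0, "0" * 64, tx_digest([]), 0, 1574726400)
block1 = BlockHeader(1, genesis.hash(), tx_digest([message_tx]), 7, 1574726460)
```

Because `block1.prev_hash` depends on every field of `genesis`, changing anything in an earlier block breaks the link, which is what makes the ledger tamper-evident.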
Node Communication: How do we achieve communication across nodes? If we are developing a public chain, we need to implement a p2p network. How do the nodes of a p2p network transmit data to each other? Options include HTTP, a raw socket, or gRPC.
Consensus Mechanism: If your platform is a public chain, you can employ consensus mechanisms such as PoW, PoS, and PoA. If it is a consortium chain, you can use methods like PBFT or Raft.
Smart Contracts: On most common blockchain platforms, smart contract support consists of two things: the contract language and the operating environment. For example, Ethereum uses the object-oriented language Solidity to write smart contracts and has the Ethereum Virtual Machine (EVM) as the runtime environment for them. Another example is Hyperledger, which uses Docker to run contract code written in Go and Java.
Account design is the first thing to develop when designing a blockchain platform.
The first step in the design is to choose an elliptic curve encryption algorithm.
Curve25519 is one of several options. The reason an elliptic curve algorithm is needed is that your account name is derived from your public key.
The account name is not simply the public key; it is calculated from the public key, and you can’t recover the public key just from the account name. The point of the elliptic curve algorithm is that when I provide my private key, I can calculate the public key from it without any additional data. Only then can I calculate my account name from the public key.
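To make the "private key in, public key out" step concrete, here is a small educational sketch. It uses the secp256k1 curve (the one used by Bitcoin and Ethereum) as an assumption, not Curve25519 as named in the talk, because its arithmetic is short to write out. It is not constant-time and should not be used for real keys.

```python
# secp256k1 domain parameters: field prime, group order, generator point.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (
    0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
    0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8,
)


def inverse(value):
    """Modular inverse via Fermat's little theorem (P is prime)."""
    return pow(value % P, P - 2, P)


def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * inverse(2 * y1) % P
    else:
        slope = (y2 - y1) * inverse(x2 - x1) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add: compute k * point."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def public_key_from_private(private_key):
    """The public key is the generator multiplied by the private key."""
    if not 1 <= private_key < N:
        raise ValueError("private key out of range")
    return scalar_mult(private_key, G)
```

Going from private key to public key is a single scalar multiplication. Going the other way would require solving the discrete logarithm problem on the curve, which is why the public key can be shared freely.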
You either have to design such a scheme yourself, or you can refer to existing practices. Generally speaking, the account name consists of three parts. The first is your account type, which is usually represented by the first character of your account name. There are in fact many types of accounts; which ones you need depends on the kind of platform you want to design. The most common split is the one used by Ethereum: external accounts, which are controlled by the account owner, and smart contract accounts. You can use 0 or 1 to tell them apart, and you can use another character to mark the genesis account, the original account. The genesis account is special: it can only be used to issue the starting currency, and that can only be done once.
As it’s at the base layer of the blockchain, the genesis account is not allowed to conduct transactions. You can also use the genesis account as a ‘black hole account’. For example, if you want to destroy some tokens, you can directly send the token to the genesis account, because the platform will not allow the genesis account to make any transactions.
The second part is the public-key mapping. We derive the public key from the private key, and then represent the public key in the account name. The public key is generally run through a hash function first. After hashing, the mapped part of the public key can be encoded with base58.
Why encode the hash with base58? Base58 is user-friendly. Its alphabet leaves out characters that are easy to confuse, such as 0 and O, or I and l, so they cannot be mixed up.
The third part is a check digit. Account names are usually very long, so when a user has to enter one manually, it is hard to guarantee that the input is correct. With a check digit, the system can tell whether the account name was entered correctly.
The check digits can be obtained by hashing the contents of the first two parts and then taking the first few digits of the hash value as the check digits. This is the typical approach in BTC.
In short: first hash the public key, and then encode the result with base58.
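The three-part account name (type character, hashed public key, check digits) could be put together along these lines. This is a sketch under stated assumptions: SHA-256 truncated to 20 bytes stands in for the public-key hash, the check digits are the first 4 bytes of a SHA-256 over the first two parts, and the function names are hypothetical. Bitcoin's real address format differs in its details.

```python
import hashlib

# Base58 alphabet: no 0, O, I or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))   # leading zero bytes become "1"
    return "1" * pad + out


def base58_decode(text: str) -> bytes:
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)       # ValueError on a bad character
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + num.to_bytes((num.bit_length() + 7) // 8, "big")


def make_account_name(account_type: str, public_key: bytes) -> str:
    """account_type is one character, e.g. "0" external, "1" contract (assumed)."""
    key_hash = hashlib.sha256(public_key).digest()[:20]
    check = hashlib.sha256(account_type.encode() + key_hash).digest()[:4]
    return account_type + base58_encode(key_hash + check)


def is_valid_account_name(name: str) -> bool:
    """Recompute the check digits and compare them with the ones in the name."""
    if len(name) < 2:
        return False
    try:
        raw = base58_decode(name[1:])
    except ValueError:
        return False
    if len(raw) != 24:
        return False
    key_hash, check = raw[:20], raw[20:]
    return hashlib.sha256(name[0].encode() + key_hash).digest()[:4] == check


name = make_account_name("0", b"\x02" + bytes(32))
```

A wallet can run `is_valid_account_name` before sending anything, so a mistyped character is caught locally instead of tokens going to a non-existent account.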
Account generation is quite special: it does not require any system. Accounts can be generated completely offline, and the rules of generation are completely independent of your platform.
When an account is generated, you are usually prompted with mnemonic words or phrases, a kind of random sentence that stands in for your private key. When you create an account, the wallet will give you about 20 words to remember. Those words are taken as text and converted in the background into byte sequences. Finally, these sequences are hashed and encrypted.
The resulting byte sequences are used as the seed to generate your private key. Once you have the private key, the public key and account name follow from it.
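The path from mnemonic words to a seed to a private key might look like the sketch below. The key-stretching step is modelled on BIP-39 (PBKDF2 with HMAC-SHA512, 2048 rounds, salt `"mnemonic"` plus an optional passphrase), without BIP-39's text normalization. The final step that hashes the seed down to a 32-byte private key is a simplifying assumption; real wallets typically use a hierarchical derivation scheme instead.

```python
import hashlib


def seed_from_mnemonic(words, passphrase=""):
    """Stretch the mnemonic sentence into a 64-byte seed."""
    sentence = " ".join(words).encode("utf-8")
    salt = ("mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", sentence, salt, 2048)


def private_key_from_seed(seed: bytes) -> bytes:
    """Assumption for illustration: hash the seed down to 32 bytes."""
    return hashlib.sha256(seed).digest()


words = ["apple", "river", "stone", "cloud"]   # made-up words, not a real wallet
seed = seed_from_mnemonic(words)
private_key = private_key_from_seed(seed)
```

The derivation is deterministic, so the same words always give the same private key. That is why writing the words down is enough to recover an account, and also why anyone who sees them controls it.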
Where do these words come from? There is a word list (a “password book”) in the background which lists about 100 words. When you click to create an account, 20 words from that pool of 100 are randomly selected and displayed. That is where the password comes from.
The question is, is it safe? What if someone comes up with the same secret words and uses them to derive someone else’s private key? If that happened, all of the victim’s privately stored data, maybe even some cryptocurrencies, would belong to the attacker.
Well, not quite. The system is actually very safe.
Why can we show both the password and the password book? Well, if 20 or more of those 100 words have to be combined, what’s the probability that an attacker hunting for a collision can correctly guess the private key?
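How many possibilities are there? An astronomical number: even with these illustrative figures, 20 words drawn in order from a pool of 100 give 100^20 = 10^40 sequences, and real wallets use far larger word lists. It’s like finding a needle in a haystack, or rather the equivalent: searching for a single atom in the cosmos.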
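The size of the search space is easy to work out. The sketch below assumes each word is picked independently (repeats allowed) and counts raw word sequences only, ignoring any checksum. The 2048-word, 24-word case corresponds to the word list size and longest sentence length of the widely used BIP-39 scheme. The guessing rate is an arbitrary assumption.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def ordered_sequences(pool_size, picks):
    """Number of word sequences when every pick is independent."""
    return pool_size ** picks


def years_to_try_all(sequences, guesses_per_second):
    """Whole years needed to enumerate every sequence at a fixed rate."""
    return sequences // (guesses_per_second * SECONDS_PER_YEAR)


talk_example = ordered_sequences(100, 20)     # 100 words, 20 picks
bip39_style = ordered_sequences(2048, 24)     # 2048 words, 24 picks

# Assumed attacker speed: one trillion guesses per second.
years_needed = years_to_try_all(talk_example, 10**12)
```

Here `talk_example` is 10^40 and `bip39_style` is 2^264. Even the smaller figure would take more than 10^20 years to enumerate at the assumed rate.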
Do we need a password to log in to the system? It is not necessary because your private key actually means that you control everything.
How do you log in to this system? Well, you don’t need to. You only need your private key to sign when you send transactions from your account. Why? All accounts on the blockchain are open. In principle, you can “log in” to anyone’s account: for example, you can open a block explorer, find someone’s address, and then view their account with that address.
As all the data is public, you can log in and see that your friend has 100 digital assets. However, it will be meaningless to you because you do not have a private key.
So when we are building a blockchain platform, we don’t need to consider another password system. You just need to choose an asymmetric encryption algorithm.
Choice of Encryption Algorithm
Asymmetric encryption algorithms are very important in the blockchain. Why asymmetric? Because symmetric encryption cannot do signing and verification, and the network must be able to confirm that a transaction really was sent by you and reflects your own intent. Signing and verifying transactions in the blockchain goes hand in hand with using the encryption system to encrypt the transaction data.
Note: By encrypted transaction data, we mean symmetric encryption.
Why does symmetric encryption also show up alongside asymmetric encryption?
For example, if I want to send a private message to you, I must encrypt it; otherwise, everyone else will see it. The result calculated from my private key and your public key is the same as the result calculated from your private key and my public key.
We can both encrypt and decrypt the messages with a symmetric key based on that calculated result.
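The idea of two parties arriving at the same symmetric key from "my private key and your public key" is a Diffie-Hellman key agreement. The sketch below uses the classic integer version modulo a prime in place of the elliptic-curve version a real platform would use, and the XOR keystream is a toy cipher for demonstration only. Every name here is an assumption made for illustration.

```python
import hashlib
import secrets

PRIME = 2**255 - 19   # a well-known prime, used here as a toy modulus
GENERATOR = 2


def keypair():
    private = secrets.randbelow(PRIME - 2) + 1
    return private, pow(GENERATOR, private, PRIME)


def shared_key(my_private, their_public):
    """Both sides compute the same value: g^(a*b) mod p, then hash it."""
    shared = pow(their_public, my_private, PRIME)
    return hashlib.sha256(shared.to_bytes(32, "big")).digest()


def keystream_xor(key, data):
    """Toy symmetric cipher (NOT secure): XOR with a hash-derived keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()

key_seen_by_alice = shared_key(alice_private, bob_public)
key_seen_by_bob = shared_key(bob_private, alice_public)

ciphertext = keystream_xor(key_seen_by_alice, b"a private message for Bob")
plaintext = keystream_xor(key_seen_by_bob, ciphertext)
```

Because the key depends on both parties' key pairs, a message encrypted this way is tied to a specific sender and receiver, unlike a message encrypted with the receiver's public key alone.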
Let’s say I send you a message: why not just encrypt it with your public key? Or if you want to send me a message, why not use my public key? The reason is that when another person sends you the same message this way, you will find that both of them used exactly the same encryption to communicate with you.
A key based only on the receiver’s public key does not retain any information about the sender, so the encrypted data does not show who encrypted it.
Hash algorithms are also very important in the blockchain. We need to pick one algorithm, such as SHA256, for all hashes. We can also choose SHA3. In fact, this is quite open and depends on our own choice.
Transaction Signature: Why does the transaction signature use a hash algorithm? Instead of signing all the bytes of the transaction, you first calculate a hash value of the transaction content and then sign that hash value. So the hash algorithm is also used in the signing step.
Consensus Mechanism: When nodes compete to become the block producer, the hash algorithm is used to determine the probability of producing a block and in the rules for verification.
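One common way a hash algorithm governs both the chance of producing a block and its verification is a proof-of-work puzzle. The following is a minimal sketch of that idea, with a deliberately low difficulty so it finishes quickly. The function names and the 8-byte nonce encoding are assumptions for illustration.

```python
import hashlib


def block_hash(header: bytes, nonce: int) -> int:
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(header: bytes, difficulty_bits: int) -> int:
    """Try nonces until the hash falls below the target.

    Each attempt succeeds with probability 2**-difficulty_bits, so the
    difficulty directly sets how often a block is produced on average.
    """
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while block_hash(header, nonce) >= target:
        nonce += 1
    return nonce


def check(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification costs one hash, however long the search took."""
    return block_hash(header, nonce) < (1 << (256 - difficulty_bits))


header = b"height=42|prev=abc123"
nonce = mine(header, difficulty_bits=12)
```

Finding the nonce takes about 4,096 attempts on average at this difficulty, while any node can verify the result with a single call to `check`.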
Can different encryption algorithms be supported in a blockchain platform? At present, all blockchain platforms must choose an encryption algorithm.
When making a blockchain platform, one aspect you must consider is the transaction structure. When sending a transaction, you have to sign it. The signature applies an asymmetric cryptographic algorithm, and it is better to add an extra field to the transaction structure that identifies which algorithm was used. We can use the elliptic curve x encryption algorithm; another option to consider is SM2. In addition, in the data structure of the account, you may also need an attribute recording which algorithm is used to calculate the public key from the private key.
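A transaction structure that carries an algorithm identifier could be sketched as below. The field name `sig_algo`, the verifier registry, and in particular the `demo-hash` scheme are hypothetical: `demo_sign` is a stand-in so the dispatch can be exercised and is NOT a real signature algorithm. A real platform would register actual elliptic-curve or SM2 verifiers here.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class Transaction:
    sender: str
    recipient: str
    payload: dict
    sig_algo: str        # names the signature scheme used for this transaction
    signature: str = ""

    def signing_digest(self) -> bytes:
        """Hash everything except the signature; this digest is what gets signed."""
        body = asdict(self)
        body.pop("signature")
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).digest()


def demo_sign(digest: bytes, sender: str) -> str:
    """Placeholder only: NOT a real signature scheme."""
    return hashlib.sha256(digest + sender.encode()).hexdigest()


def demo_verify(digest: bytes, signature: str, sender: str) -> bool:
    return demo_sign(digest, sender) == signature


# Registry mapping algorithm identifiers to verifier functions.
VERIFIERS = {"demo-hash": demo_verify}


def verify_transaction(tx: Transaction, verifiers=VERIFIERS) -> bool:
    verifier = verifiers.get(tx.sig_algo)
    if verifier is None:
        raise ValueError(f"unsupported signature algorithm: {tx.sig_algo}")
    return verifier(tx.signing_digest(), tx.signature, tx.sender)


tx = Transaction("alice", "bob", {"type": "message", "text": "hi"}, "demo-hash")
tx.signature = demo_sign(tx.signing_digest(), tx.sender)
```

Because the identifier is part of the signed body, a node always knows which verifier to apply, and the platform can add a second algorithm later without changing the transaction format.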
Data Structure and Storage Design
Data structure includes your account data structure, block data structure, transaction data structure, and a node data structure. If it’s a relational database, this will mainly be about some database tables that you design. These database tables include account tables, block tables, transaction tables, and a node table.
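For the relational case, the four tables mentioned above might be laid out roughly like this. The sketch uses SQLite from the Python standard library as a stand-in for whichever relational database you choose, and every column name is an assumption. The transaction table is called `tx` because `transaction` is an SQL keyword.

```python
import sqlite3

SCHEMA = """
CREATE TABLE account (
    address      TEXT PRIMARY KEY,
    account_type TEXT NOT NULL,
    balance      INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE block (
    height     INTEGER PRIMARY KEY,
    hash       TEXT NOT NULL UNIQUE,
    prev_hash  TEXT NOT NULL,
    created_at INTEGER NOT NULL
);
CREATE TABLE tx (
    hash         TEXT PRIMARY KEY,
    block_height INTEGER NOT NULL REFERENCES block(height),
    sender       TEXT NOT NULL,
    recipient    TEXT NOT NULL,
    payload      TEXT NOT NULL,
    signature    TEXT NOT NULL
);
CREATE TABLE node (
    node_id    TEXT PRIMARY KEY,
    endpoint   TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'ok',
    latency_ms INTEGER
);
"""


def open_ledger(path=":memory:"):
    """Create the tables in a fresh database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_ledger()
conn.execute("INSERT INTO block VALUES (0, 'genesis-hash', '', 1574726400)")
conn.execute(
    "INSERT INTO tx VALUES ('tx-hash-1', 0, 'alice', 'bob', '{\"text\": \"hi\"}', 'sig')"
)
rows = conn.execute(
    "SELECT tx.hash, block.hash FROM tx JOIN block ON tx.block_height = block.height"
).fetchall()
```

Storing the ledger this way means existing reporting tools and legacy systems can read it with ordinary SQL.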
We can choose any data storage method according to our needs. Data storage can be split out as a relatively independent layer: you can connect the consensus mechanism of a blockchain platform to its data storage through an API, so that the storage is interchangeable. In this way, the platform or solution you build can meet different business needs depending on how it is used.
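Separating storage behind an API could look something like the sketch below: the rest of the platform talks to an abstract `BlockStore`, and the concrete backend can be swapped. The interface and class names are hypothetical, and the two backends (a dictionary and SQLite) are only examples.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class BlockStore(ABC):
    """Storage interface that the rest of the platform codes against."""

    @abstractmethod
    def put(self, height: int, raw: bytes) -> None: ...

    @abstractmethod
    def get(self, height: int) -> Optional[bytes]: ...


class MemoryBlockStore(BlockStore):
    def __init__(self):
        self._blocks = {}

    def put(self, height, raw):
        self._blocks[height] = raw

    def get(self, height):
        return self._blocks.get(height)


class SqliteBlockStore(BlockStore):
    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS block_raw (height INTEGER PRIMARY KEY, raw BLOB NOT NULL)"
        )

    def put(self, height, raw):
        self._conn.execute("INSERT OR REPLACE INTO block_raw VALUES (?, ?)", (height, raw))
        self._conn.commit()

    def get(self, height):
        row = self._conn.execute(
            "SELECT raw FROM block_raw WHERE height = ?", (height,)
        ).fetchone()
        return row[0] if row else None


def append_block(store: BlockStore, height: int, raw: bytes) -> None:
    """Platform code depends only on the interface, not on the backend."""
    store.put(height, raw)
```

A deployment for one client can then use an embedded store while another uses an enterprise database, without touching the consensus code.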
Currently, most platforms use NoSQL databases, such as RocksDB and LevelDB. You can even choose Oracle. One of the advantages of embedded databases like RocksDB and LevelDB is that they can run where performance requirements are low, which lets more people use the platform. Once again, different solutions can be used depending on the business’s needs.
When we try to solve a business scenario, we don’t need to hurry to request data. We can use traditional enterprise computing architecture to design the bottom layer of our blockchain, and we can run an Oracle database.
Design of a Node Communication Protocol
Node communication means that you can reach a node and it gives you a response. You can use HTTP, a socket, or gRPC to communicate. gRPC may be a little more efficient than HTTP.
How Do We Deal with Byzantine Nodes?
To build a blockchain platform, we first define the seed nodes, which are written into the code. You can list all of these nodes there. The problem is that these nodes are not necessarily good nodes. Some nodes may be powered on only once and then turned off.
There are also nodes that may be deliberately malicious. They can give fake or wrong data: data that is unreliable, or data that differs from what other nodes report. In this case, we need to design a mechanism to verify that the data is wrong, and a punishment mechanism to deal with bad nodes.
When you define the database table for nodes, there should be a field that marks what kind of node each one is. When it is a Byzantine node, you may have to punish it. The general method is to put that node on a blacklist. The blacklist doesn’t mean that the node is stuck there permanently; a release mechanism of sorts needs to be designed too.
This is because a node may be a good node even though in this instance it was marked as Byzantine. For example, on a bad network, the node may not have been online, or the data may have been lost in transmission. For a fair design, these nodes should not lose access permanently, so a release mechanism is necessary. If you check the node again and it is still misbehaving, you can increase the length of time it is blacklisted before ultimately concluding that the node is a bad player.
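A blacklist with a release mechanism and escalating penalties might be sketched like this. The doubling rule, the default durations, and the strike limit are all assumptions chosen for illustration.

```python
import time


class NodeBlacklist:
    """Temporary bans that double with each offence, then become permanent."""

    def __init__(self, base_ban_seconds=60, max_strikes=5):
        self.base_ban_seconds = base_ban_seconds
        self.max_strikes = max_strikes
        self._strikes = {}        # node_id -> number of recorded offences
        self._banned_until = {}   # node_id -> time at which the ban is lifted

    def report(self, node_id, now=None):
        """Record one offence and blacklist the node for a growing period."""
        now = time.time() if now is None else now
        strikes = self._strikes.get(node_id, 0) + 1
        self._strikes[node_id] = strikes
        self._banned_until[node_id] = now + self.base_ban_seconds * 2 ** (strikes - 1)

    def is_banned(self, node_id, now=None):
        now = time.time() if now is None else now
        if self._strikes.get(node_id, 0) >= self.max_strikes:
            return True           # concluded to be a bad player: no release
        return now < self._banned_until.get(node_id, 0.0)


blacklist = NodeBlacklist(base_ban_seconds=60, max_strikes=3)
blacklist.report("node-7", now=0)      # first offence: banned until t=60
```

A node that merely suffered a network glitch is released after a short wait, while one that keeps misbehaving is locked out for longer each time and eventually for good.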
Also, you need to factor the performance of each node into your node settings, because some nodes may be in your area and some may be on the other side of the world. If each node’s performance is recorded, the platform can pick the best nodes for you and connect to them.
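Picking the best peers from recorded performance can be as simple as sorting by measured latency. The sketch below assumes latency in milliseconds is the only metric and that `None` marks a node that did not respond. The node names are invented.

```python
def pick_best_nodes(latencies_ms, count=3, banned=()):
    """Return the ids of the lowest-latency nodes that are reachable and not banned."""
    candidates = [
        (latency, node_id)
        for node_id, latency in latencies_ms.items()
        if latency is not None and node_id not in banned
    ]
    return [node_id for _, node_id in sorted(candidates)[:count]]


measured = {"tokyo": 180, "berlin": 35, "new-york": 90, "offline-node": None}
best = pick_best_nodes(measured, count=2)
```

In practice the measurements would be refreshed periodically and stored with each node's record, so the choice adapts as network conditions change.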
The Design of Consensus Mechanisms
The design of the consensus mechanism actually comes down to a simple question: if all our nodes are equally important, who writes to the ledger? Consensus mechanisms like Proof-of-Work are not recommended, as they require too much computing power in return for low performance.
Proof-of-Stake is a typical consensus mechanism. This kind of mechanism (as well as Practical Byzantine Fault Tolerance, or PBFT) is generally suggested because it is widely accepted.
The purpose of a smart contract is that you upload a piece of code and it is executed automatically. This code is available to every node. For example, the code might state that if the balance of an account exceeds 10 units, the excess tokens must be transferred to another account. Everyone can read your code and can execute the smart contract in advance to verify what the result will be. This is the simplest example of a smart contract.
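The balance rule in this example can be written as a small deterministic function. This is plain Python standing in for a contract language, and the account names are invented. The point is only that every node running the same code on the same state reaches the same result.

```python
LIMIT = 10


def sweep_excess(balances, account, overflow_account, limit=LIMIT):
    """If `account` holds more than `limit`, move the excess to `overflow_account`.

    Returns a new balances mapping and leaves the input untouched, so anyone
    can run it in advance to see what the outcome would be.
    """
    updated = dict(balances)
    excess = updated.get(account, 0) - limit
    if excess > 0:
        updated[account] -= excess
        updated[overflow_account] = updated.get(overflow_account, 0) + excess
    return updated


state = {"alice": 14, "vault": 1}
result = sweep_excess(state, "alice", "vault")
```

Running it on `state` leaves `alice` with 10 units and moves the other 4 to `vault`. The total supply is unchanged.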
The smart contract actually requires you to upload a contract.
What is the purpose of this contract?
The point is that I want my program to be enforced as a rule, so that the agreed rules are applied without anyone being able to control them. As mentioned previously, public blockchain platforms support smart contracts as a matter of course. When we are building our own blockchain platform, or a blockchain solution for a specific project, we don’t actually need to build smart contract support all over again. Smart contracts exist for general use: a general-purpose platform cannot know in advance how a business will use it.
What kind of business logic does something like Ethereum consider? After all, Ethereum needs to support many different types of scenarios. Ethereum cannot know what specific kind of business logic it needs to support.
We may have to write a general blockchain platform. To do this, we could start a business in a vertical industry and then later focus on a specific industry to solve a specific problem. If we follow this type of approach to design a blockchain solution, we won’t actually need smart contracts. We would write everything that needs consensus into the underlying code.
This can be a more flexible approach when working with clients, as a smart contract is quite complex to design. If you write in Solidity, things get very complex if any module is not well designed. Furthermore, you would need to keep commissioning smart contracts to meet demand from various business scenarios. All of that can therefore be avoided. One upside of supporting smart contracts is that it has a high threshold and technical requirements: you need to write a compiler and an interpreter, which are not easy to design, so it is a good benchmark for finding talent to work on your project.
There is also the option of a platform that does not support smart contracts on-chain and instead provides off-chain smart contracts for you to use. This type of platform often uses pre-written smart contracts and processes them off-chain.
We hope that the above presentation provides an overview of what needs to be considered when designing a blockchain for commercial use or a client. If you have any questions, ask them in the comments.
More In The Code Talks Series
- Application of Zero-Knowledge Proofs: From November’s Code Talk meetup at Tongji University. This article looks at the different types of zero-knowledge proofs and how they can be applied to improve a project
- Libra and PPIO: Our first Code Talks coincided with the announcement of Libra so we break down how it works and then share what we can learn from it.
- Tendermint: Introduction and Analysis: Details the ingenuity of their consensus mechanism and a tutorial on building your own public chain in just 15 minutes.
- The A-to-Z on zkSnarks: Why zero-knowledge proof is ideal for authentication so no one else can know what you’re communicating.
- Talking Sense About Digital Currency Exchanges: Unpacking the business mechanics of exchanges and the biggest challenges they need to know in order to grow.