Background

To do a decentralized storage project with blockchain incentives, we must have good proof mechanisms in place. Only by using a proof mechanism with a strong mathematical basis can we prevent the attack of malicious miners and ensure the fairness of blockchain incentives.

Everyone familiar with blockchain often hears about PoW (Proof-of-Work) and PoS (Proof-Of-Stake), etc. These are consensus mechanisms. Simply put, a consensus mechanism is the algorithm that determines which node generates the block, and then by other node verification to reach a consensus. However, the proof of storage is different. Its purpose is to prove its contribution value to other nodes. In a distributed storage network, the greater the contribution value, the more incentives are obtained.

Why is it so difficult?

The proof mechanisms of cryptocurrency, such as bitcoin and Ethereum, are very simple. They are doing math problems and guessing numbers based on an integer. Everyone guesses a number, trying to find a number with the most consecutive zeros in the beginning, that also complies to the rules of the game. Whoever guesses this number first wins, so he gets the reward from the new block.

As you can see, the algorithm is very simple. Because of its simplicity, this algorithm is mathematically very rigorous. As long as this calculation process is mathematically impossible to calculate backward, it is very safe and has no loopholes. Because this is only a mathematical game, the algorithm is, of course, simple. However, this mathematical calculation is meaningless: the more people bitcoin mines, the more human resources are being consumed.

One of the great missions of the decentralized storage public chain is to turn bitcoin, a wasteful resource mining method, into a service that is beneficial to humans. The most basic measurement unit for decentralized storage is to use the storage unit and the flow unit to measure. Traditional cloud services also use these two factors.

What is storage? Storage is how big stored content is and how much time it needs to be stored. This is a measurement factor. What is traffic? Traffic is how many bytes are transferred, which is also a measured factor. Proving these two factors cannot be done by a single-machine algorithm alone. It is necessary to use network communication, so that the third party node can witness the two sides to prove that it is useful. However, if this third-party node is dishonest, more third-party nodes are needed to witness it together. Consensus must be reached among these witnesses to complete the proof. This process is much more complicated than the stand-alone algorithm of Bitcoin.

Another point is that each time Bitcoin generates a block, only one node can get rewards because it is the first to do the math problem right. So it's very simple. However, decentralized storage systems should reward for contributions based on all nodes that have served during this time. It's much more complicated than Bitcoin. (Maybe you mined Bitcoin, mistakenly thinking that Bitcoin based on computing power to distribute rewards. Actually not, because Bitcoin introduces a centralized node called the mine pool. The mine pool gets the bits obtained by very few nodes. The mine pool is equivalent to insurance; it takes the original only a few nodes got Bitcoin rewards for all those involved in mining.)

So the proof mechanism of decentralized storage is much more difficult than the proof mechanism of Bitcoin.

Proofs used in Cloud Storage

Is the storage-related proof only available in decentralized storage projects after the blockchain has appeared? No. It has actually existed and been used in previous cloud storage projects.

With the explosion of data and the popularity of broadband networks, cloud storage has become an important application since 2005. Personal cloud storage platforms include DropBox, Google Drive, Microsoft's OneDrive, and Apple's iCloud. Enterprise cloud storage platforms include AWS S3, Microsoft's Axure, Google Cloud, etc. These cloud storage services provide enterprises and individuals with a solution for secure storage and efficient access to massive amounts of data.

Although cloud storage is a centralized server deployment, data security is also an important issue. If the user has a large amount of data stored on the cloud server, how can he check that the data has not been lost or damaged, or even tampered with? Cloud storage service providers also need to prove to users or customers that data is safely kept.

In order to solve this problem, the early proof appeared, and the proof mechanism related to cloud storage has the following:

Provable Data Possession (PDP): The PDP proves that server holds this data without revealing the original data. PDP allow user to send data to the server, and later verifier can repeatedly check whether the server is still storing the data. PDPs are useful in cloud storage and other storage outsourcing settings. PDPs can be either privately-verifiable or publicly-verifiable, and static or dynamic. A wide variety of PDP schemes exist.

Related research papers:
G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song. Provable data possession at untrusted stores. In Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS '07, New York, NY, USA, 2007. ACM. Ateniese G, Di Pietro R, Mancini L V, et al. 2008. Scalable and efficient provabledata possession, SecureComm ’08.Heitzmann A, Palazzi B, Papamanthou C, et al. 2008. Efficient integrity checking of untrusted network storage, StorageSS ’08.

Proof-of-Retrievability (PoRet): PoRet are similar to PDP, but also enable extracting data, namely they ofier retrievability. PDPs allow the verifier  to check that the server is still storing the data, but the server may submit valid PDP proofs yet hold the data hostage and never release it. PoRet solves this problem by making the proofs themselves leak pieces of the data so that the verifier can issue some number of challenges and then reconstruct the data from the proofs.

Related research papers:
A. Juels and B. S. Kaliski, Jr. Pors: Proofs of retrievability for large files. In Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS '07. New York, NY, USA, 2007. ACM. https://dl.acm.org/citation.cfm?id=1315317 .Shacham H, Waters B. 2008. Compact Proofs of Retrievability, ASIACRYPT ’08.

Proofs used in Decentralization Storage

In the cloud storage project, all the machines that store data are the servers deployed by the cloud service providers themselves. They are not independent accounting individuals. They will not do evil, but there may be procedural bugs and hardware failures, so the relevant storage proof mechanism is relatively simple.

However, in a decentralized storage project, it is not a storage server, but a storage node, that is, a miner. They are independent individuals who are self-financing. They are connected to the network for profit, and most of them are rational. They will find the the way to pay as little as possible and receive as much as possible. They are likely to be evil and find ways to get something for nothing. Therefore, the proof mechanism of the decentralized storage project is much more complicated and more difficult than the proof in the cloud storage.

The proof mechanisms for decentralized storage are:

Proof-of-Storage (PoStorage):  (PoS) schemes allow a user to outsource the storage of data to a miner and then repeatedly check if the miner is still storing the data. PDPs and PoRets were independently introduced around the same time in 2007. Since then, the concept of Proofs-of-Storage generalizes PDPs and PoRets. This work presents PoReps, a new type of PoS.
PoStorage sometimes refers to all storage-related proofs. The following PoRep, and PoSt are all ones of storage-related proofs.

Proof-of-Capacity (PoC):PoC is another way of saying PoSpace. BrustCoin uses the PoC algorithm.

Proof-of-Replication (PoRep): PoRep is a kind of PoS that additionally ensure that the miner is dedicating unique physical storage to storing the data. the miner cannot pretend to store the data twice and deduplicate the storage. This construction is useful in Cloud Storage and Decentralized Storage Network settings, where ensuring a proper level of replication is important, and where rational miners may create Sybil identities and sell their service twice to the same user. PoRep schemes ensure each replica is stored independently. Some PoRep schemes may also be PoRet schemes.

Related research papers:
Ben Fissh. PoReps: Proofs of Space on Useful Data.  Cryptology ePrint Archive, 2018/678, 2018. http://eprint.iacr.org/2018/678 .

Proof-of-Spacetime (PoSt): l  (PoSt) schemes [10] allow the miner to convince the verifier that the miner has spent some storage space used over time (“spacetime”) resources. This is a PoSpace with a sequence of checks over time. This work introduces such a scheme, based on sequential PoReps.

Related research papers:
T. Moran and I. Orlov. Proofs of space-time and rational proofs of storage. Cryptology ePrint Archive, Report 2016/035, 2016. http://eprint.iacr.org/2016/035.

In addition, many decentralized storage blockchain projects claim to have invented new proofs, and in general have made some improvements based on the idea of these storage proofs.

I plan to follow this article with future articles that will reveal various storage-related proof techniques, and will progressively reveal the core storage proofs of various decentralized storage projects, including my own project, PP.IO.