One important aspect of decentralized artificial intelligence (DeAI) is the decentralization of model training across numerous devices. This strategy offers several advantages, including scalability, censorship resistance, and cost savings. However, among other difficulties inherited from federated learning [notably the heterogeneity of data, models, and systems; see for instance: Liu, B., Lv, N., Guo, Y., & Li, Y. (2023, January 3). Recent Advances on Federated Learning: A Systematic Survey. arXiv, arxiv.org/abs/2301.01299, pp. 2-3], an intrinsic challenge arises when incentives are used in a setting where devices do not rely on a central orchestrator: devices (or compute nodes) may behave dishonestly, seeking to minimize their workload while maximizing their rewards (dishonest training).
In this article, which focuses on the economic dynamics rather than on a specific technical implementation, we explore the idea of an incentive system that could ensure robust and reliable decentralized model training by nodes that do not trust each other in a distributed context.
Such a system should reward honest nodes and penalize dishonest ones. To this end, after presenting a toy L2 model that could support this idea, we postulate that dual incentives are a prerequisite for such a system. We also introduce the concept of "Proof of Training" (PoT), a technique designed to automatically assess the validity of the training process.
I. Suggested Framework: A Layer 2 Blockchain
Blockchain as creation and attestation of blocks. Blockchain technology provides a robust and secure method for managing communication between nodes in a network that lacks mutual trust, as in the present case.
In the context of blockchains, two fundamental functions are continually at work: the generation of new blocks (that embody operations affecting the state of the blockchain) and the attestation of their validity. Workers create new blocks to add to the network (creation; execution). Conversely, verifiers validate the authenticity of the proposed blocks and adherence to established rules (attestation; consensus).
An L2 Framework. The suggested simplified framework, used here only to illustrate how the entities training the model interact with those attesting to the quality of the training, incorporates this creation/attestation duality and adapts it to the specificities of machine learning by adopting an L2 system. A blockchain (L1) stores verified blocks, which notably reference verified gradients, while a globally shared storage (L2) allows workers to access the global models available for training and to share their computational results.
In this system:
1. From the shared storage, idle workers (Wn) pick up either a) a new model to train or b) an updated model from their current training session.
2. Workers train the model on their data, then send their gradients to the shared storage.
3. Assuming that nodes interact synchronously with the blockchain [for simplicity, the framework assumes synchronous operations], a worker proposes a new block to the network when m-of-n results are available for a given training task.
4. Proposed blocks contain training operations, each referring to a) a worker, b) a training task, c) a training epoch, and d) the set of gradients computed for that epoch (a data-structure sketch follows this list).
5. Other nodes, acting as verifiers (Vn), check the integrity of the proposed blocks by assessing the gradients against the global model and excluding potential outliers. This exclusion can trigger a punitive mechanism (see below).
6. Validated blocks become part of the canonical blockchain. (This framework is simplified; in a real-life implementation, blocks would be finalized after a delay for protocol-security purposes.)
7. The verifiers aggregate the verified gradients to update the global model.
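To make the structure of a training operation and the m-of-n proposal rule more concrete, here is a minimal Python sketch. All names (TrainingOperation, ProposedBlock, ready_to_propose) and field choices are illustrative assumptions, not a reference implementation of the framework.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrainingOperation:
    """One worker's contribution for one epoch of a training task (hypothetical schema)."""
    worker_id: str          # the worker (Wn) that produced the gradients
    task_id: str            # the training task the global model belongs to
    epoch: int              # the training epoch these gradients were computed for
    gradients: List[float]  # flattened gradients shared via the global storage

@dataclass
class ProposedBlock:
    """A block proposed to the network once enough results are available."""
    task_id: str
    epoch: int
    operations: List[TrainingOperation] = field(default_factory=list)

def ready_to_propose(operations: List[TrainingOperation], m: int) -> bool:
    """m-of-n rule: propose a block once at least m results exist for the task/epoch."""
    return len(operations) >= m

# Example: with m = 3, a block is proposed once three workers have shared their gradients.
ops = [TrainingOperation(f"W{i}", "task-42", epoch=7, gradients=[0.0]) for i in range(1, 4)]
if ready_to_propose(ops, m=3):
    block = ProposedBlock(task_id="task-42", epoch=7, operations=ops)
```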
Training at the consensus layer. In this example, the nodes themselves, not smart contracts, train the models. Smart-contract execution is currently too limited in computational power to handle model training. Therefore, the quality of the training is not enforced purely programmatically through the execution of smart contracts at blockchain runtime (the execution layer), but by consensus, at block-creation time.
Naturally, workers and verifiers need to be incentivized to operate this blockchain.
II. Staking as an Incentive Framework for Distributed Training
A dual incentive mechanism. Workers must be rewarded for their computational effort, and verifiers must be rewarded for attesting the operations and updating the model. In other words, rewards should cover the two core actions underlying any blockchain (creation and attestation).
Positive incentives are required but are insufficient to address several risks:
Sybil attack. Attackers could degrade the blockchain's performance by creating deceptive nodes that continuously send incorrect gradients, overloading the verifiers with invalid data. Alternatively, workers could systematically submit low-quality gradients in the hope that some of their submissions are occasionally rewarded by chance (coincidental validation; insufficient verification).
Collusion attack. Verifiers could collude with workers to systematically attest their operations.
For this reason, incentives must also be negative: nodes should be incentivized to contribute meaningfully to the network, while those engaging in malicious behavior should face penalties. The same principle should extend to the verifiers.
| | Worker | Verifier |
| --- | --- | --- |
| Positive incentive | Rewards (tokens) for producing valid gradients | Rewards for validly verifying a block |
| Negative incentive | Slashing, exclusion for producing low-quality or dummy gradients | Slashing, exclusion for producing an improper attestation (improper verification method, collusion, etc.) |
Staking precisely encapsulates this approach by offering rewards for constructive participation and imposing sanctions for misbehavior.
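As a minimal sketch of how such dual incentives could be accounted for, assume a simple per-node stake ledger; the reward amount and slashing fraction below are illustrative values, not parameters prescribed by the framework.

```python
# Hypothetical per-node stake ledger applying the dual incentives summarized above.
stakes = {"W1": 100.0, "W2": 100.0, "V1": 100.0}  # workers and verifiers both stake

REWARD = 1.0          # tokens granted for a valid contribution (illustrative value)
SLASH_FRACTION = 0.1  # share of stake burned for misbehavior (illustrative value)

def reward(node_id: str) -> None:
    """Positive incentive: valid gradients (workers) or valid attestations (verifiers)."""
    stakes[node_id] += REWARD

def slash(node_id: str) -> None:
    """Negative incentive: low-quality/dummy gradients or improper attestations."""
    stakes[node_id] -= stakes[node_id] * SLASH_FRACTION

reward("W1")  # W1's gradients were verified and included in a block
slash("V1")   # V1 attested gradients that the super-majority rejected
```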
Illustration: Ethereum Staking. Ethereum illustrates this approach. In general terms, by staking Ether, validators are selected to process transactions and create new blocks on the Ethereum blockchain. In return for their contributions, they receive rewards in the form of newly minted Ether. This system not only ensures the security and reliability of the network but also offers financial incentives for honest participation. The Proof of Stake (PoS) consensus algorithm employed by Ethereum further reinforces this mechanism, as validators are penalized for misbehavior or for failing to maintain the required stake. Importantly, validators are also penalized for bad attestations [Grandjean, D., Heimbach, L., & Wattenhofer, R. (2023, June 19). Ethereum Proof-of-Stake Consensus Layer: Participation and Decentralization. arXiv, arxiv.org/abs/2306.10777, pp. 4, 8].
Applying this consensus mechanism to a new initiative is not straightforward, however. When Ethereum switched from PoW to PoS, stakeholders were incentivized to stake their tokens to generate revenue. In the context of decentralized training, a difficulty arises: would compute node operators be willing to allocate advance funding (i.e., by acquiring tokens) to make their computing power available to the network? The Ethereum model may, therefore, not be transposable as is.
We believe this approach could be applied to the framework using the concept of “Proof of Training” (PoT).
III. Proof of Training (PoT)
PoW or PoS? The core issue with this approach is the difficulty of assessing the soundness of the training. PoS presents interesting characteristics for the framework by enabling negative incentives. However, it is not sufficient, as it does not address how to assess the computational effort required from the workers. Such a "proof of computation" seems more closely related to the Proof of Work (PoW) concept. However, PoW cannot be extended to this context: it relates only to positive incentives and does not apply to non-deterministic computations.
Proof of Training (PoT). A third way, incorporating aspects of PoW and PoS, is necessary. Ideally, this type of proof should a) relate to the computational effort of the worker, b) permit the verification of the soundness of the training, and c) be actionable so that a worker can be rewarded or punished.
We name it "Proof of Training" (PoT) [other authors have used this expression to refer to different categories of proofs: a zero-knowledge proof of training (zkPoT): dl.acm.org/doi/abs/10.1145/3576915.3623202; eprint.iacr.org/2024/162; a Byzantine fault tolerance (PBFT) consensus mechanism: arxiv.org/pdf/2307.07066].
PoT allows verifiers to programmatically attest that a worker has truthfully executed model training tasks.
Unlike PoW, where task verification is straightforward, validating model training in PoT is nontrivial and requires sophisticated assessment techniques.
Automatic verification of the quality of the training. Automatically verifying the quality of the training is a hard problem (one that the present article does not claim to solve). The evaluation mechanism should be balanced: strict enough to detect misbehavior, yet error-tolerant enough to avoid punishing honest but transiently defective nodes.
A) One technique for assessing PoT is validating model performance against private validation datasets. These datasets would necessarily be known only to the verifiers. Consequently, we recommend that verifiers generate synthetic datasets at runtime and not share them with the evaluated nodes (or even with other verifiers, to limit collusion) to prevent potential cheating.
The synthetic data must be as close as possible to the real data but should not be identical, for privacy reasons; in other words, they should be "near-replicas" [for an illustration of this idea and the suggestion of a VGAE-based technique to generate such synthetic data, see Nikolentzos, G., Vazirgiannis, M., Xypolopoulos, C. et al. Synthetic electronic health records generated with variational graph autoencoders. npj Digit. Med. 6, 83 (2023). doi.org/10.1038/s41746-023-00822-x: "Such graphs [variational graph autoencoder (VGAE) to generate synthetic samples from real-world electronic health records] correspond to near-identical replicas of input graphs and could lead to patient privacy leaking from the training set. These graphs must be eliminated from the generated data set to reduce the risk of privacy leak." (p. 4)].
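A minimal sketch of technique A), assuming the verifier can reconstruct the worker's updated model from the submitted gradients and evaluates it on a private synthetic validation set that is never shared. The generator, loss function, and acceptance threshold below are placeholders, not prescribed by the framework.

```python
import random

def generate_synthetic_validation_set(n: int = 100):
    """Verifier-side, never shared: near-replica synthetic samples.
    Placeholder generator; a VGAE-style generator (as cited above) could be used instead."""
    return [([random.random(), random.random()], random.random()) for _ in range(n)]

def validation_loss(model, dataset) -> float:
    """Mean squared error of the worker's (reconstructed) updated model on the private set."""
    return sum((model(x) - y) ** 2 for x, y in dataset) / len(dataset)

def passes_proof_of_training(model, baseline_loss: float, tolerance: float = 0.0) -> bool:
    """Accept the update only if it does not degrade performance on the private set."""
    private_set = generate_synthetic_validation_set()
    return validation_loss(model, private_set) <= baseline_loss + tolerance

# Example with a trivial stand-in model; in practice this would be the global model
# updated with the worker's submitted gradients.
dummy_model = lambda x: sum(x) / len(x)
accepted = passes_proof_of_training(dummy_model, baseline_loss=0.2, tolerance=0.05)
```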
B) A consensus mechanism at the verifier level could also help identify outliers. If the verification operations overlap across multiple verifiers, any discrepancy in their outputs would lead to a) the automatic rejection of the gradients under consideration and b) the slashing of the minority of verifiers opposing the super-majority consensus.
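A minimal sketch of technique B), assuming each verifier casts a boolean vote on a set of gradients and a two-thirds super-majority threshold; the exact threshold and the handling of a split vote are assumptions the framework leaves open.

```python
from typing import Dict, List, Tuple

SUPER_MAJORITY = 2 / 3  # assumed threshold; the framework does not fix the exact value

def resolve_verification(votes: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """votes maps verifier_id -> True (gradients look valid) / False (outlier detected).

    Returns (gradients_accepted, verifiers_to_slash). Unanimity decides directly;
    any discrepancy rejects the gradients and slashes the minority opposing the
    super-majority, if one exists."""
    accepting = [v for v, ok in votes.items() if ok]
    rejecting = [v for v, ok in votes.items() if not ok]
    if not rejecting:                        # unanimous acceptance
        return True, []
    if not accepting:                        # unanimous rejection
        return False, []
    share_accept = len(accepting) / len(votes)
    if share_accept >= SUPER_MAJORITY:       # minority opposed the accepting super-majority
        return False, rejecting
    if (1 - share_accept) >= SUPER_MAJORITY:  # minority opposed the rejecting super-majority
        return False, accepting
    return False, []                          # no super-majority: reject, slash nobody (assumption)

accepted, to_slash = resolve_verification({"V1": True, "V2": True, "V3": True, "V4": False})
# accepted == False (discrepancy), to_slash == ["V4"]
```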
By combining these two approaches, we can create a robust and secure evaluation mechanism for Proof of Training, ensuring the integrity of the training process. However, how they can be implemented and the extent of their limitations are open questions.
Conclusion
Discussions around incentivizing decentralized training, as well as recent startup initiatives, mainly focus on positive rewards. This approach falls short because it creates a disequilibrium that benefits potential attackers, and because it usually rewards the trainers but not the verifiers.
We postulate that a more complete incentivization mechanism should be dual (positive rewards and negative penalties) and extend to both entities training the models and those verifying their training outcomes.
Our article also raises a series of open questions, notably:
The first difficulty relates to the economic bootstrapping of a new training network: how can potential operators be incentivized to adopt this dual incentivization system?
From a technical perspective, how can the quality of the training be assessed automatically in a trustless distributed system?