Invented by Sathyanarayanan Manamohan, Krishnaprasad Lingadahalli Shastry, Vishesh Garg, Hewlett Packard Enterprise Development LP
Decentralized machine learning using Blockchain brings together the power of machine learning algorithms with the transparency and security of Blockchain technology. Traditional machine learning models rely on centralized servers to collect, store, and process data. However, this centralized approach raises concerns about data privacy, security, and ownership. By leveraging Blockchain, decentralized machine learning addresses these concerns by distributing the data and computations across a network of nodes, ensuring that no single entity has control over the entire process.
One of the key advantages of decentralized machine learning using Blockchain is enhanced data privacy. With traditional machine learning, data is often collected and stored in a centralized server, making it vulnerable to breaches or misuse. In a decentralized system, data is distributed across multiple nodes, and each node only has access to a fraction of the data. This ensures that sensitive information remains secure and private, reducing the risk of data breaches.
Another benefit is increased transparency and trust. Blockchain technology provides an immutable and transparent ledger that records all transactions and activities. This transparency allows participants to verify the integrity of the data and the fairness of the machine learning algorithms. It also enables auditing and accountability, which is crucial in industries where compliance and regulatory requirements are stringent.
Furthermore, decentralized machine learning using Blockchain promotes data ownership and control. In traditional machine learning models, data is often collected by companies or organizations, and individuals have little control over how their data is used. With decentralized machine learning, individuals can retain ownership of their data and choose whether to share it with specific algorithms or models. This empowers individuals to have more control over their personal information and ensures that data is used ethically and responsibly.
The market for decentralized machine learning using Blockchain is witnessing significant growth due to its potential applications across various industries. In healthcare, for example, it can enable secure and privacy-preserving analysis of patient data for medical research and personalized treatment recommendations. In finance, it can enhance fraud detection and risk assessment by leveraging a network of distributed machine learning models. In supply chain management, it can improve traceability and transparency by enabling real-time monitoring and analysis of data across the supply chain.
Startups and established companies are already exploring and developing solutions in this space. These companies are building platforms and frameworks that allow developers to create and deploy decentralized machine learning models using Blockchain. They are also providing tools and infrastructure to facilitate data sharing, privacy preservation, and secure computations.
However, challenges remain in the adoption of decentralized machine learning using Blockchain. Scalability and computational efficiency are key concerns, as Blockchain networks can be slower and less efficient compared to traditional centralized systems. Additionally, ensuring the quality and reliability of the machine learning models in a decentralized environment requires robust mechanisms for model validation and consensus.
In conclusion, the market for decentralized machine learning using Blockchain is poised for growth as businesses and organizations recognize the benefits of combining these two technologies. Enhanced data privacy, transparency, and control, along with potential applications across various industries, make this a promising space for innovation and development. As the technology advances and the challenges above are addressed, decentralized machine learning using Blockchain has the potential to change how data is collected, analyzed, and used.
The Hewlett Packard Enterprise Development LP invention works as follows
Decentralized machine-learning to build models takes place at nodes that generate local training datasets. Blockchain platforms can be used to coordinate the decentralized machine-learning over several iterations. A distributed ledger can be used to coordinate nodes for each iteration. Smart contracts can be used to enforce node involvement in iterations of model building and sharing parameters, or provide logic for selecting a node as the master node. The master node receives model parameters from nodes, and then generates final parameters using the parameters obtained. The master node can write its state into the distributed ledger to indicate that the final parameters are now available. The distributed ledger allows each node to discover the state of the master node and then obtain the parameters and apply them to their local model.
Background for The system and method for decentralized machine learning using Blockchain
Building efficient models requires large volumes of data. Distributed computing was developed to coordinate large computing tasks using multiple computers. However, it is hard to apply distributed computing to large-scale machine learning (“ML”) problems. Many practical issues can arise in distributed model building. These include coordination and deployment problems, security concerns, the effects of system latency, fault tolerance, and parameter size. These and other issues can be handled in a data center, where computers are tightly controlled. However, moving model building outside the data center to truly decentralized environments presents additional challenges. In distributed computing environments, for example, access to large and sometimes private training datasets can be prohibitively difficult. Changes in the topology and size of the network over time also make coordination and real-time scaling difficult.
According to various embodiments, decentralized machine learning over a multitude of iterations can be coordinated and facilitated by a distributed ledger in a blockchain network with a number of physical computing nodes. Each node can enroll in the blockchain network and participate in a first iteration of training a machine-learned model. Each node can participate in a consensus decision to enroll another computing node in the first iteration. The consensus decision is valid only for the first iteration. It does not register that second computing node to participate in future iterations.
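The per-iteration enrollment decision can be sketched as a simple vote among the nodes already enrolled. This is a minimal illustration only: the names `Iteration` and `decide_enrollment`, the `approval_fraction` parameter, and the majority threshold are assumptions made for the sketch and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Iteration:
    """One training iteration and the nodes enrolled in it (hypothetical structure)."""
    number: int
    enrolled: set = field(default_factory=set)


def decide_enrollment(iteration, candidate, votes, approval_fraction=0.5):
    """Enroll `candidate` in `iteration` if enough enrolled nodes approve.

    `votes` maps a node id to True (approve) or False (reject). Votes from
    nodes that are not enrolled in this iteration are ignored. The decision
    changes only the given iteration, never a later one.
    """
    valid = [vote for node, vote in votes.items() if node in iteration.enrolled]
    if not valid:
        return False
    approved = sum(valid) / len(valid) > approval_fraction
    if approved:
        iteration.enrolled.add(candidate)
    return approved


# Usage: node "d" is voted into iteration 1 and must ask again for iteration 2.
first = Iteration(number=1, enrolled={"a", "b", "c"})
decide_enrollment(first, "d", {"a": True, "b": True, "c": False})
```

Because each `Iteration` keeps its own `enrolled` set, a decision made for one iteration cannot carry over to the next, which mirrors the scoping described above.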
Upon registration of a specified number of nodes necessary for an iteration, each node can obtain a local training dataset that is accessible locally but inaccessible at other nodes of the physical computing network. During the first iteration of training, each participant node can train a local model using the local training data and obtain at least one first training parameter. Each participant node can thus train on data that is local and that cannot or should not be shared with any other nodes. Each participant node can generate a blockchain transaction indicating that it is ready to share a first training parameter with a master node.
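To make the idea of local training concrete, here is a minimal sketch in which a node fits a one-parameter model to its own data and exposes only the learned parameter. The model (y = weight * x), the gradient-descent settings, and the function name `train_local` are all assumptions chosen for illustration; the patent does not prescribe a model type.

```python
def train_local(data, weight=0.0, learning_rate=0.01, epochs=200):
    """Fit y = weight * x to local (x, y) pairs with plain gradient descent.

    Only the learned weight is returned; the raw data never leaves the node.
    """
    if not data:
        raise ValueError("local training data is empty")
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in data) / n
        weight -= learning_rate * gradient
    return weight


# Usage: private local data stays here; only `first_training_parameter` is shared.
local_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
first_training_parameter = train_local(local_data)
```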
The master node can generate a new ledger block, to be added to each copy of the distributed ledger, when the first physical computing node indicates that it is ready to share its first training parameter. The master node can be chosen by consensus among the nodes participating in the iteration, or it can simply be the first node that enrolls. The master node can obtain training parameters from all participant nodes and create final parameters based on them. The master node can then broadcast to the blockchain network an indication that the final parameters are ready and relinquish its status as master node.
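The master node's role in one iteration can be sketched as follows. The text does not say how the final parameters are computed from the shared ones, so the element-wise average used here is an assumption, as are the class name `MasterNode`, the state label `FINAL_PARAMETERS_READY`, and the use of a plain list in place of a real distributed ledger.

```python
def merge_parameters(shared):
    """Merge parameter vectors from participant nodes into final parameters.

    `shared` maps a node id to a list of floats. Element-wise averaging is an
    assumption of this sketch, not a merge rule stated in the source text.
    """
    vectors = list(shared.values())
    if not vectors:
        raise ValueError("no parameters to merge")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("parameter vectors must have the same length")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


class MasterNode:
    """Hypothetical master node for a single iteration."""

    def __init__(self, node_id, ledger):
        self.node_id = node_id
        self.ledger = ledger  # a plain list standing in for the distributed ledger
        self.is_master = True

    def finish_iteration(self, iteration, shared):
        final = merge_parameters(shared)
        # Record that the final parameters are ready, then give up the master role.
        self.ledger.append({
            "iteration": iteration,
            "master": self.node_id,
            "state": "FINAL_PARAMETERS_READY",
            "final": final,  # simplification: a real system might store a location instead
        })
        self.is_master = False
        return final
```

Storing the merged values directly in the ledger entry keeps the sketch short; recording only where the parameters can be fetched would be an equally plausible design.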
Each participant node may obtain from the blockchain network one or more final parameters generated by the master node using the first training parameter as well as at least a second training parameter generated at, and shared from, the second computing node. Each participant node can then apply the one or more final parameters to its local model. Even nodes that did not take part in the current iteration can consult the distributed ledger for the latest training parameters. Decentralized machine learning can thus be scaled dynamically as nodes become available, with updated parameters provided to the nodes that join or become available.
Other features and aspects of the disclosed technology will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate, by way of example, features according to embodiments of the disclosed technology. The summary is not intended to limit the scope of any inventions described in this document, which are defined by the attached claims.
The disclosure concerns decentralized parallel machine learning at nodes over multiple iterations in a blockchain network. A blockchain network is a network in which nodes use, for example, a consensus method to update a distributed blockchain. The computing power of edge devices, which can act as nodes, may be used to train models. These nodes are sometimes called “edge” systems because they may be located at the edge of an IT infrastructure, where the real world interacts with large IT infrastructure. As an example, autonomous vehicles include multiple computing devices that can communicate with fixed server assets. More broadly, edge devices such as “Internet of Things” (or “IoT”) devices in contexts such as consumer electronics, appliances, and drones are becoming increasingly capable in both computation and networking. Another example is real-time traffic control in smart cities, which divert their data to data centers. As described here, these edge devices can instead be decentralized to increase efficiency and scale for parallel machine learning.
The disclosure allows model building to be pushed out to these and other nodes, addressing changes in input data patterns, scaling the system, and coordinating model building across nodes. Model building can be moved closer to where the data is generated or is otherwise accessible. This allows real-time analysis to take place at the site of data generation, without having to consolidate data in datacenters and deal with the problems that come with doing so. The need to consolidate all input data at a single physical location (a datacenter or “core”) is eliminated. As a result, the disclosed systems, methods, and non-transitory computer-readable storage media can reduce the time the model needs (e.g. model training time) to adapt to environmental changes and make more accurate predictions. Applications of the system can therefore be truly autonomous and decentralized, whether in an autonomous vehicle context or in other IoT or network-connected contexts.
Nodes within the blockchain network can enroll to take part in an iteration of model training. Not every node has to enroll in a given iteration, but all nodes can obtain the updated training parameters that the iteration produces. The nodes enrolled in an iteration of model training can collectively decide whether to enroll a node based on the state and/or credentials of the requesting node.
Each node enrolled in an iteration (also called a “participant node”) may train a local model using training data that is available locally but not at other nodes. The training data may, for example, contain sensitive or private information that should not be shared. However, training parameters derived from this data through machine learning can be shared. Once it has obtained training parameters, the node can broadcast an indication that it is ready to share them. The node can do this by creating a blockchain transaction that includes both the indication and information on where to obtain the training parameters (such as a Uniform Resource Identifier (URI) address). When some or all of the participant nodes are ready to share their training parameters, a master node (also known as the “master computing node”) may write the indications to the distributed ledger. One or more rules may define the minimum number of participant nodes that must be ready to share training parameters before the master node writes the indications. These rules may be encoded into a smart contract as described in this document.
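A readiness transaction and the minimum-participation rule can be sketched as below. The field names, the `READY_TO_SHARE` label, the helper names, and the example.org-style addresses are placeholders invented for this illustration; the text only states that the transaction carries the indication plus information on where to obtain the parameters.

```python
import time


def make_ready_transaction(node_id, iteration, parameters_uri):
    """Build a transaction saying `node_id` is ready to share its parameters.

    The field names are hypothetical. `parameters_uri` tells other nodes where
    the parameters can be fetched; the parameters themselves are not included.
    """
    return {
        "type": "READY_TO_SHARE",
        "node": node_id,
        "iteration": iteration,
        "parameters_uri": parameters_uri,
        "timestamp": time.time(),
    }


def master_may_write(pending, iteration, min_ready):
    """Return True once at least `min_ready` distinct nodes are ready for `iteration`."""
    ready_nodes = {
        tx["node"]
        for tx in pending
        if tx["type"] == "READY_TO_SHARE" and tx["iteration"] == iteration
    }
    return len(ready_nodes) >= min_ready


# Usage: with a threshold of two nodes, one ready node is not enough.
pending = [make_ready_transaction("a", 1, "https://node-a.example.org/params/1")]
can_write = master_may_write(pending, iteration=1, min_ready=2)
```

Counting distinct node ids rather than transactions means a node that announces readiness twice is not counted twice toward the threshold.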
Once the master node has merged the shared training parameters into final parameters, it indicates that the merge is complete and thereby releases its status as master node for the iteration. The next iteration will most likely, but not necessarily, select a new master node. Training can be iterated until the training parameters converge. Once the training parameters no longer converge, the iterations can be restarted.
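The "iterate until the parameters converge" loop can be sketched as follows. The convergence test (largest element-wise change below a tolerance), the default tolerance, and the function names are assumptions for illustration; the text does not define a convergence criterion.

```python
def has_converged(previous, current, tolerance=1e-3):
    """Treat parameters as converged when no element moved more than `tolerance`."""
    return max(abs(p - c) for p, c in zip(previous, current)) <= tolerance


def run_until_converged(run_iteration, initial, max_iterations=100, tolerance=1e-3):
    """Repeat `run_iteration(params) -> params` until the parameters converge.

    Returns the final parameters and the number of iterations that were run.
    `run_iteration` stands in for one full round of local training and merging.
    """
    params = initial
    for count in range(1, max_iterations + 1):
        updated = run_iteration(params)
        if has_converged(params, updated, tolerance):
            return updated, count
        params = updated
    return params, max_iterations


# Usage with a toy iteration that halves every parameter each round.
final_params, rounds = run_until_converged(lambda p: [x / 2 for x in p], [1.0])
```

A monitoring process could call `has_converged` again on later rounds and restart the loop when it starts returning False, which corresponds to restarting iterations once parameters no longer converge.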
Decentralized model building can be dynamically scaled as nodes become available. As an example, the system can continue to execute machine learning iterations at the nodes that are available even as autonomous vehicle computers go online (such as being in operation) or offline (such as being turned off). As vehicles come online, they can receive a current version of the ledger from other vehicles. This allows them to get the most recent parameters that were learned while the vehicle was off.
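How a node that was offline might catch up from the ledger can be sketched like this. The ledger is modeled as a list of dictionaries ordered oldest to newest, and the entry fields, the `FINAL_PARAMETERS_READY` label, and the `LocalModel` class are hypothetical names used only for this example.

```python
def latest_final_parameters(ledger):
    """Return (iteration, parameters) from the newest entry holding final parameters.

    Returns None when no iteration has completed yet.
    """
    for entry in reversed(ledger):
        if entry.get("state") == "FINAL_PARAMETERS_READY":
            return entry["iteration"], entry["final"]
    return None


class LocalModel:
    """Hypothetical local model that remembers which iteration its parameters came from."""

    def __init__(self):
        self.iteration = 0
        self.parameters = None

    def catch_up(self, ledger):
        """Apply the newest final parameters if they are newer than what the model holds."""
        latest = latest_final_parameters(ledger)
        if latest is not None and latest[0] > self.iteration:
            self.iteration, self.parameters = latest
            return True
        return False


# Usage: a vehicle that was off during iterations 1 and 2 jumps straight to iteration 2.
ledger = [
    {"iteration": 1, "state": "FINAL_PARAMETERS_READY", "final": [0.1]},
    {"iteration": 2, "state": "FINAL_PARAMETERS_READY", "final": [0.2]},
]
model = LocalModel()
model.catch_up(ledger)
```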
Furthermore, dynamic scaling does not cause degradation in model accuracy. Stale gradients can be avoided by using a distributed ledger to coordinate activities and smart contracts to enforce synchronization. The use of smart contracts and a decentralized ledger may also make the system more fault-tolerant. Node restarts and other downtime can be handled without losing model accuracy, because participant nodes are dynamically scaled and their learned parameters synchronized.
Furthermore, building applications that implement the ML models for experimentation is simplified, because a decentralized application can be agnostic to the network topology and to the role a node plays in the system.
FIG. 1 shows an example of a system 100 of decentralized machine learning using blockchain, according to an implementation. The system 100 can include a model-building blockchain network 110 (also referred to as blockchain network 110) that includes a plurality of physical computing nodes 10. The number, configuration, and connections among nodes 10 can vary. The arrangement of nodes 10 shown in FIG. 1 is therefore for illustrative purposes only. A node, such as node 10a, can be either a fixed or a mobile device. We will now describe some of the details of a particular node 10. While only one node 10 is shown in detail, all nodes can be configured as illustrated.
Node 10a can include one or more sensors 12, actuators 14, and/or other devices 16, as well as one or more processors 20 (also interchangeably called processor(s) 20 or processor 20 herein for convenience), one or more storage devices 40, and/or other components. Sensors 12, actuators 14, and/or other devices 16 can generate data that is accessible to the node 10 locally. This data may not be accessible to the other nodes 10 participating in the model-building blockchain network 110.
The storage device(s) 40 can store a distributed ledger 42, model(s) 44, and smart contract(s) 46. The distributed ledger 42 may contain a series of blocks of data, each of which refers to at least one other block, for example a previous block. The blocks of data can be linked together in this way. An example of a distributed ledger is given in the well-known paper “Bitcoin: A Peer-to-Peer Electronic Cash System” by Satoshi Nakamoto (bitcoin.org), whose contents are incorporated herein by reference. The distributed ledger 42 may store blocks indicating a state of the node 10a in relation to its machine learning during an iteration. The distributed ledger 42 can thus store an unalterable record of state transitions for a node 10. The distributed ledger 42 can store both the current and the historical state of a model 44. Note that, in some embodiments, a collection of records, smart contracts, and models from one or more other nodes (e.g. node(s) 10b-10g) can also be stored in the distributed ledger 42.
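The idea of blocks that refer to a previous block can be illustrated with a minimal hash-linked list. This sketch uses SHA-256 over a JSON encoding and invented field names (`index`, `previous_hash`, `payload`); it leaves out consensus, signatures, and networking, so it is not a description of the ledger used in the patent.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block's contents deterministically with SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain, payload):
    """Append a block that refers to the previous block by its hash."""
    previous = block_hash(chain[-1]) if chain else None
    block = {"index": len(chain), "previous_hash": previous, "payload": payload}
    chain.append(block)
    return block


def chain_is_valid(chain):
    """Check that every block still refers to the actual hash of the block before it."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


# Usage: record two state transitions for a node, then verify the links.
chain = []
append_block(chain, {"node": "10a", "state": "TRAINING"})
append_block(chain, {"node": "10a", "state": "READY_TO_SHARE"})
links_ok = chain_is_valid(chain)
```

Because each block stores the hash of its predecessor, editing an earlier block changes that hash and breaks the link held by the next block, which is what makes the recorded history of state transitions tamper-evident.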
Model 44 can be trained locally at a node 10 on data that is locally available, as described in this document, and then updated using model parameters learned at other participant nodes 10. The nature of the model 44 will depend on the particular implementation of the node 10, as noted elsewhere in this document. Model 44 may, for example, include trained parameters relating to: self-driving vehicle features, such as sensor data relating to object detection; dryer appliance features, such as drying times and controls; network configuration features; security features relating to network security, such as intrusion detection; and/or other context-based models.
The smart contracts 46 can include rules that govern how nodes behave in relation to decentralized machine learning. The rules can specify, for example, deterministic state transitions, how and when to elect a master node, when to start an iteration of machine learning, whether a node is permitted to enroll, the number of nodes required to agree on a consensus decision, or the percentage of voting nodes required to agree.
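The kinds of rules listed above can be sketched as a small, immutable rule set with helper checks. Everything here is illustrative: the `ContractRules` fields, the helper names, and the thresholds are assumptions, and a real smart contract would run on the blockchain platform rather than as local Python. The first-to-enroll master selection is one of the options the text mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractRules:
    """Hypothetical rule set a smart contract might encode for one network."""
    min_nodes_to_start: int    # nodes that must enroll before an iteration starts
    min_ready_to_merge: int    # nodes that must be ready before the master writes
    consensus_fraction: float  # share of voting nodes needed to approve a decision


def can_start_iteration(rules, enrolled_count):
    """An iteration may start once enough nodes have enrolled."""
    return enrolled_count >= rules.min_nodes_to_start


def consensus_reached(rules, approvals, voters):
    """A decision passes when the approving share meets the required fraction."""
    return voters > 0 and approvals / voters >= rules.consensus_fraction


def elect_master(enrolled_in_order):
    """Pick the first node that enrolled as master node."""
    if not enrolled_in_order:
        raise ValueError("no enrolled nodes")
    return enrolled_in_order[0]


# Usage with example thresholds.
rules = ContractRules(min_nodes_to_start=3, min_ready_to_merge=2, consensus_fraction=0.66)
master = elect_master(["10c", "10a", "10b"])
```

Marking the dataclass as frozen reflects the intent that every node evaluates the same fixed rules, so that state transitions stay deterministic across the network.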
The processors 20 can obtain data that is accessible locally to the node 10 but that may not be accessible to other participant nodes 10. Locally accessible data could include, for instance, private data that should not be shared, even though model parameters learned from that private data can still be shared.