Addressing Collective Computations Efficiency
Towards a Platform-level Reinforcement Learning Approach
Talk @ ACSOS 2022
🎤 Gianluca Aguzzi, Roberto Casadei, Mirko Viroli
📧 gianluca.aguzzi@unibo.it
Collective Adaptive Systems
Context
Many Networked Agents
Collective Behaviour
Challenges
- Distribution
- Uncertainties
- Global-to-local mapping
Motivating example
Distributed and resilient sensing
Use cases
- Monitoring wild fires in forests
- Crowd engineering
- Avalanche monitoring
- Agriculture monitoring system
How
Collective pattern
Building blocks for CAS applications
Gradient Cast
def diffuse(center, data, accumulator)
Broadcast information from sink nodes
Data collection
def collect(path, accumulator, data)
Collect data into sink nodes
Sparse Choice
def leaderElection(grain)
Distributed leader election
Self-organising & self-stabilising blocks
High-Level specification
General schema
val centers = leaderElection(grain)
val centers = leaderElection(grain)
val pathTo = diffuse(centers, 0, _ + connectionRange())
val centers = leaderElection(grain)
val pathTo = diffuse(centers, 0, _ + connectionRange())
val regionPerception = collect(pathTo, sense("perception"), accumulator)
val centers = leaderElection(grain)
val pathTo = diffuse(centers, 0, _ + connectionRange())
val regionPerception = collect(pathTo, sense("perception"), accumulator)
diffuse(centers, centerDecisionUsing(regionPerception), identity)
Execution Model
Behaviour
Loop of three phases:
Sense Compute Act
- Sense deals with neighborhood information retrieval & environment perceptions
- Compute performs the main logic of the application
- Act executes action following the result from Compute phase
Scheduling
- Message-based
- Execution triggered by messages received from the other
- Sensor-based
- Changes in the environment trigger execution
- Time-Based
- With a certain rate, nodes perform an execution step
- Event-based
On Collective Computing Efficiency
Goals
Approaches
On Collective Computing Efficiency
- Same Global specification (e.g., Distributed Sensing & Act)
Goals
Approaches
On Collective Computing Efficiency
- Same Global specification (e.g., Distributed Sensing & Act)
- Several Execution strategies
Goals
Approaches
On Collective Computing Efficiency
- Same Global specification (e.g., Distributed Sensing & Act)
- Several Execution strategies
- Collective Execution Strategies influence the Global result
Goals
Approaches
On Collective Computing Efficiency
- Same Global specification (e.g., Distributed Sensing & Act)
- Several Execution strategies
- Collective Execution Strategies influence the Global result
Goals
- Reduce power consumption Green computing
- Improve convergence time to the desired emergent
Multi-objective nature
Approaches
On Collective Computing Efficiency
- Same Global specification (e.g., Distributed Sensing & Act)
- Several Execution strategies
- Collective Execution Strategies influence the Global result
Goals
- Reduce power consumption Green computing
- Improve convergence time to the desired emergent
Multi-objective nature
Approaches
-
Rule-Based Schedulers
-
Reinforcement Learning-based Schedulers
On Collective Computing Efficiency
High Consumption, fast convergence
On Collective Computing Efficiency
Low Consumption, slow convergence
On Collective Computing Efficiency
Goal: adaptive solution
Contribution
Learning to reduce power consumption for self-stabilising building blocks (i.e., gradient-cast, data collection, sparse choice)
- Reference Learning Technique Value-Based Reinforcement Learning (e.g., Q-Learning)
- Reference Framework Aggregate Computing: A top-down global to local functional programming approach to express self-organising collective behaviours
- Good match for expressing our solution
- High potential-impact in the Aggregate Computing research
Contribution
Learning to reduce power consumption for self-stabilising building blocks (i.e., gradient-cast, data collection, sparse choice)
Benefits
- RL applied in the middleware – does not change the semantics of Aggregate Computing
- Separation between the Functionals aspect (i.e., the aggregate code) and Non-Functional aspects (i.e., the energy consumption)
Idea
- Time-based schedulers
- Learning how/when to reduce the round frequency …
- … Maintaining a good reactivity with respect to environmental changes
Learning Settings
Multi (Many) Agent System
- CASs foster systems with hundreds of computational entities
Homogenous Behaviour
- Complexity emerges from interactions
- Typical choice in swarm-like system
Centralised Training Decentralised Execution
- Global information during learning using the Q update
- distributed control at runtime state built from only local data
- learning eased by the global view
Learning Settings
For each time step, each agent record a $s_t, a_t, r_t$ trajectory, as:
- State: local output derivate
- Actions: next wake-up time
- Reward: near 0 if the output is stable, -1 in the other cases
Learning Settings
For each time step, each agent record a $s_t, a_t, r_t$ trajectory, as:
- State: local output derivate
- Window of size $w$ of local changes
- Each element could be
Rising, Stable, Decreasing
- Actions: next wake-up time
- Reward: near 0 if the output is stable, -1 in the other cases
Learning Settings
For each time step, each agent record a $s_t, a_t, r_t$ trajectory, as:
- State: local output derivate
- Actions: next wake-up time
- Fixed set:
[100ms, 200ms, 500ms, 1s]
- Reward: weight the next-wake time and the consumption
Learning Settings
For each time step, each agent record a $s_t, a_t, r_t$ trajectory, as:
- State: local output derivate
- Actions: next wake-up time
- Reward: near 0 if the output is stable, -1 in the other cases
- Multi-objective:
- non-stable condition: negative reward (-1)
- stable condition: weight the reward w.r.t. next wake up time – 0 if it is near to 1s
- $\theta$ weights the contribution of non-stability w.r.t. consumption
Simulation Configuration
Goal
- Learn to reduce power consumption
- Blocks: Gradient-cast and Data collection
- Maintain a good convergence time
- Dual: with the same power, learn to converge more quickly
- Verify if the policy generalizes (i.e., work in other deployments)
Training
Evaluation
Simulation Configuration
Goal
Training
- 100 nodes in place in a semi-random grid
- each simulation has a random seed
- performance verified in different conditions
Evaluation
Simulation Configuration
Goal
Training
- 100 nodes in place in a semi-random grid
- each simulation has a random seed
- performance verified in different conditions
- multiple seek nodes
- multiple environmental changes
Evaluation
Simulation Configuration
Goal
Training
Evaluation
Simulation Configuration
Goal
Training
Evaluation
- Control the performance by varying the nodes in the system
- Compare with a hand-mand policy as a reference
Result Overview
Conclusion
- The system eventually learns a policy that reduces the collective power consumption
- The same policy work for different deployments
Future Work
- Using scheduling policy as action
- Deep Learning approaches to encode local node features
- Using Multi-Objective Reinforcement Learning to better represent the problem space.
- Extends Learning to a fully online & distributed version