Embodied Intelligence via Learning and Evolution

Gupta et al., 2021

Summary

  • The relationship between environmental complexity, evolved morphology, and learnability of intelligent control is not well understood
  • Deep Evolutionary Reinforcement Learning (DERL) evolves diverse agent morphologies to learn locomotion and manipulation tasks in complex environments using egocentric sensory information
  • Demonstrates that environmental complexity fosters the evolution of morphological intelligence
    • Evolution selects for morphologies that learn faster (a morphological Baldwin effect), owing to better physical stability and energy efficiency

Background

  • Animals exhibit high degrees of embodied intelligence by leveraging their morphologies to solve complex tasks
    • In contrast, AI has generally focused on disembodied cognition
  • Artificial evolution of morphologies is difficult:
    • Combinatorially large number of possible morphologies
    • Evaluating fitness requires significant compute, since each morphology must learn a controller within its lifetime
  • DERL enables scaling along three axes of complexity: environmental, morphological, and control
    • Mimics the interplay between Darwinian evolution over generations and neural learning within a lifetime
  • Previous evolutionary simulations used generational evolution, which scales poorly because selection can proceed only after every individual in a generation has been trained
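
A toy scheduling model makes the scaling argument concrete. It is a sketch under loud assumptions: the training times are hypothetical numbers, each individual occupies one worker, and the dependence of children on their parents' results is ignored. Generational evolution waits at a barrier for the slowest individual in each generation, while an asynchronous scheme hands a worker new work as soon as it is free.

```python
import heapq
from typing import Sequence


def generational_makespan(times: Sequence[float], workers: int) -> float:
    """Wall-clock time when individuals are trained in generations of size
    `workers` and each generation waits for its slowest member (a barrier)."""
    total = 0.0
    for start in range(0, len(times), workers):
        total += max(times[start:start + workers])
    return total


def asynchronous_makespan(times: Sequence[float], workers: int) -> float:
    """Wall-clock time when each worker starts the next individual as soon as
    it is free (greedy list scheduling, no generational barrier)."""
    free_at = [0.0] * workers  # time at which each worker becomes free (a min-heap)
    finish = 0.0
    for t in times:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + t
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish
```

With alternating training times of 1 and 4 units on 2 workers (`[1, 4] * 4`), the generational schedule takes 16 units versus 11 for the asynchronous one; the ideal lower bound is 20 / 2 = 10.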

Methods

  • DERL uses asynchronous, tournament-based evolution with tournaments of four agents
  • Each agent receives egocentric proprioceptive and exteroceptive observations, and its controller policy is learned with PPO
    • Proprioceptive observations: joint angles and angular velocities; head velocity, acceleration, and angular acceleration; and touch sensors on the limbs and head
    • Exteroceptive observations: local terrain profile, goal location, and positions of objects and obstacles
    • The controller reward combines forward velocity with a small penalty on large torques, but fitness is based only on forward progress
  • UNIMAL: UNIversal aniMAL morphological design space that is expressive yet controllable
    • Kinematic tree genotype corresponding to a hierarchy of 3D rigid parts connected by motor-actuated hinge joints
    • Three classes of mutations:
      • Grow or delete limbs
      • Modify physical properties of existing limbs (e.g. length or density)
      • Modify properties of joints (e.g. DoF, limits of rotation, or gear ratios)
    • Bilateral symmetry is preserved through paired mutations, which keeps the center of mass on the sagittal plane
  • Three levels of environmental complexity: flat terrain (FT), variable terrain (VT), and non-prehensile manipulation in variable terrain (MVT)
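
The tournament step can be sketched as follows. Only the tournament of four comes from the notes above; the rest is an assumption: the fittest contender becomes the parent, and the population size is kept constant by removing its oldest member (an aging rule in the style of regularized evolution, which may differ from DERL's exact replacement rule). `mutate` and `train_and_evaluate` are hypothetical placeholders; in DERL the latter would be a full PPO training run. The sketch is sequential: the asynchrony would come from many workers running `evolution_step` concurrently on a shared population without waiting for one another.

```python
import random
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Callable

_birth_counter = count()  # global creation order, used by the assumed aging rule


@dataclass
class Individual:
    genotype: Any
    fitness: float
    birth: int = field(default_factory=lambda: next(_birth_counter))


def tournament_select(population: list[Individual], rng: random.Random, k: int = 4) -> Individual:
    """Sample k distinct individuals and return the fittest one."""
    contenders = rng.sample(population, k)
    return max(contenders, key=lambda ind: ind.fitness)


def evolution_step(
    population: list[Individual],
    mutate: Callable[[Any, random.Random], Any],
    train_and_evaluate: Callable[[Any], float],
    rng: random.Random,
    k: int = 4,
) -> Individual:
    """One worker iteration: select a parent, mutate it, train the child,
    insert it, and (assumed aging rule) drop the oldest member."""
    parent = tournament_select(population, rng, k)
    child_genotype = mutate(parent.genotype, rng)
    child = Individual(child_genotype, train_and_evaluate(child_genotype))
    population.append(child)
    oldest = min(range(len(population)), key=lambda i: population[i].birth)
    population.pop(oldest)
    return child
```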
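
The split between the training reward and the evolutionary fitness can be illustrated with a minimal sketch. The quadratic form of the torque penalty and the weight `ctrl_cost_weight` are assumptions (a common choice in MuJoCo locomotion tasks), not values from the paper; fitness is taken here as net displacement along the forward axis.

```python
from typing import Sequence


def step_reward(forward_velocity: float, torques: Sequence[float], ctrl_cost_weight: float = 1e-3) -> float:
    """Per-step PPO reward: forward velocity minus a small (assumed quadratic) torque penalty."""
    control_cost = ctrl_cost_weight * sum(t * t for t in torques)
    return forward_velocity - control_cost


def episode_fitness(forward_positions: Sequence[float]) -> float:
    """Fitness used for selection: forward progress only, with no torque penalty."""
    return forward_positions[-1] - forward_positions[0]
```

Two agents that cover the same distance therefore receive the same fitness even if one of them applies far larger torques, although that agent earned a lower reward during training.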
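
A UNIMAL-style genotype and its three mutation classes can be sketched as a kinematic tree of limbs, each attached to its parent by an actuated joint. This is a simplified stand-in, not the UNIMAL implementation: the classes `Limb` and `Joint`, the parameter ranges, the multiplicative perturbations, and the toggle between 1 and 2 degrees of freedom are all hypothetical. Real UNIMAL limbs also carry orientation and placement information omitted here, and this sketch ignores the paired (symmetric) application of mutations.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Joint:
    # Hypothetical defaults, not UNIMAL's actual values.
    axes: tuple[str, ...] = ("y",)               # hinge axes; len(axes) is the DoF
    limits: tuple[float, float] = (-30.0, 30.0)  # rotation range in degrees
    gear: float = 150.0                          # actuator gear ratio


@dataclass
class Limb:
    length: float
    density: float
    joint: Joint = field(default_factory=Joint)  # joint to the parent (unused on the root)
    children: list["Limb"] = field(default_factory=list)


def iter_limbs(root: Limb):
    """Depth-first traversal of the kinematic tree, root first."""
    stack = [root]
    while stack:
        limb = stack.pop()
        yield limb
        stack.extend(limb.children)


def grow_or_delete(root: Limb, rng: random.Random) -> None:
    """Mutation class 1: attach a new leaf limb, or remove an existing leaf."""
    leaves = [(p, c) for p in iter_limbs(root) for c in p.children if not c.children]
    if leaves and rng.random() < 0.5:
        parent, leaf = rng.choice(leaves)
        # Only leaves are removed, so the tree stays connected and the root survives.
        parent.children = [c for c in parent.children if c is not leaf]
    else:
        parent = rng.choice(list(iter_limbs(root)))
        parent.children.append(Limb(length=rng.uniform(0.2, 0.4), density=rng.uniform(500.0, 1000.0)))


def mutate_limb_properties(root: Limb, rng: random.Random) -> None:
    """Mutation class 2: perturb the length or density of one limb."""
    limb = rng.choice(list(iter_limbs(root)))
    if rng.random() < 0.5:
        limb.length *= rng.uniform(0.8, 1.2)
    else:
        limb.density *= rng.uniform(0.8, 1.2)


def mutate_joint_properties(root: Limb, rng: random.Random) -> None:
    """Mutation class 3: change the DoF, rotation limits, or gear ratio of one joint."""
    non_root = [limb for limb in iter_limbs(root) if limb is not root]
    if not non_root:
        return  # the root has no joint to its parent, so there is nothing to mutate
    joint = rng.choice(non_root).joint
    which = rng.randrange(3)
    if which == 0:
        joint.axes = ("y",) if len(joint.axes) == 2 else ("y", "z")  # toggle 1 <-> 2 DoF
    elif which == 1:
        span = rng.choice([30.0, 45.0, 60.0, 90.0])
        joint.limits = (-span, span)
    else:
        joint.gear = rng.choice([100.0, 150.0, 200.0, 250.0])


MUTATION_CLASSES = [grow_or_delete, mutate_limb_properties, mutate_joint_properties]


def mutate(root: Limb, rng: random.Random) -> None:
    """Apply one mutation in place, choosing uniformly among the three classes."""
    rng.choice(MUTATION_CLASSES)(root, rng)
```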
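
Why paired mutations keep the center of mass on the sagittal plane can be checked numerically: the lateral coordinate of the center of mass is sum(m_i * y_i) / sum(m_i), and each mirror pair contributes m * y + m * (-y) = 0. The sketch below deliberately flattens the body into a list of point masses; the `Part` class and all numbers are hypothetical, and the real UNIMAL tree is more structured. Every limb added off the plane comes with its mirror image, and every property change is applied to both members of a pair.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Part:
    mass: float
    x: float  # forward axis
    y: float  # lateral axis; the plane y == 0 is the sagittal plane
    z: float  # vertical axis


def mirror(part: Part) -> Part:
    """Reflect a part through the sagittal plane."""
    return replace(part, y=-part.y)


def grow_paired(body: list[Part], limb: Part) -> list[Part]:
    """Add a limb together with its mirror image; a limb on the plane is added alone."""
    return body + ([limb] if limb.y == 0 else [limb, mirror(limb)])


def scale_mass_paired(body: list[Part], i: int, factor: float) -> list[Part]:
    """Scale the mass of part i and of its mirror partner by the same factor.
    Parts are matched by value, which is enough for this toy body."""
    pair = (body[i], mirror(body[i]))
    return [replace(p, mass=p.mass * factor) if p in pair else p for p in body]


def center_of_mass(body: list[Part]) -> tuple[float, float, float]:
    """Mass-weighted mean position of all parts."""
    total = sum(p.mass for p in body)
    return tuple(sum(p.mass * getattr(p, axis) for p in body) / total for axis in ("x", "y", "z"))
```

Appending a single unpaired part with nonzero `y` to the same body moves the lateral center of mass off zero, which is exactly what the paired mutations prevent.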

Results

  • Each evolutionary run spans on average 10 generations and 4000 morphologies, with each morphology trained for 5 million agent-environment interactions
  • The relatively high average fitness of the initial population indicates the efficacy of the UNIMAL design space
  • Asynchronous parallel tournaments in DERL enable ancestors with lower initial fitness to still contribute highly fit descendants to the final population
  • Assessing morphological intelligence
    • Eight tasks divided into three domains: agility, stability, and manipulation
    • Controllers are learned from scratch for each task, so differences in performance can be attributed to morphology
    • Agents evolved in MVT outperformed those evolved in FT in seven of the eight tasks; agents evolved in VT outperformed FT agents in agility and stability but performed similarly in manipulation, indicating that complex environments promote morphological intelligence
  • Morphological Baldwin effect: the learning time needed to reach a given level of fitness decreases over generations
    • Evolution selects for morphologies with better passive stability and energy efficiency, which enables better and faster learning
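
One way to quantify the morphological Baldwin effect is to record each agent's learning curve, find how many interactions it needs to first reach a fixed fitness threshold, and average that over agents from the same generation; the effect shows up as this average falling across generations. The threshold, the curve format (pairs of interaction count and fitness), and the choice to skip agents that never reach the threshold are assumptions of this sketch, not the paper's exact protocol.

```python
from statistics import mean
from typing import Optional, Sequence

Curve = Sequence[tuple[int, float]]  # (agent-environment interactions, fitness) checkpoints


def interactions_to_threshold(curve: Curve, threshold: float) -> Optional[int]:
    """Interactions at the first checkpoint where fitness reaches the threshold, or None."""
    for interactions, fitness in curve:
        if fitness >= threshold:
            return interactions
    return None


def mean_learning_time_by_generation(
    curves_by_generation: dict[int, list[Curve]], threshold: float
) -> dict[int, Optional[float]]:
    """Average time-to-threshold per generation; agents that never reach it are skipped."""
    result: dict[int, Optional[float]] = {}
    for generation in sorted(curves_by_generation):
        times = [
            t
            for curve in curves_by_generation[generation]
            if (t := interactions_to_threshold(curve, threshold)) is not None
        ]
        result[generation] = mean(times) if times else None
    return result
```

Skipping agents that never reach the threshold biases the average toward fast learners, so it is worth also reporting how many agents were skipped in each generation.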

Conclusion

  • Large-scale evolutionary simulations by DERL yield insights into how the interaction between learning, evolution, and environmental complexity can lead to morphological intelligence
  • Performance appears to still be increasing at the end of lifetime learning (5 million environment interactions), which confounds selection pressure for final performance with selection pressure for learning speed
  • Would be interesting to further investigate the various design choices (morphological design space, evolution hyperparameters, environments, etc.)
  • Morphological intelligence is just one example of useful information that is encoded in the genome
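
A toy calculation illustrates the confound between final performance and learning speed. Assume, hypothetically, that learning curves saturate exponentially, f(t) = a * (1 - exp(-t / tau)), with asymptote a and time constant tau. When fitness is measured at a fixed budget of 5 million interactions, a morphology that learns quickly toward a lower asymptote can outscore one that would eventually perform better. The curve shape and all parameter values are invented for illustration.

```python
import math


def saturating_curve(t: float, asymptote: float, tau: float) -> float:
    """Hypothetical learning curve: fitness after t interactions."""
    return asymptote * (1.0 - math.exp(-t / tau))


BUDGET = 5_000_000  # lifetime learning budget from the notes

slow_but_better = saturating_curve(BUDGET, asymptote=1.0, tau=4_000_000)
fast_but_worse = saturating_curve(BUDGET, asymptote=0.9, tau=1_000_000)

# At this budget the fast learner scores higher despite its lower asymptote.
print(f"fitness at budget: slow={slow_but_better:.3f}, fast={fast_but_worse:.3f}")
```

Here the slow learner reaches about 0.71 and the fast learner about 0.89 at the budget, even though the slow learner's asymptote is higher, so selection at a fixed budget rewards learning speed as well as final performance.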
Elias Z. Wang
AI Researcher | PhD Candidate