Visual Affordances: Enabling Robots to Understand Object Functionality

Istituto Italiano di Tecnologia
Queen Mary University of London
Idiap Research Institute, École Polytechnique Fédérale de Lausanne

Abstract

Human-robot interaction for assistive technologies relies on the prediction of affordances, which are the potential actions a robot can perform on objects. Predicting object affordances from visual perception is formulated differently across tasks such as grasp detection, affordance classification, affordance segmentation, and hand-object interaction synthesis. In this work, we highlight the reproducibility issues arising from these different formulations, which make comparative benchmarks unfair and unreliable. To address this problem, we propose a unified formulation for visual affordance prediction, provide a comprehensive and systematic review of previous works highlighting the strengths and limitations of methods and datasets, and analyse what challenges reproducibility. To favour transparency, we introduce the Affordance Sheet, a document that details the proposed solution, the datasets, and the validation. As the physical properties of an object influence the interaction with the robot, we present a generic framework that links visual affordance prediction to the physical world. Using the weight of an object as an example for this framework, we discuss how estimating object mass can affect the affordance prediction. Our approach bridges the gap between affordance perception and robot actuation, and accounts for the complete information about objects of interest and how the robot interacts with them to accomplish its task.

Highlights

  • Unified problem definition of visual affordance and systematic review
  • Physically-based framework for human-robot collaboration
  • Reproducibility issues in visual affordance datasets and benchmarks
  • Affordance Sheet to favour reproducibility
  • Open challenges and future directions of visual affordance for robotics

Unified affordance problem formulation

We unify the formulation for visual affordance prediction across tasks that were treated separately or appeared disconnected in previous works and surveys. Our framework decomposes the task of the robot into the following subtasks and related components (a minimal code sketch follows the list):

  1. Localises the object of interest (object localisation).
  2. Predicts the actions for each localised object (functional classification).
  3. Predicts the object regions that enable the action to be performed (functional segmentation).
  4. Estimates the end-effector pose on the object, given the end-effector model and the previously extracted information (end-effector pose estimation).
  5. Renders the end-effector on the RGB(D) image (end-effector synthesis).
  6. Reaches the estimated target pose, taking into account the desired visual outcome conveyed by the rendered end-effector pose (robot control).
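A minimal sketch of how these subtasks could be chained is given below, assuming each model is available as a callable. All names (predict_affordances, AffordancePrediction, the detector, classifier, segmenter, pose_estimator and renderer callables) are illustrative placeholders, not an implementation released with this work.

```python
# Hypothetical skeleton of the six-step pipeline; model callables are assumed.
from dataclasses import dataclass
from typing import List

import numpy as np


def crop_image(image, box):
    """Crop an (H, W, C) image array given a bounding box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return image[y1:y2, x1:x2]


@dataclass
class AffordancePrediction:
    box: tuple                 # object bounding box (x1, y1, x2, y2)
    actions: List[str]         # predicted functional classes, e.g. ["grasp", "pour"]
    masks: np.ndarray          # per-action affordance masks, shape (A, H, W)
    pose: np.ndarray           # 4x4 end-effector pose on the object
    rendering: np.ndarray      # RGB(D) image with the end-effector rendered in


def predict_affordances(rgbd, end_effector_model,
                        detector, classifier, segmenter, pose_estimator, renderer):
    """Run subtasks 1-5 for each localised object; subtask 6 (robot control)
    would consume the returned poses and renderings."""
    predictions = []
    for box in detector(rgbd):                                       # 1. object localisation
        obj_crop = crop_image(rgbd, box)
        actions = classifier(obj_crop)                               # 2. functional classification
        masks = segmenter(obj_crop, actions)                         # 3. functional segmentation
        pose = pose_estimator(obj_crop, masks, end_effector_model)   # 4. end-effector pose estimation
        rendering = renderer(rgbd, end_effector_model, pose)         # 5. end-effector synthesis
        predictions.append(AffordancePrediction(box, actions, masks, pose, rendering))
    return predictions
```

In this reading, each previous redefinition of visual affordance prediction corresponds to stopping the pipeline at a different subtask and evaluating only its output.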


Affordance formulation

Visual affordance redefinitions in previous works (coloured arrows): affordance classification, affordance detection and segmentation, affordance grounding, hand-object pose estimation, and hand-object synthesis.

Physically-based framework for human-robot collaboration

We present a framework that relates the mass of an object to the segmentation of its affordance regions. Detection methods localise the objects, and the resulting object crops are used as input to other specialised models. The object of interest is selected based on the specific purpose of the interaction, e.g. taking the object that is held by a human. A model specialised in mass estimation predicts the weight of the object, and a model specialised in affordance segmentation predicts the regions of interaction. The two independent predictions can be fused to support movement planning and the adjustment of the robotic hand pose and force during the interaction (robot control).
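The sketch below illustrates one way such a fusion could be parameterised, assuming a mass estimator and an affordance segmenter are available as callables; the function name plan_interaction, the GRASP_LABEL index, and the friction-based force heuristic are assumptions for illustration, not components of a released system.

```python
# Minimal sketch: fuse independent mass and affordance predictions into grasp
# parameters for the robot controller. All names are illustrative placeholders.
import numpy as np

GRASP_LABEL = 1  # illustrative label index for the "grasp" affordance class


def crop(image, box):
    """Crop an image array given a bounding box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return image[y1:y2, x1:x2]


def plan_interaction(rgb, depth, box, mass_estimator, affordance_segmenter,
                     friction_coeff=0.5, gravity=9.81, safety_margin=1.5):
    """`box` is the object of interest already selected from the detections
    (e.g. the object held by the human)."""
    crop_rgb, crop_depth = crop(rgb, box), crop(depth, box)

    mass_kg = mass_estimator(crop_rgb, crop_depth)                 # specialised mass model
    grasp_mask = affordance_segmenter(crop_rgb) == GRASP_LABEL     # graspable pixels

    # Simple physics heuristic: friction at two fingertip contacts must
    # support the object's weight, with a safety margin.
    grip_force_n = safety_margin * mass_kg * gravity / (2.0 * friction_coeff)

    # Grasp centre in the crop, taken as the centroid of the graspable region.
    ys, xs = np.nonzero(grasp_mask)
    grasp_centre = (float(xs.mean()), float(ys.mean())) if xs.size else None

    return {"mass_kg": mass_kg, "grip_force_n": grip_force_n,
            "grasp_centre_px": grasp_centre, "grasp_mask": grasp_mask}
```

Keeping the two predictions independent, as in this sketch, means the mass estimate constrains the applied force while the segmentation constrains where the end-effector makes contact, and either module can be replaced without retraining the other.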


Physically-based framework

Reproducibility in visual affordance

Reproducibility challenges (RCs) in different redefinitions of visual affordance prediction include: data availability for benchmarking (RC1); availability of a method's implementation (RC2); availability of the trained model (RC3); details of the experimental setups (RC4); and details of the performance measures used for the evaluation (RC5).


Reproducibility challenges

Affordance sheet

To promote reproducibility in affordance prediction and address the challenges above, we propose the Affordance Sheet, an organised collection of good practices that facilitates fair comparisons and the development of new solutions (see the table below). Our Affordance Sheet includes Model Cards and adds sections that complement the released information.


Affordance sheet
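As an illustration, the entries of such a sheet could be captured in a simple structured record. The field names below are hypothetical, chosen to map onto the reproducibility challenges RC1-RC5, and do not reproduce the exact template of the paper; angle-bracket values are placeholders to be filled by the authors of a method.

```python
# Hypothetical sketch of an Affordance Sheet as a structured record.
# Field names are illustrative; <...> values are placeholders.
affordance_sheet = {
    "model": {
        "task": "affordance segmentation",
        "architecture": "<model architecture>",
        "code_url": "<link to implementation>",            # RC2: implementation availability
        "weights_url": "<link to trained model>",          # RC3: trained model availability
        "license": "<license>",
    },
    "data": {
        "training_sets": ["<dataset name and version>"],   # RC1: data availability
        "test_sets": ["<dataset name and version>"],
        "annotation_protocol": "<how affordance regions were labelled>",
    },
    "experimental_setup": {                                 # RC4: experimental setup details
        "train_val_test_split": "<exact splits or split files>",
        "hyperparameters": {"epochs": "<n>", "learning_rate": "<lr>"},
        "hardware": "<GPU model and memory>",
        "random_seeds": ["<seed values>"],
    },
    "evaluation": {                                         # RC5: performance measures
        "measures": ["<measure names, e.g. IoU>"],
        "measure_definitions": "<formulas or reference implementation>",
    },
}
```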