Human-robot interaction for assistive technologies relies on the prediction of affordances,
which are the potential actions a robot can perform on objects.
Predicting object affordances from visual perception is formulated differently across tasks such as
grasp detection, affordance classification, affordance segmentation, and hand-object
interaction synthesis. In this work, we highlight the reproducibility issues caused by these
differing formulations, which make comparative benchmarks unfair and unreliable. To address this problem, we
propose a unified formulation for visual affordance prediction, provide a comprehensive and
systematic review of previous work, highlighting the strengths and limitations of existing methods and
datasets, and analyse the factors that challenge reproducibility. To favour transparency, we introduce the
Affordance Sheet, a document that details the proposed solution, the datasets, and the validation.
As the physical properties of an object influence its interaction with the robot, we present a
generic framework that links visual affordance prediction to the physical world. Taking the
weight of an object as an example within this framework, we discuss how estimating the object's mass can
affect affordance prediction. Our approach bridges the gap between affordance perception and
robot actuation, and accounts for the complete information about the objects of interest and for how the
robot interacts with them to accomplish its task.