Affordances are the potential actions an agent can perform on an object, as observed by a
camera. Visual affordance prediction is formulated differently for tasks such as grasping
detection, affordance classification, affordance segmentation, and hand pose estimation. This
diversity in formulations leads to inconsistent definitions that prevent fair comparisons
between methods. In this paper, we propose a unified formulation of visual affordance prediction
by accounting for the complete information on the objects of interest and the interaction of the
agent with the objects to accomplish a task. This unified formulation allows us to
comprehensively and systematically review disparate visual affordance works, highlighting
strengths and limitations of both methods and datasets. We also discuss reproducibility issues,
such as the unavailability of method implementations and details of experimental setups, which
make benchmarks for visual affordance prediction unfair and unreliable. To foster transparency, we
introduce the Affordance Sheet, a document that details the solution, datasets, and validation
of a method, supporting future reproducibility and fairness in the community.