I try to minimize what this model requires from vision, so that it is more likely to apply to all vertebrates rather than only to the more advanced ones. Thus, the model requires neither depth perception nor any extensive recognition beyond distinguishing the front, side, and back of the perceived object.
I assume that the plumb line of gravity is known through direct, passive perception. This plumb line gives us the first dimension of a Cartesian frame of reference that is independent of our own present position or line of view. At right angles to this plumb line is the floor on which we stand, lie, kneel, etc. The observed floor may, of course, not be at right angles to the plumb line, because it may be sloped or hilly; it is the virtual floor at right angles to the plumb line that gives us the second and third dimensions of our Cartesian framework. Let us assume that we can determine the front (or back) of the actor or animal by looking at the eyes, the left-right symmetry, etc. Alternatively, we should be able to determine the side from the profile. The front and the side can then be represented by virtual lines drawn on the virtual floor, with the front-to-back line at right angles to the side-to-side line. These two lines, together with the plumb line, yield a 3D Cartesian framework that we can use to describe the pose of the actor or animal.
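The construction above can be sketched numerically. This is a minimal sketch, not part of the model as stated: it assumes (hypothetically) that the plumb line is available as a gravity vector and the perceived front of the actor or animal as a direction vector, both expressed in some arbitrary sensor frame. The front direction is projected onto the virtual floor, so a sloped or hilly observed floor does not tilt the frame, and the side-to-side line is computed as a cross product, which enforces the right angle rather than estimating it from the profile. The function names `build_frame` and `to_frame` are illustrative only.

```python
import math

Vector = tuple  # a 3-tuple of floats (x, y, z) in an arbitrary sensor frame


def dot(a: Vector, b: Vector) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vector, b: Vector) -> Vector:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def scale(a: Vector, k: float) -> Vector:
    return (a[0] * k, a[1] * k, a[2] * k)


def sub(a: Vector, b: Vector) -> Vector:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def normalize(a: Vector) -> Vector:
    return scale(a, 1.0 / norm(a))


def build_frame(gravity: Vector, observed_front: Vector):
    """Hypothetical helper: return orthonormal (up, front, side) axes.

    up    -- the plumb line, pointing opposite to gravity
    front -- the observed front direction projected onto the virtual floor
    side  -- at right angles to both (points to the actor's left, since
             (front, side, up) is right-handed)
    """
    up = normalize(scale(gravity, -1.0))
    # Remove the vertical component: this projects the front onto the
    # virtual floor, regardless of the slope of the observed floor.
    horizontal = sub(observed_front, scale(up, dot(observed_front, up)))
    if norm(horizontal) < 1e-9:
        raise ValueError("front direction is parallel to the plumb line")
    front = normalize(horizontal)
    side = cross(up, front)  # unit length because up and front are orthonormal
    return up, front, side


def to_frame(vector: Vector, frame) -> Vector:
    """Express a sensor-frame vector as (front, side, up) coordinates."""
    up, front, side = frame
    return (dot(vector, front), dot(vector, side), dot(vector, up))
```

For example, `build_frame((0.0, 0.0, -9.81), (1.0, 0.0, 0.5))` models an animal facing along x while standing on an upward slope: the vertical tilt of its front is discarded, giving `up = (0, 0, 1)`, `front = (1, 0, 0)`, and `side = (0, 1, 0)`. Any pose feature (for instance, the direction of a limb) can then be passed to `to_frame` and described independently of the observer's own position or line of view.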