Almost exactly a year ago, we posted about how Ashutosh Saxena's lab at Cornell was teaching robots to use their "imaginations" to try to picture how a human would want a room organized. The research was successful: algorithms that used hallucinated humans (which are the best sort of humans) to influence the placement of objects performed significantly better than other methods. Cool stuff indeed, and now comes the next step: labeling 3D point clouds obtained from RGB-D sensors by leveraging hallucinated people as context.
A significant amount of research has investigated the relationships between objects and other objects. It's called semantic mapping, and it's very valuable in giving robots something like what we'd call "intuition" or "common sense." However, being humans, we tend to live human-centered lives, which means that the majority of our stuff tends to be human-centered too, and keeping this in mind can help put objects in context.
In the above case, a traditional semantic mapping algorithm might look at all of the objects on the desk and be able to figure out that it's a desk area, but some of the objects (like the bottle of water or the jacket) don't necessarily fit into the "desk" semantic category. When you imagine a human there, though, the scene starts to make more sense, because clothing and water often show up where humans tend to spend significant amounts of time.
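To get a feel for why the hallucinated human helps, here's a minimal sketch of the idea. Everything in it is a made-up illustration: the labels, the co-occurrence scores, and the `label_confidence` blending function are assumptions for demonstration, not the actual model from the Cornell paper.

```python
# Illustrative sketch: blending object-to-scene context with
# object-to-(hallucinated)-human context when labeling a desk scene.
# All scores below are invented for illustration.

# How strongly each label co-occurs with a "desk" scene on its own
desk_context = {"monitor": 0.9, "keyboard": 0.9, "water_bottle": 0.2, "jacket": 0.1}

# How strongly each label co-occurs with an imagined seated human
human_context = {"monitor": 0.6, "keyboard": 0.7, "water_bottle": 0.8, "jacket": 0.8}

def label_confidence(obj, w_desk=0.5, w_human=0.5):
    """Weighted blend of scene context and hallucinated-human context."""
    return w_desk * desk_context[obj] + w_human * human_context[obj]

for obj in desk_context:
    print(obj, round(label_confidence(obj), 2))
```

With desk context alone, the water bottle and jacket score poorly; adding the human term pulls their confidence up, which is the intuition behind hallucinating a person into the scene.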
The other concept to deal with is that of object affordances. An affordance is some characteristic of an object that allows a human to do something with it. For example, a doorknob affords opening a door, and the handle on a coffee cup affords picking it up and drinking from it. There's plenty to be learned about the function of an object from how a human uses it, but if you don't have a human handy to interact with the object for you, hallucinating one up out of nowhere can serve a similar purpose.
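One simple way affordances and hallucinated humans connect: an affordance like a cup's handle only matters if the object sits where the imagined person could actually use it. The sketch below scores candidate object positions by reachability from a hallucinated hand; the coordinates, reach radius, and scoring function are all assumptions made up for this example.

```python
# Illustrative sketch: scoring object placements by how usable they'd be
# for a hallucinated human. Positions and thresholds are invented.
import math

# Hallucinated hand position (x, y, z in meters) for a person seated at the desk
hand = (0.4, 0.7, 0.9)

def reachability_score(obj_pos, reach=0.6):
    """1.0 at the hand, falling linearly to 0.0 at the edge of reach."""
    d = math.dist(obj_pos, hand)
    return max(0.0, 1.0 - d / reach)

print(reachability_score((0.5, 0.8, 0.9)))  # near the imagined hand: high score
print(reachability_score((2.0, 0.7, 0.9)))  # across the room: scores zero
```

A coffee cup placed across the room scores zero no matter how graspable its handle is; one near the hallucinated hand scores high, capturing the idea that imagined human use puts object function in context.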