One of the ironies of the upcoming Mobile World Congress in Barcelona is that the handset itself may get overlooked, despite the start of a revolution in mobile imaging and intelligent vision processing that will forever change how handsets are perceived by users and architected by designers.
True to MWC’s promise, the lights will shine brightly on keynotes and sessions dedicated to 5G networks, e-commerce, mobile content, gaming, wearables, user security and privacy, NFC, money-making opportunities, and thousands of unique and novel apps.
Amid the high-level dealings, a clear shout-out goes on Wednesday at 2 pm to “The Explosion of Imaging,” and what an explosion it is. The single hour dedicated to the topic is ambitious, to say the least, given its intent to cover everything from the impact of photography on device design to services and photo privacy over networks. Still, it calls attention to a major trend that has already reshaped smartphone architecture and design.
Spurring the imaging explosion is users’ demand to capture and share images and videos spontaneously, combined with the desire for a single device that handles both communications and imaging. The latter has created a pull for ever-higher image quality to compete with digital SLR cameras, as well as seemingly innocuous but useful features such as digital video stabilization, HDR, and photo stitching.
More recently, however, the requirements put upon mobile and embedded devices have started to increase exponentially. The pull for even better image quality has led to dual-camera designs, which let smartphones estimate the depth, or distance, of objects within the image. HTC led the way with the One M8 and its Duo Camera, which used the two sensors to generate depth for features such as refocus, along with other image-enhancement functions. Huawei and others followed, expanding those imaging capabilities into low-light shots and fast autofocus, and now there is a rumor that Apple may introduce a dual-camera iPhone.
The introduction of 3D imaging using two dedicated sensors and image-processing chains opens the door to full 3D vision, advanced computational photography, and visual perception: terms that are not new from a technology point of view, but are relatively new to power- and space-constrained mobile devices.
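The core idea behind dual-camera depth is stereo disparity: the same object appears at slightly different horizontal positions in the two sensors, and the size of that shift encodes distance. Here is a minimal, pure-NumPy block-matching sketch of that principle; it is an illustration only, not any vendor's actual (heavily optimized, hardware-accelerated) pipeline, and the images are synthetic.

```python
import numpy as np

def block_match_disparity(left, right, block=5, max_disp=8):
    """Naive block matching: for each pixel, find the horizontal shift
    that best aligns a small patch between the two camera views.
    Larger disparity means the object is closer to the cameras."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [
                np.sum(np.abs(patch - right[y - half:y + half + 1,
                                            x - d - half:x - d + half + 1]))
                for d in range(max_disp)
            ]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic rectified pair: a bright square, shifted 4 px between views.
left = np.zeros((32, 64))
left[12:20, 30:40] = 255.0
right = np.roll(left, -4, axis=1)

d = block_match_disparity(left, right)
# Around the square's edges the recovered disparity is the true 4 px shift.
```

Real dual-camera systems do the same search with calibrated, rectified sensors and far more robust cost functions, which is precisely the kind of regular, data-parallel workload that dedicated imaging silicon handles well.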
When push comes to shove in the highly competitive smartphone landscape, where differentiation separates success from failure, designers have for the most part met next-generation device requirements by leveraging advanced process nodes. But process advances have slowed over the past few years: the industry has held steady at 28 nm, with relatively few vendors having access to smaller geometries. And battery capacity continues to improve only incrementally, by percentages in the low single digits.
Yet processing requirements are set to increase exponentially. Recent work from IBM, Microsoft, Google, and others in human-vision imitation, augmented reality (AR), and holographic imaging shows clearly that these processing-intensive features are developing quickly and will soon migrate to mobile and embedded systems. They promise an exciting era of vision with great potential, but at least two major issues need to be addressed head on.
Firstly, while the algorithms at the root of these capabilities are well known and established, they have not been adapted for mobile or embedded systems. Sure, we have seen embedded implementations of Haar cascades and ORB features in mobile computing platforms, but what about convolutional neural networks (CNNs) for deep learning? These have worked well on desktops and for cloud processing of acquired images, but it is no mean feat to take human-vision and intelligent-processing algorithms and apply them in the mobile domain, with its limited power and space. Companies have shown demonstrations, but none have yet emerged as products.
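A back-of-envelope count makes the mobile CNN problem concrete. The dominant cost of a convolutional layer is its multiply-accumulate (MAC) operations, and even one layer on one frame is large. The layer dimensions below are illustrative, chosen for this sketch rather than taken from any specific network.

```python
def conv2d_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count for one k x k convolutional layer
    applied at every position of an h x w input with c_in channels,
    producing c_out output channels (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k

# Illustrative example: a single 3x3 layer on a 224x224 RGB frame
# with 64 output channels.
macs = conv2d_macs(224, 224, 3, 64, 3)
print(macs)  # 86,704,128 -- roughly 87 million MACs for one layer, one frame
```

Multiply that by tens of layers and a 30 fps video stream and the budget reaches billions of MACs per second, which is why running such networks on a general-purpose mobile CPU within a phone's power envelope is so hard, and why the cloud has carried this workload so far.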
Secondly, these additional features and heavy algorithms are being added to handsets built on the same basic architectures (CPU, GPU, DSP) that in many cases have seen only minor technological upgrades. The lifespan of this approach may be limited.
Another approach is to acknowledge the need for the CPU and GPU, but redefine the DSP role. Gone are the days when a full-fledged DSP, with its flexibility and programmability, was the right tool for general-purpose signal processing tasks. The greatest opportunity for innovation lies in developing specific processor IP to handle 3D imaging and intelligent vision functions. To be effective, these processors need to be extremely low power and highly optimized for the task at hand. Applied on a broader scale, this could be considered a ‘swarm’ approach to signal processing, in which processor IP is pervasive on a device but harnessed and intelligently applied to optimize power, space, programmability, and cost for specific applications, imaging and vision being a good example.
Expanding upon this idea in light of the emerging Internet of Things, the macro version could be embedded devices themselves acting as swarm processing nodes: collecting, processing, and locally analyzing data, then uploading the interesting portions to be aggregated and further analyzed as part of a bigger, connected-vision paradigm.
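The essential data flow of such a node, analyze locally and upload only what matters, can be sketched in a few lines. This is a hypothetical illustration using a simple statistical outlier filter; the function name, threshold, and sample values are all invented for this sketch, not part of any product described above.

```python
import statistics

def interesting(readings, threshold=2.0):
    """Keep only readings far from the local mean (in standard
    deviations) -- a stand-in for 'analyze locally, upload only
    the data worth sending over the network'."""
    mean = statistics.mean(readings)
    spread = statistics.pstdev(readings) or 1.0  # guard: all-equal input
    return [r for r in readings if abs(r - mean) / spread > threshold]

# A node samples locally, then queues only the anomaly for upload.
samples = [20.1, 20.3, 19.9, 20.0, 35.7, 20.2]
upload_queue = interesting(samples)
print(upload_queue)  # [35.7] -- the outlier survives the local filter
```

The point of the sketch is the bandwidth asymmetry: six samples in, one upload out. Scaled across many vision nodes streaming pixels rather than scalars, local filtering of this kind is what makes the connected-vision paradigm tractable.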
Where this takes us is open to our imagination. In the meantime, let’s take some time to rethink the hardware/software processing divide and functional allocation. Thoughts welcome.