TOKYO - Toshiba Corporation (TOKYO: 6502) has developed an ultra-fast data search and matching technology that outperforms similar systems by a factor of 50. It can be applied to any data that can be represented as a high-dimensional vector, and its wide-ranging applications include big data analytics of large-scale media databases*1 and facial recognition. In experiments, the system recognized a single individual among 5,800 people in a photo database of 10 million images in only 8.31 milliseconds*2.
Advances in big data analysis continue to secure dramatic refinements in such areas as machine learning and failure prediction, bringing increasing benefits to daily life. However, data volumes continue to grow exponentially, and to keep pace, analysis and recognition capabilities must also accelerate.
Toshiba’s technology builds indexes of high-dimensional feature data*3 extracted from objects, including complex, multi-faceted objects such as the human face or representations of product sale patterns and stock prices over time. The database can be searched for pattern matches, and produces results at an unmatched, ultra-fast rate. This performance acceleration rests on three components, shown below.
- Vector Coding Technology: encodes the features of objects as short vectors while keeping the difference between the vectors as small as possible.
- Vector Indexing Technology: recognizes similar vectors without any need to compute the distance between them.
- Pipeline Lookup Technology: combines coarse and fine lookup.
Vector Indexing Technology is an original technology developed by Toshiba. It builds groups of similar vectors, and so enables rapid identification of the group close to the vector in a query. It does not need to compute the distance between individual vectors and the query, realizing ultra-fast lookup of vectors.
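The grouping idea described above can be illustrated with a minimal sketch. It is not Toshiba's method, whose details are not given here. It is a generic cluster-based index, assuming plain k-means for the grouping and squared Euclidean distance as the similarity measure. The function names (`build_groups`, `search`) and the `probes` parameter are hypothetical. Unlike the technology described above, this sketch still computes distances inside the selected group; it only shows how narrowing a query to a nearby group avoids scanning most of the database.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Put each vector index into the group of its nearest centroid."""
    groups = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda g: squared_distance(vec, centroids[g]))
        groups[nearest].append(idx)
    return groups


def build_groups(vectors, num_groups, iterations=5, seed=0):
    """Group similar vectors with plain k-means (an assumed stand-in).

    Returns (centroids, groups), where groups[g] holds the indices of
    the vectors whose nearest centroid is centroids[g].
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, num_groups)]
    for _ in range(iterations):
        groups = assign(vectors, centroids)
        for g, members in enumerate(groups):
            if members:  # an empty group keeps its previous centroid
                columns = zip(*(vectors[i] for i in members))
                centroids[g] = [sum(col) / len(members) for col in columns]
    return centroids, assign(vectors, centroids)


def search(query, vectors, centroids, groups, probes=1):
    """Coarse step: rank groups by centroid distance to the query.
    Fine step: scan only the members of the `probes` closest groups.
    Returns the index of the best match, or None if those groups are empty.
    """
    ranked = sorted(range(len(centroids)),
                    key=lambda g: squared_distance(query, centroids[g]))
    candidates = [i for g in ranked[:probes] for i in groups[g]]
    if not candidates:
        return None
    return min(candidates, key=lambda i: squared_distance(query, vectors[i]))


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
    centroids, groups = build_groups(data, num_groups=10)
    # Query with a stored vector: the fine step scans one group only.
    print(search(data[123], data, centroids, groups))
```

With `probes=1` the fine step touches roughly one tenth of the vectors in this example. Raising `probes` trades speed for a lower chance of missing the true nearest neighbour when it falls just outside the closest group.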
Toshiba initially intends to apply the technology in three areas: pattern mining, media recognition and big data analysis. For example, pattern mining would allow a particular person to be identified almost instantly among a large set of images taken by surveillance cameras, while media recognition could be used to protect soft targets, such as airports and railway stations*4, by automatically identifying persons wanted by the authorities.
Toshiba also hopes to support its clients and contribute to society by deploying this technology in new fields such as deep learning.
In fiscal year 2016, the company plans to release a new database product, based on the new recognition technology and GridDB, its scalable database, that will enable ultra-fast processing of big data and large-scale media databases.
1. Pattern Mining—finding similar patterns
Surveillance cameras installed over a wide area, such as a town, can be monitored. Cameras can be installed in diverse facilities, such as railway stations, airports, highway entrances, amusement parks, ATMs, banks and ticket vendors. Surveillance of crowds moving from a railway station to a stadium or a concert hall is another possible application.
Financial data mining of characteristic movements of stock prices.
An individual could be identified in a massive image database, in order to create or search within a video database.
2. Reinforcement of media recognition
Soft targets could be protected by detecting individuals on wanted lists. In industry, recognition of a single component within a database of 10 million industrial parts could be done almost instantly, boosting productivity.
3. Big data analytics
Cloud services for automatic data analysis with machine learning and prediction could be implemented. Analysis of sales data or sensor data could be achieved simply by uploading the data to a server.
Video: Wide area surveillance
This video shows application of the technology to wide area surveillance.
*1. Collections of surveillance video, TV program archives, phone conversations recorded by call centers and web texts are examples of such databases.
*2. Experiment parameters: Precision of 98% in recognition of 10 million images of the faces of 5,800 people. The results were as follows.
*3. Feature data expressed as high-dimensional vectors, with 100 to 100,000 dimensions; many more dimensions than 2D (planar) and 3D (spatial) vectors.
*4. With current technology, the identification of a single individual in a database of 10,000,000 criminals takes approximately 20 seconds. Toshiba’s technology can do it in just 0.68 seconds. (Toshiba estimate for a theoretical system.)
Toshiba Corporation, a Fortune Global 500 company, channels world-class capabilities in advanced electronic and electrical products and systems into five strategic business domains: Energy & Infrastructure, Community Solutions, Healthcare Systems & Services, Electronic Devices & Components, and Lifestyles Products & Services. Guided by the principles of The Basic Commitment of the Toshiba Group, “Committed to People, Committed to the Future”, Toshiba promotes global operations and is contributing to the realization of a world where generations to come can live better lives.
Founded in Tokyo in 1875, today’s Toshiba is at the heart of a global network of over 580 consolidated companies employing 199,000 people worldwide, with annual sales surpassing 6.6 trillion yen (US$55 billion).
To find out more about Toshiba, visit www.toshiba.co.jp/index.htm