Professor Matti Pietikäinen, Professor Janne Heikkilä, and Professor Olli Silvén
Information Processing Laboratory, Department of Electrical and Information Engineering, University of Oulu
mkp(at)ee.oulu.fi, jth(at)ee.oulu.fi, olli(at)ee.oulu.fi
The Machine Vision Group (MVG) is renowned world-wide for its expertise in computer vision. It works as a single well-focused research group and collaborates with other groups of similar status in Europe, the USA and China. The research areas of the group range from generic computer vision methodologies to machine vision applications and vision systems engineering. The results of its research have been widely exploited in industry. The current areas of interest include face analysis, camera-based user interfaces, visual surveillance, smart environments and energy-efficient architectures for vision computing.
During 2007, MVG took part in two thorough research evaluations conducted by international evaluation committees and organized by the Academy of Finland and the University of Oulu. Both evaluations showed that the chosen research strategies and practices have been successful, and that the group has established its position among the most significant research teams in the field of machine vision.
Two important events were organized. First, in June 2007, the group organized the 30th anniversary seminar of the Pattern Recognition Society of Finland. The keynote speaker of the event was Prof. Rama Chellappa from the University of Maryland, who is one of the leading researchers in the field.
In August 2007, the group organized the Finnish Signal Processing Symposium. The participants were mainly graduate students from Finland, but some students from abroad also took part. The invited talk was given by Prof. Yrjö Neuvo.
Prof. Yrjö Neuvo giving his speech at the Finsig 2007 symposium.
The group has gained visibility both within and outside academic forums. In 2007, the group set a new record for its number of scientific publications. The quality of the publications has also traditionally been high, with papers appearing in leading conferences and journals of the field.
Visibility outside academic forums has also been achieved. The group has presented its research projects and activities several times in the most significant local and national media, and in some international media, that report science news in the field. These included, for example, the national television programme Prisma Studio, the regional news of YLE, the newspaper Kaleva and the national professional magazine Prosessori.
The group has hosted visits of several respected and well-known scientists from abroad, including Professor Thomas S. Huang from the University of Illinois at Urbana-Champaign, Professor Rama Chellappa from the University of Maryland, and a group of researchers from the Academy of Sciences of the Czech Republic led by Professor Jan Flusser. In addition, several domestic visitors, for example from the Academy of Finland, were briefed about our research activities.
Professors Thomas S. Huang and Rama Chellappa visiting Oulu.
The group has established active collaboration with some of the world’s leading institutions and top scientists. It has collaborated in depth with the University of Maryland (USA) since the early 1980s, and more recent partners include the Chinese Academy of Sciences, INRIA Rhône-Alpes (France), and the Academy of Sciences of the Czech Republic. Joint research has also been carried out with the University of Freiburg (Germany).
The group fosters researcher mobility to and from the unit. Two of our researchers made research visits to partner institutions during the reporting period. The group also hosts visiting postdoctoral researchers and graduate students from abroad. Co-operation with Chinese universities has recently intensified, and in 2007 two new Chinese postdoctoral researchers joined the group. This mobility has led to joint research projects and co-authored publications.
The group and its members are active in the scientific community. For example, in 2006-2007 Prof. Pietikäinen served as an area chair of the top-ranking IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007). He is also a Workshops Co-Chair of the International Conference on Pattern Recognition (ICPR 2008). The professors of the group served on the committees of several other international conferences, and many of the group's researchers reviewed articles for various journals and conferences.
The current focus areas of the research are: 1) texture-based computer vision, including facial image analysis, 2) geometric image and video analysis, 3) machine vision for sensing and understanding human actions, 4) learning in machine vision, and 5) vision systems engineering, including new paradigms for embedded systems.
Texture is a fundamental property of surfaces and can be seen almost anywhere. We have developed a novel methodology based on Local Binary Patterns (LBP), which has become a major breakthrough in texture analysis and is already widely used all over the world. Our recent results show that the approach offers significant potential for many important computer vision tasks that have not previously been regarded as texture problems.
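The operator itself is not spelled out in the text. As a minimal sketch of the basic 3×3 LBP (assuming NumPy and an integer gray-scale image), each of the eight neighbours is thresholded at the centre value and the resulting bits are weighted by powers of two; a texture is then described by the 256-bin histogram of the codes. The function names are illustrative, and the multiresolution, rotation-invariant and uniform-pattern variants are not shown.

```python
import numpy as np

# Neighbour offsets (dy, dx), visited in a fixed circular order; bit p belongs to offset p.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_3x3(image):
    """Basic 8-neighbour LBP code for every interior pixel of a 2-D array."""
    img = np.asarray(image, dtype=np.int64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # s(g_p - g_c) = 1 when the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes, used as the texture descriptor."""
    codes = lbp_3x3(image)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Because only the sign of each difference g_p - g_c is kept, any monotonic gray-scale change leaves the codes unchanged, which is the source of the operator's illumination tolerance.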
In 2007, our research focused on robust LBP-based descriptors for static and dynamic textures, and on texture-based methods for face detection, face recognition, facial expression recognition, visual speech recognition and recognition of actions.
Visual speech recognition.
In a study aimed at better understanding the properties of the Local Binary Pattern operator, we formulated a framework for image descriptors based on the quantized joint distribution of filter bank responses. Within this framework, it was shown that, despite their small spatial support, the oriented derivative filters that can be used to implement the LBP operator outperform Gabor and MR8 filters in the texture categorization task. Furthermore, codebook-based and thresholding-based quantization of filter bank responses were compared. Even though it is conceptually and computationally simpler, thresholding was found to perform better than codebook-based quantization in many cases.
Local photometric descriptors computed for regions around interest points have proven very successful in many problems. These local features are distinctive and robust to changes in viewpoint, scale and occlusion. The most widely used is the SIFT descriptor proposed by David Lowe. We earlier proposed a novel Center-Symmetric LBP (CS-LBP) interest region descriptor, which combines the strengths of the SIFT descriptor and the LBP texture operator. In extensive experiments, our descriptor performed better than SIFT in most of the test cases and about equally well in the remaining ones; in particular, its tolerance to illumination changes was clearly demonstrated. We have recently prepared and submitted a journal article on the method. This research was carried out in collaboration with Dr. Cordelia Schmid from INRIA Rhône-Alpes, France.
The work on texture analysis in industrial applications was continued. An enhanced method for training self-organizing maps (SOMs) for defect detection with the LBP texture feature was developed. A journal article on this topic is under revision.
In 2005, we began to study methods for analyzing dynamic textures, i.e. textures in motion. A volume LBP operator (VLBP) was developed that combines temporal and spatial information (i.e. motion and appearance). A simplified method based on concatenated LBP histograms computed from three orthogonal planes (LBP-TOP) was also proposed. In 2007, a paper on these descriptors, with an application to facial expressions, was published in the prestigious journal IEEE Transactions on Pattern Analysis and Machine Intelligence. In the reporting year, we also studied improvements to the original VLBP operator and presented a proof of the relation between two versions of rotation-invariant VLBP patterns.
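To make the thresholding-versus-codebook comparison concrete, the sketch below (our illustration, assuming NumPy) computes the responses of eight oriented difference filters g_p - g_c, the small-support derivative filters behind the basic LBP operator, and quantizes them in two ways: thresholding each response at zero, which reproduces the LBP code, and nearest-codeword vector quantization. The codebook is assumed to be given, for example learned with k-means, which is not shown; the function names are hypothetical.

```python
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def difference_responses(image):
    """Responses of 8 oriented difference filters (g_p - g_c), shape (8, H-2, W-2)."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    return np.stack([img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] - center
                     for dy, dx in OFFSETS])

def quantize_threshold(responses):
    """Thresholding at zero: one bit per filter, which yields the LBP code."""
    bits = (responses >= 0).astype(np.int64)
    weights = (1 << np.arange(len(OFFSETS))).reshape(-1, 1, 1)
    return (bits * weights).sum(axis=0)

def quantize_codebook(responses, codebook):
    """Vector quantization: index of the nearest codeword (rows of codebook) per pixel."""
    vecs = responses.reshape(responses.shape[0], -1).T          # (num_pixels, 8)
    d2 = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(responses.shape[1:])

def joint_histogram(labels, num_bins):
    """Normalised histogram of quantized labels: the image descriptor."""
    hist = np.bincount(labels.ravel(), minlength=num_bins).astype(float)
    return hist / hist.sum()
```

Thresholding needs no training and its 256 labels are fixed in advance, whereas the codebook variant requires learning and a nearest-neighbour search per pixel; this is the sense in which thresholding is conceptually and computationally simpler.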
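A minimal sketch of the CS-LBP idea follows, assuming NumPy and intensities scaled to [0, 1]. Instead of comparing each neighbour with the centre pixel, the four centre-symmetric pairs of the 8-neighbourhood are compared, giving a 4-bit code (16 bins), and the region is described by concatenated code histograms over a 4×4 grid of cells. This is a simplification: the neighbourhood here is the square 3×3 ring rather than an interpolated circle, the spatial weighting and normalisation details of the published descriptor are not reproduced, and the threshold value 0.01 is an assumption.

```python
import numpy as np

# Neighbours n_0..n_7 on the 3x3 ring; n_i and n_{i+4} are centre-symmetric.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def cs_lbp(image, threshold=0.01):
    """4-bit CS-LBP code for every interior pixel."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape

    def shifted(dy, dx):
        return img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]

    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    for i in range(4):
        a = shifted(*OFFSETS[i])
        b = shifted(*OFFSETS[i + 4])
        # Bit i is set when the pair differs by more than the threshold.
        codes |= (a - b > threshold).astype(np.int64) << i
    return codes

def cs_lbp_descriptor(patch, grid=4, threshold=0.01):
    """Concatenated 16-bin histograms over a grid x grid layout, L2-normalised."""
    codes = cs_lbp(patch, threshold)
    rows = np.array_split(np.arange(codes.shape[0]), grid)
    cols = np.array_split(np.arange(codes.shape[1]), grid)
    hists = [np.bincount(codes[np.ix_(r, c)].ravel(), minlength=16)
             for r in rows for c in cols]
    desc = np.concatenate(hists).astype(float)
    return desc / (np.linalg.norm(desc) + 1e-12)
```

With a 4×4 grid the descriptor has 4 × 4 × 16 = 256 dimensions; halving the number of comparisons relative to plain LBP is what keeps the histogram short enough for SIFT-like spatial gridding.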
Lip movements shown as features. (a) Block volumes. (b) LBP features from three orthogonal planes. (c) Concatenated features for one block volume with appearance and motion.
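The LBP-TOP descriptor can be sketched as follows, assuming NumPy and a gray-scale video stored as an integer array with axes (t, y, x): the basic 8-neighbour LBP is computed in the XY, XT and YT planes through every interior voxel, and the three normalised histograms are concatenated. Radius 1 in all directions and the plain 256-code set are simplifying assumptions; the published operator allows different spatial and temporal radii and uniform patterns. For block-based applications such as the lip features above, the same histogram would be computed per block volume and the results concatenated.

```python
import numpy as np

RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def plane_lbp(volume, axes):
    """LBP codes of interior voxels using 8 neighbours in the plane spanned by `axes`."""
    vol = np.asarray(volume, dtype=np.int64)
    center = vol[1:-1, 1:-1, 1:-1]
    codes = np.zeros_like(center)
    for bit, (da, db) in enumerate(RING):
        shift = [0, 0, 0]
        shift[axes[0]], shift[axes[1]] = da, db
        idx = tuple(slice(1 + s, vol.shape[k] - 1 + s) for k, s in enumerate(shift))
        codes |= (vol[idx] >= center).astype(np.int64) << bit
    return codes

def lbp_top(volume):
    """Concatenated, normalised LBP histograms from the XY, XT and YT planes."""
    hists = []
    # Axes are (t, y, x): XY uses (1, 2), XT uses (0, 2), YT uses (0, 1).
    for axes in [(1, 2), (0, 2), (0, 1)]:
        h = np.bincount(plane_lbp(volume, axes).ravel(), minlength=256).astype(float)
        hists.append(h / h.sum())
    return np.concatenate(hists)
```

The XY histogram captures appearance, while the XT and YT histograms capture how rows and columns change over time, i.e. motion; concatenating them is what makes LBP-TOP far cheaper than the full VLBP code.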
In 2004, we proposed a novel facial representation based on LBP features, obtaining excellent results. A paper on this topic was published in 2006 in IEEE Transactions on Pattern Analysis and Machine Intelligence. The approach has since become increasingly successful: it has been adopted and further developed by many research groups and leading scientists in the field.
We continued the investigation of spatiotemporal approaches to face recognition from videos by considering an extended set of Volume LBP features and AdaBoost learning. The key properties of the new approach are: (i) a local spatiotemporal description based on Extended Volume LBP, instead of the holistic representations commonly used in previous work; (ii) the selection of only person-specific facial dynamics, while intrapersonal temporal information is discarded; and (iii) the incorporation of the contribution of each piece of local spatiotemporal information. An extensive experimental analysis on three publicly available video face databases, with five benchmark methods (PCA, LDA, LBP, HMMs and ARMA), demonstrated the excellent performance of the proposed approach, which significantly outperformed the benchmark methods and thus advanced the state of the art. We also began applying the approach to other facial image analysis tasks, such as gender classification from videos.
Research on face and eye detection for person authentication in mobile phones yielded a prototype authentication system that uses Haar-like features with AdaBoost for face and eye detection, and LBP features with histogram intersection matching for face verification. Average authentication rates of 82% for small faces (40×40 pixels) and 96% for faces of 80×80 pixels were obtained on a Nokia N90 mobile phone with a 220 MHz ARM9 processor. These results are encouraging and point to the feasibility of face authentication in mobile phones.
Research on face recognition techniques also continued with the development of a pattern recognition database that would make it possible to match faces against large databases in real time. Extensive tests were carried out in a large-scale similarity matching study, and a journal article has been written (not yet published).
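Histogram intersection, the matching measure mentioned above, is simple enough to sketch directly, assuming NumPy and normalised LBP histograms (for example, concatenated regional histograms of a face image). The acceptance threshold in `verify` is a hypothetical value; in practice it would be tuned on enrolment data to balance false accepts against false rejects.

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Similarity sum_i min(h1_i, h2_i); equals 1.0 for identical normalised histograms."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return float(np.minimum(h1, h2).sum())

def verify(probe_hist, enrolled_hist, threshold=0.8):
    """Accept the claimed identity if the similarity reaches a (hypothetical) threshold."""
    return histogram_intersection(probe_hist, enrolled_hist) >= threshold
```

The measure needs only a minimum and an addition per bin, with no multiplications, which suits a low-clock-rate mobile processor.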
Facial expression recognition using LBP-TOP descriptors was further studied. It was shown that our approach performs well also at low frame rates and for low-resolution image sequences. A near real-time experimental system was implemented to demonstrate the applicability of the method. A simple web camera was used to capture videos in an office environment.
Learning discriminative spatiotemporal features for the expression pairs Sadness vs. Anger and Happiness vs. Fear.
We also continued research on the recognition of isolated phrases using only visual information. Visual speech information plays an important role in speech recognition under noisy conditions and for listeners with hearing impairment: a human listener can use visual cues, such as lip and tongue movements, to improve speech understanding. We proposed an appearance feature representation based on LBP-TOP descriptors that takes into account the motion of the mouth region and the time order of pronunciation. The positions of the eyes, determined by a robust face and eye detector, are used to localize the mouth regions in face images, and spatiotemporal LBP-TOP patterns extracted from these regions are used to describe the phrase sequences. In our experiments, promising accuracies of 62% and 70% were obtained in speaker-independent and speaker-dependent recognition, respectively.
On the Tulips1 audio-visual database, our method achieved an accuracy of 92.7%, clearly outperforming the other methods compared. Advantages of our approach include local processing and robustness to monotonic gray-scale changes. Moreover, no error-prone segmentation of moving lips is needed.
Feature definition and selection are two important aspects of the visual analysis of motion. We have also been investigating the use of spatiotemporal local binary patterns computed at multiple resolutions for describing dynamic events; these combine static and dynamic information from different spatiotemporal resolutions. Appearance and motion are the key components in the visual analysis of movements. The AdaBoost algorithm is used to learn the principal appearance and motion features from descriptors derived from three orthogonal planes, which provides important information about the locations and types of features for further analysis. In addition, learners are designed to select the most important features for each specific pair of classes. The figure above shows the selected features for two expression pairs. Experiments on two visual analysis tasks, facial expression recognition and visual speech recognition, show the effectiveness of the approach.
Imaging geometry provides the basic mathematical tools for analyzing the spatial properties of images. For image sequences, motion is another important feature that describes the temporal changes between successive images. Topics dealing with both imaging geometry and motion have been explored in this part of the research.
In 2007, we continued our research on geometric camera calibration. The emphasis has been on the modeling and calibration of omnidirectional cameras, which is currently an active research area in the computer vision community. In particular, we further developed our generic calibration approach, which is based on a planar calibration pattern and a flexible parametric camera model. The approach was tested with various kinds of cameras, including fish-eye lens cameras and catadioptric cameras, and in most cases subpixel calibration accuracy was achieved. The results of this development work were included in our camera calibration toolbox, which is publicly available on the Internet and already widely used at universities and organizations all over the world. Furthermore, our first studies on the self-calibration of generic omnidirectional cameras have been published. Potential application areas of this research include panoramic imaging and 3D modeling.
An omnidirectional camera produces a 360-degree image. On the right, smaller images cut from the original.
Recent studies and advances in image-based 3D reconstruction and object recognition suggest that the reconstruction and recognition procedures should be combined in order to build efficient systems for automatic scene analysis. Inspired by this development, we have extended our previous work on quasi-dense matching and applied it to object recognition tasks. The results obtained with publicly available datasets indicate that reliable object recognition is possible even in difficult viewing conditions with extensive background clutter, partial occlusion, large changes in scale and viewpoint, or notable geometric deformations.
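For readers unfamiliar with such flexible models, the sketch below shows the forward projection of one common radially symmetric form, in which the image radius is an odd polynomial of the angle θ between the incoming ray and the optical axis, r(θ) = k1·θ + k2·θ³ + …, followed by an affine mapping to pixel coordinates. We assume NumPy; the parameter names are illustrative, any asymmetric distortion terms a full model may include are omitted, and estimating the parameters from views of a planar pattern (a nonlinear least-squares problem) is not shown.

```python
import numpy as np

def project_generic(points_cam, k, mu, mv, u0, v0):
    """Project (N, 3) camera-frame points to pixels using r(theta) = sum_i k_i * theta^(2i+1)."""
    X, Y, Z = np.asarray(points_cam, dtype=float).T
    theta = np.arctan2(np.hypot(X, Y), Z)      # angle from the optical axis (may exceed 90 deg)
    phi = np.arctan2(Y, X)                      # azimuth around the optical axis
    powers = np.stack([theta ** (2 * i + 1) for i in range(len(k))])
    r = np.asarray(k, dtype=float) @ powers     # odd polynomial in theta
    x, y = r * np.cos(phi), r * np.sin(phi)
    return np.stack([mu * x + u0, mv * y + v0], axis=1)  # affine map to pixel coordinates
```

Parameterising by θ rather than by the pinhole ratio tan θ keeps the projection finite for rays at or beyond 90 degrees, which is why this family of models can cover fish-eye lenses with fields of view over 180 degrees.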
Our research on affine invariants concentrated on finalizing the formulation of the multiscale framework and expanding the variety of related image transformations. The framework is based on the idea of combining novel image transformations with existing feature extractors in both recognition and registration, which effectively multiplies the number of output features of the original methods. A major advantage of the approach is that one transformation can be combined with many different feature extractors, which makes the developed techniques generic. The results obtained in the affine case hold great promise for generalization to other distortions as well.
In many practical situations where imaging conditions are not ideal, the captured images are degraded by blur caused by camera motion or by an out-of-focus lens system. This problem has been addressed in our research on blur-invariant pattern recognition and image registration. The frequency-domain invariant features developed in our group are insensitive to centrally symmetric blur, including linear motion blur and out-of-focus blur. In our recent work, the blur-invariant phase correlation (BIPC) method has been extended to images that are also subject to rotation, scaling and translation. Similar methodology has been used for object recognition as well. Most recently, the method was extended to affine invariance. In all these cases, our methods have outperformed blur moment invariants, which were previously the only known approach to dealing with image blur in object recognition and image registration.
Another problem with common acquisition devices is that they cannot capture the entire dynamic range that natural scenes often exhibit. High Dynamic Range Imaging (HDRI) techniques try to recover the radiance at each pixel location by merging a sequence of images taken with different exposure times. However, the results are often spoiled by moving objects that cause ghosting artifacts in the output image. We have developed a new ghost-removal approach to make HDR acquisition of non-static scenes more feasible. The proposed method proved fairly robust and worked better than existing algorithms in many circumstances.
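The principle behind blur-invariant phase correlation can be illustrated with a short sketch. This is our simplification, assuming NumPy and circular image boundaries; it is not the published BIPC algorithm. A centrally symmetric blur has a real-valued Fourier transform, so it can change the phase of each frequency only by 0 or π. Doubling the phase removes that ambiguity, and phase correlation computed from the doubled-phase spectra then peaks at twice the translation.

```python
import numpy as np

def doubled_phase(image, eps=1e-12):
    """exp(2j * phase) of the 2-D spectrum; unaffected by a real (sign-flipping) blur response."""
    F = np.fft.fft2(image)
    unit = F / (np.abs(F) + eps)   # keep only the phase (unit magnitude)
    return unit ** 2               # squaring doubles the phase: 0 and pi both map to 0

def blur_invariant_shift_peak(reference, target):
    """Peak of phase correlation computed from doubled phases.

    If target(x) = (h * reference)(x - d) for a centrally symmetric h,
    the peak lies at 2d modulo the image size.
    """
    R = doubled_phase(target) * np.conj(doubled_phase(reference))
    corr = np.real(np.fft.ifft2(R))
    return np.unravel_index(np.argmax(corr), corr.shape)
```

Because the peak is at 2d modulo the image size N, recovering d leaves a two-fold ambiguity per axis (d or d + N/2), which this sketch does not attempt to resolve; rotation, scaling and affine invariance are also outside its scope.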
An example of ghost removal in a non-static scene. Above: the original image; below: the result without ghosts.
In our research on image processing for an MRI-compatible robot, the use of computer vision for surgical navigation has been studied, and an experimental setup with near-IR lighting and cameras has been built. Different camera calibration and pose estimation methods and algorithms have been implemented and evaluated. Based on this research, we have developed a marker-based pose estimation framework that can be used to control a surgical robot.
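For context, the basic merging step that ghost removal builds on can be sketched as a weighted average of per-exposure radiance estimates. This sketch assumes NumPy, pixel values in [0, 1] and a linear camera response (real cameras usually require the response curve to be recovered first), and it uses a simple hat weight that discounts under- and over-exposed pixels. It performs no ghost removal: a moving object would simply be averaged across exposures, which is what produces the artifacts described above.

```python
import numpy as np

def merge_hdr(images, exposure_times, eps=1e-8):
    """Weighted radiance estimate from differently exposed images with values in [0, 1].

    Assumes a linear camera response, so each pixel value Z gives the estimate Z / t.
    """
    Z = np.asarray(images, dtype=float)                   # shape (n_exposures, H, W)
    t = np.asarray(exposure_times, dtype=float).reshape(-1, 1, 1)
    w = np.minimum(Z, 1.0 - Z)                            # hat weight: distrust dark and saturated pixels
    return (w * Z / t).sum(axis=0) / (w.sum(axis=0) + eps)
```

Saturated pixels get zero weight, so the bright end of the range is taken from the short exposures and the dark end from the long ones.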
We have continued our research on vision-based human-computer interaction. Such technologies are likely to be building blocks for cognitive systems embedded in homes, offices, vehicles, and the equipment we use for everyday tasks.
This research has progressed with several new advanced concepts. One of them is a technique for finger tracking with a mobile device, which uses a novel method that combines the Kalman filter and the expectation maximization algorithm in a special way to estimate both the finger motion and the background motion. This Kalman-EM technique was successfully applied to controlling the user interface of a mobile phone, where users could interact with the device by moving a finger in front of the camera.
Interaction with a mobile phone by moving a finger in front of the camera.
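The Kalman-EM method itself is not detailed here. As background, the sketch below shows only the standard Kalman filter that such a method builds on, applied to a 2-D point such as a fingertip under an assumed constant-velocity motion model, with NumPy. The noise variances are hypothetical, and the EM part that separates finger motion from background motion is not included.

```python
import numpy as np

class ConstantVelocityKalman:
    """Standard Kalman filter for a 2-D point with state [x, y, vx, vy]."""

    def __init__(self, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt            # x += vx*dt, y += vy*dt
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0           # only the position is measured
        self.Q = process_var * np.eye(4)            # process noise covariance
        self.R = meas_var * np.eye(2)               # measurement noise covariance
        self.x = np.zeros(4)
        self.P = 1e3 * np.eye(4)                    # large initial uncertainty

    def step(self, z):
        # Predict the state and its covariance one frame ahead.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct the prediction with the measured position z = [x, y].
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x.copy()
```

The velocity components of the state are never measured directly but are inferred from successive positions; a velocity estimate of this kind is what a gesture recognizer would consume downstream.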
Finger motion can be used directly to browse information on the display of the device, but it also makes it possible to give the device specific commands with simple gestures, using hidden Markov models (HMMs) to model sequences of motion features. To improve recognition performance, we proposed an approach in which the motion trajectories are filtered based on the entropy of a histogram of the velocity. Sequences with high entropy, and hence more random velocity, are rejected as possibly unintended or incorrect. This work was extended to adapting the models to individual users with unsupervised maximum a posteriori (MAP) adaptation, where the velocity entropy measure served as the criterion for controlling the adaptation process.
Another related topic is a camera-based document scanner, developed in collaboration with the University of Maryland. Instead of relying on devices such as flatbed scanners, our solution allows users to capture large document images with camera-equipped mobile phones. We have developed a method in which the device interactively guides the user to move it over, for example, a newspaper page so that a panorama image can be assembled from individual frames. During online scanning, motion determined from low-resolution images is used to control the interaction, while good high-resolution images of the page are captured from the most favorable locations and automatically combined into the mosaic image.
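The entropy-based rejection can be sketched as follows, with NumPy. The text does not specify how the velocity histogram is formed, so this sketch assumes a histogram of frame-to-frame velocity directions in 8 bins; the entropy limit in `accept_gesture` is likewise a hypothetical value that would be tuned on data.

```python
import numpy as np

def velocity_entropy(trajectory, n_bins=8):
    """Shannon entropy (bits) of the histogram of velocity directions along a trajectory."""
    pts = np.asarray(trajectory, dtype=float)
    v = np.diff(pts, axis=0)                          # frame-to-frame velocity vectors
    angles = np.arctan2(v[:, 1], v[:, 0])             # directions in [-pi, pi]
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    p = hist[hist > 0] / hist.sum()                   # drop empty bins (0 * log 0 = 0)
    return float(-(p * np.log2(p)).sum())

def accept_gesture(trajectory, max_entropy=2.0):
    """Reject sequences whose velocity looks too random (entropy above a hypothetical limit)."""
    return velocity_entropy(trajectory) <= max_entropy
```

A deliberate stroke concentrates its velocities in a few directions (low entropy), while jitter or unintended motion spreads them over all bins, approaching log2(8) = 3 bits.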
Automatic interpretation of hand gestures has many potential applications, for example in natural user interfaces, automatic sign language recognition, virtual reality, and even emotion recognition. We developed a robust real-time method for hand tracking based on particle filtering. The method uses computationally efficient color blob features for effective propagation of particles. The experiments showed that the method is able to track a hand in the presence of complicating factors such as fast hand movement, and clutter and movements in the background.
Hand gestures can act as commands in natural user interfaces.
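The propagate-weight-resample cycle of a particle-filter tracker like the one described above can be sketched as a bootstrap particle filter, with NumPy. The colour-blob features are represented only by a hypothetical `likelihood` callable that scores candidate hand positions; the random-walk motion model and its standard deviation are also assumptions.

```python
import numpy as np

def systematic_resample(weights, rng):
    """Indices of resampled particles using one random offset and evenly spaced positions."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.minimum(idx, n - 1)                    # guard against round-off at the top end

def particle_filter_step(particles, likelihood, rng, motion_std=2.0):
    """One predict-weight-resample cycle of a bootstrap particle filter."""
    # Predict: random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: score each hypothesis with the observation model.
    weights = likelihood(particles)
    weights = weights / weights.sum()
    estimate = weights @ particles                   # weighted mean state
    # Resample: concentrate particles on likely hypotheses.
    particles = particles[systematic_resample(weights, rng)]
    return particles, estimate
```

Since each particle is scored independently, the cost is linear in the number of particles, and cheap features such as colour blobs keep the likelihood evaluation affordable in real time.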
We continued our work on recognizing human body parts from silhouette images with statistical models, specifically Gaussian mixture models and hidden Markov models. Models of human body parts were created using large amounts of artificially generated, labeled training data, and their performance was tested on a variety of real test data collected from different sources. These tests showed that the models performed well even with very noisy and occluded silhouettes. Body part recognition was then extended to unusual pose recognition by estimating the overall model confidence from the likelihood ratio of the recognized pixels.
We have also developed a new algorithm for tracking multiple objects. The method is based on our Kalman-EM framework, in which the measurements are assumed to obey a dynamic Gaussian mixture model. The algorithm includes a novel way of extracting the measurements from binary masks using basic morphological operations. The current implementation uses color features to separate the objects of interest from the background. Preliminary experiments are promising and indicate that the algorithm could have great potential in practical multi-object tracking applications.
Human motion can be seen as a type of moving texture pattern. We developed a novel description of human movements that describes human activities with texture features. Temporal templates are used as a pre-processing stage, and their local characteristics are described with LBP features to obtain a short-time motion description. By using the local properties of movements, the method captures the essential information in human movements while still allowing variation in how activities are performed. The method was tested on a database (http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html) of various human movements, where a classification rate of 98% was achieved with HMM modeling, equal to the best results reported.
Human activities described with texture features.
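The text does not name the temporal template used; a common choice is the motion history image (MHI), so the sketch below assumes it, with NumPy, gray-scale frames and simple frame differencing against a hypothetical threshold. Pixels that moved in the latest frame are set to τ and all others decay by one, so brighter values mark more recent motion; the resulting template would then be described with LBP histograms in the same way as a static texture.

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=20, diff_threshold=30):
    """One motion history image update: moving pixels are set to tau, others decay by one."""
    moving = np.abs(frame.astype(int) - prev_frame.astype(int)) > diff_threshold
    return np.where(moving, tau, np.maximum(mhi - 1, 0))

def motion_history(frames, tau=20, diff_threshold=30):
    """MHI after processing a whole sequence of frames (list or array of 2-D images)."""
    mhi = np.zeros(frames[0].shape, dtype=int)
    for prev, cur in zip(frames[:-1], frames[1:]):
        mhi = update_mhi(mhi, prev, cur, tau, diff_threshold)
    return mhi
```

Because the MHI encodes when each pixel last moved as an intensity ramp, it turns a short motion segment into a single image whose local structure a texture operator can describe.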
The Tekes-funded project PersonID came to an end in the summer of 2007. During the last year of the project, a state-of-the-art image enhancement method based on blind super-resolution was applied to real-world video surveillance data. The surveillance algorithm developed in the previous year of the project was also developed further: abnormal human activity detection was studied and, as a result, an algorithm capable of detecting simple but important anomalies in human behavior (such as collapsing and tripping) was added to the existing multi-object tracking framework. Since the project ended, the research focus has been on object tracking and recognition in a distributed environment of multiple cameras and image processing elements.
Detection of abnormal human activity.
Learning is a key bottleneck in building cognitive machine vision systems, whether they are intended for industrial inspection, visual surveillance, or medical image analysis. Most of our recent research on learning concerns manifold learning and nonlinear dimensionality reduction, since high-dimensional feature data often lie on a lower-dimensional manifold in the observation space. This can be exploited by learning the low-dimensional manifolds from image data and then selecting appearance models or features for recognition.
This approach is applied directly in interdisciplinary brain activity analysis research that uses functional magnetic resonance imaging (fMRI). This work was initiated in 2006 together with the Department of Radiology at the University of Oulu (the largest unit in the Nordic countries) and the Chinese Academy of Sciences (Prof. Yu-Feng Zang, National Laboratory of Pattern Recognition). Our goal is to develop clinically applicable diagnostic methods, modeled on solutions originally developed for learning in visual inspection applications, for the early detection of neurodegenerative diseases such as Parkinson’s and Alzheimer’s. The data are obtained with resting-state fMRI. A project on this topic is funded by the NEURO 2006-2009 program of the Academy of Finland.
Research combining machine learning and bioinformatics, started in 2004, has continued successfully. A new non-negative matrix factorization algorithm for unsupervised data reduction was proposed and tested on face images. Its goal is to detect the most representative patterns for each class of data without a priori knowledge of the class labels. These representative patterns can facilitate the clustering or classification of large data sets. The results were published in the Signal Processing journal.
Research on ensembles (collections) of classifiers led to the 1st Workshop on Supervised and Unsupervised Ensemble Methods and their Applications, organized by Dr. Oleg Okun together with Dr. Giorgio Valentini from the University of Milan, Italy, and held in Girona, Spain, on June 4, 2007. Both co-chairs acted as editors of the workshop CD proceedings. The workshop was attended by researchers from Spain, Portugal, Italy, France, the USA, and Finland.
Two classifier ensembles (of k-nearest neighbors and of decision trees) were studied for gene expression based cancer classification, with the conclusion that ensembles of k-nearest neighbors are much smaller and more accurate than those of decision trees.
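The group's new algorithm is not described here in enough detail to reproduce. For orientation, the sketch below shows the standard non-negative matrix factorization with Lee and Seung's multiplicative updates, assuming NumPy: a non-negative data matrix, for example one vectorised face image per column, is approximated as W·H with non-negative factors, so that the columns of W act as additive parts or representative patterns.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-10):
    """Standard NMF: factor non-negative V (m x n) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative; Lee and Seung showed
        # that the Frobenius reconstruction error does not increase under them.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

With `rank` much smaller than the number of pixels, W holds a compact set of non-negative basis patterns and H the per-image coefficients, which is the data reduction the paragraph refers to.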
To enable useful real-world systems, our vision systems engineering research provides guidelines for methodological research, helping to identify attractive approaches, architectures, and algorithms, as general-purpose computing is seldom a realistic option. In practice, everything from low-level image processing to equipment installation and operating procedures needs to be considered simultaneously. The roots of this expertise lie in our industrial visual inspection research, in which we met extreme computational requirements already in the early 1980s, and we have contributed to the designs of several industrial systems. Recently, we have applied our expertise to applications intended for mobile platforms, and our collaborative research on vision computing architectures is a recent spin-off.
An example of a novel application intended for mobile platforms is the panorama imager that “glues” together frames selected from video sequences. The selection process analyzes the displacements between video frames, measures the blur due to motion and focusing to select suitable frames for mosaicing, and detects moving targets and human faces to ensure that they are not corrupted in the process. In other words, the apparently simple panorama capturing process contains a great deal of image analysis functionality to achieve good image quality. Some of the solutions developed for mobile platforms are already being reused in industrial applications. For instance, the frame selection techniques of panorama capture are employed in developing a matrix-camera-based quality monitoring system for a printing machine that can cope with flutter and frequent environmental disturbances.
Machine vision applications are characterized by both high data input rates and high computational costs. For instance, a typical raw digital video rate is around 10 Mpixels/s, and its processing demands at least a few hundred operations per pixel, requiring several GOPS of computing power. Often this needs to be done in a small package, such as a mobile communications device that may allocate at most 500 mW for application processing like video coding or person identification. This rules out implementations based on conventional processors, even in the future, and makes hardware acceleration mandatory. Currently, only monolithic, long-latency accelerators improve energy efficiency, but they are rigid, costly to design and verify, and difficult to justify for functions that are considered marginal yet computationally expensive.
Together with Tampere University of Technology (Prof. Jarmo Takala), Helsinki University of Technology (Prof. Petri Vuorimaa), and Åbo Akademi University (Prof. Johan Lilius), we are working to improve the energy efficiency of embedded high-performance computing. We have demonstrated that fine-grained, silicon-area-efficient adaptable hardware accelerators can be employed with very low software interface overheads by using deterministic multithreaded schedules. This has turned out to be much more efficient than the conventional interrupt, semaphore, and event handler mechanisms advocated by textbooks. In essence, we are targeting a new paradigm for embedded computing and expect significant impacts in the field. Our current demonstrations include the simultaneous decoding of multiple MPEG-4 streams on shared accelerators, and MIMO reception. The latter work has been carried out in cooperation with the Centre for Wireless Communications (CWC).
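The figures quoted above can be checked with a rough back-of-the-envelope calculation. Taking 300 operations per pixel as an illustrative value for "a few hundred":

```latex
10 \times 10^{6}\,\tfrac{\text{pixels}}{\text{s}} \times 300\,\tfrac{\text{ops}}{\text{pixel}}
  = 3 \times 10^{9}\,\tfrac{\text{ops}}{\text{s}} = 3\ \text{GOPS}
\\[4pt]
\frac{0.5\ \text{W}}{3 \times 10^{9}\ \text{ops/s}} \approx 1.7 \times 10^{-10}\ \text{J/op} \approx 170\ \text{pJ per operation}
```

Under these assumptions the entire application processing would have to stay within roughly 170 pJ per operation, which gives a sense of why the text regards hardware acceleration as mandatory.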
An example of stream computing: Transport Triggered Architecture. a) Instruction fetch and decoding. b) Processor organization principle.
Our work is already attaining international scale as Mr. Jani Boutellier from our group is currently a visiting researcher at the Processor Architecture Laboratory of the Ecole Polytechnique Federale de Lausanne (EPFL). He is participating in the development of a scheduler for the emerging ISO standard of Reconfigurable Video Coding. The methodology used is essentially that used in our research, developed in cooperation with Prof. Shuvra Bhattacharyya at the University of Maryland. Mr. Boutellier visited UMD during the fall of 2006, and Mr. Tuukka Toivonen continued with the same theme during the spring of 2007, developing a fine grained scheduled solution for implementing the NTT (Number Theoretic Transform). These contributions are expected to be of significant practical importance in industry when design tools supporting the new approach are available.
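For readers unfamiliar with it, the number theoretic transform is a discrete Fourier transform computed in modular integer arithmetic, X_k = Σ_n x_n·ω^(nk) mod p, where ω is a primitive N-th root of unity modulo a prime p. Because the arithmetic is exact, it is used for error-free convolution. The direct O(N²) sketch below (plain Python 3.8 or later, for modular inverses via `pow`) only illustrates the definition; the small parameters p = 17, ω = 2, N = 8 are chosen for the example, and it does not reflect the fine-grained scheduled implementation mentioned above.

```python
P, OMEGA = 17, 2   # 2 has multiplicative order 8 modulo 17, so transforms have length N = 8

def ntt(values, omega, p):
    """Direct O(N^2) number theoretic transform: X_k = sum_n x_n * omega^(n*k) mod p."""
    n = len(values)
    return [sum(x * pow(omega, i * k, p) for i, x in enumerate(values)) % p
            for k in range(n)]

def intt(values, omega, p):
    """Inverse transform: forward transform with omega^-1, then scale by N^-1 (mod p)."""
    inv_n = pow(len(values), -1, p)
    return [(v * inv_n) % p for v in ntt(values, pow(omega, -1, p), p)]

def cyclic_convolution(a, b, omega, p):
    """Exact cyclic convolution mod p via pointwise multiplication of the transforms."""
    A, B = ntt(a, omega, p), ntt(b, omega, p)
    return intt([(x * y) % p for x, y in zip(A, B)], omega, p)
```

For example, `cyclic_convolution([1, 2, 3, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 0], OMEGA, P)` gives `[1, 3, 5, 3, 0, 0, 0, 0]`, the exact result, because every output value is smaller than p. Practical implementations use a larger prime and a fast butterfly structure instead of the direct sum.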
Visual inspection is economically still the most important application area of machine vision. The inspection systems can be relatively expensive as long as they provide high added value, and are therefore attractive testing grounds for new technologies. Typical inspection targets include part assemblies in the electronics and car industry, continuous webs such as paper, steel and fabrics, and natural materials such as wooden boards and coffee beans. Many of these targets are textured and colored, such as wood, and the inspection problem is solved best with respective methods. Several inspection systems based on our results are marketed by our industrial partners for applications ranging from coffee bean sorting to paper formation measurement.
Print quality problems of varying degrees in newspaper letters.
Currently, we are investigating methods for building visual inspection systems for exceptionally demanding applications, such as non-destructive, non-contact dynamic strength grading of wooden boards. The underlying observation is that the strength of wood differs in the axial, tangential and radial directions of the grain, and that sound and dry knots affect the behavior of the grain differently. After color- and texture-based image analysis has provided the grain and knot information, a 3-D finite element model (FEM) of the board is built and analyzed to estimate its strength. Due to the interdisciplinary nature of the research area, MVG has started cooperation discussions with Prof. Mark Hughes’ group at the Wood Technology Laboratory of the Helsinki University of Technology.
The dynamic range of camera sensors is rather limited. To overcome this, we have developed techniques that enable good-quality images to be captured from unevenly illuminated scenes and under infrared illumination. These techniques borrow from our visual inspection and mobile device imaging solutions, once more bridging apparently very different domains.
Our approach of combining world-class basic research with more applied research on vision systems and systems engineering is unique, and it gives our research great practical impact. We regard machine vision as a field of science that contributes to the competitiveness of Finnish enterprises by developing methods and techniques that improve the performance and usability of industrial machines and products.
The results of our project on novel solutions for embedded systems design are expected to have a significant commercial impact. The new technology significantly improves energy efficiency when fine-grained hardware acceleration is mandatory. The first commercial uses are expected to be in mobile video codecs. We are also working together with Videra Oy to enable home video sensor technologies for detecting accidents and illness-related problems. The solutions combine infrared imaging, human action recognition and visual learning technologies, making the systems installable in almost any home. The first uses of this technology are expected to be in retirement and nursing homes.
Another example of the impact of our work is that in 2005 Intopii Ltd., a spin-off company from our texture research, entered into a cooperation agreement with Cognex Corporation, the world’s leading supplier of machine vision systems. In late 2006 another spin-off, Visidon Ltd., was launched. The company provides intelligent computer vision solutions for mobile devices, as well as specialized system, algorithm and software design and training services for a variety of demanding industrial and consumer applications.
The Machine Vision Group is now stronger than ever. Working as a single well-focused research group, in which different teams and researchers work closely together, has allowed them to benefit efficiently from each other’s work and accumulated experience.
In international collaboration, the group will participate in a new European FP7 project, Mobile Biometry (MOBIO), coordinated by the IDIAP Research Institute, Switzerland. The goal of this three-year project (2008-2010) is to investigate multiple aspects of using face and speech data for user authentication on mobile devices. The approach to face description using local binary patterns (LBP), introduced by MVG, will play a significant role in the consortium.
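To make the local binary pattern idea concrete, the sketch below shows the basic 3x3 LBP operator: each neighbour is thresholded at the centre pixel's gray level, and the resulting bits form an 8-bit code. It then follows the common LBP face-description idea of splitting the code image into regions and concatenating per-region code histograms. This is a minimal illustration, not MOBIO or MVG code; the clockwise neighbour ordering, the 2x2 region grid and the function names are assumptions, and practical face descriptors typically use uniform patterns, more regions and weighted histogram comparison.

```python
# Neighbour offsets (dy, dx), clockwise from the top-left pixel.
# Assumption: any fixed ordering works as long as it is used consistently.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(img):
    """Basic 3x3 LBP codes for the interior pixels of a 2-D list of gray levels."""
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= c:  # threshold neighbour at centre value
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_descriptor(img, grid=(2, 2)):
    """Concatenate 256-bin LBP histograms computed over a grid of regions."""
    codes = lbp_image(img)
    h, w = len(codes), len(codes[0])
    gy, gx = grid
    descriptor = []
    for ry in range(gy):
        for rx in range(gx):
            hist = [0] * 256
            for y in range(ry * h // gy, (ry + 1) * h // gy):
                for x in range(rx * w // gx, (rx + 1) * w // gx):
                    hist[codes[y][x]] += 1
            descriptor.extend(hist)
    return descriptor


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 6, 3]]
print(lbp_image(patch))  # [[42]]
```

Because each code depends only on the ordering of gray levels around a pixel, the operator is invariant to monotonic illumination changes, which is one reason histograms of LBP codes are attractive for face description.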
The group's research activities have always been grounded in real research challenges and aimed at improving state-of-the-art methods. In the near future, more attention will be given to ubiquitous computing, in which machine vision applications are embedded into the home environment, even as invisible solutions.
It is becoming technically possible to build smart rooms with a comprehensive wireless infrastructure capable of sensing and interpreting human actions. Machine vision will play a key role in developing such ubiquitous computing (ubicom) systems. Our successful research on sensing and understanding human actions, face detection and recognition, interpretation of emotions, and energy-efficient engineering continues in new European, Academy of Finland and Tekes projects.
Many scientific and engineering problems must be solved before challenging real-world applications can be developed. Although machine vision has been very successful in controlled industrial environments, reaching homes and consumers will require major breakthroughs and generic methodologies that make the technology inherently robust and simple to use. The Machine Vision Group aims to achieve this by preserving its successful approach: top-class fundamental research in chosen key areas, close interaction between basic and applied research, and in-depth international collaboration.
Personnel
professors & doctors | 13
graduate students | 15
others | 15
total | 43
person years | 28
Source | EUR
Academy of Finland | 242 000
Ministry of Education | 210 000
Tekes | 228 000
domestic private | 355 000
total | 1 035 000
Barnard M, Hannuksela J, Sangi P & Heikkilä J (2007) A vision based motion interface for mobile phones. Proc. 5th International Conference on Computer Vision Systems (ICVS 2007), Bielefeld, Germany, 1-10.
Boutellier J, Bhattacharyya S & Silvén O (2007) Low-overhead run-time scheduling for fine-grained acceleration of signal processing systems. Proc. IEEE Workshop on Signal Processing Systems (SIPS 2007), October 17-19, Shanghai, China, 457-462.
Hadid A, Heikkilä JY, Silvén O & Pietikäinen M (2007) Face and eye detection for person authentication in mobile phones. Proc. First ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC-07), Vienna, Austria, 101-108.
Hadid A, Pietikäinen M & Li SZ (2007) Learning personal specific facial dynamics for face recognition from videos. In: Analysis and Modeling of Faces and Gestures, AMFG 2007 Proceedings, Lecture Notes in Computer Science 4778, 1-15.
Hadid A, Zhao G, Ahonen T & Pietikäinen M (2008) Face analysis using local binary patterns. Handbook of Texture Analysis, Imperial College Press, 27 p, in press (invited chapter).
Hannuksela J, Sangi P & Heikkilä J (2007) Vision-based motion estimation for interaction with mobile devices. Computer Vision and Image Understanding: Special Issue on Vision for Human-Computer Interaction, 108(1-2): 188-195.
Hannuksela J, Sangi P, Heikkilä J, Liu X & Doermann D (2007) Document image mosaicing with mobile phones. Proc. 14th International Conference on Image Analysis and Processing, September 10-13, Modena, Italy, 575-580.
Heimonen T (2007) Computer vision based pose estimation and tracking of a magnetic resonance imaging compatible robot - a feasibility study. Licentiate thesis, Department of Electrical and Information Engineering, University of Oulu, Finland, 147 p + App.
Huttunen S, Heikkilä J & Silvén O (2008) A distance education system with automatic video source selection and switching. Advanced Technology for Learning, in press.
Kannala J & Brandt SS (2007) Quasi-dense wide baseline matching using match propagation. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), June 17-22, 1-8.
Kannala J, Brandt SS & Heikkilä J (2008) Measuring and modelling sewer pipes from video. Machine Vision and Applications 19(2): 78-83.
Kouropteva O, Okun O & Pietikäinen M (2007) Semi-supervised visualization of high-dimensional data. Pattern Recognition and Image Analysis 17(4).
Ojansivu V & Heikkilä J (2007) Image registration using blur invariant phase correlation. IEEE Signal Processing Letters 14(7): 449-452.
Okkonen M, Kellokumpu V, Pietikäinen M & Heikkilä J (2007) A visual system for hand gesture recognition in human-computer interaction. In: Image Analysis, SCIA 2007 Proceedings, Lecture Notes in Computer Science 4522, 709-718.
Okun O & Priisalu H (2007) Unsupervised data reduction. Signal Processing 87(9): 2260-2267.
Okun O & Priisalu H (2007) Random forest for gene expression based cancer classification: overlooked issues. Proc. 3rd Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA 2007), Girona, Spain, 483-490.
Okun O & Valentini G (Eds.) (2007) CD Proc. Workshop on Supervised and Unsupervised Ensemble Methods and Their Applications.
Silvén O & Jyrkkä K (2007) Observations on power-efficiency trends in mobile communication devices. EURASIP Journal on Embedded Systems, Volume 2007, Article ID 56976, 10 p.
Silvén O & Rintaluoma T (2007) Energy efficiency of video decoder implementations. In: Fitzek FHP & Reichert F (eds) Mobile Phone Programming and its Application to Wireless Networking, Springer Verlag, 20, 421-439.
Takala V & Pietikäinen M (2007) Multi-object tracking using color, texture and motion. Proc. Seventh IEEE International Workshop on Visual Surveillance (VS 2007), Minneapolis, USA, 7 p.
Zhao G, Cui L & Li H (2007) Gait recognition using fractal scale. Pattern Analysis & Applications 10(3): 235-246.
Zhao G & Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6): 915-928.
Zhao G, Pietikäinen M & Hadid A (2007) Local spatiotemporal descriptors for visual recognition of spoken phrases. Proc. 2nd International Workshop on Human-Centered Multimedia (HCM2007), Augsburg, Germany, 57-65.