Research!

Object Recognition

This week was a very successful one for me. I found a professor at the computing faculty of Bielefeld University who can examine me in my fifth A-level exam. This is a special topic in computer science. Normally you take exams in four subjects. My fifth is optional, and I chose it because I wanted to do this. It is about object recognition. I wanted to start with simple motion detection and maybe some object detection, something like detecting a red ball.

Now everything gets more complicated. My new task is to create an application that does the following:

  1. Some objects are placed on a table. A camera is pointed at these objects from a known distance
  2. User grabs an object and holds it in front of the camera
  3. The user puts the object back in any rotation. The object can also be flipped horizontally or vertically.
  4. The program has to identify that object

This will be a lot of work. I will use artificial neural networks for that process, plus some motion detection and maybe a lot of filters. I am just starting right now with an application written in C++. But there is the first problem: I need the RGB values in something like an Array (Flash) or int[WIDTH][HEIGHT][3] if you know C. The current problem is that I do not have a clue about the WinAPI and all that stuff. There are a lot of tricks I do not know about. Maybe I will try DirectShow for the video capture, because I am currently using the Video for Windows (VFW) API and everything becomes very slow (9fps ;-)).
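To make the target data layout concrete, here is a minimal sketch of the unpacking step, assuming the capture code hands over one packed 0xRRGGBB value per pixel. The function name unpackRgb and the flat row-major layout are my own choices for illustration, not part of any capture API; a flat buffer is used instead of int[WIDTH][HEIGHT][3] so the size can be chosen at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: unpack packed 0xRRGGBB pixels into a flat RGB buffer.
// Channel c of pixel (x, y) ends up at index (y * width + x) * 3 + c.
std::vector<std::uint8_t> unpackRgb(const std::vector<std::uint32_t>& packed,
                                    std::size_t width, std::size_t height) {
    // Assumption: exactly one packed value per pixel, row by row.
    assert(packed.size() == width * height);

    std::vector<std::uint8_t> rgb(width * height * 3);
    for (std::size_t i = 0; i < packed.size(); ++i) {
        rgb[i * 3 + 0] = static_cast<std::uint8_t>((packed[i] >> 16) & 0xFF); // red
        rgb[i * 3 + 1] = static_cast<std::uint8_t>((packed[i] >> 8) & 0xFF);  // green
        rgb[i * 3 + 2] = static_cast<std::uint8_t>(packed[i] & 0xFF);         // blue
    }
    return rgb;
}
```

Calling unpackRgb on a frame with the pixel 0x102030 gives the bytes 0x10, 0x20, 0x30 for that pixel. Doing this per pixel with shifts is still far cheaper than asking the system for each pixel individually.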

If I cannot find a way (as someone unskilled in C/C++) to capture the webcam quickly and put all the RGB values into an array, I will have to use proce55ing or ActionScript 3 for the whole project. But I need the speed.

Anyway. Thanks to David Rowald from extrajetzt for the nice stickers. They will look great on my new ride, a Fiat 126.

Update:
Phew. It was not that difficult to get more speed. Instead of using the slow GetPixel functions of the WinAPI, I am now using the raw data straight from the camera. A single memcpy() call does the work of filling the RGB array, and I do not even have to split the values from 0xRRGGBB into RR, GG, BB, because the frame data already arrives as separate bytes. FPS still greater than 2*10^5.
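The memcpy() approach can be sketched roughly like this. This is not my actual capture code: the Frame struct and copyFrame are hypothetical names, and the sketch assumes the camera delivers a tightly packed frame with 3 bytes per pixel and no padding at the end of each row. Real Windows capture buffers are often stored bottom-up in BGR order with rows padded to 4 bytes, so the channel index and row order may need adjusting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical frame container: 3 bytes per pixel, rows stored back to back.
struct Frame {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> data;

    // Channel c (0..2) of pixel (x, y).
    std::uint8_t at(std::size_t x, std::size_t y, std::size_t c) const {
        return data[(y * width + x) * 3 + c];
    }
};

// Copy a raw camera buffer into a Frame with a single memcpy.
// Assumption: rawSize == width * height * 3 (no row padding).
Frame copyFrame(const std::uint8_t* raw, std::size_t rawSize,
                std::size_t width, std::size_t height) {
    assert(raw != nullptr);
    assert(rawSize == width * height * 3);

    Frame frame{width, height, std::vector<std::uint8_t>(rawSize)};
    if (rawSize > 0) {
        std::memcpy(frame.data.data(), raw, rawSize); // one bulk copy, no per-pixel calls
    }
    return frame;
}
```

The gain comes from replacing one function call per pixel with one bulk copy per frame; the channels are already separate bytes in the buffer, so no shifting or masking is needed.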
