
Group members: Sunny Jovita, Jeconiah Richard, Christensen Mario Frans, Muhammad Lukman
Problem Description
Computers have many useful functions that make human life and work easier, and giving them the ability to see widens the range of tasks they can handle. In our project, we use computer vision to detect objects that appear in a video frame, using OpenCV (a Python library) together with the Caffe implementation of Google's MobileNet Single Shot MultiBox Detector (SSD) to recognize them.
With this program, computers can assist surveillance systems by detecting humans at each monitored location, and the program could be extended with TensorFlow and Keras to identify a person in a scene. In its current form, the program detects humans in the frame as well as the other object classes in the Caffe model's label list (21 labels in total, one of which is background). The program could be developed further, but doing so would require more computing power. Here is the architecture of the Single Shot MultiBox Detector used in the program.
Related Work
Several libraries are needed to make this program work. As mentioned in Section 1.2, OpenCV and MobileNet SSD are among them. The libraries used in the program are:
- The OpenCV library is installed as the opencv-python package and imported in the code as cv2. The module name cv2 is historical and does not refer to an OpenCV version; installing the latest opencv-python package gives the latest OpenCV release.
- The NumPy library is commonly used in Python for multi-dimensional arrays and matrices. OpenCV represents each frame as a NumPy array, so this library is crucial for handling the resized input frames and building the arrays used in the program.
NOTE: Each library is required for the program to run and must be installed in the project interpreter. Python 3 or later is required.
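Before running the program, a quick check like the one below can confirm that the configured interpreter has both libraries. This is a sketch that is not part of the original project: `check_environment` is a hypothetical helper built only on the standard `importlib` and `sys` modules. Note that the import names (`cv2`, `numpy`) differ from the package name `opencv-python` that is installed with pip.

```python
import importlib
import importlib.util
import sys


def check_environment(required=("cv2", "numpy")):
    """Report the Python version and which required modules are importable.

    Returns a dict: "python" maps to the interpreter version, and each
    module name maps to its __version__ string ("unknown" if it has none),
    to an error message if importing fails, or to None if it is missing.
    """
    report = {"python": sys.version.split()[0]}
    for name in required:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not installed in this interpreter
            continue
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            report[name] = f"import failed: {exc}"
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

Run it with the same interpreter that is set for the project; a `NOT INSTALLED` entry means the corresponding package still needs to be installed.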
Design of Algorithm
The program is written in Python and uses two additional Python libraries, OpenCV and NumPy.
OpenCV is a library of programming functions aimed mainly at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then by Itseez.
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
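NumPy's multi-dimensional arrays carry most of the data in this program: OpenCV returns each camera frame as a height × width × 3 array, while the network expects a four-dimensional input. As an illustration, the sketch below reproduces in plain NumPy the arithmetic that `cv2.dnn.blobFromImage` performs, namely `(pixel - mean) * scale` followed by a change of axis order. The 300 × 300 size, the scale factor 0.007843 and the mean 127.5 are assumptions (values commonly used with the public MobileNet SSD Caffe model), `frame_to_blob` is a hypothetical name, and nearest-neighbour sampling is a simplified stand-in for the interpolation that `cv2.resize` would use.

```python
import numpy as np


def frame_to_blob(frame, size=300, scale=0.007843, mean=127.5):
    """Convert an H x W x 3 uint8 frame into a 1 x 3 x size x size float32 blob.

    Computes (pixel - mean) * scale with no colour-channel swap, then moves
    the channel axis in front of the spatial axes and adds a batch axis.
    """
    h, w = frame.shape[:2]
    # Nearest-neighbour resize: pick evenly spaced source rows and columns.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = frame[rows[:, None], cols[None, :]]  # size x size x 3
    normalized = (resized.astype(np.float32) - mean) * scale
    # HWC -> CHW, then add the batch dimension -> NCHW.
    return normalized.transpose(2, 0, 1)[np.newaxis, ...]
```

For a 480 × 640 camera frame, `frame_to_blob(frame).shape` is `(1, 3, 300, 300)`, and every value lies roughly in [-1, 1].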
In this project, we first capture a video stream with OpenCV from the device's webcam (if any) or another camera connected to the computer. The video is processed frame by frame, and each frame is converted into a blob (the four-dimensional batch, channels, height, width array that Caffe networks take as input) before it is processed further. Each blob is then passed to MobileNet SSD, which returns, for every detection, the object's class index, the detection confidence, and the X and Y coordinates of the object's bounding box in the frame. These values form the output of the program. The class name corresponding to the index is printed on the terminal console and drawn on the frame in real time, with its confidence next to it, and the X and Y coordinates are used to draw the object's bounding box on the frame. A while loop repeats these steps for each new frame until a designated key is pressed.
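The capture, blob conversion, detection and drawing loop described above can be sketched as follows. This is a minimal sketch, not the authors' code: `parse_detections` and `run_detector` are hypothetical names, and the model file paths, the 0.5 confidence threshold, the 300 × 300 input size, the scale factor 0.007843 and mean 127.5 (values commonly used with the public MobileNet SSD Caffe model), the label list, and the `q` quit key are all assumptions. It also assumes the network's detection output is a (1, 1, N, 7) array of `[image_id, class_id, confidence, x1, y1, x2, y2]` rows with normalised coordinates.

```python
import numpy as np

# Assumed label list for the public PASCAL VOC MobileNet SSD Caffe model;
# index 0 is "background", so 20 of the 21 labels are real object classes.
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]


def parse_detections(detections, width, height, classes=CLASSES, threshold=0.5):
    """Turn the SSD output array into (label, confidence, box) tuples.

    Assumes `detections` has shape (1, 1, N, 7), each row being
    [image_id, class_id, confidence, x1, y1, x2, y2] with corner
    coordinates normalised to [0, 1]; boxes are scaled to pixel units.
    """
    scale = np.array([width, height, width, height])
    results = []
    for row in detections[0, 0]:
        confidence = float(row[2])
        if confidence < threshold:
            continue  # skip weak detections
        label = classes[int(row[1])]
        x1, y1, x2, y2 = (int(v) for v in row[3:7] * scale)
        results.append((label, confidence, (x1, y1, x2, y2)))
    return results


def run_detector(prototxt, caffemodel, camera_index=0, threshold=0.5, quit_key="q"):
    """Capture camera frames, run MobileNet SSD on each, and draw the results."""
    import cv2  # imported here so parse_detections can be used without OpenCV

    net = cv2.dnn.readNetFromCaffe(prototxt, caffemodel)
    capture = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # no camera, or the stream ended
            height, width = frame.shape[:2]
            # Resize to 300 x 300, then compute (pixel - 127.5) * 0.007843 as NCHW.
            blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843,
                                         (300, 300), (127.5, 127.5, 127.5))
            net.setInput(blob)
            for label, conf, (x1, y1, x2, y2) in parse_detections(
                    net.forward(), width, height, threshold=threshold):
                text = f"{label}: {conf:.2%}"
                print(text)  # report on the terminal console
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(frame, text, (x1, max(y1 - 10, 15)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
            cv2.imshow("Detections", frame)
            # waitKey(1) keeps the window responsive; stop on the quit key.
            if cv2.waitKey(1) & 0xFF == ord(quit_key):
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

For example, `run_detector("MobileNetSSD_deploy.prototxt", "MobileNetSSD_deploy.caffemodel")` (hypothetical file names) would open the default camera and run until `q` is pressed. The mean is passed as a three-element tuple because OpenCV's Python bindings turn a single number into a Scalar with only its first element set, which would subtract the mean from the first colour channel alone.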
YouTube link: https://youtu.be/8uchU5G1jJ4