Explain the World to Me - Camera-Based Internet Search

As part of a project, an application for classifying vehicles by make and model was developed. In addition, an information retrieval system was integrated that takes the recognition results and retrieves the matching entries from a vehicle database. The aim was a system that delivers precise detection results and displays them in a user-friendly app.

Multi-step approach #

The approach consisted of several steps that built on each other:

Coarse detection of vehicles in the image:
In a first step, the Faster R-CNN architecture was used to roughly localize vehicles in the image. Faster R-CNN was chosen for its efficiency and precision in object localization.
Fine detection and classification by make and model:
Once the vehicles were localized, each detected region was classified with a ResNet model that identifies the specific make and model of the vehicle. ResNet made it possible to capture the complex visual features needed to tell similar vehicle types apart.
Information retrieval based on the recognition results:
The recognized vehicle makes and models were used to retrieve relevant information such as technical data, years of manufacture and variants from a purpose-built vehicle database. This step put the classification results in context and gave the user further information about the vehicle.
Display of the results in an app:
The user interface was developed in PyQt and allowed for an intuitive presentation of the results. The user could upload an image, classify the vehicle and then view the associated vehicle data.
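The retrieval step described above can be illustrated with a minimal sketch. It assumes the vehicle database is a relational store queried by make and model; the table name `vehicles`, its columns, the function names and the rows are all hypothetical placeholders, since the original schema is not documented here. Python's built-in `sqlite3` module stands in for whatever database the project actually used.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical schema and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE vehicles ("
        " make TEXT, model TEXT, variant TEXT,"
        " year_from INTEGER, year_to INTEGER, power_kw INTEGER)"
    )
    # Placeholder data for illustration only, not real vehicle specifications.
    conn.executemany(
        "INSERT INTO vehicles VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("ExampleMake", "ModelA", "Base", 2010, 2014, 77),
            ("ExampleMake", "ModelA", "Sport", 2012, 2014, 110),
            ("ExampleMake", "ModelB", "Base", 2015, 2018, 90),
        ],
    )
    return conn


def lookup_vehicle(conn: sqlite3.Connection, make: str, model: str) -> list:
    """Return all stored variants for a recognized make and model."""
    rows = conn.execute(
        "SELECT variant, year_from, year_to, power_kw FROM vehicles"
        " WHERE make = ? AND model = ? ORDER BY year_from, variant",
        (make, model),  # parameterized query, the classifier output is never pasted into SQL
    ).fetchall()
    return [
        {"variant": variant, "years": (year_from, year_to), "power_kw": power_kw}
        for variant, year_from, year_to, power_kw in rows
    ]
```

Calling `lookup_vehicle(conn, make, model)` with the classifier's output returns one dictionary per stored variant, or an empty list if the recognized vehicle is not in the database, which the app can then render next to the image.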

graph TB
    A[Vehicle image] --> B[Faster-RCNN coarse detection]
    B --> C[ResNet classification]
    C --> D[Information Retrieval]
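The flow in the diagram can be sketched as a small orchestration function. This is only an illustration of how the stages hand data to each other: `Detection`, `crop`, `run_pipeline` and the `min_score` threshold are hypothetical names and parameters, and the three callables `detect`, `classify` and `lookup` are assumed to be thin wrappers around the detector, the classifier and the database query. The image is modeled as a plain list of pixel rows to keep the sketch free of dependencies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max values exclusive
Image = Sequence[Sequence[int]]  # rows of pixel values


@dataclass
class Detection:
    box: Box      # region proposed by the coarse detector
    score: float  # detector confidence in [0, 1]


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the detected region out of the image."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], List[Detection]],
    classify: Callable[[List[List[int]]], Tuple[str, str]],
    lookup: Callable[[str, str], object],
    min_score: float = 0.5,  # assumed threshold, not taken from the project
) -> List[Dict[str, object]]:
    """Detect vehicles, classify each crop, then fetch database info."""
    results = []
    for det in detect(image):
        if det.score < min_score:
            continue  # skip low-confidence detections
        make, model = classify(crop(image, det.box))
        results.append(
            {"box": det.box, "make": make, "model": model, "info": lookup(make, model)}
        )
    return results
```

Passing the stages in as callables keeps each one replaceable and testable on its own, which fits a setup where the detector and the classifier are trained and evaluated separately.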

Caffe was used as the deep learning framework for development.
It was chosen because it was widely used and readily available at the time.

During data pre-processing, augmentation was used intensively to expand the data set. This included, for example, rotation, scaling, and brightness and color adjustments of the images.
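The four kinds of augmentation named above can be sketched in plain Python. This is a simplified illustration, not the project's actual pipeline: images are modeled as nested lists of RGB tuples, rotation is restricted to 90-degree steps, scaling uses nearest-neighbor sampling, and all function names and parameter ranges are assumptions chosen for the example.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def _clamp(value: float) -> int:
    """Keep a channel value inside the valid 8-bit range."""
    return max(0, min(255, int(round(value))))


def rotate90(image: Image) -> Image:
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def scale_nearest(image: Image, factor: float) -> Image:
    """Resize by `factor` using nearest-neighbor sampling."""
    height, width = len(image), len(image[0])
    new_h = max(1, int(round(height * factor)))
    new_w = max(1, int(round(width * factor)))
    return [
        [
            image[min(height - 1, int(y / factor))][min(width - 1, int(x / factor))]
            for x in range(new_w)
        ]
        for y in range(new_h)
    ]


def adjust_colors(image: Image, brightness: float = 1.0,
                  gains: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> Image:
    """Apply a global brightness factor and a separate gain per color channel."""
    return [
        [tuple(_clamp(c * brightness * g) for c, g in zip(px, gains)) for px in row]
        for row in image
    ]


def augment(image: Image, rng: random.Random) -> Image:
    """Produce one randomly transformed copy (parameter ranges are illustrative)."""
    out = image
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        out = rotate90(out)
    out = scale_nearest(out, rng.uniform(0.8, 1.2))
    return adjust_colors(
        out,
        brightness=rng.uniform(0.7, 1.3),
        gains=tuple(rng.uniform(0.9, 1.1) for _ in range(3)),
    )
```

Calling `augment` several times per source image with a seeded `random.Random` yields a reproducible set of variants. When such transforms are applied to detector training data rather than classifier crops, the bounding boxes have to be rotated and scaled along with the pixels.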

The application was developed using Scrum. In regular sprints, the individual components were developed, tested, iteratively improved and presented to the stakeholder.

Conclusion #

The application combines deep learning models with an information retrieval system. With the Faster R-CNN architecture for localization and the ResNet model for classification, vehicles could be classified precisely and additional information retrieved from a database. The PyQt user interface makes the application quick and easy to use.

Activities #

Implementation of an object detector using the deep learning framework Caffe and the Faster R-CNN architecture
Implementation of a ResNet for the fine classification of vehicles
Interim project management and leadership of a 10-person team
Creation of a custom car data set
Augmentation of the created data set
Writing various documentation and reports