Multi-camera tracking

Multi-Camera Tracking (MCT) is a crucial technology for the envisioned smart city, aiming to track multiple people through a network of cameras. MCT is composed of three parts: detection, single-camera tracking (SCT), and data association. Deep learning inference generally relies on a powerful server with abundant computing power and memory. Yet, considering the tremendous amount of data collected by the many cameras of a surveillance system, transmitting all the video data to a central server is impractical given the limited transmission bandwidth. Instead of streaming everything to a central server, a better approach is to deploy the system on an edge device and convey only the high-level information of interest to the server for further analysis. In that case, the limited computational resources of mobile devices must be taken into consideration.

We propose a multi-camera tracking system implemented on real-world hardware with an efficient framework to demonstrate its viability on edge devices.

Person re-identification

While MCT is a notoriously difficult problem to solve, a popular research topic has been derived from its final matching step: person re-identification (re-ID), which addresses the problem of recognizing people across cameras by their visual appearance. Although person re-ID has improved greatly with the rise of Convolutional Neural Networks (CNNs) and supervised learning, unsupervised cross-domain re-ID remains challenging owing to the lack of labelled data in the target domain.

We propose an unsupervised learning scheme, Hard Samples Rectification (HSR), for person re-ID, which addresses the vulnerability of clustering-based methods to the hard positive and negative samples in the dataset.



We propose a dual-faceted learning scheme of Hard Samples Rectification (HSR) with two complementary components:
1) an inter-camera mining technique (ICM), which leverages the feature distribution and the camera ID information to resolve the shortcomings of the original clustering results caused by hard positive pairs;
2) a part-based homogeneity technique (PBH), which splits possible hard negative pairs within a cluster into different groups according to the features of their local parts.

Proposed Method

Initially, the feature extractor is pretrained on the source dataset. In each iteration, after clustering, we first rectify the hard negative pairs in the imperfect clusters with our part-based homogeneity technique (PBH) by splitting and regrouping the samples. The refined pseudo labels are then employed as supervision to fine-tune the model with the cross-entropy loss and the triplet loss. In parallel, we apply the inter-camera mining technique (ICM) to complement the clustering results by pulling close the possible hard positive pairs, i.e., images that are mutually among the top-$K$ closest to the anchor image while being captured in different camera views.
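The two supervision terms used during fine-tuning can be sketched as follows. This is a minimal NumPy illustration only; the margin value and the plain unweighted sum of the two losses are assumptions, not taken from the original method.

```python
import numpy as np

def cross_entropy(logits, label):
    # softmax cross-entropy against the (pseudo) label of one sample
    z = logits - logits.max()          # stabilize the exponentials
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def triplet_loss(anchor, positive, negative, margin=0.3):
    # pull the anchor toward the positive, push it away from the negative
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

def total_loss(logits, label, anchor, positive, negative):
    # the model is fine-tuned with both terms on the refined pseudo labels
    return cross_entropy(logits, label) + triplet_loss(anchor, positive, negative)
```

In practice both terms would be computed over mini-batches with a deep-learning framework; the scalar version above only conveys the structure of the objective.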

Inter-camera mining

For each anchor image in the training procedure, we mine the top-K images closest to it in the feature space that were captured from a different camera view, forming possible hard positive pairs. To ensure the robustness and correctness of inter-camera mining, a mutual (best-buddies) constraint is applied: a pair is kept only if each image also appears among the top-K neighbours of the other.
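A minimal sketch of this mining step, assuming plain Euclidean distances over already-extracted features (the function and variable names are illustrative, not from the original implementation):

```python
import numpy as np

def inter_camera_topk(features, cam_ids, k=2):
    """Mine mutual top-k cross-camera neighbour pairs (hard positive candidates)."""
    n = len(features)
    # pairwise Euclidean distance matrix over all samples
    dist = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    topk = {}
    for i in range(n):
        # candidates must come from a *different* camera view
        cand = [j for j in range(n) if j != i and cam_ids[j] != cam_ids[i]]
        cand.sort(key=lambda j: dist[i, j])
        topk[i] = set(cand[:k])
    # keep a pair only if each image is in the other's top-k (best buddies)
    return {(i, j) for i in range(n) for j in topk[i] if i in topk[j] and i < j}
```

The mutual constraint is what filters out one-sided matches: a distractor that happens to sit near the anchor is discarded unless the anchor is also among its nearest cross-camera neighbours.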

Part-based homogeneity

We extract local features of the upper and lower body parts for each sample in an imperfect cluster and apply K-means clustering to each set of local features, obtaining two kinds of part-based labels. Given these two temporary local labels, the cluster is then split into at most four different groups according to a look-up table over the label combinations.
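Assuming the two part-based labels have already been produced by K-means with K=2 on the upper- and lower-part features, the look-up-table regrouping can be sketched as:

```python
def split_by_parts(upper_labels, lower_labels):
    """Regroup one imperfect cluster by its two part-based label combinations.

    upper_labels / lower_labels: K-means labels (K=2) computed on the
    upper-body and lower-body local features of every sample in the cluster.
    Each distinct (upper, lower) combination becomes its own sub-group,
    so a cluster splits into at most 2 x 2 = 4 groups.
    """
    lookup = {}   # (upper, lower) combination -> new sub-group id
    groups = []
    for u, l in zip(upper_labels, lower_labels):
        key = (u, l)
        if key not in lookup:
            lookup[key] = len(lookup)
        groups.append(lookup[key])
    return groups
```

Samples that agree in both part labels stay together, while a hard negative that differs in either part is pushed into a separate sub-group.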



System overview

The original MCT system implementation is composed of four operators: Detector, Tracker, Extractor, and Matcher. This pipeline layout requires memory buffers to store the frames transmitted between every pair of operators. Once a buffer reaches its maximum capacity, the operator discards the excess frames, which degrades MCT performance by generating a considerable number of false negatives in detection.

Proposed framework

We propose an effective framework that switches the order of Extractor and Tracker in the pipeline to cope with the limited computing power. Detector runs only at predefined intervals, leaving the remaining frames unprocessed for the next operator. We place Extractor right after Detector so that it extracts features only from the few frames on which Detector has actually run. Tracker then bridges the broken trajectories with a Kalman filter tracker and, at the same time, preserves the features of each track. Since a person's appearance changes only subtly within a relatively short time, extracting features from sampled frames of a track is an acceptable approximation that maintains the operating speed of the system. Finally, Matcher links up the corresponding tracks and assigns identities based on the Euclidean distance between their features.
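The final matching step can be sketched as nearest-neighbour assignment in feature space. This is a hedged illustration only: the distance threshold, the greedy assignment, and all names are assumptions for the sketch, not details of the actual Matcher.

```python
import numpy as np

def match_tracks(gallery_feats, gallery_ids, query_feats, thresh=0.5):
    """Assign each query track the identity of its nearest gallery track
    by Euclidean distance; spawn a new identity if nothing is close enough."""
    ids = []
    next_id = max(gallery_ids, default=-1) + 1
    for q in query_feats:
        d = [np.linalg.norm(q - g) for g in gallery_feats]
        if d and min(d) < thresh:
            # close enough: reuse the identity of the nearest known track
            ids.append(gallery_ids[int(np.argmin(d))])
        else:
            # no match under the threshold: this is a newly seen person
            ids.append(next_id)
            next_id += 1
    return ids
```

Because only per-track features (not raw frames) reach this stage, the matching cost stays small enough for an edge device.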


We adopt DukeMTMC as the dataset in our experiments, cutting several video sequences from the original testing set to form two scenarios, easy and hard, as a new testing set that is quicker to evaluate. The "Easy" scene contains 13 identities across two 90-second videos captured by two cameras, while the "Hard" scene contains 28 identities across two 70-second videos.


Offline demo

Person summarization

Real-time system

To demonstrate viability on an edge device, the system is implemented on an Intel NUC mini-PC as our hardware platform. All operations are performed in real time.

Original framework

Proposed framework


