Summary:

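Conventional video surveillance systems do "monitor" a scene, but in practice the footage is reviewed by human eyes only after an incident has occurred, in order to pick out anything suspicious. This is a remedy after the fact and does nothing to reduce the loss caused by the incident. To address the passivity of conventional systems, we propose a new "smart surveillance system". It analyzes the moving objects in the scene, judges whether each object is dangerous, and actively notifies the administrators so that they can intervene immediately and reduce the loss. After a crime has occurred, finding the suspect also becomes faster, because a search similar to a Google search replaces watching all of the recorded video. In general, the main functions of a smart surveillance system rest on three basic operations: video object segmentation, object description, and object tracking. The algorithm introduced here is a single algorithm that performs all three.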

Abstract:

Segmentation, tracking, and description extraction are important operations in smart camera surveillance systems. A robust segmentation-and-descriptor based tracking algorithm is introduced here. Segmentation is applied first, and a description of each connected component is extracted for object classification, from which the video object masks are generated. The algorithm performs segmentation, tracking, and description extraction together, without redundant computation. In addition, a new descriptor for human objects, the Human Color Structure Descriptor (HCSD), is proposed for this algorithm. Experimental results show that the proposed algorithm provides precise video object masks and trajectories. They also show that HCSD performs better than the Scalable Color Descriptor and the Color Structure Descriptor of MPEG-7 for human objects.

Algorithm Steps:

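The algorithm is illustrated in Fig. 1. It consists of the following steps, introduced in order:

Segmentation: Separates the moving objects in the surveillance frame from the background, as shown in Fig. 2 (b).

Connected Component Labeling: Labels the regions produced by the Segmentation step as separate, distinct objects. As shown in Fig. 2 (c), different objects are drawn in different shades.

Skeletonization: Extracts the skeleton of each body part of every object. For a human object, the torso, head, and limbs can be separated. As shown in Fig. 2 (d) and (e), Fig. 2 (d) is the skeleton of the human torso, and Fig. 2 (e) is the skeleton of the head and limbs.

Feature Extraction: Based on the result of the Skeletonization step, computes the features of each part of every object, such as the color of the torso and the position of the feet. These basic features are finally combined into a description of every object that has appeared in the scene.

Object Classification: Based on the features from Feature Extraction, classifies the objects and establishes the correspondence between the objects in the current frame and the objects in the previous frame. The latter is what tracks the objects.

Video Object Mask Generation: Combines the results of the preceding steps to obtain the segmented shape bitmap of each object, as in Fig. 2 (b).

Trajectory Generation: Combines the results of the preceding steps to obtain the movement trajectory of each object.

Human Color Structure Descriptor Generation: Combines the results of the preceding steps to obtain the Human Color Structure Descriptor (HCSD) of each object that appears. The descriptor has the following content: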

$$HCSD_i = \{(c_i^b, p_i^b),\ (c_i^l, p_i^l),\ (c_i^s)\} \qquad (1)$$
where $c_i^b$, $c_i^l$, and $c_i^s$ record the colors of the upper garment, trousers, and shoes of human object $i$, and $p_i^b$ and $p_i^l$ record the positions of its torso and lower limbs.
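The first two steps above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists and uses plain background differencing with a fixed threshold. This is a simplified stand-in, not the segmentation algorithm of [1], which also handles shadows, global motion, and adaptive thresholds. The function names and the threshold value are hypothetical.

```python
from collections import deque


def segment_foreground(frame, background, threshold=30):
    """Return a binary mask: 1 where the frame differs from the background.

    Assumption: a fixed threshold on the absolute gray-level difference.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def label_components(mask):
    """Label 4-connected foreground regions with 1, 2, ... by flood fill."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0 or labels[y][x] != 0:
                continue  # background pixel, or already labeled
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label


background = [[10] * 5 for _ in range(4)]
frame = [
    [10, 200, 200, 10, 10],
    [10, 200, 10, 10, 10],
    [10, 10, 10, 10, 180],
    [10, 10, 10, 180, 180],
]
mask = segment_foreground(frame, background)
labels, count = label_components(mask)
print(count)  # number of separate objects found in the mask
```

Each label in `labels` then identifies one candidate object, which is the input the later steps work on.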

Fig. 1. Algorithm Flow Chart

Fig. 2. An example from segmentation step to skeletonization step, where the test sequence is Hall Monitor.

Algorithm Overview:

In this section, we propose a segmentation-and-descriptor based tracking algorithm that performs segmentation, description extraction, and tracking in a single algorithm without redundant computation. The flow of the proposed algorithm is shown in Fig. 1, and an example is given in Fig. 2. The extraction of the proposed descriptor, HCSD, is also part of the flow in Fig. 1. First, the source surveillance video is segmented into background and foreground video objects, which produces the object mask [1]. Second, each object is given a unique label in the connected component labeling step. Third, each object is decomposed into several meaningful parts in the skeletonization step by the morphological skeleton transform [2]. For human objects in particular, the head, body, hands, and legs can be separated. Thanks to the skeletonization step, the color and the position of each part can be extracted in the feature extraction step. The size of each object is also extracted in this step. The extracted features are then grouped to form the HCSD, which has the following form for an object i:

$$HCSD_i = \{(c_i^b, p_i^b),\ (c_i^l, p_i^l),\ (c_i^s)\} \qquad (1)$$
where $c_i^b$, $c_i^l$, and $c_i^s$ are the colors of the body, legs, and shoes of human object $i$, respectively, and $p_i^b$ and $p_i^l$ are the positions of the body and legs of the object.
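The descriptor in Eq. (1) can be pictured as a small record with five fields. The sketch below is only an illustration under stated assumptions: each body part is given as a list of pixels that has already been separated by the skeletonization step, a color is summarized as the mean RGB value, and a position as the pixel centroid. The source does not state how colors and positions are encoded, so the class name `HCSD`, its field names, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[float, float, float]
Position = Tuple[float, float]
Pixel = Tuple[int, int, Tuple[int, int, int]]  # (x, y, (r, g, b))


@dataclass
class HCSD:
    """Fields mirror Eq. (1): (c_b, p_b), (c_l, p_l), (c_s)."""
    body_color: Color
    body_position: Position
    legs_color: Color
    legs_position: Position
    shoes_color: Color


def mean_color(pixels: List[Pixel]) -> Color:
    """Average RGB over the part (an assumed color summary)."""
    n = len(pixels)
    return tuple(sum(p[2][k] for p in pixels) / n for k in range(3))


def centroid(pixels: List[Pixel]) -> Position:
    """Mean (x, y) of the part (an assumed position summary)."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)


def build_hcsd(body: List[Pixel], legs: List[Pixel], shoes: List[Pixel]) -> HCSD:
    """Group the per-part features into one descriptor."""
    return HCSD(
        body_color=mean_color(body),
        body_position=centroid(body),
        legs_color=mean_color(legs),
        legs_position=centroid(legs),
        shoes_color=mean_color(shoes),
    )


body = [(10, 20, (200, 0, 0)), (12, 22, (100, 0, 0))]
legs = [(10, 40, (0, 0, 90)), (12, 44, (0, 0, 110))]
shoes = [(11, 60, (20, 20, 20))]
print(build_hcsd(body, legs, shoes))
```

Note that the shoes contribute only a color, with no position, which matches the single-element tuple in Eq. (1).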

Finally, with the features from the feature extraction step, the objects detected in each frame are classified, and the correspondences between the objects detected in the current frame and those detected in the previous frames are built in the object classification step. This step performs the video object tracking and also produces the trajectories of the video objects.
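As a rough illustration of how frame-to-frame correspondences could yield trajectories, the sketch below matches each detection to the closest known object by a descriptor distance and appends its position to that object's trajectory. The matching rule is an assumption, not the classification method of the paper: the distance (Euclidean color distance plus weighted position distance), the greedy assignment, the `max_distance` threshold, and the `Tracker` class are all hypothetical, and each detection is reduced to one color and one position for brevity.

```python
import math


def descriptor_distance(a, b, position_weight=1.0):
    """Assumed dissimilarity: color distance plus weighted position distance."""
    color_term = math.dist(a["color"], b["color"])
    position_term = math.dist(a["position"], b["position"])
    return color_term + position_weight * position_term


class Tracker:
    def __init__(self, max_distance=80.0):
        self.max_distance = max_distance  # hypothetical matching threshold
        self.tracks = {}         # object id -> latest descriptor
        self.trajectories = {}   # object id -> list of positions
        self.next_id = 1

    def update(self, detections):
        """Match detections to known objects greedily, closest pairs first."""
        candidates = sorted(
            (descriptor_distance(track, det), track_id, det_index)
            for track_id, track in self.tracks.items()
            for det_index, det in enumerate(detections)
        )
        assigned, used_tracks = {}, set()
        for distance, track_id, det_index in candidates:
            if distance > self.max_distance:
                break  # remaining pairs are even farther apart
            if track_id in used_tracks or det_index in assigned:
                continue
            assigned[det_index] = track_id
            used_tracks.add(track_id)

        ids = []
        for det_index, det in enumerate(detections):
            track_id = assigned.get(det_index)
            if track_id is None:  # no close match: treat as a new object
                track_id = self.next_id
                self.next_id += 1
                self.trajectories[track_id] = []
            self.tracks[track_id] = det
            self.trajectories[track_id].append(det["position"])
            ids.append(track_id)
        return ids


tracker = Tracker()
tracker.update([{"color": (200, 0, 0), "position": (10, 20)},
                {"color": (0, 0, 200), "position": (100, 20)}])
tracker.update([{"color": (0, 0, 190), "position": (104, 22)},
                {"color": (205, 0, 0), "position": (14, 21)}])
print(tracker.trajectories)
```

This sketch never retires objects that leave the scene. A practical tracker would also need a rule for dropping stale entries.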

Demo Sequences:

Sequence Name	Original Sequence	Demo Sequence
Hall_cif	Hall_cif	Hall_cif_Demo
Outside_Library	Outside_Library	Outside_Library_Demo

Published Papers:

S.-Y. Chien, W.-K. Chan, D.-C. Cherng, and J.-Y. Chang, “Human object tracking algorithm with human color structure descriptor for video surveillance systems,” in 2006 IEEE International Conference on Multimedia and Expo, July 2006, pp. 2097–2100.
W.-K. Chan and S.-Y. Chien, “High performance low cost video analysis core for smart camera chips in distributed surveillance network,” in 2006 IEEE 8th Workshop on Multimedia Signal Processing, Oct. 2006, pp. 170–175.

References:

[1] S.-Y. Chien, Y.-W. Huang, B.-Y. Hsieh, S.-Y. Ma, and L.-G. Chen, “Fast video segmentation algorithm with shadow cancellation, global motion compensation, and adaptive threshold techniques,” IEEE Transactions on Multimedia, vol. 6, no. 5, pp. 732–748, Oct. 2004.

[2] J. Xu, “A generalized discrete morphological skeleton transform with multiple structuring elements for the extraction of structural shape components,” IEEE Transactions on Image Processing, vol. 12, pp. 1677–1686, Dec. 2003.