Intelligent video analytics (IVA) is on my radar. I have just finished audits for two residential estates, both of which use thermal analytic technology on their perimeters, yet the two could not be more different in how they apply, and understand, this technology.
About IVA
In its most basic form, video analytics performs real-time monitoring in which objects, their attributes, movement patterns and behaviour are detected and processed. It can also be used to mine historical data for forensic purposes.
Fixed-algorithm analytics and artificial intelligence (deep learning) analytics are similar in that both determine whether unwanted behaviour is occurring within the camera’s field of view. Facial recognition, the third common type of analytics, matches points on a face in real time against a sample of that face stored in a database.
However, IVA is also processor-intensive and so several approaches to this problem have been developed over time:
• IVA is processed on the camera itself (‘at the edge’).
• IVA is processed on the CCTV server, appliance, or NVR centrally.
• IVA is processed utilising third-party software installed on either the CCTV head-end or on a separate server.
• IVA is streamed to and processed offsite by a bureau (cloud service) providing IVA services.
Many manufacturers and system integrators now take a hybrid approach to client solutions: processing on the cameras at the edge reduces bandwidth for real-time monitoring, while data is centralised at the head end for forensic analysis.
Edge analytics in focus
At the outset, there was analogue CCTV, for which only very rudimentary analytics were sometimes implemented. Then came IP CCTV, which enabled the transmission of video via network cable. IP cameras attached to IP networks could now analyse digital video using computing power and purpose-built algorithms: sequences of well-defined instructions designed to solve a particular class of problem (in this case, the intelligent analysis of CCTV footage).
Originally, the analytics took place on the server/appliance (or NVR), loaded with the relevant video management software supplied either by the camera manufacturer or by a third-party provider. This software became more sophisticated, adding sense and structure to the images the cameras were viewing, and producing alerts if what was being observed could potentially be classified as a threat.
But this came with challenges, given the large amount of processing power required to effectively analyse the footage.
So manufacturers turned their attention to analytics taking place on the camera itself. With these edge analytics, both the image and its metadata are analysed by each camera, without video having to be sent across the network to the VMS. Instead, only the results of the analysis are sent to the VMS. If a camera is fitted with motion detection, for example, it will only start sending images when and if motion is detected.
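To make this division of labour concrete, here is a minimal sketch of the kind of compact event record an edge camera might emit to the VMS instead of a raw video stream. The field names and values are illustrative assumptions, not any particular manufacturer’s schema.

```python
import json
import time

def build_edge_event(camera_id: str, event_type: str, confidence: float) -> str:
    """Build a compact analytics event as an edge camera might send it.

    The schema here is hypothetical; real cameras use vendor-specific
    formats (e.g., ONVIF metadata streams), but the principle is the
    same: kilobytes of metadata instead of megabits of video.
    """
    event = {
        "camera_id": camera_id,
        "timestamp": time.time(),
        "event": event_type,          # e.g. "motion_detected"
        "confidence": confidence,     # analytic confidence, 0.0 to 1.0
        "clip_available": True,       # high-resolution clip retained on the camera
    }
    return json.dumps(event)

# Until a detection occurs, the VMS receives nothing but lightweight
# events like this one; video follows only once motion is confirmed.
print(build_edge_event("perimeter-cam-07", "motion_detected", 0.92))
```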
Furthermore, cameras can now also record at the edge. Live viewing can be undertaken at relatively low resolution to conserve bandwidth, and high-resolution recordings can be made for forensic analysis and evidence. SD cards and the like take this a step further, enabling each camera to hold several terabytes of data.
The CCTV analytics world has been further impacted by artificial intelligence (particularly deep learning), which has taken analytics to an altogether new level. Deep learning means that the performance of the relevant algorithm is continually being improved: the camera ‘learns’ the environment and begins to be able to distinguish between what should and should not be present in that environment.
This has not made cameras predictive quite yet, but it certainly can significantly reduce nuisance alarms. In processing terms, deep learning means more analytic work must be done on the camera before a notification is sent to the video management software at the head end.
With my two residential estate audits in mind, the section to follow will briefly touch on detection principles and the effect on analytics of the physical installation.
The basics of IVA application
The Johnson criteria take as their basic premise the camera’s ability to discriminate between objects, and it has become common to refer to the following levels of differentiation:
• Detection, which means that an unidentifiable object has been detected.
• Classification, which means that the camera’s analytics can distinguish between an inanimate (vehicle) and an animate (person or animal) object.
• Recognition, which means that the object can be distinguished as a person specifically.
• Identification, which means that the identity of the person is easily distinguishable.
A camera’s detection capability is determined by a multitude of factors, but in practical terms, the primary consideration is the number of pixels associated with the object in question. This in turn is determined primarily by the focal length of the lens, the size of the sensor, and the resolution of the camera.
As it relates to people, there is broad consensus among manufacturers that the following pixel counts are required for the various levels:
• Detection: 1.5 pixels.
• Classification: 6 pixels.
• Recognition: 12 pixels.
• Identification: 25 pixels.
Unfortunately, many manufacturers do not fully differentiate between recognition and identification. However, for this article, I am primarily interested in the detection of potential intruders, so the distinction between recognition and identification is not critical here.
If we assume that 12 pixels is an appropriate pixel count for reliable recognition of a potential threat to a property, and assume a perimeter thermal camera with a 19 mm lens, this camera should be able to recognise a human at about 140 metres (or even slightly beyond). At that distance, however, the horizontal field of view is 29,5 metres across, and a human represents about 7% of that. Even though the camera can resolve a human, it is only a very vigilant control room operator, looking at a small number of screens, that is likely to detect that human.
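To show where these figures come from, here is a minimal sketch of the pixels-on-target arithmetic. The 19 mm focal length is taken from the example above; the 336 × 256 resolution and 12 µm pixel pitch are assumed sensor specifications, chosen because they roughly reproduce the quoted 29,5 m field of view, not the details of a specific product.

```python
# Pixels-on-target for a thermal perimeter camera: a minimal sketch
# using a simple pinhole-camera model. Sensor resolution and pixel
# pitch below are assumptions, not a specific product's data sheet.

FOCAL_LENGTH_MM = 19.0          # from the example in the text
H_PIXELS, V_PIXELS = 336, 256   # assumed thermal sensor resolution
PIXEL_PITCH_MM = 0.012          # assumed 12 micrometre pixel pitch

def fov_width_m(distance_m: float) -> float:
    """Horizontal field-of-view width at a given distance."""
    sensor_width_mm = H_PIXELS * PIXEL_PITCH_MM
    return distance_m * sensor_width_mm / FOCAL_LENGTH_MM

def pixels_on_target(target_size_m: float, distance_m: float) -> float:
    """Pixels spanning one dimension of a target at a given distance."""
    return (target_size_m / distance_m) * (FOCAL_LENGTH_MM / PIXEL_PITCH_MM)

d = 140.0
print(f"FOV width at {d:.0f} m: {fov_width_m(d):.1f} m")   # ~29.7 m
px = pixels_on_target(1.8, d)   # an average person's height
print(f"Pixels on a 1.8 m person: {px:.0f}")               # ~20 px
```

Under these assumed specifications, a 1,8 m person at 140 m spans roughly 20 pixels, comfortably above the 12-pixel recognition threshold, which is consistent with recognition being possible at that distance or slightly beyond.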
So now we turn to using intelligent video analytics to assist human operators in surveying a scene. As part of this process, we need to consider that intrusion prevention is predicated on the Four Ds: Deter, Detect, Delay and Defend.
Deter, Detect, Delay and Defend
It is of paramount importance that any attempt at breaching the perimeter be detected as early and as reliably as possible, and detection should also establish the nature and exact location of the attempted breach. This matters for three reasons: for understanding whether the attempt is real (failure here quickly results in ‘false response fatigue’), for understanding the nature of the intrusion so that the right response can be provided, and for understanding exactly where that response should be directed.
The objective behind setting up video analytics detection is simple, but practical implementation can be difficult. The objective is to set up the analytics so that it is impossible for a human to approach the perimeter barrier from the outside without being detected, while at the same time making sure that the analytics do not alarm for anything else.
For video analytics to work well, the area in which detection is required needs to be clear of obstacles, and the undulations of the area need to be such that no area falls below the direct line of sight of the camera (this principle applies to bends in the fence also). The camera should not be required to provide accurate detection further than its specified capacity.
There is a difference between detecting an object and detecting an object that is recognisable as human. Cameras need to be positioned so that each covers the dead spot of the other, not in terms of view (how far a camera can see), but in terms of each camera’s detection capability. Note that analytics cannot accurately detect objects through fences, even wire fences or palisades that are easy to see through.
Detection without the ability to categorise the object will lead to a significant number of false alarms, which can easily overload an alarm stack and can also cause an unnecessary response or even system crashes.
Also, to meet the objective of the Four Ds, specifically the requirement for Delay, it follows that detection should be on the outer perimeter. Where this is not possible, it is important to rely on the barrier itself to provide adequate alarming, and for the corresponding camera to link to that alarm for visual confirmation. Again, this reduces nuisance alarms caused by the configuration of unnecessary detection rules, and streamlines the process.
A significant number of analytic algorithms (tasks) exist for various camera makes and models, ranging from object in field, to crossing line, to loitering, and so on. For most of these tasks, each thermal camera must be correctly calibrated for its position. This ‘teaches’ the camera about size and distance, and is used to configure the analytics to trigger only for events that meet certain size criteria, as the sketch below illustrates.
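This sketch filters detections by their real-world height once calibration has established a metres-per-pixel scale for the relevant part of the scene. The function names, the scale value and the size limits are hypothetical, for illustration only.

```python
# A minimal sketch of size-based triggering after calibration. Each
# detection's bounding box is converted to real-world size and filtered
# so that only human-sized objects raise an alarm. The metres-per-pixel
# scale and the size limits are illustrative assumptions.

HUMAN_HEIGHT_RANGE_M = (1.0, 2.2)   # assumed plausible range for a person

def object_height_m(box_height_px: int, metres_per_pixel: float) -> float:
    """Convert a bounding-box height in pixels to metres, using the
    scale that calibration established for that part of the scene."""
    return box_height_px * metres_per_pixel

def should_trigger(box_height_px: int, metres_per_pixel: float) -> bool:
    """Trigger only for objects whose real-world height fits a person."""
    low, high = HUMAN_HEIGHT_RANGE_M
    return low <= object_height_m(box_height_px, metres_per_pixel) <= high

# A bird close to the lens may be 40 px tall, but at 0.08 m per pixel
# that works out to 3.2 m: not human-sized, so no alarm.
print(should_trigger(40, 0.08))   # False
print(should_trigger(22, 0.08))   # True: 1.76 m, plausibly a person
```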
It is beyond the scope of this article to provide a detailed description of how the analytics should be optimally configured, but I will suggest several significant guidelines:
• Avoid the use of ‘Detect any object’.
• Use ‘Enter field’ rather than ‘Object in field’.
• Try to use directional ‘Crossing line’ (IVA flow) rather than just ‘Crossing line’.
• Use ‘Loitering’ with care and with suitable delays. A loitering condition should almost always be evaluated only after another alarm condition has triggered; this is the single most important guideline for avoiding copious nuisance alarms (see the sketch after this list).
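Here is a minimal sketch of that last guideline: the loitering rule stays dormant until a primary alarm (a directional line crossing, say) arms it, and then only fires after a suitable dwell delay. The class and the thresholds are illustrative, not a vendor API.

```python
# 'Loitering follows an alarm': unarmed presence alone never alarms.
# Only after a primary alarm arms the gate, and the dwell time has
# elapsed, does a loitering alarm fire. Values below are assumptions.

class LoiterGate:
    def __init__(self, dwell_seconds: float = 30.0):
        self.dwell_seconds = dwell_seconds
        self.armed_since = None   # set when a primary alarm arms the gate

    def on_primary_alarm(self, t: float) -> None:
        """A line-crossing (or similar) alarm arms the loitering rule."""
        self.armed_since = t

    def on_presence(self, t: float) -> bool:
        """Raise a loitering alarm only if armed and the dwell has passed."""
        if self.armed_since is None:
            return False          # presence without a prior alarm: ignore
        return (t - self.armed_since) >= self.dwell_seconds

gate = LoiterGate(dwell_seconds=30)
print(gate.on_presence(10.0))     # False: no primary alarm yet
gate.on_primary_alarm(12.0)
print(gate.on_presence(20.0))     # False: only 8 s of dwell so far
print(gate.on_presence(45.0))     # True: 33 s after the crossing alarm
```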
In terms of the physical installation of each camera, it is important to note that on a CCTV camera with a 60 mm lens, a 0,05 cm (half a millimetre) movement of the camera around its axis will result in a relative 100 cm (one metre) movement of an object in the field of view at 300 metres. The implication for video analytics is obvious (see the sketch after the list below). From the perspective of a camera on a pole, two events happen simultaneously in a strong wind:
• The first is that the camera direction is deflected by an amount directly correlated with the strength of the prevailing wind.
• The second is that harmonic vibration is likely to set in.
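The geometry behind the one-metre figure can be sketched with the small-angle approximation: an object’s apparent sideways shift is roughly the viewing distance multiplied by the camera’s angular deflection. The script below works backwards from the figures above; it is an illustration, not a wind-load calculation.

```python
import math

# Why tiny pole deflections matter to analytics: a camera rotated by a
# small angle theta shifts the apparent position of a stationary object
# by roughly distance * theta (small-angle approximation).

def apparent_shift_m(distance_m: float, deflection_deg: float) -> float:
    """Apparent sideways movement of a stationary object when the
    camera itself rotates by the given angle."""
    return distance_m * math.radians(deflection_deg)

# How much rotation produces a 1 m apparent shift at 300 m?
theta = math.degrees(1.0 / 300.0)
print(f"{theta:.2f} degrees")                    # ~0.19: barely visible sway
print(f"{apparent_shift_m(300, theta):.2f} m")   # back to ~1.00 m

# To a crossing-line rule, that metre of apparent motion can be
# indistinguishable from a genuinely moving object.
```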
Any pole, regardless of its construction material, will be affected to a greater or a lesser extent by the wind. This is often noticeable with the naked eye (think streetlamps in a strong wind). It is therefore important from the perspective of utilising video analytics that this effect is minimised.
In addition to the movement of the pole itself, the mounting bracket with which the camera is affixed to the pole also contributes to unwanted movement. Sway is usually noticed at the top of the pole (this is known as first-level vibration). Second-level vibration (Aeolian vibration) is caused by steady winds ranging from 2 to 15 metres per second (roughly 7 to 54 km/h) and produces frequencies of 2-20 Hz. This vibration is predominantly caused by vortices that form on the leeward side of the pole as this steady stream of air passes across it.
The vortices originate from opposite sides of the pole and create alternating pressures that move at right angles to the direction of the airflow. This causes a high-frequency, short-cycle harmonic reaction. While no poles are immune to these effects, tapered spun concrete poles perform better than steel, fibreglass and wood poles. Brackets need to be small, rigid, and where larger cameras are used, also damped.
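The quoted 2-20 Hz band can be sanity-checked against the standard vortex-shedding relation f = St·V/d, where the Strouhal number St is roughly 0,2 for a cylinder over the relevant range of wind speeds. The 0,2 m pole diameter below is an assumed value for illustration.

```python
# Sanity-checking the quoted 2-20 Hz Aeolian band with the standard
# vortex-shedding relation f = St * V / d (Strouhal number ~0.2 for a
# cylinder). The pole diameter is an assumed value, not a measurement.

STROUHAL = 0.2
POLE_DIAMETER_M = 0.2

def shedding_frequency_hz(wind_speed_ms: float) -> float:
    """Vortex-shedding frequency for a cylindrical pole in a steady wind."""
    return STROUHAL * wind_speed_ms / POLE_DIAMETER_M

for v in (2.0, 10.0, 15.0):
    print(f"{v:>4.0f} m/s -> {shedding_frequency_hz(v):.0f} Hz")
# 2 m/s -> 2 Hz and 15 m/s -> 15 Hz: squarely inside the quoted band.
```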
Kleyn Consulting is an independent risk, safety and physical security consultancy with experience in a range of verticals. Based in the Western Cape Winelands, Lesley-Anne travels across South Africa. Feel free to contact her on +27 64 410 8563 or at [email protected]