
High-tech projects always start with a multi-tiered plan that must include the Total Cost of Ownership (TCO) of the software and related infrastructure. The problem for many integrators and end-users is that when it comes to multi-vendor hardware and software systems, many hidden costs can be overlooked in the early planning stages.
As a team with extensive experience in implementing facial recognition solutions, we decided to share our own experience and tell you what pitfalls you may encounter when bringing your project to life. We will give you some tips on choosing facial recognition software that will allow you to get maximum results with minimum TCO. 
License Cost is not TCO
Some believe that it is enough to choose the recognition software from the vendor that offers the lowest license price. However, this approach is flawed and can end up costing many times as much as a solution with a higher license price. Unforeseen costs can start with technical support that is not included in the license price and end with serious limitations when you try to change, add, or remove something in a system built on such software.
It is essential to recognize that the success of the project depends on the choice of the solution, and any overlooked detail can lead to failure. Therefore, it is necessary to choose flexible and stable software that is initially designed with the entire set of tasks to be performed and with possible future modifications in mind.
Always Consider Hardware
In addition to the software license, any recognition system involves lots of hardware, which includes cameras, kilometers of cables, switches, servers for video processing and data storage. Often it is the hardware that is the main item for both capital and operating expenses. Thus, in a large project across an extensive territory, the main expense item may be not servers or cameras, but high-quality cable. At the same time, when calculating the cost of the project, you should not discount the importance of software, because the choice of software solution provider directly affects the cost of ensuring the required computing power.
Avoid buying hardware that only covers current needs. A capacity reserve is necessary, because at the evaluation and design stage it is not always possible to fully assess the hardware requirements. As a business develops, existing servers run short and more computing power is required. The software itself constantly gets new updates and can also become somewhat more demanding. Therefore, from the very beginning, you should not try to get by with the minimum hardware that barely allows everything to run.
By making a decision in favor of one or the other software, you either get a reliable basis for the recognition system, predictable at any scale, or you get a disappointing solution that requires endless investment. So what distinguishes the right solution from the software that generates a lot of budget sinkholes around itself? As our practice shows, thoughtfulness and optimization are the key factors of savings in various projects involving video analytics.
It is important to consider how the software uses the allocated resources. For instance, both CPUs and GPUs can be used to process video streams. Implementing a solution using only the CPU may seem easier (it is more common, and many specialists already have experience with it) and cheaper (the minimum cost of such equipment is lower). However, once a certain number of streams has to be processed, this solution becomes increasingly expensive.

For example, if we are talking about working with one or two cameras, then even a compact NUC with an Intel Core i5 can easily handle the task, but to process 20 cameras you will need a server with 24 CPU cores, and the requirements will only increase from there.
If we are talking about a GPU-based solution, the cost of “entry” to such a solution is significantly higher, but the performance of such solutions is also superior. For example, for the same 20 video cameras that can take care of facial recognition in a small shopping mall, just a single RTX 2080Ti video card and a 4-core CPU will be enough, and most importantly, the cost of expansion will only go down.
When we talk about large solutions for hundreds of video streams, the difference becomes quite significant. As an example, let’s compare GPU and CPU solutions for 100 video cameras in a mall, used for traffic monitoring and blacklist matching. Recognizing a face while its track is still in progress implies real-time operation. This scenario is the hardest for the system, as it has high speed requirements and maintains a constant workload. In such scenarios, the system can produce up to 1,000,000 recognitions per day, and the number of simultaneously recognized faces can reach hundreds.
We would need about 200 CPU cores running at 2.8 GHz or higher to provide a decent response speed. A standard server with two Intel Xeon processors has only 40 cores (threads are not counted). So we end up with a fairly complex system of 5 servers occupying about 5 rack units, with the corresponding heat generation and power consumption, at a price of about $70,000 (including all other components).
For the same scenario on GPUs, we need a single 4U server with 8 Tesla T4 GPUs and an inexpensive 8-core CPU to process the requests. Such a solution costs around $40,000, which is much cheaper in purchase price, maintenance, and power consumption.
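The comparison above can be checked with simple arithmetic. The sketch below uses only the figures quoted in the text (200 cores, 40 cores per dual-Xeon server, $70,000 versus $40,000 for 100 cameras). The function and variable names are illustrative, and the prices are the article's rough estimates rather than vendor quotes.

```python
import math


def servers_needed(total_cores: int, cores_per_server: int) -> int:
    """Round up to whole servers: a partially used server still has to be bought."""
    return math.ceil(total_cores / cores_per_server)


# Figures quoted in the text for the 100-camera real-time scenario.
CAMERAS = 100
CPU_CORES_REQUIRED = 200
CORES_PER_DUAL_XEON_SERVER = 40
CPU_SOLUTION_COST_USD = 70_000
GPU_SOLUTION_COST_USD = 40_000

cpu_servers = servers_needed(CPU_CORES_REQUIRED, CORES_PER_DUAL_XEON_SERVER)
cpu_cost_per_camera = CPU_SOLUTION_COST_USD / CAMERAS
gpu_cost_per_camera = GPU_SOLUTION_COST_USD / CAMERAS
# Share of the CPU budget saved by choosing the GPU configuration.
savings_ratio = 1 - GPU_SOLUTION_COST_USD / CPU_SOLUTION_COST_USD

print(f"CPU servers needed:        {cpu_servers}")
print(f"CPU cost per camera (USD): {cpu_cost_per_camera:.0f}")
print(f"GPU cost per camera (USD): {gpu_cost_per_camera:.0f}")
print(f"Hardware savings with GPU: {savings_ratio:.0%}")
```

This covers hardware purchase cost only. Power, cooling, and rack space would widen the gap further, but the text gives no figures for them, so they are left out.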
A solution designed to run on CPUs will cost more, since the density of processors per server is much lower than that of video cards. Configuration based on GPUs makes the system more compact, flexible and efficient. That’s why FindFace Multi is designed with the ability to run on GPUs, which allows you to process more video streams at a lower cost.
Another cost-saving aspect is the ability of the selected software to integrate with the existing infrastructure and work with a wide range of equipment. Many customers who already have an extensive video surveillance system simply want to supplement it with facial recognition functionality. Unfortunately, not every recognition system provider is willing to compromise, as this can lead to a noticeable reduction in recognition performance. Often, implementing facial recognition then means a complete upgrade of the equipment, which is much more expensive than building on top of the current capacity, so saving money is out of the question.
NtechLab’s development team creates software products that can run on a wide range of equipment. They are not tied to a particular server hardware manufacturer or to specialized facial recognition cameras. It is enough to meet the hardware specifications and the camera installation requirements. Thanks to neural networks capable of effectively detecting people’s faces, as few as 30 pixels are enough to capture a face.
Choose Predictably Scalable Software
Scalability is one of the key characteristics of biometric recognition systems. Projects involving the use of such systems can be as localized as a single camera or terminal, or as large-scale as a network of cameras covering entire megacities. It is worth bearing in mind that any project must always include the possibility of expansion, which requires a high degree of flexibility of the system.
As the needs of the business grow, the scaling of the recognition system is inevitable, which puts some customers in a situation where an upgrade is required. And here a lot depends on the specific software provider, since the end price may be non-linear. Many facial recognition system vendors have software limitations on scaling, for example, when one server cannot handle more than a certain number of video streams or the total number of detection servers for the system is limited. This leads to a lot of complexity in the final solution architecture and the cost of its deployment, operation and support. Our experience shows that non-optimized detection software, when extended, can cost 2-3 times more than a solution with a well-thought-out architecture and a high degree of optimization.
For recognition systems, there is an additional aspect: the speed of retrieving an image and comparing it with the images in the database, which can range in size from tens of thousands to billions of faces. As the amount of data increases, the search speed of non-optimized solutions drops, while specialized software maintains a consistently high speed. For example, FindFace Multi is capable of searching 100,000 records in 0.025 seconds, while linear horizontal scaling (sharding) ensures expansion to any size without performance loss.
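To illustrate the general idea of sharding mentioned above, here is a minimal sketch. It is not FindFace Multi's implementation: the brute-force cosine-similarity scan, the 8-dimensional random vectors, and all function names are assumptions chosen for brevity. The gallery is split across shards, each shard is searched independently, and the partial results are merged, which gives the same answer as a search over the undivided gallery.

```python
import heapq
import math
import random


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def search_shard(shard, query, top_k):
    """Return the top_k (score, face_id) pairs from one shard, best first."""
    scored = (
        (sum(a * b for a, b in zip(vec, query)), face_id)
        for face_id, vec in shard
    )
    return heapq.nlargest(top_k, scored)


def sharded_search(shards, query, top_k=3):
    """Query every shard independently, then merge the partial results."""
    partial = []
    for shard in shards:  # in a real deployment each shard would be a separate server
        partial.extend(search_shard(shard, query, top_k))
    return heapq.nlargest(top_k, partial)


def make_shards(records, n_shards):
    """Distribute records across shards by face_id modulo the shard count."""
    shards = [[] for _ in range(n_shards)]
    for face_id, vec in records:
        shards[face_id % n_shards].append((face_id, vec))
    return shards


# Hypothetical gallery of 1,000 random unit-length "face embeddings".
random.seed(0)
records = [
    (i, normalize([random.uniform(-1, 1) for _ in range(8)]))
    for i in range(1000)
]
shards = make_shards(records, n_shards=4)
query = records[42][1]

sharded = sharded_search(shards, query)
unsharded = search_shard(records, query, top_k=3)
print(sharded == unsharded)
```

Because each shard holds only a fraction of the gallery, adding shards keeps the per-server work roughly constant as the database grows. The merge step only handles `top_k` candidates per shard.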
Consider the Architecture of the System
Large facial recognition projects usually involve a large number of remote sites. For example, at an industrial facility for the extraction and processing of natural resources, the sites can be located kilometers away from each other, and very high-bandwidth and reliable communication channels will naturally be required for all elements of the network to interact.
Transmitting video streams from the cameras to the processing server requires high bandwidth, so it is costly over long distances. Would it be reasonable to support a system with 30 cameras, each requiring at least 4 Mbit/s for HD or 8 Mbit/s for FullHD, which adds up to at least 120 Mbit/s or 240 Mbit/s in total? Usually, these costs are not accounted for, which can significantly increase the TCO later on. Not all software developers foresee this problem, and their systems then cannot be reconfigured to meet the needs of the customer.
One way to reduce bandwidth costs is to move the video pre-processing servers directly to the location where the cameras are installed. A local server can receive the video stream directly, analyze it and send only relevant images to the main server. Sending images will require significantly less bandwidth, and even 2-4 cameras can operate on a 1-2 Mbit/s connection.
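The bandwidth figures above can be put side by side with a short calculation. The per-stream rates come from the text. The 0.5 Mbit/s per camera for image-only transfer is an assumption derived from the "2-4 cameras on a 1-2 Mbit/s connection" statement, and the real value depends on how many faces each camera sees.

```python
def total_bandwidth_mbit(cameras: int, per_camera_mbit: float) -> float:
    """Aggregate uplink bandwidth needed when every camera sends continuously."""
    return cameras * per_camera_mbit


CAMERAS = 30
HD_STREAM_MBIT = 4.0        # from the text: minimum for an HD stream
FULL_HD_STREAM_MBIT = 8.0   # from the text: minimum for a FullHD stream
# Assumption: about 0.5 Mbit/s per camera when only face images are sent,
# estimated from "2-4 cameras on a 1-2 Mbit/s connection".
EDGE_IMAGES_MBIT = 0.5

raw_hd = total_bandwidth_mbit(CAMERAS, HD_STREAM_MBIT)
raw_full_hd = total_bandwidth_mbit(CAMERAS, FULL_HD_STREAM_MBIT)
edge = total_bandwidth_mbit(CAMERAS, EDGE_IMAGES_MBIT)

print(f"Raw HD streams:      {raw_hd:.0f} Mbit/s")
print(f"Raw FullHD streams:  {raw_full_hd:.0f} Mbit/s")
print(f"Edge pre-processing: {edge:.0f} Mbit/s")
print(f"Reduction vs FullHD: {raw_full_hd / edge:.0f}x")
```

Under these assumptions, local pre-processing cuts the required uplink for 30 cameras from hundreds of Mbit/s to roughly 15 Mbit/s.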
But sometimes even such bandwidth is difficult to provide, the connection is unstable, or the response time achievable over a 1-2 Mbit/s link does not meet the requirements. For such cases, it is possible to build a network of independent, fully autonomous installations that synchronize their dossier databases. Each installation then has an up-to-date, synchronized database of faces and a high response speed (as all components are placed locally), and does not depend on the stability of the Internet connection.
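One simple way to picture dossier synchronization between autonomous installations is a "newest version wins" merge. The sketch below is purely illustrative: the `Dossier` fields, the integer `updated_at` counter, and the merge rule are assumptions, not a description of how any particular product synchronizes its database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dossier:
    person_id: str
    name: str
    updated_at: int  # hypothetical logical timestamp of the last edit


def merge_dossiers(local, remote):
    """Return a merged copy that keeps the newest version of every dossier."""
    merged = dict(local)
    for person_id, dossier in remote.items():
        current = merged.get(person_id)
        if current is None or dossier.updated_at > current.updated_at:
            merged[person_id] = dossier
    return merged


# Two autonomous sites that edited their databases while disconnected.
site_a = {
    "p1": Dossier("p1", "Alice", updated_at=10),
    "p2": Dossier("p2", "Bob", updated_at=5),
}
site_b = {
    "p2": Dossier("p2", "Robert", updated_at=7),  # newer edit of the same person
    "p3": Dossier("p3", "Carol", updated_at=3),
}

# Each site applies the other's records once the link is available again.
synced_a = merge_dossiers(site_a, site_b)
synced_b = merge_dossiers(site_b, site_a)

print(synced_a == synced_b)
print(sorted(synced_a))
```

Recognition keeps running locally on each site's own copy, so a lost link only delays the exchange of updates. A production system would also need to handle deletions and clock differences between sites, which this sketch ignores.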
It is critical to choose a solution that has the ability to work in any hybrid combination without losing efficiency.
How to calculate TCO?
Calculating the total cost of ownership of a project with facial recognition is a complex task that only specialists with many years of experience in biometric system integration can perform. You can always turn to our experts, who will select the right integrator for you, taking into account all the details and potential pitfalls. Just fill out the contact form with the description of your project and wait for a response which will come shortly.
