.. _video-cards:

Multiple Video Cards Usage
========================================

Should you have several video cards installed on a physical server, you can create additional ``findface-extraction-api`` or ``findface-video-worker`` instances on your GPU-based system and distribute them across the video cards, one instance per card.

If you have followed the instructions to prepare the server on :ref:`Ubuntu ` (:ref:`CentOS `, :ref:`Debian `), you should already be all set to proceed. However, before you take further action, make sure that you have a properly generated ``/etc/docker/daemon.json`` configuration file. The example below is provided for the Ubuntu operating system and shows how to configure the Docker network and enable the NVIDIA Container Runtime. Note that in our example, we imply that you have already :ref:`installed ` the NVIDIA Container Runtime.

.. code::

   sudo su
   BIP=10.$((RANDOM % 256)).$((RANDOM % 256))
   cat > /etc/docker/daemon.json <<EOF
   {
       "default-address-pools": [
           {"base": "$BIP.0/16", "size": 24}
       ],
       "bip": "$BIP.1/24",
       "fixed-cidr": "$BIP.0/24",
       "runtimes": {
           "nvidia": {
               "path": "nvidia-container-runtime",
               "runtimeArgs": []
           }
       },
       "default-runtime": "nvidia"
   }
   EOF

Similar configuration examples are provided in the sections for :ref:`CentOS ` and :ref:`Debian `.

.. rubric:: In this section:

.. contents::
   :local:

Distribute ``findface-extraction-api`` Instances Across Several Video Cards
-----------------------------------------------------------------------------------

To distribute the ``findface-extraction-api`` instances across numerous video cards, create multiple instances of the ``findface-extraction-api`` service in the ``docker-compose.yaml`` configuration file. Then, for the ``findface-extraction-api`` GPU instances to work within one system, bind them via a load balancer, e.g., Nginx.

Do the following:

#. Open the ``/opt/findface-cibr/docker-compose.yaml`` file and create multiple records of the ``findface-extraction-api`` service configuration. Each new instance must be configured in the same way as the others, so that queries are processed consistently. Do the following:

   * Rename the default ``findface-extraction-api`` service section.
   * Create configuration copies for the new instances.
   * Configure the instances.

   Below is an example of two configured services:

   .. code::

      sudo vi /opt/findface-cibr/docker-compose.yaml

      findface-extraction-api-0:
        command: [--config=/etc/findface-extraction-api.ini]
        depends_on: [findface-ntls]
        environment:
          - CUDA_VISIBLE_DEVICES=0
          - CFG_LISTEN=127.0.0.1:18660
        image: docker.int.ntl/ntech/universe/extraction-api-gpu:ffserver-9.230407.1
        logging: {driver: journald}
        network_mode: service:pause
        restart: always
        runtime: nvidia
        volumes: ['./configs/findface-extraction-api/findface-extraction-api.yaml:/etc/findface-extraction-api.ini:ro', './models:/usr/share/findface-data/models:ro', './cache/findface-extraction-api/models:/var/cache/findface/models_cache']

      findface-extraction-api-1:
        command: [--config=/etc/findface-extraction-api.ini]
        depends_on: [findface-ntls]
        environment:
          - CUDA_VISIBLE_DEVICES=1
          - CFG_LISTEN=127.0.0.1:18661
        image: docker.int.ntl/ntech/universe/extraction-api-gpu:ffserver-9.230407.1
        logging: {driver: journald}
        network_mode: service:pause
        restart: always
        runtime: nvidia
        volumes: ['./configs/findface-extraction-api/findface-extraction-api.yaml:/etc/findface-extraction-api.ini:ro', './models:/usr/share/findface-data/models:ro', './cache/findface-extraction-api/models:/var/cache/findface/models_cache']

   * ``findface-extraction-api-0`` — the readable name of an instance. Must be unique for each instance;
   * ``CUDA_VISIBLE_DEVICES=0`` — the ID of the GPU on which the service instance runs. Must be unique for each instance;
   * ``CFG_LISTEN=127.0.0.1:18660`` — the IP address and port on which the instance listens for requests. Must be unique for each instance;
   * ``./cache/findface-extraction-api/models:/var/cache/findface/models_cache`` — the volume for storing the model cache. It may be shared if the GPU models are identical; otherwise, store the caches separately.

   The remaining configuration parameters can be the same for every ``findface-extraction-api`` service instance.
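   After editing, you can check that the file is still syntactically valid before going further. A minimal sketch, assuming Docker Compose v2 (with the standalone v1 binary, use ``docker-compose config -q`` instead):

   .. code::

      cd /opt/findface-cibr/
      # exits non-zero and prints an error if docker-compose.yaml is malformed
      docker compose config -q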
#. Configure the load balancing in the ``/opt/findface-cibr/docker-compose.yaml`` file. To do so, add a new service (``findface-extraction-api-lb`` in our example) to the ``docker-compose.yaml`` file, e.g., at the end of the file.

   .. code::

      findface-extraction-api-lb:
        depends_on: [findface-ntls]
        image: docker.int.ntl/ntech/multi/multi/ui-cibr:ffcibr-2.1.2
        logging: {driver: journald}
        network_mode: service:pause
        restart: always
        volumes: ['./configs/findface-extraction-api/loadbalancer.conf:/etc/nginx/conf.d/default.conf:ro']

   We have specified the ``loadbalancer.conf`` configuration file in the volumes of the ``findface-extraction-api-lb`` section. Now we need to create it.

#. Create the load balancer configuration file with the following content:

   .. code::

      sudo vi /opt/findface-cibr/configs/findface-extraction-api/loadbalancer.conf

      upstream findface-extraction-api-lb {
          least_conn;
          server 127.0.0.1:18660 max_fails=3 fail_timeout=60s;
          server 127.0.0.1:18661 max_fails=3 fail_timeout=60s;
      }

      server {
          listen 18666 default_server;
          server_name _;

          location / {
              proxy_pass http://findface-extraction-api-lb;
          }
      }

   In the ``upstream`` section, configure the balancing policy and the list of the ``findface-extraction-api`` instances between which the load will be distributed. You can choose among the balancing policies below; we recommend the ``least_conn`` policy.

   * ``round-robin`` — requests are distributed evenly across the backend servers in a circular manner. This is the default policy and requires no directive;
   * ``least_conn`` — requests are sent to the server with the fewest active connections. This algorithm is useful when you have backend servers with different capacities;
   * ``weighted`` — backend servers are assigned different weights, and requests are distributed based on those weights. It allows you to prioritize certain servers over others.

   The ``server 127.0.0.1:18660 max_fails=3 fail_timeout=60s`` line consists of the following meaningful parts:

   * ``server`` — defines an upstream server in Nginx;
   * ``127.0.0.1:18660`` — the IP address (``127.0.0.1``) and port (``18660``) of the ``findface-extraction-api`` instance;
   * ``max_fails=3`` — the maximum number of failed connection attempts Nginx allows before considering the server unavailable;
   * ``fail_timeout=60s`` — the amount of time (here, 60 seconds) Nginx considers the server unavailable once it exceeds the number of failed attempts specified by ``max_fails``;
   * ``weight=2`` (not used in the example above) — assigns a relative weight, or priority, to a server in an upstream group: a server with ``weight=2`` receives twice as many requests as a server with ``weight=1``. This might be useful in scenarios where GPUs of different performance are used, as illustrated in the sketch below.
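   As an illustration of the ``weight`` parameter, a hypothetical variant of the same ``upstream`` block for two GPUs of unequal performance might look as follows (the 2:1 ratio is an assumption made for the example):

   .. code::

      upstream findface-extraction-api-lb {
          least_conn;
          # the instance on the faster GPU receives twice the share of requests
          server 127.0.0.1:18660 weight=2 max_fails=3 fail_timeout=60s;
          server 127.0.0.1:18661 weight=1 max_fails=3 fail_timeout=60s;
      }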
#. Open the ``/opt/findface-cibr/configs/findface-sf-api/findface-sf-api.yaml`` configuration file. Make sure that the ``url`` parameter in the ``extraction-api`` section points to the load balancer address ``http://127.0.0.1:18666``; if it differs, set it to this value. Otherwise, don't change anything.

   .. code::

      sudo vi /opt/findface-cibr/configs/findface-sf-api/findface-sf-api.yaml

      extraction-api:
        timeouts:
          connect: 5s
          response_header: 30s
          overall: 35s
          idle_connection: 10s
        max-idle-conns-per-host: 20
        keepalive: 24h0m0s
        trace: false
        url: http://127.0.0.1:18666

#. Rebuild all FindFace CIBR containers, removing at the same time orphan containers for services that are no longer defined in the ``docker-compose.yaml`` file.

   .. code::

      cd /opt/findface-cibr/

      docker-compose down
      docker-compose up -d --remove-orphans

To make sure that everything works as expected, check the logs of the services.

.. code::

   docker compose logs --tail 10 -f findface-extraction-api-0
   docker compose logs --tail 10 -f findface-extraction-api-1
   docker compose logs --tail 10 -f findface-extraction-api-lb
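You can also watch the video cards directly. A quick host-side sanity check, assuming the NVIDIA driver utilities are installed on the host:

.. code::

   # while requests are being processed, each card should show memory
   # allocated by its findface-extraction-api instance
   watch -n 1 nvidia-smi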
Allocate ``findface-video-worker`` to Additional Video Card
-----------------------------------------------------------------------------

To create an additional ``findface-video-worker`` instance on your GPU-based system and allocate it to a different video card, do the following:

#. In the ``/opt/findface-cibr/docker-compose.yaml`` file, specify the ``findface-video-worker`` configuration for each running ``findface-video-worker`` instance. Copy your current ``findface-video-worker`` configuration.

   .. code::

      sudo vi /opt/findface-cibr/docker-compose.yaml

      findface-video-worker:
        command: [--config=/etc/findface-video-worker.yaml]
        depends_on: [findface-video-manager, findface-ntls, mongodb]
        environment: [CUDA_VISIBLE_DEVICES=0]
        image: docker.int.ntl/ntech/universe/video-worker-gpu:ffserver-9.230407.1
        logging: {driver: journald}
        network_mode: service:pause
        restart: always
        runtime: nvidia
        volumes: ['./configs/findface-video-worker/findface-video-worker.yaml:/etc/findface-video-worker.yaml:ro', './models:/usr/share/findface-data/models:ro', './cache/findface-video-worker/models:/var/cache/findface/models_cache', './cache/findface-video-worker/recorder:/var/cache/findface/video-worker-recorder']

   Then, adjust it accordingly:

   .. code::

      findface-video-worker-0:
        command: [--config=/etc/findface-video-worker.yaml]
        depends_on: [findface-video-manager, findface-ntls, mongodb]
        environment:
          - CUDA_VISIBLE_DEVICES=0
          - CFG_STREAMER_PORT=18990
          - CFG_STREAMER_URL=127.0.0.1:18990
        image: docker.int.ntl/ntech/universe/video-worker-gpu:ffserver-9.230407.1
        logging: {driver: journald}
        network_mode: service:pause
        restart: always
        runtime: nvidia
        volumes: ['./configs/findface-video-worker/findface-video-worker.yaml:/etc/findface-video-worker.yaml:ro', './models:/usr/share/findface-data/models:ro', './cache/findface-video-worker/models:/var/cache/findface/models_cache', './cache/findface-video-worker/recorder:/var/cache/findface/video-worker-recorder']

      findface-video-worker-1:
        command: [--config=/etc/findface-video-worker.yaml]
        depends_on: [findface-video-manager, findface-ntls, mongodb]
        environment:
          - CUDA_VISIBLE_DEVICES=1
          - CFG_STREAMER_PORT=18991
          - CFG_STREAMER_URL=127.0.0.1:18991
        image: docker.int.ntl/ntech/universe/video-worker-gpu:ffserver-9.230407.1
        logging: {driver: journald}
        network_mode: service:pause
        restart: always
        runtime: nvidia
        volumes: ['./configs/findface-video-worker/findface-video-worker.yaml:/etc/findface-video-worker.yaml:ro', './models:/usr/share/findface-data/models:ro', './cache/findface-video-worker/models:/var/cache/findface/models_cache', './cache/findface-video-worker/recorder:/var/cache/findface/video-worker-recorder']

   The main parameters here are as follows:

   * ``findface-video-worker-0`` — the new name of the ``findface-video-worker`` instance (in our example, each instance is assigned a number equal to the GPU ID on the device);
   * ``CUDA_VISIBLE_DEVICES=0`` — the CUDA device ID on which the new instance runs;
   * ``CFG_STREAMER_PORT=18991`` — the streamer port (must be unique);
   * ``CFG_STREAMER_URL=127.0.0.1:18991`` — the URL used to connect to and get the stream from the ``findface-video-worker`` instance (must be unique).

   All other parameters remain unchanged.

#. Rebuild all FindFace CIBR containers, removing at the same time orphan containers for services that are no longer defined in the ``docker-compose.yaml`` file.

   .. code::

      cd /opt/findface-cibr/

      docker-compose down
      docker-compose up -d --remove-orphans

To make sure that everything works fine, check the logs of the ``findface-video-worker`` services.

.. code::

   docker compose logs --tail 10 -f findface-video-worker-0
   docker compose logs --tail 10 -f findface-video-worker-1
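Since each instance exposes its streamer on a unique port, you can additionally verify that both streamers are listening. A minimal check, assuming the ports from the example above and that the services share the host network namespace via the ``pause`` container:

.. code::

   # both streamer ports should be in LISTEN state
   ss -ltn | grep -E '18990|18991'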