Multiple Video Cards Usage
If you have several video cards installed on a physical server, you can create additional findface-extraction-api or findface-video-worker instances on your GPU-based system and distribute them across the video cards, one instance per card. If you have followed the instructions to prepare the server on Ubuntu (CentOS, Debian), you should already be all set to proceed.
However, before you take further action, make sure that you have a properly generated /etc/docker/daemon.json configuration file. The example below is provided for the Ubuntu operating system; it shows how to configure the Docker network and enable the NVIDIA Container Runtime, and assumes that the NVIDIA Container Runtime is already installed.
sudo su
BIP=10.$((RANDOM % 256)).$((RANDOM % 256))
cat > /etc/docker/daemon.json <<EOF
{
  "default-address-pools": [
    {"base": "$BIP.0/16", "size": 24}
  ],
  "bip": "$BIP.1/24",
  "fixed-cidr": "$BIP.0/24",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
EOF
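After writing daemon.json, it may be worth confirming that Docker has picked up the NVIDIA runtime and listing the GPU IDs that the instances will later reference via CUDA_VISIBLE_DEVICES. A quick check, assuming nvidia-smi and the NVIDIA Container Toolkit are installed:

```shell
# Restart Docker so that the new daemon.json takes effect
sudo systemctl restart docker

# The default runtime should now be reported as "nvidia"
docker info --format '{{.DefaultRuntime}}'

# List the available GPUs and their IDs
# (these IDs are used later as CUDA_VISIBLE_DEVICES values)
nvidia-smi -L
```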
Refer to the corresponding sections of the documentation to validate the configuration of the /etc/docker/daemon.json file for CentOS and Debian.
Distribute findface-extraction-api Instances Across Several Video Cards
To distribute the findface-extraction-api instances across several video cards, create multiple instances of the findface-extraction-api service in the docker-compose.yaml configuration file. Then, for the findface-extraction-api GPU instances to work within one system, bind them together via a load balancer, e.g., Nginx.
Do the following:
Configure the docker-compose.yaml file. Open the /opt/findface-cibr/docker-compose.yaml file and create multiple records of the findface-extraction-api configuration in the findface-extraction-api section. Each new instance of findface-extraction-api should be configured identically to the others so that queries are processed consistently. Do the following:

- Rename the default findface-extraction-api service section.
- Create configuration copies for the new instances.
- Configure the instances.

Below is an example of two configured services:
sudo vi /opt/findface-cibr/docker-compose.yaml

findface-extraction-api-0:
  command: [--config=/etc/findface-extraction-api.ini]
  depends_on: [findface-ntls]
  environment:
    - CUDA_VISIBLE_DEVICES=0
    - CFG_LISTEN=127.0.0.1:18660
  image: docker.int.ntl/ntech/universe/extraction-api-gpu:ffserver-9.230407.1
  logging: {driver: journald}
  network_mode: service:pause
  restart: always
  runtime: nvidia
  volumes: ['./configs/findface-extraction-api/findface-extraction-api.yaml:/etc/findface-extraction-api.ini:ro',
    './models:/usr/share/findface-data/models:ro',
    './cache/findface-extraction-api/models:/var/cache/findface/models_cache']
findface-extraction-api-1:
  command: [--config=/etc/findface-extraction-api.ini]
  depends_on: [findface-ntls]
  environment:
    - CUDA_VISIBLE_DEVICES=1
    - CFG_LISTEN=127.0.0.1:18661
  image: docker.int.ntl/ntech/universe/extraction-api-gpu:ffserver-9.230407.1
  logging: {driver: journald}
  network_mode: service:pause
  restart: always
  runtime: nvidia
  volumes: ['./configs/findface-extraction-api/findface-extraction-api.yaml:/etc/findface-extraction-api.ini:ro',
    './models:/usr/share/findface-data/models:ro',
    './cache/findface-extraction-api/models:/var/cache/findface/models_cache']
- findface-extraction-api-0 — the readable name of an instance; must be unique for each instance;
- CUDA_VISIBLE_DEVICES=0 — the ID of the GPU on which the service instance runs; must be unique for each instance;
- CFG_LISTEN=127.0.0.1:18660 — the IP:port on which the instance listens for requests; must be unique for each instance;
- ./cache/findface-extraction-api/models:/var/cache/findface/models_cache — the volume for storing the model cache. It may be the same volume if your GPUs are of the same model; otherwise, the caches are stored separately.
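For instance, if the server had a third video card, a hypothetical findface-extraction-api-2 service would follow the same pattern, with only the GPU ID and the listening port changed (its address must also be added to the load balancer configuration):

```yaml
findface-extraction-api-2:
  command: [--config=/etc/findface-extraction-api.ini]
  depends_on: [findface-ntls]
  environment:
    - CUDA_VISIBLE_DEVICES=2        # third GPU on the host
    - CFG_LISTEN=127.0.0.1:18662    # next free listening port
  image: docker.int.ntl/ntech/universe/extraction-api-gpu:ffserver-9.230407.1
  logging: {driver: journald}
  network_mode: service:pause
  restart: always
  runtime: nvidia
  volumes: ['./configs/findface-extraction-api/findface-extraction-api.yaml:/etc/findface-extraction-api.ini:ro',
    './models:/usr/share/findface-data/models:ro',
    './cache/findface-extraction-api/models:/var/cache/findface/models_cache']
```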
The remaining configuration parameters can be the same for every findface-extraction-api service instance.

Configure load balancing in the /opt/findface-cibr/docker-compose.yaml file. To do so, add a new service (findface-extraction-api-lb in our example) to the docker-compose.yaml file, e.g., at the end of the file.

findface-extraction-api-lb:
  depends_on: [findface-ntls]
  image: docker.int.ntl/ntech/multi/multi/ui-cibr:ffcibr-2.1.2
  logging: {driver: journald}
  network_mode: service:pause
  restart: always
  volumes: ['./configs/findface-extraction-api/loadbalancer.conf:/etc/nginx/conf.d/default.conf:ro']
We have specified the loadbalancer.conf configuration file in the volumes of the findface-extraction-api-lb section. Now we need to create it.
Create the load balancer configuration file with the following content:

sudo vi /opt/findface-cibr/configs/findface-extraction-api/loadbalancer.conf

upstream findface-extraction-api-lb {
    least_conn;
    server 127.0.0.1:18660 max_fails=3 fail_timeout=60s;
    server 127.0.0.1:18661 max_fails=3 fail_timeout=60s;
}

server {
    listen 18666 default_server;
    server_name _;

    location / {
        proxy_pass http://findface-extraction-api-lb;
    }
}
In the upstream section, configure the balancing policy and the list of the findface-extraction-api instances between which the load will be distributed. You can choose among the balancing policy options below; we recommend the least_conn policy.

- round-robin — requests are distributed evenly across the backend servers in a circular manner;
- least_conn — requests are sent to the server with the fewest active connections. This algorithm is useful when backend servers have different capacities;
- weighted — backend servers are assigned different weights, and requests are distributed based on those weights. This allows you to prioritize certain servers over others.
The server 127.0.0.1:18660 max_fails=3 fail_timeout=60s line consists of the following meaningful parts:

- server — defines an upstream server in Nginx;
- 127.0.0.1:18660 — the IP address (127.0.0.1) and port (18660) of the findface-extraction-api instance;
- max_fails=3 — the maximum number of failed connection attempts Nginx allows before it considers the server unavailable;
- fail_timeout=60s — how long Nginx considers the server unavailable (in this case, 60 seconds) once it exceeds the number of failed attempts specified by max_fails;
- weight=2 (not used in the provided example) — assigns a relative weight, or priority, to a server in an upstream group. A server with weight=2 receives twice as many requests as a server with the default weight of 1, which is useful in scenarios where GPUs of different performance are used.
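As an illustrative sketch, an upstream section for two GPUs of unequal performance might use weights like this (the weight values here are assumptions, not recommendations; when weights are given and no policy directive is set, Nginx applies weighted round-robin by default):

```nginx
upstream findface-extraction-api-lb {
    server 127.0.0.1:18660 weight=2 max_fails=3 fail_timeout=60s;  # faster GPU
    server 127.0.0.1:18661 weight=1 max_fails=3 fail_timeout=60s;  # slower GPU
}
```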
Open the /opt/findface-cibr/configs/findface-sf-api/findface-sf-api.yaml configuration file and make sure that the url parameter in the extraction-api section points to the load balancer address (127.0.0.1:18666 in our example). Adjust it if it differs; otherwise, don't change anything.

sudo vi /opt/findface-cibr/configs/findface-sf-api/findface-sf-api.yaml

extraction-api:
  timeouts:
    connect: 5s
    response_header: 30s
    overall: 35s
    idle_connection: 10s
  max-idle-conns-per-host: 20
  keepalive: 24h0m0s
  trace: false
  url: http://127.0.0.1:18666
Rebuild all FindFace CIBR containers, removing at the same time orphan containers for services that are no longer defined in the docker-compose.yaml file.

cd /opt/findface-cibr/
docker-compose down
docker-compose up -d --remove-orphans
To make sure that everything works as expected, check the logs of the services.
docker compose logs --tail 10 -f findface-extraction-api-0
docker compose logs --tail 10 -f findface-extraction-api-1
docker compose logs --tail 10 -f findface-extraction-api-lb
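You can also send a test request to the load balancer port to confirm that it proxies traffic to the instances. The exact response body depends on the findface-extraction-api API; for this check, any HTTP response (rather than a connection error) indicates that the balancer is up:

```shell
curl -i http://127.0.0.1:18666/
```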
Allocate findface-video-worker to an Additional Video Card
To create an additional findface-video-worker instance on your GPU-based system and allocate it to a different video card, do the following:
In the /opt/findface-cibr/docker-compose.yaml file, specify the findface-video-worker configuration for each running findface-video-worker instance. Copy your current findface-video-worker configuration.

sudo vi /opt/findface-cibr/docker-compose.yaml

findface-video-worker:
  command: [--config=/etc/findface-video-worker.yaml]
  depends_on: [findface-video-manager, findface-ntls, mongodb]
  environment: [CUDA_VISIBLE_DEVICES=0]
  image: docker.int.ntl/ntech/universe/video-worker-gpu:ffserver-9.230407.1
  logging: {driver: journald}
  network_mode: service:pause
  restart: always
  runtime: nvidia
  volumes: ['./configs/findface-video-worker/findface-video-worker.yaml:/etc/findface-video-worker.yaml:ro',
    './models:/usr/share/findface-data/models:ro',
    './cache/findface-video-worker/models:/var/cache/findface/models_cache',
    './cache/findface-video-worker/recorder:/var/cache/findface/video-worker-recorder']
Then, adjust it accordingly.
findface-video-worker-0:
  command: [--config=/etc/findface-video-worker.yaml]
  depends_on: [findface-video-manager, findface-ntls, mongodb]
  environment:
    - CUDA_VISIBLE_DEVICES=0
    - CFG_STREAMER_PORT=18990
    - CFG_STREAMER_URL=127.0.0.1:18990
  image: docker.int.ntl/ntech/universe/video-worker-gpu:ffserver-9.230407.1
  logging: {driver: journald}
  network_mode: service:pause
  restart: always
  runtime: nvidia
  volumes: ['./configs/findface-video-worker/findface-video-worker.yaml:/etc/findface-video-worker.yaml:ro',
    './models:/usr/share/findface-data/models:ro',
    './cache/findface-video-worker/models:/var/cache/findface/models_cache',
    './cache/findface-video-worker/recorder:/var/cache/findface/video-worker-recorder']
findface-video-worker-1:
  command: [--config=/etc/findface-video-worker.yaml]
  depends_on: [findface-video-manager, findface-ntls, mongodb]
  environment:
    - CUDA_VISIBLE_DEVICES=1
    - CFG_STREAMER_PORT=18991
    - CFG_STREAMER_URL=127.0.0.1:18991
  image: docker.int.ntl/ntech/universe/video-worker-gpu:ffserver-9.230407.1
  logging: {driver: journald}
  network_mode: service:pause
  restart: always
  runtime: nvidia
  volumes: ['./configs/findface-video-worker/findface-video-worker.yaml:/etc/findface-video-worker.yaml:ro',
    './models:/usr/share/findface-data/models:ro',
    './cache/findface-video-worker/models:/var/cache/findface/models_cache',
    './cache/findface-video-worker/recorder:/var/cache/findface/video-worker-recorder']
The main parameters here are as follows:

- findface-video-worker-0 — the new name of the findface-video-worker instance (in our example, each instance is assigned a number equal to the GPU ID on the device);
- CUDA_VISIBLE_DEVICES=0 — the ID of the CUDA device on which the new instance runs;
- CFG_STREAMER_PORT=18991 — the streamer port; must be unique for each instance;
- CFG_STREAMER_URL=127.0.0.1:18991 — the URL used to connect to and receive the stream from the findface-video-worker instance; must be unique for each instance.
All other parameters remain unchanged.
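Following the same pattern, a hypothetical third instance on a third video card would take the next GPU ID and another unique streamer port:

```yaml
findface-video-worker-2:
  command: [--config=/etc/findface-video-worker.yaml]
  depends_on: [findface-video-manager, findface-ntls, mongodb]
  environment:
    - CUDA_VISIBLE_DEVICES=2          # third GPU on the host
    - CFG_STREAMER_PORT=18992         # next free streamer port
    - CFG_STREAMER_URL=127.0.0.1:18992
  image: docker.int.ntl/ntech/universe/video-worker-gpu:ffserver-9.230407.1
  logging: {driver: journald}
  network_mode: service:pause
  restart: always
  runtime: nvidia
  volumes: ['./configs/findface-video-worker/findface-video-worker.yaml:/etc/findface-video-worker.yaml:ro',
    './models:/usr/share/findface-data/models:ro',
    './cache/findface-video-worker/models:/var/cache/findface/models_cache',
    './cache/findface-video-worker/recorder:/var/cache/findface/video-worker-recorder']
```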
Rebuild all FindFace CIBR containers, removing at the same time orphan containers for services that are no longer defined in the docker-compose.yaml file.

cd /opt/findface-cibr/
docker-compose down
docker-compose up -d --remove-orphans
To make sure that everything works fine, check the logs of the findface-video-worker services.
docker compose logs --tail 10 -f findface-video-worker-0
docker compose logs --tail 10 -f findface-video-worker-1