Fast Index

A fast index makes emben (facen) search tens of times faster than the normal mode of operation. The index has three main parameters that affect accuracy and speed:

  • "m (M)": the number of bi-directional links created for every new element during construction. A reasonable range for M is 2–100. Higher values of M work better on datasets with high intrinsic dimensionality and/or when high recall is required, while lower values of M work better on datasets with low intrinsic dimensionality and/or when low recall is acceptable.

  • "search_ef (ef)": the size of the dynamic list for the nearest neighbors (used during the search).

  • "ef (ef_construction)": has the same meaning as search_ef, but controls the trade-off between index construction time and index accuracy. A larger value of "ef" leads to longer construction but better index quality.

The names in parentheses are the corresponding parameter names in the fast index library documentation. See ALGO_PARAMS.md for more information.
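As a small illustration of the naming note above, the following Python sketch translates the parameter names used in this section into the names given in parentheses. The helper name to_library_names is hypothetical and is not part of tntapi or of the index library.

```python
# Mapping taken from the parameter list above:
# name used in this section -> name in the fast index library documentation.
LIBRARY_NAMES = {
    "m": "M",
    "search_ef": "ef",
    "ef": "ef_construction",
}


def to_library_names(params):
    """Return a copy of params keyed by the library's parameter names.

    Hypothetical helper for illustration only; unknown keys raise KeyError.
    """
    return {LIBRARY_NAMES[name]: value for name, value in params.items()}


print(to_library_names({"m": 4, "ef": 100, "search_ef": 100}))
# prints {'M': 4, 'ef_construction': 100, 'ef': 100}
```

Note that "ef" means different things on the two sides of the mapping, which is the main reason to keep the translation explicit.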

Live Index

Live index is a gallery operating mode in which new objects are added to the index immediately, the index is periodically saved to disk, and search always uses the index. This is possible when the search uses no filters other than emben (facen).

Create a gallery with live index parameters:

POST /v2/galleries/add/:name

{
    "live_idx": {
        "enabled": true,
        "snapshot_path": "/tmp/idx.bin",
        "snapshot_interval_seconds": 10000,
        "snapshot_changes_count": 99,
        "initial_size": 100,
        "m": 4,
        "ef": 100,
        "search_ef": 100
    }
}
  • "enabled": enables live index, boolean.

  • "snapshot_path": path to the file with live index snapshot, string. The directory must already exist.

  • "snapshot_interval_seconds": the interval between index snapshots, in seconds, uint64_t.

  • "snapshot_changes_count": the number of added or removed indexed space objects after which a snapshot is created, uint64_t.

  • "initial_size": the number of objects in the gallery after which the index is created, uint64_t.

  • "m", "ef", "search_ef": fast index parameters.
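The request body described above can also be assembled programmatically. This is a minimal Python sketch that only builds the JSON; the class name LiveIndexParams and the function gallery_body are hypothetical, and the field values are the ones from the example request.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LiveIndexParams:
    """Fields mirror the "live_idx" keys documented above."""
    enabled: bool
    snapshot_path: str
    snapshot_interval_seconds: int
    snapshot_changes_count: int
    initial_size: int
    m: int
    ef: int
    search_ef: int


def gallery_body(params):
    """Serialize the parameters as a JSON body for /v2/galleries/add/:name."""
    return json.dumps({"live_idx": asdict(params)}, indent=4)


body = gallery_body(LiveIndexParams(
    enabled=True,
    snapshot_path="/tmp/idx.bin",
    snapshot_interval_seconds=10000,
    snapshot_changes_count=99,
    initial_size=100,
    m=4,
    ef=100,
    search_ef=100,
))
print(body)
```

Keeping the parameters in a typed structure makes it harder to misspell a key or to forget one of the three fast index parameters.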

Important

Gallery creation returns null on success. To check the parameters of a newly created gallery, use POST /v2/galleries/get/:name.
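A follow-up check of the gallery parameters might be prepared as in the Python sketch below. It only builds the request and does not send it. The base URL is an assumption taken from the walkthrough later in this section, the empty request body is an assumption as well, and gallery_get_request is a hypothetical helper name.

```python
import urllib.request

# Assumption: tntapi is reachable on localhost:8001, as in the walkthrough
# later in this section.
BASE_URL = "http://localhost:8001"


def gallery_get_request(name, base_url=BASE_URL):
    """Build (but do not send) the POST request that reads gallery parameters."""
    url = f"{base_url}/v2/galleries/get/{name}"
    # Assumption: the endpoint accepts an empty request body.
    return urllib.request.Request(url, data=b"", method="POST")


request = gallery_get_request("testgal")
print(request.get_method(), request.full_url)

# To send it against a running tntapi instance:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode())
```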

Note

All numeric index creation parameters of a gallery are checked to be >= 4 to guard against mistakes.

This example uses the tntapi API. You can also perform the same operation via sf-api.
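The lower-bound check mentioned in the note can be reproduced on the client side before sending a request. The Python sketch below is only an illustration: it assumes that every integer value in "live_idx" is subject to the check, which the text does not spell out, and find_too_small is a hypothetical helper name.

```python
MIN_VALUE = 4  # lower bound stated in the note above


def find_too_small(live_idx, min_value=MIN_VALUE):
    """Return the sorted names of numeric parameters below min_value.

    Assumption: every integer value in "live_idx" is checked;
    booleans such as "enabled" are skipped.
    """
    return sorted(
        name
        for name, value in live_idx.items()
        if isinstance(value, int)
        and not isinstance(value, bool)
        and value < min_value
    )


print(find_too_small({"enabled": True, "m": 2, "ef": 100, "search_ef": 3}))
# prints ['m', 'search_ef']
```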

Warning

Live index does not work on replicas!

Index snapshots

The index is saved to the snapshot file (snapshot_path) either every snapshot_interval_seconds seconds or after snapshot_changes_count changes to the gallery, whichever happens first. After the snapshot file is saved, all records are moved from the linear space to the indexed space. On restart, the tntapi service tries to load the index from snapshot_path and adds all new records (those added after the last snapshot) from the linear space to it. Then all records tagged as deleted are removed from the indexed space. Snapshot operations are blocking: no interaction with tntapi is possible while they are in progress.
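The "whichever happens first" rule can be sketched in Python as follows. This is not the tntapi implementation: the class SnapshotPolicy and its method is_due are hypothetical, and treating both thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SnapshotPolicy:
    """Holds the two snapshot thresholds from the "live_idx" parameters."""
    snapshot_interval_seconds: int
    snapshot_changes_count: int

    def is_due(self, seconds_since_last, changes_since_last):
        """True as soon as either threshold is reached.

        Assumption: both comparisons are inclusive (>=).
        """
        return (
            seconds_since_last >= self.snapshot_interval_seconds
            or changes_since_last >= self.snapshot_changes_count
        )


policy = SnapshotPolicy(snapshot_interval_seconds=30, snapshot_changes_count=99)
print(policy.is_due(seconds_since_last=10, changes_since_last=99))  # change count reached
print(policy.is_due(seconds_since_last=10, changes_since_last=5))   # neither reached
```

Because snapshots block all interaction with tntapi, a short interval combined with a large gallery means frequent pauses; the two thresholds are the knobs for balancing durability against availability.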

Important

It is recommended to take a snapshot every N object additions and to limit the size of a gallery that uses live index.

Removing objects

Internally, removed records are only marked as “removed” but still occupy space in the fast index. A large number of removed records may reduce both performance and accuracy.

Important

If your use case involves a lot of deletions, we recommend organizing your workflow so that you can discard galleries as a whole, which keeps the accumulation of removed records low. For historical data, use a gallery per day or per week and delete the oldest gallery on a regular basis instead of deleting individual records. For static data, regularly rotate galleries by copying live records into a newly created empty gallery.
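The gallery-per-day approach could be organized along the lines of the Python sketch below. The naming scheme (a prefix followed by the date), the "events" prefix, and both helper names are hypothetical; the sketch only decides which galleries fall outside a retention window and leaves the actual deletion to the API.

```python
from datetime import date, timedelta


def gallery_name(day, prefix="events"):
    """Hypothetical naming scheme: one gallery per calendar day."""
    return f"{prefix}_{day:%Y_%m_%d}"


def galleries_to_delete(existing, today, keep_days, prefix="events"):
    """Return daily galleries from existing that are older than keep_days."""
    keep = {
        gallery_name(today - timedelta(days=offset), prefix)
        for offset in range(keep_days)
    }
    return sorted(
        name
        for name in existing
        if name.startswith(prefix + "_") and name not in keep
    )


existing = ["events_2024_03_25", "events_2024_03_24", "events_2024_03_20", "other"]
print(galleries_to_delete(existing, today=date(2024, 3, 25), keep_days=2))
# prints ['events_2024_03_20']
```

Galleries that do not follow the naming scheme, such as "other" here, are left alone.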

To enable a live index, do the following:

  1. Create a directory for the live index snapshot.

    sudo mkdir -p /opt/ffserver/tnt/001/{snapshots,xlogs,live_index}
    
  2. Start the tntapi docker container.

    docker run -tid --name tnt-1-1 --restart always --network server \
        --env CFG_LISTEN_HOST=0.0.0.0 \
        --env CFG_NTLS=ntls:3133 \
        --env TT_LISTEN=0.0.0.0:32001 \
        --env TT_MEMTX_MEMORY=$((1024 * 1024 * 1024)) \
        --volume /opt/ffserver/tnt/001:/opt/ntech/var/lib/tarantool/default \
        --publish 127.0.0.1:8001:8001 \
        docker.int.ntl/ntech/universe/tntapi:ffserver-11.240325
    
  3. Create a gallery:

    curl -D - -X POST -s 'http://localhost:8001/v2/galleries/add/testgal' --data '{
        "live_idx": {
            "enabled": true,
            "snapshot_path": "/opt/ntech/var/lib/tarantool/default/live_index/idx.bin",
            "snapshot_interval_seconds": 30,
            "snapshot_changes_count": 99,
            "initial_size": 100,
            "m": 4,
            "ef": 100,
            "search_ef": 100
        }
    }'
    
    HTTP/1.1 201 Created
    X-request-id: TN:77gGv1a1
    Content-type: application/json
    X-read-only: false
    Content-length: 4
    Connection: keep-alive
    Server: Tarantool http (tarantool v2.10.4-2-gd536a7aa5)
    
    null
    
  4. Add more than 100 objects to the gallery testgal.

  5. Check that the index snapshot exists at /opt/ffserver/tnt/001/live_index/idx.bin.

    Warning

    Do not move the snapshot file to another location!

  6. Send an emben search request to the gallery and check the "X-search-stat" response header:

    HTTP/1.1 200 Ok
    X-request-id: TN:EiQdsLeF
    X-search-stat: batch_size:1, fastIndex:yes;
    Content-type: application/json
    X-read-only: false
    Content-length: 1882
    Connection: keep-alive
    Server: Tarantool http (tarantool v2.10.4-2-gd536a7aa5)
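The last check can be automated by parsing the "X-search-stat" header. The Python sketch below assumes the value is a comma-separated list of key:value pairs ending with a semicolon, which is inferred only from the single example above; parse_search_stat is a hypothetical helper name.

```python
def parse_search_stat(value):
    """Parse an X-search-stat value into a dict of strings.

    Assumption: the format is "key:value, key:value;" as in the example
    response above; items without a colon are ignored.
    """
    stats = {}
    for item in value.strip().rstrip(";").split(","):
        key, separator, val = item.strip().partition(":")
        if separator:
            stats[key.strip()] = val.strip()
    return stats


stats = parse_search_stat("batch_size:1, fastIndex:yes;")
print(stats)
# prints {'batch_size': '1', 'fastIndex': 'yes'}

# The search used the fast index if the header reports fastIndex:yes.
used_fast_index = stats.get("fastIndex") == "yes"
print(used_fast_index)
```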