Fast Index
A fast index can speed up the emben (facen) search by tens of times compared with the normal mode of operation. The index has 3 main parameters that affect accuracy and speed:
"m (M)"
: the number of bi-directional links created for every new element during construction. A reasonable range for M is 2–100. Higher values of M work better on datasets with high intrinsic dimensionality and/or when high recall is required, while lower values of M work better on datasets with low intrinsic dimensionality and/or when low recall is acceptable.
"search_ef (ef)"
: the size of the dynamic list for the nearest neighbors (used during the search).
"ef (ef_construction)"
: the parameter has the same meaning as search_ef, but controls the index_time/index_accuracy trade-off. A bigger value of "ef" leads to longer construction, but better index quality.
The names in brackets are the corresponding parameter names in the fast index library documentation. See ALGO_PARAMS.md for more information.
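The two naming schemes are easy to confuse because "ef" refers to different parameters in each of them. A minimal sketch of the translation follows; only the three name pairs come from the list above, while the helper name to_library_params is hypothetical and not part of any API.

```python
# Gallery parameter name -> name used in the fast index library documentation.
# Only these three pairs come from the parameter list; the helper below is a
# hypothetical illustration, not part of tntapi or the library.
GALLERY_TO_LIBRARY = {
    "m": "M",
    "search_ef": "ef",
    "ef": "ef_construction",
}


def to_library_params(gallery_params):
    """Rename fast index parameters from gallery names to library names."""
    return {
        GALLERY_TO_LIBRARY[name]: value
        for name, value in gallery_params.items()
        if name in GALLERY_TO_LIBRARY  # ignore keys that are not index parameters
    }


print(to_library_params({"m": 4, "ef": 100, "search_ef": 50}))
```

Note that the gallery's "ef" becomes the library's ef_construction, and the gallery's "search_ef" becomes the library's ef.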
Live Index
The live index is a gallery working mode in which new objects are immediately added to the index, the index is periodically saved to disk, and the search always uses the index. This mode is possible only when the search uses no filters other than emben (facen).
Create a gallery with live index parameters:
POST /v2/galleries/add/:name
{
    "live_idx": {
        "enabled": true,
        "snapshot_path": "/tmp/idx.bin",
        "snapshot_interval_seconds": 10000,
        "snapshot_changes_count": 99,
        "initial_size": 100,
        "m": 4,
        "ef": 100,
        "search_ef": 100
    }
}
"enabled"
: enables live index, boolean.
"snapshot_path"
: path to the file with live index snapshot, string. The directory must already exist.
"snapshot_interval_seconds"
: the interval between index snapshots, in seconds, uint64_t.
"snapshot_changes_count"
: the number of indexed space objects added or removed, after which a snapshot is created, uint64_t.
"initial_size"
: the number of objects in the gallery after which the index is created, uint64_t.
"m", "ef", "search_ef"
: fast index parameters.
Important
Gallery creation returns null in case of success. To check the parameters of the newly created gallery, use POST /v2/galleries/get/:name.
Note
All numeric index creation parameters of the gallery are checked for >=4 to protect against mistakes.
This example uses the tntapi API. You can also perform the same operation via sf-api.
Warning
Live index does not work on replicas!
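The creation request and the >=4 check can be sketched on the client side as follows. This is an illustration under assumptions, not the server's validation logic: the helper names are hypothetical, and the sketch assumes the check covers all six numeric parameters listed above. The request objects are only built, not sent.

```python
import json
import urllib.request

# Numeric parameters from the list above. Assumption: the ">=4" check applies
# to each of them; the server-side rule may differ in detail.
NUMERIC_KEYS = (
    "snapshot_interval_seconds",
    "snapshot_changes_count",
    "initial_size",
    "m",
    "ef",
    "search_ef",
)


def build_live_idx_body(snapshot_path, **numeric):
    """Return the JSON request body, rejecting missing or too-small values."""
    for key in NUMERIC_KEYS:
        value = numeric.get(key)
        if isinstance(value, bool) or not isinstance(value, int) or value < 4:
            raise ValueError(f"{key} must be an integer >= 4, got {value!r}")
    live_idx = {"enabled": True, "snapshot_path": snapshot_path}
    live_idx.update({key: numeric[key] for key in NUMERIC_KEYS})
    return json.dumps({"live_idx": live_idx}).encode("utf-8")


def gallery_request(base_url, action, name, body=None):
    """Build (but do not send) a POST to /v2/galleries/<action>/<name>."""
    url = f"{base_url}/v2/galleries/{action}/{name}"
    return urllib.request.Request(url, data=body, method="POST")
```

Passing the request built with action "add" to urllib.request.urlopen would create the gallery. Because a successful creation returns null, a follow-up request with action "get" is the way to confirm which parameters were stored.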
Index snapshots
The index is saved to the snapshot file (snapshot_path) either once every snapshot_interval_seconds seconds or after snapshot_changes_count changes made to the gallery (whichever happens first).
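The "whichever happens first" rule can be pictured with a small model. This is a sketch under assumptions, not the tntapi implementation: the class and method names are hypothetical, and it assumes that both thresholds are inclusive and that the interval timer fires even when no changes are pending.

```python
from dataclasses import dataclass


@dataclass
class SnapshotTrigger:
    """Hypothetical model of the two snapshot conditions."""

    interval_seconds: int        # plays the role of snapshot_interval_seconds
    changes_count: int           # plays the role of snapshot_changes_count
    last_snapshot_at: float = 0.0
    pending_changes: int = 0

    def record_change(self, count=1):
        """Count objects added or removed since the last snapshot."""
        self.pending_changes += count

    def is_due(self, now):
        """A snapshot is due as soon as either condition is met."""
        interval_elapsed = now - self.last_snapshot_at >= self.interval_seconds
        enough_changes = self.pending_changes >= self.changes_count
        return interval_elapsed or enough_changes

    def mark_saved(self, now):
        """Reset both conditions after a snapshot has been written."""
        self.last_snapshot_at = now
        self.pending_changes = 0
```

With interval_seconds=30 and changes_count=99, as in the gallery example later on this page, a busy gallery is snapshotted by the change counter and a quiet one by the timer.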
After the snapshot file is saved, all records are moved from the space linear to the space indexed.
When restarting, the tntapi service tries to load the index from snapshot_path and adds to it all new records (those created after the last snapshot) from the space linear.
Then all records with the tag deleted are removed from the space indexed.
Snapshot operations are blocking: no interaction with tntapi is possible while they are in progress.
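The record flow described above can be followed in a toy model. This is only an illustration of the sequence of steps, with plain dictionaries standing in for the spaces, the index, and the snapshot file. All names are hypothetical, and the model makes no claim about how tntapi stores or indexes data internally.

```python
class ToyLiveIndex:
    """Toy model of the snapshot and restart flow; not the tntapi code."""

    def __init__(self):
        self.linear = {}      # space "linear": records added since the last snapshot
        self.indexed = {}     # space "indexed": records covered by a snapshot
        self.deleted = set()  # ids of records tagged as deleted
        self.index = {}       # stands in for the in-memory fast index
        self.snapshot = {}    # stands in for the file at snapshot_path

    def add(self, record_id, vector):
        """New objects go to the linear space and into the index at once."""
        self.linear[record_id] = vector
        self.index[record_id] = vector

    def mark_deleted(self, record_id):
        """Removal only tags the record; the index entry stays in place."""
        self.deleted.add(record_id)

    def save_snapshot(self):
        """Save the index, then move all records from linear to indexed."""
        self.snapshot = dict(self.index)
        self.indexed.update(self.linear)
        self.linear.clear()

    def restart(self):
        """Load the snapshot, re-add newer records, drop deleted records."""
        self.index = dict(self.snapshot)
        self.index.update(self.linear)
        for record_id in self.deleted:
            self.indexed.pop(record_id, None)
```

In this model a record added after the last snapshot survives a restart because it is still in the linear space, which is why the restart step replays that space on top of the loaded snapshot.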
Important
It is recommended to take a snapshot every N object additions and to limit the size of a gallery that uses a live index.
Removing objects
Internally, removed records are only marked as “removed” but still occupy space in the fast index. A large number of removed records may reduce both performance and accuracy.
Important
If your use case involves a lot of deletions, we recommend organizing your workflow so that you can discard galleries as a whole, which keeps the accumulation of removed records low. For historical data, use a gallery per day or per week and delete the oldest gallery on a regular basis instead of deleting individual records. For static data, regularly rotate galleries by copying live records into a newly created empty gallery.
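For the gallery-per-day variant, the bookkeeping can be sketched as follows. The naming pattern, the events prefix, and both helper names are assumptions made for illustration. The sketch only computes which gallery names fall outside the retention window and does not call any API, because the deletion request is not covered in this section.

```python
from datetime import date, timedelta


def gallery_name(day, prefix="events"):
    """Hypothetical per-day naming scheme, e.g. events_20240325."""
    return f"{prefix}_{day:%Y%m%d}"


def galleries_to_discard(existing, today, keep_days, prefix="events"):
    """Return the names in `existing` that are outside the retention window."""
    keep = {
        gallery_name(today - timedelta(days=offset), prefix)
        for offset in range(keep_days)
    }
    return sorted(
        name
        for name in existing
        if name.startswith(prefix + "_") and name not in keep
    )
```

Discarding a whole gallery removes its index together with all records marked as removed, so nothing accumulates from one period to the next.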
To enable a live index, do the following:
Create a directory for the live index snapshot.
sudo mkdir -p /opt/ffserver/tnt/001/{snapshots,xlogs,live_index}
Start the tntapi docker container.
docker run -tid --name tnt-1-1 --restart always --network server \
    --env CFG_LISTEN_HOST=0.0.0.0 \
    --env CFG_NTLS=ntls:3133 \
    --env TT_LISTEN=0.0.0.0:32001 \
    --env TT_MEMTX_MEMORY=$((1024 * 1024 * 1024)) \
    --volume /opt/ffserver/tnt/001-01:/opt/ntech/var/lib/tarantool/default \
    --publish 127.0.0.1:8001:8001 \
    docker.int.ntl/ntech/universe/tntapi:ffserver-11.240325
Create a gallery:
curl -D - -X POST -s 'http://localhost:8001/v2/galleries/add/testgal' --data '
{
    "live_idx": {
        "enabled": true,
        "snapshot_path": "/opt/ntech/var/lib/tarantool/default/live_index/idx.bin",
        "snapshot_interval_seconds": 30,
        "snapshot_changes_count": 99,
        "initial_size": 100,
        "m": 4,
        "ef": 100,
        "search_ef": 100
    }
}'
HTTP/1.1 201 Created
X-request-id: TN:77gGv1a1
Content-type: application/json
X-read-only: false
Content-length: 4
Connection: keep-alive
Server: Tarantool http (tarantool v2.10.4-2-gd536a7aa5)

null
Add more than 100 objects to the gallery testgal.
Check the index snapshot at the path /opt/ffserver/tnt/001/live_index/idx.bin.
Warning
Do not move the snapshot file to another location!
Send an emben search request to the gallery and check the response header "X-search-stat":
HTTP/1.1 200 Ok
X-request-id: TN:EiQdsLeF
X-search-stat: batch_size:1, fastIndex:yes;
Content-type: application/json
X-read-only: false
Content-length: 1882
Connection: keep-alive
Server: Tarantool http (tarantool v2.10.4-2-gd536a7aa5)
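The header check in the last step can be automated with a small parser. This sketch infers the key:value layout of "X-search-stat" from the single example above, so the format is an assumption, and both function names are hypothetical. Any value other than "yes" is treated as "fast index not used".

```python
def parse_search_stat(header_value):
    """Split an X-search-stat value into a dict of key/value strings."""
    stats = {}
    # Treat both "," and the trailing ";" as separators between fields.
    for part in header_value.replace(";", ",").split(","):
        key, sep, value = part.strip().partition(":")
        if sep:  # skip empty fragments that contain no "key:value" pair
            stats[key.strip()] = value.strip()
    return stats


def used_fast_index(header_value):
    """Return True only when the header reports fastIndex:yes."""
    return parse_search_stat(header_value).get("fastIndex") == "yes"


print(used_fast_index("batch_size:1, fastIndex:yes;"))
```

A check like this is useful after adding the first objects, because the index is only created once the gallery holds more than initial_size objects.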