
Tarantool 1.7.5 documentation

Note

This documentation is being translated and may lag behind the English version.


What’s new?

This section summarizes the significant changes introduced in specific Tarantool versions.

Smaller changes and bug fixes are listed in the reports for released stable versions (milestone = closed) on GitHub.

What’s new in Tarantool 1.7?

The disk-based storage engine, which was called sophia or phia in earlier versions, is superseded by the vinyl storage engine.

New indexed field types are added.

The LuaJIT version is updated.

Automatic replica set bootstrap (for easier configuration of a new replica set) is supported.

The space_object:inc() function is deprecated.

The space_object:dec() function is deprecated.

The space_object:bsize() function is added.

The box.coredump() function is removed; for an alternative, see Core dumps.

The hot_standby configuration option is added.

Configuration parameters revised:

  • Parameters renamed:
    • slab_alloc_arena (in gigabytes) to memtx_memory (in bytes),
    • slab_alloc_minimal to memtx_min_tuple_size,
    • slab_alloc_maximal to memtx_max_tuple_size,
    • replication_source to replication,
    • snap_dir to memtx_dir,
    • logger to log,
    • logger_nonblock to log_nonblock,
    • snapshot_count to checkpoint_count,
    • snapshot_period to checkpoint_interval,
    • panic_on_wal_error and panic_on_snap_error united under force_recovery.
  • Until Tarantool 1.8, you can use deprecated parameters for both initial and runtime configuration, but Tarantool will display a warning. You can also specify both a deprecated parameter and its up-to-date counterpart, provided that their values are consistent; otherwise, Tarantool will display an error.
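The compatibility rule for renamed parameters can be pictured as a small resolver. This is a minimal sketch in plain Lua, not Tarantool’s own configuration code: the function resolve_cfg is hypothetical, panic_on_wal_error/panic_on_snap_error are not modeled, and the conversion of slab_alloc_arena from gigabytes to bytes assumes 1 GB = 2^30 bytes.

```lua
-- Hypothetical model of how deprecated box.cfg parameter names map to new ones.
-- Not Tarantool's implementation; the force_recovery merge is not modeled.
local GIB = 1024 * 1024 * 1024  -- assumption: 1 GB = 2^30 bytes

-- old name -> { new name, optional value converter }
local renamed = {
    slab_alloc_arena   = { 'memtx_memory', function(gb) return math.floor(gb * GIB) end },
    slab_alloc_minimal = { 'memtx_min_tuple_size' },
    slab_alloc_maximal = { 'memtx_max_tuple_size' },
    replication_source = { 'replication' },
    snap_dir           = { 'memtx_dir' },
    logger             = { 'log' },
    logger_nonblock    = { 'log_nonblock' },
    snapshot_count     = { 'checkpoint_count' },
    snapshot_period    = { 'checkpoint_interval' },
}

-- Returns a table that uses only up-to-date names, plus a list of warnings.
-- Raises an error when a deprecated and an up-to-date parameter disagree.
local function resolve_cfg(cfg)
    local result, warnings = {}, {}
    -- Copy the parameters that are not deprecated as they are.
    for name, value in pairs(cfg) do
        if renamed[name] == nil then
            result[name] = value
        end
    end
    -- Translate deprecated parameters and check them against new ones.
    for old_name, value in pairs(cfg) do
        local rule = renamed[old_name]
        if rule ~= nil then
            local new_name, convert = rule[1], rule[2]
            local new_value = value
            if convert ~= nil then new_value = convert(value) end
            if cfg[new_name] ~= nil and cfg[new_name] ~= new_value then
                error(string.format('%s and %s are set to different values',
                                    old_name, new_name))
            end
            result[new_name] = new_value
            warnings[#warnings + 1] = string.format(
                '%s is deprecated, use %s instead', old_name, new_name)
        end
    end
    return result, warnings
end
```

For example, resolve_cfg{slab_alloc_arena = 1} yields memtx_memory = 1073741824 with one warning, while passing logger and log with different values raises an error.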

What’s new in Tarantool 1.6.9 after February 15, 2017?

Due to Tarantool issue #2040 «Remove sophia engine from 1.6», there is no longer a storage engine named sophia. In version 1.7, it is superseded by the vinyl storage engine.

What’s new in Tarantool 1.6?

Tarantool 1.6 is no longer getting major new features, although it will be maintained. The developers are concentrating on Tarantool version 1.7.

Overview

Application server + DBMS

Tarantool is a Lua application server integrated with a database management system. It has a «fiber» model which means that many Tarantool applications can run simultaneously on a single thread, while each instance of the Tarantool server itself can run multiple threads for input-output and background maintenance. It incorporates the LuaJIT – «Just In Time» – Lua compiler, Lua libraries for most common applications, and the Tarantool Database Server which is an established NoSQL DBMS. Thus Tarantool serves all the purposes that have made node.js and Twisted popular, plus it supports data persistence.

Tarantool is an open-source project. The source code is open to everyone and is distributed free of charge under the BSD license. Supported platforms are GNU/Linux, Mac OS and FreeBSD.

The creator of Tarantool, and its main user, is Mail.Ru, the largest Internet company in Russia (30 million users, 25 million emails per day, a website in the Alexa global top 40). Tarantool is used to process Mail.Ru’s «hottest» data, such as user online sessions, online application settings, caching of service data, data distribution and sharding algorithms, and so on. Tarantool is also used in a growing number of projects outside Mail.Ru, for example online games, digital marketing and social networks. Although Mail.Ru sponsors Tarantool’s development, the whole development process, including future plans and the bug database, is fully open. Tarantool includes patches from a large number of outside contributors. The Tarantool community has written (and continues to maintain) libraries for connecting to Tarantool from other programming languages. And the Lua community has provided hundreds of useful packages, most of which can be used as Tarantool extensions.

Tarantool users can create, modify and drop Lua functions at runtime. They can also specify Lua programs to be loaded when Tarantool starts. Such programs can act as triggers, perform background tasks, and interact with other programs over the network. Unlike many popular application development frameworks that are built on a «reactive» model, networking in Lua is sequential, yet very efficient, because it uses Tarantool’s own cooperative multitasking environment.

One of the built-in Lua packages is an API for the DBMS. Thus some developers see Tarantool as a DBMS with a popular stored procedure language, others see it as a Lua interpreter, and still others see it as a replacement for several components of multi-tier web applications. Tarantool’s performance can reach hundreds of thousands of transactions per second on a laptop, and it can be scaled up or out by adding server farms.

DBMS features

The «box» component, the server side with the DBMS functionality, is an important part of Tarantool, although Tarantool can run without it.

The DBMS API lets you store Lua objects, manage object collections, create and drop secondary keys, make atomic changes, configure and monitor replication, perform controlled failover, and execute Lua code triggered by database events. For transparent access to remote database instances, there is an API for remote procedure calls.

Tarantool’s DBMS server architecture implements the concept of «storage engines», where different sets of algorithms and data structures are used in different situations. Tarantool has two built-in engines: an in-memory engine that keeps all data and indexes in RAM, and a two-level B-tree engine that handles data 10 to 1000 times larger than what fits in RAM. All Tarantool engines support transactions and replication, because they use a common write ahead log (WAL). This mechanism ensures data consistency and crash safety: a change is not considered complete until it has been written to the WAL. The logging subsystem also supports group commits.
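A minimal sketch of the write-ahead rule and of group commit, assuming a single in-memory log: the Wal object, its submit() and flush() methods and the flush counter are hypothetical and stand in for disk writes and fsync calls; Tarantool’s actual WAL writer is far more involved and writes binary .xlog files. What the model captures is the ordering: a change becomes visible only after its log record has been written, and one flush can cover several transactions.

```lua
-- Toy model of write-ahead logging with group commit (hypothetical, not Tarantool code).
local Wal = {}
Wal.__index = Wal

function Wal.new()
    return setmetatable({ log = {}, pending = {}, flushes = 0, state = {} }, Wal)
end

-- Queue a change; it is NOT visible in state until flush() has written it.
function Wal:submit(key, value)
    self.pending[#self.pending + 1] = { key = key, value = value }
end

-- Write all pending changes with one simulated fsync, then apply them
-- to the in-memory state. Returns the number of acknowledged changes.
function Wal:flush()
    local n = #self.pending
    if n == 0 then return 0 end
    for _, rec in ipairs(self.pending) do
        self.log[#self.log + 1] = rec     -- "write" the record to the log first
    end
    self.flushes = self.flushes + 1       -- one flush for the whole group
    for _, rec in ipairs(self.pending) do
        self.state[rec.key] = rec.value   -- apply only after the log write
    end
    self.pending = {}
    return n
end
```

Three submitted changes followed by one flush() produce three log records but only one simulated fsync, which is the saving that group commit is about.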

Tarantool’s in-memory storage engine (memtx) keeps all the data in random-access memory, and therefore has very low read latency. It also keeps persistent copies of the data in non-volatile storage, such as disk, when users request «snapshots». If a server instance stops and its random-access memory is lost, then on restart it reads the latest snapshot and replays the transactions that are in the log, so no data is lost.

Tarantool’s in-memory engine is lock-free in typical situations. Instead of the operating system’s concurrency primitives, such as mutexes, Tarantool uses cooperative multitasking to handle thousands of connections simultaneously. There is a fixed number of independent execution threads. The threads do not share state. Instead they exchange data using low-overhead message queues. While this approach limits the number of cores that the instance will use, it removes competition for the memory bus and ensures peak scalability of memory access and network throughput. CPU utilization of a typical highly-loaded Tarantool instance is under 10%. Searches are possible via secondary index keys as well as primary keys.
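Cooperative multitasking means that a task keeps the CPU until it voluntarily yields. The sketch below models this with plain Lua coroutines and a hypothetical round-robin scheduler, run_all. It is not Tarantool’s fiber module (whose fibers also yield implicitly, for example while waiting for network I/O), but it shows why shared state needs no mutex: only one task runs at a time, and switches happen only at explicit yield points.

```lua
-- Round-robin scheduler over plain Lua coroutines: a model of cooperative
-- multitasking. run_all is a hypothetical helper, not a Tarantool API.
local function run_all(tasks)
    local queue = {}
    for i, fn in ipairs(tasks) do
        queue[i] = coroutine.create(fn)
    end
    while #queue > 0 do
        local co = table.remove(queue, 1)
        local ok, err = coroutine.resume(co)
        if not ok then error(err) end
        -- A task that yielded goes to the back of the queue; a finished one is dropped.
        if coroutine.status(co) ~= 'dead' then
            queue[#queue + 1] = co
        end
    end
end

local trace = {}
local counter = 0  -- shared state, updated without any lock

local function worker(name, steps)
    return function()
        for i = 1, steps do
            counter = counter + 1          -- no other task can run in between
            trace[#trace + 1] = name .. i
            coroutine.yield()              -- explicit switch point
        end
    end
end

run_all({ worker('a', 2), worker('b', 3) })
```

Running it produces the trace a1, b1, a2, b2, b3 and leaves counter at 5: the tasks interleave only where they yield.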

Tarantool’s disk-based storage engine is a fusion of ideas from modern filesystems, log-structured merge trees and classical B-trees. All data is organized into ranges. Each range is represented by a file on disk. Range size is a configuration option and normally is around 64MB. Each range is a collection of pages, serving different purposes. Pages in a fully merged range contain non-overlapping ranges of keys. A range can be partially merged if there were a lot of changes in its key range recently. In that case some pages represent new keys and values in the range. The disk-based storage engine is append only: new data never overwrites old data. The disk-based storage engine is named vinyl.
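The append-only principle can be illustrated with a toy log-structured store in plain Lua. This is a deliberately simplified, hypothetical model (Store and its methods are made-up names); vinyl’s real ranges, pages, files and merge policy are not modeled. Writes only append new versions, a read returns the newest version of a key, and a merge builds a fresh run that keeps only the latest live version of each key instead of rewriting old data in place.

```lua
-- Toy append-only key-value store (hypothetical model, not vinyl's code).
local Store = {}
Store.__index = Store

local DELETED = {}  -- tombstone marker for deletions

function Store.new()
    return setmetatable({ runs = { {} }, lsn = 0 }, Store)
end

-- Append a new version to the newest run; older versions stay untouched.
function Store:put(key, value)
    self.lsn = self.lsn + 1
    local run = self.runs[#self.runs]
    run[#run + 1] = { key = key, value = value, lsn = self.lsn }
end

function Store:delete(key)
    self:put(key, DELETED)
end

-- Search from the newest record to the oldest one.
function Store:get(key)
    for r = #self.runs, 1, -1 do
        local run = self.runs[r]
        for i = #run, 1, -1 do
            if run[i].key == key then
                if run[i].value == DELETED then return nil end
                return run[i].value
            end
        end
    end
    return nil
end

-- Build a new run holding only the latest live version of each key, sorted by key,
-- and switch to it; the old runs are discarded, not modified.
function Store:merge()
    local latest = {}
    for r = 1, #self.runs do
        for _, rec in ipairs(self.runs[r]) do
            latest[rec.key] = rec  -- later records win
        end
    end
    local merged = {}
    for _, rec in pairs(latest) do
        if rec.value ~= DELETED then merged[#merged + 1] = rec end
    end
    table.sort(merged, function(a, b) return a.key < b.key end)
    self.runs = { merged, {} }  -- merged run plus a fresh run for new writes
end
```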

Tarantool supports composite (multi-part) keys in indexes. Possible index types are HASH, TREE, BITSET and RTREE.

Tarantool also supports asynchronous replication, both local and to remote servers. Replication can be set up as master-master, where several nodes not only handle incoming load but also receive data from other nodes.

User’s Guide

Preface

Welcome to Tarantool! This is the User’s Guide. We recommend starting here, and then moving on to the Reference if you need more detailed information.

How to read the documentation

To get started, you can install and launch Tarantool using a Docker container, a binary package, or the online Tarantool server at http://try.tarantool.org. Either way, as the first tryout, you can follow the introductory exercises from Chapter 2 «Getting started». If you want more hands-on experience, proceed to Tutorials after you are through with Chapter 2.

Chapter 3 «Database» is about using Tarantool as a NoSQL DBMS, and Chapter 4 «Application server» is about using Tarantool as a Lua application server.

Chapter 5 «Server administration» and Chapter 6 «Replication» are primarily for administrators.

Chapter 7 «Connectors» is strictly for users who are connecting from a different language such as C or Perl or Python — other users will find no immediate need for this chapter.

Chapter 8 «FAQ» gives answers to some frequently asked questions about Tarantool.

Experienced users will find the Reference, the Contributor’s Guide and the comments in the source code helpful.

How to contact the Tarantool community

To report a bug or request a new feature, go to: http://github.com/tarantool/tarantool/issues

To chat directly with the Tarantool development team, use Telegram or the forums (English or Russian).

Conventions used in this manual

Square brackets [ and ] enclose optional syntax.

Two dots in a row .. mean the preceding tokens may be repeated.

A vertical bar | means the preceding and following tokens are mutually exclusive alternatives.

Getting started

This chapter explains how to install and start Tarantool, and how to create a simple database.


Using a Docker image

For trial and test purposes, we recommend using the official Tarantool images for Docker. An official image contains a specific Tarantool version (1.6 or 1.7) and all popular external modules for Tarantool. Everything is already installed and configured on a Linux platform. These images are the easiest way to install and run Tarantool.

Note

If you have never used Docker before, we recommend reading this introductory article first.

Launching a container

If Docker is not installed on your machine, follow the official installation instructions for your OS.

To start a fully functional Tarantool instance, run a container with minimal options:

$ docker run \
  --name mytarantool \
  -d -p 3301:3301 \
  -v /data/dir/on/host:/var/lib/tarantool \
  tarantool/tarantool:1.7

This command runs a new container named «mytarantool». Docker starts it from the official image «tarantool/tarantool:1.7», with Tarantool 1.7 and all external modules preinstalled.

Tarantool will accept incoming connections on localhost:3301. You can start using it as a key-value storage right away.

Tarantool persists data inside the container. To keep your test data available after the container stops, this command also mounts the host directory /data/dir/on/host (you need to specify the absolute path to an existing local directory here) into the container directory /var/lib/tarantool (by convention, Tarantool uses this directory inside the container to persist data). As a result, all changes made in the mounted directory on the container side are also reflected in the directory on the host disk.

Tarantool’s database module is already configured and started inside the container. No manual configuration is needed, unless you use Tarantool as an application server and run it together with an application.

Attaching to Tarantool

To attach to the Tarantool instance that runs inside the container, run:

$ docker exec -i -t mytarantool console

This command:

  • Instructs Tarantool to open an interactive console port for incoming connections.
  • Connects to the Tarantool server running inside the container, through a standard Unix socket, as user «admin».

Tarantool displays a prompt:

tarantool.sock>

Now you can enter requests on the command line.

Note

On production machines, Tarantool’s interactive mode is intended for system administration only. But we use it for most examples in this manual, because the interactive mode is convenient for learning.

Creating a database

While we are attached to the console, let’s create a simple test database.

First, create the first space (named «tester») and the first index (named «primary»):

tarantool.sock> s = box.schema.space.create('tester')
tarantool.sock> s:create_index('primary', {
              >  type = 'hash',
              >  parts = {1, 'unsigned'}
              > })

Next, insert three tuples (our name for «records») into the space:

tarantool.sock> t = s:insert({1, 'Roxette'})
tarantool.sock> t = s:insert({2, 'Scorpions', 2015})
tarantool.sock> t = s:insert({3, 'Ace of Base', 1993})

Then, to select a tuple from the first space of the database using the first defined key, run:

tarantool.sock> s:select{3}

Now the terminal screen looks like this:

tarantool.sock> s = box.schema.space.create('tester')
2017-01-17 12:04:18.158 ... creating './00000000000000000000.xlog.inprogress'
---
...
tarantool.sock> s:create_index('primary', {type = 'hash', parts = {1, 'unsigned'}})
---
...
tarantool.sock> t = s:insert{1, 'Roxette'}
---
...
tarantool.sock> t = s:insert{2, 'Scorpions', 2015}
---
...
tarantool.sock> t = s:insert{3, 'Ace of Base', 1993}
---
...
tarantool.sock> s:select{3}
---
- - [3, 'Ace of Base', 1993]
...
tarantool.sock>

To add another index on the second field, run:

tarantool.sock> s:create_index('secondary', {
              >  type = 'hash',
              >  parts = {2, 'string'}
              > })

Stopping a container

When the testing is over, to stop the container gracefully, run:

$ docker stop mytarantool

This was a temporary container, so its disk and memory contents were discarded when you stopped it. But since you mounted a local directory into the container, all of Tarantool’s data was persisted to your machine’s disk. If you start a new container and mount that data directory into it, Tarantool will recover all data from disk and continue working with it.

Using a binary package

For production use, we recommend the official binary packages. You can choose from two Tarantool versions: 1.7 (stable) or 1.8 (alpha). An automatic build system creates, tests and publishes packages after every commit to the corresponding branch (1.7 or 1.8) of Tarantool’s GitHub repository.

To download and install the package for your operating system, open a terminal and enter the command-line instructions given for your OS on the download page.

Starting Tarantool

To start a Tarantool instance, run:

$ # if you installed the binary package with apt-get or yum, run:
$ /usr/bin/tarantool
$ # if you downloaded the binary package as a TAR archive
$ # and unpacked it into the ~/tarantool directory, run:
$ ~/tarantool/bin/tarantool

Tarantool starts in interactive mode and displays a prompt:

tarantool>

Now you can enter requests on the command line.

Note

On production machines, Tarantool’s interactive mode is intended for system administration only. But we use it for most examples in this manual, because the interactive mode is convenient for learning.

Creating a database

Here is how to create a simple test database after installing Tarantool.

Create a new directory (it is needed only for test purposes, and you can delete it when the tests are over):

$ mkdir ~/tarantool_sandbox
$ cd ~/tarantool_sandbox

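Start Tarantool from this directory, as shown above: by default, box.cfg{} creates its snapshot and WAL files in the current working directory, so they will end up in the sandbox. Then, to start Tarantool’s database module and make the launched instance accept TCP requests on port 3301, run: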

tarantool> box.cfg{listen = 3301}

First, create the first space (named «tester») and the first index (named «primary»):

tarantool> s = box.schema.space.create('tester')
tarantool> s:create_index('primary', {
         >  type = 'hash',
         >  parts = {1, 'unsigned'}
         > })

Next, insert three tuples (our name for «records») into the space:

tarantool> t = s:insert({1, 'Roxette'})
tarantool> t = s:insert({2, 'Scorpions', 2015})
tarantool> t = s:insert({3, 'Ace of Base', 1993})

Then, to select a tuple from the first space of the database using the first defined key, run:

tarantool> s:select{3}

Now the terminal screen looks like this:

tarantool> s = box.schema.space.create('tester')
2017-01-17 12:04:18.158 ... creating './00000000000000000000.xlog.inprogress'
---
...
tarantool> s:create_index('primary', {type = 'hash', parts = {1, 'unsigned'}})
---
...
tarantool> t = s:insert{1, 'Roxette'}
---
...
tarantool> t = s:insert{2, 'Scorpions', 2015}
---
...
tarantool> t = s:insert{3, 'Ace of Base', 1993}
---
...
tarantool> s:select{3}
---
- - [3, 'Ace of Base', 1993]
...
tarantool>

To add another index on the second field, run:

tarantool> s:create_index('secondary', {
         >  type = 'hash',
         >  parts = {2, 'string'}
         > })

Connecting remotely

In the request box.cfg{listen = 3301} that we made earlier, the listen value can be any form of a URI (uniform resource identifier). In this case, it is just a local port: port 3301. You can send requests to the listen URI via:

  1. telnet,
  2. a connector,
  3. another instance of Tarantool (using the console module), or
  4. the tarantoolctl utility.

Let’s try tarantoolctl.

Switch to another terminal. On Linux, for example, this means starting another instance of Bash. In the new terminal, you can change the current working directory to any other directory; it does not have to be ~/tarantool_sandbox.

Start the tarantoolctl utility:

$ tarantoolctl connect '3301'

This means «use tarantoolctl to connect to the Tarantool server that is listening on localhost:3301».

Try this request:

localhost:3301> box.space.tester:select{2}

This means «send a request to that Tarantool server and display the result». The result in this case is one of the tuples that you inserted earlier. Your terminal screen should now look something like this:

$ tarantoolctl connect 3301
/usr/local/bin/tarantoolctl: connected to localhost:3301
localhost:3301> box.space.tester:select{2}
---
- - [2, 'Scorpions', 2015]
...

localhost:3301>

You can repeat box.space...:insert{} and box.space...:select{} requests indefinitely, on either of the two running Tarantool instances.

When the testing is over:

  • To drop the space: s:drop() (in the first terminal, where s is defined)
  • To stop tarantoolctl: Ctrl+C or Ctrl+D
  • To stop Tarantool (an alternative): the standard Lua function os.exit()
  • To stop Tarantool (from another terminal): sudo pkill -f tarantool
  • To destroy the test directory: rm -r ~/tarantool_sandbox

Database

In this chapter, we introduce the basic concepts of working with Tarantool as a database manager.


Data model

This section describes how Tarantool stores values and what operations with data it supports.

If you tried to create a database as suggested in our «Getting started» exercises, then your test database now looks like this:

[Figure: data model of the test database]

Space

A space («tester» in our example) is a container.

When Tarantool is being used to store data, there is always at least one space. Each space has a unique name specified by the user. In addition, each space has a unique numeric identifier, which can be specified by the user but is usually assigned automatically by Tarantool. Finally, a space always has an engine: memtx (the default), an in-memory engine that is fast but limited in size, or vinyl, an on-disk engine for huge data sets.

A space is a container for tuples. To be functional, it needs to have a primary index. It can also have secondary indexes.

Tuple

A tuple plays the same role as a “row” or a “record”, and the components of a tuple (which we call “fields”) play the same role as a “row column” or “record field”, except that:

  • fields can be composite structures, such as arrays or maps, and
  • fields don’t need to have names.

Any given tuple may have any number of fields, and the fields may be of different types. The identifier of a field is the field’s number, base 1 (in Lua and other 1-based languages) or base 0 (in PHP or C/C++). For example, “1” or “0” can be used in some contexts to refer to the first field of a tuple.

Tuples in Tarantool are stored as MsgPack arrays.

When Tarantool returns a tuple value in console, it uses the YAML format, for example: [3, 'Ace of Base', 1993].

Index

An index is a group of key values and pointers.

As with spaces, you should specify the index name, and let Tarantool come up with a unique numeric identifier («index id»).

An index always has a type. The default index type is „TREE“. TREE indexes are provided by all Tarantool engines, can index unique and non-unique values, support partial key searches, comparisons and ordered results. Additionally, memtx engine supports HASH, RTREE and BITSET indexes.

An index may be multi-part, that is, you can declare that an index key value is composed of two or more fields in the tuple, in any order. For example, for an ordinary TREE index, the maximum number of parts is 255.
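To make multi-part keys and partial key searches concrete, here is a plain-Lua model of how a TREE-like index could order composite keys: parts are compared left to right, and a partial key (a prefix of the parts) matches every tuple whose leading parts are equal to it. The helpers extract_key, compare_keys and select_prefix are hypothetical; in Tarantool you would declare the parts in create_index() and call select() on the index instead.

```lua
-- Model of composite-key ordering in a TREE-like index (hypothetical helpers).
-- parts: list of 1-based field numbers that make up the key.
local function extract_key(tuple, parts)
    local key = {}
    for i, fieldno in ipairs(parts) do key[i] = tuple[fieldno] end
    return key
end

-- Lexicographic comparison returning -1, 0 or 1. Only the common prefix is
-- compared, which is what lets a partial key match longer keys.
local function compare_keys(a, b)
    for i = 1, math.min(#a, #b) do
        if a[i] < b[i] then return -1 end
        if a[i] > b[i] then return 1 end
    end
    return 0
end

-- All tuples whose key starts with the given (possibly partial) key, in index order.
local function select_prefix(tuples, parts, key)
    local sorted = {}
    for i, t in ipairs(tuples) do sorted[i] = t end
    table.sort(sorted, function(x, y)
        return compare_keys(extract_key(x, parts), extract_key(y, parts)) < 0
    end)
    local result = {}
    for _, t in ipairs(sorted) do
        if compare_keys(extract_key(t, parts), key) == 0 then
            result[#result + 1] = t
        end
    end
    return result
end
```

With an index on fields {2, 3}, the partial key {'x'} returns every tuple whose field 2 is 'x', ordered by field 3.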

An index may be unique, that is, you can declare that it would be illegal to have the same key value twice.

The first index defined on a space is called the primary key index, and it must be unique. All other indexes are called secondary indexes, and they may be non-unique.

An index definition may include identifiers of tuple fields and their expected types (see allowed indexed field types below).

In our example, we first defined the primary index (named „primary“) based on field #1 of each tuple:

tarantool> i = s:create_index('primary', {type = 'hash', parts = {1, 'unsigned'}})

The effect is that, for all tuples in space „tester“, field #1 must exist and must contain an unsigned integer. The index type is „hash“, so values in field #1 must be unique, because keys in HASH indexes are unique.

After that, we defined a secondary index (named „secondary“) based on field #2 of each tuple:

tarantool> i = s:create_index('secondary', {type = 'tree', parts = {2, 'string'}})

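The effect is that, for all tuples in space «tester», field #2 must exist and must contain a string. The index type is «tree», so values in field #2 need not be unique in principle, because keys in TREE indexes may be non-unique. Note, however, that create_index() creates a unique index unless the option unique = false is passed, so the index defined above does reject duplicate values in field #2.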
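The difference between a unique and a non-unique index can be modeled in a few lines of plain Lua: a unique index maps each key to exactly one tuple and rejects duplicates, while a non-unique index maps each key to a list of tuples. The make_index helper and its error messages are hypothetical; in Tarantool, uniqueness is controlled by the unique option of create_index().

```lua
-- Hypothetical model of unique vs non-unique indexes on a single field.
local function make_index(fieldno, unique)
    local entries = {}
    local index = {}
    function index.insert(tuple)
        local key = tuple[fieldno]
        if key == nil then
            error('field ' .. fieldno .. ' is missing')  -- indexed field must exist
        end
        if unique then
            if entries[key] ~= nil then error('duplicate key') end
            entries[key] = tuple
        else
            entries[key] = entries[key] or {}
            table.insert(entries[key], tuple)
        end
    end
    function index.get(key) return entries[key] end
    return index
end
```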

Note

Space definitions and index definitions are stored permanently in Tarantool’s system spaces _space and _index (for details, see reference on box.space submodule).

You can add, drop, or alter the definitions at runtime, with some restrictions. See syntax details in reference on box module.

Data types

Tarantool is both a database and an application server. Hence a developer often deals with two type sets: the programming language types (e.g. Lua) and the types of the Tarantool storage format (MsgPack).

Lua vs MsgPack

Scalar / compound   MsgPack type   Lua type                      Example value
scalar              nil            «nil»                         msgpack.NULL
scalar              boolean        «boolean»                     true
scalar              string         «string»                      'A B C'
scalar              integer        «number»                      12345
scalar              double         «number»                      1.2345
compound            map            «table» (with string keys)    {'a': 5, 'b': 6}
compound            array          «table» (with integer keys)   [1, 2, 3, 4, 5]
compound            array          tuple («cdata»)               [12345, 'A B C']

A nil type has only one possible value, also called nil, but often displayed as null. Nils may be compared to values of any type with == (is-equal) or ~= (is-not-equal), but no other operations work on nils. Nils may not be used in Lua tables; the workaround is to use yaml.NULL, json.NULL or msgpack.NULL.

A boolean is either true or false.

A string is a variable-length sequence of bytes, usually represented with alphanumeric characters inside single quotes. In both Lua and MsgPack, strings are treated as binary data, with no attempts to determine a string’s character set or to perform any string conversion. So, string sorting and comparison are done byte-by-byte, without any special collation rules applied. (Example: numbers are ordered by their point on the number line, so 2345 is greater than 500; meanwhile, strings are ordered by the encoding of the first byte, then the encoding of the second byte, and so on, so „2345“ is less than „500“.)
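The difference between numeric and byte-wise string ordering is easy to check in plain Lua, whose < operator also compares strings byte by byte when the default C locale is in effect. The helper bytes() is only an illustration that prints the byte values being compared.

```lua
-- Numbers compare by value; strings compare byte by byte, left to right.
local function bytes(s)
    local out = {}
    for i = 1, #s do out[i] = string.byte(s, i) end
    return table.concat(out, ' ')
end

local numeric = 2345 > 500        -- true: 2345 is further along the number line
local bytewise = '2345' < '500'   -- true: first bytes are 50 ('2') and 53 ('5')
local first_bytes = bytes('2345') .. ' | ' .. bytes('500')

local words = { '500', '2345', '10' }
table.sort(words)                 -- byte-wise order: '10', '2345', '500'
```

The first differing byte decides the order, so '2345' sorts before '500' even though 2345 is the larger number.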

In Lua, a number is double-precision floating-point, but Tarantool allows both integer and floating-point values. Tarantool will try to store a Lua number as floating-point if the value contains a decimal point or is very large (greater than 100 trillion = 1e14), otherwise Tarantool will store it as an integer. To ensure that even very large numbers are stored as integers, use the tonumber64 function, or the LL (Long Long) suffix, or the ULL (Unsigned Long Long) suffix. Here are examples of numbers using regular notation, exponential notation, the ULL suffix and the tonumber64 function: -55, -2.7e+20, 100000000000000ULL, tonumber64('18446744073709551615').

Lua tables with string keys are stored as MsgPack maps; Lua tables with integer keys starting with 1 – as MsgPack arrays. Nils may not be used in Lua tables; the workaround is to use msgpack.NULL
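The mapping from Lua tables to MsgPack maps and arrays, and the reason nils cause trouble, can be sketched in plain Lua. classify() is a hypothetical, simplified rule (a table whose keys are exactly 1..n becomes an array, anything else a map); Tarantool’s real serializer has additional rules that are not modeled here. The local NULL sentinel only stands in for msgpack.NULL, which in Tarantool is a special cdata value rather than a Lua table.

```lua
-- Simplified model of how a Lua table is classified for MsgPack encoding.
local NULL = setmetatable({}, { __tostring = function() return 'NULL' end })

local function classify(t)
    local count, max = 0, 0
    for k in pairs(t) do
        if type(k) ~= 'number' or k < 1 or k ~= math.floor(k) then
            return 'map'  -- any key that is not a positive integer forces a map
        end
        count = count + 1
        if k > max then max = k end
    end
    -- In this model, keys exactly 1..n (no holes) make an array;
    -- an empty table is also treated as an array here.
    if count == max then return 'array' end
    return 'map'
end

-- Count elements the way ipairs() sees them: iteration stops at the first nil.
local function ipairs_len(t)
    local n = 0
    for _ in ipairs(t) do n = n + 1 end
    return n
end
```

ipairs_len({1, nil, 3}) is 1, because iteration stops at the nil, whereas ipairs_len({1, NULL, 3}) is 3: a placeholder value keeps the positions of the following fields intact.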

A tuple is a light reference to a MsgPack array stored in the database. It is a special type (cdata) to avoid conversion to a Lua table on retrieval. A few functions may return tables with multiple tuples. For more tuple examples, see box.tuple.

Примечание

Tarantool uses the MsgPack format for database storage, which is variable-length. So, for example, the smallest number requires only one byte, but the largest number requires nine bytes.
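The variable-length encoding follows the MsgPack specification: small integers fit into the type byte itself, and larger ones need a type byte plus 1, 2, 4 or 8 payload bytes. The function below computes the encoded size of an integer, assuming the encoder always picks the shortest form (as the msgpuck library used by Tarantool does); it is an illustration, not part of any Tarantool API.

```lua
-- Size in bytes of a MsgPack-encoded integer, per the MsgPack spec,
-- assuming the shortest possible encoding is chosen.
local function msgpack_int_size(n)
    assert(n == math.floor(n), 'integer expected')
    if n >= 0 then
        if n <= 127 then return 1               -- positive fixint
        elseif n <= 0xFF then return 2          -- uint 8
        elseif n <= 0xFFFF then return 3        -- uint 16
        elseif n <= 0xFFFFFFFF then return 5    -- uint 32
        else return 9 end                       -- uint 64
    else
        if n >= -32 then return 1               -- negative fixint
        elseif n >= -128 then return 2          -- int 8
        elseif n >= -32768 then return 3        -- int 16
        elseif n >= -2147483648 then return 5   -- int 32
        else return 9 end                       -- int 64
    end
end
```

For example, 12345 from the insert examples takes 3 bytes (uint 16), while a double such as 1.2345 takes 9 bytes when encoded as float 64.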

Examples of insert requests with different data types:

tarantool> box.space.K:insert{1,nil,true,'A B C',12345,1.2345}
---
- [1, null, true, 'A B C', 12345, 1.2345]
...
tarantool> box.space.K:insert{2,{['a']=5,['b']=6}}
---
- [2, {'a': 5, 'b': 6}]
...
tarantool> box.space.K:insert{3,{1,2,3,4,5}}
---
- [3, [1, 2, 3, 4, 5]]
...

Indexed field types

Indexes restrict values which Tarantool’s MsgPack may contain. This is why, for example, „unsigned“ is a separate indexed field type, compared to ‘integer’ data type in MsgPack: they both store ‘integer’ values, but an „unsigned“ index contains only non-negative integer values and an ‘integer’ index contains all integer values.

Here’s how Tarantool indexed field types correspond to MsgPack data types.

Indexed field types, the corresponding MsgPack data types (and possible values), index types and examples:

  • unsigned (may also be called ‘uint’ or ‘num’, but ‘num’ is deprecated)
    • MsgPack data type: integer (integer between 0 and 18446744073709551615, i.e. about 18 quintillion)
    • Index types: TREE, BITSET or HASH
    • Examples: 123456
  • integer (may also be called ‘int’)
    • MsgPack data type: integer (integer between -9223372036854775808 and 18446744073709551615)
    • Index types: TREE or HASH
    • Examples: -2^63
  • number
    • MsgPack data type: integer (integer between -9223372036854775808 and 18446744073709551615) or double (single-precision or double-precision floating point number)
    • Index types: TREE or HASH
    • Examples: 1.234, -44, 1.447e+44
  • string (may also be called ‘str’)
    • MsgPack data type: string (any set of octets, up to the maximum length)
    • Index types: TREE, BITSET or HASH
    • Examples: ‘A B C’, ‘65 66 67’
  • boolean
    • MsgPack data type: bool (true or false)
    • Index types: TREE or HASH
    • Examples: true
  • array
    • MsgPack data type: array (list of numbers representing points in a geometric figure)
    • Index types: RTREE
    • Examples: {10, 11}, {3, 5, 9, 10}
  • scalar
    • MsgPack data type: bool (true or false), integer (integer between -9223372036854775808 and 18446744073709551615), double (single-precision or double-precision floating point number) or string (any set of octets)
    • Index types: TREE or HASH
    • Examples: true, -1, 1.234, ‘’, ‘ру’
    • Note: when there is a mix of types, the key order is: booleans, then numbers, then strings.
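The rules for the indexed field types can be approximated by a small validator and comparator in plain Lua. This is a hypothetical model (accepts and scalar_less are made-up names): it ignores 64-bit range limits, because plain Lua numbers may be doubles, and it does not cover the array type used by RTREE indexes. scalar_less follows the documented key order for mixed scalar values (booleans, then numbers, then strings) and assumes that false sorts before true.

```lua
-- Hypothetical model of indexed field type checks (not Tarantool's code).
local function is_integer(v)
    return type(v) == 'number' and v == math.floor(v)
end

local accepts = {
    unsigned = function(v) return is_integer(v) and v >= 0 end,
    integer  = is_integer,
    number   = function(v) return type(v) == 'number' end,
    string   = function(v) return type(v) == 'string' end,
    boolean  = function(v) return type(v) == 'boolean' end,
    scalar   = function(v)
        local t = type(v)
        return t == 'boolean' or t == 'number' or t == 'string'
    end,
}

-- Key order for mixed scalar values: booleans < numbers < strings.
local rank = { boolean = 1, number = 2, string = 3 }

local function scalar_less(a, b)
    local ra, rb = rank[type(a)], rank[type(b)]
    if ra ~= rb then return ra < rb end
    if type(a) == 'boolean' then
        return a == false and b == true  -- assumption: false sorts before true
    end
    return a < b  -- numbers by value, strings byte by byte
end
```

Sorting the scalar examples { 'ру', 1.234, true, -1, '' } with scalar_less gives true, -1, 1.234, '', 'ру'.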

Sequences

A sequence is a generator of ordered integer values.

As with spaces and indexes, you should specify the sequence name, and let Tarantool come up with a unique numeric identifier («sequence id»).

You can also specify several options when creating a new sequence. The options determine what value will be generated whenever the sequence is used.

Options for box.schema.sequence.create()

Option name   Type and meaning                                                                    Default               Examples
start         Integer. The value to generate the first time a sequence is used                    1                     start=0
min           Integer. Values smaller than this cannot be generated                               1                     min=-1000
max           Integer. Values larger than this cannot be generated                                9223372036854775807   max=0
cycle         Boolean. Whether to start again when values cannot be generated                     false                 cycle=true
cache         Integer. The number of values to store in a cache                                   0                     cache=0
step          Integer. What to add to the previous generated value, when generating a new value   1                     step=-1
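How these options interact can be sketched as a plain-Lua model of next(). It is a simplified, hypothetical reimplementation for illustration only (new_sequence is not a Tarantool function, the cache option is ignored, and the error text is made up). It uses the defaults from the options table, returns start on the first call, adds step on every later call, and, when the result would leave the [min, max] range, either wraps around or raises an error. The wrap target (min for a positive step, max for a negative one) is an assumption, matching how PostgreSQL sequences behave.

```lua
-- Simplified model of a sequence's next() (hypothetical; 'cache' is ignored).
local function new_sequence(opts)
    opts = opts or {}
    local def = {
        start = opts.start or 1,
        min   = opts.min or 1,
        max   = opts.max or 9223372036854775807,
        cycle = opts.cycle or false,
        step  = opts.step or 1,
    }
    local value  -- nil until next() is called for the first time
    local seq = {}
    function seq.next()
        if value == nil then
            value = def.start
            return value
        end
        local candidate = value + def.step
        if candidate > def.max or candidate < def.min then
            if not def.cycle then
                error('sequence has reached its limit')
            end
            -- assumption: wrap to min when ascending, to max when descending
            candidate = def.step > 0 and def.min or def.max
        end
        value = candidate
        return value
    end
    return seq
end
```

With {min = 5, start = 5}, the model returns 5 and then 6, like the box.sequence.S:next() example that follows.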

Once a sequence exists, it can be altered, dropped, reset, forced to generate the next value, or associated with an index.

For an initial example, we create a sequence named «S».

tarantool> box.schema.sequence.create('S',{min=5, start=5})
---
- step: 1
  id: 5
  min: 5
  cache: 0
  uid: 1
  max: 9223372036854775807
  cycle: false
  name: S
  start: 5
...

The result shows that the new sequence has all default values, except for the two that were specified, min and start.

Then we get the next value, with the next() function.

tarantool> box.sequence.S:next()
---
- 5
...

The result is the same as the start value. If we called next() again, we would get 6 (because the previous value plus the step value is 6), and so on.

Then we create a new space, and say that its primary key may be generated from the sequence.

tarantool> s=box.schema.space.create('T');s:create_index('I',{sequence='S'})
---
...

Then we insert a tuple, without specifying a value for the primary key.

tarantool> box.space.T:insert{nil,'other stuff'}
---
- [6, 'other stuff']
...

The result is a new tuple where the first field has a value of 6. This arrangement, where the system automatically generates the values for a primary key, is sometimes called «auto-incrementing» or «identity».

For syntax and implementation details, see the reference for box.schema.sequence.

Persistence

In Tarantool, updates to the database are recorded in the so-called write-ahead log (WAL) files. This ensures data persistence. When a power outage occurs or the Tarantool instance is killed accidentally, the in-memory database is lost. In this situation, WAL files are used to restore the data: Tarantool reads the WAL files and redoes the requests (this is called the "recovery process"). You can change the timing of the WAL writer, or turn it off, by setting wal_mode.

Tarantool also saves a set of files with static snapshots of the data. A snapshot file is an on-disk copy of all the data in the database at some moment. Instead of reading every WAL file written since the database was created, Tarantool can load the most recent snapshot during recovery and then read only the WAL files written after the snapshot was made. Once new files have been created, old WAL files can be removed to save disk space.
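The recovery rule above can be illustrated with a small pure-Lua model. The function recover(), the record layout, and the use of a log sequence number (LSN) field per record are assumptions made for this sketch. Real snapshot and WAL files have their own binary format. The point is only that changes already contained in the snapshot are skipped and later changes are redone.

```lua
-- Pure-Lua model of snapshot + WAL recovery (not Tarantool's code).
-- snapshot = { lsn = <last change it contains>, data = { key -> value } }
-- wal      = list of { lsn = n, op = 'replace' | 'delete', key = k, value = v }
function recover(snapshot, wal)
  local data = {}
  for k, v in pairs(snapshot.data) do data[k] = v end  -- load the snapshot
  local replayed = 0
  for _, rec in ipairs(wal) do
    if rec.lsn > snapshot.lsn then     -- skip changes already in the snapshot
      if rec.op == 'replace' then
        data[rec.key] = rec.value
      elseif rec.op == 'delete' then
        data[rec.key] = nil
      end
      replayed = replayed + 1
    end
  end
  return data, replayed
end
```

Recovering from a snapshot taken at LSN 2 replays only the records with LSN 3 and above. The resulting state is the same as replaying the whole log from an empty database.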

To force immediate creation of a snapshot file, you can use Tarantool’s box.snapshot() request. To enable automatic creation of snapshot files, you can use Tarantool’s checkpoint daemon. The checkpoint daemon sets intervals for forced checkpoints. It makes sure that the states of both memtx and vinyl storage engines are synchronized and saved to disk, and automatically removes old WAL files.

Snapshot files can be created even if there is no WAL file.

Note

The memtx engine makes only regular checkpoints with the interval set in checkpoint daemon configuration.

The vinyl engine runs checkpointing in the background at all times.

See the Internals section for more details about the WAL writer and the recovery process.

Operations

Data operations

The basic data operations supported in Tarantool are:

  • one data-retrieval operation (SELECT), and
  • five data-manipulation operations (INSERT, UPDATE, UPSERT, DELETE, REPLACE).

All of them are implemented as functions in the box.space submodule.

Examples

  • INSERT: Add a new tuple to space 'tester'.

    The first field, field[1], will be 999 (MsgPack type is integer).

    The second field, field[2], will be 'Taranto' (MsgPack type is string).

    tarantool> box.space.tester:insert{999, 'Taranto'}
    
  • UPDATE: Update the tuple, changing field field[2].

    The clause {999}, which has the value to look up in the index of the tuple’s primary-key field, is mandatory, because update() requests must always have a clause that specifies a unique key, which in this case is field[1].

    The clause {{'=', 2, 'Tarantino'}} specifies that the new value will be assigned to field[2].

    tarantool> box.space.tester:update({999}, {{'=', 2, 'Tarantino'}})
    
  • UPSERT: Upsert the tuple, changing field field[2] again.

    The syntax of upsert() is similar to the syntax of update(). However, the execution logic of these two requests is different. UPSERT is either UPDATE or INSERT, depending on the database’s state. Also, UPSERT execution is postponed until after the transaction commits, so, unlike update(), upsert() doesn’t return data back.

    tarantool> box.space.tester:upsert({999}, {{'=', 2, 'Tarantism'}})
    
  • REPLACE: Replace the tuple, adding a new field.

    This is also possible with the update() request, but the update() request is usually more complicated.

    tarantool> box.space.tester:replace{999, 'Tarantella', 'Tarantula'}
    
  • SELECT: Retrieve the tuple.

    The clause {999} is still mandatory, although it does not have to mention the primary key.

    tarantool> box.space.tester:select{999}
    
  • DELETE: Delete the tuple.

    In this example, we identify the primary-key field.

    tarantool> box.space.tester:delete{999}
    

All the functions operate on tuples and accept only unique key values. Since every request in these examples uses the same unique key value, 999, the space holds either 0 or 1 tuples at any moment.

Functions insert(), upsert() and replace() accept only primary-key values. Functions select(), delete() and update() may accept either a primary-key value or a secondary-key value.
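The behavior of these requests can be modeled with a plain Lua table keyed by the primary key (field[1]). This model highlights the difference between update() and upsert(). It is a hypothetical sketch and not the box.space API. The space_* functions are made-up names, and only the '=' update operator is modeled.

```lua
-- Toy model of a space whose primary key is field[1]; hypothetical names,
-- not the box.space API. Only the '=' update operator is modeled.
local function copy(tuple)
  local result = {}
  for i, v in ipairs(tuple) do result[i] = v end
  return result
end

local function apply_ops(tuple, ops)
  for _, op in ipairs(ops) do
    assert(op[1] == '=', "this sketch only models the '=' operator")
    tuple[op[2]] = op[3]               -- assign the new value to field op[2]
  end
  return tuple
end

function space_insert(space, tuple)
  if space[tuple[1]] ~= nil then error('duplicate key') end
  space[tuple[1]] = copy(tuple)
  return space[tuple[1]]
end

function space_replace(space, tuple)
  space[tuple[1]] = copy(tuple)        -- insert, or overwrite the old tuple
  return space[tuple[1]]
end

function space_update(space, key, ops)
  local tuple = space[key[1]]
  if tuple == nil then return nil end  -- no such key: nothing happens
  return apply_ops(tuple, ops)
end

-- upsert: UPDATE if the key exists, otherwise INSERT; returns nothing
function space_upsert(space, tuple, ops)
  local existing = space[tuple[1]]
  if existing == nil then
    space[tuple[1]] = copy(tuple)
  else
    apply_ops(existing, ops)
  end
end

function space_select(space, key)
  local tuple = space[key[1]]
  if tuple == nil then return {} end
  return {tuple}                       -- a unique key matches 0 or 1 tuples
end

function space_delete(space, key)
  local tuple = space[key[1]]
  space[key[1]] = nil
  return tuple
end
```

Running the six example requests against this model, in order, leaves field[2] equal to 'Tarantism' after the upsert. An upsert on a missing key inserts the given tuple and ignores the operations.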

Note

Besides Lua, you can use Perl, PHP, Python or other programming language connectors. The client server protocol is open and documented. See this annotated BNF.

Index operations

Index operations are automatic: if a data-manipulation request changes a tuple, then it also changes the index keys defined for the tuple.

The simple index-creation operation that we’ve illustrated before is:

box.space.space-name:create_index('index-name')

This creates a unique TREE index on the first field of all tuples (often called "Field#1"), which is assumed to be numeric.

The simple SELECT request that we’ve illustrated before is:

box.space.space-name:select(value)

This looks for a single tuple via the first index. Since the first index is always unique, at most one tuple will be returned.

The following SELECT variations exist:

  1. Besides equality, a search can use other comparison conditions.

    box.space.space-name:select(value, {iterator = 'GT'})
    

    The comparison operators are LT, LE, EQ, REQ, GE, GT (for "less than", "less than or equal", "equal", "reversed equal", "greater than or equal", "greater than" respectively). Comparisons make sense if and only if the index type is 'TREE'.

    This kind of search may return more than one tuple. In that case, the tuples are sorted in descending order by key (if the LT, LE or REQ operator was used), or in ascending order (in all other cases).

  2. The search can use a secondary index.

    box.space.space-name.index.index-name:select(value)
    

    When searching by the primary index, the index name can be omitted. When searching by a secondary index, the index name must be specified.

  3. The search can be for the whole key or for part of it.

    -- Suppose an index has two parts
    tarantool> box.space.space-name.index.index-name.parts
    ---
    - - type: unsigned
        fieldno: 1
      - type: string
        fieldno: 2
    ...
    -- Suppose the space has three tuples
    tarantool> box.space.space-name:select()
    ---
    - - [1, 'A']
      - [1, 'B']
      - [2, '']
    ...
    
    The search may be for all fields, using a table for the value:

    box.space.space-name:select({1, 'A'})
    

    or for a single field (in this case, either a table or a scalar value can be used):

    box.space.space-name:select(1)
    

    In the second case, the result will be two tuples: {1, 'A'} and {1, 'B'}.

    You can specify even zero fields, causing all three tuples to be returned. (Notice that partial key searches are available only in TREE indexes.)
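The following rough pure-Lua model covers TREE-index searches with full keys, partial keys, and a few iterator types. tree_select() is a hypothetical helper that sorts and scans all tuples. A real TREE index does not scan, and REQ is not modeled here. The set and order of returned tuples follow the rules described above.

```lua
-- Compare the first #key fields of a tuple with a (possibly partial) key.
-- Returns -1, 0 or 1.
local function compare_prefix(tuple, key)
  for i = 1, #key do
    if tuple[i] < key[i] then return -1 end
    if tuple[i] > key[i] then return 1 end
  end
  return 0                             -- all compared parts are equal
end

-- Hypothetical helper: scans all tuples in key order. A scalar key is
-- treated as a one-part key; an empty (or nil) key matches everything.
function tree_select(tuples, key, iterator)
  if type(key) ~= 'table' then key = {key} end
  iterator = iterator or 'EQ'
  local sorted = {}
  for i, t in ipairs(tuples) do sorted[i] = t end
  table.sort(sorted, function(a, b) return compare_prefix(a, b) < 0 end)

  local matches = {
    EQ = function(c) return c == 0 end,
    GT = function(c) return c > 0 end,
    GE = function(c) return c >= 0 end,
    LT = function(c) return c < 0 end,
    LE = function(c) return c <= 0 end,
  }
  local match = assert(matches[iterator], 'iterator not modeled in this sketch')
  local result = {}
  for _, t in ipairs(sorted) do
    if match(compare_prefix(t, key)) then result[#result + 1] = t end
  end
  if iterator == 'LT' or iterator == 'LE' then   -- descending order
    local reversed = {}
    for i = #result, 1, -1 do reversed[#reversed + 1] = result[i] end
    result = reversed
  end
  return result
end
```

With the three tuples above, tree_select(tuples, 1) returns {1, 'A'} and {1, 'B'}, and tree_select(tuples, {}) returns all three tuples.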

Examples

  • Example of working with a BITSET index:

    tarantool> box.schema.space.create('bitset_example')
    tarantool> box.space.bitset_example:create_index('primary')
    tarantool> box.space.bitset_example:create_index('bitset',{unique=false,type='BITSET', parts={2,'unsigned'}})
    tarantool> box.space.bitset_example:insert{1,1}
    tarantool> box.space.bitset_example:insert{2,4}
    tarantool> box.space.bitset_example:insert{3,7}
    tarantool> box.space.bitset_example:insert{4,3}
    tarantool> box.space.bitset_example.index.bitset:select(2, {iterator='BITS_ANY_SET'})
    

    We get the following result:

    ---
    - - [3, 7]
      - [4, 3]
    ...
    

    because (7 AND 2) is not 0 and (3 AND 2) is not 0.

  • Example of working with an RTREE index:

    tarantool> box.schema.space.create('rtree_example')
    tarantool> box.space.rtree_example:create_index('primary')
    tarantool> box.space.rtree_example:create_index('rtree',{unique=false,type='RTREE', parts={2,'ARRAY'}})
    tarantool> box.space.rtree_example:insert{1, {3, 5, 9, 10}}
    tarantool> box.space.rtree_example:insert{2, {10, 11}}
    tarantool> box.space.rtree_example.index.rtree:select({4, 7, 5, 9}, {iterator = 'GT'})
    

    We get the following result:

    ---
    - - [1, [3, 5, 9, 10]]
    ...
    

    because a rectangle whose corners are at coordinates 4,7,5,9 is entirely within a rectangle whose corners are at coordinates 3,5,9,10.
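Both results can be checked with a pure-Lua sketch of the matching rules. bits_any_set() and rtree_gt() are hypothetical helpers that scan all tuples. The sketch makes two assumptions. First, a 2-number array is a point and a 4-number array is a rectangle given by two opposite corners. Second, GT means "the search rectangle lies entirely within the indexed rectangle", which matches the explanation above; strict inequalities are used for this check.

```lua
-- Portable bitwise AND on non-negative integers (no "bit" module or "&" needed).
local function band(a, b)
  local result, bit_value = 0, 1
  while a > 0 and b > 0 do
    if a % 2 == 1 and b % 2 == 1 then result = result + bit_value end
    a, b = math.floor(a / 2), math.floor(b / 2)
    bit_value = bit_value * 2
  end
  return result
end

-- BITS_ANY_SET: keep tuples whose indexed field shares at least one bit with mask.
function bits_any_set(tuples, field, mask)
  local result = {}
  for _, t in ipairs(tuples) do
    if band(t[field], mask) ~= 0 then result[#result + 1] = t end
  end
  return result
end

-- Normalize {x, y} (a point) or {x1, y1, x2, y2} (two opposite corners)
-- into {xmin, ymin, xmax, ymax}.
local function to_rect(c)
  if #c == 2 then return {c[1], c[2], c[1], c[2]} end
  return {math.min(c[1], c[3]), math.min(c[2], c[4]),
          math.max(c[1], c[3]), math.max(c[2], c[4])}
end

-- GT as modeled here: the search rectangle lies strictly inside the indexed one.
function rtree_gt(tuples, field, search)
  local s = to_rect(search)
  local result = {}
  for _, t in ipairs(tuples) do
    local r = to_rect(t[field])
    if r[1] < s[1] and r[2] < s[2] and s[3] < r[3] and s[4] < r[4] then
      result[#result + 1] = t
    end
  end
  return result
end
```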

Additionally, there exist index iterator operations. They can only be used with code in Lua and C/C++. Index iterators are for traversing indexes one key at a time, taking advantage of features that are specific to an index type, for example evaluating Boolean expressions when traversing BITSET indexes, or going in descending order when traversing TREE indexes.

See also other index operations, such as alter() and drop(), in the reference for the box.index submodule.

Complexity factors

In the reference for the box.space and box.index submodules, there are notes about which complexity factors might affect the resource usage of each function.

Complexity factors and their effects:

  • Index size. The number of index keys is the same as the number of tuples in the data set. For a TREE index, if there are more keys, then the lookup time will be greater, although the effect is not linear. For a HASH index, if there are more keys, then more RAM is used, but the number of low-level steps tends to remain constant.
  • Index type. Typically, a HASH index is faster than a TREE index if the number of tuples in the space is greater than one.
  • Number of indexes accessed. Ordinarily, only one index is accessed to retrieve one tuple. But to update the tuple, there must be N accesses if the space has N different indexes.

    Note regarding storage engines: vinyl optimizes away such accesses if secondary index fields are unchanged by the update. So, this complexity factor applies only to memtx, since it always makes a full-tuple copy on every update.

  • Number of tuples accessed. A few requests, for example SELECT, can retrieve multiple tuples. This factor is usually less important than the others.
  • WAL settings. The important setting for writing to the WAL is wal_mode. If writing to the WAL is disabled or set to delayed writes, then this factor is not so important. If, on the other hand, every data-change request is written to the WAL, then every such request has to wait for the slower disk access to finish, and this factor becomes more important than all the others.

Transaction control

Transactions in Tarantool occur in fibers on a single thread. That is why Tarantool has a guarantee of execution atomicity. That requires emphasis.

Threads, fibers and yields

How does Tarantool process a basic operation? As an example, let’s take this query:

tarantool> box.space.tester:update({3}, {{'=', 2, 'size'}, {'=', 3, 0}})

This is equivalent to an SQL statement like:

UPDATE tester SET "field[2]" = 'size', "field[3]" = 0 WHERE "field[1]" = 3

This query will be processed with three operating system threads:

  1. If we issue the query on a remote client, then the network thread on the server side receives the query, parses the statement and changes it to a server executable message which has already been checked, and which the server instance can understand without parsing everything again.

  2. The network thread ships this message to the instance’s «transaction processor» thread using a lock-free message bus. Lua programs execute directly in the transaction processor thread, and do not require parsing and preparation.

    The instance’s transaction processor thread uses the primary-key index on field[1] to find the location of the tuple. It determines that the tuple can be updated (not much can go wrong when you’re merely changing an unindexed field value to something shorter).

  3. The transaction processor thread sends a message to the write-ahead logging (WAL) thread to commit the transaction. When done, the WAL thread replies with a COMMIT or ROLLBACK result, which is returned to the client.

Notice that there is only one transaction processor thread in Tarantool. Some people are used to the idea that there can be multiple threads operating on the database, with (say) thread #1 reading row #x, while thread #2 writes row #y. With Tarantool, no such thing ever happens. Only the transaction processor thread can access the database, and there is only one transaction processor thread for each Tarantool instance.

Like any other Tarantool thread, the transaction processor thread can handle many fibers. A fiber is a set of computer instructions that may contain "yield" signals. The transaction processor thread will execute all computer instructions until a yield, then switch to execute the instructions of a different fiber. Thus (say) the thread reads row #x for the sake of fiber #1, then writes row #y for the sake of fiber #2.

Yields must happen, otherwise the transaction processor thread would stick permanently on the same fiber. There are two types of yields:

  • implicit yields: every data-change operation or network-access causes an implicit yield, and every statement that goes through the Tarantool client causes an implicit yield.
  • explicit yields: in a Lua function, you can (and should) add "yield" statements to prevent hogging. This is called cooperative multitasking.
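The control flow of cooperative multitasking can be imitated with plain Lua coroutines. This is only an analogy, not how Tarantool implements fibers. Inside Tarantool you would use the fiber module (fiber.create(), fiber.yield()) instead. run_fibers() is a hypothetical round-robin scheduler: each "fiber" runs until it yields or finishes, and only then does the next one get a turn.

```lua
-- Toy cooperative scheduler built on Lua coroutines (an analogy for fibers).
-- Each body runs until it calls coroutine.yield() or returns.
function run_fibers(bodies)
  local fibers = {}
  for i, body in ipairs(bodies) do fibers[i] = coroutine.create(body) end
  local alive = #fibers
  while alive > 0 do
    alive = 0
    for _, f in ipairs(fibers) do
      if coroutine.status(f) ~= 'dead' then
        local ok, err = coroutine.resume(f)   -- run until the next yield
        if not ok then error(err) end
        if coroutine.status(f) ~= 'dead' then alive = alive + 1 end
      end
    end
  end
end
```

Two fibers that each yield once interleave their work. A fiber that never yields runs to completion before any other fiber starts, which is the "hogging" that explicit yields prevent.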

Cooperative multitasking

Cooperative multitasking means: unless a running fiber deliberately yields control, it is not preempted by some other fiber. But a running fiber will deliberately yield when it encounters a "yield point": a transaction commit, an operating system call, or an explicit "yield" request. Any system call which can block will be performed asynchronously, and any running fiber which must wait for a system call will be preempted, so that another ready-to-run fiber takes its place and becomes the new running fiber.

This model makes all programmatic locks unnecessary: cooperative multitasking ensures that there will be no concurrency around a resource, no race conditions, and no memory consistency issues.

When requests are small, for example simple UPDATE or INSERT or DELETE or SELECT, fiber scheduling is fair: it takes only a little time to process the request, schedule a disk write, and yield to a fiber serving the next client.

However, a function might perform complex computations or might be written in such a way that yields do not occur for a long time. This can lead to unfair scheduling, when a single client throttles the rest of the system, or to apparent stalls in request processing. Avoiding this situation is the responsibility of the function’s author.

Transactions

In the absence of transactions, any function that contains yield points may see changes in the database state caused by fibers that preempt. Multi-statement transactions exist to provide isolation: each transaction sees a consistent database state and commits all its changes atomically. At commit time, a yield happens and all transaction changes are written to the write ahead log in a single batch.

To implement isolation, Tarantool uses a simple optimistic scheduler: the first transaction to commit wins. If a concurrent active transaction has read a value modified by a committed transaction, it is aborted.

The cooperative scheduler ensures that, in absence of yields, a multi-statement transaction is not preempted and hence is never aborted. Therefore, understanding yields is essential to writing abort-free code.
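The "first transaction to commit wins" rule can be sketched as follows. This is a simplified pure-Lua model with hypothetical names (begin_txn(), txn_read(), txn_write(), commit_txn()), not Tarantool's scheduler. Each transaction remembers the keys it has read. A commit aborts every other still-active transaction that read a key the committing transaction wrote.

```lua
-- Toy model of optimistic concurrency control; hypothetical names.
function new_db() return { data = {}, active = {} } end

function begin_txn(db)
  local txn = { reads = {}, writes = {}, aborted = false }
  db.active[txn] = true
  return txn
end

function txn_read(db, txn, key)
  txn.reads[key] = true                -- remember what this transaction saw
  if txn.writes[key] ~= nil then return txn.writes[key] end
  return db.data[key]
end

function txn_write(txn, key, value)
  txn.writes[key] = value              -- buffered until commit
end

-- Returns true if the changes were applied, false if the txn was aborted.
function commit_txn(db, txn)
  db.active[txn] = nil
  if txn.aborted then return false end
  for key, value in pairs(txn.writes) do
    db.data[key] = value
    for other in pairs(db.active) do   -- first committer wins:
      if other.reads[key] then other.aborted = true end
    end
  end
  return true
end
```

If two transactions both read the same balance and both write it, whichever commits first succeeds, and the other is aborted. Without a yield between begin and commit, the second transaction could not have started in between, so no abort would occur.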

Note

You can’t mix storage engines in a transaction today.

Implicit yields

The only explicit yield requests in Tarantool are fiber.sleep() and fiber.yield(), but many other requests "imply" yields because Tarantool is designed to avoid blocking.

Database operations usually do not yield, but it depends on the engine:

  • In memtx, reads or writes do not require I/O and do not yield.
  • In vinyl, not all data is in memory, and SELECT often incurs disk I/O and therefore yields, while a write may stall waiting for memory to free up, thus also causing a yield.

In "autocommit" mode, all data change operations are followed by an automatic commit, which yields. So does an explicit commit of a multi-statement transaction, box.commit().

Many functions in the modules fio, net_box, console and socket (the "os" and "network" requests) yield.

Example #1

  • Engine = memtx
    The sequence select() insert() has one yield, at the end of insertion, caused by the implicit commit; select() has nothing to write to the WAL and so does not yield.
  • Engine = vinyl
    The sequence select() insert() has between one and three yields, since select() may yield if the data is not in the cache, insert() may yield while waiting for available memory, and there is an implicit yield at commit.
  • The sequence begin() insert() insert() commit() yields only at commit if the engine is memtx, and can yield up to 3 times if the engine is vinyl.

Example #2

Assume that in space ‘tester’ there are tuples in which the third field represents a positive dollar amount. Let’s start a transaction, withdraw from tuple#1, deposit in tuple#2, and end the transaction, making its effects permanent.

tarantool> function txn_example(from, to, amount_of_money)
         >   box.begin()
         >   box.space.tester:update(from, {{'-', 3, amount_of_money}})
         >   box.space.tester:update(to,   {{'+', 3, amount_of_money}})
         >   box.commit()
         >   return "ok"
         > end
---
...
tarantool> txn_example({999}, {1000}, 1.00)
---
- "ok"
...

If wal_mode = ‘none’, then implicit yielding at commit time does not take place, because there are no writes to the WAL.

If a task is interactive – sending requests to the server and receiving responses – then it involves network IO, and therefore there is an implicit yield, even if the request that is sent to the server is not itself an implicit yield request. Therefore, the sequence:

select
select
select

causes blocking (in memtx), if it is inside a function or Lua program being executed on the server instance, but causes yielding (in both memtx and vinyl) if it is done as a series of transmissions from a client, including a client which operates via telnet, via one of the connectors, or via the MySQL and PostgreSQL rocks, or via the interactive mode when using Tarantool as a client.

After a fiber has yielded and then has regained control, it immediately issues testcancel.

Access control

Understanding security details is primarily an issue for administrators. Meanwhile, ordinary users should at least skim this section to get an idea of how Tarantool makes it possible for administrators to prevent unauthorized access to the database and to certain functions.

In a nutshell:

  • There is a method to guarantee with password checks that users really are who they say they are (“authentication”).
  • There is a _user system space, where usernames and password-hashes are stored.
  • There are functions for saying that certain users are allowed to do certain things (“privileges”).
  • There is a _priv system space, where privileges are stored. Whenever a user tries to do an operation, there is a check whether the user has the privilege to do the operation (“access control”).

Further on, we explain all of this in more detail.

Users

There is a current user for any program working with Tarantool, local or remote. If a remote connection is using a binary port, the current user, by default, is 'guest'. If the connection is using an admin-console port, the current user is 'admin'. When executing a Lua initialization script, the current user is also 'admin'.

The current user name can be found with box.session.user().

The current user can be changed:

  • For a binary port connection – with AUTH protocol command, supported by most clients;
  • For an admin-console connection and in a Lua initialization script – with box.session.su;
  • For a stored function invoked with CALL command over a binary port – with SETUID property enabled for the function, which makes Tarantool temporarily replace the current user with the function’s creator, with all creator’s privileges, during function execution.

Passwords

Each user (except 'guest') may have a password. The password is any alphanumeric string.

Tarantool passwords are stored in the _user system space with a cryptographic hash function so that, if the password is 'x', the stored hash-password is a long string like 'lL3OvhkIPOKh+Vn9Avlkx69M/Ck='. When a client connects to a Tarantool instance, the instance sends a random salt value which the client must mix with the hashed password before sending it to the instance. Thus the original value 'x' is never stored anywhere except in the user’s head, and the hashed value is never passed down a network wire except when mixed with a random salt.

Note

For more details of the password hashing algorithm (e.g. for the purpose of writing a new client application), read the scramble.h header file.

This system prevents malicious onlookers from finding passwords by snooping in the log files or snooping on the wire. It is the same system that MySQL introduced several years ago, which has proved adequate for medium-security installations. Nevertheless, administrators should warn users that no system is foolproof against determined long-term attacks, so passwords should be guarded and changed occasionally. Administrators should also advise users to choose long unobvious passwords, but it is ultimately up to the users to choose or change their own passwords.

There are two functions for managing passwords in Tarantool: box.schema.user.passwd() for changing a user’s password and box.schema.user.password() for getting a hash-password.

Owners and privileges

In Tarantool, all objects are organized into a hierarchy of ownership. Ordinarily the owner of every object is its creator. The creator of the initial database state (we call it ‘universe’) – including the database itself, the system spaces, the users – is ‘admin’.

An object’s owner can share some rights on the object by granting privileges to other users. The following privileges are implemented:

  • Read an object,
  • Write, i.e. modify contents of an object,
  • Execute, i.e. use an object (if the privilege makes sense for the object; for example, spaces cannot be "executed", but functions can).

Note

Currently, "drop" and "grant" privileges cannot be granted to other users. This possibility will be added in future versions of Tarantool.

This is how the privilege system works under the hood. To be able to create objects, a user needs to have write access to Tarantool’s system spaces. The 'admin' user, who is at the top of the hierarchy and who is the ultimate source of privileges, shares write access to a system space (e.g. _space) with some users. Now the users can insert data into the system space (e.g. creating new spaces) and themselves become creators/definers of new objects. For the objects they created, the users can in turn share privileges with other users.

This is why only an object’s owner can drop the object, but other ordinary users cannot. Meanwhile, 'admin' can drop any object or delete any other user, because 'admin' is the creator and ultimate owner of them all.

The syntax of all grant()/revoke() commands in Tarantool follows this basic idea.

  • Their first argument is the user who gets the grant or whose grant is revoked.
  • Their second argument is the type of privilege granted, or a list of privileges.
  • Their third argument is the object type on which the privilege is granted, or the word 'universe'. Possible object types are 'space', 'function', 'user', 'role', 'sequence'.
  • Their fourth argument is the object name if the object type was specified ('universe' has no name because there is only one 'universe', but otherwise you must specify the name).

Example #1

Here we say that user 'guest' can do common operations on any object.

box.schema.user.grant('guest', 'read,write,execute', 'universe')

Example #2

Here we create a Lua function that will be executed under the user id of its creator, even if called by another user.

First, we create two spaces ('u' and 'i') and grant a no-password user ('internal') full access to them. Then we define a function ('read_and_modify') and the no-password user becomes this function’s creator. Finally, we grant another user ('public_user') access to execute Lua functions created by the no-password user.

box.schema.space.create('u')
box.schema.space.create('i')
box.space.u:create_index('pk')
box.space.i:create_index('pk')

box.schema.user.create('internal')

box.schema.user.grant('internal', 'read,write', 'space', 'u')
box.schema.user.grant('internal', 'read,write', 'space', 'i')
box.schema.user.grant('internal', 'read,write', 'space', '_func')

function read_and_modify(key)
  local u = box.space.u
  local i = box.space.i
  local fiber = require('fiber')
  local t = u:get{key}
  -- only keys that already exist in space 'u' are modified
  if t ~= nil then
    u:put{key, box.session.uid()}
    i:put{key, fiber.time()}
  end
end

box.session.su('internal')
box.schema.func.create('read_and_modify', {setuid= true})
box.session.su('admin')
box.schema.user.create('public_user', {password = 'secret'})
box.schema.user.grant('public_user', 'execute', 'function', 'read_and_modify')

Roles

A role is a container for privileges which can be granted to regular users. Instead of granting or revoking individual privileges, you can put all the privileges in a role and then grant or revoke the role.

Role information is stored in the _user space, but the third field in the tuple – the type field – is ‘role’ rather than ‘user’.

An important feature in role management is that roles can be nested. For example, role R1 can be granted a privilege "role R2", so users with the role R1 will subsequently get all privileges from both roles R1 and R2. In other words, a user gets all the privileges that are granted to a user’s roles, directly or indirectly.

Example

-- This example will work for a user with many privileges, such as 'admin'
-- Create space T with a primary index
box.schema.space.create('T')
box.space.T:create_index('primary', {})
-- Create user U1 so that later we can change the current user to U1
box.schema.user.create('U1')
-- Create two roles, R1 and R2
box.schema.role.create('R1')
box.schema.role.create('R2')
-- Grant role R2 to role R1 and role R1 to user U1 (order doesn't matter)
box.schema.role.grant('R1', 'execute', 'role', 'R2')
box.schema.user.grant('U1', 'execute', 'role', 'R1')
-- Grant read/write privileges for space T to role R2
-- (but not to role R1 and not to user U1)
box.schema.role.grant('R2', 'read,write', 'space', 'T')
-- Change the current user to user U1
box.session.su('U1')
-- An insertion to space T will now succeed because, due to nested roles,
-- user U1 has write privilege on space T
box.space.T:insert{1}
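The way nested roles add up to a user's effective privileges can be modeled as a simple graph walk. effective_privileges() and the roles_of / privs_of tables are hypothetical. Tarantool itself keeps this information in the _user and _priv system spaces and performs the check on its own.

```lua
-- Hypothetical model: roles_of[name] = list of roles granted to a user or role,
-- privs_of[name] = set of privilege strings granted directly to it.
function effective_privileges(grantee, roles_of, privs_of, seen, acc)
  seen = seen or {}
  acc = acc or {}
  if seen[grantee] then return acc end   -- do not walk the same role twice
  seen[grantee] = true
  for priv in pairs(privs_of[grantee] or {}) do acc[priv] = true end
  for _, role in ipairs(roles_of[grantee] or {}) do
    effective_privileges(role, roles_of, privs_of, seen, acc)  -- nested roles
  end
  return acc
end
```

With U1 holding R1, R1 holding R2, and R2 holding read and write on space T, U1 ends up with both privileges, as in the example above.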

For details about Tarantool functions related to role management, see reference on box.schema submodule.

Sessions and security

A session is the state of a connection to Tarantool. It contains:

  • an integer id identifying the connection,
  • the current user associated with the connection,
  • text description of the connected peer, and
  • session local state, such as Lua variables and functions.

In Tarantool, a single session can execute multiple concurrent transactions. Each transaction is identified by a unique integer id, which can be queried at the start of the transaction using box.session.sync().

Note

To track all connects and disconnects, you can use connection and authentication triggers.

Triggers

Triggers, also known as callbacks, are functions which the server executes when certain events happen.

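There are three types of triggers in Tarantool:

  • connection triggers, which are executed when a session begins or ends (box.session.on_connect(), box.session.on_disconnect());
  • authentication triggers, which are executed during authentication (box.session.on_auth());
  • replace triggers, which are executed on data-change events in a space (space_object:on_replace()).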

All triggers have the following characteristics:

  • Triggers associate a function with an event. The request to "define a trigger" implies passing the trigger’s function to one of the "on_event()" functions: box.session.on_connect(), box.session.on_auth(), box.session.on_disconnect(), or space_object:on_replace().
  • Triggers are defined only by the 'admin' user.
  • Triggers are stored in the Tarantool instance’s memory, not in the database. Therefore triggers disappear when the instance is shut down. To make them permanent, put function definitions and trigger settings into Tarantool’s initialization script.
  • Triggers have low overhead. If a trigger is not defined, then the overhead is minimal: merely a pointer dereference and check. If a trigger is defined, then its overhead is equivalent to the overhead of calling a function.
  • There can be multiple triggers for one event. In this case, triggers are executed in the reverse order that they were defined in.
  • Triggers must work within the event context. However, effects are undefined if a function contains requests which normally could not occur immediately after the event, but only before the return from the event. For example, putting os.exit() or box.rollback() in a trigger function would be bringing in requests outside the event context.
  • Triggers are replaceable. The request to "redefine a trigger" implies passing a new trigger function and an old trigger function to one of the "on_event()" functions.
  • The "on_event()" functions all have parameters which are function pointers, and they all return function pointers. Remember that a Lua function definition such as "function f() x = x + 1 end" is the same as "f = function () x = x + 1 end": in both cases f gets a function pointer. And "trigger = box.session.on_connect(f)" is the same as "trigger = box.session.on_connect(function () x = x + 1 end)": in both cases trigger gets the function pointer which was passed.

To get a list of triggers, you can use:

  • on_connect() – with no arguments – to return a table of all connect-trigger functions;
  • on_auth() to return all authentication-trigger functions;
  • on_disconnect() to return all disconnect-trigger functions;
  • on_replace() to return all replace-trigger functions.
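The ordering and replacement rules above can be imitated in pure Lua. new_event() is a hypothetical helper, not part of box.session. The sketch assumes that calling on_event() with no arguments lists triggers in execution order. It also treats passing nil as the new function, to remove an old one, as an extra assumption of its own.

```lua
-- Hypothetical model of a trigger list with "on_event()"-style registration.
function new_event()
  local triggers = {}                  -- kept in definition order
  local event = {}

  -- on_event(new): add; on_event(new, old): replace old with new;
  -- on_event(): list triggers in the order they will be executed.
  function event.on_event(new, old)
    if new == nil and old == nil then
      local list = {}
      for i = #triggers, 1, -1 do list[#list + 1] = triggers[i] end
      return list
    end
    if old ~= nil then
      for i, f in ipairs(triggers) do
        if f == old then
          if new == nil then
            table.remove(triggers, i)  -- sketch assumption: nil removes old
          else
            triggers[i] = new          -- new takes over old's position
          end
          return new
        end
      end
      error('old trigger function not found')
    end
    triggers[#triggers + 1] = new
    return new                         -- return the function pointer passed in
  end

  -- Fire the event: the most recently defined trigger runs first.
  function event.fire(...)
    for i = #triggers, 1, -1 do triggers[i](...) end
  end

  return event
end
```

Defining f1 and then f2 makes the event run f2 before f1. Replacing f1 keeps the replacement in f1's place in that order.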

Example

Here we log connect and disconnect events into Tarantool server log.

log = require('log')

function on_connect_impl()
  log.info("connected "..box.session.peer()..", sid "..box.session.id())
end

function on_disconnect_impl()
  log.info("disconnected, sid "..box.session.id())
end

function on_auth_impl(user)
  log.info("authenticated sid "..box.session.id().." as "..user)
end

function on_connect() pcall(on_connect_impl) end
function on_disconnect() pcall(on_disconnect_impl) end
function on_auth(user) pcall(on_auth_impl, user) end

box.session.on_connect(on_connect)
box.session.on_disconnect(on_disconnect)
box.session.on_auth(on_auth)

Limitations

Number of parts in an index

For TREE or HASH indexes, the maximum is 255 (box.schema.INDEX_PART_MAX). For RTREE indexes, the maximum is 1 but the field is an ARRAY of up to 20 dimensions. For BITSET indexes, the maximum is 1.

Number of indexes in a space

128 (box.schema.INDEX_MAX).

Number of fields in a tuple

The theoretical maximum is 2,147,483,647 (box.schema.FIELD_MAX). The practical maximum is whatever is specified by the space’s field_count member, or the maximal tuple length.

Number of bytes in a tuple

The maximal number of bytes in a tuple is roughly equal to memtx_max_tuple_size or vinyl_max_tuple_size (with a metadata overhead of about 20 bytes per tuple, which is added on top of useful bytes). By default, the value of either memtx_max_tuple_size or vinyl_max_tuple_size is 1,048,576. To increase it, specify a larger value when starting the Tarantool instance. For example, box.cfg{memtx_max_tuple_size=2*1048576}.

Number of bytes in an index key

If a field in a tuple can contain a million bytes, then the index key can contain a million bytes, so the maximum is determined by factors such as Number of bytes in a tuple, not by the index support.

Number of spaces

The theoretical maximum is 2147483647 (box.schema.SPACE_MAX) but the practical maximum is around 65,000.

Number of connections

The practical limit is the number of file descriptors that one can set with the operating system.

Space size

The total maximum size for all spaces is in effect set by memtx_memory, which in turn is limited by the total available memory.

Update operations count

The maximum number of operations that can be in a single update is 4000 (BOX_UPDATE_OP_CNT_MAX).

Number of users and roles

32 (BOX_USER_MAX).

Length of an index name or space name or user name

65000 (box.schema.NAME_MAX).

Number of replicas in a replica set

32 (box.schema.REPLICA_MAX).

Application server

In this chapter, we introduce the basics of working with Tarantool as a Lua application server.


Launching an application

Using Tarantool as an application server, you can write your own applications. Tarantool’s native language for writing applications is Lua, so a typical application would be a file that contains your Lua script. But you can also write applications in C or C++.

Note

If you’re new to Lua, we recommend going over the interactive Tarantool tutorial before proceeding with this chapter. To launch the tutorial, enter tutorial() in the Tarantool console:

tarantool> tutorial()
---
- |
 Tutorial -- Screen #1 -- Hello, Moon
 ====================================

 Welcome to the Tarantool tutorial.
 It will introduce you to Tarantool’s Lua application server
 and database server, which is what’s running what you’re seeing.
 This is INTERACTIVE -- you’re expected to enter requests
 based on the suggestions or examples in the screen’s text.
 <...>

Let’s create and launch our first Lua application for Tarantool. Here’s the simplest Lua application, the good old «Hello, world!»:

#!/usr/bin/env tarantool
print('Hello, world!')

We save it to a file named myapp.lua in the current directory.

Now let’s discuss how we can launch our application with Tarantool.

Launching in Docker

If we run Tarantool in a Docker container, the following command will start Tarantool without any application:

# create a temporary container and run it in interactive mode
$ docker run --rm -t -i tarantool/tarantool

To run Tarantool with our application, we can say:

# create a temporary container and
# launch Tarantool with our application
$ docker run --rm -t -i \
             -v `pwd`/myapp.lua:/opt/tarantool/myapp.lua \
             -v /data/dir/on/host:/var/lib/tarantool \
             tarantool/tarantool tarantool /opt/tarantool/myapp.lua

Here two resources on the host get mounted in the container:

  • our application file (`pwd`/myapp.lua) and
  • Tarantool data directory (/data/dir/on/host).

By convention, the directory for Tarantool application code inside a container is /opt/tarantool, and the directory for data is /var/lib/tarantool.

Launching a binary program

If we run Tarantool from a binary package or from a source build, we can launch our application:

  • in the script mode,
  • as a server application, or
  • as a daemon service.

The simplest way is to pass the filename to Tarantool at start:

Tarantool starts, executes our script in the script mode and exits.

Now let’s turn this script into a server application. We use box.cfg from Tarantool’s built-in Lua module to:

  • launch the database (a database has a persistent on-disk state, which needs to be restored after we start an application) and
  • configure Tarantool as a server that accepts requests over a TCP port.

We also add some simple database logic, using space.create() and create_index() to create a space with a primary index. We use the function box.once() to make sure that our logic will be executed only once when the database is initialized for the first time, so we don’t try to create an existing space or index on each invocation of the script:
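box.once() itself needs a running Tarantool instance. To illustrate only the "run once per key" idea, here is a minimal pure-Lua sketch; once(), done and bootstrap() are hypothetical stand-ins, not Tarantool code. Unlike this sketch, the real box.once() keeps its flag in the database, so the check also holds across restarts.

```lua
-- Hypothetical emulation of box.once() semantics in plain Lua.
-- The real box.once() keeps its flag in the database; here a table is used.
local done = {}

local function once(key, fn, ...)
  if done[key] then
    return false          -- already executed for this key: skip it
  end
  done[key] = true        -- remember the key before running fn
  fn(...)
  return true             -- return value is specific to this sketch
end

local created = 0
local function bootstrap()
  -- stands in for create_space() / create_index()
  created = created + 1
end

-- Simulate launching the script three times against the same "database"
for _ = 1, 3 do
  once('bootstrap', bootstrap)
end
print(created)  -- 1
```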

Now we launch our application in the same manner as before:

This time, Tarantool executes our script and keeps working as a server, accepting TCP requests on port 3301. We can see Tarantool in the current session’s process list:

But the Tarantool instance will stop if we close the current terminal window. To detach Tarantool and our application from the terminal window, we can launch it in the daemon mode. To do so, we add some parameters to box.cfg{}:

  • background = true that actually tells Tarantool to work as a daemon service,
  • log = 'dir-name' that tells the Tarantool daemon where to store its log file (other log settings are available in Tarantool log module), and
  • pid_file = 'file-name' that tells the Tarantool daemon where to store its pid file.

For example:

box.cfg {
   listen = 3301,
   background = true,
   log = '1.log',
   pid_file = '1.pid'
}

We launch our application in the same manner as before:

Tarantool executes our script, gets detached from the current shell session (you won’t see it with ps | grep "tarantool") and continues working in the background as a daemon attached to the global session (with SID = 0):

Now that we have discussed how to create and launch a Lua application for Tarantool, let’s dive deeper into programming practices.

Creating an application

Below, we walk you through key programming practices that will give you a good start in writing Lua applications for Tarantool. For an adventure, this is the story of implementing… a real microservice based on Tarantool! We implement a backend for a simplified version of Pokémon Go, a location-based augmented reality game released in mid-2016. In this game, players use a mobile device’s GPS capability to locate, capture, battle and train virtual monsters called «pokémon», who appear on the screen as if they were in the same real-world location as the player.

To stay within the walk-through format, let’s narrow the original gameplay as follows. We have a map with pokémon spawn locations. Next, we have multiple players who can send catch-a-pokémon requests to the server (which runs our Tarantool microservice). The server replies whether the pokémon is caught or not, increases the player’s pokémon counter if it is, and triggers the respawn-a-pokémon method that spawns a new pokémon at the same location after a while.

We leave client-side applications outside the scope of this story. Yet we promise a mini-demo in the end to simulate real users and give us some fun. :-)


First, what would be the best way to deliver our microservice?

Modules, rocks and applications

To make our game logic available to other developers and Lua applications, let’s put it into a Lua module.

A module (called «rock» in Lua) is an optional library which enhances Tarantool functionality. So, we can install our logic as a module in Tarantool and use it from any Tarantool application or module. Like applications, modules in Tarantool can be written in Lua (rocks), C or C++.

Modules are good for two things:

  • easier code management (reuse, packaging, versioning), and
  • hot code reload without restarting the Tarantool instance.

Technically, a module is a file with source code that exports its functions in an API. For example, here is a Lua module named mymodule.lua that exports one function named myfun:

local exports = {}
exports.myfun = function(input_string)
   print('Hello', input_string)
end
return exports

To call myfun() from another module, from a Lua application, or from Tarantool itself, we need to save this module as a file, load it with the require() directive, and call the exported function.

For example, here’s a Lua application that uses the myfun() function from the mymodule.lua module:

-- loading the module
local mymodule = require('mymodule')

-- calling myfun() from within test() function
local test = function()
  mymodule.myfun('world')
end

-- running test() prints "Hello" and "world" separated by a tab
test()
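To try the module and require() mechanics without creating a file, the sketch below registers mymodule through package.preload, a standard Lua loader table; this stands in for saving mymodule.lua on disk. It also shows that require() caches the loaded module in package.loaded, so repeated calls return the same table. Here myfun() returns its string instead of printing it, a change made only to keep the example checkable.

```lua
-- Register a loader for 'mymodule' instead of saving mymodule.lua to disk.
package.preload['mymodule'] = function()
  local exports = {}
  exports.myfun = function(input_string)
    return 'Hello ' .. input_string
  end
  return exports
end

local mymodule = require('mymodule')
print(mymodule.myfun('world'))                  -- Hello world

-- A second require() returns the cached table from package.loaded
local again = require('mymodule')
print(again == mymodule)                        -- true
print(package.loaded['mymodule'] == mymodule)   -- true
```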

A thing to remember here is that the require() directive takes load paths to Lua modules from the package.path variable. This is a semicolon-separated string, where a question mark is used to interpolate the module name. By default, this variable contains system-wide Lua paths and the working directory. But if we put our modules inside a specific folder (e.g. scripts/), we need to add this folder to package.path before any calls to require():

package.path = 'scripts/?.lua;' .. package.path
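To see how the question mark in package.path is interpolated, here is a simplified pure-Lua sketch that expands a path template into the candidate file names require() would try. The candidates() helper is hypothetical; it assumes a Unix-style "/" directory separator and ignores details such as the ";;" default-path marker.

```lua
-- Expand a package.path-style template into candidate file names,
-- roughly the way require() searches for a Lua module.
local function candidates(path, modname)
  -- dots in module names map to directory separators
  local name = modname:gsub('%.', '/')
  local result = {}
  for template in path:gmatch('[^;]+') do
    -- the parentheses keep only the string, dropping gsub's count
    result[#result + 1] = (template:gsub('%?', name))
  end
  return result
end

local path = 'scripts/?.lua;./?.lua;./?/init.lua'
for _, file in ipairs(candidates(path, 'pokemon')) do
  print(file)
end
-- scripts/pokemon.lua
-- ./pokemon.lua
-- ./pokemon/init.lua
```

Because 'scripts/?.lua' comes first, a module in scripts/ is found before one with the same name in the working directory.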

For our microservice, a simple and convenient solution would be to put all methods in a Lua module (say pokemon.lua) and to write a Lua application (say game.lua) that initializes the gaming environment and starts the game loop.


Now let’s get down to implementation details. In our game, we need three entities:

  • map, which is an array of pokémons with coordinates of respawn locations; in this version of the game, let a location be a rectangle identified with two points, upper-left and lower-right;
  • player, which has an ID, a name, and coordinates of the player’s location point;
  • pokémon, which has the same fields as the player, plus a status (active/inactive, that is present on the map or not) and a catch probability (well, let’s give our pokémons a chance to escape :-) )

We’ll store these entities as tuples in Tarantool spaces. But to deliver our backend application as a microservice, good practice is to send and receive our data in the universal JSON format, thus using Tarantool as a document storage.

Avro schemas

To store JSON data as tuples, we will apply a savvy practice which reduces data footprint and ensures all stored documents are valid. We will use Tarantool module avro-schema which checks the schema of a JSON document and converts it to a Tarantool tuple. The tuple will contain only field values, and thus take a lot less space than the original document. In avro-schema terms, converting JSON documents to tuples is «flattening», and restoring the original documents is «unflattening». The usage is quite straightforward:

  1. For each entity, we need to define a schema in Apache Avro schema syntax, where we list the entity’s fields with their names and Avro data types.
  2. At initialization, we call avro-schema.create() that creates objects in memory for all schema entities, and compile() that generates flatten/unflatten methods for each entity.
  3. Further on, we just call flatten/unflatten methods for a respective entity on receiving/sending the entity’s data.
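The avro_schema module generates these methods for us. To show only what "flattening" means, here is a hand-written pure-Lua sketch for the player entity. The flatten() and unflatten() helpers and the player_fields table are hypothetical, not the avro_schema API, and they skip validation entirely (the real module also checks the document against the schema).

```lua
-- Hypothetical illustration of flattening (NOT the avro_schema API):
-- a nested document becomes a flat tuple ordered by schema fields.
local player_fields = {
  {name = 'id'},
  {name = 'name'},
  {name = 'location', fields = {{name = 'x'}, {name = 'y'}}},
}

local function flatten(fields, doc, out)
  out = out or {}
  for _, f in ipairs(fields) do
    if f.fields then
      flatten(f.fields, doc[f.name], out)  -- nested record: recurse
    else
      out[#out + 1] = doc[f.name]          -- keep only the value
    end
  end
  return out
end

local function unflatten(fields, tuple, pos)
  pos = pos or 1
  local doc = {}
  for _, f in ipairs(fields) do
    if f.fields then
      doc[f.name], pos = unflatten(f.fields, tuple, pos)
    else
      doc[f.name] = tuple[pos]
      pos = pos + 1
    end
  end
  return doc, pos
end

-- JSON key order does not matter; tuple order follows the schema
local player = {name = 'Player1', id = 1, location = {x = 1.0001, y = 2.0003}}
local tuple = flatten(player_fields, player)  -- {1, 'Player1', 1.0001, 2.0003}
local restored = unflatten(player_fields, tuple)
print(restored.name, restored.location.x)
```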

Here’s what our schema definitions for the player and pokémon entities look like:

local schema = {
    player = {
        type="record",
        name="player_schema",
        fields={
            {name="id", type="long"},
            {name="name", type="string"},
            {
                name="location",
                type= {
                    type="record",
                    name="player_location",
                    fields={
                        {name="x", type="double"},
                        {name="y", type="double"}
                    }
                }
            }
        }
    },
    pokemon = {
        type="record",
        name="pokemon_schema",
        fields={
            {name="id", type="long"},
            {name="status", type="string"},
            {name="name", type="string"},
            {name="chance", type="double"},
            {
                name="location",
                type= {
                    type="record",
                    name="pokemon_location",
                    fields={
                        {name="x", type="double"},
                        {name="y", type="double"}
                    }
                }
            }
        }
    }
}

And here’s how we create and compile our entities at initialization:

-- load avro-schema and log modules with require()
local avro = require('avro_schema')
local log = require('log')

-- create models
local ok_m, pokemon = avro.create(schema.pokemon)
local ok_p, player = avro.create(schema.player)
if ok_m and ok_p then
    -- compile models
    local ok_cm, compiled_pokemon = avro.compile(pokemon)
    local ok_cp, compiled_player = avro.compile(player)
    if ok_cm and ok_cp then
        -- start the game
        <...>
    else
        log.error('Schema compilation failed')
    end
else
    log.error('Schema creation failed')
end
return false

As for the map entity, it would be overkill to introduce a schema for it: we have only one map in the game, it has very few fields, and, most importantly, we use the map only inside our logic, never exposing it to external users.


Next, we need methods to implement the game logic. To simulate object-oriented programming in our Lua code, let’s store all Lua functions and shared variables in a single local variable (let’s name it game). This will allow us to address functions or variables from within our module as self.func_name or self.var_name. Like this:

local game = {
    -- a local variable
    num_players = 0,

    -- a method that prints a local variable
    hello = function(self)
      print('Hello! Your player number is ' .. self.num_players .. '.')
    end,

    -- a method that calls another method and returns a local variable
    sign_in = function(self)
      self.num_players = self.num_players + 1
      self:hello()
      return self.num_players
    end
}

In OOP terms, we can now regard local variables inside game as object fields, and local functions as object methods.
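Here is a usage sketch with the game object from above, repeated so the snippet is self-contained. The colon syntax game:sign_in() is Lua shorthand for game.sign_in(game), which is why self refers to the game table inside the methods.

```lua
local game = {
  num_players = 0,
  hello = function(self)
    print('Hello! Your player number is ' .. self.num_players .. '.')
  end,
  sign_in = function(self)
    self.num_players = self.num_players + 1
    self:hello()
    return self.num_players
  end
}

print(game:sign_in())      -- greeting for player 1, then 1
print(game.sign_in(game))  -- the same call written out: greeting, then 2
```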

Note

In this manual, Lua examples use local variables. Use global variables with caution, since the module’s users may be unaware of them.

To enable/disable the use of undeclared global variables in your Lua code, use Tarantool’s strict module.

So, our game module will have the following methods:

  • catch() to calculate whether the pokémon was caught (besides the coordinates of both the player and pokémon, this method will apply a probability factor, so not every pokémon within the player’s reach will be caught);
  • respawn() to add missing pokémons to the map, say, every 60 seconds (we assume that a frightened pokémon runs away, so we remove a pokémon from the map on any catch attempt and add it back to the map after a while);
  • notify() to log information about caught pokémons (like «Player 1 caught pokémon A»);
  • start() to initialize the game (it will create database spaces, create and compile avro schemas, and launch respawn()).

Besides, it would be convenient to have methods for working with Tarantool storage. For example:

  • add_pokemon() to add a pokémon to the database, and
  • map() to populate the map with all pokémons stored in Tarantool.

We’ll need these two methods primarily when initializing our game, but we can also call them later, for example to test our code.

Bootstrapping a database

Let’s discuss game initialization. In start() method, we need to populate Tarantool spaces with pokémon data. Why not keep all game data in memory? Why use a database? The answer is: persistence. Without a database, we risk losing data on power outage, for example. But if we store our data in an in-memory database, Tarantool takes care to persist it on disk whenever it’s changed. This gives us one more benefit: quick startup in case of failure. Tarantool has a smart algorithm that quickly loads all data from disk into memory on startup, so the warm-up takes little time.

We’ll be using functions from Tarantool built-in box module:

  • box.schema.create_space('pokemons') to create a space named pokemons for storing information about pokémons (we don’t create a similar space for players, because we intend to only send/receive player information via API calls, so we needn’t store it);
  • box.space.pokemons:create_index('primary', {type = 'hash', parts = {1, 'unsigned'}}) to create a primary HASH index by pokémon ID;
  • box.space.pokemons:create_index('status', {type = 'tree', parts = {2, 'str'}}) to create a secondary TREE index by pokémon status.

Notice the parts = argument in the index specification. The pokémon ID is the first field in a Tarantool tuple because it is the first member of the respective Avro type. Likewise, the pokémon status is the second field because it is the second member. In the actual JSON document, the ID and status fields may appear at any position of the JSON map.

The implementation of start() method looks like this:

-- create game object
start = function(self)
    -- create spaces and indexes
    box.once('init', function()
        box.schema.create_space('pokemons')
        box.space.pokemons:create_index(
            "primary", {type = 'hash', parts = {1, 'unsigned'}}
        )
        box.space.pokemons:create_index(
            "status", {type = "tree", parts = {2, 'str'}}
        )
    end)

    -- create models
    local ok_m, pokemon = avro.create(schema.pokemon)
    local ok_p, player = avro.create(schema.player)
    if ok_m and ok_p then
        -- compile models
        local ok_cm, compiled_pokemon = avro.compile(pokemon)
        local ok_cp, compiled_player = avro.compile(player)
        if ok_cm and ok_cp then
            -- start the game
            <...>
        else
            log.error('Schema compilation failed')
        end
    else
        log.error('Schema creation failed')
    end
    return false
end

GIS

Now let’s discuss catch(), which is the main method in our gaming logic.

Here we receive the player’s coordinates and the target pokémon’s ID number, and we need to answer whether the player has actually caught the pokémon or not (remember that each pokémon has a chance to escape).

First thing, we validate the received player data against its Avro schema. And we check whether such a pokémon exists in our database and is displayed on the map (the pokémon must have the active status):

catch = function(self, pokemon_id, player)
    -- check player data
    local ok, tuple = self.player_model.flatten(player)
    if not ok then
        return false
    end
    -- get pokemon data
    local p_tuple = box.space.pokemons:get(pokemon_id)
    if p_tuple == nil then
        return false
    end
    local ok, pokemon = self.pokemon_model.unflatten(p_tuple)
    if not ok then
        return false
    end
    if pokemon.status ~= self.state.ACTIVE then
        return false
    end
    -- more catch logic to follow
    <...>
end

Next, we calculate the answer: caught or not.

To work with geographical coordinates, we use Tarantool gis module.

To keep things simple, we don’t load any specific map, assuming that we deal with a world map. And we do not validate incoming coordinates, assuming again that all received locations are within the planet Earth.

We use two geo-specific variables:

  • wgs84, which stands for the latest revision of the World Geodetic System standard, WGS84. Basically, it comprises a standard coordinate system for the Earth and represents the Earth as an ellipsoid.
  • nationalmap, which stands for the US National Atlas Equal Area. This is a projected coordinates system based on WGS84. It gives us a zero base for location projection and allows positioning our players and pokémons in meters.

Both these systems are listed in the EPSG Geodetic Parameter Registry, where each system has a unique number. In our code, we assign these listing numbers to respective variables:

wgs84 = 4326,
nationalmap = 2163,

For our game logic, we need one more variable, catch_distance, which defines how close a player must get to a pokémon before trying to catch it. Let’s set the distance to 100 meters.

catch_distance = 100,
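The gis module requires Tarantool. As a rough cross-check of the 100-meter rule, this sketch swaps in a different technique: the haversine great-circle distance on a spherical Earth (mean radius 6,371 km), not the WGS84 to US National Atlas projection that catch() uses, so its results differ slightly from gis. It assumes that x is longitude and y is latitude in degrees; the coordinates are those of the demo players and Pikachu used later in this chapter.

```lua
-- Haversine great-circle distance in meters on a spherical Earth.
-- A swapped-in approximation, not the gis projection used by catch().
local EARTH_RADIUS = 6371000  -- mean Earth radius, meters

local function distance(lon1, lat1, lon2, lat2)
  local rad = math.pi / 180
  local dlat = (lat2 - lat1) * rad
  local dlon = (lon2 - lon1) * rad
  local a = math.sin(dlat / 2) ^ 2
          + math.cos(lat1 * rad) * math.cos(lat2 * rad) * math.sin(dlon / 2) ^ 2
  return 2 * EARTH_RADIUS * math.asin(math.sqrt(a))
end

local catch_distance = 100
local pokemon = {x = 1, y = 2}
local players = {
  {name = 'Player1', x = 1.0001, y = 2.0003},
  {name = 'Player2', x = 30.123, y = 40.456},
}

for _, p in ipairs(players) do
  local d = distance(p.x, p.y, pokemon.x, pokemon.y)
  -- only a player within catch_distance may try to catch the pokémon
  print(p.name, string.format('%.1f m', d), d <= catch_distance)
end
```

Player1 ends up about 35 meters from Pikachu, well within range, while Player2 is thousands of kilometers away.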

Now we’re ready to calculate the answer. We need to project the current location of both the player (p_pos) and the pokémon (m_pos) on the map, check whether the player is close enough to the pokémon (using catch_distance), and calculate whether the player has caught the pokémon (here we generate a random value and let the pokémon escape if the random value happens to be less than 100 minus the pokémon’s chance value):

-- project locations
local m_pos = gis.Point(
    {pokemon.location.x, pokemon.location.y}, self.wgs84
):transform(self.nationalmap)
local p_pos = gis.Point(
    {player.location.x, player.location.y}, self.wgs84
):transform(self.nationalmap)

-- check catch distance condition
if p_pos:distance(m_pos) > self.catch_distance then
    return false
end
-- try to catch pokemon
local caught = math.random(100) >= 100 - pokemon.chance
if caught then
    -- update and notify on success
    box.space.pokemons:update(
        pokemon_id, {{'=', self.STATUS, self.state.CAUGHT}}
    )
    self:notify(player, pokemon)
end
return caught
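Because math.random(100) returns an integer from 1 to 100, the catch condition can be evaluated exactly by enumerating all 100 outcomes. The catch_probability() helper below is a hypothetical illustration, not part of pokemon.lua. It shows that Pikachu with chance = 99.1 is always caught, and that for whole-number chances the condition succeeds slightly more often than chance percent (51% for chance = 50).

```lua
-- Exact probability that `math.random(100) >= 100 - chance` holds,
-- found by checking every possible outcome 1..100.
local function catch_probability(chance)
  local hits = 0
  for r = 1, 100 do
    if r >= 100 - chance then
      hits = hits + 1
    end
  end
  return hits / 100
end

print(string.format('%.2f', catch_probability(99.1)))  -- 1.00
print(string.format('%.2f', catch_probability(50)))    -- 0.51
print(string.format('%.2f', catch_probability(0)))     -- 0.01
```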

Index iterators

According to our gameplay, all caught pokémons are returned to the map. We do this for all pokémons on the map every 60 seconds using the respawn() method. We iterate through pokémons by status using the Tarantool index iterator function index:pairs and reset the statuses of all «caught» pokémons back to «active» using box.space.pokemons:update().

respawn = function(self)
    fiber.name('Respawn fiber')
    for _, tuple in box.space.pokemons.index.status:pairs(
           self.state.CAUGHT) do
        box.space.pokemons:update(
            tuple[self.ID],
            {{'=', self.STATUS, self.state.ACTIVE}}
        )
    end
 end

For readability, we introduce named fields:

ID = 1, STATUS = 2,

The complete implementation of start() now looks like this:

-- create game object
start = function(self)
    -- create spaces and indexes
    box.once('init', function()
       box.schema.create_space('pokemons')
       box.space.pokemons:create_index(
           "primary", {type = 'hash', parts = {1, 'unsigned'}}
       )
       box.space.pokemons:create_index(
           "status", {type = "tree", parts = {2, 'str'}}
       )
    end)

    -- create models
    local ok_m, pokemon = avro.create(schema.pokemon)
    local ok_p, player = avro.create(schema.player)
    if ok_m and ok_p then
        -- compile models
        local ok_cm, compiled_pokemon = avro.compile(pokemon)
        local ok_cp, compiled_player = avro.compile(player)
        if ok_cm and ok_cp then
            -- start the game
            self.pokemon_model = compiled_pokemon
            self.player_model = compiled_player
            self:respawn()
            log.info('Started')
            return true
        else
            log.error('Schema compilation failed')
        end
    else
        log.error('Schema creation failed')
    end
    return false
end

Fibers

But wait! If we call respawn() directly, as shown above, the function will be executed only once, just like all the other methods. But we need to execute respawn() every 60 seconds. Creating a fiber is the Tarantool way of making application logic work in the background at all times.

A fiber exists for executing instruction sequences but it is not a thread. The key difference is that threads use preemptive multitasking, while fibers use cooperative multitasking. This gives fibers the following two advantages over threads:

  • Better controllability. Threads often depend on the kernel’s thread scheduler to preempt a busy thread and resume another thread, so preemption may occur unpredictably. Fibers yield themselves to run another fiber while executing, so yields are controlled by application logic.
  • Higher performance. Threads require more resources to preempt as they need to address the system kernel. Fibers are lighter and faster as they don’t need to address the kernel to yield.

Yet fibers have some limitations as compared with threads, the main limitation being no multi-core mode. All fibers in an application belong to a single thread, so they all use the same CPU core as the parent thread. However, this limitation is not serious for Tarantool applications, because a typical bottleneck for Tarantool is the HDD, not the CPU.

A fiber has all the features of a Lua coroutine and all programming concepts that apply for Lua coroutines will apply for fibers as well. However, Tarantool has made some enhancements for fibers and has used fibers internally. So, although use of coroutines is possible and supported, use of fibers is recommended.
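Since fibers build on the same ideas as Lua coroutines, cooperative multitasking can be shown with plain Lua. This sketch uses coroutines, not Tarantool's fiber module; the worker() and run() helpers are hypothetical. Each task runs until it yields explicitly, so the interleaving is fully determined by the code rather than by a preemptive scheduler.

```lua
-- Cooperative multitasking with plain Lua coroutines: each task
-- yields voluntarily, and a round-robin loop resumes the next one.
local trace = {}

local function worker(name, steps)
  return coroutine.create(function()
    for i = 1, steps do
      trace[#trace + 1] = name .. i
      coroutine.yield()  -- give control back to the scheduler
    end
  end)
end

-- Round-robin scheduler: resume every live task until all are dead
local function run(tasks)
  local alive = true
  while alive do
    alive = false
    for _, co in ipairs(tasks) do
      if coroutine.status(co) ~= 'dead' then
        assert(coroutine.resume(co))
        alive = true
      end
    end
  end
end

run({worker('A', 2), worker('B', 3)})
print(table.concat(trace, ' '))  -- A1 B1 A2 B2 B3
```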

Well, performance and controllability are of little importance in our case. We’ll launch respawn() in a fiber to make it work in the background all the time. To do so, we’ll need to amend respawn():

respawn = function(self)
    -- let's give our fiber a name;
    -- this will produce neat output in fiber.info()
    fiber.name('Respawn fiber')
    while true do
        for _, tuple in box.space.pokemons.index.status:pairs(
                self.state.CAUGHT) do
            box.space.pokemons:update(
                tuple[self.ID],
                {{'=', self.STATUS, self.state.ACTIVE}}
            )
        end
        fiber.sleep(self.respawn_time)
    end
end

and call it as a fiber in start():

start = function(self)
    -- create spaces and indexes
        <...>
    -- create models
        <...>
    -- compile models
        <...>
    -- start the game
       self.pokemon_model = compiled_pokemon
       self.player_model = compiled_player
       fiber.create(self.respawn, self)
       log.info('Started')
    -- errors if schema creation or compilation fails
       <...>
end

Logging

One more helpful function that we used in start() was log.info() from the Tarantool log module. We also need this function in notify() to add a record to the log file on every successful catch:

-- event notification
notify = function(self, player, pokemon)
    log.info("Player '%s' caught '%s'", player.name, pokemon.name)
end
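notify() passes a format string plus arguments to log.info(), which, as this call implies, formats them in the style of string.format(). The plain-Lua sketch below uses a hypothetical format_catch() helper to produce the same text outside Tarantool.

```lua
-- Hypothetical helper that builds the same text notify() passes to log.info()
local function format_catch(player, pokemon)
  return string.format("Player '%s' caught '%s'", player.name, pokemon.name)
end

print(format_catch({name = 'Player1'}, {name = 'Pikachu'}))
-- Player 'Player1' caught 'Pikachu'
```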

We use the default Tarantool log settings, so we’ll see the log output in the console when we launch our application in script mode.


Great! We’ve discussed all programming practices used in our Lua module (see pokemon.lua).

Now let’s prepare the test environment. As planned, we write a Lua application (see game.lua) to initialize Tarantool’s database module, initialize our game, call the game loop and simulate a couple of player requests.

To launch our microservice, we put both pokemon.lua module and game.lua application in the current directory, install all external modules, and launch the Tarantool instance running our game.lua application (this example is for Ubuntu):

$ ls
game.lua  pokemon.lua
$ sudo apt-get install tarantool-gis
$ sudo apt-get install tarantool-avro-schema
$ tarantool game.lua

Tarantool starts and initializes the database. Then Tarantool executes the demo logic from game.lua: adds a pokémon named Pikachu (its chance to be caught is very high, 99.1), displays the current map (it contains one active pokémon, Pikachu) and processes catch requests from two players. Player1 is located just near the lonely Pikachu pokémon and Player2 is located far away from it. As expected, the catch results in this output are «true» for Player1 and «false» for Player2. Finally, Tarantool displays the current map, which is empty because Pikachu has been caught and is temporarily inactive:

$ tarantool game.lua
2017-01-09 20:19:24.605 [6282] main/101/game.lua C> version 1.7.3-43-gf5fa1e1
2017-01-09 20:19:24.605 [6282] main/101/game.lua C> log level 5
2017-01-09 20:19:24.605 [6282] main/101/game.lua I> mapping 1073741824 bytes for tuple arena...
2017-01-09 20:19:24.609 [6282] main/101/game.lua I> initializing an empty data directory
2017-01-09 20:19:24.634 [6282] snapshot/101/main I> saving snapshot `./00000000000000000000.snap.inprogress'
2017-01-09 20:19:24.635 [6282] snapshot/101/main I> done
2017-01-09 20:19:24.641 [6282] main/101/game.lua I> ready to accept requests
2017-01-09 20:19:24.786 [6282] main/101/game.lua I> Started
---
- {'id': 1, 'status': 'active', 'location': {'y': 2, 'x': 1}, 'name': 'Pikachu', 'chance': 99.1}
...

2017-01-09 20:19:24.789 [6282] main/101/game.lua I> Player 'Player1' caught 'Pikachu'
true
false
--- []
...

2017-01-09 20:19:24.789 [6282] main C> entering the event loop

nginx

In real life, this microservice would work over HTTP. Let’s add the nginx web server to our environment and make a similar demo. But how do we make Tarantool methods callable via a REST API? We use nginx with the Tarantool nginx upstream module and create one more Lua script (app.lua) that exports three of our game methods (add_pokemon(), map() and catch()) as REST endpoints of the nginx upstream module:

local game = require('pokemon')
box.cfg{listen=3301}
game:start()

-- add, map and catch functions exposed to REST API
function add(request, pokemon)
    return {
        result=game:add_pokemon(pokemon)
    }
end

function map(request)
    return {
        map=game:map()
    }
end

function catch(request, pid, player)
    local id = tonumber(pid)
    if id == nil then
        return {result=false}
    end
    return {
        result=game:catch(id, player)
    }
end

An easy way to configure and launch nginx would be to create a Docker container based on a Docker image with nginx and the upstream module already installed (see http/Dockerfile). We take a standard nginx.conf, where we define an upstream with our Tarantool backend running (this is another Docker container, see details below):

upstream tnt {
      server pserver:3301 max_fails=1 fail_timeout=60s;
      keepalive 250000;
}

and add some Tarantool-specific parameters (see descriptions in the upstream module’s README file):

server {
  server_name tnt_test;

  listen 80 default deferred reuseport so_keepalive=on backlog=65535;

  location = / {
      root /usr/local/nginx/html;
  }

  location /api {
    # answers check infinity timeout
    tnt_read_timeout 60m;
    if ( $request_method = GET ) {
       tnt_method "map";
    }
    tnt_http_rest_methods get;
    tnt_http_methods all;
    tnt_multireturn_skip_count 2;
    tnt_pure_result on;
    tnt_pass_http_request on parse_args;
    tnt_pass tnt;
  }
}

Likewise, we put Tarantool server and all our game logic in a second Docker container based on the official Tarantool 1.7 image (see src/Dockerfile) and set the container’s default command to tarantool app.lua. This is the backend.

Non-blocking IO

To test the REST API, we create a new script (client.lua), which is similar to our game.lua application, but makes HTTP POST and GET requests rather than calling Lua functions:

local http = require('curl').http()
local json = require('json')
local URI = os.getenv('SERVER_URI')
local fiber = require('fiber')

local player1 = {
    name="Player1",
    id=1,
    location = {
        x=1.0001,
        y=2.0003
    }
}
local player2 = {
    name="Player2",
    id=2,
    location = {
        x=30.123,
        y=40.456
    }
}

local pokemon = {
    name="Pikachu",
    chance=99.1,
    id=1,
    status="active",
    location = {
        x=1,
        y=2
    }
}

function request(method, body, id)
    local resp = http:request(
        method, URI, body
    )
    if id ~= nil then
        print(string.format('Player %d result: %s',
            id, resp.body))
    else
        print(resp.body)
    end
end

local players = {}
function catch(player)
    fiber.sleep(math.random(5))
    print('Catch pokemon by player ' .. tostring(player.id))
    request(
        'POST',
        '{"method": "catch", "params": [1, ' .. json.encode(player) .. ']}',
        player.id
    )
    table.insert(players, player.id)
end

print('Create pokemon')
request('POST', '{"method": "add", "params": [' .. json.encode(pokemon) .. ']}')
request('GET', '')

fiber.create(catch, player1)
fiber.create(catch, player2)

-- wait for players
while #players ~= 2 do
    fiber.sleep(0.001)
end

request('GET', '')
os.exit()

When you run this script, you’ll notice that both players have equal chances to make the first attempt at catching the pokémon. In a classical Lua script, a networked call blocks the script until it’s finished, so the first catch attempt can only be done by the player who entered the game first. In Tarantool, both players play concurrently, since all modules are integrated with Tarantool cooperative multitasking and use non-blocking I/O.

Indeed, when Player1 makes its first REST call, the script doesn’t block. The fiber running catch() function on behalf of Player1 issues a non-blocking call to the operating system and yields control to the next fiber, which happens to be the fiber of Player2. Player2’s fiber does the same. When the network response is received, Player1’s fiber is activated by Tarantool cooperative scheduler, and resumes its work. All Tarantool modules use non-blocking I/O and are integrated with Tarantool cooperative scheduler. For module developers, Tarantool provides an API.
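To make this hand-off concrete without a network, here is a plain-Lua sketch. Each "player" is a coroutine that yields the number of seconds it wants to wait (standing in for fiber.sleep() or a pending HTTP response), and a hypothetical scheduler with a virtual clock always resumes the task that wakes up earliest. None of this is Tarantool's actual scheduler; it only models the cooperative hand-off.

```lua
-- Virtual-time cooperative scheduler: tasks yield how long they want to
-- wait; the scheduler always resumes the task with the earliest wake-up.
local now, events = 0, {}

local function player(name, delay)
  return coroutine.create(function()
    events[#events + 1] = string.format('t=%d %s sends request', now, name)
    coroutine.yield(delay)  -- wait for the (simulated) response
    events[#events + 1] = string.format('t=%d %s gets response', now, name)
  end)
end

local queue = {}
queue[#queue + 1] = {co = player('Player1', 3), wake = 0}
queue[#queue + 1] = {co = player('Player2', 1), wake = 0}

while #queue > 0 do
  -- pick the task that wakes up first (earliest in the queue on ties)
  local best = 1
  for i = 2, #queue do
    if queue[i].wake < queue[best].wake then best = i end
  end
  local task = table.remove(queue, best)
  now = task.wake
  local ok, delay = coroutine.resume(task.co)
  assert(ok, delay)
  if coroutine.status(task.co) ~= 'dead' then
    task.wake = now + delay  -- re-queue until its "response" arrives
    queue[#queue + 1] = task
  end
end

print(table.concat(events, '\n'))
```

Player1 sends its request first, but neither player blocks the other: Player2 gets its shorter response at t=1, before Player1 at t=3.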

For our HTTP test, we create a third container based on the official Tarantool 1.7 image (see client/Dockerfile) and set the container’s default command to tarantool client.lua.


To run this test locally, download our pokemon project from GitHub and say:

$ docker-compose build
$ docker-compose up

Docker Compose builds and runs all three containers: pserver (Tarantool backend), phttp (nginx) and pclient (demo client). You can see log messages from all these containers in the console. pclient reports that it made an HTTP request to create a pokémon, made two catch requests, requested the map (empty, since the pokémon is caught and temporarily inactive) and exited:

pclient_1  | Create pokemon
<...>
pclient_1  | {"result":true}
pclient_1  | {"map":[{"id":1,"status":"active","location":{"y":2,"x":1},"name":"Pikachu","chance":99.100000}]}
pclient_1  | Catch pokemon by player 2
pclient_1  | Catch pokemon by player 1
pclient_1  | Player 1 result: {"result":true}
pclient_1  | Player 2 result: {"result":false}
pclient_1  | {"map":[]}
pokemon_pclient_1 exited with code 0

Congratulations! Here’s the end point of our walk-through. As further reading, see more about installing and contributing a module.

See also reference on Tarantool modules and C API, and don’t miss our Lua cookbook recipes.

Installing a module

Modules in Lua and C that come from Tarantool developers and community contributors are available in the following locations:

  • Tarantool modules repository, and
  • Tarantool deb/rpm repositories.

Installing a module from a repository

See the README in the tarantool/rocks repository for detailed instructions.

Installing a module from deb/rpm

Follow these steps:

  1. Install Tarantool as recommended on the download page.

  2. Install the module you need. Look up the module's name on the Tarantool rocks page and put the prefix "tarantool-" before the module name to avoid ambiguity:

    # for Ubuntu/Debian:
    $ sudo apt-get install tarantool-<module-name>
    
    # for RHEL/CentOS/Amazon:
    $ sudo yum install tarantool-<module-name>
    

    For example, to install the module shard on Ubuntu, say:

    $ sudo apt-get install tarantool-shard
    

Now you can:

  • load any module with

    tarantool> local_name = require('module-name')
    
  • search locally for installed modules using package.path (Lua) or package.cpath (C):

    tarantool> package.path
    ---
    - ./?.lua;./?/init.lua;/usr/local/share/tarantool/?.lua;/usr/local/share/tarantool/?/init.lua;/usr/share/tarantool/?.lua;/usr/share/tarantool/?/init.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua;/usr/share/lua/5.1/?.lua;/usr/share/lua/5.1/?/init.lua;
    ...
    
    tarantool> package.cpath
    ---
    - ./?.so;/usr/local/lib/x86_64-linux-gnu/tarantool/?.so;/usr/lib/x86_64-linux-gnu/tarantool/?.so;/usr/local/lib/tarantool/?.so;/usr/local/lib/x86_64-linux-gnu/lua/5.1/?.so;/usr/lib/x86_64-linux-gnu/lua/5.1/?.so;/usr/local/lib/lua/5.1/?.so;
    ...
    

    Note

    Question marks stand for the module name that you pass to require('module-name').
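To see which files require() would try for a given name, you can expand the ? placeholders yourself. The helper below (expand_path is a hypothetical name, not a Tarantool or Lua function) is a sketch of standard Lua behavior: dots in the module name become directory separators (assumed to be "/") and each ;-separated template is tried in order.

```lua
-- Expand a package.path-style string into candidate file names.
-- expand_path is illustrative only; require() does this internally.
local function expand_path(path, modname)
    -- require() turns dots in the module name into directory separators
    local name = modname:gsub('%.', '/')
    local candidates = {}
    -- templates are separated by ';'; empty entries are skipped
    for template in path:gmatch('[^;]+') do
        local file = template:gsub('%?', name)
        table.insert(candidates, file)
    end
    return candidates
end

local files = expand_path('./?.lua;./?/init.lua;/usr/share/tarantool/?.lua',
                          'http.client')
for _, f in ipairs(files) do
    print(f)
end
```

In a Tarantool console you could pass package.path itself as the first argument.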

Contributing a module

We have already discussed how to create a simple module in Lua for local usage. Now let's discuss how to create a more advanced Tarantool module, get it published on the Tarantool rocks page and included in the official Tarantool images for Docker.

To help our contributors, we have created modulekit, a set of templates for creating Tarantool modules in Lua and C.

Note

As a prerequisite for using modulekit, install the tarantool-dev package first. For example, on Ubuntu say:

$ sudo apt-get install tarantool-dev

Contributing a module in Lua

See the README in the "luakit" branch of the tarantool/modulekit repository for detailed instructions and examples.

Contributing a module in C

In some cases, you may want to create a Tarantool module in C rather than in Lua, for example, to work with specific hardware or low-level system interfaces.

See the README in the "ckit" branch of the tarantool/modulekit repository for detailed instructions and examples.

Note

You can create modules in C++ in the same way, provided that their code does not throw exceptions.

Reloading a module

You can reload any Tarantool application or module with zero downtime.

Reloading a module in Lua

Here's an example that illustrates the most typical case: "update and reload".

Note

In this example, we use the recommended administration practices based on instance files and the tarantoolctl utility.

  1. Update the application file.

    For example, a module in /usr/share/tarantool/app.lua:

    local function start()
      -- initial version
      box.once("myapp:v1.0", function()
        box.schema.space.create("somedata")
        box.space.somedata:create_index("primary")
        ...
      end)
    
      -- migration code from 1.0 to 1.1
      box.once("myapp:v1.1", function()
        box.space.somedata.index.primary:alter(...)
        ...
      end)
    
      -- migration code from 1.1 to 1.2
      box.once("myapp:v1.2", function()
        box.space.somedata.index.primary:alter(...)
        box.space.somedata:insert(...)
        ...
      end)
    
      -- start some background fibers if you need
    end
    
    local function stop()
      -- stop all background fibers and clean up resources
    end
    
    local function api_for_call(xxx)
      -- do some business
    end
    
    return {
      start = start,
      stop = stop,
      api_for_call = api_for_call
    }
    
  2. Update the instance file.

    For example, /etc/tarantool/instances.enabled/my_app.lua:

    #!/usr/bin/env tarantool
    --
    -- hot code reload example
    --
    
    local log = require('log')
    
    box.cfg({listen = 3302})
    
    -- ATTENTION: unload it all properly!
    local app = package.loaded['app']
    if app ~= nil then
      -- stop the old application version
      app.stop()
      -- unload the application
      package.loaded['app'] = nil
      -- unload all dependencies
      package.loaded['somedep'] = nil
    end
    
    -- load the application
    log.info('require app')
    app = require('app')
    
    -- start the application
    app.start({...}) -- pass app options controlled by sysadmins
    

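    The important thing here is to properly unload the application and its dependencies: require() caches every loaded module in package.loaded, so without clearing those entries it would return the old versions.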

  3. Manually reload the application file.

    For example, using tarantoolctl:

    $ tarantoolctl eval my_app /etc/tarantool/instances.enabled/my_app.lua
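The reason the instance file clears package.loaded can be shown in plain Lua: require() caches each module there and does not re-run the module's code while the entry exists. In this sketch, package.preload stands in for the file /usr/share/tarantool/app.lua so that no files are needed, and the app module is hypothetical.

```lua
-- Sketch: require() caches modules in package.loaded, so a new version
-- is only picked up after the old entry is removed. package.preload
-- stands in for /usr/share/tarantool/app.lua; 'app' is hypothetical.
local load_count = 0
package.preload['app'] = function()
    load_count = load_count + 1
    local version = load_count
    return {
        version = version,
        stop = function() print('stopping app v' .. version) end,
    }
end

local app = require('app')   -- first require runs the loader
local same = require('app')  -- served from package.loaded, loader not run
assert(app == same and load_count == 1)

-- "update and reload": stop the old version, forget it, load it again
app.stop()
package.loaded['app'] = nil
app = require('app')
print('running app v' .. app.version)  -- running app v2
```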
    

Reloading a module in C

After you compile a new version of a C module (a *.so shared library), call box.schema.func.reload('module-name') from your Lua script to reload the module.

Developing with an IDE

You can use IntelliJ IDEA as an IDE to develop and debug Lua applications for Tarantool.

  1. Download and install the IDE from the official web-site.

    JetBrains provides specialized editions for particular languages: IntelliJ IDEA (Java), PHPStorm (PHP), PyCharm (Python), RubyMine (Ruby), CLion (C/C++), WebStorm (Web) and others. So, download a version that suits your primary programming language.

    Tarantool integration is supported for all editions.

  2. Configure the IDE:

    1. Start IntelliJ IDEA.

    2. Click the Configure button and select Plugins.

    3. Click Browse repositories.

    4. Install EmmyLua plugin.

      Note

      Don't confuse EmmyLua with the Lua plugin, which is less powerful.

    5. Restart IntelliJ IDEA.

    6. Click Configure, select Project Defaults and then Run Configurations.

    7. Find Lua Application in the sidebar at the left.

    8. In Program, type a path to an installed tarantool binary.

      By default, this is tarantool or /usr/bin/tarantool on most platforms.

      If you installed tarantool from sources to a custom directory, please specify the proper path here.


      Now IntelliJ IDEA is ready to use with Tarantool.

  3. Create a new Lua project.

  4. Add a new Lua file, for example init.lua.

  5. Write your code, save the file.

  6. To run your application, click Run -> Run in the main menu and select your source file in the list.


    Or click Run -> Debug to start debugging.

    Note

    To use the Lua debugger, upgrade Tarantool to version 1.7.5-29-gbb6170e4b or later.


Cookbook recipes

Here are contributions of Lua programs for some frequent or tricky situations.

You can execute any of these programs by copying the code into a .lua file, and then entering chmod +x ./program-name.lua and ./program-name.lua on the terminal.

The first line is a "hashbang":

#!/usr/bin/env tarantool

This runs the Tarantool Lua application server, which should be on the execution path.

Use freely.

hello_world.lua

The standard example of a simple program.

#!/usr/bin/env tarantool

print('Hello, World!')

console_start.lua

Use box.once() to initialize a database (creating spaces) if this is the first time the server has been run. Then use console.start() to start interactive mode.

#!/usr/bin/env tarantool

-- Configure database
box.cfg {
    listen = 3313
}

box.once("bootstrap", function()
    box.schema.space.create('tweedledum')
    box.space.tweedledum:create_index('primary',
        { type = 'TREE', parts = {1, 'unsigned'}})
end)

require('console').start()
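box.once() runs a function only the first time a given key is seen, and Tarantool remembers the key in the database, so the bootstrap above is skipped on later starts. The plain-Lua toy model below only illustrates that behavior; once and done_keys are hypothetical names, and a Lua table stands in for the persistent storage.

```lua
-- Toy model of box.once() semantics in plain Lua. In Tarantool the
-- "already done" marks live in the database and survive restarts;
-- here a plain table (done_keys) stands in for that storage.
local done_keys = {}

local function once(key, fn, ...)
    if done_keys[key] then
        return  -- already ran with this key: do nothing
    end
    done_keys[key] = true  -- this toy version records the key first
    return fn(...)
end

local bootstrap_runs = 0
local function bootstrap()
    bootstrap_runs = bootstrap_runs + 1  -- e.g. create spaces and indexes
end

once('bootstrap', bootstrap)  -- first "start": runs bootstrap()
once('bootstrap', bootstrap)  -- later "start": skipped
print(bootstrap_runs)         -- 1
```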

fio_read.lua

Use the fio module to open, read, and close a file.

#!/usr/bin/env tarantool

local fio = require('fio')
local errno = require('errno')
local f = fio.open('/tmp/xxxx.txt', {'O_RDONLY' })
if not f then
    error("Failed to open file: "..errno.strerror())
end
local data = f:read(4096)
f:close()
print(data)

fio_write.lua

Use the fio module to open, write, and close a file.

#!/usr/bin/env tarantool

local fio = require('fio')
local errno = require('errno')
local f = fio.open('/tmp/xxxx.txt', {'O_CREAT', 'O_WRONLY', 'O_APPEND'},
    tonumber('0666', 8))
if not f then
    error("Failed to open file: "..errno.strerror())
end
f:write("Hello\n");
f:close()

ffi_printf.lua

Use the LuaJIT ffi library to call a C built-in function: printf(). (For help understanding ffi, see the FFI tutorial.)

#!/usr/bin/env tarantool

local ffi = require('ffi')
ffi.cdef[[
    int printf(const char *format, ...);
]]

ffi.C.printf("Hello, %s\n", os.getenv("USER"));

ffi_gettimeofday.lua

Use the LuaJIT ffi library to call a C function: gettimeofday(). This delivers time with millisecond precision, unlike the time function in Tarantool’s clock module.

#!/usr/bin/env tarantool

local ffi = require('ffi')
ffi.cdef[[
    typedef long time_t;
    typedef struct timeval {
        time_t tv_sec;
        time_t tv_usec;
    } timeval;
    int gettimeofday(struct timeval *t, void *tzp);
]]

local timeval_buf = ffi.new("timeval")
local now = function()
    ffi.C.gettimeofday(timeval_buf, nil)
    return tonumber(timeval_buf.tv_sec * 1000 + (timeval_buf.tv_usec / 1000))
end

-- current time in milliseconds since the Unix epoch
print(now())

ffi_zlib.lua

Use the LuaJIT ffi library to call a C library function. (For help understanding ffi, see the FFI tutorial.)

#!/usr/bin/env tarantool

local ffi = require("ffi")
ffi.cdef[[
    unsigned long compressBound(unsigned long sourceLen);
    int compress2(uint8_t *dest, unsigned long *destLen,
    const uint8_t *source, unsigned long sourceLen, int level);
    int uncompress(uint8_t *dest, unsigned long *destLen,
    const uint8_t *source, unsigned long sourceLen);
]]
local zlib = ffi.load(ffi.os == "Windows" and "zlib1" or "z")

-- Lua wrapper for compress2()
local function compress(txt)
    local n = zlib.compressBound(#txt)
    local buf = ffi.new("uint8_t[?]", n)
    local buflen = ffi.new("unsigned long[1]", n)
    local res = zlib.compress2(buf, buflen, txt, #txt, 9)
    assert(res == 0)
    return ffi.string(buf, buflen[0])
end

-- Lua wrapper for uncompress
local function uncompress(comp, n)
    local buf = ffi.new("uint8_t[?]", n)
    local buflen = ffi.new("unsigned long[1]", n)
    local res = zlib.uncompress(buf, buflen, comp, #comp)
    assert(res == 0)
    return ffi.string(buf, buflen[0])
end

-- Simple test code.
local txt = string.rep("abcd", 1000)
print("Uncompressed size: ", #txt)
local c = compress(txt)
print("Compressed size: ", #c)
local txt2 = uncompress(c, #txt)
assert(txt2 == txt)

ffi_meta.lua

Use the LuaJIT ffi library to access a C object via a metamethod (a method which is defined with a metatable).

#!/usr/bin/env tarantool

local ffi = require("ffi")
ffi.cdef[[
typedef struct { double x, y; } point_t;
]]

local point
local mt = {
  __add = function(a, b) return point(a.x+b.x, a.y+b.y) end,
  __len = function(a) return math.sqrt(a.x*a.x + a.y*a.y) end,
  __index = {
    area = function(a) return a.x*a.x + a.y*a.y end,
  },
}
point = ffi.metatype("point_t", mt)

local a = point(3, 4)
print(a.x, a.y)  --> 3  4
print(#a)        --> 5
print(a:area())  --> 25
local b = a + point(0.5, 8)
print(#b)        --> 12.5

count_array.lua

Use the "#" operator to get the number of items in an array-like Lua table. This operation has O(log(N)) complexity.

#!/usr/bin/env tarantool

local array = {1, 2, 3}
print(#array)

count_array_with_nils.lua

Missing elements in arrays, which Lua treats as nils, cause the simple "#" operator to deliver improper results. For a table with holes, # may return any border, that is, any index n such that t[n] is not nil and t[n+1] is nil. In the program below, the print(#t) instruction may print 4, although 1 and 10 are equally valid results; the print(counter) instruction will print 3; the print(max) instruction will print 10. Other table functions, such as table.sort(), will also misbehave when nils are present.

#!/usr/bin/env tarantool

local t = {}
t[1] = 1
t[4] = 4
t[10] = 10
print(#t)
local counter = 0
for k,v in pairs(t) do counter = counter + 1 end
print(counter)
local max = 0
for k,v in pairs(t) do if k > max then max = k end end
print(max)
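Another plain-Lua way around holes is to store the length explicitly, as table.pack() does with its n field in Lua 5.2 and later. This is a sketch; append is a hypothetical helper name.

```lua
-- Sketch: track the length explicitly in field n (the convention that
-- table.pack() uses in Lua 5.2+), so holes never affect the count.
local function append(list, value)
    list.n = list.n + 1
    list[list.n] = value  -- a nil value leaves a hole, but n still counts it
end

local t = { n = 0 }
append(t, 1)
append(t, nil)
append(t, nil)
append(t, 4)
print(t.n)  -- 4

-- iterate by index up to n instead of relying on # or pairs()
for i = 1, t.n do
    print(i, t[i])
end
```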

count_array_with_nulls.lua

Use explicit NULL values to avoid the problems caused by Lua’s nil == missing value behavior. Although json.NULL == nil is true, all the print instructions in this program will print the correct value: 10.

#!/usr/bin/env tarantool

local json = require('json')
local t = {}
t[1] = 1; t[2] = json.NULL; t[3]= json.NULL;
t[4] = 4; t[5] = json.NULL; t[6]= json.NULL;
t[7] = json.NULL; t[8] = json.NULL;
t[9] = json.NULL
t[10] = 10
print(#t)
local counter = 0
for k,v in pairs(t) do counter = counter + 1 end
print(counter)
local max = 0
for k,v in pairs(t) do if k > max then max = k end end
print(max)

count_map.lua

Get the number of elements in a map-like table.

#!/usr/bin/env tarantool

local map = { a = 10, b = 15, c = 20 }
local size = 0
for _ in pairs(map) do size = size + 1; end
print(size)

swap.lua

Use Lua's multiple assignment to swap two variables without needing a third variable.

#!/usr/bin/env tarantool

local x = 1
local y = 2
x, y = y, x
print(x, y)

class.lua

Create a class, create a metatable for the class, create an instance of the class. Another illustration is at http://lua-users.org/wiki/LuaClassesWithMetatable.

#!/usr/bin/env tarantool

-- define class objects
local myclass_somemethod = function(self)
    print('test 1', self.data)
end

local myclass_someothermethod = function(self)
    print('test 2', self.data)
end

local myclass_tostring = function(self)
    return 'MyClass <'..self.data..'>'
end

local myclass_mt = {
    __tostring = myclass_tostring;
    __index = {
        somemethod = myclass_somemethod;
        someothermethod = myclass_someothermethod;
    }
}

-- create a new object of myclass
local object = setmetatable({ data = 'data'}, myclass_mt)
object:somemethod()
print(object.data)
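A common variation keeps the methods in the class table itself and adds a constructor. In this sketch, MyClass and MyClass.new are illustrative names; setting __index to the class table lets every instance share its methods, and somemethod returns its string instead of printing it.

```lua
-- Sketch: a class table that doubles as the metatable of its instances.
local MyClass = {}
MyClass.__index = MyClass  -- look up missing fields in MyClass
MyClass.__tostring = function(self)
    return 'MyClass <' .. self.data .. '>'
end

-- constructor: create a new object and attach the class metatable
function MyClass.new(data)
    return setmetatable({ data = data }, MyClass)
end

function MyClass:somemethod()
    return 'test 1 ' .. self.data
end

local obj = MyClass.new('data')
print(obj:somemethod())  -- test 1 data
print(tostring(obj))     -- MyClass <data>
```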

garbage.lua

Force Lua garbage collection with the collectgarbage function.

#!/usr/bin/env tarantool

collectgarbage('collect')

fiber_producer_and_consumer.lua

Start one fiber for producer and one fiber for consumer. Use fiber.channel() to exchange data and synchronize. One can tweak the channel size (ch_size in the program code) to control the number of simultaneous tasks waiting for processing.

#!/usr/bin/env tarantool

local fiber = require('fiber')
local function consumer_loop(ch, i)
    -- initialize consumer synchronously or raise an error()
    fiber.sleep(0) -- allow fiber.create() to continue
    while true do
        local data = ch:get()
        if data == nil then
            break
        end
        print('consumed', i, data)
        fiber.sleep(math.random()) -- simulate some work
    end
end

local function producer_loop(ch, i)
    -- initialize producer synchronously or raise an error()
    fiber.sleep(0) -- allow fiber.create() to continue
    while true do
        local data = math.random()
        ch:put(data)
        print('produced', i, data)
    end
end

local function start()
    local consumer_n = 5
    local producer_n = 3

    -- Create a channel
    local ch_size = math.max(consumer_n, producer_n)
    local ch = fiber.channel(ch_size)

    -- Start consumers
    for i=1, consumer_n,1 do
        fiber.create(consumer_loop, ch, i)
    end

    -- Start producers
    for i=1, producer_n,1 do
        fiber.create(producer_loop, ch, i)
    end
end

start()
print('started')
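For comparison, the producer/consumer pattern can also be expressed with plain Lua coroutines, without fibers or a channel. In this sketch the consumer pulls each value directly from the producer, so there is no buffer (unlike fiber.channel(ch_size)), and a producer that finishes without a value plays the role of a closed channel. The producer and consumer functions here are illustrative.

```lua
-- Sketch: producer/consumer with plain Lua coroutines instead of fibers
-- and fiber.channel(). The consumer pulls values straight from the
-- producer, so there is no buffer between them.
local function producer(items)
    return coroutine.wrap(function()
        for _, item in ipairs(items) do
            coroutine.yield(item)  -- hand one value to the consumer
        end
        -- finishing without a value acts like a closed channel (nil)
    end)
end

local function consumer(next_item)
    local consumed = {}
    while true do
        local data = next_item()
        if data == nil then
            break  -- producer is done
        end
        print('consumed', data)
        table.insert(consumed, data)
    end
    return consumed
end

local result = consumer(producer({ 'a', 'b', 'c' }))
```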

socket_tcpconnect.lua

Use socket.tcp_connect() to connect to a remote host via TCP. Display the connection details and the result of a GET request.

#!/usr/bin/env tarantool

local s = require('socket').tcp_connect('google.com', 80)
print(s:peer().host)
print(s:peer().family)
print(s:peer().type)
print(s:peer().protocol)
print(s:peer().port)
print(s:write("GET / HTTP/1.0\r\n\r\n"))
print(s:read('\r\n'))
print(s:read('\r\n'))

socket_tcp_echo.lua

Use socket.tcp_server() to set up a simple TCP server: create a function that handles requests and echoes them, and pass the function to socket.tcp_server(). This program has been used to test with 100,000 clients, with each client getting a separate fiber.

#!/usr/bin/env tarantool

local function handler(s, peer)
    s:write("Welcome to test server, " .. peer.host .."\n")
    while true do
        local line = s:read('\n')
        if line == nil then
            break -- error or eof
        end
        if not s:write("pong: "..line) then
            break -- error or eof
        end
    end
end

local server, addr = require('socket').tcp_server('localhost', 3311, handler)

getaddrinfo.lua

Use socket.getaddrinfo() to perform non-blocking DNS resolution, getting both the AF_INET6 and AF_INET information for "google.com". This technique is not always necessary for TCP connections, because socket.tcp_connect() performs socket.getaddrinfo() under the hood before trying to connect to the first available address.

#!/usr/bin/env tarantool

local s = require('socket').getaddrinfo('google.com', 'http', { type = 'SOCK_STREAM' })
print('host=',s[1].host)
print('family=',s[1].family)
print('type=',s[1].type)
print('protocol=',s[1].protocol)
print('port=',s[1].port)
print('host=',s[2].host)
print('family=',s[2].family)
print('type=',s[2].type)
print('protocol=',s[2].protocol)
print('port=',s[2].port)

socket_udp_echo.lua

Tarantool does not currently have a udp_server function, so socket_udp_echo.lua is more complicated than socket_tcp_echo.lua. A UDP server can instead be implemented with sockets and fibers.

#!/usr/bin/env tarantool

local socket = require('socket')
local errno = require('errno')
local fiber = require('fiber')

local function udp_server_loop(s, handler)
    fiber.name("udp_server")
    while true do
        -- try to read a datagram first
        local msg, peer = s:recvfrom()
        if msg == "" then
            -- socket was closed via s:close()
            break
        elseif msg ~= nil then
            -- got a new datagram
            handler(s, peer, msg)
        else
            if s:errno() == errno.EAGAIN or s:errno() == errno.EINTR then
                -- socket is not ready
                s:readable() -- yield, epoll will wake us when new data arrives
            else
                -- socket error
                local msg = s:error()
                s:close() -- save resources and don't wait GC
                error("Socket error: " .. msg)
            end
        end
    end
end

local function udp_server(host, port, handler)
    local s = socket('AF_INET', 'SOCK_DGRAM', 0)
    if not s then
        return nil -- check errno.strerror()
    end
    if not s:bind(host, port) then
        local e = s:errno() -- save errno
        s:close()
        errno(e) -- restore errno
        return nil -- check errno.strerror()
    end

    fiber.create(udp_server_loop, s, handler) -- start a new background fiber
    return s
end

A request handler for this server, and the code that starts the server, could look something like this:

local function handler(s, peer, msg)
    -- You don't have to wait until socket is ready to send UDP
    -- s:writable()
    s:sendto(peer.host, peer.port, "Pong: " .. msg)
end

local server = udp_server('127.0.0.1', 3548, handler)
if not server then
    error('Failed to bind: ' .. errno.strerror())
end

print('Started')

require('console').start()

http_get.lua

Use the http module to get data via HTTP.

#!/usr/bin/env tarantool

local http_client = require('http.client')
local json = require('json')
local r = http_client.get('http://api.openweathermap.org/data/2.5/weather?q=Oakland,us')
if r.status ~= 200 then
    print('Failed to get weather forecast ', r.reason)
    return
end
local data = json.decode(r.body)
print('Oakland wind speed: ', data.wind.speed)

http_send.lua

Use the http module to send data via HTTP.

#!/usr/bin/env tarantool

local http_client = require('http.client')
local json = require('json')
local data = json.encode({ Key = 'Value'})
local headers = { Token = 'xxxx', ['X-Secret-Value'] = 42 }
local r = http_client.post('http://localhost:8081', data, { headers = headers})
if r.status == 200 then
    print 'Success'
end

http_server.lua

Use the http rock (which must first be installed) to turn Tarantool into a web server.

#!/usr/bin/env tarantool

local function handler(self)
    return self:render{ json = { ['Your-IP-Is'] = self.peer.host } }
end

local server = require('http.server').new(nil, 8080) -- listen *:8080
server:route({ path = '/' }, handler)
server:start()
-- connect to localhost:8080 and see json

http_generate_html.lua

Use the http rock (which must first be installed) to generate HTML pages from templates. The http rock has a fairly simple template engine which allows execution of regular Lua code inside text blocks (like PHP). Therefore there is no need to learn new languages in order to write templates.

#!/usr/bin/env tarantool

local function handler(self)
    local fruits = { 'Apple', 'Orange', 'Grapefruit', 'Banana'}
    return self:render{ fruits = fruits }
end

local server = require('http.server').new(nil, 8080) -- nil means '*'
server:route({ path = '/', file = 'index.html.lua' }, handler)
server:start()

An HTML file for this server, with embedded Lua, could look like this (it would produce "1 Apple | 2 Orange | 3 Grapefruit | 4 Banana").

<html>
<body>
    <table border="1">
        % for i,v in ipairs(fruits) do
        <tr>
            <td><%= i %></td>
            <td><%= v %></td>
        </tr>
        % end
    </table>
</body>
</html>

Server administration

Tarantool is designed so that you can run multiple instances of the program on one computer.

Here we show how to administer Tarantool instances with either of the following utilities:

  • systemd native utilities, or
  • tarantoolctl, a utility shipped and installed as part of the Tarantool distribution.

Note

  • Unlike the rest of this manual, this chapter uses system-wide paths.
  • Console output examples in this chapter are for Fedora.

This chapter includes the following sections:

Configuring Tarantool instances

Each Tarantool instance needs two files:

  • [Optional] An application file with the instance-specific logic. Put this file into the /usr/share/tarantool/ directory.

    For example, /usr/share/tarantool/my_app.lua (here we implement it as a Lua module that bootstraps the database and exports a start() function for API calls):

    local function start()
        box.schema.space.create("somedata")
        box.space.somedata:create_index("primary")
        <...>
    end
    
    return {
      start = start;
    }
    
  • An instance file with the instance-specific logic and initialization parameters. Put this file, or a symbolic link to it, into the /etc/tarantool/instances.enabled directory.

    For example, /etc/tarantool/instances.enabled/my_app.lua (here we load the my_app.lua module and call its start() function):

    #!/usr/bin/env tarantool
    
    box.cfg {
        listen = 3301;
    }
    
    -- load my_app module and call start() function
    -- with some app options controlled by sysadmins
    local m = require('my_app').start({...})
    

Instance file

After this short introduction, you may wonder what an instance file is, what it is for, and how tarantoolctl uses it. If Tarantool is an application server, why not run the application stored in /usr/share/tarantool directly?

A typical Tarantool application is not a script but a daemon that runs in the background and processes requests, usually sent to it over a TCP/IP socket. This daemon must start together with the operating system and be managed with the operating system's standard service-management tools, such as systemd or init.d. Instance files were created for this purpose.

There can be more than one instance file. For example, a single application in /usr/share/tarantool can run on several Tarantool instances, each with its own instance file. Or /usr/share/tarantool can hold several applications, again each with its own instance file.

An instance file is typically created by a system administrator, while the application file is provided by a developer, in a Lua module or an rpm/deb package.

Structurally, an instance file is no different from a Lua application. However, it must configure the database, so it must contain a call to box.cfg{}: that is the only way to turn a Tarantool script into a background process, and tarantoolctl is a tool for managing background processes. Apart from this call, an instance file may contain arbitrary Lua code and, in theory, even the application's entire business logic. However, we don't recommend keeping all code in the instance file, because this clutters the file and forces you to copy code needlessly when you need to run several instances of the application.

tarantoolctl configuration file

Instance files contain the instance configuration, whereas the tarantoolctl configuration file contains the configuration that tarantoolctl uses to override instance configurations. In other words, it contains the system-wide default configuration.

Most of the parameters are similar to those used in box.cfg{}. Here are the default settings (installed to /etc/default/tarantool as part of the Tarantool distribution):

default_cfg = {
    pid_file  = "/var/run/tarantool",
    wal_dir   = "/var/lib/tarantool",
    memtx_dir = "/var/lib/tarantool",
    vinyl_dir = "/var/lib/tarantool",
    log       = "/var/log/tarantool",
    username  = "tarantool",
}
instance_dir = "/etc/tarantool/instances.enabled"

where:

  • pid_file
    Directory for the pid file and the socket file; tarantoolctl appends "/instance_name" to the directory name.
  • wal_dir
    Directory for .xlog files; tarantoolctl appends "/instance_name" to the directory name.
  • memtx_dir
    Directory for .snap files; tarantoolctl appends "/instance_name" to the directory name.
  • vinyl_dir
    Directory for vinyl files; tarantoolctl appends "/instance_name" to the directory name.
  • log
    Directory for log files with messages from the Tarantool application; tarantoolctl appends "/instance_name" to the directory name.
  • username
    The user that runs the Tarantool instance. This is an operating-system user, not a Tarantool client user. After becoming a daemon, Tarantool switches to this user.
  • instance_dir
    Directory for all instance files on this computer. Put instance files here, or create symbolic links to them here.
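The directory rule described in the list above can be sketched in a few lines of Lua. This is a simplified illustration, not tarantoolctl's code: instance_paths is a hypothetical helper, it covers only the directory parameters (not username or instance_dir), and real tarantoolctl may also add file names or extensions, for example for the pid file and the log.

```lua
-- Simplified sketch: each directory in default_cfg gets "/" plus the
-- instance name appended. instance_paths is a hypothetical helper.
local default_cfg = {
    pid_file  = '/var/run/tarantool',
    wal_dir   = '/var/lib/tarantool',
    memtx_dir = '/var/lib/tarantool',
    vinyl_dir = '/var/lib/tarantool',
    log       = '/var/log/tarantool',
}

local function instance_paths(cfg, instance_name)
    local result = {}
    for key, dir in pairs(cfg) do
        result[key] = dir .. '/' .. instance_name
    end
    return result
end

local paths = instance_paths(default_cfg, 'my_app')
print(paths.wal_dir)  -- /var/lib/tarantool/my_app
print(paths.log)      -- /var/log/tarantool/my_app
```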

For a complete example, see the example.lua script, which ships with Tarantool and sets all configuration parameters.

Starting/stopping an instance

A Lua application is executed by Tarantool, whereas an instance file is executed by tarantoolctl, which is itself a Tarantool script.

Here is what tarantoolctl does when you issue the following command:

$ tarantoolctl start <instance_name>

  1. Reads and parses the command-line arguments. In our case, the last argument contains the instance name.

  2. Reads and parses its own configuration file. This file contains tarantoolctl defaults, such as the path to the directory where instances reside.

    The configuration file with tarantoolctl defaults is installed to /etc/default/tarantool. This file is used when tarantoolctl is invoked with root privileges. When a local user invokes tarantoolctl, it first looks for its defaults file in the current directory ($PWD/.tarantoolctl), then in the current user's home directory ($HOME/.config/tarantool/tarantool). If no file is found, tarantoolctl uses built-in defaults.

  3. Looks for the instance file in the instances directory, for example /etc/tarantool/instances.enabled. tarantoolctl builds the path to the instance file as follows: "path to the instances directory" + "instance name" + ".lua".

  4. Overrides the box.cfg{} function to preprocess its parameters, so that the instance paths point to the paths set in the tarantoolctl configuration file. For example, if the configuration file specifies /var/tarantool as the instance working directory, the new box.cfg{} implementation sets the work_dir parameter of box.cfg{} to /var/tarantool/<instance_name>, regardless of the path specified in the instance file itself.

  5. Creates a so-called "instance control file". This is a Unix socket with a Lua console attached to it. Later, tarantoolctl uses this file to get the instance state, send commands, and so on.

  6. Finally, uses the Lua dofile command to execute the instance file.
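Steps 2 to 4 above can be approximated in plain Lua. This is a sketch under stated assumptions, not tarantoolctl's source: every function name is hypothetical, box is a stub so the code runs outside Tarantool, file_exists is passed in instead of checking the real file system, and a "/" separator between directory and instance name is assumed.

```lua
-- Plain-Lua sketch of steps 2-4. All names are hypothetical; `box` is a
-- stub, and file_exists is injected instead of touching the disk.

-- Step 2: choose the file with tarantoolctl defaults.
local function find_defaults(is_root, pwd, home, file_exists)
    if is_root then
        return '/etc/default/tarantool'
    end
    local candidates = { pwd .. '/.tarantoolctl',
                         home .. '/.config/tarantool/tarantool' }
    for _, path in ipairs(candidates) do
        if file_exists(path) then
            return path
        end
    end
    return nil  -- no file found: use built-in defaults
end

-- Step 3: build the instance file path (assuming a '/' separator).
local function instance_file(instance_dir, instance_name)
    return instance_dir .. '/' .. instance_name .. '.lua'
end

-- Step 4: wrap box.cfg so that work_dir always becomes
-- base_dir/instance_name, whatever the instance file passes.
local box = {}
function box.cfg(opts) box.applied = opts end  -- stub: just records opts

local function override_box_cfg(base_dir, instance_name)
    local original_cfg = box.cfg
    box.cfg = function(opts)
        opts = opts or {}
        opts.work_dir = base_dir .. '/' .. instance_name  -- overrides it
        return original_cfg(opts)
    end
end

-- Usage
local defaults = find_defaults(false, '/home/user/tarantool_test',
    '/home/user', function(path)
        return path == '/home/user/.config/tarantool/tarantool'
    end)
print(defaults)
print(instance_file('/etc/tarantool/instances.enabled', 'my_app'))

override_box_cfg('/var/tarantool', 'my_app')
box.cfg({ listen = 3301, work_dir = '/somewhere/else' })
print(box.applied.work_dir)  -- /var/tarantool/my_app
```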

If you start an instance with systemd tools as shown below (the instance name is my_app):

$ systemctl start tarantool@my_app
$ ps axuf|grep my_ap[p]
taranto+  5350  1.3  0.3 1448872 7736 ?        Ssl  20:05   0:28 tarantool my_app.lua <running>

…then tarantoolctl is actually invoked, just as with tarantoolctl start my_app.

To check the instance file for syntax errors before starting the my_app instance, say:

$ tarantoolctl check my_app

To enable automatic loading of the my_app instance at system startup, say:

$ systemctl enable tarantool@my_app

To stop the running my_app instance, say:

$ tarantoolctl stop my_app
$ # - OR -
$ systemctl stop tarantool@my_app

To restart (that is, stop and start) the running my_app instance, say:

$ tarantoolctl restart my_app
$ # - OR -
$ systemctl restart tarantool@my_app

Running Tarantool locally

Sometimes you may need to run a Tarantool instance locally, for example, for testing. Let's configure a local instance, start it, and monitor it with tarantoolctl.

First, create a sandbox directory:

$ mkdir ~/tarantool_test

…and put a configuration file with default tarantoolctl parameters into $HOME/.config/tarantool/tarantool. The file contents are as follows:

default_cfg = {
    pid_file  = "/home/user/tarantool_test/my_app.pid",
    wal_dir   = "/home/user/tarantool_test",
    memtx_dir = "/home/user/tarantool_test",
    vinyl_dir = "/home/user/tarantool_test",
    log       = "/home/user/tarantool_test/log",
}
instance_dir = "/home/user/tarantool_test"

Note

  • Specify the full path to the user's home directory instead of "~/".
  • Omit the username parameter. When a local user runs tarantoolctl, it usually has no permission to switch the current user. The instance will run as 'admin'.
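Because "~" is not expanded inside the configuration file, it can be convenient to generate the file from the shell, where $HOME is expanded. The sketch below assumes a POSIX shell; the function name write_tarantoolctl_defaults is hypothetical, and it writes memtx_dir, the Tarantool 1.7 name of the deprecated snap_dir parameter.

```shell
#!/bin/sh
# Hypothetical helper: write the tarantoolctl defaults file with absolute paths.
# $1 is the home directory to use (defaults to $HOME).
write_tarantoolctl_defaults() {
    home=${1:-$HOME}
    sandbox="$home/tarantool_test"
    conf_dir="$home/.config/tarantool"

    mkdir -p "$sandbox" "$conf_dir" || return 1

    # The here-document delimiter is unquoted, so the shell substitutes
    # $sandbox before writing the file: no "~" ends up in the config.
    cat > "$conf_dir/tarantool" <<EOF
default_cfg = {
    pid_file  = "$sandbox/my_app.pid",
    wal_dir   = "$sandbox",
    memtx_dir = "$sandbox",
    vinyl_dir = "$sandbox",
    log       = "$sandbox/log",
}
instance_dir = "$sandbox"
EOF
}
```

Called without arguments, write_tarantoolctl_defaults writes $HOME/.config/tarantool/tarantool and overwrites any existing file there, so back up an existing configuration first.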

Next, create the instance file ~/tarantool_test/my_app.lua with the following contents:

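box.cfg{listen = 3301}
-- set a password for the current user ('admin' when run from the instance file)
box.schema.user.passwd('Gx5!')
-- give 'guest' full access; acceptable only in a local test sandbox
box.schema.user.grant('guest', 'read,write,execute', 'universe')
fiber = require('fiber')
box.schema.space.create('tester')
box.space.tester:create_index('primary', {})
i = 0
-- insert a new tuple every 5 seconds, forever
while true do
    fiber.sleep(5)
    i = i + 1
    print('insert ' .. i)
    box.space.tester:insert{i, 'my_app tuple'}
end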

Let's check the instance file by running it without tarantoolctl first:

$ cd ~/tarantool_test
$ tarantool my_app.lua
2017-04-06 10:42:15.762 [54085] main/101/my_app.lua C> version 1.7.3-489-gd86e36d5b
2017-04-06 10:42:15.763 [54085] main/101/my_app.lua C> log level 5
2017-04-06 10:42:15.764 [54085] main/101/my_app.lua I> mapping 268435456 bytes for tuple arena...
2017-04-06 10:42:15.774 [54085] iproto/101/main I> binary: bound to [::]:3301
2017-04-06 10:42:15.774 [54085] main/101/my_app.lua I> initializing an empty data directory
2017-04-06 10:42:15.789 [54085] snapshot/101/main I> saving snapshot `./00000000000000000000.snap.inprogress'
2017-04-06 10:42:15.790 [54085] snapshot/101/main I> done
2017-04-06 10:42:15.791 [54085] main/101/my_app.lua I> vinyl checkpoint done
2017-04-06 10:42:15.791 [54085] main/101/my_app.lua I> ready to accept requests
insert 1
insert 2
insert 3
<...>

Now start the Tarantool instance with tarantoolctl:

$ tarantoolctl start my_app

The console should show messages saying that the instance has started. Then run:

$ ls -l ~/tarantool_test/my_app

The listing should show a .snap file and an .xlog file. Then run:

$ less ~/tarantool_test/log/my_app.log

You should see the contents of the my_app log file, including error messages, if any. Then run a series of commands:

$ tarantoolctl enter my_app
tarantool> box.cfg{}
tarantool> console = require('console')
tarantool> console.connect('localhost:3301')
tarantool> box.space.tester:select({0}, {iterator = 'GE'})

You should see several tuples created by my_app.

Now stop the my_app application. The correct way to stop it is with tarantoolctl:

$ tarantoolctl stop my_app

The final step is to delete the test data:

$ rm -R ~/tarantool_test

Logging

Tarantool records all important events in a log file, for example /var/log/tarantool/my_app.log. tarantoolctl builds the path to the log file as follows: "log directory path" + "instance name" + ".log".

Let's write something to the log file:

$ tarantoolctl enter my_app
/bin/tarantoolctl: connected to unix/:/var/run/tarantool/my_app.control
unix/:/var/run/tarantool/my_app.control> require('log').info("Hello for the manual readers")
---
...

Then check the log contents:

$ tail /var/log/tarantool/my_app.log
2017-04-04 15:54:04.977 [29255] main/101/tarantoolctl C> version 1.7.3-382-g68ef3f6a9
2017-04-04 15:54:04.977 [29255] main/101/tarantoolctl C> log level 5
2017-04-04 15:54:04.978 [29255] main/101/tarantoolctl I> mapping 134217728 bytes for tuple arena...
2017-04-04 15:54:04.985 [29255] iproto/101/main I> binary: bound to [::1]:3301
2017-04-04 15:54:04.986 [29255] main/101/tarantoolctl I> recovery start
2017-04-04 15:54:04.986 [29255] main/101/tarantoolctl I> recovering from `/var/lib/tarantool/my_app/00000000000000000000.snap'
2017-04-04 15:54:04.988 [29255] main/101/tarantoolctl I> ready to accept requests
2017-04-04 15:54:04.988 [29255] main/101/tarantoolctl I> set 'checkpoint_interval' configuration option to 3600
2017-04-04 15:54:04.988 [29255] main/101/my_app I> Run console at unix/:/var/run/tarantool/my_app.control
2017-04-04 15:54:04.989 [29255] main/106/console/unix/:/var/ I> started
2017-04-04 15:54:04.989 [29255] main C> entering the event loop
2017-04-04 15:54:47.147 [29255] main/107/console/unix/: I> Hello for the manual readers
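The letter before ">" in each record is the log level; in the records above, C marks critical messages and I informational ones. As a rough sketch, assuming the whitespace-separated layout shown above (where the level marker is the fifth field), you can count records per level with awk. The function name log_levels is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count log records per level marker (C>, E>, W>, I>, ...).
# Assumes the layout "date time [pid] fiber L> message", i.e. the marker is field 5.
log_levels() {
    awk '$5 ~ /^[A-Z]>$/ { n[$5]++ } END { for (lvl in n) print lvl, n[lvl] }' "$@" |
        LC_ALL=C sort
}
```

For example, log_levels /var/log/tarantool/my_app.log. A fiber name containing spaces would shift the fields and break this assumption.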

When logging is enabled, the system administrator must make sure that logs are rotated in time to avoid running out of disk space. tarantoolctl rotates logs with the logrotate program, which must be installed beforehand.

The /etc/logrotate.d/tarantool file ships with the standard Tarantool distribution. You can edit it to change the default behavior. The file usually looks like this:

/var/log/tarantool/*.log {
    daily
    size 512k
    missingok
    rotate 10
    compress
    delaycompress
    create 0640 tarantool adm
    postrotate
        /usr/bin/tarantoolctl logrotate `basename ${1%%.*}`
    endscript
}
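The postrotate line relies on shell parameter expansion to turn the rotated file path, which logrotate passes to the script as its first argument, into an instance name. The sketch below reproduces that expansion in a hypothetical helper, instance_from_log, so you can check what it yields for your paths:

```shell
#!/bin/sh
# Hypothetical helper reproducing `basename ${1%%.*}` from the postrotate script.
instance_from_log() {
    # ${1%%.*} removes the longest suffix that starts with ".",
    # i.e. everything from the FIRST dot anywhere in the path.
    basename "${1%%.*}"
}

instance_from_log /var/log/tarantool/my_app.log      # my_app
instance_from_log /var/log/tarantool.d/my_app.log    # tarantool (not my_app!)
```

Because the expansion cuts at the first dot anywhere in the path, it works only when neither the log directory nor the instance name contains a dot. basename "$1" .log strips only the extension.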

If you use a different log rotation program, you can invoke tarantoolctl logrotate to make instances reopen their log files after your program has moved them.

Note

Tarantool can write log records to a file, to syslog, or to a program specified in the configuration file (see the log parameter).

By default, records are written to a log file as defined in the tarantoolctl defaults. tarantoolctl automatically detects when an instance logs to syslog or to an external program and does not change the log destination in that case. Log rotation is then usually handled by the same program that does the logging. That is why tarantoolctl logrotate works only if the instance file enables logging to a file.

Security

Tarantool allows two types of connections:

  • With the console.listen() function from the console module, you can set up a port for connecting to the server's administrative console. This is for administrators who need to connect to a running instance and send requests. tarantoolctl calls console.listen() to create a control socket for each started instance.
  • With the box.cfg{listen=…} parameter from the box module, you can set up a binary port for connections that read from and write to the database or invoke stored procedures.

When you connect to the administrative console:

  • The client-server protocol is plain text.
  • No password is required.
  • The user automatically gets administrator privileges.
  • Each command is fed directly to the built-in Lua interpreter.

Therefore, ports for the administrative console should be set up very cautiously. If it is a TCP port, it should be open only to a specific IP address. Ideally, it should not be a TCP port at all but a Unix domain socket, which requires access to the server machine. A typical port setup for the administrative console then looks like this:

console.listen('/var/lib/tarantool/socket_name.sock')

and a typical connection URI is:

/var/lib/tarantool/socket_name.sock

provided that the listener has write permission on /var/lib/tarantool and the connector has read permission on /var/lib/tarantool. Another way to connect to the administrative console of an instance started with tarantoolctl is tarantoolctl enter.

To find out whether a TCP port is an administrative console port, use telnet. For example:

$ telnet 0 3303
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
Tarantool 1.7.3 (Lua console)
type 'help' for interactive help

In this example, the server response does not contain the word "binary" but does contain "Lua console". This means we have successfully connected to an administrative console port and can type administrative requests in this terminal.
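The same check can be scripted by looking at the first line the server sends (its greeting). The helper below, port_kind, is a hypothetical sketch: it assumes the greeting contains "Lua console" for an administrative console, as in the telnet session above, and the word "Binary" for a binary port. How you capture the greeting (telnet, nc, and so on) depends on the tools installed, so that part is left out.

```shell
#!/bin/sh
# Hypothetical helper: classify a Tarantool greeting line.
port_kind() {
    case "$1" in
        *"Lua console"*) echo "admin console" ;;
        *[Bb]inary*)     echo "binary port" ;;
        *)               echo "unknown" ;;
    esac
}

port_kind "Tarantool 1.7.3 (Lua console)"   # admin console
```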

When you connect to the binary port:

  • The client-server protocol is binary.
  • The user is automatically 'guest'.
  • To change the user, you must authenticate.

For convenience, tarantoolctl connect detects the connection type automatically and uses the EVAL binary protocol command to execute Lua commands over a binary connection. To execute EVAL, the authenticated user needs global "EXECUTE" privileges.

Therefore, when ssh access to the machine is not available, a system administrator can get remote access to an instance by creating a Tarantool user with global "EXECUTE" privileges and a non-empty password.

Server introspection

Using Tarantool as a client

Tarantool enters interactive mode if:

  • you start Tarantool without an instance file, or
  • the instance file contains console.start().

Tarantool then displays a prompt (for example, "tarantool>"), and you can enter requests. When used this way, Tarantool can act as a client for a remote server; see the basic examples in the Getting Started guide.

The tarantoolctl script uses interactive mode to implement the "enter" and "connect" commands.

Executing code on an instance

You can attach to an instance’s admin console and execute some Lua code using tarantoolctl:

$ # for local instances:
$ tarantoolctl enter my_app
/bin/tarantoolctl: Found my_app.lua in /etc/tarantool/instances.available
/bin/tarantoolctl: Connecting to /var/run/tarantool/my_app.control
/bin/tarantoolctl: connected to unix/:/var/run/tarantool/my_app.control
unix/:/var/run/tarantool/my_app.control> 1 + 1
---
- 2
...
unix/:/var/run/tarantool/my_app.control>

$ # for local and remote instances:
$ tarantoolctl connect username:password@127.0.0.1:3306

You can also use tarantoolctl to execute Lua code on a running Tarantool instance without connecting to its administrative console. For example:

$ # executing commands directly from the command line
$ <command> | tarantoolctl eval my_app
<...>
$ # - OR -
$ # executing commands from a script
$ tarantoolctl eval my_app script.lua
<...>

Note

You can also use the console and net.box modules of a Tarantool server, or write your own client programs with any of the available connectors. However, most examples in this manual use either tarantoolctl connect or a Tarantool server acting as a client.

Checking instance status

To check the status of a Tarantool instance, run:

$ tarantoolctl status my_app
my_app is running (pid: /var/run/tarantool/my_app.pid)
$ # - ИЛИ -
$ systemctl status tarantool@my_app
tarantool@my_app.service - Tarantool Database Server
Loaded: loaded (/etc/systemd/system/tarantool@.service; disabled; vendor preset: disabled)
Active: active (running)
Docs: man:tarantool(1)
Process: 5346 ExecStart=/usr/bin/tarantoolctl start %I (code=exited, status=0/SUCCESS)
Main PID: 5350 (tarantool)
Tasks: 11 (limit: 512)
CGroup: /system.slice/system-tarantool.slice/tarantool@my_app.service
+ 5350 tarantool my_app.lua <running>

If your system uses systemd, run the following command to check the boot log:

$ journalctl -u tarantool@my_app -n 5
-- Logs begin at Fri 2016-01-08 12:21:53 MSK, end at Thu 2016-01-21 21:17:47 MSK. --
Jan 21 21:17:47 localhost.localdomain systemd[1]: Stopped Tarantool Database Server.
Jan 21 21:17:47 localhost.localdomain systemd[1]: Starting Tarantool Database Server...
Jan 21 21:17:47 localhost.localdomain tarantoolctl[5969]: /usr/bin/tarantoolctl: Found my_app.lua in /etc/tarantool/instances.available
Jan 21 21:17:47 localhost.localdomain tarantoolctl[5969]: /usr/bin/tarantoolctl: Starting instance...
Jan 21 21:17:47 localhost.localdomain systemd[1]: Started Tarantool Database Server

For more details, use the reports provided by functions in the following submodules:

  • box.cfg - checking and specifying all Tarantool configuration parameters,
  • box.slab - monitoring the total use and fragmentation of memory allocated for storing data in Tarantool,
  • box.info - introspecting Tarantool server variables, primarily those related to replication,
  • box.stat - introspecting Tarantool request and network statistics.

You can also try the tarantool/prometheus Lua module, which makes it easy to collect metrics (such as memory usage or the number of requests) from Tarantool applications and databases and to expose them via the Prometheus protocol.

Example

Administrators very often call the box.slab.info() function, which shows detailed memory usage statistics for a specific Tarantool instance.

tarantool> box.slab.info()
---
- items_size: 228128
  items_used_ratio: 1.8%
  quota_size: 1073741824
  quota_used_ratio: 0.8%
  arena_used_ratio: 43.2%
  items_used: 4208
  quota_used: 8388608
  arena_size: 2325176
  arena_used: 1003632
...
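The *_ratio fields appear to be derived from the byte counters in the same report: recomputing used / size for each pair reproduces the values shown above. A quick way to check this arithmetic, assuming that relationship, is awk (the helper name ratio is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: used / size as a percentage with one decimal place,
# assumed to be how the *_ratio fields of box.slab.info() relate to the counters.
ratio() {
    awk -v used="$1" -v size="$2" 'BEGIN { printf "%.1f%%\n", 100 * used / size }'
}

ratio 4208 228128          # items_used_ratio: 1.8%
ratio 8388608 1073741824   # quota_used_ratio: 0.8%
ratio 1003632 2325176      # arena_used_ratio: 43.2%
```

In this report, quota_size is 1073741824 bytes (1 GiB) and quota_used is 8388608 bytes (8 MiB).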

Profiling performance issues

Tarantool can sometimes run slower than usual. There can be many reasons: disk issues, CPU-intensive Lua scripts, or misconfiguration. In such cases the Tarantool log may lack the necessary details, and the only sign of misbehavior may be log entries like W> too long DELETE: 8.546 sec. Below are tools and techniques that make it easier to collect a performance profile of Tarantool. This procedure can help when troubleshooting slowdowns.
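To find such entries quickly, you can filter the log. The sketch below is hypothetical (the helper name slow_requests is not part of Tarantool) and assumes the warning text is exactly "too long <REQUEST>: <seconds> sec", as quoted above. The wording may differ in other versions.

```shell
#!/bin/sh
# Hypothetical helper: list "too long" warnings as "<seconds> <request>",
# slowest first. Assumes lines like "... W> too long DELETE: 8.546 sec".
slow_requests() {
    sed -n 's/.* W> too long \([A-Za-z_]*\): \([0-9.]*\) sec.*/\2 \1/p' "$@" |
        LC_ALL=C sort -rn
}
```

For example, slow_requests /var/log/tarantool/my_app.log | head.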

Note

Most of these tools, except fiber.info(), are intended for GNU/Linux distributions, not for FreeBSD or Mac OS.

fiber.info()

The simplest profiling method is to use Tarantool's built-in functionality. fiber.info() returns information about all running fibers, with the corresponding C stack traces. This data shows how many fibers are running at the moment and which C functions are called most often.

First, enter your instance's interactive administrative console:

$ tarantoolctl enter NAME

Then load the fiber module:

tarantool> fiber = require('fiber')

Now you can get the required information with fiber.info().

At this point, your console output should look like this:

tarantool> fiber = require('fiber')
---
...
tarantool> fiber.info()
---
- 360:
    csw: 2098165
    backtrace:
    - '#0 0x4d1b77 in wal_write(journal*, journal_entry*)+487'
    - '#1 0x4bbf68 in txn_commit(txn*)+152'
    - '#2 0x4bd5d8 in process_rw(request*, space*, tuple**)+136'
    - '#3 0x4bed48 in box_process1+104'
    - '#4 0x4d72f8 in lbox_replace+120'
    - '#5 0x50f317 in lj_BC_FUNCC+52'
    fid: 360
    memory:
      total: 61744
      used: 480
    name: main
  129:
    csw: 113
    backtrace: []
    fid: 129
    memory:
      total: 57648
      used: 0
    name: 'console/unix/:'
...

We recommend giving meaningful names to the fibers you create, so that you can easily find them in the fiber.info() output. The example below creates a fiber named myworker:

tarantool> fiber = require('fiber')
---
...
tarantool> f = fiber.create(function() while true do fiber.sleep(0.5) end end)
---
...
tarantool> f:name('myworker') -- assigning the name to a fiber
---
...
tarantool> fiber.info()
---
- 102:
    csw: 14
    backtrace:
    - '#0 0x501a1a in fiber_yield_timeout+90'
    - '#1 0x4f2008 in lbox_fiber_sleep+72'
    - '#2 0x5112a7 in lj_BC_FUNCC+52'
    fid: 102
    memory:
      total: 57656
      used: 0
    name: myworker  # newly created background fiber
  101:
    csw: 284
    backtrace: []
    fid: 101
    memory:
      total: 57656
      used: 0
    name: interactive
...

To kill a fiber, use fiber.kill(fid):

tarantool> fiber.kill(102)
---
...
tarantool> fiber.info()
---
- 101:
    csw: 324
    backtrace: []
    fid: 101
    memory:
      total: 57656
      used: 0
    name: interactive
...

If you need to collect fiber.info() output dynamically, the script below may help. Every half second, it connects to the Tarantool instance specified in NAME, runs fiber.info(), and appends the output to the fiber-info.txt file:

$ rm -f fiber-info.txt
$ watch -n 0.5 "echo 'require(\"fiber\").info()' | tarantoolctl enter NAME | tee -a fiber-info.txt"

If you can't figure out which fiber causes the performance issues, run this script for 10-15 seconds and send the resulting file to the Tarantool team at support@tarantool.org.
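watch keeps running until you press ctrl+C. If you prefer a collection that stops on its own after a fixed number of samples, a loop like the hypothetical sample helper below is one option. Fractional intervals such as 0.5 require a sleep that accepts them (GNU coreutils sleep does; POSIX only guarantees whole seconds).

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times, INTERVAL seconds apart,
# appending its stdout and stderr to OUT.
# Usage: sample OUT COUNT INTERVAL command [args...]
sample() {
    out=$1
    count=$2
    interval=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" >> "$out" 2>&1
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, 30 samples half a second apart cover roughly 15 seconds:

sample fiber-info.txt 30 0.5 sh -c "echo 'require(\"fiber\").info()' | tarantoolctl enter NAME"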

Simple profilers

pstack <pid>

To use this tool, first install it with the package manager of your Linux distribution. This command prints the execution stack trace of the running process with the given PID. You can run the command several times to find the bottleneck that causes the slowdown.

After installation, run:

$ pstack $(pidof tarantool INSTANCENAME.lua)

Next, run:

$ echo $(pidof tarantool INSTANCENAME.lua)

to print the PID of the Tarantool instance that uses the INSTANCENAME.lua file.

Your console should show something like this:

Thread 19 (Thread 0x7f09d1bff700 (LWP 24173)):
#0 0x00007f0a1a5423f2 in ?? () from /lib64/libgomp.so.1
#1 0x00007f0a1a53fdc0 in ?? () from /lib64/libgomp.so.1
#2 0x00007f0a1ad5adc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f0a1a050ced in clone () from /lib64/libc.so.6
Thread 18 (Thread 0x7f09d13fe700 (LWP 24174)):
#0 0x00007f0a1a5423f2 in ?? () from /lib64/libgomp.so.1
#1 0x00007f0a1a53fdc0 in ?? () from /lib64/libgomp.so.1
#2 0x00007f0a1ad5adc5 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f0a1a050ced in clone () from /lib64/libc.so.6
<...>
Thread 2 (Thread 0x7f09c8bfe700 (LWP 24191)):
#0 0x00007f0a1ad5e6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000000045d901 in wal_writer_pop(wal_writer*) ()
#2 0x000000000045db01 in wal_writer_f(__va_list_tag*) ()
#3 0x0000000000429abc in fiber_cxx_invoke(int (*)(__va_list_tag*), __va_list_tag*) ()
#4 0x00000000004b52a0 in fiber_loop ()
#5 0x00000000006099cf in coro_init ()
Thread 1 (Thread 0x7f0a1c47fd80 (LWP 24172)):
#0 0x00007f0a1a0512c3 in epoll_wait () from /lib64/libc.so.6
#1 0x00000000006051c8 in epoll_poll ()
#2 0x0000000000607533 in ev_run ()
#3 0x0000000000428e13 in main ()

gdb -ex "bt" -p <pid>

As with pstack, the GNU debugger (also known as gdb) needs to be installed first with the package manager of your Linux distribution.

After installation, run:

$ gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof tarantool INSTANCENAME.lua)

Next, run:

$ echo $(pidof tarantool INSTANCENAME.lua)

to print the PID of the Tarantool instance that uses the INSTANCENAME.lua file.

After you run the debugger, your console should show the following information:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

[CUT]

Thread 1 (Thread 0x7f72289ba940 (LWP 20535)):
#0 _int_malloc (av=av@entry=0x7f7226e0eb20 <main_arena>, bytes=bytes@entry=504) at malloc.c:3697
#1 0x00007f7226acf21a in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3234
#2 0x00000000004631f8 in vy_merge_iterator_reserve (capacity=3, itr=0x7f72264af9e0) at /usr/src/tarantool/src/box/vinyl.c:7629
#3 vy_merge_iterator_add (itr=itr@entry=0x7f72264af9e0, is_mutable=is_mutable@entry=true, belong_range=belong_range@entry=false) at /usr/src/tarantool/src/box/vinyl.c:7660
#4 0x00000000004703df in vy_read_iterator_add_mem (itr=0x7f72264af990) at /usr/src/tarantool/src/box/vinyl.c:8387
#5 vy_read_iterator_use_range (itr=0x7f72264af990) at /usr/src/tarantool/src/box/vinyl.c:8453
#6 0x000000000047657d in vy_read_iterator_start (itr=<optimized out>) at /usr/src/tarantool/src/box/vinyl.c:8501
#7 0x00000000004766b5 in vy_read_iterator_next (itr=itr@entry=0x7f72264af990, result=result@entry=0x7f72264afad8) at /usr/src/tarantool/src/box/vinyl.c:8592
#8 0x000000000047689d in vy_index_get (tx=tx@entry=0x7f7226468158, index=index@entry=0x2563860, key=<optimized out>, part_count=<optimized out>, result=result@entry=0x7f72264afad8) at /usr/src/tarantool/src/box/vinyl.c:5705
#9 0x0000000000477601 in vy_replace_impl (request=<optimized out>, request=<optimized out>, stmt=0x7f72265a7150, space=0x2567ea0, tx=0x7f7226468158) at /usr/src/tarantool/src/box/vinyl.c:5920
#10 vy_replace (tx=0x7f7226468158, stmt=stmt@entry=0x7f72265a7150, space=0x2567ea0, request=<optimized out>) at /usr/src/tarantool/src/box/vinyl.c:6608
#11 0x00000000004615a9 in VinylSpace::executeReplace (this=<optimized out>, txn=<optimized out>, space=<optimized out>, request=<optimized out>) at /usr/src/tarantool/src/box/vinyl_space.cc:108
#12 0x00000000004bd723 in process_rw (request=request@entry=0x7f72265a70f8, space=space@entry=0x2567ea0, result=result@entry=0x7f72264afbc8) at /usr/src/tarantool/src/box/box.cc:182
#13 0x00000000004bed48 in box_process1 (request=0x7f72265a70f8, result=result@entry=0x7f72264afbc8) at /usr/src/tarantool/src/box/box.cc:700
#14 0x00000000004bf389 in box_replace (space_id=space_id@entry=513, tuple=<optimized out>, tuple_end=<optimized out>, result=result@entry=0x7f72264afbc8) at /usr/src/tarantool/src/box/box.cc:754
#15 0x00000000004d72f8 in lbox_replace (L=0x413c5780) at /usr/src/tarantool/src/box/lua/index.c:72
#16 0x000000000050f317 in lj_BC_FUNCC ()
#17 0x00000000004d37c7 in execute_lua_call (L=0x413c5780) at /usr/src/tarantool/src/box/lua/call.c:282
#18 0x000000000050f317 in lj_BC_FUNCC ()
#19 0x0000000000529c7b in lua_cpcall ()
#20 0x00000000004f6aa3 in luaT_cpcall (L=L@entry=0x413c5780, func=func@entry=0x4d36d0 <execute_lua_call>, ud=ud@entry=0x7f72264afde0) at /usr/src/tarantool/src/lua/utils.c:962
#21 0x00000000004d3fe7 in box_process_lua (handler=0x4d36d0 <execute_lua_call>, out=out@entry=0x7f7213020600, request=request@entry=0x413c5780) at /usr/src/tarantool/src/box/lua/call.c:382
#22 box_lua_call (request=request@entry=0x7f72130401d8, out=out@entry=0x7f7213020600) at /usr/src/tarantool/src/box/lua/call.c:405
#23 0x00000000004c0f27 in box_process_call (request=request@entry=0x7f72130401d8, out=out@entry=0x7f7213020600) at /usr/src/tarantool/src/box/box.cc:1074
#24 0x000000000041326c in tx_process_misc (m=0x7f7213040170) at /usr/src/tarantool/src/box/iproto.cc:942
#25 0x0000000000504554 in cmsg_deliver (msg=0x7f7213040170) at /usr/src/tarantool/src/cbus.c:302
#26 0x0000000000504c2e in fiber_pool_f (ap=<error reading variable: value has been optimized out>) at /usr/src/tarantool/src/fiber_pool.c:64
#27 0x000000000041122c in fiber_cxx_invoke(fiber_func, typedef __va_list_tag __va_list_tag *) (f=<optimized out>, ap=<optimized out>) at /usr/src/tarantool/src/fiber.h:645
#28 0x00000000005011a0 in fiber_loop (data=<optimized out>) at /usr/src/tarantool/src/fiber.c:641
#29 0x0000000000688fbf in coro_init () at /usr/src/tarantool/third_party/coro/coro.c:110

Run the debugger in a loop to collect enough information to find the cause of the Tarantool slowdown. You can use this script:

$ rm -f stack-trace.txt
$ watch -n 0.5 "gdb -ex 'set pagination 0' -ex 'thread apply all bt' --batch -p $(pidof tarantool INSTANCENAME.lua) | tee -a stack-trace.txt"

Structurally and functionally, this script is identical to the one used with fiber.info() above.

If you still can't find the cause of the slowdown, run this script for 10-15 seconds and send the resulting stack-trace.txt file to the Tarantool team at support@tarantool.org.

Warning

Use pstack and gdb with caution: each time they attach to a running process, they suspend its execution for about one second, which may have serious consequences for high-load services.

gperftools

To use the CPU profiler from Google Performance Tools with Tarantool, first install the dependencies:

  • For Debian/Ubuntu, run:
    $ apt-get install libgoogle-perftools4
  • For RHEL/CentOS/Fedora, run:
    $ yum install gperftools-libs

Then install the Lua bindings:

$ tarantoolctl rocks install gperftools

When the installation is complete, enter your instance's interactive administrative console:

$ tarantoolctl enter NAME

To start the profiler, run:

tarantool> cpuprof = require('gperftools.cpu')
tarantool> cpuprof.start('/home/<username>/tarantool-on-production.prof')

The profiler needs at least a couple of minutes to collect performance metrics. After that, you can save the data to disk (as many times as you like):

tarantool> cpuprof.flush()

To stop the profiler, run:

tarantool> cpuprof.stop()

Now you can analyze the collected data with the pprof utility, which comes with the gperftools package:

$ pprof --text /usr/bin/tarantool /home/<username>/tarantool-on-production.prof

Note

On Debian/Ubuntu, the pprof utility is called google-pprof.

Your console should show something like this:

Total: 598 samples
      83  13.9%  13.9%       83  13.9% epoll_wait
      54   9.0%  22.9%      102  17.1% vy_mem_tree_insert.constprop.35
      32   5.4%  28.3%       34   5.7% __write_nocancel
      28   4.7%  32.9%       42   7.0% vy_mem_iterator_start_from
      26   4.3%  37.3%       26   4.3% _IO_str_seekoff
      21   3.5%  40.8%       21   3.5% tuple_compare_field
      19   3.2%  44.0%       19   3.2% ::TupleCompareWithKey::compare
      19   3.2%  47.2%       38   6.4% tuple_compare_slowpath
      12   2.0%  49.2%       23   3.8% __libc_calloc
       9   1.5%  50.7%        9   1.5% ::TupleCompare::compare@42efc0
       9   1.5%  52.2%        9   1.5% vy_cache_on_write
       9   1.5%  53.7%       57   9.5% vy_merge_iterator_next_key
       8   1.3%  55.0%        8   1.3% __nss_passwd_lookup
       6   1.0%  56.0%       25   4.2% gc_onestep
       6   1.0%  57.0%        6   1.0% lj_tab_next
       5   0.8%  57.9%        5   0.8% lj_alloc_malloc
       5   0.8%  58.7%      131  21.9% vy_prepare
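In pprof text output, the columns are the samples in the function itself (flat), their share of all samples (flat%), the running total of flat% (sum%), and the samples in the function plus its callees (cum, cum%). The arithmetic can be reproduced from raw counts with awk. The helper name pprof_percent is hypothetical, and the input format ("<flat> <cum> <symbol>" per line) is an assumption made for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: recompute pprof-style percentages from raw sample counts.
# $1: total number of samples. Input lines on stdin: "<flat> <cum> <symbol>".
pprof_percent() {
    awk -v total="$1" '{
        sum += $1   # running total of flat samples, used for sum%
        printf "%d %.1f%% %.1f%% %d %.1f%% %s\n",
            $1, 100 * $1 / total, 100 * sum / total, $2, 100 * $2 / total, $3
    }'
}

printf '%s\n' '83 83 epoll_wait' '54 102 vy_mem_tree_insert.constprop.35' |
    pprof_percent 598
# 83 13.9% 13.9% 83 13.9% epoll_wait
# 54 9.0% 22.9% 102 17.1% vy_mem_tree_insert.constprop.35
```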
perf

This performance monitoring and analysis tool is installed separately with the package manager. Try typing perf in a console window and follow the prompts to install the required packages.

Note

By default, some perf commands can be run only with root privileges, so you should either log in as root or prefix each command with sudo.

To start collecting performance statistics, run:

$ perf record -g -p $(pidof tarantool INSTANCENAME.lua)

This command saves the collected data to the perf.data file in the current working directory. To stop the process (usually after 10-15 seconds), press ctrl+C. The console should show:

^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.225 MB perf.data (1573 samples) ]

Next, run:

$ perf report -n -g --stdio | tee perf-report.txt

This command turns the statistics in perf.data into a performance report and saves it to the perf-report.txt file.

The resulting report looks like this:

# Samples: 14K of event 'cycles'
# Event count (approx.): 9927346847
#
# Children Self Samples Command Shared Object Symbol
# ........ ........ ............ ......... .................. .......................................
#
    35.50% 0.55% 79 tarantool tarantool [.] lj_gc_step
            |
             --34.95%--lj_gc_step
                       |
                       |--29.26%--gc_onestep
                       | |
                       | |--13.85%--gc_sweep
                       | | |
                       | | |--5.59%--lj_alloc_free
                       | | |
                       | | |--1.33%--lj_tab_free
                       | | | |
                       | | | --1.01%--lj_alloc_free
                       | | |
                       | | --1.17%--lj_cdata_free
                       | |
                       | |--5.41%--gc_finalize
                       | | |
                       | | |--1.06%--lj_obj_equal
                       | | |
                       | | --0.95%--lj_tab_set
                       | |
                       | |--4.97%--rehashtab
                       | | |
                       | | --3.65%--lj_tab_resize
                       | | |
                       | | |--0.74%--lj_tab_set
                       | | |
                       | | --0.72%--lj_tab_newkey
                       | |
                       | |--0.91%--propagatemark
                       | |
                       | --0.67%--lj_cdata_free
                       |
                        --5.43%--propagatemark
                                  |
                                   --0.73%--gc_mark
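With long reports, it can help to list only the top-level entries (the lines whose first two columns are the Children and Self percentages) and skip the call-graph lines. A hypothetical awk sketch, assuming the column layout shown above (the helper name perf_top is not part of perf):

```shell
#!/bin/sh
# Hypothetical helper: print "<children%> <self%> <symbol>" for top-level
# entries of `perf report --stdio` output; call-graph lines are skipped.
perf_top() {
    awk '$1 ~ /^[0-9.]+%$/ && $2 ~ /^[0-9.]+%$/ { print $1, $2, $NF }' "$@"
}
```

For example, perf_top perf-report.txt | head -n 20.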

Unlike pstack and gdb, gperftools and perf have low overhead (negligible compared with pstack and gdb): they attach to running processes without long delays and can therefore be used without serious consequences.

Daemon supervision

Server signals

Tarantool processes these signals during the event loop in the transaction processor thread:

Signal                         Effect
SIGHUP                         May cause log rotation; see the example in the reference on Tarantool logging parameters.
SIGUSR1                        May cause a database checkpoint; see the box.snapshot function.
SIGTERM                        May cause a graceful shutdown (with all data saved first).
SIGINT (keyboard interrupt)    May cause a graceful shutdown.
SIGKILL                        Causes an abnormal shutdown.

Other signals will result in behavior defined by the operating system. Signals other than SIGKILL may be ignored, especially if Tarantool is executing a long-running procedure which prevents return to the event loop in the transaction processor thread.
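To send one of these signals to an instance, you need its PID, which tarantoolctl keeps in the pid file (for example, /var/run/tarantool/my_app.pid, as reported by tarantoolctl status earlier). A minimal sketch with a hypothetical helper, signal_instance:

```shell
#!/bin/sh
# Hypothetical helper: send signal $1 (e.g. USR1, HUP, TERM) to the process
# whose PID is stored in the pid file $2.
signal_instance() {
    pid=$(cat "$2") || return 1
    kill -s "$1" "$pid"
}
```

For example, according to the table above, signal_instance USR1 /var/run/tarantool/my_app.pid may make the instance create a snapshot.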

Automatic instance restart

On systemd-enabled platforms, systemd automatically restarts all Tarantool instances in case of failure. To demonstrate this, let's crash one of the instances:

$ systemctl status tarantool@my_app|grep PID
Main PID: 5885 (tarantool)
$ tarantoolctl enter my_app
/bin/tarantoolctl: Found my_app.lua in /etc/tarantool/instances.available
/bin/tarantoolctl: Connecting to /var/run/tarantool/my_app.control
/bin/tarantoolctl: connected to unix/:/var/run/tarantool/my_app.control
unix/:/var/run/tarantool/my_app.control> os.exit(-1)
/bin/tarantoolctl: unix/:/var/run/tarantool/my_app.control: Remote host closed connection

Now make sure that systemd has restarted it:

$ systemctl status tarantool@my_app|grep PID
Main PID: 5914 (tarantool)
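The changed Main PID is what shows that the instance was restarted. To script the comparison, you can extract the PID from the systemctl status output. main_pid below is a hypothetical helper that assumes the "Main PID: <number>" line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl status` output on stdin
# and print the number from the "Main PID: <n> (...)" line.
main_pid() {
    sed -n 's/^[[:space:]]*Main PID: \([0-9][0-9]*\).*/\1/p'
}
```

For example, save before=$(systemctl status tarantool@my_app | main_pid) before the crash, take after the same way once systemd has restarted the unit, and compare the two values.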

Finally, check the boot log:

$ journalctl -u tarantool@my_app -n 8
-- Logs begin at Fri 2016-01-08 12:21:53 MSK, end at Thu 2016-01-21 21:09:45 MSK. --
Jan 21 21:09:45 localhost.localdomain systemd[1]: tarantool@my_app.service: Unit entered failed state.
Jan 21 21:09:45 localhost.localdomain systemd[1]: tarantool@my_app.service: Failed with result 'exit-code'.
Jan 21 21:09:45 localhost.localdomain systemd[1]: tarantool@my_app.service: Service hold-off time over, scheduling restart.
Jan 21 21:09:45 localhost.localdomain systemd[1]: Stopped Tarantool Database Server.
Jan 21 21:09:45 localhost.localdomain systemd[1]: Starting Tarantool Database Server...
Jan 21 21:09:45 localhost.localdomain tarantoolctl[5910]: /usr/bin/tarantoolctl: Found my_app.lua in /etc/tarantool/instances.available
Jan 21 21:09:45 localhost.localdomain tarantoolctl[5910]: /usr/bin/tarantoolctl: Starting instance...
Jan 21 21:09:45 localhost.localdomain systemd[1]: Started Tarantool Database Server.
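To check for the restart from a script rather than by eye, compare the Main PID reported by systemctl status before and after the failure, as done above. The minimal Python sketch below only parses text. You would capture the output yourself, for example with subprocess.run(["systemctl", "status", "tarantool@my_app"], capture_output=True, text=True). The function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
import re
from typing import Optional

# Matches the line `systemctl status` prints for the service's main process.
_MAIN_PID = re.compile(r"Main PID:\s*(\d+)")


def main_pid(status_output: str) -> Optional[int]:
    """Extract the main PID from `systemctl status` output, if present."""
    match = _MAIN_PID.search(status_output)
    return int(match.group(1)) if match else None


def was_restarted(before: str, after: str) -> bool:
    """True if both outputs show a main PID and the two PIDs differ."""
    old, new = main_pid(before), main_pid(after)
    return old is not None and new is not None and old != new
```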

Core dumps

Tarantool makes a core dump when it receives any of the following signals: SIGSEGV, SIGFPE, SIGABRT or SIGQUIT. If Tarantool crashes, the core dump is created automatically.

On systemd-enabled platforms, coredumpctl automatically saves core dumps and stack traces when the Tarantool server crashes. Here is how to enable core dumps on a Unix system:

  1. Make sure your session limits allow core dumps to be created: run ulimit -c unlimited. Also check "man 5 core" for other reasons why a core dump may not be produced.
  2. Create a directory for core dumps and make sure that it is actually writable. On Linux, the directory path is set in a kernel parameter configurable via /proc/sys/kernel/core_pattern.
  3. Make sure that core dumps include stack trace information. If you use a binary Tarantool distribution, this information is included automatically. If you build Tarantool from source and pass the -DCMAKE_BUILD_TYPE=Release flag to CMake, you will not get detailed information.
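Steps 1 and 2 can be checked programmatically. The sketch below uses Python's standard resource module to inspect and raise RLIMIT_CORE, the limit that ulimit -c controls. It also reads /proc/sys/kernel/core_pattern, which exists only on Linux. The function names are hypothetical. Limits are per process, so raising them here affects only this process and the processes it starts afterwards, not an already running Tarantool.

```python
# Illustrative sketch; function names are hypothetical.
import resource
from pathlib import Path
from typing import Optional

# Linux only: the kernel setting that names/routes core files.
CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def core_limit_ok() -> bool:
    """True if this process's soft RLIMIT_CORE permits writing core files.

    `ulimit -c unlimited` corresponds to RLIM_INFINITY; a small positive
    limit still allows a core file, but it may be truncated.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft == resource.RLIM_INFINITY or soft > 0


def enable_core_dumps() -> None:
    """Raise the soft core-size limit to the hard limit (no root needed).

    Like `ulimit -c`, this affects only the current process and the
    processes it starts afterwards.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def core_pattern() -> Optional[str]:
    """Return the kernel core_pattern, or None where it is unavailable."""
    try:
        return CORE_PATTERN.read_text().strip()
    except OSError:
        return None
```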

To simulate a crash, you can try to execute an illegal command on a running Tarantool instance:

$ # !!! please never do this on a production server !!!
$ tarantoolctl enter my_app
unix/:/var/run/tarantool/my_app.control> require('ffi').cast('char *', 0)[0] = 48
/bin/tarantoolctl: unix/:/var/run/tarantool/my_app.control: Remote host closed connection

Alternatively, if you know the process ID of the instance ($PID in our example), you can make it dump core by attaching the gdb debugger:

$ gdb -batch -ex "generate-core-file" -p $PID

or by manually sending it a SIGABRT signal:

$ kill -SIGABRT $PID

Note

To find out the PID of an instance, you can:

  • look it up in box.info.pid,
  • use the command ps -A | grep tarantool, or
  • run systemctl status tarantool@my_app|grep PID.

On systemd-enabled platforms, to see the latest crashes of the Tarantool daemon, run:

$ coredumpctl list /usr/bin/tarantool
MTIME                            PID   UID   GID SIG PRESENT EXE
Sat 2016-01-23 15:21:24 MSK   20681  1000  1000   6   /usr/bin/tarantool
Sat 2016-01-23 15:51:56 MSK   21035   995   992   6   /usr/bin/tarantool

To save a core dump into a file, run:

$ coredumpctl -o filename.core info <pid>
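When several crashes are listed, you usually want the PID of the most recent one to pass to coredumpctl info or coredumpctl -o. This Python sketch parses output laid out like the coredumpctl list example above. The column layout is taken from that example and may differ between systemd versions. The time zone column is ignored, and the function names are hypothetical.

```python
# Illustrative sketch; layout assumed from the example output above.
from datetime import datetime
from typing import List, Optional, Tuple


def parse_coredump_list(output: str) -> List[Tuple[datetime, int, str]]:
    """Return (time, pid, executable) for each crash row.

    Expected columns: weekday, date, time, time zone, PID, UID, GID,
    signal, an optional PRESENT value, and the executable path last.
    """
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 9:
            continue  # header or blank line
        try:
            when = datetime.strptime(parts[1] + " " + parts[2], "%Y-%m-%d %H:%M:%S")
            pid = int(parts[4])
        except ValueError:
            continue  # not a crash row
        rows.append((when, pid, parts[-1]))
    return rows


def latest_crash_pid(output: str) -> Optional[int]:
    """PID of the most recent crash, or None if nothing was listed."""
    rows = parse_coredump_list(output)
    return max(rows)[1] if rows else None
```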

Stack traces

Since Tarantool stores tuples in memory, core files may be quite large. To find the cause of a problem, you usually do not need the whole file: a "stack trace" (also called a "backtrace") is enough.

To save a stack trace into a file, run:

$ gdb -se "tarantool" -ex "bt full" -ex "thread apply all bt" --batch -c core > /tmp/tarantool_trace.txt

where:

  • "tarantool" is the path to the Tarantool executable,
  • "core" is the path to the core file, and
  • "/tmp/tarantool_trace.txt" is an example path to the file where the stack trace is saved.

Note

Sometimes the stack trace may lack debug symbols, in which case such lines show "??" instead of a name. If this happens, see the instructions on these two Tarantool wiki pages: How to debug core dump of stripped tarantool and How to debug core from different OS.

To see the stack trace and other useful information in the console, run:

$ coredumpctl info 21035
          PID: 21035 (tarantool)
          UID: 995 (tarantool)
          GID: 992 (tarantool)
       Signal: 6 (ABRT)
    Timestamp: Sat 2016-01-23 15:51:42 MSK (4h 36min ago)
 Command Line: tarantool my_app.lua <running>
   Executable: /usr/bin/tarantool
Control Group: /system.slice/system-tarantool.slice/tarantool@my_app.service
         Unit: tarantool@my_app.service
        Slice: system-tarantool.slice
      Boot ID: 7c686e2ef4dc4e3ea59122757e3067e2
   Machine ID: a4a878729c654c7093dc6693f6a8e5ee
     Hostname: localhost.localdomain
      Message: Process 21035 (tarantool) of user 995 dumped core.

               Stack trace of thread 21035:
               #0  0x00007f84993aa618 raise (libc.so.6)
               #1  0x00007f84993ac21a abort (libc.so.6)
               #2  0x0000560d0a9e9233 _ZL12sig_fatal_cbi (tarantool)
               #3  0x00007f849a211220 __restore_rt (libpthread.so.0)
               #4  0x0000560d0aaa5d9d lj_cconv_ct_ct (tarantool)
               #5  0x0000560d0aaa687f lj_cconv_ct_tv (tarantool)
               #6  0x0000560d0aaabe33 lj_cf_ffi_meta___newindex (tarantool)
               #7  0x0000560d0aaae2f7 lj_BC_FUNCC (tarantool)
               #8  0x0000560d0aa9aabd lua_pcall (tarantool)
               #9  0x0000560d0aa71400 lbox_call (tarantool)
               #10 0x0000560d0aa6ce36 lua_fiber_run_f (tarantool)
               #11 0x0000560d0a9e8d0c _ZL16fiber_cxx_invokePFiP13__va_list_tagES0_ (tarantool)
               #12 0x0000560d0aa7b255 fiber_loop (tarantool)
               #13 0x0000560d0ab38ed1 coro_init (tarantool)
               ...
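To summarize a long trace, it can help to extract the frames that belong to the tarantool binary itself. The sketch below parses frame lines in the format shown above: #N, an address, a symbol, and the module in parentheses. The format is assumed from this example only, and the function names are hypothetical. C++ names such as _ZL12sig_fatal_cbi are mangled and can be decoded with c++filt.

```python
# Illustrative sketch; the frame format is assumed from the example above.
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    index: int
    address: int
    symbol: str
    module: str


# e.g. "#4  0x0000560d0aaa5d9d lj_cconv_ct_ct (tarantool)"
_FRAME = re.compile(r"#(\d+)\s+0x([0-9a-fA-F]+)\s+(\S+)\s+\(([^)]*)\)")


def parse_frames(trace: str) -> List[Frame]:
    """Collect every stack frame line; other lines are ignored."""
    frames = []
    for line in trace.splitlines():
        m = _FRAME.search(line)
        if m:
            frames.append(Frame(int(m.group(1)), int(m.group(2), 16),
                                m.group(3), m.group(4)))
    return frames


def frames_in(frames: List[Frame], module: str) -> List[str]:
    """Symbols of the frames that belong to `module`, innermost first."""
    return [f.symbol for f in frames if f.module == module]
```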

Debugger

To start the gdb debugger on a core dump, run:

$ coredumpctl gdb <pid>

We highly recommend installing the tarantool-debuginfo package to make debugging with gdb more effective. For example:

$ dnf debuginfo-install tarantool

gdb can also tell you which other debuginfo packages need to be installed:

$ gdb -p <pid>
...
Missing separate debuginfos, use: dnf debuginfo-install
glibc-2.22.90-26.fc24.x86_64 krb5-libs-1.14-12.fc24.x86_64
libgcc-5.3.1-3.fc24.x86_64 libgomp-5.3.1-3.fc24.x86_64
libselinux-2.4-6.fc24.x86_64 libstdc++-5.3.1-3.fc24.x86_64
libyaml-0.1.6-7.fc23.x86_64 ncurses-libs-6.0-1.20150810.fc24.x86_64
openssl-libs-1.0.2e-3.fc24.x86_64
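The package list in that hint can be fed straight back to dnf. This Python sketch extracts the names from gdb output shaped like the example above and builds the command line. It assumes the list runs from "debuginfo-install" up to the first blank line, which is an assumption about the message layout. The function names are hypothetical.

```python
# Illustrative sketch; the message layout is assumed from the example above.
from typing import List

_MARKER = "debuginfo-install"


def missing_debuginfo_packages(gdb_output: str) -> List[str]:
    """Package names listed after gdb's 'use: dnf debuginfo-install' hint."""
    _before, found, rest = gdb_output.partition(_MARKER)
    if not found:
        return []
    # The hint ends at the first blank line; anything after it is other output.
    return rest.split("\n\n", 1)[0].split()


def install_command(packages: List[str]) -> List[str]:
    """Argument list for dnf; run it as root, e.g. with subprocess.run(..., check=True)."""
    return ["dnf", "debuginfo-install", *packages]
```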

Symbolic names are present in stack traces even if you don’t have the tarantool-debuginfo package installed.

Disaster recovery

The minimal fault-tolerant Tarantool configuration is a replication cluster that includes a master and a replica, or two masters.

The basic recommendation is to configure all Tarantool instances in a cluster to create snapshot files on a regular basis.

Here are action plans for typical crash scenarios.

Master-replica

Configuration: one master and one replica.

Problem: the master has crashed.

Your actions:

  1. Make sure the master is fully stopped. For example, log in to the master machine and use systemctl stop tarantool@<instance_name>.
  2. Switch the replica to master mode by setting box.cfg.read_only to false. The whole load is now handled by the replica, which has effectively become the master.
  3. Set up a replacement for the crashed master on a spare host. Set its replication parameter to the URI of the replica (now acting as master), so that the new instance starts catching up with the current master. The new instance must have box.cfg.read_only set to true.

Any transactions in the master’s WAL file that it did not manage to send to the replica before the crash will be lost (usually there are only a few). However, if you can salvage the master’s .xlog file, you can recover them. To do this:

  1. Find out the position of the crashed master, as reflected on the new master.

    1. Find out the instance UUID from the crashed master’s .xlog file:

      $ head -5 *.xlog | grep Instance
      Instance: ed607cad-8b6d-48d8-ba0b-dae371b79155
      
    2. Use this UUID on the new master to look up the position:

      tarantool> box.info.vclock[box.space._cluster.index.uuid:select{'ed607cad-8b6d-48d8-ba0b-dae371b79155'}[1][1]]
      ---
      - 23425
      <...>
      
  2. Play the records from the crashed master’s .xlog file to the new master, starting from the position found in step 1:

    1. Run this request locally on the new master to find out its instance ID:

      tarantool> box.space._cluster:select{}
      ---
      - - [1, '88580b5c-4474-43ab-bd2b-2409a9af80d2']
      ...
      
    2. Play the records to the new master:

      $ tarantoolctl <new_master_uri> <xlog_file> play --from-lsn 23425 --replica 1
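The Lua expression in step 1 looks up the crashed instance’s short id in box.space._cluster by its UUID, then reads that component of box.info.vclock on the new master. Rows from that instance with a larger LSN still need to be replayed. The Python sketch below expresses the same logic with hypothetical data, assuming the crashed master had instance id 2. The function names are hypothetical too.

```python
# Illustrative sketch with hypothetical data; not a Tarantool API.
from typing import Dict, Iterable, Tuple


def instance_id(cluster_rows: Iterable[Tuple[int, str]], uuid: str) -> int:
    """Short instance id for `uuid`, given (id, uuid) rows of _cluster."""
    for iid, row_uuid in cluster_rows:
        if row_uuid == uuid:
            return iid
    raise KeyError(uuid)


def replay_start_lsn(vclock: Dict[int, int],
                     cluster_rows: Iterable[Tuple[int, str]],
                     uuid: str) -> int:
    """LSN of the crashed instance's last row already applied on the new master.

    A missing vclock component means nothing from that instance arrived (0).
    """
    return vclock.get(instance_id(cluster_rows, uuid), 0)
```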
      

Master-master

Configuration: two masters.

Problem: master #1 has crashed.

Your actions:

  1. Let the load be handled by master #2 (the effective master) alone.

  2. Follow the same steps as in the master-replica recovery scenario to create a new master and salvage lost data.

Data loss

Configuration: master-master or master-replica.

Problem: some data was deleted on one master, and the changes were then replicated to the other node (master or replica).

This procedure applies only to data stored in the memtx storage engine. Your actions:

  1. Put all nodes in read-only mode and call box.backup.start() to suspend garbage collection of checkpoints. This is necessary so that older checkpoints are not removed automatically.
  2. Take the latest valid .snap file and use the tarantoolctl cat command to find out at which lsn the data loss occurred.
  3. Start a new instance (instance #1) and use the tarantoolctl play command to copy the contents of the .snap/.xlog files into it, up to the calculated lsn.
  4. Bootstrap a new replica from the recovered master (instance #1).
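Step 2 does not say how to pinpoint the lsn. One hypothetical approach: assume you have already converted the tarantoolctl cat output into (lsn, operation, space_id) records for a single instance, in log order, and that a DELETE on a known space caused the loss. The Python sketch below then returns the last lsn that is safe to play. The record type and function are illustrative, not Tarantool's. Losses caused by an UPDATE or a truncate need a different criterion.

```python
# Illustrative sketch; the record type and function are hypothetical.
from typing import Iterable, NamedTuple, Optional


class Row(NamedTuple):
    lsn: int
    op: str        # e.g. "INSERT", "REPLACE", "UPDATE", "DELETE"
    space_id: int


def last_safe_lsn(rows: Iterable[Row], space_id: int) -> Optional[int]:
    """LSN just before the first DELETE on `space_id`, or None if none found.

    Assumes all rows come from one instance (one LSN sequence), in log order.
    """
    for row in rows:
        if row.op == "DELETE" and row.space_id == space_id:
            return row.lsn - 1
    return None
```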

Backups

The Tarantool storage architecture is append-only: files are only appended to and are never overwritten. The garbage collector removes old files after a checkpoint. In the configuration of the checkpoint daemon fiber, you can set how many previous checkpoints the garbage collector should keep (the checkpoint_count parameter).

Hot backup (memtx)

This is a special case in which all tables are stored in memory.

The latest snapshot file created by Tarantool is a backup of the entire database, and the WAL files made after it are incremental backups. Therefore, taking a backup comes down to copying the latest snapshot file and the WAL files that follow it.

  1. Use tar to make a (possibly compressed) copy of the latest .snap file and the .xlog files that follow it, from the memtx_dir and wal_dir directories.
  2. If your security policy requires it, encrypt the resulting .tar file.
  3. Copy the .tar file to a safe place.

Later, you can restore the database by extracting the contents of the .tar file back into the memtx_dir and wal_dir directories.
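The file selection in step 1 can be scripted. The Python sketch below assumes Tarantool’s usual file naming, where each .snap and .xlog name is a zero-padded number: the LSN sum at which the file starts. It picks the newest snapshot plus the WAL files that may contain later changes. To be conservative, it also keeps the last .xlog that starts before the snapshot. Function names are hypothetical. Encryption (step 2) and copying (step 3) are left out.

```python
# Illustrative sketch; function names are hypothetical.
import tarfile
from pathlib import Path
from typing import List


def _start_lsn(path: Path) -> int:
    # Assumes names like 00000000000000000042.snap / .xlog, where the number
    # is the LSN sum (vclock signature) at which the file starts.
    return int(path.stem)


def files_to_back_up(memtx_dir: str, wal_dir: str) -> List[Path]:
    """Newest .snap plus the .xlog files that may hold changes made after it."""
    snaps = sorted(Path(memtx_dir).glob("*.snap"), key=_start_lsn)
    if not snaps:
        raise FileNotFoundError("no .snap files in " + memtx_dir)
    snap = snaps[-1]
    xlogs = sorted(Path(wal_dir).glob("*.xlog"), key=_start_lsn)
    older = [x for x in xlogs if _start_lsn(x) < _start_lsn(snap)]
    newer = [x for x in xlogs if _start_lsn(x) >= _start_lsn(snap)]
    # Conservatively keep the last WAL file that started before the snapshot,
    # in case it continues past the snapshot's LSN.
    return [snap] + older[-1:] + newer


def make_backup(memtx_dir: str, wal_dir: str, archive: str) -> List[Path]:
    """Write the selected files into a gzip-compressed tar archive."""
    files = files_to_back_up(memtx_dir, wal_dir)
    with tarfile.open(archive, "w:gz") as tar:
        for path in files:
            # .snap and .xlog names cannot collide, so store bare file names.
            tar.add(str(path), arcname=path.name)
    return files
```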

Hot backup (vinyl/memtx)

Vinyl stores its files in vinyl_dir and creates a separate subdirectory for each space in the database. Dump and compaction are append-only processes, so they produce new files. The garbage collector removes the old files after each checkpoint.

To make a mixed backup:

  1. Run box.backup.start() on the administrative console. This suspends garbage collection until box.backup.stop() is called, and returns a list of files to back up.
  2. Copy the files from the list to a safe location. These are the memtx snapshot files and the vinyl .run and index files that correspond to the last checkpoint.
  3. Resume garbage collection with box.backup.stop().
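The key property of this procedure is that step 3 runs even if copying fails. Otherwise garbage collection stays suspended. The Python sketch below shows that pattern. Here start and stop are hypothetical callables you would implement to run box.backup.start() and box.backup.stop() on the admin console, with start returning the file list. The function name is hypothetical too.

```python
# Illustrative sketch; `start`/`stop` are hypothetical hooks you supply.
import shutil
from pathlib import Path
from typing import Callable, List


def copy_checkpoint(start: Callable[[], List[str]],
                    stop: Callable[[], None],
                    dest_dir: str) -> List[Path]:
    """Copy every file reported by `start` into `dest_dir`; always call `stop`."""
    dest = Path(dest_dir)
    files = start()  # garbage collection of checkpoints is now suspended
    copied: List[Path] = []
    try:
        for name in files:
            src = Path(name)
            # Mirror the full source path so that vinyl files from different
            # space/index subdirectories cannot overwrite each other.
            relative = Path(*src.parts[1:]) if src.is_absolute() else src
            target = dest / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)
            copied.append(target)
    finally:
        stop()  # resume garbage collection even if a copy failed
    return copied
```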

Continuous remote backup

Replication is used not only for backup but also for load balancing.

Therefore, taking a backup comes down to updating one of the replicas (if necessary) and then doing a cold backup on it. Since all the other replicas keep running, this is not a cold backup from the end user’s point of view. Such a backup can be performed regularly with a cron job or a Tarantool fiber.

Continuous backup

While the system is running, you need to save records of the changes made since the last cold backup.

For this, you need a special file copy utility (for example, rsync) that works remotely and continuously, copying only the changed parts of a WAL file rather than the whole file.

You can also use an ordinary utility that copies whole files. In that case, new snapshot files or WAL files must be produced frequently as changes occur, so that only the new files need to be copied.
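Because WAL files are only appended to, a simple alternative to a delta-transfer tool is to remember how much of each file was already copied and send only the new tail. The Python sketch below does this for local files. It is not rsync’s algorithm and is valid only for append-only files. A copy taken mid-write may end in a partial record, which the next call completes. The function name is hypothetical.

```python
# Illustrative sketch for append-only files; function name is hypothetical.
import os


def copy_new_bytes(src: str, dst: str, chunk_size: int = 1 << 16) -> int:
    """Append to `dst` whatever `src` gained since the last call.

    The current size of `dst` is taken as the offset already copied.
    Returns the number of bytes copied in this call.
    """
    offset = os.path.getsize(dst) if os.path.exists(dst) else 0
    if os.path.getsize(src) < offset:
        raise ValueError(src + " is shorter than its copy; not append-only?")
    copied = 0
    with open(src, "rb") as s, open(dst, "ab") as d:
        s.seek(offset)
        while True:
            chunk = s.read(chunk_size)
            if not chunk:
                break
            d.write(chunk)
            copied += len(chunk)
    return copied
```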

Upgrades

Upgrading a Tarantool database

If you created a database with an older Tarantool version and then upgraded Tarantool to a newer version, run box.schema.upgrade(). This updates the Tarantool system spaces to match the currently installed version of Tarantool.

For example, here is what happens when you run box.schema.upgrade() on a database created with Tarantool version 1.6.4 (only a small part of the output is shown):

tarantool> box.schema.upgrade()
alter index primary on _space set options to {"unique":true}, parts to [[0,"unsigned"]]
alter space _schema set options to {}
create view _vindex...
grant read access to 'public' role for _vindex view
set schema version to 1.7.0
---
...

Upgrading a Tarantool instance

Tarantool is backward compatible between two adjacent versions. For example, upgrading from Tarantool 1.6 to 1.7, or from 1.7 to 1.8, should cause little or no trouble, whereas a direct migration from 1.6 to 1.8 may run into incompatible changes.

How to upgrade from Tarantool 1.6 to 1.7

This procedure is for upgrading a standalone Tarantool instance in production from 1.6.x to 1.7.x. Note that it always implies some downtime. To upgrade without downtime, you need several Tarantool servers running in a replication cluster (see below).

Tarantool 1.7 uses an incompatible .snap and .xlog file format. Tarantool 1.6 files are supported during the upgrade, but after running under 1.7 for a while you cannot go back to 1.6. Tarantool 1.7 also renames some configuration parameters, but the old parameter names are still supported. The list of breaking changes is available in the Release notes for Tarantool 1.7.

  1. Check with the application developers whether the application files need to be updated because of incompatible changes (see the Release notes for Tarantool 1.7). If so, back up the old application files.
  2. Stop the Tarantool server.
  3. Make a copy of all data (see the subsections on hot backup in the Backups section) and of the package from which the current (old) version was installed, in case you need to roll back.
  4. Update the Tarantool server. See the installation instructions on the Tarantool download page.
  5. Update the Tarantool database. Put the request box.schema.upgrade() inside a box.once() function in your Tarantool initialization file. On startup, Tarantool will then create new system spaces, update data type names (e.g. num -> unsigned, str -> string) and update the list of available data types in the system spaces.
  6. Update the application files, if needed.
  7. Launch the updated Tarantool server using tarantoolctl or systemctl.

Upgrading Tarantool in a replication cluster

Tarantool 1.7 can work as a replica for Tarantool 1.6, and vice versa. Replicas negotiate their capabilities when establishing a connection, and the replication features introduced in 1.7 are not used with 1.6 replicas. This makes it possible to upgrade clustered configurations.

This procedure allows a rolling upgrade without downtime and works for any cluster configuration: master-master or master-replica.

  1. Upgrade Tarantool on all replicas (or on any master in a master-master cluster). See details in the Upgrading a Tarantool instance subsection.

  2. Verify the installation on the replicas:

    1. Start Tarantool.
    2. Attach to the master and start working as before.

    The master runs the old Tarantool version, which is always compatible with the next major version.

  3. Upgrade the master. The procedure is the same as for upgrading a replica.

  4. Verify the master installation:

    1. Start Tarantool in replica mode to get the latest data.
    2. Switch to master mode.

  5. Upgrade the database on any master in the cluster: run box.schema.upgrade(). This updates the Tarantool system spaces to match the currently installed version of Tarantool. The changes propagate to the other nodes of the cluster through the regular replication mechanism.

Notes for specific operating systems

Mac OS

On Mac OS, you can administer Tarantool instances only with tarantoolctl. Native system tools are not supported.

FreeBSD

To make tarantoolctl and the init.d utilities work on FreeBSD, use paths other than those suggested in the Instance configuration section. Use /usr/local/etc/tarantool/ instead of /usr/share/tarantool/ and create the following subdirectories:

  • default for tarantoolctl defaults (see the example below),
  • instances.available for all available instance files, and
  • instances.enabled for instance files that should be started automatically by sysvinit.

Here is what the tarantoolctl defaults look like on FreeBSD:

default_cfg = {
    pid_file   = "/var/run/tarantool", -- /var/run/tarantool/${INSTANCE}.pid
    wal_dir    = "/var/db/tarantool", -- /var/db/tarantool/${INSTANCE}/
    snap_dir   = "/var/db/tarantool", -- /var/db/tarantool/${INSTANCE}
    vinyl_dir  = "/var/db/tarantool", -- /var/db/tarantool/${INSTANCE}
    logger     = "/var/log/tarantool", -- /var/log/tarantool/${INSTANCE}.log
    username   = "tarantool",
}

-- instances.available: all available instances
-- instances.enabled: instances started automatically by sysvinit
instance_dir = "/usr/local/etc/tarantool/instances.available"

Reporting bugs

If you found a bug in Tarantool, you’re doing us a favor by taking the time to tell us about it.

Please create an issue in the Tarantool repository on GitHub. We encourage you to include the following information:

  • Steps needed to reproduce the bug, with an explanation of how the erroneous behavior differs from the expected behavior described in the documentation. Please be as specific as possible. For example, instead of "I can’t get certain information", write "box.space.x:delete() didn’t report what was deleted".
  • Your operating system name and version, the Tarantool name and version, and any unusual details about your machine and its configuration.
  • Related files, such as a stack trace or a Tarantool log file.

If this is a feature request, or if it affects a special category of users, be sure to mention that.

Usually, within one or two workdays, a Tarantool team member will reply to acknowledge the issue, ask clarifying questions, or suggest a workaround.

Replication

Replication allows multiple Tarantool instances to work on copies of the same databases. The databases are kept in sync because each instance can communicate its changes to all the other instances.


Replication architecture

Replication mechanism

A pack of instances that operate on copies of the same databases makes up a replica set. Each instance in a replica set has a role: master or replica.

A replica gets all updates from the master by continuously fetching and applying its write ahead log (WAL). Each record in the WAL represents a single Tarantool data-change request such as INSERT, UPDATE or DELETE, and is assigned a monotonically growing log sequence number (LSN). In essence, Tarantool replication is row-based: each data-change request is fully deterministic and operates on a single tuple. However, unlike a classical row-based log, which contains entire copies of the changed rows, Tarantool’s WAL contains copies of the requests. For example, for UPDATE requests, Tarantool only stores the primary key of the row and the update operations, to save space.
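Here is a toy Python model of why logging requests, rather than row images, still reproduces the same state. It is not Tarantool’s WAL format. An UPDATE is logged as a primary key plus operations, and replaying the log in LSN order rebuilds the same data on any copy. Field numbers here are 0-based for brevity, whereas Tarantool’s update operations use 1-based field numbers. The names are hypothetical.

```python
# Toy model, not Tarantool's WAL format; names are hypothetical.
from typing import Any, Dict, List, Tuple

# Toy space: primary key -> tuple (a list of fields).
Space = Dict[int, List[Any]]
# Logged request: (lsn, op, key, payload). For UPDATE the payload is a list of
# (operator, field_no, value) operations, not the new row image.
Request = Tuple[int, str, int, Any]


def apply(space: Space, request: Request) -> None:
    _lsn, op, key, payload = request
    if op == "INSERT":
        space[key] = list(payload)  # copy, so the log itself is never mutated
    elif op == "DELETE":
        space.pop(key, None)
    elif op == "UPDATE":
        row = space[key]
        for operator, field_no, value in payload:
            if operator == "=":
                row[field_no] = value
            elif operator == "+":
                row[field_no] += value
            else:
                raise ValueError("unsupported operator: " + operator)
    else:
        raise ValueError("unsupported request: " + op)


def replay(log: List[Request]) -> Space:
    """Rebuild a space by applying logged requests in LSN order."""
    space: Space = {}
    for request in sorted(log, key=lambda r: r[0]):
        apply(space, request)
    return space
```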

Invocations of stored programs are not written to the WAL. Instead, records of the actual data-change requests, performed by the Lua code, are written to the WAL. This ensures that possible non-determinism of Lua does not cause replication to go out of sync.

Data definition operations on temporary spaces, such as creating/dropping, adding indexes, truncating, etc., are written to the WAL, since information about temporary spaces is stored in non-temporary system spaces, such as box.space._space. Data change operations on temporary spaces are not written to the WAL and are not replicated.

To create a valid initial state, to which WAL changes can be applied, every instance of a replica set requires a start set of checkpoint files, such as .snap files for memtx and .run files for vinyl. A replica joining an existing replica set chooses an existing master and automatically downloads the initial state from it. This is called an initial join.

When an entire replica set is bootstrapped for the first time, there is no master that could provide the initial checkpoint. In such a case, replicas connect to each other and elect a master, which then creates the starting set of checkpoint files and distributes it across all other replicas. This is called an automatic bootstrap of a replica set.

When a replica contacts a master (there can be many masters) for the first time, it becomes part of a replica set. On subsequent occasions, it should always contact a master in the same replica set. Once connected to the master, the replica requests all changes that happened after the latest local LSN (there can be many LSNs – each master has its own LSN).

Each replica set is identified by a globally unique identifier, called replica set UUID. The identifier is created by the master which creates the very first checkpoint, and is part of the checkpoint file. It is stored in system space box.space._schema. For example:

tarantool> box.space._schema:select{'cluster'}
---
- - ['cluster', '6308acb9-9788-42fa-8101-2e0cb9d3c9a0']
...

Additionally, each instance in a replica set is assigned its own UUID, when it joins the replica set. It is called an instance UUID and is a globally unique identifier. This UUID is used to ensure that instances do not join a different replica set, e.g. because of a configuration error. A unique instance identifier is also necessary to apply rows originating from different masters only once, that is, implement multi-master replication. This is why each row in the write ahead log, in addition to its log sequence number, stores the instance identifier of the instance on which it was created. But using UUID as such an identifier would take too much space in the write ahead log, thus a shorter integer number is assigned to the instance when it joins a replica set. This number is then used to refer to the instance in the write ahead log. It is called instance id. All identifiers are stored in system space box.space._cluster. For example:

tarantool> box.space._cluster:select{}
---
- - [1, '88580b5c-4474-43ab-bd2b-2409a9af80d2']
...

Here the instance ID is 1 (unique within the replica set), and the instance UUID is 88580b5c-4474-43ab-bd2b-2409a9af80d2 (globally unique).

Using shorter numeric identifiers is also handy to track the state of the entire replica set. For example, box.info.vclock describes the state of replication in regard to each connected peer.

tarantool> box.info.vclock
---
- {1: 827, 2: 584}
...

Here vclock contains log sequence numbers (827 and 584) for instances with short identifiers 1 and 2.
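The vclock is what makes "send me everything after my latest LSN" work with several masters, because each origin instance has its own LSN sequence. Below is a simplified Python model of choosing rows for a peer and advancing its vclock so that each row is applied only once. The instance ids, LSNs and payloads are made up, the real protocol carries more state, and the function names are hypothetical.

```python
# Simplified model; not Tarantool's replication protocol.
from typing import Dict, List, Tuple

VClock = Dict[int, int]     # instance id -> last applied LSN
Row = Tuple[int, int, str]  # (origin instance id, LSN, payload)


def rows_to_send(local: VClock, log: List[Row]) -> List[Row]:
    """Rows a peer still needs: per origin instance, anything past its vclock."""
    return [row for row in log if row[1] > local.get(row[0], 0)]


def apply_rows(local: VClock, rows: List[Row]) -> VClock:
    """Advance the vclock; rows at or below it are skipped (applied only once)."""
    result = dict(local)
    for instance, lsn, _payload in rows:
        if lsn > result.get(instance, 0):
            result[instance] = lsn
    return result
```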

Replication setup

To enable replication, you need to specify two parameters in a box.cfg{} request:

  • replication parameter which defines the replication source(s), and
  • read_only parameter which is true for a replica and false for a master.

Both these parameters are "dynamic". This allows a replica to become a master and vice versa on the fly with the help of a box.cfg{} request.

Below is a detailed example of bootstrapping a replica set.

Replication roles: master and replica

The replication role (master or replica) is set in the read_only configuration parameter. The recommended role for all but one instance in a replica set is "read-only" (replica).

In a master-replica configuration, every change that happens on the master will be visible on the replicas, but not vice versa.

[Figure: one master and two replicas, one-way replication]

A simple two-instance replica set with the master on one machine and the replica on a different machine provides two benefits:

  • failover, because if the master goes down then the replica can take over, and
  • load balancing, because clients can connect to either the master or the replica for read requests.

In a master-master configuration (also called "multi-master"), every change that happens on either instance will be visible on the other one.

[Figure: full mesh of three masters]

The failover benefit in this case is still present, and the load-balancing benefit is enhanced, because any instance can handle both read and write requests. Meanwhile, for multi-master configurations, it is necessary to understand the replication guarantees provided by the asynchronous protocol that Tarantool implements.

Tarantool multi-master replication guarantees that each change on each master is propagated to all instances and is applied only once. Changes from the same instance are applied in the same order as on the originating instance. Changes from different instances, however, can mix and apply in a different order on different instances. This may lead to replication going out of sync in certain cases.

For example, assuming the database is only appended to (i.e. it contains only insertions), it is safe to set each instance to a master. If there are also deletions, but it is not mission critical that deletion happens in the same order on all replicas (e.g. the DELETE is used to prune expired data), a master-master configuration is also safe.

UPDATE operations, however, can easily go out of sync. For example, assignment and increment are not commutative, and may yield different results if applied in different order on different instances.

More generally, it is only safe to use Tarantool master-master replication if all database changes are commutative: the end result does not depend on the order in which the changes are applied. Conflict-free replicated data types (CRDTs) are a good starting point for designing such changes.
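A quick way to see the problem with assignment and increment is to apply the same operations in every possible order and compare the results. This Python sketch does that for a single integer field. It tests only one starting value, so it can show that operations do not commute but cannot prove that they always do. The names are hypothetical.

```python
# Illustrative sketch; names are hypothetical.
from itertools import permutations
from typing import Callable, List, Set

Op = Callable[[int], int]


def assign(value: int) -> Op:
    return lambda _old: value


def increment(delta: int) -> Op:
    return lambda old: old + delta


def final_states(initial: int, ops: List[Op]) -> Set[int]:
    """Every result reachable by applying `ops` in some order."""
    results = set()
    for order in permutations(ops):
        value = initial
        for op in order:
            value = op(value)
        results.add(value)
    return results


def commutes(initial: int, ops: List[Op]) -> bool:
    """True if all orders agree for this starting value."""
    return len(final_states(initial, ops)) == 1
```

For a field equal to 0, assign(10) followed by increment(1) gives 11, but the reverse order gives 10. Two increments always sum to the same value in either order.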

Replication topologies: cascade, ring and full mesh

The replication topology is set in the replication configuration parameter. The recommended topology is a full mesh, because it makes potential failover easy.

Some database products offer cascading replication topologies: creating a replica on a replica. Tarantool does not recommend such a setup.

[Figure: cascading replication (not recommended)]

The problem with a cascading replica set is that some instances have no connection to other instances and may not receive changes from them. One essential change that must be propagated across all instances in a replica set is the entry in the box.space._cluster system space that holds an instance’s UUID. Without such an entry, a master refuses to accept connections from that instance when the replication topology changes. Here is how this can happen:

[Figure: cascade problem, a chain of three instances]

We have a chain of three instances. Instance #1 contains entries for instances #1 and #2 in its _cluster space. Instances #2 and #3 contain entries for instances #1, #2 and #3 in their _cluster spaces.

[Figure: cascade problem, instance #2 fails]

Now instance #2 is faulty. Instance #3 tries connecting to instance #1 as its new master, but the master refuses the connection since it has no entry for instance #3.

Ring replication topology is, however, supported:

[Figure: turning the cascade into a ring]

So, if you need a cascading topology, you may first create a ring to ensure all instances know each other’s UUID, and then disconnect the chain in the place you desire.
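Which instances learn about a new _cluster entry depends only on who replicates from whom. A row written on one instance reaches every instance that fetches from it, directly or transitively. The Python model below is a simplification in which instance numbers follow the example above. It shows why instance #1 in the chain never sees the entry for #3, which was written on #2, while in a ring it does. The names are hypothetical.

```python
# Simplified model; names are hypothetical.
from typing import Dict, Set

Topology = Dict[int, Set[int]]  # instance -> instances it replicates FROM


def receivers(topology: Topology, origin: int) -> Set[int]:
    """Instances that eventually apply a row first written on `origin`."""
    reached = {origin}
    changed = True
    while changed:
        changed = False
        for node, upstreams in topology.items():
            if node not in reached and upstreams & reached:
                reached.add(node)
                changed = True
    return reached
```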

A stock recommendation for a master-master replication topology, however, is a full mesh:

[Figure: full mesh of three masters]

You can then decide where to locate instances of the mesh: within the same data center, or spread across a few data centers. Tarantool will automatically ensure that each row is applied only once on each instance. To remove a degraded instance from a mesh, simply change the replication configuration parameter.

This ensures full cluster availability in case of a local failure, e.g. one of the instances failing in one of the data centers, as well as in case of an entire data center failure.

The maximal number of replicas in a mesh is 32.

Bootstrapping a replica set

Master-replica bootstrap

Let’s first bootstrap a simple master-replica set containing two instances, each located on its own machine. For easier administration, we make the instance files almost identical.

[Figure: one master and one replica, connected both ways]

Here is an example of the master’s instance file:

-- instance file for the master
box.cfg{
  listen = 3301,
  replication = {'replicator:password@192.168.0.101:3301',  -- master URI
                 'replicator:password@192.168.0.102:3301'}, -- replica URI
  read_only = false
}
box.once("schema", function()
   box.schema.user.create('replicator', {password = 'password'})
   box.schema.user.grant('replicator', 'replication') -- grant replication role
   box.schema.space.create("test")
   box.space.test:create_index("primary")
   print('box.once executed on master')
end)

where:

  • listen parameter from box.cfg{} defines a URI (port 3301 in our example), on which the master can accept connections from replicas.

  • replication parameter defines the URIs at which all instances in the replica set can accept connections. It includes the replica’s URI as well, although the replica is not a replication source right now.

    Note

    For security reasons, we recommend preventing unauthorized replication sources by associating a password with every user that has the replication role. That way, each URI in the replication parameter must have the long form username:password@host:port.

  • read_only parameter set to false enables data-change operations on the instance and makes this Tarantool instance act as a master, not as a replica. That’s the only parameter in our instance files that will differ.

  • box.once() function contains database initialization logic that should be executed only once during the replica set lifetime.

In this example, we create a space with a primary index, and a user for replication purposes. We also say print('box.once executed on master') to see later in console whether box.once() is executed.
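Before deploying instance files, you may want to check that every replication URI uses the long username:password@host:port form recommended above. Below is a Python sketch of such a pre-flight check. The function is hypothetical and is not Tarantool’s own URI parser. It handles only this TCP form and rejects others, such as URIs without credentials.

```python
# Illustrative pre-flight check; not Tarantool's URI parser.
from typing import NamedTuple


class ReplicationURI(NamedTuple):
    user: str
    password: str
    host: str
    port: int


def parse_replication_uri(uri: str) -> ReplicationURI:
    """Split 'username:password@host:port'; reject URIs without a password."""
    credentials, sep, address = uri.rpartition("@")
    if not sep:
        raise ValueError("no credentials in " + uri)
    user, sep, password = credentials.partition(":")
    if not sep or not user or not password:
        raise ValueError("user and password are required: " + uri)
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("host:port expected: " + uri)
    return ReplicationURI(user, password, host, int(port))
```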

Note

Replication requires privileges. We can grant privileges for accessing spaces directly to the user who will start the instance. However, it is more usual to grant privileges for accessing spaces to a role, and then grant the role to the user who will start the replica.

Here we use Tarantool’s predefined role named "replication", which by default grants "read" privileges for all database objects ("universe"), and we can further set up privileges for this role as required.

In the replica’s instance file, we only set the read_only parameter to true and say print('box.once executed on replica') to make sure that box.once() is not executed more than once. Otherwise, the replica’s instance file is identical to the master’s instance file.

-- instance file for the replica
box.cfg{
  listen = 3301,
  replication = {'replicator:password@192.168.0.101:3301',  -- master URI
                 'replicator:password@192.168.0.102:3301'}, -- replica URI
  read_only = true
}
box.once("schema", function()
   box.schema.user.create('replicator', {password = 'password'})
   box.schema.user.grant('replicator', 'replication') -- grant replication role
   box.schema.space.create("test")
   box.space.test:create_index("primary")
   print('box.once executed on replica')
end)

Note

The replica does not inherit the master’s configuration parameters, such as those making the checkpoint daemon run on the master. To get the same behavior, please set the relevant parameters explicitly so that they are the same on both master and replica.

Now we can launch the two instances. The master…

$ # launching the master
$ tarantool master.lua
2017-06-14 14:12:03.847 [18933] main/101/master.lua C> version 1.7.4-52-g980d30092
2017-06-14 14:12:03.848 [18933] main/101/master.lua C> log level 5
2017-06-14 14:12:03.849 [18933] main/101/master.lua I> mapping 268435456 bytes for tuple arena...
2017-06-14 14:12:03.859 [18933] iproto/101/main I> binary: bound to [::]:3301
2017-06-14 14:12:03.861 [18933] main/105/applier/replicator@192.168.0. I> can't connect to master
2017-06-14 14:12:03.861 [18933] main/105/applier/replicator@192.168.0. coio.cc:105 !> SystemError connect, called on fd 14, aka 192.168.0.102:56736: Connection refused
2017-06-14 14:12:03.861 [18933] main/105/applier/replicator@192.168.0. I> will retry every 1 second
2017-06-14 14:12:03.861 [18933] main/104/applier/replicator@192.168.0. I> remote master is 1.7.4 at 192.168.0.101:3301
2017-06-14 14:12:19.878 [18933] main/105/applier/replicator@192.168.0. I> remote master is 1.7.4 at 192.168.0.102:3301
2017-06-14 14:12:19.879 [18933] main/101/master.lua I> initializing an empty data directory
2017-06-14 14:12:19.908 [18933] snapshot/101/main I> saving snapshot `/var/lib/tarantool/master/00000000000000000000.snap.inprogress'
2017-06-14 14:12:19.914 [18933] snapshot/101/main I> done
2017-06-14 14:12:19.914 [18933] main/101/master.lua I> vinyl checkpoint done
2017-06-14 14:12:19.917 [18933] main/101/master.lua I> ready to accept requests
2017-06-14 14:12:19.918 [18933] main/105/applier/replicator@192.168.0. I> failed to authenticate
2017-06-14 14:12:19.918 [18933] main/105/applier/replicator@192.168.0. xrow.cc:431 E> ER_LOADING: Instance bootstrap hasn't finished yet
box.once executed on master
2017-06-14 14:12:19.920 [18933] main C> entering the event loop

… (yep, box.once() got executed on the master) – and the replica:

$ # launching the replica
$ tarantool replica.lua
2017-06-14 14:12:19.486 [18934] main/101/replica.lua C> version 1.7.4-52-g980d30092
2017-06-14 14:12:19.486 [18934] main/101/replica.lua C> log level 5
2017-06-14 14:12:19.487 [18934] main/101/replica.lua I> mapping 268435456 bytes for tuple arena...
2017-06-14 14:12:19.494 [18934] iproto/101/main I> binary: bound to [::]:3301
2017-06-14 14:12:19.495 [18934] main/104/applier/replicator@192.168.0. I> remote master is 1.7.4 at 192.168.0.101:3301
2017-06-14 14:12:19.495 [18934] main/105/applier/replicator@192.168.0. I> remote master is 1.7.4 at 192.168.0.102:3301
2017-06-14 14:12:19.496 [18934] main/104/applier/replicator@192.168.0. I> failed to authenticate
2017-06-14 14:12:19.496 [18934] main/104/applier/replicator@192.168.0. xrow.cc:431 E> ER_LOADING: Instance bootstrap hasn't finished yet

In both logs, there are messages saying that the replica got bootstrapped from the master:

$ # bootstrapping the replica (from the master's log)
<...>
2017-06-14 14:12:20.503 [18933] main/106/main I> initial data sent.
2017-06-14 14:12:20.505 [18933] relay/[::ffff:192.168.0.101]:/101/main I> recover from `/var/lib/tarantool/master/00000000000000000000.xlog'
2017-06-14 14:12:20.505 [18933] main/106/main I> final data sent.
2017-06-14 14:12:20.522 [18933] relay/[::ffff:192.168.0.101]:/101/main I> recover from `/var/lib/tarantool/master/00000000000000000000.xlog'
2017-06-14 14:12:20.922 [18933] main/105/applier/replicator@192.168.0. I> authenticated
$ # bootstrapping the replica (from the replica's log)
<...>
2017-06-14 14:12:20.498 [18934] main/104/applier/replicator@192.168.0. I> authenticated
2017-06-14 14:12:20.498 [18934] main/101/replica.lua I> bootstrapping replica from 192.168.0.101:3301
2017-06-14 14:12:20.512 [18934] main/104/applier/replicator@192.168.0. I> initial data received
2017-06-14 14:12:20.512 [18934] main/104/applier/replicator@192.168.0. I> final data received
2017-06-14 14:12:20.517 [18934] snapshot/101/main I> saving snapshot `/var/lib/tarantool/replica/00000000000000000005.snap.inprogress'
2017-06-14 14:12:20.518 [18934] snapshot/101/main I> done
2017-06-14 14:12:20.519 [18934] main/101/replica.lua I> vinyl checkpoint done
2017-06-14 14:12:20.520 [18934] main/101/replica.lua I> ready to accept requests
2017-06-14 14:12:20.520 [18934] main/101/replica.lua I> set 'read_only' configuration option to true
2017-06-14 14:12:20.520 [18934] main C> entering the event loop

Notice that box.once() was executed only at the master, although we added box.once() to both instance files.

We could just as well launch the replica first:

$ # launching the replica
$ tarantool replica.lua
2017-06-14 14:35:36.763 [18952] main/101/replica.lua C> version 1.7.4-52-g980d30092
2017-06-14 14:35:36.765 [18952] main/101/replica.lua C> log level 5
2017-06-14 14:35:36.765 [18952] main/101/replica.lua I> mapping 268435456 bytes for tuple arena...
2017-06-14 14:35:36.772 [18952] iproto/101/main I> binary: bound to [::]:3301
2017-06-14 14:35:36.772 [18952] main/104/applier/replicator@192.168.0. I> can't connect to master
2017-06-14 14:35:36.772 [18952] main/104/applier/replicator@192.168.0. coio.cc:105 !> SystemError connect, called on fd 13, aka 192.168.0.101:56820: Connection refused
2017-06-14 14:35:36.772 [18952] main/104/applier/replicator@192.168.0. I> will retry every 1 second
2017-06-14 14:35:36.772 [18952] main/105/applier/replicator@192.168.0. I> remote master is 1.7.4 at 192.168.0.102:3301

… and the master later:

$ # launching the master
$ tarantool master.lua
2017-06-14 14:35:43.701 [18953] main/101/master.lua C> version 1.7.4-52-g980d30092
2017-06-14 14:35:43.702 [18953] main/101/master.lua C> log level 5
2017-06-14 14:35:43.702 [18953] main/101/master.lua I> mapping 268435456 bytes for tuple arena...
2017-06-14 14:35:43.709 [18953] iproto/101/main I> binary: bound to [::]:3301
2017-06-14 14:35:43.709 [18953] main/105/applier/replicator@192.168.0. I> remote master is 1.7.4 at 192.168.0.102:3301
2017-06-14 14:35:43.709 [18953] main/104/applier/replicator@192.168.0. I> remote master is 1.7.4 at 192.168.0.101:3301
2017-06-14 14:35:43.709 [18953] main/101/master.lua I> initializing an empty data directory
2017-06-14 14:35:43.721 [18953] snapshot/101/main I> saving snapshot `/var/lib/tarantool/master/00000000000000000000.snap.inprogress'
2017-06-14 14:35:43.722 [18953] snapshot/101/main I> done
2017-06-14 14:35:43.723 [18953] main/101/master.lua I> vinyl checkpoint done
2017-06-14 14:35:43.723 [18953] main/101/master.lua I> ready to accept requests
2017-06-14 14:35:43.724 [18953] main/105/applier/replicator@192.168.0. I> failed to authenticate
2017-06-14 14:35:43.724 [18953] main/105/applier/replicator@192.168.0. xrow.cc:431 E> ER_LOADING: Instance bootstrap hasn't finished yet
box.once executed on master
2017-06-14 14:35:43.726 [18953] main C> entering the event loop
2017-06-14 14:35:43.779 [18953] main/103/main I> initial data sent.
2017-06-14 14:35:43.780 [18953] relay/[::ffff:192.168.0.101]:/101/main I> recover from `/var/lib/tarantool/master/00000000000000000000.xlog'
2017-06-14 14:35:43.780 [18953] main/103/main I> final data sent.
2017-06-14 14:35:43.796 [18953] relay/[::ffff:192.168.0.102]:/101/main I> recover from `/var/lib/tarantool/master/00000000000000000000.xlog'
2017-06-14 14:35:44.726 [18953] main/105/applier/replicator@192.168.0. I> authenticated

In this case, the replica waits for the master to become available, so the launch order doesn’t matter. Our box.once() logic is still executed only once, at the master.

$ # the replica has eventually connected to the master
$ # and got bootstrapped (from the replica's log)
2017-06-14 14:35:43.777 [18952] main/104/applier/replicator@192.168.0. I> remote master is 1.7.4 at 192.168.0.101:3301
2017-06-14 14:35:43.777 [18952] main/104/applier/replicator@192.168.0. I> authenticated
2017-06-14 14:35:43.777 [18952] main/101/replica.lua I> bootstrapping replica from 192.168.0.101:3301
2017-06-14 14:35:43.788 [18952] main/104/applier/replicator@192.168.0. I> initial data received
2017-06-14 14:35:43.789 [18952] main/104/applier/replicator@192.168.0. I> final data received
2017-06-14 14:35:43.793 [18952] snapshot/101/main I> saving snapshot `/var/lib/tarantool/replica/00000000000000000005.snap.inprogress'
2017-06-14 14:35:43.793 [18952] snapshot/101/main I> done
2017-06-14 14:35:43.795 [18952] main/101/replica.lua I> vinyl checkpoint done
2017-06-14 14:35:43.795 [18952] main/101/replica.lua I> ready to accept requests
2017-06-14 14:35:43.795 [18952] main/101/replica.lua I> set 'read_only' configuration option to true
2017-06-14 14:35:43.795 [18952] main C> entering the event loop

Controlled failover

To perform a controlled failover, that is, to swap the roles of the master and the replica, all we need to do is set read_only=true at the master and read_only=false at the replica. The order of actions is important here. If a system is running in production, we don’t want concurrent writes to happen at both the replica and the master. Nor do we want the new master (the former replica) to accept any writes until it has finished fetching all replication data from the old master. To compare the replica and master states, we can use box.info.signature.

  1. Set read_only=true at the master.

    # at the master
    tarantool> box.cfg{read_only=true}
    
  2. Record the master’s current position with box.info.signature, containing the sum of all LSNs in the master’s vector clock.

    # at the master
    tarantool> box.info.signature
    
  3. Wait until the replica’s signature is the same as the master’s.

    # at the replica
    tarantool> box.info.signature
    
  4. Set read_only=false at the replica to enable write operations.

    # at the replica
    tarantool> box.cfg{read_only=false}
    

These 4 steps ensure that the replica doesn’t accept new writes until it’s done fetching writes from the master.
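Step 3 can be automated with a small polling loop. The sketch below is an assumption, not part of Tarantool: the function name wait_for_signature, the default timeout and the polling step are illustrative. The signature getter and the sleep function are passed in as arguments, so the loop itself runs in plain Lua. On a real replica you would pass a function returning box.info.signature and fiber.sleep.

```lua
-- Hypothetical helper (not a Tarantool API): poll a signature until it
-- reaches the value recorded at the master, or give up after a timeout.
--   get_signature: function returning the current signature
--   target:        the master's box.info.signature from step 2
--   sleep:         function(seconds), e.g. fiber.sleep inside Tarantool
--   timeout, step: seconds (illustrative defaults)
local function wait_for_signature(get_signature, target, sleep, timeout, step)
    timeout = timeout or 60
    step = step or 0.1
    local waited = 0
    -- keep polling while the replica's signature is below the master's
    while get_signature() < target do
        if waited >= timeout then
            return false -- not caught up yet: do not proceed to step 4
        end
        sleep(step)
        waited = waited + step
    end
    return true
end
```

Inside Tarantool, this might be called on the replica as wait_for_signature(function() return box.info.signature end, S, require('fiber').sleep, 60), where S is the number recorded at the master in step 2. box.cfg{read_only=false} would then be issued only if the call returns true.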

Master-master bootstrap

Now let’s bootstrap a two-instance master-master set. For easier administration, we make master#1 and master#2 instance files fully identical.


We re-use the master’s instance file from the master-replica example above.

-- instance file for any of the two masters
box.cfg{
  listen      = 3301,
  replication = {'replicator:password@192.168.0.101:3301',  -- master1 URI
                 'replicator:password@192.168.0.102:3301'}, -- master2 URI
  read_only   = false
}
box.once("schema", function()
   box.schema.user.create('replicator', {password = 'password'})
   box.schema.user.grant('replicator', 'replication') -- grant replication role
   box.schema.space.create("test")
   box.space.test:create_index("primary")
   print('box.once executed on master #1')
end)

In replication parameter, we define the URIs of both masters in the replica set and say print('box.once executed on master #1') to see when and where the box.once() logic is executed.

Now we can launch the two masters. Again, the launch order doesn’t matter. The box.once() logic will also be executed only once, at the master which is elected as the replica set leader at bootstrap.

$ # launching master #1
$ tarantool master1.lua
2017-06-14 15:39:03.062 [47021] main/101/master1.lua C> version 1.7.4-52-g980d30092
2017-06-14 15:39:03.062 [47021] main/101/master1.lua C> log level 5
2017-06-14 15:39:03.063 [47021] main/101/master1.lua I> mapping 268435456 bytes for tuple arena...
2017-06-14 15:39:03.065 [47021] iproto/101/main I> binary: bound to [::]:3301
2017-06-14 15:39:03.065 [47021] main/105/applier/replicator@192.168.0.10 I> can't connect to master
2017-06-14 15:39:03.065 [47021] main/105/applier/replicator@192.168.0.10 coio.cc:107 !> SystemError connect, called on fd 14, aka 192.168.0.102:57110: Connection refused
2017-06-14 15:39:03.065 [47021] main/105/applier/replicator@192.168.0.10 I> will retry every 1 second
2017-06-14 15:39:03.065 [47021] main/104/applier/replicator@192.168.0.10 I> remote master is 1.7.4 at 192.168.0.101:3301
2017-06-14 15:39:08.070 [47021] main/105/applier/replicator@192.168.0.10 I> remote master is 1.7.4 at 192.168.0.102:3301
2017-06-14 15:39:08.071 [47021] main/105/applier/replicator@192.168.0.10 I> authenticated
2017-06-14 15:39:08.071 [47021] main/101/master1.lua I> bootstrapping replica from 192.168.0.102:3301
2017-06-14 15:39:08.073 [47021] main/105/applier/replicator@192.168.0.10 I> initial data received
2017-06-14 15:39:08.074 [47021] main/105/applier/replicator@192.168.0.10 I> final data received
2017-06-14 15:39:08.074 [47021] snapshot/101/main I> saving snapshot `/Users/e.shebunyaeva/work/tarantool-test-repl/master1_dir/00000000000000000008.snap.inprogress'
2017-06-14 15:39:08.074 [47021] snapshot/101/main I> done
2017-06-14 15:39:08.076 [47021] main/101/master1.lua I> vinyl checkpoint done
2017-06-14 15:39:08.076 [47021] main/101/master1.lua I> ready to accept requests
box.once executed on master #1
2017-06-14 15:39:08.077 [47021] main C> entering the event loop
$ # launching master #2
$ tarantool master2.lua
2017-06-14 15:39:07.452 [47022] main/101/master2.lua C> version 1.7.4-52-g980d30092
2017-06-14 15:39:07.453 [47022] main/101/master2.lua C> log level 5
2017-06-14 15:39:07.453 [47022] main/101/master2.lua I> mapping 268435456 bytes for tuple arena...
2017-06-14 15:39:07.455 [47022] iproto/101/main I> binary: bound to [::]:3301
2017-06-14 15:39:07.455 [47022] main/104/applier/replicator@192.168.0.19 I> remote master is 1.7.4 at 192.168.0.101:3301
2017-06-14 15:39:07.455 [47022] main/105/applier/replicator@192.168.0.10 I> remote master is 1.7.4 at 192.168.0.102:3301
2017-06-14 15:39:07.455 [47022] main/101/master2.lua I> initializing an empty data directory
2017-06-14 15:39:07.457 [47022] snapshot/101/main I> saving snapshot `/Users/e.shebunyaeva/work/tarantool-test-repl/master2_dir/00000000000000000000.snap.inprogress'
2017-06-14 15:39:07.457 [47022] snapshot/101/main I> done
2017-06-14 15:39:07.458 [47022] main/101/master2.lua I> vinyl checkpoint done
2017-06-14 15:39:07.459 [47022] main/101/master2.lua I> ready to accept requests
2017-06-14 15:39:07.460 [47022] main C> entering the event loop
2017-06-14 15:39:08.072 [47022] main/103/main I> initial data sent.
2017-06-14 15:39:08.073 [47022] relay/[::ffff:192.168.0.102]:/101/main I> recover from `/Users/e.shebunyaeva/work/tarantool-test-repl/master2_dir/00000000000000000000.xlog'
2017-06-14 15:39:08.073 [47022] main/103/main I> final data sent.
2017-06-14 15:39:08.077 [47022] relay/[::ffff:192.168.0.102]:/101/main I> recover from `/Users/e.shebunyaeva/work/tarantool-test-repl/master2_dir/00000000000000000000.xlog'
2017-06-14 15:39:08.461 [47022] main/104/applier/replicator@192.168.0.10 I> authenticated

Adding instances

Adding a replica


To add a second replica instance to the master-replica set from our bootstrapping example, we need an analog of the instance file that we created for the first replica in that set:

-- instance file for replica #2
box.cfg{
  listen = 3301,
  replication = {'replicator:password@192.168.0.101:3301',  -- master URI
                 'replicator:password@192.168.0.102:3301',  -- replica #1 URI
                 'replicator:password@192.168.0.103:3301'}, -- replica #2 URI
  read_only = true
}
box.once("schema", function()
   box.schema.user.create('replicator', {password = 'password'})
   box.schema.user.grant('replicator', 'replication') -- grant replication role
   box.schema.space.create("test")
   box.space.test:create_index("primary")
   print('box.once executed on replica #2')
end)

Here we add the replica #2 URI to the replication parameter, so it now contains three URIs.

After we launch the new replica instance, it gets connected to the master instance and retrieves the master’s write ahead log and snapshot files:

$ # launching replica #2
$ tarantool replica2.lua
2017-06-14 14:54:33.927 [46945] main/101/replica2.lua C> version 1.7.4-52-g980d30092
2017-06-14 14:54:33.927 [46945] main/101/replica2.lua C> log level 5
2017-06-14 14:54:33.928 [46945] main/101/replica2.lua I> mapping 268435456 bytes for tuple arena...
2017-06-14 14:54:33.930 [46945] main/104/applier/replicator@192.168.0.10 I> remote master is 1.7.4 at 192.168.0.101:3301
2017-06-14 14:54:33.930 [46945] main/104/applier/replicator@192.168.0.10 I> authenticated
2017-06-14 14:54:33.930 [46945] main/101/replica2.lua I> bootstrapping replica from 192.168.0.101:3301
2017-06-14 14:54:33.933 [46945] main/104/applier/replicator@192.168.0.10 I> initial data received
2017-06-14 14:54:33.933 [46945] main/104/applier/replicator@192.168.0.10 I> final data received
2017-06-14 14:54:33.934 [46945] snapshot/101/main I> saving snapshot `/var/lib/tarantool/replica2/00000000000000000010.snap.inprogress'
2017-06-14 14:54:33.934 [46945] snapshot/101/main I> done
2017-06-14 14:54:33.935 [46945] main/101/replica2.lua I> vinyl checkpoint done
2017-06-14 14:54:33.935 [46945] main/101/replica2.lua I> ready to accept requests
2017-06-14 14:54:33.935 [46945] main/101/replica2.lua I> set 'read_only' configuration option to true
2017-06-14 14:54:33.936 [46945] main C> entering the event loop

Since we’re adding a read-only instance, there is no need to dynamically update replication parameter on the other running instances. This update would be required if we added a master instance.

However, we recommend specifying the replica #2 URI in all instance files of the replica set. This keeps all the files consistent with each other and with the current replication topology, and so helps to avoid configuration errors in case of further reconfigurations and replica set restart.
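One way to keep the replication lists identical across instance files is to generate them from a single list of hosts. The helper below is hypothetical (it is not part of Tarantool). The user name, password, hosts and port are the values used in this example.

```lua
-- Hypothetical helper: build replication URIs in the long form
-- username:password@host:port for every host in the replica set.
local function replication_uris(user, password, hosts, port)
    local uris = {}
    for i, host in ipairs(hosts) do
        uris[i] = string.format('%s:%s@%s:%d', user, password, host, port)
    end
    return uris
end

-- The same host list would be shared by all instance files.
local HOSTS = {'192.168.0.101', '192.168.0.102', '192.168.0.103'}
local REPLICATION = replication_uris('replicator', 'password', HOSTS, 3301)

-- In an instance file this could then be used as:
--   box.cfg{listen = 3301, replication = REPLICATION, read_only = true}
```

Adding an instance then means changing HOSTS in one place rather than editing three literal lists by hand.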

Adding a master


To add a third master instance to the master-master set from our bootstrapping example, we need an analog of the instance files that we created to bootstrap the other master instances in that set:

-- instance file for master #3
box.cfg{
  listen      = 3301,
  replication = {'replicator:password@192.168.0.101:3301',  -- master#1 URI
                 'replicator:password@192.168.0.102:3301',  -- master#2 URI
                 'replicator:password@192.168.0.103:3301'}, -- master#3 URI
  read_only   = true, -- temporarily read-only
}
box.once("schema", function()
   box.schema.user.create('replicator', {password = 'password'})
   box.schema.user.grant('replicator', 'replication') -- grant "replication" role
   box.schema.space.create("test")
   box.space.test:create_index("primary")
end)

Here we make the following changes:

  • Add master#3 URI to replication parameter.
  • Temporarily specify read_only=true to disable data-change operations on the instance. After launch, master #3 will act as a replica until it retrieves all data from the other masters in the replica set.

After we launch the third master instance, it gets connected to the other master instances and retrieves their write ahead logs and snapshot files:

$ # launching master #3
$ tarantool master3.lua
2017-06-14 17:10:00.556 [47121] main/101/master3.lua C> version 1.7.4-52-g980d30092
2017-06-14 17:10:00.557 [47121] main/101/master3.lua C> log level 5
2017-06-14 17:10:00.557 [47121] main/101/master3.lua I> mapping 268435456 bytes for tuple arena...
2017-06-14 17:10:00.559 [47121] iproto/101/main I> binary: bound to [::]:3301
2017-06-14 17:10:00.559 [47121] main/104/applier/replicator@192.168.0.10 I> remote master is 1.7.4 at 192.168.0.101:3301
2017-06-14 17:10:00.559 [47121] main/105/applier/replicator@192.168.0.10 I> remote master is 1.7.4 at 192.168.0.102:3301
2017-06-14 17:10:00.559 [47121] main/106/applier/replicator@192.168.0.10 I> remote master is 1.7.4 at 192.168.0.103:3301
2017-06-14 17:10:00.559 [47121] main/105/applier/replicator@192.168.0.10 I> authenticated
2017-06-14 17:10:00.559 [47121] main/101/master3.lua I> bootstrapping replica from 192.168.0.102:3301
2017-06-14 17:10:00.562 [47121] main/105/applier/replicator@192.168.0.10 I> initial data received
2017-06-14 17:10:00.562 [47121] main/105/applier/replicator@192.168.0.10 I> final data received
2017-06-14 17:10:00.562 [47121] snapshot/101/main I> saving snapshot `/Users/e.shebunyaeva/work/tarantool-test-repl/master3_dir/00000000000000000009.snap.inprogress'
2017-06-14 17:10:00.562 [47121] snapshot/101/main I> done
2017-06-14 17:10:00.564 [47121] main/101/master3.lua I> vinyl checkpoint done
2017-06-14 17:10:00.564 [47121] main/101/master3.lua I> ready to accept requests
2017-06-14 17:10:00.565 [47121] main/101/master3.lua I> set 'read_only' configuration option to true
2017-06-14 17:10:00.565 [47121] main C> entering the event loop
2017-06-14 17:10:00.565 [47121] main/104/applier/replicator@192.168.0.10 I> authenticated

Next, we add the master #3 URI to the replication parameter on the two existing masters. Replication-related parameters are dynamic, so we only need to make a box.cfg{} request on each of the running instances:

# adding master #3 URI to replication sources
tarantool> box.cfg{replication =
         > {'replicator:password@192.168.0.101:3301',
         > 'replicator:password@192.168.0.102:3301',
         > 'replicator:password@192.168.0.103:3301'}}
---
...

When master #3 catches up with the other masters’ state, we can disable read-only mode for this instance:

# making master #3 a real master
tarantool> box.cfg{read_only=false}
---
...

We also recommend specifying the master #3 URI in all instance files to keep all the files consistent with each other and with the current replication topology.
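To judge whether master #3 has caught up, you can compare vector clocks component by component. This is stricter than comparing the summed box.info.signature values. The sketch below is an assumption rather than a built-in check. The function name is hypothetical, and a vclock is modelled as a Lua table mapping instance id to LSN, the shape shown by box.info.vclock.

```lua
-- Hypothetical check: true if local_vclock has reached every component
-- of peer_vclock (instance id -> LSN). Missing components count as 0.
local function vclock_caught_up(local_vclock, peer_vclock)
    for id, lsn in pairs(peer_vclock) do
        if (local_vclock[id] or 0) < lsn then
            return false
        end
    end
    return true
end
```

For example, you could read box.info.vclock on each of the other masters and run box.cfg{read_only=false} on master #3 only when the check returns true for all of them.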

Removing instances

To gracefully remove an instance from a replica set, follow these steps:

  1. On the instance, run box.cfg{} with a blank replication source:

    tarantool> box.cfg{replication=''}
    ---
    ...
    

    The other instances in the replica set will carry on. If the removed instance rejoins later, it will receive all the updates that the other instances made while it was away.

  2. If the instance is decommissioned forever, delete the instance’s record from the following locations:

    1. replication parameter at all running instances in the replica set:

      tarantool> box.cfg{replication=...}
      
    2. box.space._cluster on any master instance in the replica set. For example, a record with instance id = 3:

      tarantool> box.space._cluster:select{}
      ---
      - - [1, '913f99c8-aee3-47f2-b414-53ed0ec5bf27']
        - [2, 'eac1aee7-cfeb-46cc-8503-3f8eb4c7de1e']
        - [3, '97f2d65f-2e03-4dc8-8df3-2469bd9ce61e']
      ...
      tarantool> box.space._cluster:delete(3)
      ---
      - [3, '97f2d65f-2e03-4dc8-8df3-2469bd9ce61e']
      ...
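For step 2.1, each running instance needs a replication list that no longer contains the removed instance's URI. Below is a hypothetical helper (not part of Tarantool) that builds such a list. It assumes box.cfg.replication holds a table of URI strings rather than a single string.

```lua
-- Hypothetical helper: return a new list with every occurrence of
-- `removed` filtered out; the input table is left unchanged.
local function without_uri(uris, removed)
    local result = {}
    for _, uri in ipairs(uris) do
        if uri ~= removed then
            result[#result + 1] = uri
        end
    end
    return result
end

-- On each running instance, for example:
--   box.cfg{replication = without_uri(box.cfg.replication,
--            'replicator:password@192.168.0.103:3301')}
```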
      

Monitoring a replica set

To learn which instances belong to the replica set and obtain statistics for all of them, use the box.info.replication request:

box.info.replication
---
  replication:
    1:
      id: 1
      uuid: b8a7db60-745f-41b3-bf68-5fcce7a1e019
      lsn: 88
    2:
      id: 2
      uuid: cd3c7da2-a638-4c5d-ae63-e7767c3a6896
      lsn: 31
      upstream:
        status: follow
        idle: 43.187747001648
        lag: 0
      downstream:
        vclock: {1: 31}
    3:
      id: 3
      uuid: e38ef895-5804-43b9-81ac-9f2cd872b9c4
      lsn: 54
      upstream:
        status: follow
        idle: 43.187621831894
        lag: 2
      downstream:
        vclock: {1: 54}
...

This report is for a master-master replica set of three instances, each having its own instance id, UUID and log sequence number.


The request was issued at master #1, and the reply includes statistics for the other two masters, as seen from master #1.

The primary indicators of replication health are the idle and lag parameters (see the reference on box.info.replication for details).
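As a rough illustration of watching these indicators, the sketch below scans a table shaped like box.info.replication. It reports peers whose upstream status is not follow or whose lag exceeds a threshold. The function name and the threshold are assumptions; what counts as an acceptable lag depends on the installation.

```lua
-- Hypothetical monitoring helper: list peers that need attention.
--   replication: table shaped like box.info.replication (id -> info)
--   max_lag:     lag threshold in seconds (an example value, tune it)
local function unhealthy_peers(replication, max_lag)
    local problems = {}
    for id, peer in pairs(replication) do
        local up = peer.upstream
        -- entries without an upstream section (such as the instance's
        -- own entry) are skipped
        if up ~= nil then
            if up.status ~= 'follow' then
                problems[#problems + 1] =
                    string.format('%d: status %s', id, tostring(up.status))
            elseif (up.lag or 0) > max_lag then
                problems[#problems + 1] =
                    string.format('%d: lag %.3f', id, up.lag)
            end
        end
    end
    table.sort(problems) -- deterministic order for reporting
    return problems
end
```

In a console on master #1, this could be called as unhealthy_peers(box.info.replication, 1).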

Recovering from a degraded state

"Degraded state" is a situation in which the master becomes unavailable due to a hardware or network failure, or due to a programming bug.


In a master-replica set, if a master disappears, error messages appear on the replicas stating that the connection is lost:

$ # messages from a replica's log
2017-06-14 16:23:10.993 [19153] main/105/applier/replicator@192.168.0. I> can't read row
2017-06-14 16:23:10.993 [19153] main/105/applier/replicator@192.168.0. coio.cc:349 !> SystemError
unexpected EOF when reading from socket, called on fd 17, aka 192.168.0.101:57815,
peer of 192.168.0.101:3301: Broken pipe
2017-06-14 16:23:10.993 [19153] main/105/applier/replicator@192.168.0. I> will retry every 1 second
2017-06-14 16:23:10.993 [19153] relay/[::ffff:192.168.0.101]:/101/main I> the replica has closed its socket, exiting
2017-06-14 16:23:10.993 [19153] relay/[::ffff:192.168.0.101]:/101/main C> exiting the relay loop

… and the master’s status is reported as "disconnected":

# report from replica #1
tarantool> box.info.replication
---
- 1:
    id: 1
    uuid: 70e8e9dc-e38d-4046-99e5-d25419267229
    lsn: 542
    upstream:
      status: disconnected
      idle: 182.36929893494
      message: connect, called on fd 13, aka 192.168.0.101:58244
      lag: 0.00026607513427734
  2:
    id: 2
    uuid: fb252ac7-5c34-4459-84d0-54d248b8c87e
    lsn: 0
  3:
    id: 3
    uuid: fd7681d8-255f-4237-b8bb-c4fb9d99024d
    lsn: 0
    downstream:
      vclock: {1: 542}
...
# report from replica #2
box.info.replication
---
- 1:
    id: 1
    uuid: 70e8e9dc-e38d-4046-99e5-d25419267229
    lsn: 542
    upstream:
      status: disconnected
      idle: 186.76988101006
      message: connect, called on fd 13, aka 192.168.0.101:58253
      lag: 0.00027203559875488
  2:
    id: 2
    uuid: fb252ac7-5c34-4459-84d0-54d248b8c87e
    lsn: 0
    upstream:
      status: follow
      idle: 186.76960110664
      lag: 0.00020599365234375
  3:
    id: 3
    uuid: fd7681d8-255f-4237-b8bb-c4fb9d99024d
    lsn: 0
...

To declare that one of the replicas must now take over as a new master:

  1. Make sure that the old master is gone for good:
    • change network routing rules to avoid any more packets being delivered to the master, or
    • shut down the master instance, if you have access to the machine, or
    • power off the container or the machine.
  2. Say box.cfg{read_only=false, listen=URI} on the replica, and box.cfg{replication=URI} on the other replicas in the set.

Note

If there are updates on the old master that were not propagated before the old master went down, re-apply them manually to the new master using tarantoolctl cat and tarantoolctl play commands.

There is no automatic way for a replica to detect that the master is gone forever, since sources of failure and replication environments vary significantly. So the detection of degraded state requires an external observer.

Reseeding a replica

If any of a replica’s .xlog/.snap/.run files are corrupted or deleted, you can "re-seed" the replica:

  1. Stop the replica and destroy all local database files (the ones with extensions .xlog/.snap/.run/.inprogress).

  2. Delete the replica’s record from the following locations:

    1. replication parameter at all running instances in the replica set.
    2. box.space._cluster on the master instance.

    See section Removing instances for details.

  3. Restart the replica with the same instance file to contact the master again. The replica will then catch up with the master by retrieving all the master’s tuples.

Note

Remember that this procedure works only if the master’s WAL files are present.

Preventing duplicate actions

Tarantool guarantees that every update is applied only once on every replica. However, due to the asynchronous nature of replication, the order of updates is not guaranteed. Below we analyze this problem in more detail, provide examples of replication going out of sync, and suggest solutions.

Replication stops

In a replica set of two masters, suppose master #1 tries to do something that master #2 has already done. For example, both masters try to insert a tuple with the same unique key at the same time:

tarantool> box.space.tester:insert{1, 'data'}

This would cause an error saying Duplicate key exists in unique index 'primary' in space 'tester' and the replication would be stopped.

$ # error messages from master #1
2017-06-26 21:17:03.233 [30444] main/104/applier/rep_user@100.96.166.1 I> can't read row
2017-06-26 21:17:03.233 [30444] main/104/applier/rep_user@100.96.166.1 memtx_hash.cc:226 E> ER_TUPLE_FOUND:
Duplicate key exists in unique index 'primary' in space 'tester'
2017-06-26 21:17:03.233 [30444] relay/[::ffff:100.96.166.178]/101/main I> the replica has closed its socket, exiting
2017-06-26 21:17:03.233 [30444] relay/[::ffff:100.96.166.178]/101/main C> exiting the relay loop

$ # error messages from master #2
2017-06-26 21:17:03.233 [30445] main/104/applier/rep_user@100.96.166.1 I> can't read row
2017-06-26 21:17:03.233 [30445] main/104/applier/rep_user@100.96.166.1 memtx_hash.cc:226 E> ER_TUPLE_FOUND:
Duplicate key exists in unique index 'primary' in space 'tester'
2017-06-26 21:17:03.234 [30445] relay/[::ffff:100.96.166.178]/101/main I> the replica has closed its socket, exiting
2017-06-26 21:17:03.234 [30445] relay/[::ffff:100.96.166.178]/101/main C> exiting the relay loop

If we check replication statuses with box.info, we’ll see that replication at master #1 is stopped (1.upstream.status = stopped). Additionally, no data is replicated from that master (section 1.downstream is missing in the report), because the downstream has encountered the same error:

# replication statuses (report from master #3)
tarantool> box.info
---
- version: 1.7.4-52-g980d30092
  id: 3
  ro: false
  vclock: {1: 9, 2: 1000000, 3: 3}
  uptime: 557
  lsn: 3
  vinyl: []
  cluster:
    uuid: 34d13b1a-f851-45bb-8f57-57489d3b3c8b
  pid: 30445
  status: running
  signature: 1000012
  replication:
    1:
      id: 1
      uuid: 7ab6dee7-dc0f-4477-af2b-0e63452573cf
      lsn: 9
      upstream:
        status: stopped
        idle: 445.8626639843
        message: Duplicate key exists in unique index 'primary' in space 'tester'
        lag: 0.00050592422485352
    2:
      id: 2
      uuid: 9afbe2d9-db84-4d05-9a7b-e0cbbf861e28
      lsn: 1000000
      upstream:
        status: follow
        idle: 201.99915885925
        lag: 0.0015020370483398
      downstream:
        vclock: {1: 8, 2: 1000000, 3: 3}
    3:
      id: 3
      uuid: e826a667-eed7-48d5-a290-64299b159571
      lsn: 3
  uuid: e826a667-eed7-48d5-a290-64299b159571
...

When replication is later manually resumed:

# resuming stopped replication (at all masters)
tarantool> original_value = box.cfg.replication
tarantool> box.cfg{replication={}}
tarantool> box.cfg{replication=original_value}

… the faulty row in the write ahead log files is skipped.

Replication runs out of sync

In a master-master cluster of two instances, suppose we perform the following operation:

tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})

When this operation is applied on both instances in the replica set:

-- at master #1
tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})
-- at master #2
tarantool> box.space.tester:upsert({1}, {{'=', 2, box.info.uuid}})

… we can have the following results, depending on the order of execution:

  • each master’s row contains the uuid from master #1,
  • each master’s row contains the uuid from master #2,
  • master #1 has the uuid of master #2, and vice versa.
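The third outcome can be reproduced in a toy model. The Python sketch below is not Tarantool code: it assumes the row {1, ...} already exists on both masters, so each upsert simply overwrites field 2, and it uses placeholder strings instead of real uuids.

```python
# Toy model, not Tarantool code: with the row already present,
# upsert({1}, {{'=', 2, uuid}}) just overwrites field 2.
def assign_field2(row, uuid):
    return (row[0], uuid)

def run(order, initial=(1, None)):
    """Apply the assignments on one master in the given order."""
    row = initial
    for uuid in order:
        row = assign_field2(row, uuid)
    return row

U1, U2 = "uuid-of-master-1", "uuid-of-master-2"  # placeholder values

# Each master applies its own write first and the replicated one second:
print(run([U1, U2]), run([U2, U1]))  # (1, 'uuid-of-master-2') (1, 'uuid-of-master-1')
# Both masters happen to apply the writes in the same order:
print(run([U1, U2]), run([U1, U2]))  # both rows end with 'uuid-of-master-2'
```

Whichever assignment is applied last wins, so two masters that see the writes in different orders end up with different rows.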

Commutative changes

The cases described in the previous paragraphs are examples of non-commutative operations, i.e. operations whose result depends on the execution order. For commutative operations, on the contrary, the execution order does not matter.

Consider for example the following command:

tarantool> box.space.tester:upsert({1, 0}, {{'+', 2, 1}})

This operation is commutative: we get the same result no matter in which order the update is applied on the other masters.
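A toy model makes the contrast with the previous case visible. This Python sketch assumes the row {1, 0} already exists, so each upsert only applies its '+' operation. (When the row is absent, upsert inserts the given tuple and skips that call's operations. With identical +1 operations, as in the command above, that does not change the outcome.) The deltas 1, 2 and 3 are hypothetical and are used only to make the different orders distinguishable.

```python
from itertools import permutations

# Toy model, not Tarantool code: the row {1, 0} is assumed to exist already,
# so each upsert only applies its {{'+', 2, delta}} operation to field 2.
def add_to_field2(row, delta):
    return (row[0], row[1] + delta)

def run(order, initial=(1, 0)):
    """Apply the increments on one master in the given order."""
    row = initial
    for delta in order:
        row = add_to_field2(row, delta)
    return row

deltas = [1, 2, 3]  # hypothetical increments coming from three masters
results = {run(order) for order in permutations(deltas)}
print(results)  # {(1, 6)}: all six orders end with the same row

# For contrast, an '=' operation keeps whichever value happens to be applied last.
print(sorted({order[-1] for order in permutations(deltas)}))  # [1, 2, 3]
```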

Connectors

This chapter documents APIs for various programming languages.

Protocol

Tarantool’s binary protocol was designed with a focus on asynchronous I/O and easy integration with proxies. Each client request starts with a variable-length binary header, containing request id, request type, instance id, log sequence number, and so on.

The request header also always includes the request length, which simplifies data processing. A response is sent as soon as it is ready. The response header carries the same id and request type as the original request. The id makes it easy to match a response to its request, even if responses arrive in a different order than the requests were sent.
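On the client side, matching out-of-order responses by id usually comes down to a table of pending requests. The Python sketch below is hypothetical bookkeeping, not code from any Tarantool connector. It assumes each request gets a fresh id (in the packet header this id is called sync), and it leaves out the actual socket I/O.

```python
from itertools import count

class PendingRequests:
    """Hypothetical client-side table that matches responses to requests by id."""

    def __init__(self):
        self._ids = count(1)
        self._pending = {}

    def register(self, request_type, payload):
        request_id = next(self._ids)
        self._pending[request_id] = (request_type, payload)
        return request_id  # a real client would now write the packet to the socket

    def complete(self, request_id, response):
        # pop() raises KeyError for an id that was never sent or already answered
        request_type, payload = self._pending.pop(request_id)
        return request_type, payload, response

    def outstanding(self):
        return len(self._pending)

client = PendingRequests()
a = client.register("select", [1])
b = client.register("insert", [2, "x"])
print(client.complete(b, "ok: inserted"))  # the second response arrives first
print(client.complete(a, "ok: 1 tuple"))
print(client.outstanding())                # 0
```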

You only need to go into the details of the Tarantool protocol when developing a new Tarantool connector; see the full description of Tarantool’s binary protocol in the form of annotated BNF (Backus-Naur Form) diagrams. In all other cases, it is enough to take an existing connector for your programming language. Such connectors make it easy to store data structures from various languages in Tarantool’s format.

Packet example

The Tarantool API exists so that a client program can send a request packet to a server instance and receive a response. Here is an example of what the client would send for box.space[513]:insert{'A', 'BB'}. The BNF description of the components is on the page about Tarantool’s binary protocol.

Component                             | Byte #0 | Byte #1 | Byte #2 | Byte #3
code for insert                       | 02      |         |         |
rest of header                        | ...     | ...     | ...     | ...
16-bit number: space id = 513         | cd      | 02      | 01      |
code for tuple                        | 21      |         |         |
1-byte array header: field count = 2  | 92      |         |         |
1-character string: field[1]          | a1      | 41      |         |
2-character string: field[2]          | a2      | 42      | 42      |
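The byte values in the table follow standard MsgPack rules, so they can be reproduced with a few lines of code. This Python sketch encodes only the table rows that come after "rest of header". It is not a complete packet encoder, and the helper names are hypothetical.

```python
import struct

def mp_uint(n):
    """Shortest MsgPack encoding of an unsigned integer below 2**32."""
    if n < 0x80:
        return bytes([n])                          # positive fixint
    if n <= 0xFF:
        return b"\xcc" + bytes([n])                # uint 8
    if n <= 0xFFFF:
        return b"\xcd" + struct.pack(">H", n)      # uint 16, big-endian
    if n <= 0xFFFFFFFF:
        return b"\xce" + struct.pack(">I", n)      # uint 32, big-endian
    raise ValueError("value too large for this sketch")

def mp_fixstr(s):
    data = s.encode("utf-8")
    if len(data) > 31:
        raise ValueError("only fixstr (at most 31 bytes) is handled here")
    return bytes([0xA0 | len(data)]) + data        # a1 41 for "A"

def mp_fixarray_header(count):
    if count > 15:
        raise ValueError("only fixarray (at most 15 items) is handled here")
    return bytes([0x90 | count])                   # 92 for two fields

def table_rows(space_id, fields):
    """Bytes of the table rows after "rest of header", in table order."""
    out = mp_uint(space_id)                        # cd 02 01 for 513
    out += mp_uint(0x21)                           # 21, the code for tuple
    out += mp_fixarray_header(len(fields))
    for field in fields:
        out += mp_fixstr(field)
    return out

print(table_rows(513, ["A", "BB"]).hex())          # cd02012192a141a24242
```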

Now, you could send that packet to the Tarantool instance and interpret the response (the page about Tarantool’s binary protocol describes the packet format for responses as well as requests). But it would be easier, and less error-prone, to invoke a routine that formats the packet according to typed parameters, something like response = tarantool_routine("insert", 513, "A", "BB");. That is why connector APIs exist for Perl, Python, PHP, and other languages.

Setting up the environment for connector examples

This chapter has examples that show how to connect to a Tarantool instance via the Perl, PHP, Python, node.js, and C connectors. The examples contain hard-coded values, so they will work only if the following conditions are met:

  • the Tarantool server is running on the local machine (localhost = 127.0.0.1) and is listening on port 3301 (box.cfg.listen = '3301'),
  • space examples has id = 999 (box.space.examples.id = 999) and has a primary-key index for a numeric field (box.space[999].index[0].parts[1].type = "unsigned"),
  • user 'guest' has privileges to read and write.

It is easy to meet all the conditions by starting the instance and executing this script:

box.cfg{listen=3301}
box.schema.space.create('examples',{id=999})
box.space.examples:create_index('primary', {type = 'hash', parts = {1, 'unsigned'}})
box.schema.user.grant('guest','read,write','space','examples')
box.schema.user.grant('guest','read','space','_space')

Perl

The most commonly used Perl driver is tarantool-perl. It is not supplied as part of the Tarantool repository; it must be installed separately. The most common way to install it is by cloning from GitHub.

To avoid minor warnings that may appear the first time tarantool-perl is installed, start by installing some other modules that tarantool-perl uses, with CPAN, the Comprehensive Perl Archive Network:

$ sudo cpan install AnyEvent
$ sudo cpan install Devel::GlobalDestruction

Then, to install tarantool-perl itself, say:

$ git clone https://github.com/tarantool/tarantool-perl.git tarantool-perl
$ cd tarantool-perl
$ git submodule init
$ git submodule update --recursive
$ perl Makefile.PL
$ make
$ sudo make install

Here is a complete Perl program that inserts [99999,'BB'] into space[999] via the Perl API. Before trying to run, check that the server instance is listening at localhost:3301 and that the space examples exists, as described earlier. To run, paste the code into a file named example.pl and say perl example.pl. The program will connect using an application-specific definition of the space. The program will open a socket connection with the Tarantool instance at localhost:3301, then send an INSERT request, then, if all is well, end without displaying any messages. If Tarantool is not running on localhost with listen port = 3301, the program will print "Connection refused".

#!/usr/bin/perl
use DR::Tarantool ':constant', 'tarantool';
use DR::Tarantool ':all';
use DR::Tarantool::MsgPack::SyncClient;

my $tnt = DR::Tarantool::MsgPack::SyncClient->connect(
  host    => '127.0.0.1',                      # look for tarantool on localhost
  port    => 3301,                             # on port 3301
  user    => 'guest',                          # username. for 'guest' we do not also say 'password=>...'

  spaces  => {
    999 => {                                   # definition of space[999] ...
      name => 'examples',                      #   space[999] name = 'examples'
      default_type => 'STR',                   #   space[999] field type is 'STR' if undefined
      fields => [ {                            #   definition of space[999].fields ...
          name => 'field1', type => 'NUM' } ], #     space[999].field[1] name='field1',type='NUM'
      indexes => {                             #   definition of space[999] indexes ...
        0 => {
          name => 'primary', fields => [ 'field1' ] } } } } );

$tnt->insert('examples' => [ 99999, 'BB' ]);

The example program uses field type names 'STR' and 'NUM' instead of 'string' and 'unsigned', due to a temporary Perl limitation.

The example program only shows one request and does not show all that’s necessary for good practice. For that, please see the tarantool-perl repository.

PHP

The most commonly used PHP driver is tarantool-php. It is not supplied as part of the Tarantool repository; it must be installed separately, for example with git. See the installation instructions in the driver’s README file.

Here is a complete PHP program that inserts [99999,'BB'] into a space named examples via the PHP API. Before trying to run, check that the server instance is listening at localhost:3301 and that the space examples exists, as described earlier. To run, paste the code into a file named example.php and say php -d extension=~/tarantool-php/modules/tarantool.so example.php. The program will open a socket connection with the Tarantool instance at localhost:3301, then send an INSERT request, then, if all is well, print "Insert succeeded". If the tuple already exists, the program will print "Exception: Duplicate key exists in unique index 'primary' in space 'examples'".

<?php
$tarantool = new Tarantool('localhost', 3301);

try {
    $tarantool->insert('examples', array(99999, 'BB'));
    echo "Insert succeeded\n";
} catch (Exception $e) {
    echo "Exception: ", $e->getMessage(), "\n";
}

The example program only shows one request and does not show all that’s necessary for good practice. For that, please see tarantool/tarantool-php project at GitHub.

Alternatively, you can use a PHP driver from another GitHub project: it includes a client (see tarantool-php/client) and a mapper for that client (see tarantool-php/mapper).

Python

Here is a complete Python program that inserts [99999,'Value','Value'] into space examples via the high-level Tarantool API for Python.

#!/usr/bin/python
from tarantool import Connection

c = Connection("127.0.0.1", 3301)
result = c.insert("examples",(99999,'Value', 'Value'))
print(result)

To prepare, paste the code into a file named example.py and install the tarantool-python connector with either pip install 'tarantool>0.4' to install in /usr (requires root privilege) or pip install 'tarantool>0.4' --user to install in ~, i.e. the user’s default directory. Quote the requirement as shown: unquoted, the shell treats > as an output redirection. Before trying to run, check that the server instance is listening at localhost:3301 and that the space examples exists, as described earlier. To run the program, say python example.py. The program will connect to the Tarantool server, send the INSERT request, and will not throw any exception if all went well. If the tuple already exists, the program will throw tarantool.error.DatabaseError: (3, "Duplicate key exists in unique index 'primary' in space 'examples'").

The example program only shows one request and does not show all that’s necessary for good practice. For that, please see tarantool-python project at GitHub. For an example of using Python API with queue managers for Tarantool, see queue-python project at GitHub.

Node.js

The most commonly used node.js driver is the Node Tarantool driver. It is not supplied as part of the Tarantool repository; it must be installed separately. The most common way to install it is with npm. For example, on Ubuntu, the installation could look like this after npm has been installed:

npm install tarantool-driver --global

Here is a complete node.js program that inserts [99999,'BB'] into space[999] via the node.js API. Before trying to run, check that the server instance is listening at localhost:3301 and that the space examples exists, as described earlier. To run, paste the code into a file named example.js and say node example.js. The program refers to the space by its numeric id, 999. The program will open a socket connection with the Tarantool instance at localhost:3301, then send an INSERT request, then, if all is well, print "Insert succeeded" and exit. If Tarantool is not running on localhost with listen port = 3301, the program will print "Connect failed". If user 'guest' does not have authorization to connect, the program will print "Auth failed". If the insert request fails for any reason, for example because the tuple already exists, the program will print "Insert failed".

var TarantoolConnection = require('tarantool-driver');
var conn = new TarantoolConnection({port: 3301});
var insertTuple = [99999, "BB"];
conn.connect().then(function() {
    conn.auth("guest", "").then(function() {
        conn.insert(999, insertTuple).then(function() {
            console.log("Insert succeeded");
            process.exit(0);
        }, function(e) { console.log("Insert failed");  process.exit(1); });
    }, function(e) { console.log("Auth failed");    process.exit(1); });
}, function(e) { console.log("Connect failed"); process.exit(1); });

The example program only shows one request and does not show all that’s necessary for good practice. For that, please see the node.js driver repository.

C#

The most commonly used C# driver is progaudi.tarantool, previously named tarantool-csharp. It is not supplied as part of the Tarantool repository; it must be installed separately. The makers recommend cross-platform installation using Nuget.

To be consistent with the other instructions in this chapter, here is a way to install the driver directly on Ubuntu 16.04.

  1. Install .NET Core from Microsoft. Follow the .NET Core installation instructions.

Note

  • Mono will not work, nor will .NET from xbuild. Only .NET Core is supported on Linux and Mac.
  • Read the Microsoft End User License Agreement first, because it is not an ordinary open-source agreement and there will be a message during installation saying "This software may collect information about you and your use of the software, and send that to Microsoft." You can, however, set environment variables to opt out of telemetry.
  2. Create a new console project.

    $ cd ~
    $ mkdir progaudi.tarantool.test
    $ cd progaudi.tarantool.test
    $ dotnet new console
    
  3. Add a reference to progaudi.tarantool.

    $ dotnet add package progaudi.tarantool
    
  4. Replace the code in Program.cs.

    $ cat <<EOT > Program.cs
    using System;
    using System.Threading.Tasks;
    using ProGaudi.Tarantool.Client;
    
    public class HelloWorld
    {
      static public void Main ()
      {
        Test().GetAwaiter().GetResult();
      }
      static async Task Test()
      {
        var box = await Box.Connect("127.0.0.1:3301");
        var schema = box.GetSchema();
        var space = await schema.GetSpace("examples");
        await space.Insert((99999, "BB"));
      }
    }
    EOT
    
  5. Build and run your application.

    Before trying to run, check that the server is listening at localhost:3301 and that the space examples exists, as described earlier.

    $ dotnet restore
    $ dotnet run
    

    The program will:

    • open a socket connection with the Tarantool server at localhost:3301,
    • look up the space examples by name through the schema,
    • send an INSERT request and, if all is well, end without printing anything.

    If Tarantool is not running on localhost with listen port = 3301, or if user 'guest' does not have authorization to connect, or if the INSERT request fails for any reason, the program will print an error message along with other details such as a stack trace.

The example program only shows one request and does not show all that’s necessary for good practice. For that, please see the progaudi.tarantool driver repository.

C

This section gives two examples of using the high-level Tarantool API for C.

Example 1

Here is a complete C program that inserts the tuple [99999,'B'] into space examples via the high-level Tarantool API for C.

#include <stdio.h>
#include <stdlib.h>

#include <tarantool/tarantool.h>
#include <tarantool/tnt_net.h>
#include <tarantool/tnt_opt.h>

int main(void) {
   struct tnt_stream *tnt = tnt_net(NULL);          /* See below = SETUP */
   tnt_set(tnt, TNT_OPT_URI, "localhost:3301");
   if (tnt_connect(tnt) < 0) {                      /* See below = CONNECT */
       printf("Connection refused\n");
       exit(-1);
   }
   struct tnt_stream *tuple = tnt_object(NULL);     /* See below = MAKE REQUEST */
   tnt_object_format(tuple, "[%d%s]", 99999, "B");
   tnt_insert(tnt, 999, tuple);                     /* See below = SEND REQUEST */
   tnt_flush(tnt);
   struct tnt_reply reply;  tnt_reply_init(&reply); /* See below = GET REPLY */
   tnt->read_reply(tnt, &reply);
   if (reply.code != 0) {
       printf("Insert failed %lu.\n", (unsigned long) reply.code);
   }
   tnt_reply_free(&reply);
   tnt_close(tnt);                                  /* See below = TEARDOWN */
   tnt_stream_free(tuple);
   tnt_stream_free(tnt);
   return 0;
}

Paste the code into a file named example.c and install the tarantool-c connector. Here is one way to install tarantool-c (on Ubuntu):

$ git clone https://github.com/tarantool/tarantool-c.git ~/tarantool-c
$ cd ~/tarantool-c
$ git submodule init
$ git submodule update
$ cmake .
$ make
$ sudo make install

To compile and link the example program, say:

$ # sometimes this is necessary:
$ export LD_LIBRARY_PATH=/usr/local/lib
$ gcc -o example example.c -ltarantool

Before trying to run, check that a server instance is listening at localhost:3301 and that the space examples exists, as described earlier. To run the program, say ./example. The program will connect to the Tarantool instance and send the request. If Tarantool is not running on localhost with listen port = 3301, the program will print "Connection refused". If the insert fails, the program will print "Insert failed" and an error number (see all error codes in the file src/box/errcode.h of the Tarantool source tree).

Here are notes corresponding to the comments in the example program.

SETUP: The setup begins by creating a stream (tnt_stream).

struct tnt_stream *tnt = tnt_net(NULL);
tnt_set(tnt, TNT_OPT_URI, "localhost:3301");

In this program, the stream will be named tnt. Before connecting on the tnt stream, some options may have to be set. The most important option is TNT_OPT_URI. In this program, the URI is localhost:3301, since that is where the Tarantool instance is supposed to be listening.

Function description:

struct tnt_stream *tnt_net(struct tnt_stream *s)
int tnt_set(struct tnt_stream *s, int option, variant option-value)

CONNECT: Now that the stream named tnt exists and is associated with a URI, this example program can connect to a server instance.

if (tnt_connect(tnt) < 0)
   { printf("Connection refused\n"); exit(-1); }

Function description:

int tnt_connect(struct tnt_stream *s)

The connection might fail for a variety of reasons, such as: the server is not running, or the URI contains an invalid password. If the connection fails, the return value will be -1.

MAKE REQUEST: Most requests require passing structured data, such as the contents of a tuple.

struct tnt_stream *tuple = tnt_object(NULL);
tnt_object_format(tuple, "[%d%s]", 99999, "B");

In this program, the request will be an INSERT, and the tuple contents will be an integer and a string. This is a simple serial set of values, that is, there are no sub-structures or arrays. Therefore it is easy in this case to format what will be passed using the same sort of arguments that one would use with a C printf() function: %d for the integer, %s for the string, then the integer value, then a pointer to the string value.

Function description:

ssize_t tnt_object_format(struct tnt_stream *s, const char *fmt, ...)
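To see what a format string like "[%d%s]" stands for, here is a hypothetical Python mini-formatter that understands only a flat "[...]" array of %d and %s placeholders. It assumes the shortest MsgPack encoding for each value, which is the usual MsgPack practice. The real tnt_object_format() supports far more, and this sketch does not claim to match its output byte for byte.

```python
import struct

def encode_uint(n):
    """Shortest MsgPack encoding of a non-negative integer."""
    if n < 0:
        raise ValueError("negative integers are not handled in this sketch")
    if n < 0x80:
        return bytes([n])                      # positive fixint
    if n <= 0xFF:
        return b"\xcc" + bytes([n])            # uint 8
    if n <= 0xFFFF:
        return b"\xcd" + struct.pack(">H", n)  # uint 16
    if n <= 0xFFFFFFFF:
        return b"\xce" + struct.pack(">I", n)  # uint 32
    return b"\xcf" + struct.pack(">Q", n)      # uint 64

def encode_fixstr(s):
    data = s.encode("utf-8")
    if len(data) > 31:
        raise ValueError("only strings of up to 31 bytes are handled here")
    return bytes([0xA0 | len(data)]) + data

def format_flat_array(fmt, *args):
    """Hypothetical mini formatter: a flat "[...]" of %d / %s placeholders only."""
    if not (fmt.startswith("[") and fmt.endswith("]")):
        raise ValueError("expected a flat array format such as '[%d%s]'")
    specs = fmt[1:-1].split("%")[1:]           # "[%d%s]" -> ["d", "s"]
    if len(specs) != len(args) or len(specs) > 15:
        raise ValueError("placeholders and arguments do not match")
    out = bytes([0x90 | len(specs)])           # fixarray header
    for spec, arg in zip(specs, args):
        if spec == "d":
            out += encode_uint(arg)
        elif spec == "s":
            out += encode_fixstr(arg)
        else:
            raise ValueError("unsupported placeholder: %" + spec)
    return out

print(format_flat_array("[%d%s]", 99999, "B").hex())  # 92ce0001869fa142
```

Because 99999 does not fit in 16 bits, it needs the 32-bit unsigned form (ce followed by four bytes), unlike the space id 513 in the packet example.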

SEND REQUEST: Data-manipulation requests are sent in much the same way as with Tarantool’s box library.

tnt_insert(tnt, 999, tuple);
tnt_flush(tnt);

In this program, the request is an INSERT. We pass it the tnt stream, which was used earlier to set up the connection, and the tuple stream, which was set up earlier with tnt_object_format().

Function description:

ssize_t tnt_insert(struct tnt_stream *s, uint32_t space, struct tnt_stream *tuple)
ssize_t tnt_replace(struct tnt_stream *s, uint32_t space, struct tnt_stream *tuple)
ssize_t tnt_select(struct tnt_stream *s, uint32_t space, uint32_t index,
                   uint32_t limit, uint32_t offset, uint8_t iterator,
                   struct tnt_stream *key)
ssize_t tnt_update(struct tnt_stream *s, uint32_t space, uint32_t index,
                   struct tnt_stream *key, struct tnt_stream *ops)

GET REPLY: For most requests, the client receives a reply that indicates whether the request succeeded and also contains a set of tuples.

struct tnt_reply reply;  tnt_reply_init(&reply);
tnt->read_reply(tnt, &reply);
if (reply.code != 0)
   { printf("Insert failed %lu.\n", reply.code); }

This program checks whether the request succeeded but does not interpret the rest of the reply.

Function description:

struct tnt_reply *tnt_reply_init(struct tnt_reply *r)
tnt->read_reply(struct tnt_stream *s, struct tnt_reply *r)
void tnt_reply_free(struct tnt_reply *r)

TEARDOWN: When the session ends, the connection made with tnt_connect() should be closed, and the objects created in the setup stage should be freed.

tnt_close(tnt);
tnt_stream_free(tuple);
tnt_stream_free(tnt);

Function description:

void tnt_close(struct tnt_stream *s)
void tnt_stream_free(struct tnt_stream *s)

Example 2

Here is another complete C program that selects, by index key [99999], from space examples via the high-level Tarantool API for C. To display the results, the program uses functions from the MsgPuck library, which are needed to decode arrays of values in MessagePack format.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <tarantool/tarantool.h>
#include <tarantool/tnt_net.h>
#include <tarantool/tnt_opt.h>

#define MP_SOURCE 1
#include <msgpuck.h>

int main(void) {
    struct tnt_stream *tnt = tnt_net(NULL);
    tnt_set(tnt, TNT_OPT_URI, "localhost:3301");
    if (tnt_connect(tnt) < 0) {
        printf("Connection refused\n");
        exit(1);
    }
    struct tnt_stream *tuple = tnt_object(NULL);
    tnt_object_format(tuple, "[%d]", 99999); /* tuple = the search key */
    /* space 999, index 0, limit UINT32_MAX, offset 0, iterator 0 (EQ) */
    tnt_select(tnt, 999, 0, UINT32_MAX, 0, 0, tuple);
    tnt_flush(tnt);
    struct tnt_reply reply; tnt_reply_init(&reply);
    tnt->read_reply(tnt, &reply);
    if (reply.code != 0) {
        printf("Select failed.\n");
        exit(1);
    }
    enum mp_type field_type;
    field_type = mp_typeof(*reply.data);
    if (field_type != MP_ARRAY) {
        printf("no tuple array\n");
        exit(1);
    }
    uint32_t tuple_count = mp_decode_array(&reply.data);
    printf("tuple count=%u\n", tuple_count);
    unsigned int i, j;
    for (i = 0; i < tuple_count; ++i) {
        field_type = mp_typeof(*reply.data);
        if (field_type != MP_ARRAY) {
            printf("no field array\n");
            exit(1);
        }
        uint32_t field_count = mp_decode_array(&reply.data);
        printf("  field count=%u\n", field_count);
        for (j = 0; j < field_count; ++j) {
            field_type = mp_typeof(*reply.data);
            if (field_type == MP_UINT) {
                uint64_t num_value = mp_decode_uint(&reply.data);
                printf("    value=%" PRIu64 ".\n", num_value);
            } else if (field_type == MP_STR) {
                const char *str_value;
                uint32_t str_value_length;
                str_value = mp_decode_str(&reply.data, &str_value_length);
                printf("    value=%.*s.\n", (int) str_value_length, str_value);
            } else {
                printf("wrong field type\n");
                exit(1);
            }
        }
    }
    tnt_reply_free(&reply);
    tnt_close(tnt);
    tnt_stream_free(tuple);
    tnt_stream_free(tnt);
    return 0;
}

As with the first example, save the source code in a file named example2.c.

To compile and link the example program, say:

$ gcc -o example2 example2.c -ltarantool

To run the program, say ./example2.

These two programs show only two kinds of requests. For full-scale work with Tarantool via the C API, please see the documentation of the tarantool-c project on GitHub.

Interpreting function return values

With any Tarantool connector, functions called via Tarantool return values in MsgPack format. If a function is called via the connector API, the return values are formatted as follows: scalar values are returned as tuples (first a MsgPack type identifier, then the value); all other (non-scalar) values are returned as a group of tuples (first a MsgPack array identifier, then the scalar values). But if a function is called via the binary protocol command layer (with eval) rather than via the connector API, no such format change takes place.

In the following example, a Lua function will be created. Since it will be accessed externally by the 'guest' user, an execute privilege must be granted. The function returns an empty array, a scalar string, two booleans, and a short integer. The values are the ones described in the table Common Types and MsgPack Encodings.

tarantool> box.cfg{listen=3301}
2016-03-03 18:45:52.802 [27381] main/101/interactive I> ready to accept requests
---
...
tarantool> function f() return {},'a',false,true,127; end
---
...
tarantool> box.schema.func.create('f')
---
...
tarantool> box.schema.user.grant('guest','execute','function','f')
---
...

Here is a C program that calls this Lua function. Although the example is written in C, the result would be the same whatever language the calling program is written in: Perl, PHP, Python, Go, or Java.

#include <stdio.h>
#include <stdlib.h>
#include <tarantool/tarantool.h>
#include <tarantool/tnt_net.h>
#include <tarantool/tnt_opt.h>

int main(void) {
   struct tnt_stream *tnt = tnt_net(NULL);              /* SETUP */
   tnt_set(tnt, TNT_OPT_URI, "localhost:3301");
   if (tnt_connect(tnt) < 0) {                          /* CONNECT */
       printf("Connection refused\n");
       exit(-1);
   }
   struct tnt_stream *arg = tnt_object(NULL);           /* MAKE REQUEST */
   tnt_object_add_array(arg, 0);                        /* f() takes no arguments */
   struct tnt_request *req1 = tnt_request_call(NULL);   /* CALL function f() */
   tnt_request_set_funcz(req1, "f");
   tnt_request_set_tuple(req1, arg);                    /* attach the argument array */
   tnt_request_compile(tnt, req1);
   tnt_flush(tnt);                                      /* SEND REQUEST */
   struct tnt_reply reply;  tnt_reply_init(&reply);     /* GET REPLY */
   tnt->read_reply(tnt, &reply);
   if (reply.code != 0) {
       printf("Call failed %lu.\n", (unsigned long) reply.code);
       exit(-1);
   }
   const unsigned char *p = (const unsigned char *) reply.data; /* PRINT REPLY */
   while (p < (const unsigned char *) reply.data_end) {
       printf("%x ", *p);
       ++p;
   }
   printf("\n");
   tnt_reply_free(&reply);
   tnt_close(tnt);                                      /* TEARDOWN */
   tnt_stream_free(arg);
   tnt_stream_free(tnt);
   return 0;
}

When the program finishes, it prints the following values:

dd 0 0 0 5 90 91 a1 61 91 c2 91 c3 91 7f

The first five bytes, dd 0 0 0 5, are the MsgPack encoding of "32-bit array header with value 5" (see the MsgPack specification). The remaining values are described in the table Common Types and MsgPack Encodings.
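The same bytes can be decoded mechanically. The Python sketch below is a minimal MsgPack decoder written for illustration. The name mp_decode is hypothetical, and only the types needed here are handled. Fed the output above, it shows a five-element array in which the empty table stays an empty array and each scalar result is wrapped in its own one-element array.

```python
import struct

def mp_decode(buf, pos=0):
    """Decode one MsgPack value starting at buf[pos]; return (value, next_pos).
    Handles only nil, booleans, unsigned ints, fixstr and arrays."""
    b = buf[pos]
    pos += 1
    if b <= 0x7F:                                  # positive fixint
        return b, pos
    if 0x90 <= b <= 0x9F:                          # fixarray
        return _decode_items(buf, pos, b & 0x0F)
    if 0xA0 <= b <= 0xBF:                          # fixstr
        n = b & 0x1F
        return buf[pos:pos + n].decode("utf-8"), pos + n
    if b == 0xC0:
        return None, pos
    if b == 0xC2:
        return False, pos
    if b == 0xC3:
        return True, pos
    if b == 0xCC:                                  # uint 8
        return buf[pos], pos + 1
    if b == 0xCD:                                  # uint 16
        return struct.unpack_from(">H", buf, pos)[0], pos + 2
    if b == 0xCE:                                  # uint 32
        return struct.unpack_from(">I", buf, pos)[0], pos + 4
    if b == 0xDC:                                  # array 16
        n = struct.unpack_from(">H", buf, pos)[0]
        return _decode_items(buf, pos + 2, n)
    if b == 0xDD:                                  # array 32
        n = struct.unpack_from(">I", buf, pos)[0]
        return _decode_items(buf, pos + 4, n)
    raise ValueError("type byte 0x%02x is not handled in this sketch" % b)

def _decode_items(buf, pos, count):
    items = []
    for _ in range(count):
        item, pos = mp_decode(buf, pos)
        items.append(item)
    return items, pos

# The bytes printed by the C program, written as hex pairs.
reply = bytes(int(x, 16) for x in "dd 0 0 0 5 90 91 a1 61 91 c2 91 c3 91 7f".split())
value, end = mp_decode(reply)
print(value)  # [[], ['a'], [False], [True], [127]]
```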

FAQ

Q:Why Tarantool?
A:Tarantool is the latest generation of a family of in-memory data servers developed for web applications. It is the result of practical experience and trials within Mail.Ru since development began in 2008.
Q:Why Lua?
A:Lua is a lightweight, fast, extensible multi-paradigm language. Lua also happens to be very easy to embed. Lua coroutines relate very closely to Tarantool fibers, and Lua architecture works well with Tarantool internals. Lua acts well as a stored program language for Tarantool, although connecting with other languages is also easy.
Q:What’s the key advantage of Tarantool?
A:
Tarantool provides a rich database feature set (HASH, TREE, RTREE, BITSET indexes, secondary indexes, composite indexes, transactions, triggers, asynchronous replication) in a flexible environment of a Lua interpreter.
These two properties make it possible for Tarantool to be a fast, atomic and reliable in-memory data server that handles non-trivial application-specific logic. The advantage over traditional SQL servers is in performance: low-overhead, lock-free architecture means Tarantool can serve an order of magnitude more requests per second, on comparable hardware. The advantage over NoSQL alternatives is in flexibility: Lua allows flexible processing of data stored in a compact, denormalized format.
Q:Who is developing Tarantool?
A:There is an engineering team employed by Mail.Ru – check out our commit logs on github.com/tarantool. The development is fully open. Most of the connectors' authors, and the maintainers for different distributions, come from the wider community.
Q:Are there problems associated with being an in-memory server?
A:The principal storage engine (memtx) is designed for RAM plus persistent storage. It is immune to data loss because there is a write-ahead log. Its memory-allocation and compression techniques ensure there is no waste. And if Tarantool runs out of memory, then it will stop accepting updates until more memory is available, but will continue to handle read and delete requests without difficulty. However, for databases which are much larger than the available RAM space, Tarantool has a second storage engine (vinyl) which is only limited by the available disk space.
Q:Can I store (large) BLOBs in Tarantool?
A:Starting with Tarantool 1.7, there is no "hard" limit for the maximal tuple size. Tarantool, however, is designed for high-velocity workloads with a lot of small chunks. For example, when you change an existing tuple, Tarantool creates a new version of the tuple in memory. Thus, an optimal tuple size is within kilobytes.
Q:I delete data from vinyl, but disk usage stays the same. What gives?
A:Data you write to vinyl is persisted in append-only run files. These files are immutable, and to perform a delete, a deletion marker (tombstone) is written to a newer run file instead. On compaction, new and old run files are merged, and a new run file is produced. Independently, the checkpoint manager keeps track of all run files involved in a checkpoint, and deletes obsolete files once they are no longer needed.
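The behaviour described in this answer can be reproduced with a toy log-structured store. This Python sketch is a simplification, not vinyl's design. The class is hypothetical, runs are plain dicts, and compaction merges every run at once, which lets it drop tombstones immediately. In vinyl, compaction and the removal of obsolete files by the checkpoint manager are separate steps.

```python
class ToyLsm:
    """Toy append-only store: flushed runs are immutable, deletes are tombstones."""
    TOMBSTONE = object()

    def __init__(self):
        self.memory = {}  # not yet written to disk
        self.runs = []    # "run files", oldest first, never modified in place

    def put(self, key, value):
        self.memory[key] = value

    def delete(self, key):
        self.memory[key] = self.TOMBSTONE  # a marker, not an in-place removal

    def flush(self):
        if self.memory:
            self.runs.append(dict(self.memory))
            self.memory = {}

    def get(self, key):
        for layer in [self.memory] + self.runs[::-1]:  # newest data first
            if key in layer:
                value = layer[key]
                return None if value is self.TOMBSTONE else value
        return None

    def stored_entries(self):
        return sum(len(run) for run in self.runs)

    def compact(self):
        merged = {}
        for run in self.runs:  # oldest to newest, so newer entries win
            merged.update(run)
        self.runs = [{k: v for k, v in merged.items() if v is not self.TOMBSTONE}]

store = ToyLsm()
for key in (1, 2, 3):
    store.put(key, "value-%d" % key)
store.flush()
store.delete(2)          # writes a tombstone into a newer run
store.flush()
print(store.get(2), store.stored_entries())  # None 4: deleted, yet more is stored
store.compact()
print(store.get(2), store.stored_entries())  # None 2
```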

Reference

Built-in modules reference

This reference covers Tarantool’s built-in Lua modules.

Note

Some functions in these modules are analogs to functions from standard Lua libraries. For better results, we recommend using functions from Tarantool’s built-in modules.

Module box

As well as executing Lua chunks or defining their own functions, you can exploit Tarantool’s storage functionality with the box module and its submodules.

The contents of the box module can be inspected at runtime with box, with no arguments. The box module contains:

Submodule box.cfg

The box.cfg submodule is for administrators to specify all the server configuration parameters (see "Configuration reference" for a complete description of all configuration parameters). Use box.cfg without braces to get read-only access to those parameters.

Example:

tarantool> box.cfg
---
- checkpoint_count: 2
  too_long_threshold: 0.5
  slab_alloc_factor: 1.1
  memtx_max_tuple_size: 1048576
  background: false
  <...>
...

Submodule box.index

The box.index submodule provides read-only access for index definitions and index keys. Indexes are contained in box.space.space-name.index array within each space object. They provide an API for ordered iteration over tuples. This API is a direct binding to corresponding methods of index objects of type box.index in the storage engine.

object box.index.index_object
index_object:unique

True if the index is unique, false if the index is not unique.

Rtype:boolean
index_object:type

Index type, 'TREE' or 'HASH' or 'BITSET' or 'RTREE'.

index_object:parts

An array describing index key fields.

Rtype:table

Example:

tarantool> box.space.tester.index.primary
---
- unique: true
  parts:
  - type: unsigned
    fieldno: 1
  id: 0
  space_id: 513
  name: primary
  type: TREE
...
index_object:pairs([key[, iterator-type]])

Search for a tuple or a set of tuples via the given index, and allow iterating over one tuple at a time.

The key parameter specifies what must match within the index. The iterator parameter specifies the rule for matching and ordering. Different index types support different iterators. For example, a TREE index maintains a strict order of keys and can return all tuples in ascending or descending order, starting from the specified key. Other index types, however, do not support ordering.

To understand consistency of tuples returned by an iterator, it’s essential to know the principles of the Tarantool transaction processing subsystem. An iterator in Tarantool does not own a consistent read view. Instead, each procedure is granted exclusive access to all tuples and spaces until there is a "context switch", which may happen due to the implicit yield rules or an explicit call to fiber.yield. When the execution flow returns to the yielded procedure, the data set could have changed significantly. Iteration, resumed after a yield point, does not preserve the read view, but continues with the new content of the database. The tutorial Indexed pattern search shows one way that iterators and yields can be used together.

Parameters:
  • index_object (index_object) – an object reference.
  • key (scalar/table) – value to be matched against the index key, which may be multi-part
  • iterator – as defined in the tables below. The default iterator type is 'EQ'
Return:

iterator which can be used in a for/end loop or with totable()

Possible errors: No such space; wrong type; selected iteration type is not supported for the index type; or key is not supported for the iteration type.

Complexity factors: Index size, Index type, Number of tuples accessed.

A search-key-value can be a number (for example 1234), a string (for example 'abcd'), or a table of numbers and strings (for example {1234, 'abcd'}). Each part of the search key is compared with the corresponding part of the index key.

Iterator types for TREE indexes

  • box.index.EQ or 'EQ' (argument: search value). The comparison operator is '==' (equal to). If an index key is equal to the search value, it matches. Tuples are returned in ascending order by index key. This is the default.
  • box.index.REQ or 'REQ' (argument: search value). Matching is the same as for box.index.EQ. Tuples are returned in descending order by index key.
  • box.index.GT or 'GT' (argument: search value). The comparison operator is '>' (greater than). If an index key is greater than the search value, it matches. Tuples are returned in ascending order by index key.
  • box.index.GE or 'GE' (argument: search value). The comparison operator is '>=' (greater than or equal to). If an index key is greater than or equal to the search value, it matches. Tuples are returned in ascending order by index key.
  • box.index.ALL or 'ALL' (argument: search value). Same as box.index.GE.
  • box.index.LT or 'LT' (argument: search value). The comparison operator is '<' (less than). If an index key is less than the search value, it matches. Tuples are returned in descending order by index key.
  • box.index.LE or 'LE' (argument: search value). The comparison operator is '<=' (less than or equal to). If an index key is less than or equal to the search value, it matches. Tuples are returned in descending order by index key.

Informally, searches with TREE indexes generally behave the way users expect, provided that there are no nils and no missing parts. Formally, the logic is as follows. A search key has zero or more parts, for example {}, {1,2,3}, {1,nil,3}. An index key has one or more parts, for example {1}, {1,2,3}. A search key may contain nil (but not msgpack.NULL, which is the wrong type). An index key may not contain nil or msgpack.NULL, although a later version of Tarantool will have different rules, so the behavior of searches with nil is subject to change. Possible iterators are LT, LE, EQ, REQ, GE, GT. A search key is said to "match" an index key if the following statements, which are pseudocode for the comparison operation, return TRUE.

If (number-of-search-key-parts > number-of-index-key-parts) return ERROR
If (number-of-search-key-parts == 0) return TRUE
for (i = 1; ; ++i)
{
  if (i > number-of-search-key-parts) OR (search-key-part[i] is nil)
  {
    // every compared part was equal
    if (iterator is LT or GT) return FALSE
    return TRUE
  }
  if (type of search-key-part[i] is not compatible with type of index-key-part[i])
  {
    return ERROR
  }
  if (search-key-part[i] == index-key-part[i])
  {
    continue
  }
  if (search-key-part[i] > index-key-part[i])
  {
    // the index key sorts before the search key
    if (iterator is LT or LE) return TRUE
    return FALSE
  }
  if (search-key-part[i] < index-key-part[i])
  {
    // the index key sorts after the search key
    if (iterator is GT or GE) return TRUE
    return FALSE
  }
}
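The matching rule can be checked with a small plain-Lua model. This is a sketch under stated assumptions, not Tarantool's implementation. The function name tree_key_matches is hypothetical, and keys are ordinary Lua arrays of numbers or strings. A search key that contains nil parts carries its length in a field n (for example {n = 3, 1, nil, 3}), because the # operator is unreliable on arrays with holes.

```lua
-- Plain-Lua model of the TREE key-matching rule (a sketch, not Tarantool's code).
-- tree_key_matches is a hypothetical helper name used only for illustration.
-- A search key may hold nil parts; give its length in field `n` in that case.
function tree_key_matches(search_key, index_key, iterator)
  local nsearch = search_key.n or #search_key
  if nsearch > #index_key then
    error("search key has more parts than the index key")
  end
  if nsearch == 0 then
    return true                      -- an empty search key matches everything
  end
  for i = 1, nsearch + 1 do
    local s, k = search_key[i], index_key[i]
    if i > nsearch or s == nil then
      -- every compared part was equal: only the strict iterators reject
      return iterator ~= "LT" and iterator ~= "GT"
    end
    if type(s) ~= type(k) then
      error("incompatible type in key part " .. i)
    end
    if s > k then
      return iterator == "LT" or iterator == "LE"   -- index key is smaller
    elseif s < k then
      return iterator == "GT" or iterator == "GE"   -- index key is larger
    end
    -- parts are equal: go on to the next part
  end
end
```

For instance, with the two-part index key {1, 3}, the search key {1, 2} matches for GT and GE because {1, 3} sorts after {1, 2}. The one-part search key {1} matches that index key for EQ but not for GT, since GT skips every key whose first part equals 1.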

Iterator types for HASH indexes

  • box.index.ALL (no argument). All index keys match. Tuples are returned in ascending order by hash of index key, which will appear to be random.
  • box.index.EQ or 'EQ' (argument: search value). The comparison operator is '==' (equal to). If an index key is equal to the search value, it matches. The number of returned tuples will be 0 or 1. This is the default.
  • box.index.GT or 'GT' (argument: search value). The comparison operator is '>' (greater than). If the hash of an index key is greater than the hash of the search value, it matches. Tuples are returned in ascending order by hash of index key, which will appear to be random. Provided that the space is not being updated, one can retrieve all the tuples in a space, N tuples at a time, by using {iterator = 'GT', limit = N} in each search and using the key of the last tuple returned by the previous search as the search value for the next search.

Iterator types for BITSET indexes

  • box.index.ALL or 'ALL' (no argument). All index keys match. Tuples are returned in their order within the space.
  • box.index.EQ or 'EQ' (argument: bitset value). If an index key is equal to the bitset value, it matches. Tuples are returned in their order within the space. This is the default.
  • box.index.BITS_ALL_SET (argument: bitset value). If all of the bits which are 1 in the bitset value are 1 in the index key, it matches. Tuples are returned in their order within the space.
  • box.index.BITS_ANY_SET (argument: bitset value). If any of the bits which are 1 in the bitset value are 1 in the index key, it matches. Tuples are returned in their order within the space.
  • box.index.BITS_ALL_NOT_SET (argument: bitset value). If all of the bits which are 1 in the bitset value are 0 in the index key, it matches. Tuples are returned in their order within the space.
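The BITSET rules can be modeled in plain Lua as a sketch. Several assumptions apply. Index and search values are non-negative integers, and the iterator is passed as a string. The helper names bitset_matches and bitset_select are hypothetical. Bits are examined with arithmetic rather than a bit library, so the sketch runs on any Lua version.

```lua
-- Plain-Lua model of the BITSET iterator rules (a sketch, not Tarantool's code).
-- Values are non-negative integers; bitset_matches/bitset_select are hypothetical.
function bitset_matches(index_value, search_value, iterator)
  if iterator == "ALL" then return true end
  if iterator == "EQ" then return index_value == search_value end
  local any_set, all_set, all_clear = false, true, true
  local s, k = search_value, index_value
  while s > 0 do
    if s % 2 == 1 then            -- this bit is 1 in the search value
      if k % 2 == 1 then
        any_set, all_clear = true, false
      else
        all_set = false
      end
    end
    s, k = math.floor(s / 2), math.floor(k / 2)   -- move to the next bit
  end
  if iterator == "BITS_ALL_SET" then return all_set end
  if iterator == "BITS_ANY_SET" then return any_set end
  if iterator == "BITS_ALL_NOT_SET" then return all_clear end
  error("unsupported iterator: " .. tostring(iterator))
end

-- Returns the matching values, keeping their original order.
function bitset_select(values, search_value, iterator)
  local result = {}
  for _, v in ipairs(values) do
    if bitset_matches(v, search_value, iterator) then
      table.insert(result, v)
    end
  end
  return result
end
```

The BITSET example in the select() section uses the values 0x01, 0x02 and 0x03. With those values, bitset_select({1, 2, 3}, 2, 'BITS_ANY_SET') returns {2, 3} and bitset_select({1, 2, 3}, 2, 'BITS_ALL_NOT_SET') returns {1}, which agrees with the results shown there.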

Iterator types for RTREE indexes

  • box.index.ALL or 'ALL' (no argument). All keys match. Tuples are returned in their order within the space.
  • box.index.EQ or 'EQ' (argument: search value). If all points of the rectangle-or-box defined by the search value are the same as the rectangle-or-box defined by the index key, it matches. Tuples are returned in their order within the space. "Rectangle-or-box" means rectangle-or-box as explained in the section about RTREE. This is the default.
  • box.index.GT or 'GT' (argument: search value). If all points of the rectangle-or-box defined by the search value are within the rectangle-or-box defined by the index key, it matches. Tuples are returned in their order within the space.
  • box.index.GE or 'GE' (argument: search value). If all points of the rectangle-or-box defined by the search value are within, or at the side of, the rectangle-or-box defined by the index key, it matches. Tuples are returned in their order within the space.
  • box.index.LT or 'LT' (argument: search value). If all points of the rectangle-or-box defined by the index key are within the rectangle-or-box defined by the search value, it matches. Tuples are returned in their order within the space.
  • box.index.LE or 'LE' (argument: search value). If all points of the rectangle-or-box defined by the index key are within, or at the side of, the rectangle-or-box defined by the search value, it matches. Tuples are returned in their order within the space.
  • box.index.OVERLAPS or 'OVERLAPS' (argument: search value). If some points of the rectangle-or-box defined by the search value are within the rectangle-or-box defined by the index key, it matches. Tuples are returned in their order within the space.
  • box.index.NEIGHBOR or 'NEIGHBOR' (argument: search value). All index keys match. Tuples are returned in order of distance from the rectangle-or-box defined by the search value: nearest neighbor first.
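For two-dimensional rectangles, the containment rules can be sketched in plain Lua. This is a model, not Tarantool's implementation. It makes assumptions the table does not spell out: "within" is read as strictly inside, "within, or at the side of" also allows shared edges, and a point {x, y} is treated as the zero-size rectangle {x, y, x, y}. The name rtree_matches is hypothetical, and OVERLAPS and NEIGHBOR are left out.

```lua
-- Plain-Lua model of some RTREE iterator rules in two dimensions
-- (a sketch, not Tarantool's code). rtree_matches is a hypothetical name.
-- Assumptions: "within" means strictly inside; "within, or at the side of"
-- also allows shared edges; a point {x, y} is the rectangle {x, y, x, y}.

-- Convert {x, y} or {x1, y1, x2, y2} to a normalized {x1, y1, x2, y2}.
local function as_rect(r)
  if #r == 2 then
    return {r[1], r[2], r[1], r[2]}
  end
  return {math.min(r[1], r[3]), math.min(r[2], r[4]),
          math.max(r[1], r[3]), math.max(r[2], r[4])}
end

-- True if rectangle a lies inside rectangle b.
local function inside(a, b, allow_edges)
  if allow_edges then
    return a[1] >= b[1] and a[2] >= b[2] and a[3] <= b[3] and a[4] <= b[4]
  end
  return a[1] > b[1] and a[2] > b[2] and a[3] < b[3] and a[4] < b[4]
end

function rtree_matches(index_key, search_value, iterator)
  local k, s = as_rect(index_key), as_rect(search_value)
  if iterator == "ALL" then return true end
  if iterator == "EQ" then
    return k[1] == s[1] and k[2] == s[2] and k[3] == s[3] and k[4] == s[4]
  end
  if iterator == "GT" then return inside(s, k, false) end  -- search within key
  if iterator == "GE" then return inside(s, k, true) end
  if iterator == "LT" then return inside(k, s, false) end  -- key within search
  if iterator == "LE" then return inside(k, s, true) end
  error("iterator not covered by this sketch: " .. tostring(iterator))
end
```

With the rectangles from the RTREE section, rtree_matches({3, 5, 9, 10}, {4, 7, 5, 9}, 'GT') is true, because Rectangle#3 lies entirely inside Rectangle#2.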

First Example of index pairs():

Default 'TREE' index and pairs() function:

tarantool> s = box.schema.space.create('space17')
---
...
tarantool> s:create_index('primary', {
         >   parts = {1, 'string', 2, 'string'}
         > })
---
...
tarantool> s:insert{'C', 'C'}
---
- ['C', 'C']
...
tarantool> s:insert{'B', 'A'}
---
- ['B', 'A']
...
tarantool> s:insert{'C', '!'}
---
- ['C', '!']
...
tarantool> s:insert{'A', 'C'}
---
- ['A', 'C']
...
tarantool> function example()
         >   for _, tuple in
         >     s.index.primary:pairs(nil, {
         >         iterator = box.index.ALL}) do
         >       print(tuple)
         >   end
         > end
---
...
tarantool> example()
['A', 'C']
['B', 'A']
['C', '!']
['C', 'C']
---
...
tarantool> s:drop()
---
...

Second Example of index pairs():

This Lua code finds all the tuples whose primary key values begin with 'XY'. It assumes that there is a one-part primary-key TREE index on the first field, which must be a string. The iterator loop ensures that the search will return tuples where the first value is greater than or equal to 'XY'. The conditional statement within the loop ensures that the looping will stop when the first two letters are not 'XY'.

for _, tuple in
box.space.t.index.primary:pairs("XY",{iterator = "GE"}) do
  if (string.sub(tuple[1], 1, 2) ~= "XY") then break end
  print(tuple)
end
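The same "GE, then break" pattern can be tried without a database. In this plain-Lua sketch, a sorted array of string keys stands in for the TREE index, and prefix_scan is a hypothetical helper name. The sketch assumes that keys are ordered byte by byte. Under that ordering, all keys beginning with a prefix sit next to each other, starting at the prefix itself.

```lua
-- Plain-Lua model of the "GE, then break" prefix scan (a sketch).
-- A sorted array of string keys stands in for the TREE index;
-- prefix_scan is a hypothetical helper name.
function prefix_scan(sorted_keys, prefix)
  local result = {}
  for _, key in ipairs(sorted_keys) do
    if key >= prefix then                   -- plays the role of iterator = 'GE'
      if key:sub(1, #prefix) ~= prefix then
        break                               -- left the run of matching keys
      end
      table.insert(result, key)
    end
  end
  return result
end
```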

Third Example of index pairs():

This Lua code finds all the tuples whose primary key values are greater than or equal to 1000, and less than or equal to 1999 (this type of request is sometimes called a "range search" or a "between search"). It assumes that there is a one-part primary-key TREE index on the first field, which must be a number. The iterator loop ensures that the search will return tuples where the first value is greater than or equal to 1000. The conditional statement within the loop ensures that the looping will stop when the first value is greater than 1999.

for _, tuple in
box.space.t2.index.primary:pairs(1000,{iterator = "GE"}) do
  if (tuple[1] > 1999) then break end
  print(tuple)
end
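A plain-Lua sketch of the range search follows. A sorted array of numeric keys stands in for the TREE index, and range_scan is a hypothetical helper name. A tree index can position a GE iterator without visiting the smaller keys, and in the sketch a binary search plays that role. The loop then stops at the first key greater than the upper bound, as the break in the loop above does.

```lua
-- Plain-Lua model of a range ("between") search (a sketch).
-- A sorted array of numeric keys stands in for the TREE index;
-- range_scan is a hypothetical helper name.
function range_scan(sorted_keys, lo, hi)
  -- Binary search for the first position whose key is >= lo.
  local first, last = 1, #sorted_keys + 1
  while first < last do
    local mid = math.floor((first + last) / 2)
    if sorted_keys[mid] < lo then
      first = mid + 1
    else
      last = mid
    end
  end
  -- Scan forward until a key exceeds the upper bound.
  local result = {}
  for i = first, #sorted_keys do
    if sorted_keys[i] > hi then break end
    table.insert(result, sorted_keys[i])
  end
  return result
end
```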
index_object:select(search-key, options)

This is an alternative to box.space…select() which goes via a particular index and can make use of additional parameters that specify the iterator type, the limit (that is, the maximum number of tuples to return), and the offset (that is, how many matching tuples to skip before the first one returned).

Parameters:
  • index_object (index_object) – an object reference.
  • key (scalar/table) – values to be matched against the index key
  • options (table/nil) – none, any, or all of the following parameters
  • options.iterator – type of iterator
  • options.limit (number) – maximum number of tuples
  • options.offset (number) – number of tuples to skip
Return:

the tuple or tuples that match the field values.

Rtype:

array of tuples
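How limit and offset shape the result can be sketched in plain Lua. Here matches stands for the tuples the iterator would produce, in iterator order. The helper apply_offset_limit is hypothetical. The sketch assumes that offset is the number of matching tuples to skip, so 0 means start at the first one.

```lua
-- Plain-Lua model of how offset and limit shape a select() result (a sketch).
-- `matches` stands for the tuples the iterator would produce, in order;
-- apply_offset_limit is a hypothetical helper name.
function apply_offset_limit(matches, offset, limit)
  offset = offset or 0          -- assumption: number of tuples to skip
  limit = limit or math.huge    -- no limit unless one is given
  local result = {}
  for i = offset + 1, #matches do
    if #result >= limit then break end
    table.insert(result, matches[i])
  end
  return result
end
```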

Example:

-- Create a space named tester.
tarantool> sp = box.schema.space.create('tester')
-- Create a unique index 'primary'
-- which won't be needed for this example.
tarantool> sp:create_index('primary', {parts = {1, 'unsigned' }})
-- Create a non-unique index 'secondary'
-- with an index on the second field.
tarantool> sp:create_index('secondary', {
         >   type = 'tree',
         >   unique = false,
         >   parts = {2, 'string'}
         > })
-- Insert three tuples, values in field[2]
-- equal to 'X', 'Y', and 'Z'.
tarantool> sp:insert{1, 'X', 'Row with field[2]=X'}
tarantool> sp:insert{2, 'Y', 'Row with field[2]=Y'}
tarantool> sp:insert{3, 'Z', 'Row with field[2]=Z'}
-- Select all tuples where the secondary index
-- keys are greater than 'X'.
tarantool> sp.index.secondary:select({'X'}, {
         >   iterator = 'GT',
         >   limit = 1000
         > })

The result will be a table of tuples and will look like this:

---
- - [2, 'Y', 'Row with field[2]=Y']
  - [3, 'Z', 'Row with field[2]=Z']
...

Note

index.index-name is optional. If it is omitted, then the assumed index is the first (primary-key) index. Therefore, for the example above, box.space.tester:select({1}, {iterator = 'GT'}) would have returned the same two rows, via the 'primary' index.

Note

iterator = iterator-type is optional. If it is omitted, then iterator = 'EQ' is assumed.

Note

field-value [, field-value ] is optional. If it is omitted, then every key in the index is considered to be a match, regardless of iterator type. Therefore, for the example above, box.space.tester:select{} will select every tuple in the tester space via the first (primary-key) index.

Note

box.space.space-name.index.index-name:select(...)[1] can be replaced by box.space.space-name.index.index-name:get(...). That is, get can be used as a convenient shorthand to get the first tuple in the tuple set that would be returned by select. However, if there is more than one tuple in the tuple set, then get raises an error.
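The relationship between get and select(...)[1] can be sketched in plain Lua. model_get is a hypothetical helper that receives what select would have returned. The error text is illustrative, not Tarantool's actual message.

```lua
-- Plain-Lua model of the get() shorthand (a sketch): get behaves like
-- select(...)[1], except that more than one match is an error.
-- model_get is a hypothetical name; `matches` is what select would return.
function model_get(matches)
  if #matches > 1 then
    error("more than one tuple matches")   -- illustrative wording only
  end
  return matches[1]                        -- nil when nothing matches
end
```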

Example with BITSET index:

The following script shows creation and search with a BITSET index. Notice: BITSET cannot be unique, so first a primary-key index is created. Notice: bit values are entered as hexadecimal literals for easier reading.

tarantool> s = box.schema.space.create('space_with_bitset')
tarantool> s:create_index('primary_index', {
         >   parts = {1, 'string'},
         >   unique = true,
         >   type = 'TREE'
         > })
tarantool> s:create_index('bitset_index', {
         >   parts = {2, 'unsigned'},
         >   unique = false,
         >   type = 'BITSET'
         > })
tarantool> s:insert{'Tuple with bit value = 01', 0x01}
tarantool> s:insert{'Tuple with bit value = 10', 0x02}
tarantool> s:insert{'Tuple with bit value = 11', 0x03}
tarantool> s.index.bitset_index:select(0x02, {
         >   iterator = box.index.EQ
         > })
---
- - ['Tuple with bit value = 10', 2]
...
tarantool> s.index.bitset_index:select(0x02, {
         >   iterator = box.index.BITS_ANY_SET
         > })
---
- - ['Tuple with bit value = 10', 2]
  - ['Tuple with bit value = 11', 3]
...
tarantool> s.index.bitset_index:select(0x02, {
         >   iterator = box.index.BITS_ALL_SET
         > })
---
- - ['Tuple with bit value = 10', 2]
  - ['Tuple with bit value = 11', 3]
...
tarantool> s.index.bitset_index:select(0x02, {
         >   iterator = box.index.BITS_ALL_NOT_SET
         > })
---
- - ['Tuple with bit value = 01', 1]
...
index_object:get(key)

Search for a tuple via the given index, as described earlier.

Parameters:
  • index_object (index_object) – an object reference.
  • key (scalar/table) – values to be matched against the index key
Return:

the tuple whose index-key fields are equal to the passed key values.

Rtype:

tuple

Possible errors: No such index; wrong type; more than one tuple matches.

Complexity factors: Index size, Index type. See also space_object:get().

Example:

tarantool> box.space.tester.index.primary:get(2)
---
- [2, 'Music']
...
index_object:min([key])

Find the minimum value in the specified index.

Parameters:
  • index_object (index_object) – an object reference.
  • key (scalar/table) – values to be matched against the index key
Return:

the tuple for the first key in the index. If optional key-value is supplied, returns the first key which is greater than or equal to key-value.

Rtype:

tuple

Possible errors: index is not of type 'TREE'.

Complexity factors: Index size, Index type.

Example:

tarantool> box.space.tester.index.primary:min()
---
- ['Alpha!', 55, 'This is the first tuple!']
...
index_object:max([key])

Find the maximum value in the specified index.

Parameters:
  • index_object (index_object) – an object reference.
  • key (scalar/table) – values to be matched against the index key
Return:

the tuple for the last key in the index. If optional key-value is supplied, returns the last key which is less than or equal to key-value.

Rtype:

tuple

Possible errors: index is not of type 'TREE'.

Complexity factors: Index size, Index type.

Example:

tarantool> box.space.tester.index.primary:max()
---
- ['Gamma!', 55, 'This is the third tuple!']
...
index_object:random(seed)

Find a random value in the specified index. This method is useful when it’s important to get insight into data distribution in an index without having to iterate over the entire data set.

Parameters:
  • index_object (index_object) – an object reference.
  • seed (number) – an arbitrary non-negative integer
Return:

the tuple for the random key in the index.

Rtype:

tuple

Complexity factors: Index size, Index type.

Note

vinyl does not support random().

Example:

tarantool> box.space.tester.index.secondary:random(1)
---
- ['Beta!', 66, 'This is the second tuple!']
...
index_object:count([key][, iterator])

Iterate over an index, counting the number of tuples which match the key-value.

Parameters:
  • index_object (index_object) – an object reference.
  • key (scalar/table) – values to be matched against the index key
  • iterator – comparison method
Return:

the number of matching index keys.

Rtype:

number

Example:

tarantool> box.space.tester.index.primary:count(999)
---
- 0
...
tarantool> box.space.tester.index.primary:count('Alpha!', { iterator = 'LE' })
---
- 1
...
index_object:update(key, {{operator, field_no, value}, ...})

Update a tuple.

Same as box.space…update(), but key is searched in this index instead of primary key. This index ought to be unique.

Parameters:
  • index_object (index_object) – an object reference.
  • key (scalar/table) – values to be matched against the index key
  • operator (string) – operation type represented in string
  • field_no (number) – what field the operation will apply to. The field number can be negative, meaning the position is counted from the end of the tuple: the actual field number is #tuple + field_no + 1.
  • value (lua_value) – what value will be applied
Return:

the updated tuple.

Rtype:

tuple
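The negative field-number rule is plain arithmetic, sketched below. resolve_field_no is a hypothetical helper, and a Lua array stands in for a tuple. Tarantool performs this resolution itself, so the sketch only shows which field a negative number refers to.

```lua
-- Plain-Lua illustration of the negative field-number rule (a sketch):
-- a negative field_no refers to position #tuple + field_no + 1.
-- resolve_field_no is hypothetical; a plain array stands in for a tuple.
function resolve_field_no(tuple, field_no)
  if field_no < 0 then
    return #tuple + field_no + 1
  end
  return field_no
end
```

For a three-field tuple, -1 therefore refers to field 3 and -3 to field 1.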

index_object:delete(key)

Delete a tuple identified by a key.

Same as box.space…delete(), but key is searched in this index instead of in the primary-key index. This index ought to be unique.

Parameters:
  • index_object (index_object) – an object reference.
  • key (scalar/table) – values to be matched against the index key
Return:

the deleted tuple.

Rtype:

tuple

Note

vinyl will return nil, rather than the deleted tuple.
index_object:alter({options})

Alter an index.

Parameters:
Return:

nil

Possible errors: Index does not exist, or the first index cannot be changed to {unique = false}, or the alter function is only applicable for the memtx storage engine.

Note

vinyl does not support alter().

Example:

tarantool> box.space.space55.index.primary:alter({type = 'HASH'})
---
...
index_object:drop()

Drop an index. Dropping a primary-key index has a side effect: all tuples are deleted.

Parameters:
Return:

nil.

Possible errors: Index does not exist, or a primary-key index cannot be dropped while a secondary-key index exists.

Example:

tarantool> box.space.space55.index.primary:drop()
---
...
index_object:rename(index-name)

Rename an index.

Parameters:
Return:

nil

Possible errors: index_object does not exist.

Example:

tarantool> box.space.space55.index.primary:rename('secondary')
---
...

Complexity factors: Index size, Index type, Number of tuples accessed.

index_object:bsize()

Return the total number of bytes taken by the index.

Parameters:
Return:

number of bytes

Rtype:

number

Example showing use of the box functions

This example will work with the sandbox configuration described in the preface. That is, there is a space named tester with a numeric primary key. The example function will:

  • Select a tuple whose key value is 1000;
  • Return an error if the tuple already exists and already has 3 fields;
  • Insert or replace the tuple with:
    • field[1] = 1000
    • field[2] = a uuid
    • field[3] = number of seconds since 1970-01-01;
  • Get field[3] from the tuple returned by replace;
  • Format the value from field[3] as yyyy-mm-dd hh:mm:ss.ffff;
  • Return the formatted value.

The function uses Tarantool box functions box.space…select, box.space…replace, fiber.time, uuid.str. The function uses Lua functions os.date(), math.floor() and string.format().

function example()
  local fiber = require('fiber')
  local uuid = require('uuid')
  -- Refuse to overwrite a tuple that already has 3 fields.
  local selected_tuples = box.space.tester:select{1000}
  if selected_tuples[1] ~= nil and #selected_tuples[1] == 3 then
    box.error({code = 1, reason = 'This tuple already has 3 fields'})
  end
  -- Insert or replace tuple 1000; field[3] holds the current time as a string.
  local new_tuple = box.space.tester:replace
    {1000, uuid.str(), tostring(fiber.time())}
  local time_field = tonumber(new_tuple[3])
  -- Whole seconds as yyyy-mm-dd hh:mm:ss in local time.
  local formatted_time_field =
    os.date("%Y-%m-%d %H:%M:%S", math.floor(time_field))
  -- Fractional seconds truncated to four zero-padded digits.
  local fraction = math.floor((time_field % 1) * 10000)
  return formatted_time_field .. '.' .. string.format('%04d', fraction)
end

And here is what happens when the function is invoked:

tarantool> box.space.tester:delete(1000)
---
- [1000, '264ee2da03634f24972be76c43808254', '1391037015.6809']
...
tarantool> example(1000)
---
- 2014-01-29 16:11:51.1582
...
tarantool> example(1000)
---
- error: 'This tuple already has 3 fields'
...
Example showing a user-defined iterator

Here is an example that shows how to build one’s own iterator. The paged_iter function is an "iterator function", which is easiest to follow for readers of the Lua manual section Iterators and Closures. It does paginated retrievals, that is, it returns tuples_per_page tuples at a time (10 in the loop shown) from a space named "t", whose primary key was defined with create_index('primary',{parts={1,'string'}}).

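function paged_iter(search_key, tuples_per_page)
  -- The first page starts at search_key itself ('GE'); each later page
  -- starts strictly after the last key already returned ('GT').
  local iterator_string = "GE"
  return function ()
    local page = box.space.t.index[0]:select(search_key,
      {iterator = iterator_string, limit = tuples_per_page})
    if #page == 0 then return nil end  -- no more tuples: end the loop
    search_key = page[#page][1]        -- primary key of the last tuple
    iterator_string = "GT"
    return page
  end
end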

Programmers who use paged_iter do not need to know why it works; they only need to know that, if they call it within a loop, they will get 10 tuples at a time until there are no more tuples. In this example the tuples are merely printed, a page at a time. But it should be simple to change the functionality, for example by yielding after each retrieval, or by breaking when the tuples fail to match some additional criteria.

for page in paged_iter("X", 10) do
  print("New Page. Number Of Tuples = " .. #page)
  for i = 1, #page, 1 do
    print(page[i])
  end
end
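The following plain-Lua model shows why paged_iter neither skips nor repeats tuples, and it runs without a database. It rests on several assumptions. A sorted array of string keys stands in for space t. fake_select imitates select() with the 'GE' and 'GT' iterators and a limit. Both fake_select and model_paged_iter are hypothetical names.

```lua
-- Plain-Lua model of paged_iter (a sketch): a sorted array of keys stands
-- in for space t. fake_select and model_paged_iter are hypothetical names.
local function fake_select(sorted_keys, key, iterator, limit)
  local page = {}
  for _, k in ipairs(sorted_keys) do
    local ok = (iterator == "GE" and k >= key) or (iterator == "GT" and k > key)
    if ok then
      table.insert(page, {k})        -- a one-field "tuple"
      if #page == limit then break end
    end
  end
  return page
end

function model_paged_iter(sorted_keys, search_key, tuples_per_page)
  local iterator_string = "GE"
  return function()
    local page = fake_select(sorted_keys, search_key, iterator_string,
                             tuples_per_page)
    if #page == 0 then return nil end
    search_key = page[#page][1]      -- resume after the last key seen
    iterator_string = "GT"
    return page
  end
end
```

Switching from 'GE' to 'GT' after the first page keeps the last tuple of one page from reappearing at the start of the next.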
Submodule box.index with index type = RTREE for spatial searches

The box.index submodule may be used for spatial searches if the index type is RTREE. There are operations for searching rectangles (geometric objects with 4 corners and 4 sides) and boxes (geometric objects with more than 4 corners and more than 4 sides, sometimes called hyperrectangles). This manual uses the term rectangle-or-box for the whole class of objects that includes both rectangles and boxes. Only rectangles will be illustrated.

Rectangles are described according to their X-axis (horizontal axis) and Y-axis (vertical axis) coordinates in a grid of arbitrary size. Here is a picture of four rectangles on a grid with 11 horizontal points and 11 vertical points:

           X AXIS
           1   2   3   4   5   6   7   8   9   10  11
        1
        2  #-------+                                           <-Rectangle#1
Y AXIS  3  |       |
        4  +-------#
        5          #-----------------------+                   <-Rectangle#2
        6          |                       |
        7          |   #---+               |                   <-Rectangle#3
        8          |   |   |               |
        9          |   +---#               |
        10         +-----------------------#
        11                                     #               <-Rectangle#4

The rectangles are defined according to this scheme: {X-axis coordinate of top left, Y-axis coordinate of top left, X-axis coordinate of bottom right, Y-axis coordinate of bottom right}, or more succinctly: {x1,y1,x2,y2}. So in the picture, Rectangle#1 starts at position 1 on the X axis and position 2 on the Y axis, and ends at position 3 on the X axis and position 4 on the Y axis, so its coordinates are {1,2,3,4}. Rectangle#2’s coordinates are {3,5,9,10}. Rectangle#3’s coordinates are {4,7,5,9}. And finally Rectangle#4’s coordinates are {10,11,10,11}. Rectangle#4 is actually a "point" since it has zero width and zero height, so it could have been described with only two numbers: {10,11}.

Some relationships between the rectangles are: "Rectangle#1’s nearest neighbor is Rectangle#2", and "Rectangle#3 is entirely inside Rectangle#2".

Now let us create a space and add an RTREE index.

tarantool> s = box.schema.space.create('rectangles')
tarantool> i = s:create_index('primary', {
         >   type = 'HASH',
         >   parts = {1, 'unsigned'}
         > })
tarantool> r = s:create_index('rtree', {
         >   type = 'RTREE',
         >   unique = false,
         >   parts = {2, 'ARRAY'}
         > })

Field#1 doesn’t matter; it exists only because a primary-key index is required. (RTREE indexes cannot be unique and therefore cannot be primary-key indexes.) The second field must be an "array", which means its values must represent {x,y} points or {x1,y1,x2,y2} rectangles. Now let us populate the space by inserting two tuples, containing the coordinates of Rectangle#2 and Rectangle#4.

tarantool> s:insert{1, {3, 5, 9, 10}}
tarantool> s:insert{2, {10, 11}}

And now, following the description of RTREE iterator types, we can search the rectangles with these requests:

tarantool> r:select({10, 11, 10, 11}, {iterator = 'EQ'})
---
- - [2, [10, 11]]
...
tarantool> r:select({4, 7, 5, 9}, {iterator = 'GT'})
---
- - [1, [3, 5, 9, 10]]
...
tarantool> r:select({1, 2, 3, 4}, {iterator = 'NEIGHBOR'})
---
- - [1, [3, 5, 9, 10]]
  - [2, [10, 11]]
...

Request#1 returns 1 tuple because the point {10,11} is the same as the rectangle {10,11,10,11} ("Rectangle#4" in the picture). Request#2 returns 1 tuple because the rectangle {4,7,5,9}, which was "Rectangle#3" in the picture, is entirely within {3,5,9,10}, which was Rectangle#2. Request#3 returns 2 tuples, because the NEIGHBOR iterator always returns all tuples, and the first returned tuple will be {3,5,9,10} ("Rectangle#2" in the picture) because it is the closest neighbor of {1,2,3,4} ("Rectangle#1" in the picture).

Now let us create a space and index for cuboids, which are rectangle-or-boxes that have 8 corners and 6 faces.

tarantool> s = box.schema.space.create('R')
tarantool> i = s:create_index('primary', {parts = {1, 'unsigned'}})
tarantool> r = s:create_index('S', {
         >   type = 'RTREE',
         >   unique = false,
         >   dimension = 3,
         >   parts = {2, 'ARRAY'}
         > })

The additional option here is dimension=3. The default dimension is 2, which is why it didn’t need to be specified for the rectangle examples. The maximum dimension is 20. Now for insertions and selections there will usually be 6 coordinates. For example:

tarantool> s:insert{1, {0, 3, 0, 3, 0, 3}}
tarantool> r:select({1, 2, 1, 2, 1, 2}, {iterator = box.index.GT})

Now let us create a space and index for Manhattan-style spatial objects, which are rectangle-or-boxes that have a different way to calculate neighbors.

tarantool> box.space.R:drop()
tarantool> s = box.schema.space.create('R')
tarantool> i = s:create_index('primary', {parts = {1, 'unsigned'}})
tarantool> r = s:create_index('S', {
         >   type = 'RTREE',
         >   unique = false,
         >   distance = 'manhattan',
         >   parts = {2, 'ARRAY'}
         > })

The additional option here is distance='manhattan'. The default distance calculator is 'euclid', which is the straightforward as-the-crow-flies method. The optional distance calculator is 'manhattan', which can be a more appropriate method if one is following the lines of a grid rather than traveling in a straight line.

tarantool> s:insert{1, {0, 3, 0, 3}}
tarantool> r:select({1, 2, 1, 2}, {iterator = box.index.NEIGHBOR})
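The difference between the two distance calculators can be sketched in plain Lua. This is not Tarantool's code. The sketch measures distances between points only, whereas an RTREE index works with rectangle-or-boxes, and the names distance and nearest_first are hypothetical. It shows that the two calculators can rank the same candidates differently.

```lua
-- Plain-Lua illustration of the 'euclid' and 'manhattan' distance calculators
-- (a sketch, not Tarantool's code). distance and nearest_first are hypothetical.
local function euclid(a, b)
  local sum = 0
  for i = 1, #a do sum = sum + (a[i] - b[i]) ^ 2 end
  return math.sqrt(sum)              -- straight-line distance
end

local function manhattan(a, b)
  local sum = 0
  for i = 1, #a do sum = sum + math.abs(a[i] - b[i]) end
  return sum                         -- distance along grid lines
end

function distance(a, b, kind)
  if kind == "manhattan" then return manhattan(a, b) end
  return euclid(a, b)                -- 'euclid' is the default
end

-- Returns a copy of `points` sorted nearest-first from `origin`.
function nearest_first(origin, points, kind)
  local sorted = {}
  for i, p in ipairs(points) do sorted[i] = p end
  table.sort(sorted, function(p, q)
    return distance(origin, p, kind) < distance(origin, q, kind)
  end)
  return sorted
end
```

From the origin, the point {4, 4} is nearer than {0, 7} by the euclid measure (about 5.66 versus 7) but farther by the manhattan measure (8 versus 7).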

More examples of spatial searching are online in the file R tree index quick start and usage.

Submodule box.info

The box.info submodule provides access to information about server instance variables.

  • version is the Tarantool version. This value is also shown by tarantool --version.
  • id corresponds to replication.id (see below).
  • ro is true if the instance is in "read-only" mode (same as read_only in box.cfg{}).
  • vclock corresponds to replication.downstream.vclock (see below).
  • uptime is the number of seconds since the instance started. This value can also be retrieved with tarantool.uptime().
  • lsn corresponds to replication.lsn (see below).
  • vinyl returns runtime statistics for vinyl storage engine.
  • cluster.uuid is the UUID of the replica set. Every instance in a replica set will have the same cluster.uuid value. This value is also stored in box.space._schema system space.
  • pid is the process ID. This value is also shown by tarantool module and by the Linux command ps -A.
  • status corresponds to replication.upstream.status (see below).
  • signature is the sum of all lsn values from the vector clocks (vclock) of all instances in the replica set.
  • uuid corresponds to replication.uuid (see below).
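The relationship between signature and the vector clock can be sketched in plain Lua. vclock_signature is a hypothetical helper. The input table only mirrors the shape of the vclock value in the box.info() example, which maps each instance id to an lsn.

```lua
-- Plain-Lua illustration of how `signature` relates to `vclock` (a sketch):
-- the sum of the lsn values over all instance ids in the vector clock.
-- vclock_signature is a hypothetical helper name.
function vclock_signature(vclock)
  local sum = 0
  for _, lsn in pairs(vclock) do
    sum = sum + lsn
  end
  return sum
end
```

For the vclock {1: 8} in the box.info() example, the sum is 8, which matches the signature shown there.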

The replication section contains statistics for all instances in the replica set, as seen from the current instance (see an example in the section "Monitoring a replica set"):

  • replication.id is a short numeric identifier of the instance within the replica set.

  • replication.uuid is a globally unique identifier of the instance. This value is also stored in box.space._cluster system space.

  • replication.lsn is the log sequence number (LSN) for the latest entry in the instance’s write ahead log (WAL).

  • replication.upstream contains statistics for the replication data uploaded by the instance.

  • replication.upstream.status is the replication status of the instance.

    • auth means that the instance is getting authenticated to connect to a replication source.
    • connecting means that the instance is trying to connect to the replication source(s) listed in its replication parameter.
    • disconnected means that the instance is not connected to the replica set (due to network problems, not replication errors).
    • follow means that the instance’s role is "replica" (read-only) and replication is in progress.
    • running means the instance’s role is "master" (non read-only) and replication is in progress.
    • stopped means that replication was stopped due to a replication error (e.g. duplicate key).
  • replication.upstream.idle is the time (in seconds) since the instance received the last event from a master.

  • replication.upstream.lag is the time difference between the local time at the instance, recorded when the event was received, and the local time at another master recorded when the event was written to the write ahead log on that master.

    Since lag calculation uses the operating system clocks of two different machines, don’t be surprised if it’s negative: a time drift may lead to the remote master clock being consistently behind the local instance’s clock.

    For multi-master configurations, this is the maximal lag.

  • replication.downstream contains statistics for the replication data requested and downloaded from the instance.

  • replication.downstream.vclock is the instance’s vector clock, which contains "id: lsn" pairs, for example {1: 8}.

box.info()

Since box.info contents are dynamic, it’s not possible to iterate over keys with the Lua pairs() function. For this purpose, box.info() builds and returns a Lua table with all keys and values provided in the submodule.

Return:keys and values in the submodule.
Rtype:table

Example:

tarantool> box.info
---
- version: 1.7.4-52-g980d30092
  id: 1
  ro: false
  vclock: {1: 8}
  uptime: 7280
  lsn: 8
  vinyl: []
  cluster:
    uuid: f7c0c1c6-f9d8-4df7-82ff-d4bd00610a6c
  pid: 16162
  status: running
  signature: 8
  replication:
    1:
      id: 1
      uuid: 1899631e-6369-40a1-81c9-7d170e909276
      lsn: 8
    2:
      id: 2
      uuid: bd949e5d-7ff9-413e-b4f2-c9b0149fdda6
      lsn: 0
      upstream:
        status: follow
        idle: 7256.7571430206
        lag: 0
      downstream:
        vclock: {1: 8}
    3:
      id: 3
      uuid: c5cb61d5-fa48-460d-abd7-3f13709d07a7
      lsn: 0
      upstream:
        status: follow
        idle: 7255.7510120869
        lag: 0
      downstream:
        vclock: {1: 8}
  uuid: 1899631e-6369-40a1-81c9-7d170e909276
...

Function box.once

box.once(key, function[, ...])

Execute a function, provided it has not been executed before. The key is checked to see whether the function has already been executed. If it has been executed before, nothing happens. If it has not been executed before, the function is invoked.

See an example of using box.once() while bootstrapping a replica set.

If an error occurs inside box.once() when initializing a database, you can re-execute the failed box.once() block without stopping the database. The solution is to delete the once object from the system space _schema. Run box.space._schema:select{}, find your once object there (its key is 'once' followed by the key that was passed to box.once()), and delete it. For example, re-executing a block with key='hello':

tarantool> box.space._schema:select{}
---
- - ['cluster', 'b4e15788-d962-4442-892e-d6c1dd5d13f2']
  - ['max_id', 512]
  - ['oncebye']
  - ['oncehello']
  - ['version', 1, 7, 2]
...

tarantool> box.space._schema:delete('oncehello')
---
- ['oncehello']
...

tarantool> box.once('hello', function() end)
---
...
Parameters:
  • key (string) – a value that will be checked
  • function (function) – the function to execute
  • ... – arguments that must be passed to function
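The behaviour described above can be modelled in a few lines of plain Lua. The model also shows why a failed block leaves its 'once...' key behind. This is a hypothetical sketch, not Tarantool's source. A Lua table stands in for the _schema space. The sketch assumes the key is recorded before the function runs, which is consistent with the recovery procedure shown above.

```lua
-- Hypothetical model of box.once(); the table `schema` stands in
-- for the _schema system space.
local schema = {}

local function once(key, fn, ...)
  assert(type(key) == 'string', 'key must be a string')
  assert(type(fn) == 'function', 'function expected')
  local marker = 'once' .. key      -- e.g. 'oncehello', as seen in _schema
  if schema[marker] then
    return                          -- already executed: nothing happens
  end
  -- Assumption: the marker is stored before fn runs, so a failing fn
  -- still leaves it behind (hence the _schema:delete() recovery step).
  schema[marker] = true
  return fn(...)
end

local calls = 0
once('hello', function(n) calls = calls + n end, 1)
once('hello', function(n) calls = calls + n end, 1)  -- skipped
print(calls) -- 1
```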

Submodule box.schema

The box.schema submodule has data-definition functions for spaces, users, roles, function tuples, and sequences.

box.schema.space.create(space-name[, {options}])

Create a space.

Parameters:
  • space-name (string) – name of space, which should not be a number and should not contain special characters
  • options (table) – see «Options for box.schema.space.create» chart, below
Return:

space object

Rtype:

userdata

Options for box.schema.space.create

  • temporary (boolean, default false) – space contents are temporary: changes are not stored in the write-ahead log and there is no replication. Note regarding storage engines: vinyl does not support temporary spaces.
  • id (number, default: last space's id + 1) – unique identifier: users can refer to spaces with the id instead of the name.
  • field_count (number, default 0, i.e. not fixed) – fixed count of fields: for example, if field_count=5, it is illegal to insert a tuple with fewer than or more than 5 fields.
  • if_not_exists (boolean, default false) – create the space only if a space with the same name does not exist already; otherwise do nothing, but do not cause an error.
  • engine (string, default 'memtx') – 'memtx' or 'vinyl'.
  • user (string, default: current user's name) – name of the user who is considered to be the space's owner for authorization purposes.
  • format (table, default: blank) – field names and types: see the illustrations of format clauses in the space_object:format() description and in the box.space._space example. Optional and usually not specified.
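The field_count rule can be expressed as a small check. Below is a sketch in plain Lua. check_field_count is a hypothetical helper for illustration only. Tarantool performs this check itself when a tuple is inserted.

```lua
-- Sketch of the field_count rule: 0 means "not fixed",
-- any other value requires exactly that many fields.
-- check_field_count is a hypothetical helper, not a Tarantool function.
local function check_field_count(field_count, tuple)
  if field_count == 0 or #tuple == field_count then
    return true
  end
  return false, string.format('tuple has %d fields, expected %d', #tuple, field_count)
end

print(check_field_count(5, {1, 2, 3, 4, 5})) -- true
print(check_field_count(5, {1, 2, 3}))       -- false plus an error message
```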

There are three syntax variations for object references targeting space objects, for example box.schema.space.drop(space-id) will drop a space. However, the common approach is to use functions attached to the space objects, for example space_object:drop().

Example

tarantool> s = box.schema.space.create('space55')
---
...
tarantool> s = box.schema.space.create('space55', {
         >   id = 555,
         >   temporary = false
         > })
---
- error: Space 'space55' already exists
...
tarantool> s = box.schema.space.create('space55', {
         >   if_not_exists = true
         > })
---
...

After a space is created, usually the next step is to create an index for it, and then it is available for insert, select, and all the other box.space functions.

box.schema.user.create(user-name[, {options}])

Create a user. For explanation of how Tarantool maintains user data, see section Users and reference on _user space.

The possible options are:

  • if_not_exists = true|false (default = false) - boolean; true means there should be no error if the user already exists,
  • password (default = '') - string; specifying a password is advisable because in a URI (Uniform Resource Identifier) it is usually illegal to include a user-name without a password.

Note

The maximum number of users is 32.

Parameters:
  • user-name (string) – name of user, which should not be a number and should not contain special characters
  • options (table) – if_not_exists, password
Return:

nil

Examples:

box.schema.user.create('Lena')
box.schema.user.create('Lena', {password = 'X'})
box.schema.user.create('Lena', {if_not_exists = false})
box.schema.user.drop(user-name[, {options}])

Drop a user. For explanation of how Tarantool maintains user data, see section Users and reference on _user space.

Parameters:
  • user-name (string) – the name of the user
  • options (table) – if_exists = true|false (default = false) - boolean; true means there should be no error if the user does not exist.

Examples:

box.schema.user.drop('Lena')
box.schema.user.drop('Lena',{if_exists=false})
box.schema.user.exists(user-name)

Return true if a user exists; return false if a user does not exist. For explanation of how Tarantool maintains user data, see section Users and reference on _user space.

Parameters:
  • user-name (string) – the name of the user
Rtype:

bool

Example:

box.schema.user.exists('Lena')
box.schema.user.grant(user-name, privileges, object-type, object-name[, {options}])
box.schema.user.grant(user-name, privileges, 'universe'[, nil, {options}])
box.schema.user.grant(user-name, role-name[, nil, nil, {options}])

Grant privileges to a user or to another role.

Parameters:
  • user-name (string) – the name of the user
  • privileges (string) – 'read' or 'write' or 'execute' or a combination
  • object-type (string) – 'space' or 'function' or 'sequence'
  • object-name (string) – name of object to grant permissions to
  • role-name (string) – name of role to grant to user
  • options (table) – grantor, if_not_exists

If 'function','object-name' is specified, then a _func tuple with that object-name must exist.

Variation: instead of object-type, object-name say 'universe', which means "all object-types and all objects". In this case, the object name is omitted.

Variation: instead of privilege, object-type, object-name say role-name (see section Roles).

The possible options are:

  • grantor = grantor_name_or_id – string or number, for custom grantor,
  • if_not_exists = true|false (default = false) - boolean; true means there should be no error if the user already has the privilege.

Example:

box.schema.user.grant('Lena', 'read', 'space', 'tester')
box.schema.user.grant('Lena', 'execute', 'function', 'f')
box.schema.user.grant('Lena', 'read,write', 'universe')
box.schema.user.grant('Lena', 'Accountant')
box.schema.user.grant('Lena', 'read,write,execute', 'universe')
box.schema.user.grant('X', 'read', 'universe', nil, {if_not_exists=true})
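The privileges argument is a comma-separated combination of names, as in 'read,write,execute'. Below is a hypothetical sketch in plain Lua of how such a string can be split and validated. It is restricted to the three privilege names listed in this section. parse_privileges is not a Tarantool function, because Tarantool parses this string internally.

```lua
-- Hypothetical helper: split a privilege string such as 'read,write'
-- into a set, rejecting names not listed in this section.
local KNOWN = { read = true, write = true, execute = true }

local function parse_privileges(s)
  local set = {}
  for name in string.gmatch(s, '[^,]+') do
    if not KNOWN[name] then
      return nil, 'unknown privilege: ' .. name
    end
    set[name] = true
  end
  return set
end

local p = parse_privileges('read,write,execute')
print(p.read, p.write, p.execute) -- true true true (tab-separated)
```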
box.schema.user.revoke(user-name, privilege, object-type, object-name)
box.schema.user.revoke(user-name, privilege, 'role', role-name)

Revoke privileges from a user or from another role.

Parameters:
  • user-name (string) – the name of the user
  • privilege (string) – 'read' or 'write' or 'execute' or a combination
  • object-type (string) – 'space' or 'function' or 'sequence'
  • object-name (string) – the name of a function or space or sequence

The user must exist, and the object must exist, but it is not an error if the user does not have the privilege.

Variation: instead of object-type, object-name say 'universe', which means "all object-types and all objects".

Variation: instead of privilege, object-type, object-name say role-name (see section Roles).

Example:

box.schema.user.revoke('Lena', 'read', 'space', 'tester')
box.schema.user.revoke('Lena', 'execute', 'function', 'f')
box.schema.user.revoke('Lena', 'read,write', 'universe')
box.schema.user.revoke('Lena', 'Accountant')
box.schema.user.password(password)

Return a hash of a user’s password. For explanation of how Tarantool maintains passwords, see section Passwords and reference on _user space.

Note

  • If a non-'guest' user has no password, it's impossible to connect to Tarantool using this user. The user is regarded as "internal" only, not usable from a remote connection. Such users can be useful if they have defined some procedures with the SETUID option, on which privileges are granted to externally-connectable users. This way, external users cannot create/drop objects; they can only invoke procedures.
  • For the 'guest' user, it's impossible to set a password: that would be misleading, since 'guest' is the default user on a newly-established connection over a binary port, and Tarantool does not require a password to establish a binary connection. It is, however, possible to change the current user to 'guest' by providing the AUTH packet with no password at all or an empty password. This feature is useful for connection pools, which may want to reuse a connection for a different user without re-establishing it.
Parameters:
  • password (string) – password to be hashed
Rtype:

string

Example:

box.schema.user.password('ЛЕНА')
box.schema.user.passwd([user-name, ]password)

Associate a password with the user who is currently logged in, or with the user specified by user-name. The user must exist and must not be 'guest'.

Users who wish to change their own passwords should use box.schema.user.passwd(password) syntax.

Administrators who wish to change passwords of other users should use box.schema.user.passwd(user-name, password) syntax.

Parameters:
  • user-name (string) – the name of the user
  • password (string) – the new password

Example:

box.schema.user.passwd('ЛЕНА')
box.schema.user.passwd('Lena', 'ЛЕНА')
box.schema.user.info([user-name])

Return a description of a user’s privileges. For explanation of how Tarantool maintains user data, see section Users and reference on _user space.

Parameters:
  • user-name (string) – the name of the user. This is optional; if it is not supplied, then the information will be for the user who is currently logged in.

Example:

box.schema.user.info()
box.schema.user.info('Lena')
box.schema.role.create(role-name[, {options}])

Create a role. For explanation of how Tarantool maintains role data, see section Roles.

Parameters:
  • role-name (string) – name of role, which should not be a number and should not contain special characters
  • options (table) – if_not_exists = true|false (default = false) - boolean; true means there should be no error if the role already exists
Return:

nil

Example:

box.schema.role.create('Accountant')
box.schema.role.create('Accountant', {if_not_exists = false})
box.schema.role.drop(role-name[, {options}])

Drop a role. For explanation of how Tarantool maintains role data, see section Roles.

Parameters:
  • role-name (string) – the name of the role
  • options (table) – if_exists = true|false (default = false) - boolean; true means there should be no error if the role does not exist.

Example:

box.schema.role.drop('Accountant')
box.schema.role.exists(role-name)

Return true if a role exists; return false if a role does not exist.

Parameters:
  • role-name (string) – the name of the role
Rtype:

bool

Example:

box.schema.role.exists('Accountant')
box.schema.role.grant(user-name, privilege, object-type, object-name[, option])
box.schema.role.grant(user-name, privilege, 'universe'[, nil, option])
box.schema.role.grant(role-name, role-name[, nil, nil, option])

Grant privileges to a role.

Parameters:
  • user-name (string) – the name of the role
  • privilege (string) – 'read' or 'write' or 'execute' or a combination
  • object-type (string) – 'space' or 'function' or 'sequence'
  • object-name (string) – the name of a function or space or sequence
  • option (table) – if_not_exists = true|false (default = false) - boolean; true means there should be no error if the role already has the privilege

The role must exist, and the object must exist.

Variation: instead of object-type, object-name say 'universe', which means "all object-types and all objects".

Variation: instead of privilege, object-type, object-name say role-name – to grant a role to a role.

Example:

box.schema.role.grant('Accountant', 'read', 'space', 'tester')
box.schema.role.grant('Accountant', 'execute', 'function', 'f')
box.schema.role.grant('Accountant', 'read,write', 'universe')
box.schema.role.grant('public', 'Accountant')
box.schema.role.grant('role1', 'role2', nil, nil, {if_not_exists=false})
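When a role is granted to another role, or to 'public', its privileges flow on transitively. The following plain-Lua model resolves effective privileges by walking nested roles. It uses hypothetical data and assumes that user 'Lena' holds the 'public' role. It does not reproduce how Tarantool actually stores grants in its system spaces.

```lua
-- Hypothetical grant data: each entry has its own privileges and
-- the roles granted to it.
local grants = {
  Accountant = { privs = { 'read:space:tester', 'execute:function:f' }, roles = {} },
  public     = { privs = {}, roles = { 'Accountant' } },  -- grant('public', 'Accountant')
  Lena       = { privs = {}, roles = { 'public' } },      -- assumed membership
}

-- Collect privileges of `name` and, recursively, of every role granted to it.
local function effective_privileges(name, acc, seen)
  acc, seen = acc or {}, seen or {}
  if seen[name] then return acc end  -- defensive guard against cycles in this sketch
  seen[name] = true
  local entry = grants[name]
  if entry == nil then return acc end
  for _, p in ipairs(entry.privs) do acc[p] = true end
  for _, r in ipairs(entry.roles) do effective_privileges(r, acc, seen) end
  return acc
end

print(effective_privileges('Lena')['read:space:tester']) -- true
```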
box.schema.role.revoke(user-name, privilege, object-type, object-name)

Revoke privileges from a role.

Parameters:
  • user-name (string) – the name of the role
  • privilege (string) – 'read' or 'write' or 'execute' or a combination
  • object-type (string) – 'space' or 'function' or 'sequence'
  • object-name (string) – the name of a function or space or sequence

The role must exist, and the object must exist, but it is not an error if the role does not have the privilege.

Variation: instead of object-type, object-name say 'universe', which means "all object-types and all objects".

Variation: instead of privilege, object-type, object-name say role-name.

Example:

box.schema.role.revoke('Accountant', 'read', 'space', 'tester')
box.schema.role.revoke('Accountant', 'execute', 'function', 'f')
box.schema.role.revoke('Accountant', 'read,write', 'universe')
box.schema.role.revoke('public', 'Accountant')
box.schema.role.info([role-name])

Return a description of a role’s privileges.

Parameters:
  • role-name (string) – the name of the role.

Example:

box.schema.role.info('Accountant')
box.schema.func.create(func-name[, {options}])

Create a function tuple. This does not create the function itself – that is done with Lua – but if it is necessary to grant privileges for a function, box.schema.func.create must be done first. For explanation of how Tarantool maintains function data, see reference on _func space.

The possible options are:

  • if_not_exists = true|false (default = false) - boolean; true means there should be no error if the _func tuple already exists.
  • setuid = true|false (default = false) - boolean; true makes Tarantool treat the function's caller as the function's creator, with full privileges. Remember that SETUID works only over binary ports: it does not work if you invoke a function via an admin console or inside a Lua script.
  • language = 'LUA'|'C' (default = 'LUA').
Parameters:
  • func-name (string) – name of function, which should not be a number and should not contain special characters
  • options (table) – if_not_exists, setuid, language.
Return:

nil

Example:

box.schema.func.create('calculate')
box.schema.func.create('calculate', {if_not_exists = false})
box.schema.func.create('calculate', {setuid = false})
box.schema.func.create('calculate', {language = 'LUA'})
box.schema.func.drop(func-name[, {options}])

Drop a function tuple. For explanation of how Tarantool maintains function data, see reference on _func space.

Parameters:
  • func-name (string) – the name of the function
  • options (table) – if_exists = true|false (default = false) - boolean; true means there should be no error if the _func tuple does not exist.

Example:

box.schema.func.drop('calculate')
box.schema.func.exists(func-name)

Return true if a function tuple exists; return false if a function tuple does not exist.

Parameters:
  • func-name (string) – the name of the function
Rtype:

bool

Example:

box.schema.func.exists('calculate')
box.schema.func.reload([name])

Reload a C module or function without restarting the server.

Under the hood, Tarantool loads a new copy of the module (*.so shared library) and starts routing all new requests to the new version. The previous version remains active until all started calls are finished. All shared libraries are loaded with RTLD_LOCAL (see man 3 dlopen), so multiple copies can coexist without any problems.

Note

  • When a function from a certain module is reloaded, all the other functions from this module are also reloaded.
  • Reload will fail if a module was loaded from a Lua script with ffi.load().
Parameters:
  • name (string) – the name of the module or function to reload

Examples:

-- reload a function
box.schema.func.reload('module.function')
-- reload the entire module contents
box.schema.func.reload('module')
-- reload everything
box.schema.func.reload()
Sequences

An introduction to sequences is in the Sequences section of the "Data model" chapter. Here are the details for each function and option.

box.schema.sequence.create(name[, options])

Create a new sequence generator.

Parameters:
  • name (string) – the name of the sequence
  • options (table) – see a quick overview in the "Options for box.schema.sequence.create()" chart (in the Sequences section of the "Data model" chapter), and see more details below.
Return:

a reference to a new sequence object.

Options:

  • start – the STARTS WITH value. Type = integer, Default = 1.

  • min – the MINIMUM value. Type = integer, Default = 1.

  • max - the MAXIMUM value. Type = integer, Default = 9223372036854775807.

    There is a rule: min <= start <= max. For example, it is illegal to say {start=0} because then the specified start value (0) would be less than the default min value (1).

    There is a rule: min <= next-value <= max. For example, if the next generated value would be 1000, but the maximum value is 999, then that would be considered "overflow".

  • cycle – the CYCLE value. Type = bool. Default = false.

    If the sequence generator’s next value is an overflow number, it causes an error return – unless cycle == true.

    But if cycle == true, the count is started again, at the MINIMUM value or at the MAXIMUM value (not the STARTS WITH value).

  • cache – the CACHE value. Type = unsigned integer. Default = 0.

    Currently Tarantool ignores this value; it is reserved for future use.

  • step – the INCREMENT BY value. Type = integer. Default = 1.

    Ordinarily this is what is added to the previous value.
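The defaults and the min <= start <= max rule can be sketched as follows. This is a hypothetical validation helper in plain Lua that uses only the defaults listed above. Tarantool performs its own checks in box.schema.sequence.create().

```lua
-- Defaults as listed above (cache is accepted but ignored).
local DEFAULTS = { start = 1, min = 1, max = 9223372036854775807,
                   cycle = false, cache = 0, step = 1 }

-- Hypothetical helper: merge user options over the defaults,
-- then enforce min <= start <= max.
local function validate_sequence_options(opts)
  local o = {}
  for k, v in pairs(DEFAULTS) do o[k] = v end
  for k, v in pairs(opts or {}) do o[k] = v end
  if not (o.min <= o.start and o.start <= o.max) then
    return nil, 'rule violated: min <= start <= max'
  end
  return o
end

print(validate_sequence_options({start = 0})) -- nil plus an error message
```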

sequence_object:next()

Generate the next value and return it.

The generation algorithm is simple:

  • If this is the first time, then return the STARTS WITH value.
  • If the previous value plus the INCREMENT value is less than the MINIMUM value or greater than the MAXIMUM value, that is "overflow", so either return an error (if cycle = false), or return the MAXIMUM value (if cycle = true and step < 0), or return the MINIMUM value (if cycle = true and step > 0).
  • Otherwise, return the previous value plus the INCREMENT value.

If there was no error, then save the returned result: it is now the "previous value".

For example, suppose sequence „S“ has:

  • min == -6,
  • max == -1,
  • step == -3,
  • start == -2,
  • cycle == true,
  • previous value == -2.

Then box.sequence.S:next() returns -5 because -2 + (-3) == -5.

Then box.sequence.S:next() again returns -1 because -5 + (-3) < -6, which is overflow, causing cycle, and max == -1.
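The generation algorithm and the worked example above can be reproduced with a small model in plain Lua. make_sequence is a hypothetical name. The model keeps only the "previous value" state and ignores cache, persistence and privileges.

```lua
-- Hypothetical model of sequence_object:next()/reset()/set().
local function make_sequence(o)
  local seq = { start = o.start, min = o.min, max = o.max,
                step = o.step, cycle = o.cycle, prev = o.prev }
  function seq:next()
    local v
    if self.prev == nil then
      v = self.start                          -- first call: STARTS WITH value
    else
      v = self.prev + self.step
      if v < self.min or v > self.max then    -- overflow
        if not self.cycle then error('sequence overflow') end
        v = self.step < 0 and self.max or self.min
      end
    end
    self.prev = v                             -- becomes the "previous value"
    return v
  end
  function seq:reset() self.prev = nil end
  function seq:set(v) self.prev = v end
  return seq
end

-- Sequence 'S' from the example above.
local S = make_sequence{ min = -6, max = -1, step = -3, start = -2, cycle = true, prev = -2 }
print(S:next()) -- -5
print(S:next()) -- -1
```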

This function requires a „write“ privilege on the sequence.

Note

This function should not be used in "cross-engine" transactions (transactions which use both the memtx and the vinyl storage engines).

To see what the previous value was, without changing it, you can select from the _sequence_data system space.

sequence_object:alter(options)

The alter() function can be used to change any of the sequence’s options. Requirements and restrictions are the same as for box.schema.sequence.create().

sequence_object:reset()

Set the sequence back to its original state. The effect is that a subsequent next() will return the start value. This function requires a 'write' privilege on the sequence.

sequence_object:set(new-previous-value)

Set the "previous value" to new-previous-value. This function requires a 'write' privilege on the sequence.

sequence_object:drop()

Drop an existing sequence.

Example:

Here is an example showing all sequence options and operations:

s = box.schema.sequence.create(
               'S2',
               {start=100,
               min=100,
               max=200,
               cache=100000,
               cycle=false,
               step=100
               })
s:alter({step=6})
s:next()
s:reset()
s:set(150)
s:drop()
space_object:create_index(... [sequence='...' option] ...)

You can use the sequence=sequence-name (or sequence=sequence-id or sequence=true) option when creating or altering a primary-key index. The sequence becomes associated with the index, so that the next insert() will put the next generated number into the primary-key field, if the field would otherwise be nil.

For example, if 'Q' is a sequence and 'T' is a new space, then this will work:

tarantool> box.space.T:create_index('Q',{sequence='Q'})
---
- unique: true
  parts:
  - type: unsigned
    fieldno: 1
  sequence_id: 8
  id: 0
  space_id: 514
  name: Q
  type: TREE
...

(Notice that the index now has a sequence_id field.)

And this will work:

tarantool> box.space.T:insert{nil,0}
---
- [1, 0]
...

Note

If you are using negative numbers for the sequence options, make sure that the index key type is 'integer'. Otherwise the index key type may be either 'integer' or 'unsigned'.

A sequence cannot be dropped if it is associated with an index.

Submodule box.session

The box.session submodule allows querying the session state, writing to a session-specific temporary Lua table, or setting up triggers which will fire when a session starts or ends. A session is an object associated with each client connection.

box.session.id()
Return: the unique identifier (ID) for the current session. The result can be 0, meaning there is no session.
Rtype: number
box.session.exists(id)
Return: 1 if the session exists, 0 if the session does not exist.
Rtype: number
box.session.peer(id)

This function works only if there is a peer, that is, if a connection has been made to a separate Tarantool instance.

Return: the host address and port of the session peer, for example "127.0.0.1:55457". If the session exists but there is no connection to a separate instance, the return is nil. The command is executed on the server instance, so the "local name" is the server instance's host and port, and the "peer name" is the client's host and port.
Rtype: string

Possible errors: 'session.peer(): session does not exist'

box.session.sync()
Return: the value of the sync integer constant used in the binary protocol.
Rtype: number
box.session.user()
Return: the name of the current user
Rtype: string
box.session.type()
Return: the type of connection or cause of action.
Rtype: string

Possible return values are:

  • 'binary' if the connection was done via the binary protocol, for example to a target made with box.cfg{listen=…};
  • 'console' if the connection was done via the administrative console, for example to a target made with console.listen;
  • 'repl' if the connection was done directly, for example when using Tarantool as a client;
  • 'applier' if the action is due to replication, regardless of how the connection was done;
  • 'background' if the action is in a background fiber, regardless of whether the Tarantool server was started in the background.

box.session.type() is useful for an on_replace() trigger on a replica: the value will be 'applier' if and only if the trigger was activated because of a request that was done on the master.

box.session.su(user-name[, function-to-execute])

Change Tarantool’s current user – this is analogous to the Unix command su.

Or, if function-to-execute is specified, change Tarantool’s current user temporarily while executing the function – this is analogous to the Unix command sudo.

Parameters:
  • user-name (string) – name of a target user
  • function-to-execute – name of a function, or definition of a function. Additional parameters may be passed to box.session.su, they will be interpreted as parameters of function-to-execute.

Example

tarantool> function f(a) return box.session.user() .. a end
---
...

tarantool> box.session.su('guest', f, '-xxx')
---
- guest-xxx
...

tarantool> box.session.su('guest',function(...) return ... end,1,2)
---
- 1
- 2
...
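The difference between the su-like and sudo-like forms can be modelled in plain Lua. In this hypothetical sketch the variable current_user stands in for box.session.user(). The sketch also restores the previous user when function-to-execute raises an error. That restore is an assumption of the sketch, not a documented guarantee.

```lua
-- Hypothetical model of box.session.su(); not Tarantool's implementation.
local unpack = table.unpack or unpack
local current_user = 'admin'

-- Keep trailing nils when capturing pcall results.
local function pack(...)
  return { n = select('#', ...), ... }
end

local function su(user, fn, ...)
  if fn == nil then
    current_user = user            -- like Unix su: the switch persists
    return
  end
  local saved = current_user       -- like Unix sudo: switch only for the call
  current_user = user
  local res = pack(pcall(fn, ...))
  current_user = saved             -- assumption: restored even if fn failed
  if not res[1] then
    error(res[2], 0)
  end
  return unpack(res, 2, res.n)
end

print(su('guest', function(a) return current_user .. a end, '-xxx')) -- guest-xxx
print(current_user)                                                  -- admin
```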
box.session.storage

A Lua table that can hold arbitrary unordered session-specific names and values, which will last until the session ends. For example, this table could be useful to store current tasks when working with a Tarantool queue manager.

Example

tarantool> box.session.peer(box.session.id())
---
- 127.0.0.1:45129
...
tarantool> box.session.storage.random_memorandum = "Don't forget the eggs"
---
...
tarantool> box.session.storage.radius_of_mars = 3396
---
...
tarantool> m = ''
---
...
tarantool> for k, v in pairs(box.session.storage) do
         >   m = m .. k .. '='.. v .. ' '
         > end
---
...
tarantool> m
---
- 'radius_of_mars=3396 random_memorandum=Don''t forget the eggs '
...
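Because pairs() visits keys in no particular order, the string built above may come out in either order. The sketch below sorts the keys first to get a stable result. format_storage is a hypothetical helper that works on any plain Lua table, for example format_storage(box.session.storage).

```lua
-- Hypothetical helper: render a table as "k=v" pairs in sorted key order.
local function format_storage(t)
  local keys = {}
  for k in pairs(t) do keys[#keys + 1] = k end
  -- Compare as strings so that mixed key types can still be sorted.
  table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
  local parts = {}
  for _, k in ipairs(keys) do
    parts[#parts + 1] = tostring(k) .. '=' .. tostring(t[k])
  end
  return table.concat(parts, ' ')
end

print(format_storage({ radius_of_mars = 3396, random_memorandum = "Don't forget the eggs" }))
-- radius_of_mars=3396 random_memorandum=Don't forget the eggs
```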
box.session.on_connect(trigger-function[, old-trigger-function])

Define a trigger for execution when a new session is created due to an event such as console.connect. The trigger function will be the first thing executed after a new session is created. If the trigger execution fails and raises an error, the error is sent to the client and the connection is closed.

Parameters:
  • trigger-function (function) – function which will become the trigger function
  • old-trigger-function (function) – existing trigger function which will be replaced by trigger-function
Return:

nil or function pointer

If the parameters are (nil, old-trigger-function), then the old trigger is deleted.

Details about trigger characteristics are in the triggers section.

Example

tarantool> x = 0
---
...
tarantool> function f ()
         >   x = x + 1
         > end
tarantool> box.session.on_connect(f)

Warning

If a trigger always results in an error, it may become impossible to connect to a server to reset it.

box.session.on_disconnect(trigger-function[, old-trigger-function])

Define a trigger for execution after a client has disconnected. If the trigger function causes an error, the error is logged but otherwise is ignored. The trigger is invoked while the session associated with the client still exists and can access session properties, such as box.session.id.

Parameters:
  • trigger-function (function) – function which will become the trigger function
  • old-trigger-function (function) – existing trigger function which will be replaced by trigger-function
Return:

nil or function pointer

If the parameters are (nil, old-trigger-function), then the old trigger is deleted.

Details about trigger characteristics are in the triggers section.

Example #1

tarantool> x = 0
---
...
tarantool> function f ()
         >   x = x + 1
         > end
tarantool> box.session.on_disconnect(f)

Example #2

After the following series of requests, a Tarantool instance will write a message using the log module whenever any user connects or disconnects.

function log_connect ()
  local log = require('log')
  local m = 'Connection. user=' .. box.session.user() .. ' id=' .. box.session.id()
  log.info(m)
end

function log_disconnect ()
  local log = require('log')
  local m = 'Disconnection. user=' .. box.session.user() .. ' id=' .. box.session.id()
  log.info(m)
end

box.session.on_connect(log_connect)
box.session.on_disconnect(log_disconnect)

Here is what might appear in the log file in a typical installation:

2014-12-15 13:21:34.444 [11360] main/103/iproto I>
    Connection. user=guest id=3
2014-12-15 13:22:19.289 [11360] main/103/iproto I>
    Disconnection. user=guest id=3
box.session.on_auth(trigger-function[, old-trigger-function])

Define a trigger for execution during authentication.

The on_auth trigger function is invoked in these circumstances:

  1. The console.connect function includes an authentication check for all users except 'guest'. For this case, the on_auth trigger function is invoked after the on_connect trigger function, if and only if the connection has succeeded so far.
  2. The binary protocol has a separate authentication packet. For this case, connection and authentication are considered to be separate steps.

Unlike other trigger types, on_auth trigger functions are invoked before the event. Therefore a trigger function like function auth_function () v = box.session.user(); end will set v to "guest", the user name before the authentication is done. To get the user name after the authentication is done, use the special syntax: function auth_function (user_name) v = user_name; end

If the trigger fails by raising an error, the error is sent to the client and the connection is closed.

Parameters:
  • trigger-function (function) – function which will become the trigger function
  • old-trigger-function (function) – existing trigger function which will be replaced by trigger-function
Return:

nil or function pointer

If the parameters are (nil, old-trigger-function), then the old trigger is deleted.

Details about trigger characteristics are in the triggers section.

Example 1

tarantool> x = 0
---
...
tarantool> function f ()
         >   x = x + 1
         > end
tarantool> box.session.on_auth(f)

Example 2

This is a more complex example, with two server instances.

The first server instance listens on port 3301; its default user name is 'admin'. There are two on_auth triggers:

  • The first trigger has a function with no arguments, it can only look at box.session.user().
  • The second trigger has a function with a user_name argument, it can look at both box.session.user() and user_name.

The second server instance will connect with console.connect, and then will display the variables that were set by the trigger functions.

-- On the first server instance, which listens on port 3301
box.cfg{listen=3301}
function function1()
  print('function 1, box.session.user()='..box.session.user())
  end
function function2(user_name)
  print('function 2, box.session.user()='..box.session.user())
  print('function 2, user_name='..user_name)
  end
box.session.on_auth(function1)
box.session.on_auth(function2)
box.schema.user.passwd('admin')
-- On the second server instance, that connects to port 3301
console = require('console')
console.connect('admin:admin@localhost:3301')

The result looks like this:

function 2, box.session.user()=guest
function 2, user_name=admin
function 1, box.session.user()=guest
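The (trigger-function, old-trigger-function) convention shared by on_connect, on_disconnect and on_auth can be modelled as a list. In this hypothetical sketch, new triggers are put at the front of the list. That ordering matches the output above, where function2 runs before function1. Raising an error when the old trigger is not found is an assumption of the sketch.

```lua
-- Hypothetical model of trigger registration; not Tarantool's implementation.
local triggers = {}

local function set_trigger(new, old)
  if old ~= nil then
    for i, f in ipairs(triggers) do
      if f == old then
        if new == nil then
          table.remove(triggers, i)   -- (nil, old): delete the old trigger
        else
          triggers[i] = new           -- (new, old): replace the old trigger
        end
        return new
      end
    end
    error('trigger not found')        -- assumption of this sketch
  end
  if new ~= nil then table.insert(triggers, 1, new) end
  return new
end

local function run_triggers(...)
  for _, f in ipairs(triggers) do f(...) end
end

local seen = {}
set_trigger(function() seen[#seen + 1] = 'function 1' end)
set_trigger(function(user_name) seen[#seen + 1] = 'function 2, user_name=' .. user_name end)
run_triggers('admin')
print(table.concat(seen, '; ')) -- function 2, user_name=admin; function 1
```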

Submodule box.slab

The box.slab submodule provides access to slab allocator statistics. The slab allocator is the main allocator used to store tuples. This can be used to monitor the total memory usage and memory fragmentation.

box.runtime.info()

Show a memory usage report (in bytes) for the Lua runtime.

Return:
  • lua is the heap size of the Lua garbage collector;
  • maxalloc is the maximal memory quota that can be allocated for Lua;
  • used is the current memory size used by Lua.
Rtype:

table

Example:

tarantool> box.runtime.info()
---
- lua: 913710
  maxalloc: 4398046510080
  used: 12582912
...
tarantool> box.runtime.info().used
---
- 12582912
...
box.slab.info()

Show an aggregated memory usage report (in bytes) for the slab allocator.

This report is useful for assessing out-of-memory risks: the risks are high if both arena_used_ratio and quota_used_ratio are high (90-95%).

If quota_used_ratio is low, then high arena_used_ratio and/or items_used_ratio indicate that the memory fragmentation is low (i.e. the memory is used efficiently).

If quota_used_ratio is high (approaching 100%), then a low arena_used_ratio (50-60%) indicates that the memory is heavily fragmented. Most probably, there is no immediate out-of-memory risk in this case, but generally this is an issue to consider. For example, probable risks are that the entire memory quota is used for tuples and there are no slabs left for a piece of an index, or that all slabs are allocated for storing tuples, but in fact all the slabs are half-empty.

Return:
  • items_size is the total amount of memory (including allocated, but currently free slabs) used only for tuples, no indexes;
  • items_used_ratio = items_used / (slab_count * slab_size) (these are slabs used only for tuples, no indexes);
  • quota_size is the maximum amount of memory that the slab allocator can use for both tuples and indexes (as configured in memtx_memory parameter, e.g. the default is 1 gigabyte = 2^30 bytes = 1,073,741,824 bytes);
  • quota_used_ratio = quota_used / quota_size;
  • arena_used_ratio = arena_used / arena_size;
  • items_used is the effective amount of memory (omitting allocated, but currently free slabs) used only for tuples, no indexes;
  • quota_used is the amount of memory that is already distributed to the slab allocator;
  • arena_size is the total memory used for tuples and indexes together (including allocated, but currently free slabs);
  • arena_used is the effective memory used for storing tuples and indexes together (omitting allocated, but currently free slabs).
Rtype:

table

Example:

tarantool> box.slab.info()
---
- items_size: 228128
  items_used_ratio: 1.8%
  quota_size: 1073741824
  quota_used_ratio: 0.8%
  arena_used_ratio: 43.2%
  items_used: 4208
  quota_used: 8388608
  arena_size: 2325176
  arena_used: 1003632
...

tarantool> box.slab.info().arena_used
---
- 1003632
...
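The out-of-memory guidance above can be written as a small check. This is a sketch, not Tarantool code. The helper name `assess_slab` is hypothetical, and the thresholds (0.9 for "high", 0.6 for a "low" arena ratio) are taken from the percentages in the text. To avoid depending on how the *_ratio fields are formatted, the sketch recomputes the ratios from the raw byte counts.

```lua
-- Hypothetical helper: classify a box.slab.info()-style table using the
-- heuristics described in the text. Ratios are recomputed from byte counts.
local HIGH, LOW_ARENA = 0.9, 0.6

local function assess_slab(info)
  local quota_ratio = info.quota_used / info.quota_size
  local arena_ratio = info.arena_used / info.arena_size
  local verdict
  if quota_ratio >= HIGH and arena_ratio >= HIGH then
    verdict = "high out-of-memory risk"
  elseif quota_ratio >= HIGH and arena_ratio <= LOW_ARENA then
    verdict = "heavy fragmentation"
  else
    verdict = "ok"
  end
  return verdict, quota_ratio, arena_ratio
end

-- Sample values from the box.slab.info() output above.
-- Inside Tarantool: print(assess_slab(box.slab.info()))
local sample = {
  quota_size = 1073741824, quota_used = 8388608,
  arena_size = 2325176, arena_used = 1003632,
}
local verdict, q, a = assess_slab(sample)
print(verdict, string.format("quota %.1f%%, arena %.1f%%", q * 100, a * 100))
```

With the sample values, the recomputed ratios round to the same 0.8% and 43.2% that the report prints.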
box.slab.stats()

Show a detailed memory usage report (in bytes) for the slab allocator. The report is broken down into groups by data item size as well as by slab size (64-byte, 136-byte, etc.). The report includes the memory allocated for storing both tuples and indexes.

Return:
  • mem_free is the allocated, but currently unused memory;
  • mem_used is the memory used for storing data items (tuples and indexes);
  • item_count is the number of stored items;
  • item_size is the size of each data item;
  • slab_count is the number of slabs allocated;
  • slab_size is the size of each allocated slab.
Rtype:

table

Example:

Here is a sample report for the first group:

tarantool> box.slab.stats()[1]
---
- mem_free: 16232
  mem_used: 48
  item_count: 2
  item_size: 24
  slab_count: 1
  slab_size: 16384
...

This report says that there are 2 data items (item_count = 2), each 24 bytes in size (item_size = 24), stored in one slab (slab_count = 1), so mem_used = 2 * 24 = 48 bytes. The slab is 16384 bytes (slab_size), of which 16232 bytes are free (mem_free). The remaining 16384 - 48 - 16232 = 104 bytes are counted neither as used nor as free.
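The arithmetic for a single group can be checked in plain Lua. The helper name `group_breakdown` is hypothetical. It recomputes mem_used from item_count and item_size. It also computes how many bytes of the group's slabs are reported neither as mem_used nor as mem_free. In every group of the sample below, this difference is 104 bytes per slab. Reading it as per-slab bookkeeping overhead is an assumption; the source does not say so.

```lua
-- Hypothetical helper: break down one box.slab.stats() group.
local function group_breakdown(g)
  local total = g.slab_count * g.slab_size          -- bytes held by the group's slabs
  local used = g.item_count * g.item_size           -- expected to equal g.mem_used
  local unaccounted = total - g.mem_used - g.mem_free
  return used, unaccounted
end

-- First group from the sample report (box.slab.stats()[1] in Tarantool).
local g = {mem_free = 16232, mem_used = 48, item_count = 2,
           item_size = 24, slab_count = 1, slab_size = 16384}
local used, overhead = group_breakdown(g)
print(used, overhead)  -- prints 48 and 104
```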

A complete report would show memory usage statistics for all groups:

tarantool> box.slab.stats()
---
- - mem_free: 16232
    mem_used: 48
    item_count: 2
    item_size: 24
    slab_count: 1
    slab_size: 16384
  - mem_free: 15720
    mem_used: 560
    item_count: 14
    item_size: 40
    slab_count: 1
    slab_size: 16384
  <...>
  - mem_free: 32472
    mem_used: 192
    item_count: 1
    item_size: 192
    slab_count: 1
    slab_size: 32768
  - mem_free: 1097624
    mem_used: 999424
    item_count: 61
    item_size: 16384
    slab_count: 1
    slab_size: 2097152
  ...

The total mem_used for all groups in this report equals arena_used in the box.slab.info() report.
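The relationship between box.slab.stats() and box.slab.info() described above can be checked with a short loop. The function name `total_mem_used` is hypothetical. The sample table holds only the four groups printed above (the <...> groups are omitted), so its sum is not expected to match the arena_used value from the earlier box.slab.info() sample.

```lua
-- Hypothetical helper: sum mem_used over all groups of a
-- box.slab.stats()-style list.
local function total_mem_used(stats)
  local sum = 0
  for _, group in ipairs(stats) do
    sum = sum + group.mem_used
  end
  return sum
end

-- Inside Tarantool, the text implies this comparison holds:
--   total_mem_used(box.slab.stats()) == box.slab.info().arena_used
-- Here only the four groups shown in the sample report are used:
local shown = {
  {mem_used = 48}, {mem_used = 560}, {mem_used = 192}, {mem_used = 999424},
}
print(total_mem_used(shown))  -- prints 1000224
```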

Submodule box.space

The box.space submodule has the data-manipulation functions select, insert, replace, update, upsert, delete, get and put. It also has members such as id and enabled (whether or not the space is enabled). Submodule source code is available in file src/box/lua/schema.lua.

A list of all box.space functions follows, then comes a list of all box.space members.

The functions and members of box.space

Name Use
space_object:auto_increment() Generate key + Insert a tuple
space_object:bsize() Get count of bytes
space_object:count() Get count of tuples
space_object:create_index() Create an index
space_object:delete() Delete a tuple
space_object:drop() Destroy a space
space_object:format() Declare field names and types
space_object:get() Select a tuple
space_object:insert() Insert a tuple
space_object:len() Get count of tuples
space_object:on_replace() Create a replace trigger
space_object:pairs() Prepare for iterating
space_object:put() Insert or replace a tuple
space_object:rename() Rename a space
space_object:replace() Insert or replace a tuple
space_object:run_triggers() Enable/disable a replace trigger
space_object:select() Select one or more tuples
space_object:truncate() Delete all tuples
space_object:update() Update a tuple
space_object:upsert() Update a tuple
space_object.enabled Flag, true if space is enabled
space_object.field_count Required number of fields
space_object.id Numeric identifier of space
space_object.index Container of space’s indexes
box.space._cluster (Metadata) List of replica sets
box.space._func (Metadata) List of function tuples
box.space._index (Metadata) List of indexes
box.space._priv (Metadata) List of privileges
box.space._schema (Metadata) List of schemas
box.space._sequence (Metadata) List of sequences
box.space._sequence_data (Metadata) List of sequences
box.space._space (Metadata) List of spaces
box.space._user (Metadata) List of users
object box.space.space_object
space_object:auto_increment(tuple)

Insert a new tuple using an auto-increment primary key. The space specified by space_object must have an 'unsigned' or 'integer' or 'number' primary key index of type TREE. The primary-key field will be incremented before the insert.

Parameters:
  • space_object (space_object) – an object reference
  • tuple (table/tuple) – tuple’s fields, other than the primary-key field
Return:

the inserted tuple.

Rtype:

tuple

Complexity factors: Index size, Index type, Number of indexes accessed, WAL settings.

Possible errors: index has wrong type or primary-key indexed field is not a number.

Example:

tarantool> box.space.tester:auto_increment{'Fld#1', 'Fld#2'}
---
- [1, 'Fld#1', 'Fld#2']
...
tarantool> box.space.tester:auto_increment{'Fld#3'}
---
- [2, 'Fld#3']
...
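The behavior of auto_increment() can be pictured with a small plain-Lua model. This sketch rests on an assumption: that the new key is the largest existing primary-key value plus 1, or 1 for an empty space. The table `rows` and the function name are hypothetical stand-ins for a real space. In Tarantool, one would simply call box.space.tester:auto_increment{...}.

```lua
-- Plain-Lua model of auto_increment(): `rows` stands in for a space whose
-- primary key is field 1 (assumption: next key = max existing key + 1).
local function auto_increment(rows, fields)
  local max
  for _, row in ipairs(rows) do
    if max == nil or row[1] > max then max = row[1] end
  end
  local tuple = {(max or 0) + 1}       -- 1 when the "space" is empty
  for _, v in ipairs(fields) do
    tuple[#tuple + 1] = v              -- remaining fields follow the key
  end
  rows[#rows + 1] = tuple              -- the "insert"
  return tuple
end

local tester = {}
local t1 = auto_increment(tester, {'Fld#1', 'Fld#2'})  -- {1, 'Fld#1', 'Fld#2'}
local t2 = auto_increment(tester, {'Fld#3'})           -- {2, 'Fld#3'}
```

Because the key is derived from the current maximum, in this model, deleting the tuple with the largest key would let the next call reuse that key.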
space_object:bsize()
Parameters:
  • space_object (space_object) – an object reference
Return:

Number of bytes in the space.

Example:

tarantool> box.space.tester:bsize()
---
- 22
...

Note re storage engine: vinyl does not support bsize().

space_object:count([key][, iterator])

Return the number of tuples. Compared with len(), this method is slower because count() scans the entire space to count the tuples.

Parameters:
  • space_object (space_object) – an object reference
  • key (scalar/table) – primary-key field values, must be passed as a Lua table if key is multi-part
  • iterator – comparison method
Return:

Number of tuples.

Example:

tarantool> box.space.tester:count(2, {iterator='GE'})
---
- 1
...
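The key and iterator parameters of count() can be modeled in plain Lua as a filter over primary-key values. The sketch below is an illustration only. The list `keys` and the function name are hypothetical. The iterator names EQ, GT, GE, LT and LE match Tarantool's iterator types. Treating EQ as the default when a key is given is an assumption.

```lua
-- Plain-Lua model of count(key, {iterator = ...}) over a list of
-- primary-key values standing in for a space.
local predicates = {
  EQ = function(k, key) return k == key end,
  GT = function(k, key) return k > key end,
  GE = function(k, key) return k >= key end,
  LT = function(k, key) return k < key end,
  LE = function(k, key) return k <= key end,
}

local function count(keys, key, opts)
  if key == nil then
    return #keys                                     -- no key: every tuple counts
  end
  local iterator = (opts and opts.iterator) or 'EQ'  -- assumed default
  local match = assert(predicates[iterator], 'unsupported iterator')
  local n = 0
  for _, k in ipairs(keys) do                        -- visits every key, like a scan
    if match(k, key) then n = n + 1 end
  end
  return n
end

local keys = {1, 2, 5000}                  -- hypothetical primary-key values
print(count(keys, 2, {iterator = 'GE'}))   -- prints 2 (keys 2 and 5000)
```

The loop touches every key, which mirrors why the text describes count() as slower than len().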
space_object:create_index(index-name[, options])

Create an index. It is mandatory to create an index for a space before trying to insert tuples into it, or select tuples from it. The first created index, which will be used as the primary-key index, must be unique.

Parameters:
  • space_object (space_object) – an object reference
  • index_name (string) – name of index, which should not be a number and should not contain special characters
  • options (table) – see "Options for space_object:create_index" below
Return:

index object

Rtype:

index_object

Options for space_object:create_index:

Name | Effect | Type | Default
type | type of index | string ('HASH' or 'TREE' or 'BITSET' or 'RTREE') | 'TREE'
id | unique identifier | number | last index's id, +1
unique | index is unique | boolean | true
if_not_exists | no error if duplicate name | boolean | false
parts | field-numbers + types | {field_no, 'unsigned' or 'string' or 'integer' or 'number' or 'boolean' or 'array' or 'scalar'} | {1, 'unsigned'}
dimension | affects RTREE only | number | 2
distance | affects RTREE only | string ('euclid' or 'manhattan') | 'euclid'
bloom_fpr | affects vinyl only | number | vinyl_bloom_fpr
page_size | affects vinyl only | number | vinyl_page_size
range_size | affects vinyl only | number | vinyl_range_size
run_count_per_level | affects vinyl only | number | vinyl_run_count_per_level
run_size_ratio | affects vinyl only | number | vinyl_run_size_ratio
sequence | see section regarding specifying a sequence in create_index() | string or number | not present

Note re storage engine: vinyl has extra options which by default are based on configuration parameters vinyl_bloom_fpr, vinyl_page_size, vinyl_range_size, vinyl_run_count_per_level, and vinyl_run_size_ratio – see the description of those parameters. The current values can be seen by selecting from box.space._index.

Possible errors: too many parts; index '…' already exists; primary key must be unique.

Example:

tarantool> s = box.space.space55
---
...
tarantool> s:create_index('primary', {unique = true, parts = {1, 'unsigned', 2, 'string'}})
---
...
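In the example, parts is a flat list that alternates field numbers and field types, so {1, 'unsigned', 2, 'string'} describes a two-part key. Below is a minimal plain-Lua sketch that splits such a list into pairs. The helper name `parse_parts` is hypothetical, and this is not Tarantool's own parser. It checks each type against the seven index field types described in the next section. It does not accept the aliases ('uint', 'str', 'int', 'num').

```lua
-- Hypothetical helper: split a flat create_index parts list into
-- {field_no, type} pairs, rejecting unknown field types.
local LEGAL_TYPES = {
  unsigned = true, string = true, integer = true, number = true,
  boolean = true, array = true, scalar = true,
}

local function parse_parts(parts)
  if #parts % 2 ~= 0 then
    error('parts must alternate field numbers and types')
  end
  local out = {}
  for i = 1, #parts, 2 do
    local field_no, field_type = parts[i], parts[i + 1]
    local n = #out + 1                       -- 1-based index of this key part
    if type(field_no) ~= 'number' then
      error(string.format('part %d: field number expected', n))
    end
    if not LEGAL_TYPES[field_type] then
      error(string.format('part %d: unknown type %s', n, tostring(field_type)))
    end
    out[n] = {field_no = field_no, type = field_type}
  end
  return out
end

local p = parse_parts({1, 'unsigned', 2, 'string'})
-- p[1] is {field_no = 1, type = 'unsigned'}, p[2] is {field_no = 2, type = 'string'}
```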

Details about index field types:

The seven index field types (unsigned | string | integer | number | boolean | array | scalar) differ in which values they allow and in which index types they may be used.

  • unsigned: unsigned integers between 0 and 18446744073709551615, about 18 quintillion. May also be called 'uint' or 'num', but 'num' is deprecated. Legal in memtx TREE or HASH indexes, and in vinyl TREE indexes.
  • string: any set of octets, up to the maximum length. May also be called 'str'. Legal in memtx TREE or HASH or BITSET indexes, and in vinyl TREE indexes.
  • integer: integers between -9223372036854775808 and 18446744073709551615. May also be called 'int'. Legal in memtx TREE or HASH indexes, and in vinyl TREE indexes.
  • number: integers between -9223372036854775808 and 18446744073709551615, single-precision floating point numbers, or double-precision floating point numbers. Legal in memtx TREE or HASH indexes, and in vinyl TREE indexes.
  • boolean: true or false. Legal in memtx TREE or HASH indexes, and in vinyl TREE indexes.
  • array: array of numbers. Legal in memtx RTREE indexes.
  • scalar: booleans (true or false), or integers between -9223372036854775808 and 18446744073709551615, or single-precision floating point numbers, or double-precision floating point numbers, or strings. When there is a mix of types, the key order is: booleans, then numbers, then strings. Legal in memtx TREE or HASH indexes, and in vinyl TREE indexes.
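The mixed-type key order of a scalar field (booleans, then numbers, then strings) can be modeled as a Lua comparator. This is a sketch under assumptions. Within each type it uses the natural order: false before true, numeric order, and Lua's string order, which is byte order for ASCII in the default C locale. The name `scalar_less` is hypothetical.

```lua
-- Plain-Lua model of scalar key order: booleans < numbers < strings.
local RANK = {boolean = 1, number = 2, string = 3}

local function scalar_less(a, b)
  local ra, rb = RANK[type(a)], RANK[type(b)]
  if ra ~= rb then return ra < rb end        -- different types: order by rank
  if ra == 1 then return (not a) and b end   -- booleans: false before true
  return a < b                               -- same type: natural order
end

local keys = {'b', 2.5, true, -1, 'a', false}
table.sort(keys, scalar_less)
-- keys is now {false, true, -1, 2.5, 'a', 'b'}
```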

Index field types to use in create_index

Index field type | What can be in it | Where it is legal | Examples
unsigned | integers between 0 and 18446744073709551615 | memtx TREE or HASH indexes, vinyl TREE indexes | 123456
string | strings – any set of octets | memtx TREE or HASH indexes, vinyl TREE indexes | 'A B C', '\65 \66 \67'
integer | integers between -9223372036854775808 and 18446744073709551615 | memtx TREE or HASH indexes, vinyl TREE indexes | -2^63
number | integers between -9223372036854775808 and 18446744073709551615, single-precision floating point numbers, double-precision floating point numbers | memtx TREE or HASH indexes, vinyl TREE indexes | 1.234, -44, 1.447e+44
boolean | true / false | memtx TREE or HASH indexes, vinyl TREE indexes | false, true
array | array of integers between -9223372036854775808 and 9223372036854775807 | memtx RTREE indexes | {10, 11}, {3, 5, 9, 10}
scalar | booleans (true or false), integers between -9223372036854775808 and 18446744073709551615, single-precision floating point numbers, double-precision floating point numbers, strings | memtx TREE or HASH indexes, vinyl TREE indexes | true, -1, 1.234, '', 'ру'

Note re storage engine: vinyl supports only the TREE index type, and vinyl secondary indexes must be created before tuples are inserted.

space_object:delete(key)

Delete a tuple identified by a primary key.

Parameters:
  • space_object (space_object) – an object reference
  • key (scalar/table) – primary-key field values, must be passed as a Lua table if key is multi-part
Return:

the deleted tuple

Rtype:

tuple

Complexity factors: Index size, Index type

Note re storage engine: vinyl will return nil, rather than the deleted tuple.

Example:

tarantool> box.space.tester:delete(1)
---
- [1, 'My first tuple']
...
tarantool> box.space.tester:delete(1)
---
...
tarantool> box.space.tester:delete('a')
---
- error: 'Supplied key type of part 0 does not match index part type:
  expected unsigned'
...
space_object:drop()

Drop a space.

Parameters:
  • space_object (space_object) – an object reference
Return:

nil

Possible errors: If space_object does not exist.

Complexity factors: Index size, Index type, Number of indexes accessed, WAL settings.

Example:

box.space.space_that_does_not_exist:drop()
space_object:format(format-clause)

Declare field names and types.

Parameters:
  • space_object (space_object) – an object reference
  • format-clause (table) – a list of field names and types
Return:

nil

Possible errors: if space_object does not exist, if field names are duplicated, or if a type is not legal.

Ordinarily Tarantool allows unnamed, untyped fields. But with format, users can, for example, document that the Nth field is the surname field and must contain strings. It is also possible to specify a format clause in box.schema.space.create().

The format clause contains {name='...',type='...'} pairs. The name may be any string, provided that no two fields have the same name. The type must be 'unsigned' or 'string' or 'integer' or 'number' or 'boolean' or 'array' or 'scalar' (the same as the requirement in "Options for space_object:create_index").

It is legal for tuples to have more fields than are described by a format clause. The way to constrain the number of fields is to specify a space’s field_count member.

It is legal to use format on a space that already has a format, provided that there is no conflict with existing data or index definitions.

Example:

box.space.T:format({{name='surname',type='string'},{name='IDX',type='array'}})
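The rules above (string names, no duplicates, one of the seven legal types) can be expressed as a plain-Lua check. This is a sketch of the stated rules, not Tarantool's own validation code. The function name `check_format` is hypothetical. It requires a type for every entry, following the {name='...',type='...'} form described above.

```lua
-- Hypothetical checker mirroring the format() rules stated in the text.
local LEGAL_TYPES = {
  unsigned = true, string = true, integer = true, number = true,
  boolean = true, array = true, scalar = true,
}

local function check_format(format)
  local seen = {}
  for i, field in ipairs(format) do
    if type(field.name) ~= 'string' then
      return false, string.format('field %d: name must be a string', i)
    end
    if seen[field.name] then
      return false, string.format('field %d: duplicate name %s', i, field.name)
    end
    seen[field.name] = true
    if not LEGAL_TYPES[field.type] then
      return false, string.format('field %d: illegal type %s', i, tostring(field.type))
    end
  end
  return true
end

-- The clause from the example above passes the check:
print(check_format({{name = 'surname', type = 'string'}, {name = 'IDX', type = 'array'}}))
```

A duplicated name or an unknown type makes the function return false plus a message, corresponding to the "Possible errors" listed for format().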
space_object:get(key)

Search for a tuple in the given space.

Parameters:
  • space_object (space_object) – an object reference
  • key (scalar/table) – value to be matched against the index key, which may be multi-part.
Return:

the tuple whose index key matches key, or nil.

Rtype:

tuple

Possible errors: If space_object does not exist.

Complexity factors: Index size, Index type, Number of indexes accessed, WAL settings.

The box.space...select function returns a set of tuples as a Lua table; the box.space...get function returns at most a single tuple. It is possible to get the first tuple of a select result by appending [1]. Therefore box.space.tester:get{1} has the same effect as box.space.tester:select{1}[1], if exactly one tuple is found.

Example:

box.space.tester:get{1}
space_object:insert(tuple)

Insert a tuple into a space.

Parameters:
  • space_object (space_object) – an object reference
  • tuple (tuple/table) – tuple to be inserted.
Return:

the inserted tuple

Rtype:

tuple

Possible errors: if a tuple with the same unique-key value already exists, the insert fails with ER_TUPLE_FOUND.

Example:

tarantool> box.space.tester:insert{5000,'tuple number five thousand'}
---
- [5000, 'tuple number five thousand']
...
space_object:len()

Return the number of tuples in the space. Compared with count(), this method is faster because len() does not scan the entire space to count the tuples.

Parameters:
  • space_object (space_object) – an object reference
Return:

Number of tuples in the space.

Example:

tarantool> box.space.tester:len()
---
- 2
...

Note re storage engine: vinyl does not support len(). Possible workarounds are to use count() or #select(...).

space_object:on_replace(trigger-function[, old-trigger-function])

Create a "replace trigger". The trigger-function will be executed whenever a replace() or insert() or update() or upsert() or delete() happens to a tuple in the space.

Parameters:
  • trigger-function (function) – function which will become the trigger function
  • old-trigger-function (function) – existing trigger function which will be replaced by trigger-function
Return:

nil or function pointer

If the parameters are (nil, old-trigger-function), then the old trigger is deleted.

If it is necessary to know whether the trigger activation happened due to replication or on a specific connection type, the function can refer to box.session.type().

Details about trigger characteristics are in the triggers section.

Example #1:

<