Troubleshooting

  • if you want to run the docker CLI as a non-root user, add your user to the docker group, re-login, and restart docker.service (https://wiki.archlinux.org/title/Users_and_groups#Group_management), e.g. as shown below.
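
    on most systemd-based distributions this usually boils down to something like the following (group and service names may differ on your setup):

    # add the current user to the docker group
    sudo usermod -aG docker "$USER"
    # restart the docker daemon so it picks up the change
    sudo systemctl restart docker.service
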
  • environment variables are defined in .env which is automatically read by docker-compose when starting the containers.
  • ensure the data persistence directories for postgres and OpenSearch (ELASTIC_DATA, POSTGRES_DATA, defined in .env) exist AND are owned by your user, e.g. as shown below.
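
    for example, assuming ELASTIC_DATA and POSTGRES_DATA point to ./elastic_data and ./postgres_data (adjust the paths to whatever you set in .env):

    # create the persistence directories and hand them to the current user
    mkdir -p ./elastic_data ./postgres_data
    sudo chown -R "$USER:$USER" ./elastic_data ./postgres_data
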
  • on Linux, vm.max_map_count usually needs to be increased for OpenSearch; if it is too low, you will see something like this in the error log when OpenSearch is starting up:

    {
      "@timestamp": "2023-10-30T13:26:11.175Z",
      "log.level": "ERROR",
      "message": "node validation exception\n[1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.\nbootstrap check failure [1] of [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]",
      "ecs.version": "1.2.0",
      "service.name": "ES_ECS",
      "event.dataset": "elasticsearch.server",
      "process.thread.name": "main",
      "log.logger": "org.elasticsearch.bootstrap.Elasticsearch",
      "elasticsearch.node.name": "a17e0512fed1",
      "elasticsearch.cluster.name": "docker-cluster"
    }

    to fix this, set vm.max_map_count=262144 in /etc/sysctl.conf (or in a file under /etc/sysctl.d), then apply it without rebooting:

    # reload sysctl settings without rebooting
    sudo sysctl --system

    alternatively, set it temporarily with sudo sysctl -w vm.max_map_count=262144, or reboot.
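
    for example, the persistent setting could live in a drop-in file (the file name below is just an example):

    # /etc/sysctl.d/99-opensearch.conf -- example file name
    # raise the maximum number of memory map areas per process, as required by OpenSearch
    vm.max_map_count=262144

    # verify the value currently in effect
    sysctl vm.max_map_count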

  • if you get errors like failed to obtain node-lock while booting OpenSearch, ensure that the data directory (e.g. ./elastic_data) is NOT owned by root and is writable (chown it to your user again if needed).
  • the error too many connections already from postgres usually happens when ingesting too many files at once, and should be handled by tuning the configuration parameters:
    • in the gulp configuration, check multiprocessing_batch_size: it is advised to keep it at 0 so operations are performed in batches equal to the number of cores; raising this value may speed up ingestion considerably, but it is more prone to errors.
    • in the postgres configuration, increase max_connections (e.g. as sketched below).
    • a better solution is to scale up (increase cores and/or the postgres cluster size).
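
    a minimal sketch for raising max_connections on the postgres side (host, user, value and service name are examples, adjust to your setup); postgres must be restarted for the change to take effect:

    # raise max_connections via psql (requires superuser privileges)
    psql -h localhost -U postgres -c "ALTER SYSTEM SET max_connections = 200;"
    # restart the postgres container so the new value takes effect (service name is an example)
    docker compose restart postgres
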
  • the error elastic_transport.ConnectionTimeout: Connection timed out usually means your opensearch instance is not keeping up with ingestion; possible mitigations (see the example snippet after this list):
    • increase ingestion_request_timeout (almost always the easiest solution), OR
    • scale up the OpenSearch nodes, OR
    • reduce parallelism with parallel_processes_max, OR
    • tune the ingestion_buffer_size configuration parameter (e.g. the default is 1000, try 2000 to reduce the number of parallel chunks).
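
    a hypothetical snippet of the relevant gulp configuration keys, assuming a JSON-style configuration file; the key names are the ones mentioned above, the values are examples only:

    {
      "ingestion_request_timeout": 120,
      "parallel_processes_max": 2,
      "ingestion_buffer_size": 2000
    }
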
  • in case opensearch fails to bootstrap because of errors such as:

    elasticsearch | {"@timestamp":"2024-07-22T15:53:30.889Z", "log.level": "WARN", "message":"flood stage disk watermark [95%] exceeded on [2Mf_sAxtRua8tLTU865kPA][3b369ef109f9][/usr/share/elasticsearch/data] free: 39.7gb[4.3%], all indices on this node will be marked read-only", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[3b369ef109f9][management][T#2]","log.logger":"org.elasticsearch.cluster.routing.allocation.DiskThre

    a possible (temporary) workaround is to disable the disk watermark thresholds and clear the read-only block on the indices after the node starts:

    # disable the disk watermark thresholds cluster-wide (transient, resets on restart)
    curl -k -u "opensearch:Gulp1234!" -XPUT -H "Content-Type: application/json" https://localhost:9200/_cluster/settings -d '{ "transient": { "cluster.routing.allocation.disk.threshold_enabled": false } }'

    # clear the read-only block that was set on the existing indices
    curl -k -u "opensearch:Gulp1234!" -XPUT -H "Content-Type: application/json" https://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
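
    to double-check that the settings were applied, the current cluster settings can be queried with the same credentials:

    # print the current cluster settings
    curl -k -u "opensearch:Gulp1234!" "https://localhost:9200/_cluster/settings?pretty"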