Troubleshooting
The index build crashes or does not run to completion
A successful index build ends with the log message [Date] - INFO: Index build
completed. In the course of the index build, time is spent in the following
phases, each with their own section and progress messages in the log:
- Parsing triples
- Merging partial vocabularies
- Converting triples from local IDs to global IDs
- Building the various permutations (SPO, SOP, OSP, OPS, PSO, POS)
The index build may fail in any of these phases. Here is a list of things that may go wrong, and how to fix them.
Regarding 1: If QLever encounters input that it cannot parse, it will abort with an error message. The error message wlil indicate the byte offset in the input file where the error was encountered, and it will also contain a part of the input after that offset.
Regarding 1: The index build just ends with a line like [Date] - INFO: Triples parsed:
... and no further error message. This is a sign that the process was killed
by the operating system, most likely due to running out of memory. The most
likely cause is that num-triples-per-batch in SETTINGS_JSON is set too
high. Set it lower, see Qleverfile settings.
Regarding 1 or 2: QLever parses the input in batches of size
num-triples-per-batch each, or less for the last batch. For each batch, two files
are created on disk: one during parsing and one during the merging of partial
vocabularies. If the number of files exceeds the number of allowed open file
descriptors, there will be a corresponding error message. Set ULIMIT higher,
see Qleverfile settings.
Regarding 2: The index build crashes at [Date] - INFO: Merging partial
vocabularies ... and one of the last lines in the log is Finished writing
compressed internal vocabulary, size = 0 B [uncompressed = 0 B, ratio = 100%].
This happens when the STXXL_MEMORY divided by the number of batches is too
small. The number of batches is the total number of triples divided by
"num-triples-per-batch". Either increase STXXL_MEMORY or increase
num-triples-per-batch, see Qleverfile settings.
Regarding 3: This phase is computationally simple, does not use much memory, and eventually closes the two files per batch that were created during parsing and vocabulary merging. We are not aware of any systemic problems occurring in this phase.
Regarding 4: The index can crash here if STXLL_MEMORY is too low or the
number of triples is very large. Increase STXXL_MEMORY or make us of
https://github.com/ad-freiburg/qlever/pull/2443.