SearchStax Managed Search service administrators sometimes overload their systems by asking Solr to index too many records in a single batch, or too many batches in a short period of time. CPU usage then stays at 100% for extended periods. This can cause service outages as one replica after another goes into recovery mode for no apparent reason.
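One way to avoid both problems is to send smaller batches and pause between them. The following is a minimal sketch, not a SearchStax-recommended implementation: `send_batch` is a hypothetical callable standing in for whatever client code posts documents to Solr, and the default `batch_size` and `pause_seconds` values are illustrative assumptions that should be tuned against your own deployment.

```python
import time
from typing import Callable, Iterable, Iterator, List


def chunked(records: Iterable, batch_size: int) -> Iterator[List]:
    """Yield successive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def index_throttled(
    records: Iterable,
    send_batch: Callable[[List], None],  # hypothetical: posts one batch to Solr
    batch_size: int = 500,               # assumed value, tune for your deployment
    pause_seconds: float = 1.0,          # assumed value, tune for your deployment
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send records in small batches, pausing after each one.

    Returns the number of batches sent.
    """
    sent = 0
    for batch in chunked(records, batch_size):
        send_batch(batch)
        sent += 1
        sleep(pause_seconds)
    return sent
```

The `sleep` parameter exists only so the pause can be replaced in tests; in normal use the default `time.sleep` applies.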
When overloaded, Solr stores incoming /update requests in ZooKeeper’s overseer queue. In extreme cases, ZooKeeper can get into a state where the overseer queue is too full to be processed. When that happens, we typically see:
- Solr connection timeout messages.
- Replicas in recovery.
- Large numbers of indexing errors and Solr timeout errors.
- Solr warnings saying “Could not obtain overseer’s address, skipping.”
You can check the state of the overseer queue by sending this request to the Solr Collections API:
<Solr HTTP endpoint>/admin/collections?action=OVERSEERSTATUS
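The same check can be scripted. The sketch below builds the OVERSEERSTATUS URL and extracts the queue sizes from the JSON response. The field names `overseer_queue_size`, `overseer_work_queue_size`, and `overseer_collection_queue_size` are taken from the Solr Collections API documentation and should be verified against your Solr version. Authentication is not handled here; if your deployment requires credentials, the request will need to include them.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Queue-size fields expected in the OVERSEERSTATUS response
# (assumed from the Solr documentation; verify for your Solr version).
QUEUE_KEYS = (
    "overseer_queue_size",
    "overseer_work_queue_size",
    "overseer_collection_queue_size",
)


def overseer_status_url(solr_endpoint: str) -> str:
    """Build the OVERSEERSTATUS URL for a given Solr HTTP endpoint."""
    query = urlencode({"action": "OVERSEERSTATUS", "wt": "json"})
    return f"{solr_endpoint.rstrip('/')}/admin/collections?{query}"


def queue_sizes(status: dict) -> dict:
    """Pick the queue-size fields out of a parsed response.

    A field that is missing from the response is reported as 0.
    """
    return {key: int(status.get(key, 0)) for key in QUEUE_KEYS}


def fetch_queue_sizes(solr_endpoint: str, timeout: float = 10.0) -> dict:
    """Call OVERSEERSTATUS and return the queue sizes (no authentication)."""
    url = overseer_status_url(solr_endpoint)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return queue_sizes(json.load(response))
```

Calling `fetch_queue_sizes("<Solr HTTP endpoint>")` with your own endpoint returns a small dictionary that can be logged or compared over time.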
If the queue is large and does not seem to be diminishing, please ask SearchStax support to manually clear the queue. This is likely to require a rolling restart to help Solr recover from the effects of overloading the indexing process.
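To judge whether the queue is "large and not diminishing," it helps to compare several readings taken a few minutes apart. The sketch below assumes you have already collected a list of queue sizes over time. The `large_threshold` value is a hypothetical cutoff chosen for illustration; this article does not define a specific number.

```python
from typing import Sequence


def queue_is_draining(samples: Sequence[int], min_drop: int = 1) -> bool:
    """Return True if the newest reading is lower than the oldest by min_drop or more."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to see a trend")
    return samples[0] - samples[-1] >= min_drop


def should_contact_support(
    samples: Sequence[int],
    large_threshold: int = 1000,  # hypothetical cutoff, not a SearchStax figure
) -> bool:
    """Return True if the queue is still large and shows no sign of shrinking."""
    still_large = samples[-1] >= large_threshold
    return still_large and not queue_is_draining(samples)
```

For example, readings of 5000, 5200, and 5100 would suggest contacting support, while readings of 5000, 3000, and 1200 indicate that the queue is draining on its own.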
Questions?
Do not hesitate to contact the SearchStax Support Desk.