There are some anecdotal reports online about inference endpoints quietly degrading. I will add to those reports, and just like the others I will not publish hard data because of client confidentiality and potential liability (I can’t afford a lawyer).
LLM
I was using an LLM inference endpoint last year for data extraction. The pipeline was unstructured text input into structured JSON output. I automated testing of the endpoint when I was evaluating the model to use for the task. I used Jaccard similarity to test the stability of the extracted entity sets. I also measured response times.
I had the test logs saved from November 2025. I redid the tests today (April 15 2026). Same endpoint, same model, same config, same params, same testing data. The response time averages are three times as long as before, ~200% response time outliers started appearing and the result stability suffered slightly.
TTS
I used one of the more well-known inference providers in February 2026, trying to reproduce the TTS results I was getting locally with Chatterbox. I already got the quality I needed locally, and was looking for speed. The inference endpoint was not marked as quantized and I found no details regarding the inference pipeline in the docs. In all the params that were exposed through the API, I mirrored the local settings. I generated a several samples through the API, paying for the inference. The audio resembled what I generated locally, so I continued working on my PoC using the API.
My initial promising tests were followed bu a couple of hours of complete API outage (500s on every request). After the endpoint came back up online, the audio I started getting back was not usable anymore when compared to what I generated locally and what the endpoint itself generated the day before.
I contacted the customer service and they claimed to get me in touch with the technical team. Sadly that’s where the contact from the company stopped.
Afterwards, I pointed an LLM at the inference provider’s ToS. They do not make any guarantees about what I paid for, whether quality or stability.