# support-questions-legacy
g
We're seeing an increase in client-side errors when hitting the email verification endpoint. The request volume isn't super high (5k/min?) so I don't think we should be hitting any rate limits. I can see in APM that we're sometimes timing out when establishing TCP connections to SuperTokens core. Is there anything you can see about our traffic patterns and usage that could explain what's going on? Here's a graph of errors when hitting the
user/email/verify
endpoint. Purple line is the load balancer's count of requests to the endpoint; blue is the client-reported errors.
r
hey @goodgravy
there has been a random huge spike within the last hour on our end
g
Is this still the case @rp_st ? We made a change ~40m ago
r
seems better now
but it's not lower than what it was before.. it's just no longer higher.
g
We just hit TechCrunch and will be doing some other press, so expecting a moderate uplift in sign-ups over the next couple of days.
r
Ah okay! Congrats on the press! 👏👏
g
FYI we got a noticeable spike in 502s ~10 minutes ago hitting various endpoints on core, like
/recipe/user
,
/recipe/jwt
,
/recipe/user/email/verify
. Here's an example response:
SuperTokens core threw an error for a GET request to path: '/recipe/user/email/verify' with status code: 502 and message: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>
r
Right. We were scaling up. Should be resolved now
g
@rp_st we've noticed a big increase in latency on your core endpoints in the last 30 mins
r
hmm yea. I see it too. Requests are reaching 80k per min, which is more than 1k rps!
is this expected amount of traffic?
we are further scaling up now.
g
We don't have a notable increase in traffic – this is over a 4h window
r
huh.. so this is strange
maybe it's just concurrent users?
but that should increase your traffic too..
g
Yeah, that graph is just the raw number of inbound requests to our server, so I think it must be an increase in requests from our server to yours, for a ~constant number of requests from users' browsers
r
yup. Pretty much
g
We haven't deployed
r
hmmm
let me see the logs. In case there is something
ok, the requests suddenly went back to normal amounts.
g
I see the response times normalising too
r
but now the requests are rising again
let me check the IP addresses being used to query. Will get back
52.204.251.144
and
34.207.186.17
. Most requests are coming from
52.204.251.144
these are the request paths that are most common:
- /recipe/user/email/verify GET
- /recipe/user GET
- /recipe/jwt
- /recipe/session/regenerate

where
/recipe/user GET
is the most common.
Maybe it's an issue with the caching you had in place? I think you had some code where you would fetch info from your db, and if that returned nothing, then you would query the core for the user info? Just guessing here..
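Roughly the shape I'm imagining – purely illustrative, the names and the core URL here are made up, not your actual code:
```ts
// Purely illustrative sketch of the "check our own DB/cache first, then fall
// back to core" pattern I'm guessing at. All names and URLs are made up.
interface UserInfo {
  id: string;
  email: string;
}

const localCache = new Map<string, UserInfo>(); // stands in for your own DB/cache

async function queryCoreForUser(userId: string): Promise<UserInfo | undefined> {
  // Stand-in for whatever ends up as GET /recipe/user on the core.
  const res = await fetch(`https://core.example.com/recipe/user?userId=${encodeURIComponent(userId)}`);
  return res.ok ? ((await res.json()) as UserInfo) : undefined;
}

async function getUserInfo(userId: string): Promise<UserInfo | undefined> {
  const cached = localCache.get(userId);
  if (cached !== undefined) {
    return cached;
  }
  // Every cache miss falls through to the core API, so a cold or failing cache
  // multiplies /recipe/user traffic without any change in browser traffic.
  return queryCoreForUser(userId);
}
```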
g
There must be something within the library code which is hitting the
/recipe/user
endpoint… accounting for our caching logic, we only directly hit that endpoint 123 times in the last 4h 😬 Can you help us narrow down how we could be transitively hitting that endpoint? As in, which parts of the Node SDK hit that endpoint?
r
Yea sure.
The thing is, most of the lib code runs when an api is called
If there is not much of an increase in the browser calling the backend SDK APIs, then there shouldn't be that much of an increase in core calls
when will you be available for a debugging call @goodgravy ?
@porcellus can help here (you had spoken to him on our last debugging call)
g
We made another infrastructure change 1hr ago and think that it will help with this problem. Theory is:
- Some of our endpoints / some of the SDK trigger multiple requests to core APIs
- Our DNS resolution infrastructure became overloaded
- That caused some requests to core to time out
- Something triggered a retry
- We retry all of the multiple requests to core API endpoints
- This results in the increased request volume you were seeing
I'm not yet sure where the retry is coming from.
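To make the theory concrete, here's a rough sketch of how one timeout plus a whole-handler retry multiplies core calls – callCore, handleVerifyEmail and withRetry are made-up names, not our real code:
```ts
// Made-up names throughout – this just illustrates the amplification theory.
async function callCore(path: string): Promise<unknown> {
  // If DNS resolution is overloaded, this lookup can hang and then reject
  // with a timeout error.
  const res = await fetch(`https://core.example.com${path}`, {
    signal: AbortSignal.timeout(5_000),
  });
  return res.json();
}

// One of our backend endpoints fanning out into several core requests.
async function handleVerifyEmail(userId: string): Promise<void> {
  await callCore(`/recipe/user?userId=${encodeURIComponent(userId)}`);
  await callCore(`/recipe/user/email/verify?userId=${encodeURIComponent(userId)}`);
  await callCore("/recipe/jwt");
}

// If the retry wraps the *whole* handler, one slow DNS lookup anywhere in the
// fan-out re-issues every core call above: N attempts of a handler that makes
// M core calls is up to N * M requests to core.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// e.g. withRetry(() => handleVerifyEmail("some-user-id"));
```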
p
hi. I don't think there's too much I can add based on the above, but the most-hit request paths @rp sent look like something is updating the email verification claim in sessions (if we are looking for a single thing that calls all those paths on the core). I don't know of anything in our SDK that'd cause that to be retried or called multiple times, so I don't think this helps too much.
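For illustration only, the kind of fan-out I mean would be roughly this shape – the helper, the core URL, and the mapping of steps to paths are my guess, not the SDK's actual internals:
```ts
// Rough guess at a fan-out that would touch all the paths rp listed when the
// email verification claim on a session gets refreshed. Helper names, the
// core URL, and the step-to-path mapping are assumptions, not SDK internals.
async function callCore(path: string): Promise<unknown> {
  const res = await fetch(`https://core.example.com${path}`);
  return res.json();
}

async function refreshEmailVerificationClaim(userId: string, sessionHandle: string): Promise<void> {
  // Look up the user (GET /recipe/user – the most common path in the logs).
  await callCore(`/recipe/user?userId=${encodeURIComponent(userId)}`);
  // Check whether the email is verified (GET /recipe/user/email/verify).
  await callCore(`/recipe/user/email/verify?userId=${encodeURIComponent(userId)}`);
  // Write the updated claim back into the session's token payload
  // (plausibly /recipe/jwt and /recipe/session/regenerate).
  await callCore(`/recipe/session/regenerate?sessionHandle=${encodeURIComponent(sessionHandle)}`);
}
```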
r
things seem to be stable for now. The rps has gone back to what it used to be and has stayed there for the last few hours.
We will monitor it for a day and let u know if it rises again.
it did spike up again in the last few hours.
and it's gone back to doing 1k rps again now.