# support-questions-legacy
g
We're seeing an increase in client-side errors when hitting the email verification endpoint. The request volume isn't super high (5k/min?) so I don't think we should be hitting any rate limits. I can see in APM that we're sometimes timing out when establishing TCP connections to SuperTokens core. Is there anything you can see about our traffic patterns and usage that could explain what's going on? Here's a graph of errors when hitting the
user/email/verify
endpoint. Purple line is the load balancer's count of requests to the endpoint; blue is the client-reported errors.
r
hey @goodgravy
there has been a random huge spike within the last hour on our end
g
Is this still the case @rp_st ? We made a change ~40m ago
r
seems better now
but it's not lower than what it was before.. it's just no longer higher.
g
We just hit TechCrunch and will be doing some other press, so expecting a moderate uplift in sign-ups over the next couple of days.
r
Ah okay! Congrats on the press! 👏👏
g
FYI we got a noticeable spike in 502s ~10 minutes ago hitting various endpoints on core, like
/recipe/user
,
/recipe/jwt
,
/recipe/user/email/verify
. Here's an example response:
SuperTokens core threw an error for a GET request to path: '/recipe/user/email/verify' with status code: 502 and message: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>
r
Right. We were scaling up. Should be resolved now
g
@rp_st we've noticed a big increase in latency on your core endpoints in the last 30 mins
r
hmm yea. I see it too. Requests are reaching 80k per min, which is more than 1k rps!
is this expected amount of traffic?
we are further scaling up now.
g
We don't have a notable increase in traffic – this is over a 4h window
r
huh.. so this is strange
maybe it's just concurrent users?
but that should increase your traffic too..
g
Yeah, that graph is just the raw number of inbound requests to our server, so I think it must be an increase in requests from our server to yours, for a ~constant number of requests from users' browsers
r
yup. Pretty much
g
We haven't deployed
r
hmmm
let me see the logs. In case there is something
ok, the requests suddenly went back to normal amounts.
g
I see the response times normalising too
r
but now the requests are rising again
let me check the IP addresses being used to query. Will get back
52.204.251.144
and
34.207.186.17
. Most requests are coming from
52.204.251.144
these are the request paths that are most common:
- /recipe/user/email/verify GET
- /recipe/user GET
- /recipe/jwt
- /recipe/session/regenerate

where
/recipe/user GET
is the most common.
Maybe it's an issue with the caching you had in place? I think you had some code where you would fetch info from your db, and if that returned nothing, then you would query the core for the user info? Just guessing here..
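Roughly the shape I'm imagining – purely illustrative, the names and the core URL here are made up, not your actual code:
```ts
// Purely illustrative sketch of the "check our own DB/cache first, then fall
// back to core" pattern I'm guessing at. All names and URLs are made up.
interface UserInfo {
  id: string;
  email: string;
}

const localCache = new Map<string, UserInfo>(); // stands in for your own DB/cache

async function queryCoreForUser(userId: string): Promise<UserInfo | undefined> {
  // Stand-in for whatever ends up as GET /recipe/user on the core.
  const res = await fetch(`https://core.example.com/recipe/user?userId=${encodeURIComponent(userId)}`);
  return res.ok ? ((await res.json()) as UserInfo) : undefined;
}

async function getUserInfo(userId: string): Promise<UserInfo | undefined> {
  const cached = localCache.get(userId);
  if (cached !== undefined) {
    return cached;
  }
  // Every cache miss falls through to the core API, so a cold or failing cache
  // multiplies /recipe/user traffic without any change in browser traffic.
  return queryCoreForUser(userId);
}
```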
g
There must be something within the library code which is hitting the
/recipe/user
endpoint… accounting for our caching logic, we only directly hit that endpoint 123 times in the last 4h 😬 Can you help us narrow down how we could be transitively hitting that endpoint? As in, which parts of the Node SDK hit that endpoint?
r
Yea sure.
The thing is, most of the lib code runs when an api is called
If there is not much of an increase in the browser calling the backend SDK APIs, then there shouldn't be that much of an increase in core calls
when will you be available for a debugging call @goodgravy ?
@porcellus can help here (you had spoken to him on our last debugging call)
g
We made another infrastructure change 1hr ago and think that it will help with this problem. Theory is:
- Some of our endpoints / some of the SDK trigger multiple requests to core APIs
- Our DNS resolution infrastructure became overloaded
- That caused some requests to core to time out
- Something triggered a retry
- We retry all of the multiple requests to core API endpoints
- This results in the increased request volume you were seeing
I'm not yet sure where the retry is coming from.
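To make the theory concrete, here's a rough sketch of how one timeout plus a whole-handler retry multiplies core calls – callCore, handleVerifyEmail and withRetry are made-up names, not our real code:
```ts
// Made-up names throughout – this just illustrates the amplification theory.
async function callCore(path: string): Promise<unknown> {
  // If DNS resolution is overloaded, this lookup can hang and then reject
  // with a timeout error.
  const res = await fetch(`https://core.example.com${path}`, {
    signal: AbortSignal.timeout(5_000),
  });
  return res.json();
}

// One of our backend endpoints fanning out into several core requests.
async function handleVerifyEmail(userId: string): Promise<void> {
  await callCore(`/recipe/user?userId=${encodeURIComponent(userId)}`);
  await callCore(`/recipe/user/email/verify?userId=${encodeURIComponent(userId)}`);
  await callCore("/recipe/jwt");
}

// If the retry wraps the *whole* handler, one slow DNS lookup anywhere in the
// fan-out re-issues every core call above: N attempts of a handler that makes
// M core calls is up to N * M requests to core.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// e.g. withRetry(() => handleVerifyEmail("some-user-id"));
```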
p
hi. I don't think there's too much I can add based on the above, but the most-hit request paths @rp sent look like something is updating the email verification claim in sessions (if we are looking for a single thing that calls all those paths on the core). I don't know of anything in our SDK that'd cause that to be retried or called multiple times, so I don't think this helps too much.
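For illustration only, the kind of fan-out I mean would be roughly this shape – the helper, the core URL, and the mapping of steps to paths are my guess, not the SDK's actual internals:
```ts
// Rough guess at a fan-out that would touch all the paths rp listed when the
// email verification claim on a session gets refreshed. Helper names, the
// core URL, and the step-to-path mapping are assumptions, not SDK internals.
async function callCore(path: string): Promise<unknown> {
  const res = await fetch(`https://core.example.com${path}`);
  return res.json();
}

async function refreshEmailVerificationClaim(userId: string, sessionHandle: string): Promise<void> {
  // Look up the user (GET /recipe/user – the most common path in the logs).
  await callCore(`/recipe/user?userId=${encodeURIComponent(userId)}`);
  // Check whether the email is verified (GET /recipe/user/email/verify).
  await callCore(`/recipe/user/email/verify?userId=${encodeURIComponent(userId)}`);
  // Write the updated claim back into the session's token payload
  // (plausibly /recipe/jwt and /recipe/session/regenerate).
  await callCore(`/recipe/session/regenerate?sessionHandle=${encodeURIComponent(sessionHandle)}`);
}
```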
r
things seem to be stable for now. The rps has gone back to what it used to be and has stayed there for the last few hours.
We will monitor it for a day and let u know if it rises again.
it did spike up again in the last few hours.
and it's gone back to doing 1k rps again now.