Here is a p99 graph of response times under a sustained load of 350-600 requests per second (rps). In general, the supertokens core is stateless and horizontally scalable, so adding more core instances to this setup would further improve performance.
Here we have deployed 6 core instances, each running on a t3.small instance.
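For readers unfamiliar with the metric, p99 is the response time that 99% of requests stay at or below. The sketch below shows one common way to compute it (the nearest-rank method) from a list of latency samples. The function name and the sample data are illustrative assumptions, not part of the benchmark tooling used for the graph.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it.

    `pct` is an integer between 1 and 100. Hypothetical helper for illustration.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, so floating point rounding cannot shift the rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p99 = percentile(latencies_ms, 99)
print(f"p99 = {p99} ms")
```

A single slow request barely moves the average but shows up in p99, which is why tail percentiles are the usual way to report latency under load.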
Furthermore, these numbers cover only the API calls that reach the core, such as sign in, sign up and session refresh. The most common auth-related operation in your app is session verification, which is completely stateless and doesn't require querying the supertokens core instance, so its latency is just a few milliseconds.
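To illustrate why stateless verification is fast, here is a minimal sketch of checking a signed token locally, with no network call. It is a generic illustration and not the SuperTokens implementation: the HMAC signing scheme, the token layout, the `SECRET` value and the function names `issue_token` and `verify_token` are all assumptions made for this example. In a real app the SuperTokens backend SDK performs session verification for you.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    # Restore the padding that _b64 stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(user_id, ttl_seconds, now=None):
    """Create a '<payload>.<signature>' token (illustrative format)."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": user_id, "exp": now + ttl_seconds}).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)


def verify_token(token, now=None):
    """Return the claims if the token is valid and unexpired, else None.

    Everything happens in process: recompute the signature, compare, check expiry.
    """
    now = time.time() if now is None else now
    try:
        payload_part, signature_part = token.split(".")
        payload = _unb64(payload_part)
        signature = _unb64(signature_part)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # signature mismatch: token was tampered with or forged
    claims = json.loads(payload)
    if claims["exp"] <= now:
        return None  # expired
    return claims


token = issue_token("user-1", ttl_seconds=60, now=1000)
print(verify_token(token, now=1030))
```

Because the check is a hash computation plus a timestamp comparison, its cost does not depend on how many core instances are running or how busy they are.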