The architecture works great. The only scaling bottleneck is the number of simultaneously connected users.
At peak, users are sending 5-10 requests a second. All user requests flow into a single shared resource, so contention on it affects all 2k-80k users.
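To put a rough number on that, here is a back-of-the-envelope sketch. It assumes the 5-10 requests a second figure is per user and that every connected user is active at peak; neither is confirmed above, and the function name `aggregate_rps` is just illustrative.

```python
def aggregate_rps(users: int, per_user_rps: int) -> int:
    """Total requests per second hitting the shared resource.

    Assumes every connected user is active and sends per_user_rps
    requests each second (an assumption, not a measured figure).
    """
    return users * per_user_rps


# Best case: fewest users at the lowest per-user rate.
low = aggregate_rps(2_000, 5)
# Worst case: most users at the highest per-user rate.
high = aggregate_rps(80_000, 10)

print(f"{low:,} to {high:,} requests/second on the single resource")
```

Under those assumptions the single resource would see somewhere between 10,000 and 800,000 requests a second, which is why the connected-user count dominates everything else.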
I appreciate your suggestions and will definitely look into microservices. A brief search suggests I might need 80 Kubernetes pods to achieve this.