As I discussed in a previous post, Ruby MRI cannot execute Ruby code in multiple threads in parallel. So how can we scale Ruby applications if our threads cannot run in parallel?
Ruby web servers (Unicorn vs Puma)
Unicorn and Puma are two of the most popular Ruby web servers. Puma was originally designed for Rubinius to provide real concurrency. Rubinius doesn't have a GIL, so it can run multiple threads in parallel across multiple cores. Puma serves requests from a pool of threads, so if you use Puma with Rubinius those threads can serve multiple requests at the same time on multiple cores. Puma does work with MRI, but there the GIL prevents more than one thread from executing Ruby code at a time.
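To make the threaded model concrete, here is a minimal sketch of a fixed-size thread pool draining a queue of requests. It is loosely in the spirit of how a threaded server like Puma hands requests to its threads, but it is not Puma's implementation. `handle_request`, `serve` and `POOL_SIZE` are hypothetical names, and the "requests" are plain integers.

```ruby
POOL_SIZE = 4 # hypothetical number of threads in the pool

# Hypothetical request handler: sleep stands in for I/O such as a DB query.
# MRI releases the GIL while a thread sleeps, so other threads can run.
def handle_request(n)
  sleep 0.01
  "response #{n}"
end

def serve(requests, pool_size: POOL_SIZE)
  queue = Queue.new
  requests.each { |r| queue << r }
  pool_size.times { queue << :shutdown } # one stop signal per thread

  results = Queue.new # Queue is thread safe, so workers can push concurrently
  threads = Array.new(pool_size) do
    Thread.new do
      # Each thread keeps taking requests until it receives a stop signal.
      while (req = queue.pop) != :shutdown
        results << handle_request(req)
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }
end

responses = serve((1..10).to_a)
puts responses.inspect
```

Because `handle_request` spends its time waiting rather than computing, the threads overlap even on MRI; CPU-bound handlers would not overlap this way under the GIL.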
Unicorn spawns multiple workers (UNIX processes) to serve requests, and you scale it by increasing the number of workers on your servers. Each worker runs in its own isolated memory space, so your application doesn't need to be thread safe.
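The worker isolation can be seen in a minimal pre-fork sketch in the spirit of Unicorn's master/worker model. It assumes a Unix-like OS, because `Process.fork` is not available on Windows. `WORKERS` and the work each child does are hypothetical. The point is that a child's changes to global state never reach the master.

```ruby
WORKERS = 3   # hypothetical worker count
$counter = 0  # global state in the master process

# Fork each worker with a pipe so it can report back to the master.
readers = WORKERS.times.map do |i|
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    $counter += 100 # changes only this child's copy of the global
    writer.puts "worker #{i} pid=#{Process.pid} counter=#{$counter}"
    writer.close
  end
  writer.close # the master only reads
  [pid, reader]
end

# Collect each worker's report, then reap the child process.
reports = readers.map do |pid, reader|
  line = reader.read
  reader.close
  Process.wait(pid)
  line
end

puts reports
puts "master counter=#{$counter}" # the children's changes never reach the master
```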
Forked processes vs multi-threading
Threads are more lightweight than UNIX processes because all the threads in a process share its memory, so they consume less memory. Therefore Puma generally uses less memory than Unicorn to handle the same number of concurrent requests.
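Sharing one heap is also why threaded code must be thread safe. In the sketch below, the shared `counter` is a hypothetical stand-in for application state, and every thread updates the same variable. `counter += 1` is a read-modify-write that is not guaranteed to be atomic, so it is wrapped in a `Mutex`.

```ruby
counter = 0       # hypothetical shared application state
lock = Mutex.new  # guards every update to counter

threads = 4.times.map do
  Thread.new do
    1_000.times do
      # synchronize makes the read-modify-write of counter atomic
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

puts counter # → 4000, since all threads updated the same variable
```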
The drawback of multi-threading is that it is less useful on MRI and other Ruby implementations that have a GIL: the GIL limits performance for CPU-bound work, because only one thread can execute Ruby code at a time.
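One way to observe this is to time the same CPU-bound work run serially and then on several threads. This is a sketch with arbitrary, hypothetical sizes (`N`, `THREADS`). On MRI the two timings are typically close, because only one thread executes Ruby code at a time. An implementation without a GIL can spread the threads across cores.

```ruby
# CPU-bound work: pure Ruby arithmetic that never blocks on I/O.
def busy_work(n)
  (1..n).reduce(0) { |acc, i| acc + i * i }
end

# Wall-clock seconds spent in the block, measured with a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

N = 1_000_000 # hypothetical problem size
THREADS = 4   # hypothetical thread count

serial_results = []
serial = elapsed { THREADS.times { serial_results << busy_work(N) } }

threaded_results = nil
threaded = elapsed do
  # Start all threads first; Thread#value then joins each and returns its result.
  threaded_results = Array.new(THREADS) { Thread.new { busy_work(N) } }.map(&:value)
end

puts format("serial:   %.2fs", serial)
puts format("threaded: %.2fs", threaded)
```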
The advantage of using forked (UNIX) processes is that you can serve multiple requests in parallel, on multiple cores, with any Ruby implementation. This technique also does not require your application code to be thread safe.
The drawback of this approach is that forked processes require more memory. As such, Unicorn can consume a lot of memory on your application server, which can lead to out-of-memory errors and slow response times if it is not managed.
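Managing that memory starts with measuring it. The sketch below assumes Linux and its `/proc` filesystem. It reads a process's resident set size (RSS), and `rss_kb` and `total_rss_kb` are hypothetical helper names. Pages that forked workers still share with the master are counted in every process's RSS, so summing across workers overstates the true total.

```ruby
# Linux-only: read the VmRSS line from /proc/<pid>/status.
# Returns the RSS in kB, or nil if the line is missing.
def rss_kb(pid)
  line = File.foreach("/proc/#{pid}/status").find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i # "VmRSS:   12345 kB" -> 12345
end

# Rough upper bound on the memory used by a master and its workers.
def total_rss_kb(pids)
  pids.sum { |pid| rss_kb(pid).to_i }
end

puts "this process: #{rss_kb(Process.pid)} kB"
```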