Multiprocessing vs Multithreading in Ruby: Which is Better?

Can multiprocessing be a good alternative to multithreading? Sure! However, it depends on the answer to one question: does my project really need multiple processes? Read on to find out!


Kamil Sopata

August 19, 2019 • 14 min read

Parallel computing can be a cure for performance issues. It allows us to do several things at once, which sounds great in the context of background jobs. Inspired by Python's multiprocessing module, I began to think about parallelism in Ruby. Of course, there are several ways to approach it, but in this post I'll focus on the Process module. Before we start, I recommend quickly reviewing the differences between a process and a thread.

What Is Wrong With Ruby as a Multi-threaded Programming Language?

Ruby offers the Thread class that implements several methods for handling concurrent tasks. It sounds really promising on paper – opening new threads in which we can execute code and then wait until each thread finishes. Awesome, right?

Unfortunately, it is not as amazing as it seems. Why? First of all, you need to know what it really looks like under the hood.

Throughout this post I will use a simple (naive, recursive) Fibonacci algorithm, because it takes some time to compute:

```ruby
def fib(n)
  return n if [0,1].include?(n)
  fib(n-1) + fib(n-2)
end
```


I prepared two benchmarks based on a method that computes the 35th element of the Fibonacci sequence. The first one executes fib(35) 10 times in a row. The second one does the same thing using threads. I ran each benchmark 3 times to make sure the results are repeatable (on a MacBook Pro with a 2-core 2.4 GHz Intel Core i5 and 8 GB of RAM):

```ruby
require "benchmark"

Benchmark.measure { 10.times { fib(35) } }
```

```
(user CPU time | system CPU time | total CPU time | real time)
38.243695 0.647830 38.891525 ( 41.074481)
36.667084 0.550266 37.217350 ( 38.464907)
38.844508 0.711785 39.556293 ( 42.610056)
```


=>AVG: 40.72s

```ruby
Benchmark.measure {
  threads = []
  10.times do
    threads << Thread.new { Thread.current[:output] = fib(35) }
  end
  threads.each { |thread| thread.join }
}
```

```
38.623686 0.611559 39.235245 ( 40.751415)
38.077194 0.579472 38.656666 ( 39.956344)
38.445872 0.603536 39.049408 ( 40.273643)
```


=>AVG: 40.33s

The results are almost the same (the last column, in parentheses, is the real execution time).

Why does it work like this? Let's dig a bit deeper.

The reference Ruby interpreter, MRI (Matz's Ruby Interpreter), uses a Global Interpreter Lock (GIL), which is also used by other interpreters such as CPython. The GIL controls the execution of threads: only one thread can execute Ruby code at a time. That's why both benchmarks above take about the same time: in both cases, only one task is actually being computed at any given moment.

Each Ruby process has exactly one GIL. Your first thought might be: can't we just turn the GIL off? It is not as easy as it seems. Ruby needs the GIL to protect itself from code that isn't thread-safe, for instance from interleaved non-atomic operations.

We can define an atomic operation as any operation that is uninterruptible.

Robert C. Martin

Clean Code

It is worth checking out other Ruby implementations. One of them is JRuby, which runs on the Java Virtual Machine: it has no GIL and supports truly parallel threads.
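To see the GIL at work, here is a small sketch (in_threads is a hypothetical helper) that compares threads making a blocking call with threads running CPU-bound Ruby code. It assumes MRI, which releases the GIL during blocking operations such as sleep or I/O. On MRI the four sleeps should overlap and take about half a second in total, while the threaded fib run should take roughly as long as the sequential one; exact timings depend on the machine.

```ruby
require "benchmark"

# Same naive Fibonacci as above: pure Ruby, CPU-bound.
def fib(n)
  return n if [0, 1].include?(n)
  fib(n - 1) + fib(n - 2)
end

# Hypothetical helper: runs the block in `count` threads and waits for all of them.
def in_threads(count, &work)
  Array.new(count) { Thread.new(&work) }.each(&:join)
end

# Blocking call: MRI releases the GIL while a thread sleeps, so the sleeps overlap.
sleep_time = Benchmark.realtime { in_threads(4) { sleep 0.5 } }

# CPU-bound Ruby code: threads take turns holding the GIL, so there is no speed-up.
threaded_cpu = Benchmark.realtime { in_threads(4) { fib(25) } }
sequential_cpu = Benchmark.realtime { 4.times { fib(25) } }

puts format("4 x sleep(0.5), threads: %.2fs", sleep_time)
puts format("4 x fib(25), threads:    %.2fs", threaded_cpu)
puts format("4 x fib(25), sequential: %.2fs", sequential_cpu)
```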

Process-based parallelism

Ruby provides the Process module, which we can use to create new processes. Let's try running fib(n) in multiple processes:

```ruby
Benchmark.measure {
  read_stream, write_stream = IO.pipe
  10.times do
    Process.fork do
      write_stream.puts fib(35)
    end
  end
  Process.waitall
  write_stream.close
  results = read_stream.read
  read_stream.close
}
```

```
0.001240 0.005190 63.827237 ( 17.158324)
0.001579 0.007635 65.032995 ( 19.821757)
0.001433 0.006900 64.022068 ( 18.152649)
```


=>AVG: 18.38s

This way, the execution took about 22 seconds less than the single-process implementation, which I think is a pretty good result. Note that in this benchmark the third column (total CPU time) also includes the CPU time used by the child processes, which is why it is much larger than the real time. The OS scheduler decides which core runs each process and for how long. My MacBook Pro has 2 cores, and the execution was more than twice as fast. Do you see the analogy? More cores means better performance (simplifying, and provided that other processes don't keep those cores busy).

Process Module – a Magic Cure?

You may know multiprocessing from the Chrome browser, where each tab runs in a separate process for security reasons. In Ruby, creating new child processes may increase performance, but it also comes with certain restrictions. First of all, new processes put additional responsibilities on the developer: running them requires extra care.

We always have to answer a few questions: Will this solve our problems? When should we use a multi-process architecture? How many processes should we run at once? Do we need some kind of process limiter? How can too many processes affect our system? Will we be able to control the number of child processes? What happens to the child processes if the parent process is killed? When is it worth using?

As you can see, there are a lot of considerations along the way. Let's try to address a few of them.

When It Makes Sense

Creating a multi-process application is much harder than creating a multi-threaded one. It makes sense when the number of new processes isn't too big, each of them runs for a long time (creating a process is relatively expensive, especially on Windows), we have a multi-core processor, we don't need to share data between processes (or we know how to share it safely), and we don't need to return data from them (which is a bit problematic). In general, each process should be independent, and the parent process should act as the controller of its children. You will find an example of a multi-process application later in this post.

| | Thread | Process |
|---|---|---|
| Memory | Uses less memory, since threads share memory within a single process | Everything, including memory, is isolated within the process, so it uses more memory |
| Communication | Values can easily be returned through shared memory | Requires inter-process communication (IPC), such as pipes or signals |
| Persistence | Lives inside one process, so it always ends with it | Child processes can keep running as orphans if the parent process is killed |
| Initialization | Creating and deleting threads is faster | Creating and deleting processes is slower |
| Maintenance | Fewer potential issues and easier to implement, but can be more difficult to debug | Easier to debug, but we have to take care of process persistence, zombies, etc. |
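The communication row is where processes hurt most: a forked child cannot simply return a value to its parent. One common pattern, sketched here as a hypothetical parallel_map helper, is to give each child its own pipe and send the result back serialized with Marshal. It assumes a Unix-like OS (fork is not available on Windows) and results that Marshal can dump.

```ruby
# Naive Fibonacci from the benchmarks above.
def fib(n)
  return n if [0, 1].include?(n)
  fib(n - 1) + fib(n - 2)
end

# Hypothetical helper: runs the block for each item in a separate child process
# and returns the results in the original order.
def parallel_map(items)
  jobs = items.map do |item|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      writer.write(Marshal.dump(yield(item)))  # serialize the child's result
      writer.close
      exit!(0)  # leave without running at_exit handlers inherited from the parent
    end
    writer.close  # the parent only reads from this pipe
    [pid, reader]
  end

  jobs.map do |pid, reader|
    result = Marshal.load(reader.read)  # read everything before waiting, so a full pipe can't deadlock
    reader.close
    Process.wait(pid)  # reap the child so it doesn't linger as a zombie
    result
  end
end

p parallel_map([10, 15, 20]) { |n| fib(n) }  # prints [55, 610, 6765]
```

Using one pipe per child keeps the results in order, unlike a single shared pipe where the lines arrive in whatever order the children finish.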

Too Many Existing Processes

In the previous example, I forked 10 additional processes that computed the 35th element of the Fibonacci sequence. What happens if I increase the number of processes?

```ruby
# ...
20.times do
  Process.fork do
    write_stream.puts fib(35)
  end
end
# ...
```


While the program was running, I called ps:

```
➜ work ps | grep ruby
68743 ttys010 0:00.18 ruby test.rb
68756 ttys010 0:00.47 ruby test.rb
68757 ttys010 0:00.46 ruby test.rb
68758 ttys010 0:00.47 ruby test.rb
68759 ttys010 0:00.46 ruby test.rb
68760 ttys010 0:00.43 ruby test.rb
68761 ttys010 0:00.42 ruby test.rb
68762 ttys010 0:00.42 ruby test.rb
68763 ttys010 0:00.43 ruby test.rb
68764 ttys010 0:00.43 ruby test.rb
68765 ttys010 0:00.43 ruby test.rb
68766 ttys010 0:00.43 ruby test.rb
68767 ttys010 0:00.43 ruby test.rb
68768 ttys010 0:00.43 ruby test.rb
68769 ttys010 0:00.43 ruby test.rb
68770 ttys010 0:00.43 ruby test.rb
68771 ttys010 0:00.43 ruby test.rb
68772 ttys010 0:00.44 ruby test.rb
68773 ttys010 0:00.43 ruby test.rb
68774 ttys010 0:00.44 ruby test.rb
68775 ttys010 0:00.42 ruby test.rb
```


We have 21 ruby processes (1 parent and 20 children). Is that a lot? It's hard to say, because it depends on factors like the hardware or the current system load.

Please take a look at the output from htop:

[Screenshot: htop output, idle system]

[Screenshot: htop output, single-process script]

[Screenshot: htop output, multi-process script]

At first glance, we can see that the multi-process script makes much better use of my computer's computing power. I mentioned earlier that my processor has 2 physical cores; htop shows 4 here thanks to Hyper-Threading, an Intel technology that exposes each physical core as two logical (virtual) cores.

So can there be too many tasks (processes) in the operating system scheduler? The OS enforces some limits (depending on the platform). Unix-like systems have a built-in ulimit command, which works with two types of limits:

Hard: the ceiling for the soft limit; only root can raise it
Soft: the limit that is actually enforced; a user can raise it up to the hard limit

On Linux, per-user process limits can be configured in /etc/security/limits.conf. On macOS, we can use launchctl limit maxproc (the first value is the soft limit, the second one is the hard limit).
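The same per-user process limit can also be read from inside Ruby. A small sketch using Process.getrlimit, which returns a [soft, hard] pair; format_limit is a hypothetical helper, and the guard is there because RLIMIT_NPROC exists on Linux and macOS but not on every platform.

```ruby
# Hypothetical helper: Process::RLIM_INFINITY means "no limit".
def format_limit(value)
  value == Process::RLIM_INFINITY ? "unlimited" : value.to_s
end

if Process.const_defined?(:RLIMIT_NPROC)
  # Maximum number of processes for the current user, as [soft, hard]
  soft, hard = Process.getrlimit(Process::RLIMIT_NPROC)
  puts "max user processes: soft=#{format_limit(soft)}, hard=#{format_limit(hard)}"
else
  puts "RLIMIT_NPROC is not supported on this platform"
end
```

The soft value is what ulimit -u reports in bash.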

Common sense says we shouldn't create too many subprocesses. The htop screenshot taken while the multi-process script was running is a good example: processes that need a lot of computing power can consume up to 100% of the CPU, which can make the entire system unstable! On top of that, we should take care of memory. Let's say one simple process needs 10 MB of memory and we fork it 10 times (1 parent, 10 children): don't be surprised if together they take more than 100 MB.

Limitations

Limiting the number of processes in Ruby is a complex problem. I started with a simple function, but unfortunately it failed:

```ruby
def execute
  read, write = IO.pipe
  30.times do
    process_limiter
    Process.fork do
      write.puts fib(2)
    end
  end
  Process.waitall
  write.close
  results = read.read
  read.close
end

def process_limiter
  while current_processes > 15 do
    sleep(0.1) # there should be a better script to check if the number of children is decreasing
  end
end

def current_processes
  IO.popen('ps | grep "[r]uby"').read.split("\n").size
end

execute
```


Process.waitall, according to the documentation, "waits for all children, returning an array of pid/status pairs". A forked process that has already finished stays in the process table (as a so-called zombie) until the parent collects its exit status, for example with .waitall. Because of that, we can't count processes with ps | grep "[r]uby" as above: finished children can still show up there. Child processes send the SIGCHLD signal to the parent when they exit, are stopped, or are resumed after being stopped. Unfortunately, Ruby doesn't have a method that lists all current child processes. It would be great if we could simply call (pseudocode):

```ruby
Process.children
```


with output:

```
[
  [13013, #<Process::Status: pid 13013 exit 0>],
  [13014, #<Process::Status: pid 13014 running>],
  [13015, #<Process::Status: pid 13015 running>]
]
```


To approximate this, we can use the process state, which we can find, for instance, in the output of ps aux:

➜ ps aux | grep test.rb

[Screenshot: ps aux output for three test.rb processes, showing PID, CPU and memory usage, start time and command]

As you can see, two processes have the status R+ (running, in the foreground) and one has S+ (sleeping, in the foreground). This can be quite useful information; a description of all the status codes can be found in man ps.
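Without a built-in Process.children, one workaround is to keep track of the PIDs we forked ourselves and poll them without blocking. A minimal sketch, assuming a hypothetical ChildTracker class (not part of Ruby); it relies on Process.wait2 with the Process::WNOHANG flag, which returns nil while a child is still running and reaps it once it has finished.

```ruby
# Hypothetical helper: remembers the PIDs we forked and polls them without blocking.
class ChildTracker
  def initialize
    @statuses = {}  # pid => :running or the child's Process::Status
  end

  def start(&block)
    pid = Process.fork(&block)
    @statuses[pid] = :running
    pid
  end

  # Reaps every child that has finished since the last call (WNOHANG = don't block)
  # and returns [[pid, status], ...], similar to the pseudocode above.
  def children
    @statuses.each_key do |pid|
      next unless @statuses[pid] == :running
      reaped = Process.wait2(pid, Process::WNOHANG)  # nil while the child is still running
      @statuses[pid] = reaped[1] if reaped
    end
    @statuses.to_a
  end

  def running_count
    children.count { |_pid, status| status == :running }
  end
end

tracker = ChildTracker.new
tracker.start { exit!(0) }           # finishes immediately
tracker.start { sleep 1; exit!(0) }  # keeps running for a while
sleep 0.5
tracker.children.each { |pid, status| puts "#{pid}: #{status.inspect}" }
puts "running: #{tracker.running_count}"
Process.waitall  # clean up before the script exits
```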

Because Ruby doesn't clean up a completed child on its own while other children are still running (that is the job of the .wait methods), implementing a process limiter is much harder than it looks, so we have to rely on OS features and our brainpower.
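That said, Process.wait called without a PID blocks until any child exits and returns that child's PID, which is enough for a simple counting limiter. The sketch below swaps in that technique instead of counting processes with ps: run_limited is a hypothetical helper, it assumes the parent has no other children that Process.wait could reap, and it defaults the limit to the number of CPUs reported by Etc.nprocessors.

```ruby
require "etc"

# Naive Fibonacci from the benchmarks above.
def fib(n)
  return n if [0, 1].include?(n)
  fib(n - 1) + fib(n - 2)
end

# Hypothetical helper: forks one child per task, but never keeps more than
# max_children alive at once. Returns the highest number of simultaneous children.
def run_limited(tasks, max_children = Etc.nprocessors)
  running = 0
  peak = 0
  tasks.each do |task|
    if running >= max_children
      Process.wait  # block until any child exits and reap it, freeing a slot
      running -= 1
    end
    Process.fork do
      task.call
      exit!(0)  # skip at_exit handlers inherited from the parent
    end
    running += 1
    peak = [peak, running].max
  end
  Process.waitall  # reap the children that are still running
  peak
end

tasks = Array.new(30) { -> { fib(20) } }
puts "peak concurrent children: #{run_limited(tasks, 15)}"  # prints 15
```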

The Process module also offers the .detach method, which we can use instead of .wait: it starts a background thread that reaps the child, so the parent doesn't have to wait for it. In our example we care about the results, so we have to wait.
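A minimal sketch of Process.detach for a fire-and-forget child: it returns a thread that reaps the child in the background, so no zombie is left behind, and that thread's value is the child's Process::Status if we do want it later. The child here just sleeps and exits with status 3 as a stand-in for real work.

```ruby
# A fire-and-forget child: a stand-in for real background work.
pid = Process.fork do
  sleep 0.2
  exit!(3)
end

reaper = Process.detach(pid)  # a Thread that waits for (reaps) the child for us
puts "parent keeps working while #{pid} runs"

# Only needed if we care about the result after all:
status = reaper.value  # joins the thread and returns the child's Process::Status
puts "child #{status.pid} exited with #{status.exitstatus}"
```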

Killed Parent

I used kill to terminate my parent process:

[1] 4707 terminated ruby test.rb

Unfortunately, the parent process doesn't inform its children that it has been terminated, so they keep working as if nothing happened: they become so-called orphan processes and are adopted by PID 1 (init, or launchd on macOS). This can be a problem: what if a child is a long-running job that is supposed to compute and return a value? Its work will be wasted, and it consumes resources unnecessarily.
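Since nothing notifies an orphaned child, one common workaround is for the child to watch for it itself: when the parent dies, the OS re-parents the child (to PID 1, or to a subreaper on Linux), so Process.ppid changes. A minimal sketch with a hypothetical run_until_orphaned helper; the polling interval and the demo's short-lived "parent" and worker are assumptions for illustration.

```ruby
# Hypothetical helper: repeats one unit of work until the process that was our
# parent (parent_pid) is gone. After re-parenting, Process.ppid no longer matches.
def run_until_orphaned(parent_pid, interval: 0.1)
  while Process.ppid == parent_pid
    yield
    sleep interval
  end
end

# Demo: a short-lived "parent" forks a worker and then dies without telling it.
reader, writer = IO.pipe
parent = Process.fork do
  reader.close
  parent_pid = Process.pid
  Process.fork do
    run_until_orphaned(parent_pid) { }  # a real job would do its work here
    writer.puts "worker #{Process.pid}: parent #{parent_pid} is gone, stopping"
    writer.close
    exit!(0)
  end
  exit!(0)  # the parent exits; the worker becomes an orphan
end
writer.close
Process.wait(parent)
message = reader.gets  # blocks until the worker notices and reports
reader.close
puts message
```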

Groups

Each process belongs to a process group, which gives us better control over our child processes.

Process groups allow the system to keep track of which processes are working together and hence should be managed together via job control.

Michael K. Johnson, Erik W. Troan

Linux Application Development

We can find a description of the .setsid method in the Process module documentation: "Establishes this process as a new session and process group leader, with no controlling tty. Returns the session id. Not available on all platforms." After setsid, our process becomes the leader of a new session and a new process group: its process group ID (pgid) is set to its process ID (pid). To demonstrate this, I wrote a simple script:

```ruby
def compare_pids(context)
  puts "#{context} - PID: #{Process.pid}, process group ID: #{Process.getpgrp}, session ID: #{Process.getsid}"
end

def exists?(pid)
  system("ps #{pid} | grep ruby") ? true : false
end

compare_pids("From parent process")

read_stream, write_stream = IO.pipe
child = Process.fork do
  compare_pids("From #1 forked process")
  Process.setsid
  compare_pids("From #1 forked process, after setsid")

  pid_child_1 = Process.fork do
    compare_pids("From #1.1 forked process")
    sleep 100
  end

  pid_child_2 = Process.fork do
    compare_pids("From #1.2 forked process")
    sleep 100
  end

  write_stream.puts "#{pid_child_1}|#{pid_child_2}"
  write_stream.close
  Process.waitall
end
sleep 2

results = read_stream.gets
read_stream.close
pid_child_1, pid_child_2 = results[(0..-2)].split("|")

child_pgid = Process.getpgid(child)
puts "From parent process:"
puts "Process Group ID of child: #{child_pgid}, child pid: #{child}"
puts "Process Group ID of child exists?: #{exists?(child_pgid)}, child pid exists?: #{exists?(child)}"
puts "pid_child_1 exists?: #{exists?(pid_child_1)}, pid_child_2 exists?: #{exists?(pid_child_2)}"

Process.kill('HUP', -child_pgid)
puts "Killed child pgid: #{child_pgid}"
puts "Process Group ID of child exists?: #{exists?(child_pgid)}, child pid exists?: #{exists?(child)}"
puts "pid_child_1 exists?: #{exists?(pid_child_1)}, pid_child_2 exists?: #{exists?(pid_child_2)}"

Process.waitall
puts "After waitall:"
puts "Process Group ID of child exists?: #{exists?(child_pgid)}, child pid exists?: #{exists?(child)}"
```

```
From parent process - PID: 15496, process group ID: 15496, session ID: 9817
From #1 forked process - PID: 15509, process group ID: 15496, session ID: 9817
From #1 forked process, after setsid - PID: 15509, process group ID: 15509, session ID: 15509
From #1.1 forked process - PID: 15510, process group ID: 15509, session ID: 15509
From #1.2 forked process - PID: 15511, process group ID: 15509, session ID: 15509

From parent process:
Process Group ID of child: 15509, child pid: 15509
Process Group ID of child exists?: true, child pid exists?: true
pid_child_1 exists?: true, pid_child_2 exists?: true

Killed child pgid: 15509
Process Group ID of child exists?: true, child pid exists?: true
pid_child_1 exists?: false, pid_child_2 exists?: false

After waitall:
Process Group ID of child exists?: false, child pid exists?: false
```


Please take a look at the pgid in our forked process: its value is the same as the parent's PID until we create a new session. This knowledge is quite important: a PID can also be a process group ID, so with Process.kill (or Process.wait) we can target a whole process group by passing its pgid as a negative number. This makes it much easier to manage our processes. When we called Process.kill('HUP', -child_pgid) (a negative value targets a process group instead of a single process), we killed all processes in our group. The first child still shows up in ps right after the kill because it remains a zombie until our parent reaps it; after Process.waitall, it is gone.

If you want to learn more about groups and processes, definitely check out Linux Application Development by Michael K. Johnson and Erik W. Troan or at least this cool article, where you can find a bunch of useful information about processes, zombies, daemons, exit codes and signals.

Real-Life Example

listeners.rb:

```ruby
require "rack"

class ListenerCommand
  def initialize
    @allocations = {}
  end

  def add(port)
    return if allocated_ports.include?(port)
    pid = fork_process { Listener.new(port).run }
    allocations[port] = pid
  end

  def allocated_ports
    allocations.keys
  end

  def pids
    allocations.values
  end

  private

  attr_reader :allocations

  def fork_process
    Process.fork do
      yield
    end
  end
end

class Listener
  def initialize(port)
    @port = port
  end

  def run
    app = Proc.new do |env|
      request = Rack::Request.new(env)
      log(request)
      ["200", {"Content-Type" => "text/html"}, ["Ruby ♥."]]
    end

    # Rack 2.x API; in Rack 3 the server handlers moved to the rackup gem (Rackup::Handler::WEBrick)
    Rack::Handler::WEBrick.run(app, Port: port)
  end

  private
  attr_reader :port

  def log(request)
    output = "#{request.base_url} visited at #{Time.now} with params: #{request.params}\n"
    File.write("#{port}_log.txt", output, mode: "a")
  end
end

listeners = ListenerCommand.new

listeners.add(8000)
listeners.add(8010)
listeners.add(8020)

puts "Allocated ports: #{listeners.allocated_ports}"
puts "PIDs: #{listeners.pids}"

begin
  Process.waitall
rescue SignalException => e
  listeners.pids.each do |pid|
    Process.kill("HUP", pid)
  end
end
```


```
Allocated ports: [8000, 8010, 8020]
PIDs: [5927, 5928, 5929]
```

➜ cat 8000_log.txt

```
http://localhost:8000 visited at 2019-07-27 09:35:08 +0200 with params: {"blah"=>"hoo"}
http://localhost:8000 visited at 2019-07-27 09:35:54 +0200 with params: {"port"=>"8000"}
```


➜ cat 8010_log.txt

```
http://localhost:8010 visited at 2019-07-27 09:40:17 +0200 with params: {"ruby"=>"yea"}
http://localhost:8010 visited at 2019-07-27 09:40:33 +0200 with params: {"port"=>"8010"}
```


➜ cat 8020_log.txt

```
http://localhost:8020 visited at 2019-07-27 09:40:17 +0200 with params: {"foo"=>"bar"}
http://localhost:8020 visited at 2019-07-27 09:40:33 +0200 with params: {"port"=>"8020"}
```


The program above creates three new processes using the .add method defined in the ListenerCommand class. After forking a process, ListenerCommand adds the allocated port and the PID of the process to the allocations hash.

After that, the program waits for all child processes with Process.waitall. If all child processes are killed, the program finishes. If the user tries to kill the parent process, the program rescues the SignalException and sends HUP to the processes it created, so that they don't become orphans.

Of course, this is only the skeleton of an application: for instance, what if other exceptions occur? We should always consider all possible cases.
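One way to cover more cases than rescuing SignalException is an ensure block, which runs on any exception, including Interrupt (Ctrl-C) and other SignalExceptions. A minimal sketch with a hypothetical run_workers helper (not part of the listener code above); it uses the block-level rescue syntax of Ruby 2.5+. Note that no Ruby code can handle SIGKILL, so a parent killed with kill -9 will still leave orphans.

```ruby
# Hypothetical helper: forks `count` workers running `worker` (a lambda) and waits
# for them. Whatever ends the wait (normal exit, an exception, Ctrl-C), the ensure
# block stops and reaps the workers that are still alive.
def run_workers(count, worker)
  remaining = []
  begin
    count.times do
      remaining << Process.fork do
        worker.call
        exit!(0)
      end
    end
    remaining.delete(Process.wait) until remaining.empty?  # reap workers as they finish
  ensure
    remaining.each do |pid|
      Process.kill("TERM", pid)
    rescue Errno::ESRCH
      # the worker is already gone
    end
    remaining.each do |pid|
      Process.wait(pid)
    rescue Errno::ECHILD
      # the worker was already reaped
    end
  end
end

run_workers(3, -> { sleep 0.2 })
puts "all workers finished and were reaped"
```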


Is Multi-processing a Good Alternative to Threads?

Everyone should take some time to consider the question: does my project really need multiple processes? Multi-process applications can cause many more problems and are harder to implement. Make sure you know what you are doing and why you are doing it.

It's also good to know a bit about the operating system: how will the new processes be scheduled, and why are they scheduled in this particular way? If you want to give it a try, it's always worth checking whether the pros and cons of multiprocessing are in line with your business and technical requirements. Threads seem to be safer and have fewer potential issues, so if you really need parallelism, you should also consider JRuby or Rubinius, which don't have a GIL.
