Real-time Micro-conference accepted at the Linux Plumbers Conference 2019

From the announcement:

We are pleased to announce that the Real-Time Microconference has been
accepted into the 2019 Linux Plumbers Conference! The PREEMPT_RT patch
set (aka “The Real-Time Patch”) was created in 2004 in the effort to
make Linux into a hard real-time designed operating system. Over the
years much of the RT patch has made it into mainline Linux, which
includes: mutexes, lockdep, high-resolution timers, Ftrace,
RCU_PREEMPT, priority inheritance, threaded interrupts and much more.
There’s just a little left to get RT fully into mainline, and the light
at the end of the tunnel is finally in view. It is expected that the RT
patch will be in mainline within a year, which changes the topics of
discussion. Once it is in Linus’s tree, a whole new set of issues must
be handled. The focus on this year’s Plumbers events will include:

 – Real-time containers
 – Rework of softirqs (Requirement for the PREEMPT-RT merge)  [2]
 – An in-kernel view of latency [1]
 – Improvements in the locking determinism [4]
 – Advances in the RCU for reducing the per-cpu workload
 – The effects of BPF in the kernel latency
 – Core scheduling and Real-time schedulers [5]
 – Maintaining the RT stable trees [3]
 – New tools to test RT kernels [6]
 – New bootup self-tests
 – New types of failures that lockdep can detect after RT is merged

Come and join us in the discussion of making the LWN prediction of RT
coming into mainline “this year” a reality!
We hope to see you there[7]!
LPC[8] will be held in Lisbon, Portugal from Monday, September 9
through Wednesday, September 11.

[1]  Continuation of last year’s “New metrics for the PREEMPT RT”
discussion: 
http://bristot.me/wp-content/uploads/2018/11/new_metrics_for_the_rt.pdf
[2] 
https://lore.kernel.org/lkml/20190228171242.32144-1-frederic@kernel.org/
[3] 
https://wiki.linuxfoundation.org/realtime/preempt_rt_versions
[4] 
https://lwn.net/Articles/767953/ & continuation of last year’s
“SCHED_DEADLINE desiderata and slightly crazy ideas.”
[5] 
https://lwn.net/Articles/780703/
[6] Continuation of last year’s discussion: “How can we catch problems that can break the PREEMPT_RT preemption model?”
http://bristot.me/wp-content/uploads/2018/11/model_checker.pdf
[7] 
https://www.linuxplumbersconf.org/event/4/page/34-accepted-microconferences#realtime
[8] 
https://linuxplumbersconf.org

Kernel Recipes 2019 Talk!

I was invited to give a talk a the Kernel Recipes Conference in Paris. So, I decided to talk about formal modeling, but in an easy way. Here is the description of the talk. I hope people enjoy it.

Formal modeling made easy

Modeling parts of Linux has become a recurring topic. For instance, the memory model, the model for PREEMPT_RT synchronization, and so on.
But the term “formal model” causes panic for most of the developers. Mainly because of the complex notations and reasoning that involves formal languages. It seems to be a very theoretical thing, far from our day-by-day reality.

Believe me. Modeling can be more practical than you might guess!

This talk will discuss the challenges and benefits of modeling, based on the experience of developing the PREEMPT_RT model. It will present a methodology for modeling the Linux behavior as Finite-State Machines (automata), using terms that are very known by kernel developers: tracing events! With the particular focus on how to use models for the formal verification of Linux kernel, at runtime, with low overhead, and in many cases, without even modifying Linux kernel!

Real-time micro-conference proposal for Linux Plumbers 2019

Last year we had a very fun and productive real-time micro-conference at the Linux Plumbers Conference, so the idea is to repeat it this year!

Are you interested in participating? Do you have any suggestion of topic? Let us know!

This is the current proposal of micro-conference, we can still edit it and add your thoughts.

Proposal

Since 2004 a project has improved the Real-time and low-latency features for Linux. This project has become know as PREEMPT_RT, formally the real-time patch. Over the past decade, many parts of the PREEMPT RT became part of the official Linux code base. Examples of what came from PREEMPT_RT include: Real-time mutexes, high-resolution timers, lockdep, ftrace, RT scheduling, SCHED_DEADLINE, RCU_PREEMPT, generic interrupts, priority inheritance futexes, threaded interrupt handlers and more. The number of patches that needs integration has been reduced on the last years, and the pieces left are now mature enough to make its way into mainline Linux. This year could possibly be the year PREEMPT_RT is merged ™!

In the final lap of this race, the last patches are on the way to be merged, but there are still some pieces missing. When the merge occurs, the preempt-rt will start to follow a new pace: the Linus one. So, it is possible to raise the following discussions:

  1.  The status of the merge, and how can we resolve the last issues that block the merge;
  2. How can we improve the testing of the -rt, to follow the problems raised as Linus tree advances;
  3. What’s next?

Possible topics:

  • Status of the PREEMPT_RT Merge
  • Merge – what is missing and who can help?
  • How do we teach the rest of the kernel developers how not to break PREEMPT_RT?
  • Stable maintainers tools discussion & improvements.
  • Interrupt threads are RT and are not protected by the RT Throttling. How can we prevent interrupt thread starvation from a rogue RT task?
  • Improvements on full CPU isolation
  • Newer methods like proxy execution, hierarchical scheduler?
  • What tools can we add into tools/ that other kernel developers can use to test and learn about PREEMMPT_RT?
  • What tests can we add into tools/testing/selftests?
  • New tools for timing regression test, e.g. locking, overheads…
  • What kernel boot self-tests can be added?
  • Discuss various types of failures that can happen with PREEMPT_RT that normally would not happen in the vanilla kernel, e.g, with lockdep, preemption model.

I will suggest the continuation of the discussions of the topics I presented last year, based in the results of the work I did this year (spoiler: I am doing model verification in the kernel, and it is very efficient!).

Paul already told that he could talk about ongoing work to make RCU’s forward progress be more robust in overloaded cloud deployments. There will be some connection to real-time response involving rcu_poll and the infamous RCU kthread priority!

The continuation of the discussion of topics from last year’s micro-conference, including the development done during this (almost) year, are also welcome!

Early context tracking patch set: fixing perf & ftrace losing events

Some time ago, while using perf to check the automaton model, I noticed that perf was losing events. The same was reproducible with ftrace.

Steve pointed to a problem in the identification of the context execution used by the recursion control.

Currently, recursion control uses the preempt_counter to identify the current context. The NMI/HARD/SOFT IRQ counters are set in the preempt_counter in the irq_enter/exit functions.

In a trace, they are set like this:

 0)   ==========> |
 0)               |  do_IRQ() {		/* First C function */
 0)               |    irq_enter() {
 0)               |      		/* set the IRQ context. */
 0)   1.081 us    |    }
 0)               |    handle_irq() {
 0)               |     		/* IRQ handling code */
 0) + 10.290 us   |    }
 0)               |    irq_exit() {
 0)               |      		/* unset the IRQ context. */
 0)   6.657 us    |    }
 0) + 18.995 us   |  }
 0)   <========== |

As one can see, functions (and events) that take place before the set and after unset the preempt_counter are identified in the wrong context, causing the miss interpretation that recursion is taking place. When this happens, events are dropped.

To resolve this problem, the set/unset of the IRQ/NMI context needs to be done before the execution of the first C execution, and after its return. By doing so, and using this method to identify the context in the trace recursion protection, no more events are lost.

A possible solution is to use a per-cpu variable set and unset in the entry point of NMI/IRQs, before calling the C handler.

This possible solution is presented in this patch series as a proof of concept, for x86_64. Let’s see what kind of comments we will receive!

New tracepoints for External IRQ and NMIs

The irq_vectors class of tracepoints adds two events for each interrupt: One at the entry and one at the return of each interrupt handler. They are very useful in the analysis of latency, as they point to the occurrence of an interrupt, and how long did it take to run.

However, there are two cases of interrupt which these tracepoints were not present: NMIs and regular device interrupt. So I suggested two new tracepoints, for these cases.

The output of these new tracepoints looks like this:

 idle-0     [000] d.h.   102.890935: external_interrupt_entry: vector=35
 idle-0     [000] d.h.   102.890960: external_interrupt_exit: vector=35
 idle-0     [000] d.Z.   179.594315: nmi_entry: vector=2
 idle-0     [000] d.Z.   179.594396: nmi_exit: vector=2

The discussion of the patch set can be found here:

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1969152.html

V5: x86/jump_label: Bound IPIs sent when updating a static key

Today I’ve sent the v5 of the x86/jump_label: Bound IPIs sent when updating a static key patch set. The list of changes from the previous version is:

  • Renamed jump_label_can_update_check() to jump_label_can_update() (Borislav Petkov)
  • Switch jump_label_can_update() return values to bool (Borislav Petkov)
  • Accept the fact that some lines will be > 80 characters (Borislav Petkov)
  • Local variables are now in the inverted Christmas three order (Borislav Petkov)
  • Removed superfluous helpers. (Borislav Petkov)
  • Renamed text_to_poke to text_patch_loc, and handler to detour (Borislav Petkov & Steven Rostedt)
  • Re-applied the suggestion of not using BUG() from steven (Masami Hiramatsu)
  • arch_jump_label_transform_queue() now returns 0 on success and
    -ENOSPC/EINVAL on errors (Masami Hiramatsu)

I think we are not far from an acceptance. The code introduces a lot of optimizations suggested by Masami, Steven, and Borislav! But it might have one or two rounds of small fixes/polishment.

Here is the link for the submission:

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1968209.html

Another bug found with the model

While running the model against the 4.19-rt kernel, the following non-possible event chain was faced:

+ T   32019    2564.541336    [000]              preempt_disable ->    q8250 
+ T   32019    2564.541336    [000]             local_irq_enable ->   q13544 
+ T   32019    2564.541337    [000]               preempt_enable ->   q19599 
+ T   32019    2564.541337    [000]                   write_lock ->   q21325 
+ T   32019    2564.541337    [000]                   mutex_lock ->   q16217 
+ T   32019    2564.541338    [000]               mutex_acquired ->   q21325 
+ T   32019    2564.541338    [000]    sched_set_state_sleepable ->    q8561 
+ T   32019    2564.541338    [000]            local_irq_disable ->    q3392 
+ T   32019    2564.541338    [000]              preempt_disable ->   q20336 
+ T   32019    2564.541339    [000]     sched_set_state_runnable ->    q9970 
+ T   32019    2564.541339    [000]             local_irq_enable ->   q15342 
+ T   32019    2564.541339    [000]               preempt_enable ->   q21325 
+ T   32019    2564.541339    [000]               write_acquired ->   q19599 
+ T   32019    2564.541340    [000]            local_irq_disable ->   q14441 
+ T   32019    2564.541340    [000]              preempt_disable ->    q8250 
+ T   32019    2564.541342    [000]             local_irq_enable ->   q13544 
+ I   32019    2564.541344    [000]         hw_local_irq_disable ->   q18001 
+ I   32019    2564.541345    [000]                   mutex_lock
        2564.541345 event mutex_lock is not expected in state q18001
===== reseting model =====

In summary, a thread was running, then IRQs are disabled by the hardware (due to an interrupt), and a mutex lock was tried. This operation is not permitted.

The raw trace of these events shows the following events:

bash 32019 [000]  2564.541337:                   lock:rw_sem_request: write 0xffff9e992418fa90F unlink_file_vma+0x2d
bash 32019 [000]  2564.541337:                 lock:rt_mutex_request: 0xffff9e992418fa98F __down_write_common+0x40
bash 32019 [000]  2564.541338:                lock:rt_mutex_acquired: 0xffff9e992418fa98F __down_write_common+0x40
bash 32019 [000]  2564.541338:                 sched:sched_set_state: sleepable
bash 32019 [000]  2564.541338:                preemptirq:irq_disable: caller=__down_write_common+0x8b parent=(nil)F
bash 32019 [000]  2564.541338:            preemptirq:preempt_disable: caller=__down_write_common+0x95 parent=__down_write_common+0x95
bash 32019 [000]  2564.541339:                 sched:sched_set_state: runnable
bash 32019 [000]  2564.541339:                 preemptirq:irq_enable: caller=__down_write_common+0x337 parent=(nil)F
bash 32019 [000]  2564.541339:             preemptirq:preempt_enable: caller=__down_write_common+0x34b parent=__down_write_common+0x34b
bash 32019 [000]  2564.541339:                  lock:rw_sem_acquired: write 0xffff9e992418fa90F unlink_file_vma+0x2d
bash 32019 [000]  2564.541340:                   lock:rw_sem_release: write 0xffff9e992418fa90F free_pgtables+0xbe
bash 32019 [000]  2564.541340:                preemptirq:irq_disable: caller=__up_write+0x2c parent=(nil)F
bash 32019 [000]  2564.541340:            preemptirq:preempt_disable: caller=__up_write+0x36 parent=__up_write+0x36
bash 32019 [000]  2564.541342:                 preemptirq:irq_enable: caller=__up_write_unlock+0x75 parent=(nil)F
bash 32019 [000]  2564.541344:                preemptirq:irq_disable: caller=trace_hardirqs_off_thunk+0x1a parent=interrupt_entry+0xda
bash 32019 [000]  2564.541344:         irq_vectors:local_timer_entry: vector=236
bash 32019 [000]  2564.541345:                 lock:rt_mutex_request: pendingb_lock+0x0 queue_work_on+0x41

And voilà! a real-time mutex is being taken in the timer IRQ.

This violates two specifications, that specifies that it is not possible to take mutexes in with the interrupts disabled and that IRQ handlers do not take mutexes, as in Figure bellow:


Mutex cannot be taken with interrupts disabled.

Operations that are disabled in the interrupt handlers.


It is possible then to observe the code path that takes the rt_spin_lock (a mutex, in the practice):

 0)   ==========> |
 0)               |  smp_apic_timer_interrupt() {
 0)               |    __irq_enter_early_tp() {
 0)               |      rcu_irq_enter() {
 0)   0.690 us    |        rcu_nmi_enter();
 0)   2.032 us    |      }
 0)   3.361 us    |    }
 0)               |    irq_enter() {
 0)               |      rcu_irq_enter() {
 0)   0.635 us    |        rcu_nmi_enter();
 0)   1.792 us    |      }
 0)               |      tick_irq_enter() {
 0)   0.792 us    |        tick_check_oneshot_broadcast_this_cpu();
 0)   0.897 us    |        ktime_get();
 0)               |        update_ts_time_stats.constprop.32() {
 0)   0.594 us    |          nr_iowait_cpu();
 0)   1.926 us    |        }
 0)               |        tick_do_update_jiffies64.part.17() {
 0)   0.668 us    |          preempt_count_add();
 0)   0.594 us    |          preempt_count_add();
 0)               |          do_timer() {
 0)   0.676 us    |            calc_global_load();
 0)   1.808 us    |          }
 0)   0.638 us    |          preempt_count_sub();
 0)   0.551 us    |          preempt_count_sub();
 0)               |          update_wall_time() {
 0)               |            timekeeping_advance() {
 0)   0.567 us    |              preempt_count_add();
 0)   0.554 us    |              ntp_tick_length();
 0)   0.552 us    |              ntp_tick_length();
 0)   0.560 us    |              preempt_count_add();
 0)               |              timekeeping_update() {
 0)   0.552 us    |                ntp_get_next_leap();
 0)   0.769 us    |                update_vsyscall();
 0)               |                raw_notifier_call_chain() {
 0)   0.554 us    |                  notifier_call_chain();
 0)   1.621 us    |                }
 0)   0.565 us    |                update_fast_timekeeper();
 0)   0.576 us    |                update_fast_timekeeper();
 0)   7.533 us    |              }
 0)   0.614 us    |              preempt_count_sub();
 0)   0.608 us    |              preempt_count_sub();
 0) + 16.109 us   |            }
 0) + 17.268 us   |          }
 0) + 25.610 us   |        }
 0) + 32.431 us   |      }
 0)               |      irqtime_account_irq() {
 0)   0.547 us    |        in_serving_softirq();
 0)   2.045 us    |      }
 0)   0.561 us    |      preempt_count_add();
 0) + 39.845 us   |    }
 0)               |    hrtimer_interrupt() {
 0)   0.626 us    |      preempt_count_add();
 0)   0.786 us    |      ktime_get_update_offsets_now();
 0)               |      __hrtimer_run_queues.constprop.28() {
 0)   1.021 us    |        __remove_hrtimer();
 0)   0.574 us    |        preempt_count_sub();
 0)               |        watchdog_timer_fn() {
 0)               |          completion_done() {
 0)   0.541 us    |            preempt_count_add();
 0)   0.554 us    |            preempt_count_sub();
 0)   2.946 us    |          }
 0)               |          queue_work_on() {
 0)   0.597 us    |            migrate_disable();
 0)               |            rt_spin_lock() {
 0)   0.554 us    |              migrate_disable();
 0)               |              /* rt_mutex_request: pendingb_lock+0x0/0x40 return_to_handler+0x0/0x36 */
 0)               |              /* rt_mutex_acquired: pendingb_lock+0x0/0x40 return_to_handler+0x0/0x36 */
 0)   3.350 us    |            }
 0)               |            __queue_work() {
 0)   0.542 us    |              __rcu_read_lock();
 0)   0.593 us    |              get_work_pool();
 0)               |              rt_spin_lock() {
 0)   0.562 us    |                migrate_disable();
 0)               |                /* rt_mutex_request: cpu_worker_pools+0x0/0x680 return_to_handler+0x0/0x36 */
 0)               |                /* rt_mutex_acquired: cpu_worker_pools+0x0/0x680 return_to_handler+0x0/0x36 */
 0)   2.446 us    |              }
 0)               |              insert_work() {
 0)   0.563 us    |                get_pwq.isra.32();
 0)               |                wake_up_worker() {
 0)   0.547 us    |                  preempt_count_add();
 0)               |                  wake_up_process() {
 0)               |                    try_to_wake_up() {
 0)   0.560 us    |                      preempt_count_add();
 0)   0.558 us    |                      preempt_count_add();
 0)               |                      update_rq_clock() {
 0)   0.590 us    |                        kvm_steal_clock();
 0)   1.820 us    |                      }
 0)               |                      activate_task() {
 0)               |                        enqueue_task_fair() {
 0)               |                          enqueue_entity() {
 0)   0.580 us    |                            update_curr();
 0)   0.626 us    |                            update_cfs_group();
 0)   0.587 us    |                            __enqueue_entity();
 0)   4.352 us    |                          }
 0)   0.553 us    |                          hrtick_update();
 0)   6.598 us    |                        }
 0)   7.895 us    |                      }
 0)               |                      check_preempt_curr() {
 0)   0.629 us    |                        resched_curr();
 0)   2.035 us    |                      }
 0)   0.555 us    |                      preempt_count_sub();
 0)   0.548 us    |                      preempt_count_sub();
 0) + 19.316 us   |                    }
 0) + 20.400 us   |                  }
 0)   0.567 us    |                  preempt_count_sub();
 0) + 23.781 us   |                }
 0) + 26.107 us   |              }
 0)               |              rt_spin_unlock() {
 0)               |                /* rt_mutex_release: cpu_worker_pools+0x0/0x680 return_to_handler+0x0/0x36 */
 0)   0.580 us    |                migrate_enable();
 0)   2.343 us    |              }
 0)   0.576 us    |              __rcu_read_unlock();
 0) + 36.708 us   |            }
 0)               |            queue_delayed_work_on.part.49() {
 0)               |              rt_spin_unlock() {
 0)               |                /* rt_mutex_release: pendingb_lock+0x0/0x40 return_to_handler+0x0/0x36 */
 0)   0.547 us    |                migrate_enable();
 0)   2.034 us    |              }
 0)   0.968 us    |              migrate_enable();
 0)   4.576 us    |            }
 0) + 48.171 us   |          }
 0)   0.704 us    |          ktime_get();
 0)   0.552 us    |          hrtimer_forward();
 0)   0.622 us    |          __touch_watchdog();
 0) + 57.152 us   |        }
 0)   0.543 us    |        preempt_count_add();
 0)   0.771 us    |        enqueue_hrtimer();
 0) + 63.902 us   |      }
 0)               |      __hrtimer_get_next_event() {
 0)   1.038 us    |        __hrtimer_next_event_base();
 0)   0.673 us    |        __hrtimer_next_event_base();
 0)   3.437 us    |      }
 0)   0.558 us    |      preempt_count_sub();
 0)               |      tick_program_event() {
 0)               |        clockevents_program_event() {
 0)   0.658 us    |          ktime_get();
 0)   8.836 us    |          lapic_next_deadline();
 0) + 11.393 us   |        }
 0) + 12.489 us   |      }
 0) + 86.907 us   |    }
 0)               |    irq_exit() {
 0)   0.722 us    |      irqtime_account_irq();
 0)   0.602 us    |      preempt_count_sub();
 0)   0.572 us    |      idle_cpu();
 0)               |      rcu_irq_exit() {
 0)   0.588 us    |        rcu_nmi_exit();
 0)   1.648 us    |      }
 0)   6.500 us    |    }
 0)               |    __irq_exit_late_tp() {
 0)               |      rcu_irq_exit() {
 0)   0.572 us    |        rcu_nmi_exit();
 0)   1.624 us    |      }
 0)   2.690 us    |    }
 0) ! 146.366 us  |  }
 0)   <========== |

In the single core, the following chain of events can take place:

smp_apic_timer_interrupt(){
	hrtimer_interrupt() {
		__hrtimer_run_queues() {
			watchdog_timer_fn() {
				stop_one_cpu_nowait() {
					#ifdef !CONFIG_SMP
						schedule_work() {
							queue_work() {
								queue_work_on() {
								/* phew, long, ah!?.... */	
									local_lock_irqsave() {
										__local_lock_irqsave() {
											__local_lock_irq() {
												spin_lock_irqsave() {
													rt_spin_lock()...
														/* from here on you already know...*/
														/* a lot of } */

And this can cause a schedling while in atomic. Tracking down, the commit that introduced the stop_one_cpu_nowait() to the path was:

commit 9cf57731b63e37ed995b46690adc604891a9a28f
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu Jun 7 10:52:03 2018 +0200

    watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work
    
    Oleg suggested to replace the "watchdog/%u" threads with
    cpu_stop_work. That removes one thread per CPU while at the same time
    fixes softlockup vs SCHED_DEADLINE.
    
    But more importantly, it does away with the single
    smpboot_update_cpumask_percpu_thread() user, which allows
    cleanups/shrinkage of the smpboot interface.
    
    Suggested-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
</mingo@kernel.org></tglx@linutronix.de></peterz@infradead.org></torvalds@linux-foundation.org></peterz@infradead.org></oleg@redhat.com></peterz@infradead.org>

later modified by:

commit be45bf5395e0886a93fc816bbe41a008ec2e42e2
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Fri Jul 13 12:42:08 2018 +0200

    watchdog/softlockup: Fix cpu_stop_queue_work() double-queue bug
    
    When scheduling is delayed for longer than the softlockup interrupt
    period it is possible to double-queue the cpu_stop_work, causing list
    corruption.
    
    Cure this by adding a completion to track the cpu_stop_work's
    progress.
    
    Reported-by: kernel test robot <lkp@intel.com>
    Tested-by: Rong Chen <rong.a.chen@intel.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Fixes: 9cf57731b63e ("watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work")
    Link: http://lkml.kernel.org/r/20180713104208.GW2494@hirez.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
</mingo@kernel.org></tglx@linutronix.de></peterz@infradead.org></torvalds@linux-foundation.org></peterz@infradead.org></rong.a.chen@intel.com></lkp@intel.com></peterz@infradead.org>

The BUG report was made, and you can find it here:
https://www.spinics.net/lists/linux-rt-users/msg20376.html

ISORC 2019 Material

Here is the companion material for the paper: Untangling the Intricacies of Thread Synchronization in the PREEMPT_RT Linux Kernel by Daniel Bristot de Oliveira, Rômulo Silva de Oliveira, and Tommaso Cucinotta, to be presented at the ISORC 2019.

NOTE: The latest version of the model and patches can be found here: http://bristot.me/linux-task-model/

The model was developed using the Supremica tool. You can find more information about Supremica here.

The source code of the model can be found here. It can be opened with Supremica. From Supremica it is possible to visualize, edit, analyze, simulate and export the model.

In the Linux part, you can find the patch with perf task_model and tracepoints here. It applies in the version v4.14.15-rt13 of the development tree of the PREEMPT RT.

It is possible to use a virtual machine to run the experiments. Here is the definition of a QEMU/KVM virtual machine and the kernel config file that works with this VM

ERRATA: Prof. Martin Fabian, found that despite the minimized model was “non-blocking” the non-minimized was not. The problem was caused by two missing events (sched_switch_in, sched_switch_in_o) in the state any_thread_running in the specification s19. This difference causes a change in the number of states from 13906 to 13786, and in the transitions from 31708 to 31221. This problem was already resolved, in a version earlier than the submission of the paper. But I (Daniel) forgot to update Table II and the reference model. This error does not change any of the results. The link above points to the correct model. The broken model is here.