Code-Reading: The Racing Of Two Agents
Starting from a bug caused by a race condition between two polkit agents, this post explains
the handling of the startup components and autostart apps within gnome-session.
Other than the X server, gnome-session is the daemon managing
all the interesting stuff within a session. (In SP2, the comm is actually called
gnome-session-binary, which is exec-ed from a shell script called
gnome-session. This indirection is meant to work around an environment-passing bug, a very interesting problem worth a separate post.)
For our purposes, the startup sequence is of most interest.
There is a planned Code-Reading post covering the end-session part as well. Other
components talk back to
gnome-session through the exported DBus service
org.gnome.SessionManager. The term
SessionManager is used interchangeably with gnome-session in this post.
gnome-session does have some documentation online. However, it all seems to
be outdated (compared with what I learned from the code). Nevertheless, it still
provides some insight, and you can find the longer version
and the newer, shorter version here.
The concept of a saved session is also important for our discussion. Basically, the session tries to save all open apps and their states upon logout and to restore them at the next login. There seems to be a protocol for this saving behavior (part of XSMP); however, it doesn’t seem to be well supported among apps, not even within GNOME. Much earlier there was a bug about saved sessions. That earlier investigation convinced me that this feature is deprecated and no longer maintained upstream.
Without delving into the details of
XSMP, it’s sufficient to show the saved
session’s desktop files here:
TODO normal app desktop, shell desktop files
Normally the agent built into the Shell is what users use. As
Shell is a required app (and thus an autostart app) for the session,
it’s handled slightly differently from other clients, i.e. gnome-session will
try to restart required apps if they die and end the current session if a required
component fails too often.
Q: Is this really the specialty for required app? Further, I’m puzzled by the line
- Required apps are defined in the session key file, which is located by searching predefined locations.
- On failure, required apps are restarted automatically. Once a limit is reached, the whole session ends with failure.
- The name for GNOME Shell as a component is org.gnome.Shell. However, it’s worth citing the desktop file here:
[Desktop Entry]
Type=Application
_Name=GNOME Shell
_Comment=Window management and application launching
Exec=@bindir@/gnome-shell
# ...
NoDisplay=true
X-GNOME-Autostart-Phase=DisplayServer
X-GNOME-Provides=panel;windowmanager;
X-GNOME-Autostart-Notify=true
X-GNOME-AutoRestart=false
A separate, independent polkit agent that can work outside the GNOME ecosystem. The
polkit-gnome-authentication-agent-1, installed under
The current distribution still ships an autostart desktop file. The earliest changelog entry about this desktop file is “Add as a source a .desktop file to start polkit agent: it was removed from tarball, but we still need it.” Here seems to be the upstream commit that removed this file. Much of the problem is caused by this file, and its content is worth citing in full here:
# polkit-gnome-authentication-agent-1.desktop.in
[Desktop Entry]
Name=PolicyKit Authentication Agent
# ...
Comment=PolicyKit Authentication Agent
# ...
Exec=@LIBDIR@/polkit-gnome-authentication-agent-1
Terminal=false
Type=Application
Categories=
NoDisplay=true
NotShowIn=KDE;
AutostartCondition=GNOME3 unless-session gnome
AutostartCondition doesn’t seem to be well documented, but it means that
this agent should be automatically started on login if the session is a
GNOME3 session but not one named gnome.
I was going to say
gnome-fallback, which was discontinued around the time
GNOME 3.10 was released, and thus coincides with the development and release of
RHEL 7 and the like. However, a little searching surprised me:
gnome-fallback stayed alive as gnome-flashback for quite a while (it’s official)! The history of
flashback is convoluted, and I found this post more or less matches what I know. Now I’m starting to wonder whether
flashback will make a comeback! ;P
Before describing each sequence in detail, let us explain the symptom as seen in the
bug. The default theme of the
shell-agent authentication dialog is black,
while the dialog from
polkit-gnome is white. This makes the issue
conspicuous and easily caught by openQA: the
usual black dialog is not found; the white dialog is shown instead.
The normal sequence
SessionManager has internal
phases, which follow a
nearly linear sequence of state transitions.
A notable exception occurs while ending a session (logout/reboot/shutdown):
GSM_MANAGER_PHASE_QUERY_END_SESSION can transition back to
GSM_MANAGER_PHASE_RUNNING in case of failures. This is covered in another post.
As shown above,
Shell starts at the DisplayServer phase
(X-GNOME-Autostart-Phase=DisplayServer), much earlier than the
Application phase. polkit-gnome is started at
this default phase as it has no explicit phase setting. So usually, Shell
starts and registers itself as the agent before
polkit-gnome can do anything.
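To check which phase each autostart desktop file declares, a small shell helper can read the X-GNOME-Autostart-Phase key. This is a minimal sketch: the function name list_autostart_phases is made up for this post, the directories to scan are whatever you pass in, and files without the key are reported as Application on the assumption, taken from this post, that this is the default phase.

```shell
#!/bin/sh
# Print "<phase><TAB><file>" for every .desktop file in the given directories.
# Files without an explicit X-GNOME-Autostart-Phase key are reported as
# "Application" (assumed default phase, as described in this post).
list_autostart_phases() {
    for dir in "$@"; do
        for file in "$dir"/*.desktop; do
            # Skip the unexpanded glob when a directory has no .desktop files.
            [ -f "$file" ] || continue
            phase=$(sed -n 's/^X-GNOME-Autostart-Phase=//p' "$file" | head -n 1)
            printf '%s\t%s\n' "${phase:-Application}" "$file"
        done
    done
}

# Example invocation (the directory is the one used later in this post):
# list_autostart_phases /etc/xdg/autostart | sort
```

Sorting the output groups the files by phase, which makes it easy to spot entries that only start in the default phase.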
The autostart condition
GNOME3 unless-session gnome will be satisfied for a session named
gnome-classic and thus
polkit-gnome will start. However, since only
one polkit agent is allowed,
polkit-gnome will fail to register and exit.
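The meaning of the condition can be sketched in shell. This only illustrates the semantics as explained in this post and is not gnome-session’s actual implementation; the function name should_autostart is made up, and only the "GNOME3 unless-session NAME" form is handled.

```shell
#!/bin/sh
# Decide whether an app should be autostarted, given an AutostartCondition
# value and the current session name.
# Returns 0 (start), 1 (skip), or 2 (condition form not handled by this sketch).
should_autostart() {
    condition=$1
    session=$2
    # Split the condition into its whitespace-separated words.
    set -- $condition
    if [ "$#" -ne 3 ] || [ "$1" != "GNOME3" ] || [ "$2" != "unless-session" ]; then
        echo "unsupported condition: $condition" >&2
        return 2
    fi
    # "unless-session NAME": start unless the session is called NAME.
    [ "$3" != "$session" ]
}

# should_autostart "GNOME3 unless-session gnome" gnome-classic   # start
# should_autostart "GNOME3 unless-session gnome" gnome           # skip
```

With the session name gnome-classic the check succeeds, which matches the behavior seen in the bug: the agent is started even though Shell already provides one.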
What if there is a saved session
In a saved session situation,
gnome-session will detect the relevant
settings and saved session files under the user’s home directory. It will load saved
apps from the saved session files first, before the normal loading takes place.
Later desktop files do not override earlier ones:
gnome-session uses the
provides field to decide whether a later-loaded desktop file takes effect.
For org.gnome.Shell, as a saved session file, the desktop entry file does NOT
have the autostart phase set to
DisplayServer; in fact, it is not set at all. So
it’s started at the
Application phase, together with polkit-gnome.
polkit-gnome is a much simpler program and might register itself as
shell is still running to get to its
In observation, with a
polkit-gnome almost always wins as the
What if Shell died
If Shell dies, it gets restarted by gnome-session.
This is not to be confused with the built-in restart that
Alt-F2, r offers, in which the process itself remains the same. The
timing can be unfortunate: the restart can coincide with the start of
polkit-gnome, and Shell can be outrun by it in registering as the polkit agent.
In this section, I will show important questions I asked during debugging and some useful tips to prove points mentioned in the Sequence.
polkit-gnome starts under
To prove this point, two approaches can be taken:
- Use Linux Audit.
- Manual Injection. Change the desktop entry file or the executable itself. In our case, capturing stdout/stderr is sufficient.
It’s usually better to use
Audit first and then employ
Injection to get more detail.
This seems to be the only non-intrusive approach. At the cost of complexity,
Audit is very powerful and capable of much more.
At the time of writing, I’ve read some news about a DTrace-like facility for the kernel. It looks very interesting and promising, though I’m not knowledgeable enough, yet, to comment on its usefulness for debugging problems like the one in this post.
Make sure all prerequisites are met (yelling!). I have been working on another post with more details. Planning…
## Precaution: all commands are executed with root privileges.
# Set up the rule; you can optionally add more filter fields, uid might be a good addition.
auditctl -a exit,always -F arch=b64 -S all -F path=/usr/lib/polkit-gnome-authentication-agent-1
# Verify the above is the only rule; not necessary, but helpful when inspecting the log.
auditctl -l
# Log out and log back in. Then, in a new terminal, search the log. The -ts
# "since" time is the time *just* before you logged in; not necessary, but
# helpful in reducing output size.
ausearch -i -ts 16:10
You should see something like:
----
type=UNKNOWN msg=audit(10/27/16 16:03:10.518:106) : proctitle="/usr/lib/polkit-gnome-authentication-agent-1"
type=PATH msg=audit(10/27/16 16:03:10.518:106) : item=1 name=/lib64/ld-linux-x86-64.so.2 inode=38532 dev=00:26 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL
type=PATH msg=audit(10/27/16 16:03:10.518:106) : item=0 name=/usr/lib/polkit-gnome-authentication-agent-1 inode=160885 dev=00:26 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL
type=CWD msg=audit(10/27/16 16:03:10.518:106) : cwd=/home/vagrant
type=EXECVE msg=audit(10/27/16 16:03:10.518:106) : argc=1 a0=/usr/lib/polkit-gnome-authentication-agent-1
type=SYSCALL msg=audit(10/27/16 16:03:10.518:106) : arch=x86_64 syscall=execve success=yes exit=0 a0=0x11be100 a1=0x11be0a0 a2=0x11be150 a3=0xfc2c9fc5 items=2 ppid=1726 pid=1968 auid=vagrant uid=vagrant gid=users euid=vagrant suid=vagrant fsuid=vagrant egid=users sgid=users fsgid=users tty=(none) ses=91 comm=polkit-gnome-au exe=/usr/lib/polkit-gnome-authentication-agent-1 key=(null)
From the log we can deduce:
ps -up 1726 shows that it’s
/usr/lib/polkit-gnome-authentication-agent-1 as the sole argument. In other
Linux Auditing is clean and quick, great for verifying unintended execution.
However, it doesn’t capture possible output from processes. For this we would
have to inject extra code into the start sequence. Of course, we could replace
polkit-gnome-authentication-agent-1 with a custom script that calls the real
executable with I/O redirection to some custom files. However, I’d like to
introduce systemd-cat to take full advantage of journald
and avoid the hassle of managing log files.
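For comparison, the custom-script approach mentioned above could look roughly like the following. This is a sketch under assumptions: the REAL and LOG locations and the run_logged helper are made up for illustration, and the original binary is assumed to have been moved aside to REAL before the wrapper is installed in its place.

```shell
#!/bin/sh
# Hypothetical wrapper installed in place of the agent executable.
# REAL and LOG are made-up locations for this sketch.
REAL=${REAL:-/usr/lib/polkit-gnome-authentication-agent-1.real}
LOG=${LOG:-/tmp/polkit-gnome-agent.log}

run_logged() {
    # Record when and under which parent the wrapper was started, then run
    # the given command with stdout and stderr appended to the log file.
    echo "$(date '+%F %T') started: pid=$$ ppid=$PPID cmd=$*" >> "$LOG"
    "$@" >> "$LOG" 2>&1
}

# In the installed wrapper, the last line would be:
# run_logged "$REAL" "$@"
```

The log file then has to be rotated and cleaned up by hand, which is exactly the kind of hassle the journal-based approach avoids.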
systemd-cat can be really useful to debug
gnome-shell by piping JS logs into
journald. With advanced options offered by
journalctl, this way is much sweeter than weeding through log files or terminal output, particularly so for
First change the Exec line to something like the following. (The
Exec line has specific format requirements; when in doubt, consult the documentation.)
# /etc/xdg/autostart/polkit-gnome-authentication-agent-1.desktop
Exec=systemd-cat -t PK-G /usr/lib/polkit-gnome-authentication-agent-1
Then use journalctl -t PK-G to pick out the log from
polkit-gnome specifically.
As with ausearch, a since-time can be passed with -S.
journalctl features far richer time formats, as documented in
man systemd.time. You can use a relative time setting like
journalctl -S -1min.
ausearch doesn’t seem to support advanced time formats. However, we can always use
date to achieve a similar result. For example, to get audit logs since 300 sec ago, we can use
ausearch -i -ts $(date +'%H:%M:%S' -d '-300sec').
-- Logs begin at Wed 2016-10-26 18:35:03 CST, end at Thu 2016-10-27 16:36:48 CST. --
Oct 27 16:36:44 linux-rblp.suse PK-G: ** (polkit-gnome-authentication-agent-1:2704): WARNING **: Unable to register authentication agent: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: An authentication age
Oct 27 16:36:44 linux-rblp.suse PK-G: Cannot register authentication agent: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: An authentication agent already exists for the given subject
The culprit is the condition line
GNOME3 unless-session gnome. The current
session name referred to in this condition can be looked up by introspecting the org.gnome.SessionManager DBus service.
Since the naming is not the same for this
session-name, there was some
confusion at the beginning, as some believed it referred to a
It turns out this setting is the default if no session type is specified on the
gnome-session command line.
gdbus introspect --session --dest org.gnome.SessionManager \
    --object-path /org/gnome/SessionManager | grep -i session
## Output> readonly s SessionName = 'gnome-classic'
# alternatively you can get to the desired property value directly, though much
# more complicated and not recommended. (And you're going to introspect first
# anyway ;P)
gdbus call --session --dest org.gnome.SessionManager \
    --object-path /org/gnome/SessionManager \
    --method org.freedesktop.DBus.Properties.Get \
    org.gnome.SessionManager SessionName
## Output> (<'gnome-classic'>,)
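If the session name is needed in a script, the wrapped value printed by gdbus call can be reduced to the bare name. This is a small sketch that assumes the output has exactly the shape shown above; parse_session_name is a made-up helper name.

```shell
#!/bin/sh
# Strip the wrapping from the gdbus call output shown above, so that
# "(<'gnome-classic'>,)" becomes "gnome-classic". Lines that do not have
# this exact shape produce no output.
parse_session_name() {
    sed -n "s/^(<'\(.*\)'>,)\$/\1/p"
}

# Example invocation:
# gdbus call --session --dest org.gnome.SessionManager \
#     --object-path /org/gnome/SessionManager \
#     --method org.freedesktop.DBus.Properties.Get \
#     org.gnome.SessionManager SessionName | parse_session_name
```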