Code-Reading: The Racing Of Two Agents
From a bug due to a race condition between two polkit agents, this post explain
the handling of the startup components and autostart apps within gnome-session
.
Components
GNOME Session
Other than the X server, gnome-session
In SP2, the comm
is actually called gnome-session-binary
which is exec
-ed from a shell script called gnome-session
. This indirection is meant to solve a environment passing bug. A very interesting problem worth a separate post. is the daemon managing
all the interesting stuff within a session. For our purpose, the startup
sequence is of most interest to us.
There is a Code-Reading(PLANNED) post covering the end session part as well. Other
components talk back to gnome-session
through the exported DBus service
org.gnome.SessionManager
. The term SessionManager
is used interchangable
with gnome-session
.
gnome-session
does have some documentations oneline. However, they all seem to
be outdated (compared the info I got from the code.). Nevertheless, they still
provide some insights and you can find the longer version
and the newer, shorter version here.
Saved Session
The concept of a saved session is also important for our discussion. Bascially the session will try to save all the open apps and their states upon logout and restores these saved apps at the next login. There seems to be a protocol(part of XSMP) on this saving behavior, however it doesn’t seem to be well supported among apps, not even within GNOME. Much earlier there was a bug about saved sessions. The previous investigation convinced me that this feature is deprecated and not maintained in the upstream.
Without delving into details about XSMP
, it’s sufficient to show saved
session’s desktop files here:
TODO normal app desktop, shell desktop files
GNOME Shell
Normally the agent built into the Shell is what users use. As
Shell is the required app (and thus an autostart app) for gnome-session
,
it’s handled slightly different from other clients, i.e. gnome-session
will
try to restart these apps if they die and end the current session if a required
component fails too often.
Q: Is this really the specialty for required app? Further, I’m puzzled by the line X-GNOME-AutoRestart=false
.
- Required apps are defined in session key file by searching predefined locations.
- On failure, required apps are restarted automatically. Until a limit is reached, the whole session ends with failure.
- The naming for
GNOME Shell
as a component is org.gnome.Shell. However, it’s worth citing the desktop file here:
[Desktop Entry]
Type=Application
_Name=GNOME Shell
_Comment=Window management and application launching
Exec=@bindir@/gnome-shell
# ...
NoDisplay=true
X-GNOME-Autostart-Phase=DisplayServer
X-GNOME-Provides=panel;windowmanager;
X-GNOME-Autostart-Notify=true
X-GNOME-AutoRestart=false
polkit-gnome
A separate, independent polkit agent that can work outside GNOME ecosystem. The
executable, named polkit-gnome-authentication-agent-1
, installed under
@LIBDIR
(usually /usr/share/lib/
).
The current distribution still ships an autostart desktop file. The earliest entry in the changelog on this desktop file is “Add as a source a .desktop file to start polkit agent: it was removed from tarball, but we still need it.” Here seems to be the upstream commit that removed this file. Much of the problem is caused by this file and its content is worthy full citation here:
# polkit-gnome-authentication-agent-1.desktop.in
[Desktop Entry]
Name=PolicyKit Authentication Agent
# ...
Comment=PolicyKit Authentication Agent
# ...
Exec=@LIBDIR@/polkit-gnome-authentication-agent-1
Terminal=false
Type=Application
Categories=
NoDisplay=true
NotShowIn=KDE;
AutostartCondition=GNOME3 unless-session gnome
The AutostartCondition
doesn’t seem to be well documented, but it means that
this agent should be automatically started on login if the session is of
‘GNOME3
but not gnome
‘
I was going to say gnome-fallback
, which was discontinued around the time GNOME
3.10 was released, and thus coincides with the dev&release of SLE-12
, REHL 7
and etc. However, a little search has surprised me that gnome-fallback
was alive as gnome-flashback for quite a while (it’s official)! The history about GNOME 3
, fallback
, classic
and flashback
is convoluting, and I found this post more or less match what I know. Now I’m starting to wonder whether flashback
will make a comeback! ;P.
Sequence
Before describing each sequence in details, we explain the symptom as seen in the
bug. The default theming of shell-agent
authentication diaglog is black,
while the diaglog from polkit-gnome
is white. This makes the issue
ostentatious, easily catched by openQA: the
usual black dialogue is not found instead the white dialog is shown.
The normal sequence
SessionManager
has internal phases
, which follows a
nearly linear style of state transitions.
A notable exception is during ending session (logout/reboot/shutdown),GSM_MANAGER_PHASE_QUERY_END_SESSION
can transition back to GSM_MANAGER_PHASE_RUNNING
in case of failures. Covered in another post
As shown above Shell
starts at DisplayServer
phase,
(X-GNOME-Autostart-Phase=DisplayServer
), much earlier than the
default Application
phase. polkit-gnome
was started at
this default phase as it has no explicit phase setting. So usually, Shell
starts and registers itself as agent before polkit-gnome
can do anything.
The autostart condition GNOME3 unless-session gnome
will be true
under
gnome-classic
and thus polkit-gnome
will start. However, since there is only
one allowed polkit agent, polkit-gnome
will fail and exit.
What if there is an saved session
In a saved session
situation, gnome-session
will detect the relevant
settings and saved session files under user home directory. It will load saved
apps
from saved session files first, before the normal loading of
required
and autostart
apps.
The latter desktop files would not override earlier desktop files. gnome-session
use the provides
field to decide whether a later loaded desktop file is effective.
org.gnome.Shell
, as saved session file, the desktop entry file does NOT
have the autostart phase set to DisplayServer
, actually, not set at all. So
it’s started at Application
phase, together with polkit-gnome
.
polkit-gnome
is a much simpler program and might register itself as polkit
agent
, while shell
is still running to get to its JS
part.
In observation, with a saved session
, polkit-gnome
almost always win as the
real agent.
What if Shell died
When Shell
dies, it gets restarted by gnome-session
.
This is not to be confused with the built-in restart as Alt-F2, r
would offter, in which the process itself remains the same. The
timing can be unfortunate for it can coincidences with the starting of
polkit-gnome
, and be outrun by it in registering as polkit agent.
Debugging
In this section, I will show important questions I asked during debugging and some useful tips to prove points mentioned in the Sequence.
polkit-gnome
starts under gnome-classic
To prove this point, two approaches can be taken:
- Use Linux Audit.
- Manual Injection. Change the destktop entry file or the executable itself. In our case, capture the stdin/stderr are sufficient.
It’s usually better to use Audit
first and then empoly Injection
to get more
details.
Linux Auditing
This seems to be the only non-intrusive approach. At the cost of complexity,
Audit
is very powerful and capable of much more.
At the time of writing, I’ve read some news on a Dtrace-like functionality for kernel. Looks very interesting and promising, though I’m not knowledged enough, yet, to comment on the usefulness for debugging problems like this post’s.
Make sure all prerequiresites are met (Yelling!) I have been working on another post for more detials. Plainning….
## Precaution: all commands are executed under root prviliages.
# setup rules, you can optionally add more filter fields, uid might be a good addition.
auditctl -a exit,always -F arch=b64 -S all -F path=/usr/lib/polkit-gnome-authentication-agent-1
# verify the above is the only rule, not necessary but helpful in inspecting the log.
auditctl -l
# Logout and Login back again, in a new terminal, the -ts since time is the time
# *just* before you login. Not necessary but helpful in reducting output size.
ausearch -i -ts 16:10
You should see something like:
----
type=UNKNOWN[1327] msg=audit(10/27/16 16:03:10.518:106) : proctitle="/usr/lib/polkit-gnome-authentication-agent-1"
type=PATH msg=audit(10/27/16 16:03:10.518:106) : item=1 name=/lib64/ld-linux-x86-64.so.2 inode=38532 dev=00:26 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL
type=PATH msg=audit(10/27/16 16:03:10.518:106) : item=0 name=/usr/lib/polkit-gnome-authentication-agent-1 inode=160885 dev=00:26 mode=file,755 ouid=root ogid=root rdev=00:00 nametype=NORMAL
type=CWD msg=audit(10/27/16 16:03:10.518:106) : cwd=/home/vagrant
type=EXECVE msg=audit(10/27/16 16:03:10.518:106) : argc=1 a0=/usr/lib/polkit-gnome-authentication-agent-1
type=SYSCALL msg=audit(10/27/16 16:03:10.518:106) : arch=x86_64 syscall=execve success=yes exit=0 a0=0x11be100 a1=0x11be0a0 a2=0x11be150 a3=0xfc2c9fc5 items=2 ppid=1726 pid=1968 auid=vagrant uid=vagrant gid=users euid=
vagrant suid=vagrant fsuid=vagrant egid=users sgid=users fsgid=users tty=(none) ses=91 comm=polkit-gnome-au exe=/usr/lib/polkit-gnome-authentication-agent-1 key=(null)
From the log we can deduce: ppid=1726
(ps -up 1726
shows that it’s
gnome-session-binary
) calls execve
with
/usr/lib/polkit-gnome-authentication-agent-1
as the sole argument. In other
words, gnome-session
started polkit-gnome
.
Manual Injection
Linux Auditing
is clean and quick, great for verifying unintend execution.
However, it doesn’t capture possible output from processes. For this we would
have to inject extra code into the start sequence. Surely, we can replace
polkit-gnome-authentication-agent-1
with a custom script, which calls the real
executable only with io rediction to some custom files. However, I’d like to
introduce systemd-cat to take full advantage of journald
.
And avoids the hassles of managing log files. systemd-cat
can be really useful to debug gnome-shell
by piping JS logs into journald
. With advanced options offered by journalctl
, this way is much sweeter than weeding through log files or terminal output, particularly so for gdm
shell.
First change the Exec
Exec
line has specific format requirement, in doubt consult the documentation line to something
like:
# /etc/xdg/autostart/polkit-gnome-authentication-agent-1.desktop
Exec=systemd-cat -t PK-G /usr/lib/polkit-gnome-authentication-agent-1
Then use journalctl -t PK-G
to pick log from polkit-gnome
out specifically.
As with ausearch
, a since-time can be passed with -S,--since=
. Notably journctl
features far richer time formats as documented in man systemd.time
. You can use relative time setting like journalctl -S -1min
. ausearch
doesn’t seem to support advanced time format. However, we can always use date
to achieve similar result. For example, to get audit logs since 90 sec ago, we can use ausearch -i -ts $(date +'%H:%M:%S' -d '-300sec')
.
-- Logs begin at Wed 2016-10-26 18:35:03 CST, end at Thu 2016-10-27 16:36:48 CST. --
Oct 27 16:36:44 linux-rblp.suse PK-G[2704]: ** (polkit-gnome-authentication-agent-1:2704): WARNING **: Unable to register authentication agent: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: An authentication age
Oct 27 16:36:44 linux-rblp.suse PK-G[2704]: Cannot register authentication agent: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: An authentication agent already exists for the given subject
Why polkit-gnome
started?
The culprit is the condition line GNOME3 unless-session gnome
. The current
session name referred in this condition can be looked up by introspecting
org.gnome.SessionManager
object.
Since the naming is not the same for this session-name
, there were some
confusion at the beginning, as some believed it refering to a dconf
setting.
It turns out this setting is the default if no session type is specified on
the gnome-session
command line.
gdbus introspect --session --dest org.gnome.SessionManager \
--object-path /org/gnome/SessionManager | grep -i session
## Output> readonly s SessionName = 'gnome-classic'
# alternatively you can get to the desired property value directly, though much
# more complicated and not recommended. (And you're going to introspect first
# anyway ;P)
gdbus call --session --dest org.gnome.SessionManager \
--object-path /org/gnome/SessionManager \
--method org.freedesktop.DBus.Properties.Get \
org.gnome.SessionManager SessionName
## Output> (<'gnome-classic'>,)
∎