Linux file modes

I thought I had a pretty good grasp of Linux file modes, but today I saw something unexpected in the ls -l output on my work Mac. Before we dive into that, let us quickly recap the Linux file permission model.

The basics

The long listing format option (-l) of ls shows inode information for files in a given directory. It also works for individual files. The columns represent:

  • permissions, displayed as ten character codes
    • change with chmod (see below)
  • number of hard links (for a directory, this is typically 2 plus the number of subdirectories)
  • owner
    • change with chown newuser file or chown newuser:newgroup file
  • group owner
    • change with chgrp newgroup file
  • size in bytes (use -h flag for human readable output)
  • last modified timestamp
  • name

Example:

$ docker run -it --rm ubuntu
$ touch testfile
$ mkdir testfolder
$ ls -l
total 56
lrwxrwxrwx   1 root root    7 Mar  8 02:05 bin -> usr/bin
drwxr-xr-x   2 root root 4096 Apr 18  2022 boot
drwxr-xr-x   5 root root  360 Mar 16 20:02 dev
drwxr-xr-x   1 root root 4096 Mar 16 20:02 etc
drwxr-xr-x   2 root root 4096 Apr 18  2022 home
lrwxrwxrwx   1 root root    7 Mar  8 02:05 lib -> usr/lib
lrwxrwxrwx   1 root root    9 Mar  8 02:05 lib32 -> usr/lib32
lrwxrwxrwx   1 root root    9 Mar  8 02:05 lib64 -> usr/lib64
lrwxrwxrwx   1 root root   10 Mar  8 02:05 libx32 -> usr/libx32
drwxr-xr-x   2 root root 4096 Mar  8 02:05 media
drwxr-xr-x   2 root root 4096 Mar  8 02:05 mnt
drwxr-xr-x   2 root root 4096 Mar  8 02:05 opt
dr-xr-xr-x 287 root root    0 Mar 16 20:02 proc
drwx------   2 root root 4096 Mar  8 02:08 root
drwxr-xr-x   5 root root 4096 Mar  8 02:08 run
lrwxrwxrwx   1 root root    8 Mar  8 02:05 sbin -> usr/sbin
drwxr-xr-x   2 root root 4096 Mar  8 02:05 srv
dr-xr-xr-x  11 root root    0 Mar 16 20:02 sys
-rw-r--r--   1 root root    0 Mar 16 20:02 testfile
drwxr-xr-x   2 root root 4096 Mar 16 20:02 testfolder
drwxrwxrwt   2 root root 4096 Mar  8 02:08 tmp
drwxr-xr-x  14 root root 4096 Mar  8 02:05 usr
drwxr-xr-x  11 root root 4096 Mar  8 02:08 var

In the remainder of this post we will focus on the first column. The first character represents the file type. Common types include:

  • -: regular file
  • d: directory
  • l: symbolic link
  • p: named pipe (FIFO, unidirectional)
  • s: socket (duplex)
  • c: character device (serial stream of input and output)
  • b: block device (random access)

While in Windows everything is an object, Linux follows the "everything is a file" philosophy. In daily life, only the entries with type - are what we really call files. To show that a directory is also just a file, try running stat /home. It works like it would work on any other file. You can also run ls -ld /home to see the permissions of the directory instead of those of its contents.

Besides regular files (-) and directories (d), the example also shows symbolic links (l). For example, /bin is a symbolic link to /usr/bin.
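To see a few of these type characters side by side, here is a minimal sketch that creates one entry of each kind in a scratch directory and prints the first character of its ls -ld line. The helper name file_type_char and the entry names are made up for this illustration.

```shell
# Scratch directory so nothing outside of it is touched
demo=$(mktemp -d)

touch "$demo/regular"           # regular file
mkdir "$demo/directory"         # directory
ln -s regular "$demo/symlink"   # symbolic link pointing at "regular"
mkfifo "$demo/pipe"             # named pipe (FIFO)

# Hypothetical helper: first character of the mode string shown by ls -ld
file_type_char() {
    ls -ld "$1" | cut -c1
}

for entry in regular directory symlink pipe; do
    printf '%s %s\n' "$(file_type_char "$demo/$entry")" "$entry"
done
```

The loop should print -, d, l and p next to the corresponding names.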

The remaining nine characters of the permission column are divided into three blocks of three characters (triads). The first triad (with code u) represents the permissions of the user that owns the file. The second triad (g) shows the permissions granted to users within the file group and the last triad (o) shows the permissions of others that are neither the owner nor part of the group.

Each triad can be interpreted in a similar fashion. The first character is the read indicator with common values r if the permission is granted to the corresponding population, or - otherwise. For directories, this indicates whether listing files (ls) is allowed.

The second character indicates presence (w) or absence (-) of write permissions. In the context of directories, write permissions imply being able to add, rename and delete contents. This implies that someone with write permissions on a directory can delete a file within it, even if that person does not have read or write permissions to that file.
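A small sketch of that last point, using a scratch directory and a hypothetical file name: the file itself has no permissions at all, yet it can still be removed because removal only changes the directory.

```shell
demo=$(mktemp -d)          # we own this directory and can write to it
touch "$demo/locked"
chmod 000 "$demo/locked"   # no read, write or execute access to the file itself

ls -l "$demo"              # shows ---------- for "locked"

# Deleting only updates the directory, so the file's own mode does not matter.
# -f suppresses the confirmation prompt rm would otherwise show for a
# write-protected file.
rm -f "$demo/locked"
ls -l "$demo"              # the directory is empty again
```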

Lastly, x or - in third place indicates whether a file is executable. For directories, it means being able to access the files they contain (if their individual permissions also allow it). The cd command also requires this permission.

For files these three permissions are orthogonal, while for directories granting w without x is possible but pointless. Because add, rename and delete operations require updates to the inode information, they will fail if you do not also have execute permissions.

Instead of using letter codes, each triad can also be represented by three bits. For example, rw- can be translated to 110. By using base 8 instead, the value can be represented with a single octal digit. In this case, binary 110 = octal 6. The three triads together can be represented by three octal digits. For example, 755 can be translated to rwxr-xr-x and vice versa. Similarly, 777 grants all permissions to everyone and 000 grants none. ls -l does not show permissions in octal notation, but stat does.
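The translation can be sketched as a small shell function. symbolic_to_octal is a hypothetical helper written for this post, not a standard tool, and it only understands plain r, w, x and - characters (no special bits).

```shell
# Hypothetical helper: convert nine permission characters (e.g. rwxr-xr-x)
# to three octal digits by adding 4 for r, 2 for w and 1 for x per triad.
symbolic_to_octal() {
    perms=$1
    result=""
    while [ -n "$perms" ]; do
        triad=$(printf '%s' "$perms" | cut -c1-3)   # next three characters
        perms=$(printf '%s' "$perms" | cut -c4-)    # whatever is left
        digit=0
        case $triad in r??) digit=$((digit + 4)) ;; esac
        case $triad in ?w?) digit=$((digit + 2)) ;; esac
        case $triad in ??x) digit=$((digit + 1)) ;; esac
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

symbolic_to_octal rwxr-xr-x   # prints 755
symbolic_to_octal rw-r--r--   # prints 644
```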

The chmod command can change the mode of a given file.

$ ls -l testfile
-rw-r--r-- 1 root root 0 Mar 16 20:02 testfile
$ chmod 755 testfile
$ ls -l testfile
-rwxr-xr-x 1 root root 0 Mar 16 20:02 testfile

If you are not a fan of octal math, you can set the permissions for a specific triad with the = syntax. Let's grant read-write-execute permissions to group:

$ ls -l testfile
-rwxr-xr-x 1 root root 0 Mar 16 20:02 testfile
$ chmod g=rwx testfile
$ ls -l testfile
-rwxrwxr-x 1 root root 0 Mar 16 20:02 testfile

Instead of explicitly setting the permissions, you can also selectively update existing permissions. chmod a+rw testfile grants read/write permissions to all (i.e., user, group and other) while chmod a-rw testfile takes them away. You can also prefix the plus or minus sign with u (user), g (group) or o (others) to have more fine-grained control. Example: chmod o-rwx testfile.
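Here is a short walk-through of these selective updates on a scratch file. The helper mode_of is made up for this example and simply cuts the ten mode characters out of the ls -ld output.

```shell
demo=$(mktemp -d)
touch "$demo/testfile"

# Hypothetical helper: print only the mode column of ls -ld
mode_of() {
    ls -ld "$1" | cut -c1-10
}

chmod 644 "$demo/testfile";     mode_of "$demo/testfile"   # -rw-r--r--
chmod g+w "$demo/testfile";     mode_of "$demo/testfile"   # -rw-rw-r--
chmod o-r "$demo/testfile";     mode_of "$demo/testfile"   # -rw-rw----
# Several clauses can be combined with commas
chmod u+x,g=r "$demo/testfile"; mode_of "$demo/testfile"   # -rwxr-----
```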

umask plays a role in determining which permissions a new file receives by default. Like chmod, it uses octal notation by default. However, unlike chmod, it refers to permissions that must be denied rather than granted. While chmod 022 testfile would result in ----w--w- permissions, a umask of 022 indicates that new files can at most receive rwxr-xr-x. In octal math: 777 - 022 = 755 (strictly speaking, the mask bits are cleared rather than subtracted, which gives the same result here).

$ umask
0022
$ umask -S
u=rwx,g=rx,o=rx
$ umask 555
$ umask -S
u=w,g=w,o=w
$ touch testfile2
$ ls -l testfile2
--w--w--w- 1 root root 0 Mar 16 20:10 testfile2
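A slightly more realistic mask, sketched under the assumption that touch requests 666 for new files and mkdir requests 777 for new directories (the usual defaults). The subshell keeps the changed umask from leaking into the rest of the session.

```shell
demo=$(mktemp -d)

(
    umask 027               # deny w to group, deny everything to others
    touch "$demo/newfile"   # 666 with the mask bits cleared -> 640
    mkdir "$demo/newdir"    # 777 with the mask bits cleared -> 750
)

ls -ld "$demo/newfile" "$demo/newdir"
# expected modes: -rw-r----- for newfile and drwxr-x--- for newdir
```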

The not-so-basics

Those who paid close attention to the initial example have noticed that something odd is going on with the permissions of /tmp: drwxrwxrwt. Turns out every file contains yet another permission triad with a variety of functions:

  • set user ID (setuid)
    • run binary executable file with privileges of owner
    • no effect on non-binary executables (e.g., scripts)
    • different effect on directories
    • octal code: 4___
  • set group ID (setgid)
    • run binary executable file with privileges of group owner
    • octal code: 2___
  • sticky bit
    • only file owner, directory owner or root can delete files
    • historically used to keep important programs in memory
    • only useful on directories, not files
    • octal code: 1___

Instead of displaying this triad separately as another section in the ls -l output, it is integrated in the other three triads in a confusing manner.

  • when setuid is set
    • __x ___ ___ becomes __s ___ ___
    • __- ___ ___ becomes __S ___ ___
  • when setgid is set
    • ___ __x ___ becomes ___ __s ___
    • ___ __- ___ becomes ___ __S ___
  • when the sticky bit is set
    • ___ ___ __x becomes ___ ___ __t
    • ___ ___ __- becomes ___ ___ __T

This explains our /tmp conundrum. drwxrwxrwt means that the sticky bit is set. In octal notation these settings show up as a fourth, leading digit. For example, the corresponding code for /tmp is 1777.

Another example: ping is owned by root and commonly has setuid active. Its permissions will be shown as rwsr-xr-x or 4755. It requires elevated permissions to perform its job: create a raw ICMP network packet. setuid is an easy way to give non-root users access to this feature. Note that modern Linux distributions often solve this problem differently, by using fine-grained capabilities instead of running a program as full root.

Note that permissions containing a capital S are possible but rarely useful. This code indicates that setuid or setgid is set while the corresponding execute permission is missing. For regular files, both bits only have an effect on executable binaries, so this is moot.
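The s, S and t codes are easy to reproduce on scratch files. This is only a sketch: the names shared and tool are hypothetical, and the empty file is of course not a real binary, so the setuid bit has no practical effect here.

```shell
demo=$(mktemp -d)

mkdir "$demo/shared"
chmod 1777 "$demo/shared"   # sticky bit plus rwx for everyone, like /tmp
ls -ld "$demo/shared"       # mode column: drwxrwxrwt

touch "$demo/tool"
chmod 4755 "$demo/tool"     # setuid plus rwxr-xr-x
ls -l "$demo/tool"          # mode column: -rwsr-xr-x

chmod 4644 "$demo/tool"     # setuid without execute permission for the owner
ls -l "$demo/tool"          # mode column: -rwSr--r--
```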

What I learned today

On macOS, the ls -l output may contain the symbols @ and + in the permissions string. These symbols indicate the presence of extended attributes and access control lists (ACLs), respectively, which offer additional functionalities and granular control beyond traditional Unix permissions.

Extended Attributes: @

Extended attributes are metadata associated with a file or directory, such as Finder information, custom icons, or user-defined tags. The @ symbol signifies that the file or directory has one or more extended attributes.

To list the extended attributes of a file, use xattr -l:

$ ls -l
-rw-r--r--@ 1 user group 231 Apr  1 12:34 file.txt

$ xattr -l file.txt
com.apple.metadata:kMDItemWhereFroms: ...

You can add, modify, or remove extended attributes using the xattr command with the appropriate options:

  • To add or modify an extended attribute: xattr -w attribute_name attribute_value file
  • To remove an extended attribute: xattr -d attribute_name file

Access Control Lists: +

Access control lists (ACLs) provide fine-grained control over file and directory permissions, allowing for more complex permission configurations than traditional Unix permissions. The + symbol indicates that a file or directory has an ACL associated with it.

To display the ACLs of a file or directory, use ls -le:

$ ls -l
-rw-r--r--+ 1 user group 231 Apr  1 12:34 file.txt

$ ls -le file.txt
-rw-r--r--+ 1 user group 231 Apr  1 12:34 file.txt
 0: group:everyone deny delete

ACLs consist of entries that define the allowed or denied permissions for a specific user or group. Each entry has the following components:

  • Access control type: allow or deny
  • Principal (user or group)
  • Permissions: read, write, execute, etc.

To modify the ACLs of a file or directory, use the chmod command with the appropriate options:

  • To add an ACL entry: chmod +a "principal allow|deny permissions" file
  • To remove an ACL entry: chmod -a "principal allow|deny permissions" file

For example, to grant read and write permissions to a specific user:

$ chmod +a "username allow read,write" file.txt

This command should not be confused with the chmod a+rw file instruction we saw earlier.

Conclusion

This blog post covered the basics of Linux file permissions, including the output of ls -l, file types, file modes, group ownership, and the role of chmod, chown, and chgrp. Additionally, we have explored advanced permissions such as setuid, setgid, and the sticky bit. Finally, we discussed the meaning of @ and + symbols in the macOS ls -l output.

Docker ARGs

Today I realised that there is more depth to the ARG Dockerfile instruction than I had anticipated. Its scope in the context of multi-stage builds is especially non-trivial. Take the following (highly convoluted) Dockerfile for example:

ARG VERSION=latest

FROM alpine:$VERSION as build
ARG VERSION=3.16
RUN echo $VERSION > image_version

FROM alpine:$VERSION
COPY --from=build /image_version /other_image_version
ARG VERSION
RUN echo $VERSION > image_version
CMD cat other_image_version && cat image_version

The first stage will use alpine:latest as base image. VERSION then gets overwritten to 3.16 and this value is saved to file image_version.

Which alpine tag will the second stage use? Because ARG commands within a stage are local to that stage, the ARG VERSION=3.16 line has no effect on this FROM statement. Instead, the previous value latest will be used.

This second stage retrieves the version file from the first stage and also creates its own image_version file. Which value will be written to this file? We defined an argument VERSION without value in this stage, but that does not mean that the argument will be empty. Rather, this line allows the ARG VERSION=latest statement on line 1 to enter the scope of the second stage. As a consequence, the value latest will be written to the image_version file. If we had skipped the ARG VERSION line, $VERSION would have been undefined and the image_version file would have been empty.

Let's confirm our theory:

$ docker build -t argtest .
[+] Building 1.0s (8/8) FINISHED
 => [internal] load build definition from Dockerfile                        0.0s
 => => transferring dockerfile: 320B                                        0.0s
 => [internal] load .dockerignore                                           0.0s
 => => transferring context: 2B                                             0.0s
 => [internal] load metadata for docker.io/library/alpine:latest            0.0s
 => [build 1/2] FROM docker.io/library/alpine:latest                        0.0s
 => [build 2/2] RUN echo 3.16 > image_version                               0.3s
 => [stage-1 2/3] COPY --from=build /image_version /other_image_version     0.0s
 => [stage-1 3/3] RUN echo latest > image_version                           0.5s
 => exporting to image                                                      0.0s
 => => exporting layers                                                     0.0s
 => => writing image sha256:667654e8ba78edf07a9d64a3fb7576fdfb6a4be421b...  0.0s
 => => naming to docker.io/library/argtest                                  0.0s

$ docker run --rm argtest
3.16
latest

When we manually specify a value for the build argument, it is applied everywhere:

$ docker build -t argtest --build-arg VERSION=edge .
$ docker run --rm argtest
edge
edge

Fun fact

ARGs are not secret. The values passed during docker build can be retrieved from an image with the docker history command. Do not use ARGs to pass sensitive information such as passwords. Use RUN --mount=type=secret instead.
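A minimal sketch of the secret-mount alternative, assuming BuildKit is enabled. The secret id api_token, the tag secrettest and the file names are hypothetical. The script only writes the files and defines a helper; call build_with_secret yourself to run the actual build.

```shell
workdir=$(mktemp -d)
printf 'not-a-real-token' > "$workdir/api_token.txt"

cat > "$workdir/Dockerfile" <<'EOF'
FROM alpine:3.16
# The secret is mounted at /run/secrets/api_token for this RUN step only;
# it is not written to an image layer or to the build history.
RUN --mount=type=secret,id=api_token \
    wc -c < /run/secrets/api_token > /token_length
EOF

# Hypothetical helper: build the image and pass the secret from a local file
build_with_secret() {
    docker build --secret id=api_token,src="$workdir/api_token.txt" \
        -t secrettest "$workdir"
}
```

Unlike a value passed with --build-arg, the token should not show up in the docker history output of the resulting image.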

Summary

  • ARGs defined before the first stage
    • can only be used in FROM statements
    • can be imported in the scope of a stage by re-declaring them without value
  • ARGs defined within a stage
    • are scoped to the subsequent lines of that stage
    • shadow any outside ARGs with the same name
  • ARGs are unsuited for secrets

Snakes

Any python programmer worth his salt knows that the name of his beloved programming language is derived not from the snake but from Monty Python. That did not stop the company formerly known as Continuum Analytics from naming their python solutions after other snakes. In this post we will take a look at this confusing terminology.

Let's start with the most ambiguous of the bunch: anaconda. The website https://www.anaconda.com would tell you that it is a data science platform. That could mean a lot of things. If we stick with their free tier offering, we can call it a python distribution: a collection of tools that makes it easier to get python along with some packages up and running on your computer.

After a while, this distribution became so popular that in 2017 the company behind it decided to change its name from Continuum Analytics to Anaconda Inc.

Why stop there though? Giving three things the same name is better than just two. That is why Anaconda Inc.'s python package repository is also called anaconda.

All things considered, we should be thankful that they named their package manager conda instead of reusing the anaconda name for the fourth time.

The other related products fortunately all have different - but related - names. Here is a quick rundown:

  • Anaconda Inc.: the company behind most of these tools
  • anaconda: the python distribution
  • anaconda: the package repository
  • miniconda: the minimalistic python distribution
  • conda-forge: the community-driven package repository
  • miniforge: miniconda with conda-forge as single, default channel
  • conda: the package manager
  • conda-build: the package builder
  • mamba: the faster package manager, using conda CLI syntax
  • micromamba: even faster than mamba, but with different syntax
  • boa: the mamba-based package builder
  • mambaforge: miniforge with mamba preinstalled

Fun fact: pythons, boas and mambas are snakes from three completely different families, evolutionarily speaking.

Vim tutorial

The other day, I came across this vim tutorial. It offers guides on six different levels:

  • beginner
  • intermediate
  • advanced
  • adept
  • veteran
  • expert

I have been using vim for years, so it is nice to see that there are still new things to learn. I am currently working my way through the adept level.