Everything you ever wanted to know about file links


What are "links"?

Links are files that are "pointers" to other files or directories. For example, say you have a 1 gig file. You could create one or more links to that file, and the links would take up very little space (only a few bytes). The links can have different names than what you are linking to, and can be in different directories.

There are two types of links - "hard" links and "symbolic" links (also known as "symlinks"). I'll explain more below.


The Gory Details - How are hard links different from symbolic links?

In Linux/Unix, a file actually has several parts. There's a "filename", an "inode", and the actual data. The "filename" part is exactly what it sounds like. However, this part doesn't JUST store the filename - it also has an "inode number". This inode number is matched up to the inode. The inode contains the owner, permissions, timestamps (such as last access and last modification), etc, as well as pointers to where the actual file data is stored on the disk.

normal file picture
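
To look at these parts yourself, "stat" can print individual fields out of the inode. This is a minimal sketch, assuming GNU coreutils; the file name demo.txt and the scratch directory are made up for the demo:

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

echo "This is a test" > demo.txt

# Each format code pulls one field out of the inode:
# %i = inode number, %h = hard link count, %s = size in bytes,
# %U = owner name, %a = permissions in octal
inode=$(stat -c '%i' demo.txt)
links=$(stat -c '%h' demo.txt)
size=$(stat -c '%s' demo.txt)
owner=$(stat -c '%U' demo.txt)
perms=$(stat -c '%a' demo.txt)

echo "inode=$inode links=$links size=$size owner=$owner perms=$perms"
```

A freshly created file has a link count of 1, because only one filename points at its inode so far.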

A symlink has its own inode and inode number, which is different from that of the file it's linking to. In the data portion, it stores the path and filename of the target file it's linking to. You can remove the target file, and the symlink will still exist (although now it will point to a file that doesn't exist - so you will get errors if you try to access the data in the file). Conversely, you can delete the symlink and it won't affect the target file. I'll show examples of this in a bit.

softlink picture
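
The stored path can be read back with "readlink". A small sketch, assuming GNU coreutils on a typical Linux filesystem; target.txt and pointer.txt are made-up names:

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "This is a test" > target.txt
ln -s target.txt pointer.txt

# readlink prints the path stored inside the symlink, without following it.
stored_path=$(readlink pointer.txt)

# stat does not follow symlinks by default, so this is the size of the link
# itself: the length of the stored path ("target.txt" is 10 characters).
link_size=$(stat -c '%s' pointer.txt)

# The symlink and its target have different inode numbers.
target_inode=$(stat -c '%i' target.txt)
link_inode=$(stat -c '%i' pointer.txt)

echo "stored_path=$stored_path link_size=$link_size"
echo "target_inode=$target_inode link_inode=$link_inode"
```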

A hard link has the SAME inode number as the file that it's linking to. Basically it only has the "filename" part.

hardlink picture
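
One way to check that two names really are the same file is the shell's "-ef" test. A minimal sketch with made-up file names:

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "This is a test" > original.txt
ln original.txt second-name.txt

# -ef is true when both names resolve to the same device and inode.
if [ original.txt -ef second-name.txt ]; then
    same_file=yes
else
    same_file=no
fi

# There is only one set of data blocks, so a write through one name
# is visible through the other.
echo "appended line" >> second-name.txt
line_count=$(wc -l < original.txt)

echo "same_file=$same_file line_count=$line_count"
```

Note that "-ef" follows symlinks, so it also reports a symlink and its target as the same file. Compare inode numbers with "ls -i" if you need to tell the two cases apart.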

Directories are similar to files (you can create symlinks to directories), except that they also store the inode numbers of all the files contained in that directory, as well as the inode number of the parent directory. If you ever have to run the "fsck" utility to check a corrupted filesystem, it's basically walking through the directory inodes and matching them up with the files. Files that don't match are put in the "lost+found" directory.


Why are symlinks useful?

You can use them for version control - e.g., I have activemq version 5.4.2 installed, with a symlink pointing to it. By changing the symlink to point to the 5.9.0 version, I don't have to worry about changing the /etc/rc script to point to a specific version, changing directory paths for config files, etc. Rollback is easy - just delete the symlink and create a new symlink pointing to the original version. Sure, you could just use the "mv" command to rename the directories, but with symlinks it's crystal clear what version is the "active" version.
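
The version switch can be sketched like this. It assumes GNU "ln", and the two directories are empty stand-ins created just for the demo, not real activemq installs:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two stand-in "install" directories.
mkdir apache-activemq-5.4.2 apache-activemq-5.9.0

# Point the generic name at the old version.
ln -s apache-activemq-5.4.2 apache-activemq
before=$(readlink apache-activemq)

# Upgrade: -f replaces the existing link, and -n stops ln from following
# the old link and creating the new one inside the old directory.
ln -sfn apache-activemq-5.9.0 apache-activemq
after=$(readlink apache-activemq)

# Rollback is the same command with the old version.
ln -sfn apache-activemq-5.4.2 apache-activemq
rolled_back=$(readlink apache-activemq)

echo "$before -> $after -> $rolled_back"
```

Using "ln -sfn" saves the separate delete step, but deleting and recreating the symlink works just as well.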

You can also use symlinks to move or "virtualize" directories or filesystems. For example, mysql has the following line telling it where it should look for databases:

innodb_data_home_dir = /var/lib/mysql/data

You might have a few databases in this directory, and you want to put ONE of the databases in another directory (one that's on fast SSD storage, perhaps). No problem - you can move the datafiles for that particular database to a new directory, then create a symlink in the /var/lib/mysql/data directory to point to the new location. Nice!
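
Here is a rough sketch of that move, done in a scratch directory. The database name "salesdb", the file "orders.ibd" and the "ssd" directory are all made up, and on a real server you would stop mysql before moving anything:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Stand-ins for the real data directory and the fast storage.
mkdir -p var/lib/mysql/data/salesdb ssd
echo "table data" > var/lib/mysql/data/salesdb/orders.ibd

# Move the one database directory to the "fast" storage...
mv var/lib/mysql/data/salesdb ssd/salesdb

# ...and leave a symlink behind at the old path. An absolute target keeps
# the link valid no matter which directory it is resolved from.
ln -s "$workdir/ssd/salesdb" var/lib/mysql/data/salesdb

# Anything that opens the old path still finds the data.
contents=$(cat var/lib/mysql/data/salesdb/orders.ibd)
echo "$contents"
```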

Symlinks are used pretty extensively by RedHat/CentOS to control which services start at which runlevels. Have a look at /etc/rc.d/rc3.d (for runlevel 3) and see how the services that are "on" are symlinks to scripts in /etc/rc.d/init.d.

I've heard of people using symlinks to synchronize their Firefox/Chrome bookmarks through Dropbox, so that every machine you log into has the same preferences. You can get pretty creative...

Why are hardlinks useful?

Hardlinks are often used for file-level deduplication - backuppc, Apple OS X Time Machine, rsnapshot, BURP backup, etc. all use hardlinks to save space and present a "point in time" view of the filesystem without having a full copy of each file for each snapshot / backup.
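
The general idea behind hardlink snapshots can be sketched with "cp -al" from GNU coreutils. This is only an illustration of the technique, not how any of the tools above is actually implemented, and all names are made up:

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir source
echo "unchanged file" > source/a.txt
echo "version 1" > source/b.txt

# Snapshot 1: recreate the directory tree, but hard-link the files
# instead of copying their data (-a = archive mode, -l = link).
cp -al source snap1

# Change b.txt by writing a new file and renaming it over the old name.
# That gives source/b.txt a new inode, so snap1 keeps the old contents.
echo "version 2" > source/b.txt.new
mv source/b.txt.new source/b.txt

# Snapshot 2 links to whatever is current now.
cp -al source snap2

# a.txt never changed, so one inode is shared by source, snap1 and snap2.
a_links=$(stat -c '%h' source/a.txt)
old_b=$(cat snap1/b.txt)
new_b=$(cat snap2/b.txt)

echo "a.txt links=$a_links, snap1 b.txt=$old_b, snap2 b.txt=$new_b"
```

The catch is that editing a file in place (instead of replacing it) would change it in every snapshot too, since they all share the same data.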

Some unix based systems (like some versions of Solaris) use hardlinks for their /etc/rc scripts (as described in the symlink section above).

NOTE: Symlinks can work across different partitions & disks. Hardlinks MUST be on the same partition/filesystem (since they use the same inode numbers). If you try to create a hardlink between different partitions/filesystems, you'll get an error:

ln /foo.txt /boot/mylink.txt
ln: creating hard link `/boot/mylink.txt' => `/foo.txt': Invalid cross-device link
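
If a script doesn't know in advance whether both paths are on the same filesystem, one option is to try the hard link and fall back to a symlink. This is just a sketch: "link_or_symlink" is a made-up helper, and it falls back on ANY "ln" failure, not only the cross-device one:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# link_or_symlink SRC DEST: try a hard link first, fall back to a symlink.
link_or_symlink() {
    if ln "$1" "$2" 2>/dev/null; then
        echo "hard"
    else
        # readlink -f turns SRC into an absolute path, so the symlink
        # resolves correctly wherever DEST lives.
        ln -s "$(readlink -f "$1")" "$2" && echo "symbolic"
    fi
}

echo "This is a test" > foo.txt
kind=$(link_or_symlink foo.txt mylink.txt)
echo "created a $kind link"
```

Both files here are in the same directory, so the hard link succeeds and the fallback is never used.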

So how do you create them?

Links are very easy to create: you just use the "ln" command. By default, the "ln" command will create a hard link. You can use "ln -s" to create a symlink. Always specify the original file first, and the name of the link as the second argument (the original programmers were VERY efficient - if you typo the original filename, they don't bother processing the second argument, at least that's how I try to remember the order).

As an example, create a small text file:

echo "This is a test" > myfile.txt
 
ls -l
-rw-r--r-- 1 root root 15 Feb 14 12:01 myfile.txt

Make a hardlink (the default):

ln myfile.txt myhardlink1.txt

ls -li
266714 -rw-r--r-- 2 root root 15 Feb 14 12:01 myfile.txt
266714 -rw-r--r-- 2 root root 15 Feb 14 12:01 myhardlink1.txt
(you can see that they have the same inode number, 266714, and that the link count in the third column went from 1 to 2)

Make a symlink (use the -s option):

ln -s myfile.txt mysoftlink1.txt

You can see that the symlink has a different inode number from the original file (266716 versus 266714). Also note the "l" at the start of the permissions (lrwxrwxrwx) and the "->" pointing at the target:

ls -li
266714 -rw-r--r-- 2 root root 15 Feb 14 12:01 myfile.txt
266714 -rw-r--r-- 2 root root 15 Feb 14 12:01 myhardlink1.txt
266716 lrwxrwxrwx 1 root root 10 Feb 14 12:39 mysoftlink1.txt -> myfile.txt

You can "cat" any of these three files and you'll get the contents. However, only ONE copy of the contents of the file exists on disk - handy if the file is large!

If you remove the file that the symlink points to, the symlink still exists. However, you won't be able to get at the data (basically it's a pointer to nowhere):

rm myfile.txt
ls -li

266714 -rw-r--r-- 1 root root 15 Feb 14 12:01 myhardlink1.txt
266716 lrwxrwxrwx 1 root root 10 Feb 14 12:39 mysoftlink1.txt -> myfile.txt
 
cat mysoftlink1.txt
cat: mysoftlink1.txt: No such file or directory

However, the hardlink still works:

cat myhardlink1.txt
This is a test
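
Dangling symlinks like this one are easy to hunt down. A small sketch with made-up file names; the loop works in any POSIX shell, while the "find -xtype l" form assumes GNU find:

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "This is a test" > myfile.txt
ln -s myfile.txt good-link.txt
ln -s missing.txt dangling-link.txt

# -L is true for any symlink; -e follows the link, so it is false
# when the target no longer exists.
broken=""
for f in *; do
    if [ -L "$f" ] && [ ! -e "$f" ]; then
        broken="$f"
    fi
done
echo "broken symlink: $broken"

# GNU find can do the same for a whole directory tree.
found=$(find . -xtype l)
echo "find says: $found"
```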

Things to note:

A symbolic link uses the permissions of the actual file, not those of the symlink:

 
[activemq@linuxbox opt]$ ls -l
total 8
lrwxrwxrwx 1 root     root       21 Feb 14 11:09 apache-activemq -> apache-activemq-5.4.2
drwx------ 9 activemq activemq 4096 Nov 26  2010 apache-activemq-5.4.2
drwxr-xr-x 2 activemq activemq 4096 Feb 14 11:11 apache-activemq-5.9.0
 
 
[activemq@linuxbox opt]$ cd apache-activemq
[activemq@linuxbox apache-activemq]$

NOTE: you can change the ownership of the actual symlink (e.g., to prevent someone from changing it to point somewhere you don't want it to point to), but you have to use "chown -h":

[activemq@linuxbox opt]$ ls -l
total 8
lrwxrwxrwx 1 root     root       21 Feb 14 11:09 apache-activemq -> apache-activemq-5.4.2
 
[activemq@linuxbox apache-activemq]$ rm apache-activemq
rm: cannot remove `apache-activemq': No such file or directory

A regular "chown" doesn't touch the symlink itself - it follows the link and changes the owner of the target instead:

[root@linuxbox opt]# chown activemq apache-activemq
[root@linuxbox opt]# ls -al
total 16
drwxr-xr-x.  4 root     root     4096 Feb 14 11:11 .
dr-xr-xr-x. 26 root     root     4096 Jan 15 16:58 ..
lrwxrwxrwx   1 root     root       21 Feb 14 11:09 apache-activemq -> apache-activemq-5.4.2
 
 
[root@linuxbox opt]# chown -h activemq apache-activemq
[root@linuxbox opt]# ls -l
total 8
lrwxrwxrwx 1 activemq root       21 Feb 14 11:09 apache-activemq -> apache-activemq-5.4.2

Tips and tricks:

To find the inode number of a file, use "ls -i":

[root@linuxbox test]# ls -i myfile.txt
266714 myfile.txt

You can see how many hard links a file has - the "3" in the output below tells us that there are 3 hardlinks for this file:

[root@linuxbox test]# ls -l hlink1.txt
-rw-r--r-- 3 root root 15 Feb 14 12:01 hlink1.txt

Find all copies of a hard linked file (we get the inode number, "266714", from the "ls -i" above):

[root@linuxbox /]# find / -xdev -inum 266714 -print
/data01/foobar/hlink2.txt
/test/myfile.txt
/test/hlink1.txt

The "-xdev" argument of the find command prevents the find from looking in other mount points - remember, hardlinks can only exist on the same filesystem/mountpoint, otherwise there is no guarantee that the inode number would be unique.
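
If you have GNU find, "-samefile" saves you from looking up the inode number first. A sketch in a scratch directory, reusing the file names from the output above:

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir -p test data01/foobar
echo "This is a test" > test/myfile.txt
ln test/myfile.txt test/hlink1.txt
ln test/myfile.txt data01/foobar/hlink2.txt

# -samefile matches every name that shares an inode with the given file.
# sort just makes the output order predictable.
names=$(find . -xdev -samefile test/myfile.txt | sort)
count=$(echo "$names" | wc -l)

echo "$names"
echo "total names: $count"
```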

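There IS a limit to how many links you can create on a filesystem. Every symlink uses up an inode of its own, so symlinks are limited by the number of free inodes. Hard links share the inode of the original file, so they don't use up inodes, but each filesystem type caps how many hard links a single file can have. You can use "df -i" to find the maximum and current number of inodes in a filesystem: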

[root@linuxbox test]# df -i /
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/vg_linuxbox-lv_root
                      758880   97856  661024   13% /

AFAIK, you cannot increase the number of inodes on an existing ext2/ext3/ext4 filesystem, but you can change the number when you initially create the filesystem with the "-N" option in mkfs.ext4. The defaults work well for most uses, but if you're going to make a lot of links (e.g., using backuppc, BURP backup, rsnapshot) plan accordingly.