Linux: Copy only certain filetypes with RSYNC from foldertree

As I struggled again with the fiddly include/exclude-syntax of RSYNC, I think it’s worth a note here.

I want to copy only JPG- and PNG-files with their corresponding folder-structure from a foldertree – leaving all the rest where it is.

To explicitly include or exclude folders and files in an rsync command, I can define patterns that are checked one after another against each object in the given tree until one matches.
The catch is that the given patterns are not exclusive. That means, if I define “--include="*.JPG" --include="*.PNG"” WITHOUT a terminating “--exclude="*"”, my includes would have virtually no effect. It would just mean: “Explicitly include all JPG- and PNG-files and implicitly include all the rest too!”.
But here’s another catch: with only these include/exclude patterns rsync would not descend into my foldertree. It would stay on the first level, because the “--exclude="*"” is checked against each subfolder and skips it. To let rsync run through all folders I have to set “--include="*/"” before the “--exclude="*"”. That means: “Explicitly include all folders!”, so that the “--exclude="*"” is never reached for a folder.

Remember: The include/exclude-patterns are checked in the given order, from first to last, for a match. As soon as a matching pattern is found, the pattern-processing for the current object stops and rsync moves on to the next object. If no pattern matches at all, the object IS NOT SKIPPED but copied over to DEST.
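
The three situations described above can be tried out on a throwaway tree. This is only a sketch: the file and folder names (top.JPG, sub, dst1 and so on) are made up for the test, and it assumes that rsync and mktemp are installed.

```shell
#!/bin/sh
# Skip the demo if rsync is missing on this machine.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping demo" >&2; exit 0; }

# Hypothetical test tree: one image and one text file on each level.
work=$(mktemp -d)
mkdir -p "$work/src/sub"
touch "$work/src/top.JPG" "$work/src/top.txt" \
      "$work/src/sub/deep.PNG" "$work/src/sub/deep.txt"

# 1) Includes only: nothing is ever excluded, so everything is copied.
rsync -r --include="*.JPG" --include="*.PNG" "$work/src/" "$work/dst1/"

# 2) Includes plus --exclude="*": the folder "sub" itself hits the exclude,
#    so rsync never looks inside it.
rsync -r --include="*.JPG" --include="*.PNG" --exclude="*" "$work/src/" "$work/dst2/"

# 3) --include="*/" first: folders are kept, so rsync descends into them.
rsync -r --include="*/" --include="*.JPG" --include="*.PNG" --exclude="*" "$work/src/" "$work/dst3/"

# Show which files arrived in each destination.
for d in dst1 dst2 dst3; do
    echo "== $d"
    (cd "$work/$d" && find . -type f | sort)
done

# Remove "$work" by hand when done.
```

Only the third variant should end up with both images and nothing else.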

So here is my case:

  • I want to copy only all JPG- and PNG-files with parent-folders from SOURCE to DEST
  • Don’t want to copy the content of the folders THUMB, ARCHIVE, DELETED and TEMP

The resulting rsync-command is this:

rsync -r --exclude="thumb/" --exclude="archive/" --exclude="deleted/" --exclude="temp/" --include="*/" --include="*.JPG" --include="*.PNG" --exclude="*" /usr/people/lampp_htdocs/vrwiki/images/ /usr/people/lampp_htdocs/test/vrclone/images/
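
Before running such a command against real data, a preview can help. The sketch below is not part of my original command: it adds -n (--dry-run) with -v to list what would be copied, and -m (--prune-empty-dirs), because “--include="*/"” otherwise also creates folders in DEST that contain no matching images. The test tree (gallery, docs, a_small.JPG and so on) is made up, and the sketch assumes a reasonably recent rsync plus mktemp.

```shell
#!/bin/sh
# Skip the demo if rsync is missing on this machine.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping demo" >&2; exit 0; }

# Hypothetical test tree: one wanted image, one thumbnail, one folder without images.
work=$(mktemp -d)
mkdir -p "$work/src/gallery" "$work/src/thumb" "$work/src/docs"
touch "$work/src/gallery/a.JPG" "$work/src/thumb/a_small.JPG" "$work/src/docs/readme.txt"

# The filter rules, in the same order as in the command above.
set -- --exclude="thumb/" --exclude="archive/" --exclude="deleted/" --exclude="temp/" \
       --include="*/" --include="*.JPG" --include="*.PNG" --exclude="*"

# Preview only: -n changes nothing on disk, -v lists what would be sent.
rsync -rnv "$@" "$work/src/" "$work/dst/"

# Real run: -m leaves out folders that would end up empty, such as docs/.
rsync -rm "$@" "$work/src/" "$work/dst/"

# Show the result.
(cd "$work/dst" && find . | sort)

# Remove "$work" by hand when done.
```

Without -m the docs folder would be created empty in the destination; whether that matters depends on the use case.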

Let’s analyse this command in detail:

A pattern ending with a slash (/) matches only folders, never files. So I first exclude all folders I don’t want:

--exclude="thumb/" --exclude="archive/" --exclude="deleted/" --exclude="temp/"

Then I include all remaining folders so that rsync goes through my whole tree (the “-r” option needs to be set also):

--include="*/"

Then I explicitly include my desired image-files:

--include="*.JPG" --include="*.PNG"

And finally exclude all the rest that didn’t match so far:

--exclude="*"

The pattern-search is case-sensitive. To make it case-insensitive and e.g. sync all JPG, jpg, PNG and png files, one has to use character classes in square brackets (shell-style wildcards, not regular expressions) in the “include”:

--include="*.[Jj][Pp][Gg]" --include="*.[Pp][Nn][Gg]"
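
A quick check of these bracket patterns on a throwaway folder could look like this. The file names are made up for the test, and the sketch assumes rsync and mktemp are installed. Note that a “.jpeg” extension is a different pattern and would need its own include.

```shell
#!/bin/sh
# Skip the demo if rsync is missing on this machine.
command -v rsync >/dev/null 2>&1 || { echo "rsync not found, skipping demo" >&2; exit 0; }

# Hypothetical test files with mixed-case extensions.
work=$(mktemp -d)
mkdir -p "$work/src"
touch "$work/src/a.JPG" "$work/src/b.jpg" "$work/src/c.Png" \
      "$work/src/d.jpeg" "$work/src/e.txt"

# Each bracket pair matches one letter in either case.
rsync -r --include="*/" \
      --include="*.[Jj][Pp][Gg]" --include="*.[Pp][Nn][Gg]" \
      --exclude="*" "$work/src/" "$work/dst/"

# List what was copied; d.jpeg and e.txt should be missing.
ls "$work/dst"

# Remove "$work" by hand when done.
```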

The man-page-contents for RSYNC can be found here:
http://ss64.com/bash/rsync.html

2 Responses to Linux: Copy only certain filetypes with RSYNC from foldertree

  1. sam says:

    One big gotcha when trying to sync a dir. If your dir name is XXX and you have
    /path/A/to/XXX

    and

    /path/B/to/XXX

    you do NOT do

    rsync -av --include='*/' --include='*.png' --exclude='*' /path/A/to/XXX /path/B/to/XXX

    Rather you omit XXX from the destination, so you DO DO

    rsync -av --include='*/' --include='*.png' --exclude='*' /path/A/to/XXX /path/B/to/

  2. Ravi says:

    Thanks for summarizing this, saved quite a bit of time.
