McWalter.org :: Selective recursive copying


Users familiar with DOS/Windows' XCOPY command often miss that program's ability to recursively copy a directory tree while only copying files of specific types. For example, to duplicate only .htm files one would use:

xcopy /s *.htm foo

Although UNIX's cp command can copy recursively, the way UNIX handles filename globbing (i.e. how wildcards like * and ? are expanded) means that a simple solution like XCOPY's does not work there.

In DOS, wildcards are handled by the commands themselves, and thus XCOPY chooses to perform its * handling in each directory it processes. UNIX, on the other hand, generally does wildcard globbing in the shell, so what a program like cp actually gets as command-line arguments already has the * expanded out to show only the matching files in the current directory (i.e. the root of our proposed copy).
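The effect is easy to see with a small function that does nothing but print the arguments it was handed. This is only an illustrative sketch: show_args and the scratch directory are made up for the demonstration.

```shell
# show_args is a hypothetical helper: it prints each argument it receives.
show_args() {
   printf '<%s>\n' "$@"
}

# Build a scratch tree with one .htm file at the top and one in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/top.htm" "$demo/sub/deep.htm"
cd "$demo" || exit 1

# The shell expands *.htm before show_args runs, and only against the
# current directory, so sub/deep.htm never reaches the command.
unquoted=$(show_args *.htm)
echo "$unquoted"

# Quoting suppresses the expansion, so the literal pattern is passed through.
quoted=$(show_args '*.htm')
echo "$quoted"
```

The first call sees only top.htm; the second sees the pattern itself, which is what a DOS command would have received.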

It's downright difficult to do what we want using conventional tools. Using find is the obvious solution, but copying each file needs the {} expansion twice (once for the source and once embedded in the destination path), which not every find implementation supports. You can have find call a shell script for each directory, or even for each file, and have this script do the copy. A better, if rather weird, solution uses some of the nice properties of late-model tar implementations (I've tested it with GNU tar, and I can't speak for whether it'll work with other implementations).
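For comparison, the per-file approach could look something like the sketch below. It uses an inline sh -c command instead of a separate script file, and it assumes the destination root is ../dest relative to the source tree. The scratch directories and file names are invented for the demonstration.

```shell
# Scratch source and destination trees, purely for demonstration.
work=$(mktemp -d)
mkdir -p "$work/src/a/b" "$work/dest"
touch "$work/src/index.htm" "$work/src/a/b/page.htm" "$work/src/a/notes.txt"
cd "$work/src" || exit 1

# For every matching file, start a small shell that recreates the file's
# directory under ../dest and then copies the file into it.
# Inside the inline script, "$1" is the path find substitutes for {}.
find . -name '*.htm' -exec sh -c '
   mkdir -p "../dest/$(dirname "$1")" &&
   cp "$1" "../dest/$1"
' sh {} \;
```

This works, but it starts a new shell for every file, which gets slow on a large tree.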

The following bash script function (which is inspired by section 18.16 of Power Tools) copies files that end with a dot-something extension recursively from one place to another:


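   function copy_files() {
      echo "copying $1 files"
      # -print0 and --null -T - keep filenames containing spaces intact
      # (both are supported by GNU find and GNU tar)
      find . -name "*.$1" -print0 |
         tar -cf - --null -T - |
         ( cd ../dest && tar -xBf - )
   }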

This copies all files of the desired type from the current directory (and its subdirectories) into the equivalent directory structure rooted at ../dest, which must already exist (change this to suit your own environment, or pass it in as a second parameter to the function).
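One way to take the destination as a second parameter is sketched below. The name copy_files_to is invented here to keep it apart from the original function. As a deliberate difference from a backtick-based file list, it feeds the names to tar with find -print0 and GNU tar's --null -T - options, so that filenames containing spaces survive.

```shell
# copy_files_to EXT DEST: hypothetical variant of copy_files that takes the
# destination root as its second argument. DEST must already exist.
copy_files_to() {
   echo "copying $1 files to $2"
   find . -name "*.$1" -print0 |
      tar -cf - --null -T - |
      ( cd "$2" && tar -xBf - )
}

# Demonstration on a scratch tree.
work=$(mktemp -d)
mkdir -p "$work/src/docs" "$work/backup"
touch "$work/src/docs/a.html" "$work/src/docs/b.jpg"
cd "$work/src" || exit 1
copy_files_to html "$work/backup"
```

Because the cd happens in a subshell, the caller's working directory is left alone, and the && stops tar from unpacking into the wrong place if the destination is missing.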

Then to copy files you need only call the function above (from the same bash script):


  copy_files html
  copy_files jpg
  copy_files png

This works by having find scan for the desired files and then pass them to tar (the first one). This tar just collects them into an archive and shovels it out through a pipe (that's what the f - part is for) into another tar running in a different directory (../dest in the above case), which just unpacks the files again, rebuilding the necessary parts of the directory tree as it goes.
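To see what travels through the pipe without copying anything, the receiving tar can be swapped for one that only lists the archive. The sketch below does that on an invented scratch tree; it is a way to preview the selection, not part of the copy itself.

```shell
# Scratch tree, purely for demonstration.
work=$(mktemp -d)
mkdir -p "$work/src/img"
touch "$work/src/index.html" "$work/src/img/logo.png"
cd "$work/src" || exit 1

# Same sending side as the copy, but the receiver lists the archive (t)
# instead of extracting it (x).
listing=$(tar -cf - `find . -name "*.html" -print` | tar -tf -)
echo "$listing"
```

Only the .html file shows up in the listing, with its path relative to the starting directory, which is why the receiving tar can recreate the same layout elsewhere.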

Change the pattern in find (from "*.$1") if you want to search for a different kind of filename.
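For instance, a single find can match several extensions at once by grouping -name tests with -o. The following is only a sketch of the find side, run on invented file names; its output would be fed to tar in the same way as before.

```shell
# Scratch tree, purely for demonstration.
work=$(mktemp -d)
mkdir -p "$work/src/pics"
touch "$work/src/pics/a.jpg" "$work/src/pics/b.jpeg" "$work/src/pics/c.png"
cd "$work/src" || exit 1

# The escaped parentheses group the two tests; -o means "or".
# sort is only there to make the order of the output predictable.
matches=$(find . \( -name '*.jpg' -o -name '*.jpeg' \) -print | sort)
echo "$matches"
```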