I came across this question on the Stack Exchange site Unix & Linux. The question interested me so I answered it but thought I’d cross post it on my blog as well, given I took a pretty significant amount of time to put together a test case and write-up of how the solution ultimately worked.
Problem
I’m using rsync to copy some files from one share to another.
Recursively, I need to:
- delete files at the destination that are deleted in the origin
- only sync php and js files
- exclude the rest of the file types
- not delete the .svn/ directory in the destination

If I use this:
rsync -zavC --delete --include='*.php' --include='*.js' --exclude="*" /media/datacod/Test/ /home/lucas/Desktop/rsync/
Then rsync is not recursive, because --exclude="*" excludes all files but also folders.

If I add --include="*/" then the .svn/ directory gets deleted (it also gets included).

How can I solve this mind blasting dilemma?
Solution
The solution I ultimately came up with made use of a little known feature, at least to me, called filters. Filters allow you to play games with the includes/excludes by hiding or protecting paths that match a pattern (rsync's own wildcard patterns, not regular expressions). Read on, I’ll discuss them further down.
    rsync -avzC --filter='-rs_*/.svn*' --include="*/" --include='*.js' --include='*.php' \
        --exclude="*" --delete dir1/ dir2/
test data
To help determine if my solution was going to work or not I created some sample data so that I could test it out. For starters I wrote a script that would generate the data. Here’s that script, setup_svn_sample.bash:

    #!/bin/bash

    # setup .svn dirs
    mkdir -p dir{1,2}/dir{1,2,3,4}/.svn

    # fake data under .svn
    mkdir -p dir1/dir{1,2,3,4}/.svn/origdir
    mkdir -p dir2/dir{1,2,3,4}/.svn/keepdir

    # files to not sync
    touch dir1/dir{1,2,3,4}/file{1,2}

    # files to sync
    touch dir1/dir{1,2,3,4}/file1.js
    touch dir1/dir{1,2,3,4}/file1.php
Running the above script produces the following directories (dir1 & dir2):
source dir
    $ tree -a dir1
    dir1
    |-- dir1
    |   |-- file1
    |   |-- file1.js
    |   |-- file1.php
    |   |-- file2
    |   `-- .svn
    |       `-- origdir
    |-- dir2
    |   |-- file1
    |   |-- file1.js
    |   |-- file1.php
    |   |-- file2
    |   `-- .svn
    |       `-- origdir
    |-- dir3
    |   |-- file1
    |   |-- file1.js
    |   |-- file1.php
    |   |-- file2
    |   `-- .svn
    |       `-- origdir
    `-- dir4
        |-- file1
        |-- file1.js
        |-- file1.php
        |-- file2
        `-- .svn
            `-- origdir
destination dir
    $ tree -a dir2
    dir2
    |-- dir1
    |   `-- .svn
    |       `-- keepdir
    |-- dir2
    |   `-- .svn
    |       `-- keepdir
    |-- dir3
    |   `-- .svn
    |       `-- keepdir
    `-- dir4
        `-- .svn
            `-- keepdir
Running the rsync command from above, including its --filter rule, we can see that it’s only syncing the files that match the --include patterns:
    rsync -avzC --filter='-rs_*/.svn*' --include="*/" --include='*.js' --include='*.php' \
        --exclude="*" --delete dir1/ dir2/
    sending incremental file list
    dir1/file1.js
    dir1/file1.php
    dir2/file1.js
    dir2/file1.php
    dir3/file1.js
    dir3/file1.php
    dir4/file1.js
    dir4/file1.php

    sent 480 bytes  received 168 bytes  1296.00 bytes/sec
    total size is 0  speedup is 0.00
Resulting dir2 afterwards:
    $ tree -a dir2
    dir2
    |-- dir1
    |   |-- file1.js
    |   |-- file1.php
    |   `-- .svn
    |       `-- keepdir
    |-- dir2
    |   |-- file1.js
    |   |-- file1.php
    |   `-- .svn
    |       `-- keepdir
    |-- dir3
    |   |-- file1.js
    |   |-- file1.php
    |   `-- .svn
    |       `-- keepdir
    `-- dir4
        |-- file1.js
        |-- file1.php
        `-- .svn
            `-- keepdir
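If you'd rather not eyeball the tree, a quick find can confirm that nothing unexpected landed outside the .svn directories. This is only a sketch: the function name unexpected_files, the scratch directory, and the file names in it (including entries) are made up for the demonstration and are not part of the original test case.

```shell
#!/bin/bash
set -eu

# List regular files outside .svn/ that are neither .js nor .php.
# Empty output means the tree holds only the intended file types.
unexpected_files() {
    find "$1" -name .svn -prune -o -type f ! -name '*.js' ! -name '*.php' -print
}

# Hypothetical scratch tree that mimics one synced directory
demo=$(mktemp -d)
mkdir -p "$demo/dir1/.svn/keepdir"
touch "$demo/dir1/file1.js" "$demo/dir1/file1.php" "$demo/dir1/.svn/entries"

# Nothing should be reported: .svn is pruned, the rest is .js/.php
clean_report=$(unexpected_files "$demo")

# Add a stray file to show what a failure looks like
touch "$demo/dir1/file2"
stray_report=$(unexpected_files "$demo")
printf 'unexpected: %s\n' "$stray_report"
```

Against the real data you would call it as unexpected_files dir2 after the sync.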
Why does it work?
The key piece of this command is the filter capability of rsync. Filter rules let you include or exclude files from the matched set, and rsync acts on the first rule that matches a given path, which is why the --filter has to come before --include="*/". In our case we’re excluding anything that matches the pattern */.svn*. In -rs_, the leading - marks the rule as an exclude, the r and s modifiers apply it on both the receiving (target) side and the sending (source) side, and the _ simply separates the rule from its pattern.
excerpt from the FILTER NOTES section of rsync’s man page
- An s is used to indicate that the rule applies to the sending side. When a rule affects the sending side, it prevents files from being transferred. The default is for a rule to affect both sides unless --delete-excluded was specified, in which case default rules become sender-side only. See also the hide (H) and show (S) rules, which are an alternate way to specify sending-side includes/excludes.
- An r is used to indicate that the rule applies to the receiving side. When a rule affects the receiving side, it prevents files from being deleted. See the s modifier for more info. See also the protect (P) and risk (R) rules, which are an alternate way to specify receiver-side includes/excludes.
See man rsync for more details.
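Since the man page describes hide (H) and protect (P) as the sender-only and receiver-only forms of an exclude, the single -rs_ rule can presumably be spelled as two separate rules. Here's a minimal sketch of that variant on a throwaway tree. The scratch paths and file names (app, index.php, notes.txt, stale.js) are invented for the demo, and I've left out -z and -C to keep the rule list down to just the explicit filters.

```shell
#!/bin/bash
set -eu

# Skip quietly on machines without rsync
command -v rsync >/dev/null || { echo "rsync not found, skipping demo"; exit 0; }

# Hypothetical scratch area, not the dir1/dir2 test data from this post
work=$(mktemp -d)
src="$work/src"
dst="$work/dst"

mkdir -p "$src/app/.svn/origdir" "$dst/app/.svn/keepdir"
touch "$src/app/index.php" "$src/app/notes.txt"   # notes.txt should not sync
touch "$dst/app/stale.js"                         # gone from source, should be deleted

# H hides .svn on the sending side, P protects it on the receiving side;
# together they play the role of the combined '-rs_*/.svn*' rule.
rsync -a --delete \
    --filter='H */.svn*' \
    --filter='P */.svn*' \
    --include='*/' --include='*.js' --include='*.php' \
    --exclude='*' \
    "$src/" "$dst/"

find "$dst" | sort
```

Splitting the rule this way is mostly a readability choice; it also makes it easy to drop one half, for example keeping only the P rule if you want the source's .svn content transferred but the destination's never deleted.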
Tips for figuring this out (hint: use --dry-run)
While describing how to do this I thought I’d mention the --dry-run switch to rsync. It’s extremely useful for seeing what will happen without having the rsync actually take place.
For Example
Using the following command will do a test run and show us the decision logic behind rsync (the doubled v in -avvzC is what makes it report the filter decisions):
    rsync --dry-run -avvzC --filter='-rs_*/.svn*' --include="*/" \
        --include='*.js' --include='*.php' --exclude="*" --delete dir1/ dir2/
    sending incremental file list
    [sender] showing directory dir3 because of pattern */
    [sender] showing directory dir2 because of pattern */
    [sender] showing directory dir4 because of pattern */
    [sender] showing directory dir1 because of pattern */
    [sender] hiding file dir1/file1 because of pattern *
    [sender] showing file dir1/file1.js because of pattern *.js
    [sender] hiding file dir1/file2 because of pattern *
    [sender] showing file dir1/file1.php because of pattern *.php
    [sender] hiding directory dir1/.svn because of pattern */.svn*
    [sender] hiding file dir2/file1 because of pattern *
    [sender] showing file dir2/file1.js because of pattern *.js
    [sender] hiding file dir2/file2 because of pattern *
    [sender] showing file dir2/file1.php because of pattern *.php
    [sender] hiding directory dir2/.svn because of pattern */.svn*
    [sender] hiding file dir3/file1 because of pattern *
    [sender] showing file dir3/file1.js because of pattern *.js
    [sender] hiding file dir3/file2 because of pattern *
    [sender] showing file dir3/file1.php because of pattern *.php
    [sender] hiding directory dir3/.svn because of pattern */.svn*
    [sender] hiding file dir4/file1 because of pattern *
    [sender] showing file dir4/file1.js because of pattern *.js
    [sender] hiding file dir4/file2 because of pattern *
    [sender] showing file dir4/file1.php because of pattern *.php
    [sender] hiding directory dir4/.svn because of pattern */.svn*
    delta-transmission disabled for local transfer or --whole-file
    [generator] risking directory dir3 because of pattern */
    [generator] risking directory dir2 because of pattern */
    [generator] risking directory dir4 because of pattern */
    [generator] risking directory dir1 because of pattern */
    [generator] protecting directory dir1/.svn because of pattern */.svn*
    dir1/file1.js
    dir1/file1.php
    [generator] protecting directory dir2/.svn because of pattern */.svn*
    dir2/file1.js
    dir2/file1.php
    [generator] protecting directory dir3/.svn because of pattern */.svn*
    dir3/file1.js
    dir3/file1.php
    [generator] protecting directory dir4/.svn because of pattern */.svn*
    dir4/file1.js
    dir4/file1.php
    total: matches=0  hash_hits=0  false_alarms=0 data=0

    sent 231 bytes  received 55 bytes  572.00 bytes/sec
    total size is 0  speedup is 0.00 (DRY RUN)
In the above output you can see that the .svn directories are being hidden on the sender and protected on the receiver by our filter rule. That's valuable insight when debugging an rsync command.
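On a bigger tree the -vv output gets long, so it can help to keep only the lines where a rule actually hid or protected something. Below is a small sketch of that idea. The function name filter_decisions is my own invention, and the sample lines are copied from the dry run above so the snippet can be tried without touching any real data.

```shell
#!/bin/bash
# Hypothetical helper: keep only the "hiding" and "protecting" decisions
# from rsync -vv output read on stdin.
filter_decisions() {
    grep -E '^\[(sender|generator)\] (hiding|protecting) '
}

# A few lines copied from the dry-run output in this post
sample='[sender] showing directory dir1 because of pattern */
[sender] hiding file dir1/file1 because of pattern *
[sender] hiding directory dir1/.svn because of pattern */.svn*
[generator] risking directory dir1 because of pattern */
[generator] protecting directory dir1/.svn because of pattern */.svn*'

decisions=$(printf '%s\n' "$sample" | filter_decisions)
printf '%s\n' "$decisions"
```

With real data you'd pipe the dry run straight into it, along the lines of rsync --dry-run -avv ... dir1/ dir2/ | filter_decisions, keeping the rest of the options the same as in the command above.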
References
- Delete extraneous files from dest dir via rsync?
- Above scripts in a tarball