Linux: Get a tree-view of your directories

To my surprise there is one thing Windows has out of the box that our Linux lacks: a tree view of a given directory structure. On Windows it’s as simple as typing “tree”. (Many distributions offer a “tree” package as well, but it is often not installed by default.) But as always on Linux, one can grab the necessary building blocks and combine them to get what is needed.

On http://www.centerkey.com/tree/ I found this nice one-liner based on “ls”, “grep” and “sed”:

ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/' 

Example:

[apache]$ ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
   .
   |-descriptor
   |-doc
   |-io
   |---export
   |---import
   |-languages
   |-skins
   |-specials
   |---SMWCheckInstallation
   |-tools
   |---maintenance
   |-----export
   |-----resources
   |-------dd_templates
   |---onto2mwxml
   |---smwadmin
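
To reuse the one-liner for an arbitrary directory, it can be wrapped in a small shell function. This is only a sketch: the name “dirtree” and the optional directory argument are my own additions, not part of the original one-liner, and it assumes GNU-style “ls -R” output and no file or directory names that end in a colon or contain newlines.

```shell
#!/usr/bin/env bash
# dirtree [DIR]: print the sub-directories of DIR (default: current directory)
# as an indented tree. "dirtree" is a hypothetical helper name.
dirtree() {
    local root="${1:-.}"
    # Run in a subshell so the caller's working directory is not changed.
    (
        cd "$root" || exit 1
        ls -R | grep ':$' | sed -e 's/:$//' -e 's/[^-][^\/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
    )
}
```

Called as “dirtree /some/path” it prints the tree for that directory; without an argument it behaves like the one-liner above.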

There is another approach here, which I haven’t checked.


Bad filenames after “ZIPing” files from Linux to Windows

I need to copy the whole images directory of our MediaWiki from a Linux box to a Windows machine. But no matter whether I tar or zip the files, the filenames with non-ASCII characters (ü, ö, ä, …) are messed up after unzipping on Windows: all those characters are shown as squares or other obscure characters. Not only does this look ugly, MediaWiki also won’t find these files anymore, because their names no longer match the entries in the wiki’s database.

Again it smells like an encoding problem. How nice it would be if the whole IT world just used Unicode.

Our Linux is a RedHat 5 where the command “locale” shows this:

LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=

I didn’t know what charset this “C” is meant to be. Edit 2012-10-18: “C” is the default POSIX locale (the name goes back to ANSI C), and its charset is essentially plain ASCII. I thought all Linuxes use UTF-8 by default, but at least for ours this is not the case. We found two solutions:

1.) Use WinSCP to copy all files over from Linux to Windows. This way the filenames get converted to Windows’ own charset Windows-1252.

2.) Explicitly change the charset of the Linux console to UTF-8 before zipping the files. Under Bash I do this:

export LANG=de_DE.UTF-8

Afterwards I zip the files using 7zip (in 7z-Format!). When I unzip them under Windows with 7zip too, all is fine. Using the normal ZIP-command to compress under Linux still messed up my filenames.
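
Before transferring anything, it can help to check which charset the current locale uses and which filenames are affected at all. The following is a minimal sketch; the function names are made up for illustration, and it assumes the “locale” and “find” commands of a typical Linux system.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; not part of any standard tool.

# Succeeds if the charset of the current locale is UTF-8.
is_utf8_locale() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

# List all files and folders below DIR whose names contain bytes outside
# printable ASCII (space to tilde). LC_ALL=C makes find compare raw bytes.
list_non_ascii_names() {
    local root="${1:-.}"
    LC_ALL=C find "$root" -name '*[! -~]*'
}
```

For example, “is_utf8_locale || echo "not UTF-8"” shows whether the “export LANG=…” from above has taken effect, and “list_non_ascii_names images/” lists the candidates for broken names.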

Linux: Copy only certain filetypes with RSYNC from foldertree

As I struggled again with the fiddly include/exclude-syntax of RSYNC, I think it’s worth a note here.

I want to copy only JPG- and PNG-files with their corresponding folder-structure from a foldertree – leaving all the rest where it is.

To explicitly include or exclude folders and files in an rsync command I can define patterns that are checked one after another against each object in the given tree until one matches.
The catch is that the given patterns are not exclusive. That means, if I define “--include="*.JPG" --include="*.PNG"” WITHOUT a terminating “--exclude="*"”, my includes would have virtually no effect. It would just mean: “Explicitly include all JPG and PNG files and implicitly include all the rest too!”.
But here’s another catch: with only these include/exclude patterns rsync would not descend into my folder tree. It would stay on the first level, because the “--exclude="*"” is checked against each subfolder too and skips it. To let rsync run through all folders I have to put “--include="*/"” before the “--exclude="*"”. That means: “Explicitly include all folders!”, so that the “--exclude="*"” is never reached for a folder.

Remember: The include/exclude patterns are checked in the given order from first to last. As soon as a pattern matches, processing for the current object stops and rsync moves on to the next object. If no pattern matches at all, the object IS NOT SKIPPED but copied over to DEST.

So here is my case:

  • I want to copy only all JPG- and PNG-files with parent-folders from SOURCE to DEST
  • Don’t want to copy the content of the folders THUMB, ARCHIVE, DELETED and TEMP

The resulting rsync-command is this:

rsync -r --exclude="thumb/" --exclude="archive/" --exclude="deleted/" --exclude="temp/" --include="*/" --include="*.JPG" --include="*.PNG" --exclude="*" /usr/people/lampp_htdocs/vrwiki/images/ /usr/people/lampp_htdocs/test/vrclone/images/

Let’s analyse this command in detail:

All patterns ending with slash (/) are taken as folders. So I first exclude all folders I don’t want:

--exclude="thumb/" --exclude="archive/" --exclude="deleted/" --exclude="temp/"

Then I include all remaining folders so that rsync goes through my whole tree (the “-r” option needs to be set also):

--include="*/"

Then I explicitly include my desired image-files:

--include="*.JPG" --include="*.PNG"

And finally exclude all the rest that didn’t match so far:

--exclude="*"

The pattern matching is case-sensitive. To match regardless of case and e.g. sync all JPG, jpg, PNG and png files, one can use character classes in the include patterns (these are shell-style wildcards, not regular expressions):

--include="*.[Jj][Pp][Gg]" --include="*.[Pp][Nn][Gg]"
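
Putting the pieces together, the whole filter chain could be wrapped in a small function and tried on a scratch tree first. This is only a sketch under assumptions: the function name “sync_images” and the pass-through of extra options are made up for illustration, and it uses the case-insensitive patterns instead of the plain “*.JPG”/“*.PNG” ones.

```shell
#!/usr/bin/env bash
# sync_images SRC DEST [extra rsync options]
# Copies only JPG/PNG files (any case) together with their folders.
# "sync_images" is a hypothetical helper name.
sync_images() {
    local src="${1%/}/"   # trailing slash: copy the contents of SRC, not SRC itself
    local dest="$2"
    shift 2
    # Order matters: the first matching rule wins for each file or folder.
    rsync -r "$@" \
        --exclude="thumb/" --exclude="archive/" --exclude="deleted/" --exclude="temp/" \
        --include="*/" \
        --include="*.[Jj][Pp][Gg]" --include="*.[Pp][Nn][Gg]" \
        --exclude="*" \
        "$src" "$dest"
}
```

Calling it as “sync_images SOURCE DEST -n -v” first performs a dry run that only lists what would be transferred. Note that “--include="*/"” also creates folders that end up empty; rsync’s “-m” (“--prune-empty-dirs”) option can be passed the same way to avoid that.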

The man-page-contents for RSYNC can be found here:
http://ss64.com/bash/rsync.html

Linux: get the length of a string

To get the length of a string as a number of characters on a Linux shell, you have various possibilities:

bash> VAR="linux-operating-system"

bash> expr length "$VAR"
22

bash> echo ${#VAR}
22

bash> echo "$VAR" | awk '{print length}'
22

bash>

To store the length in a variable you could do this:

bash> LEN=$(expr length "$VAR")

bash> echo $LEN
22
bash>
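
Once the string contains blanks, quoting starts to matter, and for non-ASCII text the number of bytes can differ from the number of characters. A small sketch of both points (the variable names are arbitrary, and “wc -c” is used here as one possible way to count bytes):

```shell
#!/usr/bin/env bash
VAR="linux operating system"   # same text as above, but with blanks

# Parameter expansion: no external process and no quoting problems.
LEN=${#VAR}

# The awk variant needs the quotes; unquoted, echo would collapse repeated blanks.
AWK_LEN=$(echo "$VAR" | awk '{print length}')

# Number of bytes instead of characters; $(( )) strips the padding
# that some wc implementations put in front of the number.
BYTES=$(( $(printf '%s' "$VAR" | wc -c) ))

echo "$LEN $AWK_LEN $BYTES"    # prints: 22 22 22 (pure ASCII, so bytes = characters)
```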

bash-scripting: preserve whitespaces in variables

When you want to store a string like

"abc        def    gh ijk"

in a variable on a Linux shell, you are normally faced with this:

bash> VAR1="abc        def    gh ijk"
bash> echo $VAR1
abc def gh ijk
bash>

Your whitespace is collapsed. This is a problem if you need the exact string, e.g. for a string comparison.
The cause of this behaviour is the internal shell variable $IFS (Internal Field Separator), which defaults to space, tab and newline.
Thus the unquoted variable $VAR1, when passed to “echo”, is not seen as one single string but is split into several words at every run of whitespace; “echo” then prints these words joined by single blanks:

bash> VAR1="abc        def    gh ijk"
bash> for a in $VAR1
> do
> echo $a
> done
abc
def
gh
ijk
bash>

To preserve all contiguous whitespace you can set IFS to a character that does not occur in the string:

bash> IFS='%'
bash> echo $VAR1
abc        def    gh ijk
bash>
bash> for a in $VAR1; do echo $a; done
abc        def    gh ijk
bash>

Afterwards you can switch back to default with “unset IFS”:

bash> unset IFS
bash> echo $VAR1
abc def gh ijk
bash>
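
A widely used alternative that leaves IFS untouched is to put the variable in double quotes, which suppresses the word splitting for that one expansion. A minimal sketch:

```shell
#!/usr/bin/env bash
VAR1="abc        def    gh ijk"

# Quoted expansion: the value is passed to echo as one word, blanks intact.
echo "$VAR1"

# Exact string comparison works the same way.
if [ "$VAR1" = "abc        def    gh ijk" ]; then
    echo "strings are identical"
fi

# If IFS has to be changed anyway, a subshell keeps the change local,
# so no "unset IFS" is needed afterwards.
( IFS='%'; echo $VAR1 )
```

Quoting is usually the less intrusive choice, because a changed IFS also affects every other unquoted expansion and “read” in the same shell.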

Edit 2014-05-21:
Please read Martin’s advice and my reply to it below for another approach.