Logiciel Libre

September 25, 2007

recursively grep only certain files

Filed under: Default — Tags: — adam @ 5:44 pm

Here’s a grep hint. Say you want to recursively search a directory tree for C source code containing a particular header inclusion. One might do:

find . -type f -name '*.c' -print0 \
  | xargs -0 grep -P -H '^#include\s+<rpc'

But with grep it is possible to limit the files searched, like so:

grep --include='*.c' -P -r '^#include\s+<rpc' .

Arguments to find, explained:

  • find . means to search the current dir and subdirs
  • -type f limits search to files, not directories or other file types
  • -name '*.c' limits search to files ending in .c. Notice the non-regex syntax here!
  • -print0 sends results to standard output delimited by null characters. This adds robustness when we pipe to xargs, since filenames cannot contain null characters.
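
To make the -print0 point concrete, here is a minimal sketch that can be run in a scratch directory. The file name "my file.c" and the variable names are made up for illustration, and the search pattern is simplified to a plain string. It assumes a find and xargs that support -print0 and -0.

```shell
#!/bin/bash
# Scratch directory holding one file whose name contains a space
# (hypothetical name, chosen only to trigger the problem).
demo=$(mktemp -d)
printf '#include <rpc/rpc.h>\n' > "$demo/my file.c"

# Whitespace-delimited: xargs splits "./my file.c" into "./my" and "file.c",
# so grep is handed two paths that do not exist and lists nothing.
plain=$(cd "$demo" && find . -type f -name '*.c' | xargs grep -l 'rpc' 2>/dev/null)

# Null-delimited: the file name reaches grep in one piece.
safe=$(cd "$demo" && find . -type f -name '*.c' -print0 | xargs -0 grep -l 'rpc')

echo "without -print0: '$plain'"
echo "with -print0:    '$safe'"
rm -rf "$demo"
```

The first pipeline should come back empty while the second lists the file, which is the robustness the null delimiter buys.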

And yes, xargs is neat. The only option used is -0, which makes it accept null-delimited input. Now on to grep:

  • -P specifies that Perl-compatible regular expressions should be used. This didn’t work on Ubuntu last time I tried.
  • -H means to always print filenames, even if only one file is being grepped
  • -r tells grep to descend into all subdirs
  • in the second grep command line, --include='*.c' says to only look inside files whose names end in .c. Notice the alternate pattern syntax here. Ugh!
  • in the second grep command line, the last (required!) argument names the directory or directories in which to recurse.
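
A quick way to check that the two approaches agree is to wrap each in a function and compare the output. This is only a sketch: the function names search_with_find and search_with_grep are invented here, and the pattern uses -E with [[:space:]] instead of -P with \s, on the assumption that the local grep may have been built without Perl regex support. It also assumes a grep that understands --include, as GNU grep does.

```shell
#!/bin/bash
# Hypothetical helper names, not part of any tool.

# Variant 1: find picks out the *.c files and xargs hands them to grep.
search_with_find() {
    find "$1" -type f -name '*.c' -print0 \
        | xargs -0 grep -E -H '^#include[[:space:]]+<rpc'
}

# Variant 2: grep recurses and filters on file name by itself.
search_with_grep() {
    grep -r --include='*.c' -E '^#include[[:space:]]+<rpc' "$1"
}
```

Running `search_with_find . | sort` and `search_with_grep . | sort` in the same tree should print the same lines. The sort is there only because the two commands may visit files in a different order.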

3 Comments »

  1. what about:

    egrep '^#include +<rpc' */*.c
    egrep '^#include +<rpc' */*/*.c
    egrep '^#include +<rpc' */*/*/*.c
    egrep '^#include +<rpc' */*/*/*/*.c
    

    If you’re in a hurry the command line substitution is pretty quick to add another splat slash to each one until you run out of directory depth.

    egrep has better support for regexes than plain ol’ grep does.

    find . -type f -name '*.c' -print0 | xargs -0 egrep '^#include +<rpc'
    

    I don’t think you need the -P if you use egrep or the -H at all. I’m not sure what you’re getting out of the -r as find is passing xargs fully qualified paths to files that you’re searching. You can’t descend to a subdirectory of a file.

    Maybe I’m missing something.
    etc…

    Comment by jason — September 25, 2007 @ 10:52 pm

  2. The first set of egreps you provide are fine if you know the exact directory depth of the source files.

    egrep regex syntax is fine, I’m just more used to Perl-style regexes. The -H is necessary if find only passes one filename to xargs, but you’re right, it would be fine to omit that.

    I agree the grep args -r paired with --include are conceptually the same as using find and piping to xargs and grep, but you’re firing up three processes when only one is necessary. Also, the command line is a little shorter. :)

    Comment by adam — September 26, 2007 @ 8:44 am

  3. While I’m at it, here’s one way to recurse down selected directory paths, searching within files for an arbitrary pattern:

    #!/bin/bash
    find . -path '*.svn*' -prune -o -type f -print0 \
        | xargs -0 grep --color "$1"
    

    This is of course a bash script wherein the first argument is the pattern to search for within the files. I’ve hardcoded it to skip any files including '.svn' in their full path. I guess I’ve also hardcoded where to start the search (the current directory).

    Comment by adam — September 30, 2007 @ 3:45 pm
