
Filter BASH find results using -prune and grep Linux + WSL

The search indexing service that runs in the background on Windows is nice, but it doesn’t really cover everything you probably want if you’re a web developer, especially if you’re running WSL on Windows and the files you want to index live in the virtual Linux filesystem (e.g. \\wsl.localhost\Debian\var\www\html ). It is possible to use third-party search indexing software, such as Everything by voidtools, to index locations like the WSL filesystem mentioned above.

What if you’re not working on a Windows operating system, and you don’t have the advantage of a search indexer like Everything? Sure, there are similar apps for Linux. When you find one that functions as seamlessly and “correctly” as Everything by voidtools does on Windows, please drop me a line! Meanwhile, I recommend you learn to use the find command from the BASH shell to locate what you need.

Find is fast and produces precisely the results you want, if you know how to use it correctly. In my experience, the best way to learn it is just that: through experience. You have to keep trying your own expressions to learn the peculiarities, or you’ll forever be hunting for someone else’s prefab find script, and someone else’s script may not match what you’re specifically searching for.

Why Prune?

Think of the old lady across the street who you’ll see in the Spring time, out there pruning her flowers. What’s she doing in that pruning!? Crazy old bat! Naw, she’s just taking off what she doesn’t want there! She’s making her results more beautiful by cutting away the parts that aren’t needed. That’s pruning!

In the case of the find command, it’s the same thing. You have to tell it what you want to prune from the results. It’s not super easy to figure out how to do it. You might think, “prune xyz from my path”, but that’s actually not it. It’s more like, “take this [path] and -prune it”. In other words, think like the old lady: decide and identify what you want to have pruned, then say: “okay, it’s [this path] that I don’t like! Now prune it!” That is, name the path first, then say what you want done with it.
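Here’s a minimal sketch of that “path first, then -prune” order, run against a throwaway tree so nothing real gets touched. The folder names keep and skipme are made up for illustration, and I’m assuming GNU find and mktemp are available.

```shell
# Build a scratch tree (hypothetical names) in a temp directory.
demo=$(mktemp -d)
mkdir -p "$demo/keep" "$demo/skipme"
touch "$demo/keep/a.txt" "$demo/skipme/b.txt"

# Path first, then the action: "./skipme ... -prune", -o (OR) print every other file.
pruned=$(cd "$demo" && find . -path ./skipme -prune -o -type f -print)
echo "$pruned"      # prints ./keep/a.txt

# With no explicit action, find applies -print to the whole expression,
# so the pruned directory itself also shows up in the output.
implicit=$(cd "$demo" && find . -path ./skipme -prune -o -type f | sort)
echo "$implicit"
```

The second command is the usual surprise: the contents of skipme are gone, but ./skipme itself is listed. Ending the expression with an explicit action such as -print, -printf or -exec avoids that.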

Try the following in your web dev primary folder. It doesn’t need to be the same path as mine. You’ll figure it out. For now, let’s imagine your web dev containers are on the path /var/www/html . That being agreed upon and understood, let’s go for it!

Pruning Test – Find that Kind Bud, yo!

# Find a file named "notes.css", but first PRUNE (exclude, like the old lady with the flowers) the WordPress native directories, which are -type d (directory). Then -o (OR) -printf (print formatted results, including the %T modification time), and | (pipe) those results to the sort command.

find . -name 'wp-*' -type d -prune -o -type f -iname notes.css -printf "%T@ %Tc %p\n" | sort -n

The find operation above is pretty simple. We’re telling find to prune folders named “wp-*”, where the asterisk is a simple wildcard, so all folders named wp-something are pruned. Then -printf (print formatted) is used to get the timestamp info we need if we want to sort by date modified (which is what I wanted there): %T@ prints the modification time as seconds since the epoch, which sort -n can order numerically, and %Tc prints the same time in a readable form. Finally, those results are piped (using the | pipe character) to the sort command. Try it!
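If you only want the paths in date order, without the timestamp columns, one option is to sort on %T@ and then cut the sort key away. This is a sketch on a scratch tree with made-up file names; it assumes GNU find (for -printf) and GNU touch (for -d with a date string).

```shell
# Scratch tree with two CSS files of known ages, plus one inside a wp-* folder.
demo=$(mktemp -d)
mkdir -p "$demo/site/css" "$demo/wp-content/css"
touch -d '2020-01-01 00:00:00' "$demo/site/css/old.css"
touch -d '2023-01-01 00:00:00' "$demo/site/css/new.css"
touch "$demo/wp-content/css/ignored.css"

# %T@ = modification time in epoch seconds; sort -n orders oldest to newest;
# cut drops the first space-separated field (the timestamp) from the display.
by_age=$(cd "$demo" && find . -name 'wp-*' -type d -prune \
            -o -type f -name '*.css' -printf '%T@ %p\n' | sort -n | cut -d' ' -f2-)
echo "$by_age"
```

Swap sort -n for sort -rn to get the most recently modified file first.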

The following operation is a bit more complex in that it’s using multiple prune expressions. Then it uses grep (well, egrep, because I prefer the extended regular expressions option, which the | alternation in the pattern relies on) to dig into the results and see if there’s anything matching my grep test, which is simple: find .html or .js in the contents of the files found. Try this one if you’re developing with Node.js! Play around with changing what you want to find in the grep expression. Try changing some of the prune paths. See what you come up with.

# LOOK IN FOLDERS at PATH .
            # PRUNE:
                # node_modules and is -type d(irectory)
                # public
                # assets
            # GREP files containing \.html|\.js
                 
find . \( -path "./node_modules*" -type d -prune -o -path "./public" -prune -o -path "./assets" -prune \) -o -type f -exec egrep -nH --color "\.html|\.js" {} +
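As a side note, here’s a hedged variation on the same idea, run on a scratch tree with made-up file contents. It groups the folder names with -name instead of -path, and it calls grep -E rather than egrep, since newer GNU grep versions warn that egrep is obsolescent. The difference to keep in mind: -path "./public" only matches the top-level folder, while -name public matches a folder called public at any depth.

```shell
# Scratch tree: one file we want searched, three folders we want skipped.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/node_modules/pkg" "$demo/public" "$demo/assets"
echo 'import "./app.js"' > "$demo/src/main.txt"
echo 'index.html'        > "$demo/node_modules/pkg/readme.txt"
echo 'index.html'        > "$demo/public/page.txt"
echo 'no match here'     > "$demo/assets/plain.txt"

# Prune any directory with one of these names, OR grep every remaining file.
hits=$(cd "$demo" && find . -type d \( -name node_modules -o -name public -o -name assets \) -prune \
          -o -type f -exec grep -EnH '\.html|\.js' {} +)
echo "$hits"      # prints ./src/main.txt:1:import "./app.js"
```

Only the file under src is reported, because the other three folders were pruned before grep ever saw their contents.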

Let’s modify the find expression above once more, to exclude more stuff and enhance our results using grep --color! We’ll start by pruning more stuff.

# -PATH probably contains something about a log. PRUNE IT!
# prune a bunch of stuff, including:
# prune *vendor* to exclude Composer installations
# prune *wso.net* under which I have backup sites from that host
# prune *node_modules* to exclude npm installations
# prune *wp-* to exclude WordPress stuff
# prune "*.sql" to exclude database backup files

find . -path "./*log" -prune 
	-o 
	-path "./*vendor/*" -prune 
	-o 
	-path "./*wso.net*" -prune 
	-o 
	-path "./*node_modules*" -prune 
	-o 
	-path "./*wp-*" -prune 
	-o 
	-path "*.sql" -type f -prune 
#
# FINISHED PRUNING
#
# now let's -exec (execute) grep using the --color option to highlight our found results!

-o -type f -exec egrep -nH --color "\.html|\.js" {} +

#
# DONE: FINISHED EXPLAINING - NOW THE REAL DEAL... 
#

# DO HERE: What follows are the above commands, as a single line
# (without the line breaks, which would break the command
# if you executed the separate lines as written)
#
# Try it as follows, in one usable line to copy and paste into your BASH shell:

find . -path "./*log" -prune -o -path "./*vendor/*" -prune -o -path "./*wso.net*" -prune -o -path "./*node_modules*" -prune -o -path "./*wp-*" -prune -o -path "*.sql" -type f -prune -o -type f -exec egrep -nH --color "\.html|\.js" {} +
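If the one-liner is hard to read or edit, a backslash at the end of each line lets the shell treat several lines as one command. Here’s a sketch of the same prune list written that way, run on a scratch tree with made-up file names. Note that comments can’t sit between the continued lines, which is why the explanation above had to be split apart.

```shell
# Scratch tree: every file contains "index.html", but only app/ should be searched.
demo=$(mktemp -d)
mkdir -p "$demo/app" "$demo/vendor/lib" "$demo/wp-admin" "$demo/node_modules/x"
for f in app/view.txt vendor/lib/view.txt wp-admin/view.txt \
         node_modules/x/view.txt dump.sql error.log; do
    echo 'see index.html' > "$demo/$f"
done

# Same prune list as the one-liner, one test per line; the trailing backslashes
# join the lines into a single find command.
matches=$(cd "$demo" && find . \
    -path "./*log" -prune \
    -o -path "./*vendor/*" -prune \
    -o -path "./*wso.net*" -prune \
    -o -path "./*node_modules*" -prune \
    -o -path "./*wp-*" -prune \
    -o -path "*.sql" -type f -prune \
    -o -type f -exec grep -EnH '\.html|\.js' {} +)
echo "$matches"      # prints ./app/view.txt:1:see index.html
```

The log file, the SQL dump and everything under vendor, wp-admin and node_modules are skipped, so only the file under app is reported.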

Now it’s time for you to go off and try some things on your own. Work with it until you feel like you have a solid foundation for how to use -prune and -exec in the find expression, along with how to | pipe the results to another command, the way sort is used above. You got this! Please do come back and leave a comment here if you found it useful, or likewise if you found it useless!

Small List of Examples to Try

Some of the examples produce the same results. Can you tell why?

What is the purpose of using $(realpath *)?

# SORT BY MOST RECENTLY MODIFIED
find . -name 'wp-*' -type d -prune -o -type f -iname notes.css -printf "%T@ %Tc %p\n" | sort -n

# FIND BUT -prune (exclude) paths beneath 'node_modules' (packages installed by NPM)
find . -name 'node_modules' -type d -prune -o -type f -exec grep ".html" {} +

#FIND ONLY \.html in folder at path .
#    prune (exclude) node_modules and adminer.php
find . -name 'node_modules' -type d -prune -o -type f -name 'adminer.php' -prune -o -type f -exec grep -nH --color "\.html" {} +

 

# FIND IN FOLDERS at PATH .
#            PRUNE:
#                node_modules
#                public
#                assets
#            GREP files containing \.html|\.js
                 
find . \( -path "./node_modules" -prune -o -path "./public" -prune -o -path "./assets" -prune \)           -o -type f -exec egrep -nH --color "\.html|\.js" $(realpath *) {} +



# FIND - Using -exec
# Use grep to show instances of 'html' in -type f (files) 
# tell grep to display the line number (-n), 
# display 2 lines of --context around the located 'html'
find . -type f -exec grep -n --context=2 --color html $(realpath *) {} +

# xARGS + FIND
# Use xARGS to pass FIND results to an external app, as arguments
#
# EXAMPLE:
# CD to the nginx sites-available directory
# Find all items of -type f (file)
#
# pass the arguments (the -type f items found) to grep
# display the lines containing 'server'
# $(realpath *) adds the FULL PATH of each item in the current directory to grep's arguments
cd /etc/nginx/sites-available
find . -type f | xargs grep -n --color server $(realpath *)
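If full paths in the output are what you’re after, another way to get them is to hand find an absolute starting point, because find prints every result relative to where you told it to start. The sketch below uses a scratch folder standing in for sites-available (the file names and contents are made up) and pairs -print0 with xargs -0 so that file names containing spaces survive the trip through the pipe.

```shell
# Scratch folder with two config-like files; one name contains a space.
demo=$(mktemp -d)
mkdir -p "$demo/sites"
printf 'server {\n  listen 80;\n}\n' > "$demo/sites/my site.conf"
printf 'upstream app {}\n'           > "$demo/sites/other.conf"

# Starting find at "$PWD" (an absolute path) makes every reported path absolute.
# -print0 / -0 separate file names with NUL bytes instead of whitespace.
found=$(cd "$demo/sites" && find "$PWD" -type f -print0 | xargs -0 grep -nH server)
echo "$found"
```

Compare that with the $(realpath *) version above and see which lines each one prints, and how many times.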

Cheers, y’all!

†As an advocate of Everything for Windows, I highly recommend FSearch for Linux. It’s not quite as robust as Everything (e.g. no option to preview images at time of writing), but it has saved me from losing my mind a handful of times since I discovered it. I’m pretty sure I installed it using apt on Debian. Don’t let it dissuade you from learning how to use find correctly!

