Finding text in Windows files – and finding not text

I wanted to sanitize some computers (a.k.a. remove all personal information) before sending them to their next home. I did not want to do a fresh install of an OS. Some of the programs on the computer are very useful and already configured, so I wanted to leave them on the computer, just stripped of any personal information or authorizations.

For example, I use FileZilla. I deleted all my accounts so the new owner could not access them, but only with a deeper scan of the computer did I find that FileZilla kept a “recently used” server list in a plaintext file that included my accounts and passwords. Ouch!

After uninstalling everything I wanted to uninstall and deleting known configuration files (searching the whole computer for files matching the program name is a good start), I wanted to scan the computer for any file whose contents (not its name) include my name, my emails, pieces of my email, my passwords, or any other “HIT” texts. I also needed “not” find functions; for example, find “pepper” but don’t find “peppers”.

This kind of search is possible with Linux tools such as

 $ egrep -inl -d recurse -e '\b(pepper|secondword)\b' ./*
 egrep is grep with extended regex syntax (the same as grep -E)
 -i ignore case
 -n print line number prefix (has no visible effect once -l is used)
 -l (lower case L) to just give the filenames
 -d specify skip or recurse for directories
 -v option could be used to invert the match logic if you want something fancier
 -e specifies the search pattern that follows
 ./* we search all files (the shell glob skips hidden dot-files at the top level)

The regex \b notation forces whole-word matches, much as the egrep -w option would if you used it instead. The regex () notation lets you specify alternative hits with the | separator. If you don’t use separators you don’t need the (). Keep the single quotes around the search pattern so the shell does not interpret the | and () characters itself; leave ./* unquoted, because the shell has to expand it into the list of files. Be patient if you recurse many directories; the command is reading inside a lot of files.
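Before pointing a search like this at a whole disk, it can help to try it on a couple of throwaway files and confirm that “pepper” hits and “peppers” does not. The following is a minimal sketch, assuming GNU grep (or another grep that supports -r and -w). The helper name find_hits and the scratch file names are made up for illustration. It uses -w instead of \b, and -r as shorthand for -d recurse.

```shell
# find_hits DIR PATTERN: list files under DIR whose contents contain
# PATTERN as a whole word, ignoring case. (Hypothetical helper name.)
find_hits() {
    grep -rilwE -e "$2" "$1"
}

# Build a scratch directory with one file that should match and one that should not.
scratch=$(mktemp -d)
printf 'one red pepper\n'  > "$scratch/hit.txt"
printf 'stuffed peppers\n' > "$scratch/miss.txt"

# Only hit.txt should be listed; "peppers" is not a whole-word match for "pepper".
find_hits "$scratch" 'pepper|secondword'

# Clean up the scratch files.
rm -r "$scratch"
```

If miss.txt shows up in the output, the grep on that system is not honoring -w and the results on real data should not be trusted.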

Linux aside, I needed a solution for Windows XP through Windows 10. Windows itself doesn’t make this easy. However, I was surprised to find that my favorite file handling program does this and much more 🙂 Look at Total Commander. It’s been around since at least Windows 95 and works with everything up to Windows 10.

Go to the Total Commander C)ommands S)earch menu. Ignore the top panel, which searches for filenames. Use the bottom panel, which searches for file content. I found success by checking the UTF8 character set and Whole words only options, with the word “pepper” in the text box. Searching 82 files in my root Documents directory took 2 seconds to find the 4 test-case files I had created.

I also tried fancier regular expressions. Specifically, I searched for \b(pepper)\b with the RegEx box checked. The SysInternals Process Explorer program showed Total Commander using a steady 24% of the CPU. Ouch! Finding the same 4 test files out of 82 files took a surprisingly long 2 minutes 6 seconds with regex. One advantage of the regex method is that it can search for all my “hit words” at the same time, such as \b(pepper|onion|squash)\b. Searching for 3 strings at the same time took 2 minutes 31 seconds, so the time does not grow linearly with the number of words.
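When the list of hit words grows, typing the alternation by hand gets error-prone. A small helper can assemble the \b(...)\b pattern, ready to paste into the RegEx search box or to hand to egrep. This is only a sketch: build_hit_regex is a made-up name, and it assumes a POSIX shell is available somewhere to run it. It also does not escape regex metacharacters, so a dot in an email address will match any character.

```shell
# build_hit_regex WORD...: join the words with | and wrap them in \b( ... )\b.
# Hypothetical helper; words are inserted as-is, with no regex escaping.
build_hit_regex() (
    IFS='|'                      # "$*" joins the arguments with the first character of IFS
    printf '\\b(%s)\\b\n' "$*"
)

build_hit_regex pepper onion squash
# prints: \b(pepper|onion|squash)\b
```

The function body runs in a subshell, so changing IFS there does not leak into the rest of the session.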

When you are done removing all the files you want gone, look at the SysInternals utility sdelete to wipe the free space on the hard drive, since deleted files can otherwise still be recovered.

About Brian

Engineer. Aviator. Educator. Scientist.