Finding text in Windows files – and finding “not” text

I wanted to sanitize some computers (a.k.a. remove all personal information) before sending them to their next home. I did not want to do a fresh install of an OS. Some of the programs on the computers are very useful and already configured, so I wanted to leave them installed, just stripped of any personal information or authorizations.

For example, I use FileZilla. I deleted all my accounts so the new owner could not access them, but only with a deeper scan of the computer did I find that FileZilla kept a “recently used” server list in a plaintext file that included my accounts and passwords. Ouch!

After uninstalling everything I wanted to uninstall and deleting known configuration files (searching the whole computer for files matching the program name is a good start), I wanted to scan the computer to know if any file has my name, my emails, pieces of my email, my passwords, or any other “HIT” texts. I also needed “not” find functions; for example, find “pepper” but don’t find “peppers”.

These ideas are possible with Linux tool chains such as

 $ grep -ERiwl -e '\b(pepper|secondword)\b' .
 -E extended regex, the same thing the older egrep command does
 -R for complete recursion, starting from the current directory (.)
 -i ignore case
 -w whole words only; redundant with the regex \b anchors, but I included it in case someone wanted to use only one word such as 'pepper'.
 -l (lower case L) to just give the filenames
 -v option could be used to invert the match logic if you want something fancier than "words only"
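
For machines without grep, the same recursive, case-insensitive, whole-word scan can be roughed out in Python. This is only a sketch, assuming Python is installed on the computer being cleaned. The function name find_files_with_hits is made up for illustration, and the script reads each file whole and matches raw bytes, so text stored as UTF-16 would be missed.

```python
import os
import re


def find_files_with_hits(root, pattern):
    """Return paths under root whose content matches the regex pattern."""
    # Compile as bytes so binary files can be scanned without decoding errors.
    # re.IGNORECASE on a bytes pattern folds ASCII letters only.
    regex = re.compile(pattern.encode("utf-8"), re.IGNORECASE)
    hits = []
    for folder, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # locked or unreadable file: skip it
            if regex.search(data):
                hits.append(path)
    return hits


if __name__ == "__main__":
    # Same pattern as the grep example; "." means the current directory.
    for path in find_files_with_hits(".", r"\b(pepper|secondword)\b"):
        print(path)
```

Like grep -l, it prints only the names of files that contain a hit, not the matching lines.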

Linux aside, I needed a solution for Windows XP through Windows 10. Windows itself doesn’t make this easy. However, I was totally surprised that my favorite file handling program does have this and much more 🙂 Look at Total Commander. It’s been around since at least Windows 95 and works with everything up to Windows 10.

Go to the Total Commander C)ommands S)earch menu. Ignore the top panel, which searches for filenames. Use the bottom panel, which searches for file content. I found success by checking the UTF8 character set and Whole words only boxes, with the word “pepper” in the text box. Searching 82 files in my root Documents directory took 2 seconds to find the 4 proper test-case files I created.

I also tried fancier regular expressions. Specifically, I searched for \b(pepper)\b with the RegEx box checked. The SysInternals Process Explorer program showed Total Commander's CPU use running at a steady 24%. Ouch! Finding the same 4 test files out of 82 files took a surprisingly long 2 minutes 6 seconds. One advantage of the regex method is that it can do all my “hit words” at the same time, such as \b(pepper|onion|squash)\b. Searching for 3 strings at the same time took 2 minutes 31 seconds, so it’s not a linear effect.
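
To see how a combined pattern like \b(pepper|onion|squash)\b behaves, here is a small Python sketch. It uses Python's re module, not Total Commander's regex engine, so treat it as an illustration of the pattern only. The helper names build_hit_pattern and hit_words_found are invented for this example.

```python
import re


def build_hit_pattern(words):
    """Compile one case-insensitive, whole-word pattern for all hit words."""
    # re.escape keeps characters such as "." or "+" in a word literal.
    alternatives = "|".join(re.escape(word) for word in words)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def hit_words_found(text, words):
    """Return the set of hit words (lower-cased) that occur in text."""
    pattern = build_hit_pattern(words)
    return {match.group(1).lower() for match in pattern.finditer(text)}


if __name__ == "__main__":
    sample = "Red peppers, one ONION and a pepper."
    # "peppers" is skipped because no word boundary follows "pepper" there.
    print(sorted(hit_words_found(sample, ["pepper", "onion", "squash"])))
    # prints ['onion', 'pepper']
```

One caveat: \b only matches between a word character and a non-word character, so a hit text that begins or ends with punctuation (some passwords do) may need the \b on that side removed.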

When done removing all the files you want, look at the SysInternals utility sdelete to wipe all free hard drive space. Deleted files can otherwise be recovered until something overwrites them.

About Brian

Engineer. Aviator. Educator. Scientist.