Welcome to WFindStr.exe – Alex Weir 2002 – All Rights Reserved – Conditional Shareware – 2002.01.28

 

Overview

 

WFindStr.exe is a text search utility for rapidly searching for a combination of keywords in .htm (.html), .doc (MS Word), .pdf (Adobe Acrobat), .txt (text) or .* (any format) files on local hard drives, and especially on CD-ROM and/or DVD drives. It should also work on mapped network drives (e.g. D: through Z:), but I have not tested that. The documents or pages are intended to be opened directly (the same way as by clicking on the file in Windows Explorer), not through a local web server or intranet.

 

It is designed for rapid searching and viewing of .htm, .doc, .pdf and .txt files, and is therefore ideal for any text database, for example of CVs (resumes, biodata). A built-in refinement lets you run a search and then automatically create a list of the email addresses corresponding to the matching files. The list is formatted so that you can copy and paste it into the Cc box in Microsoft Outlook and similar email packages. The secret is in the naming of the .doc or .htm files: each file should be named after the email address, with .doc or .htm at the end. Because some people want to keep more than one version of their CV, WFindStr.exe also accepts suffixes such as __01 or __shortversion between the email address and the .htm or .doc extension. Note that this convention uses a DOUBLE underscore (not a single underscore!). So for the email address alexweir@email.com, the CV file can be named alexweir@email.com.doc, alexweir@email.com__02.doc or alexweir@email.com__short.htm.

Press the LIST EMAIL button after a SEARCH to get this list of email addresses. You can then select it, press CTRL-C to copy, and paste it into Microsoft Outlook or any other email package. Thus if consultants email their CVs as attachments using this naming convention, the consulting company can simply drop the CVs into any directory (or directory tree) and, when required, search them rapidly to create short (or long) email address lists to which job requirements are then sent.
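
A minimal Python sketch of how an email list could be derived from matching filenames under the double-underscore convention. The function names `email_from_filename` and `cc_list` are hypothetical (this is not the code inside WFindStr.exe); accepting .html as well as .doc and .htm is an assumption, and joining the addresses with "; " assumes an Outlook-style Cc box.

```python
import ntpath  # handles Windows-style paths on any platform


def email_from_filename(path):
    """Return the email address encoded in a CV filename, or None.

    Expected names: <email>.doc, <email>.htm or <email>__<tag>.doc/.htm,
    where the optional version tag follows a DOUBLE underscore.
    """
    stem, ext = ntpath.splitext(ntpath.basename(path))
    if ext.lower() not in (".doc", ".htm", ".html"):
        return None
    at = stem.find("@")
    if at < 0:
        return None  # file does not follow the naming convention
    # Cut at the first double underscore after the '@' (the version tag).
    cut = stem.find("__", at)
    return stem if cut < 0 else stem[:cut]


def cc_list(matched_paths):
    """Semicolon-separated, de-duplicated address list for a Cc box."""
    seen = {}
    for path in matched_paths:
        email = email_from_filename(path)
        if email and email.lower() not in seen:
            seen[email.lower()] = email
    return "; ".join(seen.values())
```

For example, alexweir@email.com.doc, alexweir@email.com__02.doc and bob@x.org__short.htm collapse to the two addresses alexweir@email.com and bob@x.org. Cutting only at a double underscore that comes after the "@" leaves an address such as a__b@x.org intact.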

 

It works only under Windows (sorry): I believe all 32-bit versions, i.e. Windows 95, 98, NT, 2000, etc.

 

Note that this product is freeware (i.e. free to use) for charitable organizations, NGOs, my personal friends, any and all users in the Third World, and schools, colleges and universities worldwide. For others I will set a shareware price, which will be very reasonable but will depend on the organization: typically a one-off fee of US$5.00 for individuals, and more for organizations. I also reserve the right to change any and all of the above; my website will carry any news about that. Interested organizations please email me at alexweir@email.com.

 

It uses the Windows FINDSTR.EXE command to do its work, shelling out to it from a Visual Basic 6 GUI.
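
One possible way to drive FINDSTR.EXE from another program, sketched in Python rather than Visual Basic 6 and not taken from WFindStr's source: the first keyword is searched recursively with /S, and each later keyword is searched only in the files listed (via /F:) from the previous pass, which gives AND behaviour. /S, /I, /M, /C: and /F: are standard FINDSTR switches; the helper names and the file-list chaining are assumptions, and the quoting of multi-word phrases on a real Windows command line has not been checked here.

```python
import subprocess


def findstr_args(keyword, patterns, file_list=None):
    """Build a FINDSTR argument list that lists the files containing keyword.

    /I  case-insensitive         /M  print only names of matching files
    /C: keyword is one literal string (it may contain spaces)
    /S  search subdirectories    /F: read the list of files from a file
    """
    args = ["findstr", "/I", "/M", "/C:" + keyword]
    if file_list is None:
        args.insert(1, "/S")   # first pass: the whole directory tree
        args.extend(patterns)  # e.g. ["*.htm", "*.doc"]
    else:
        args.append("/F:" + file_list)  # later passes: previous hits only
    return args


def run_findstr(args, search_root):
    """Run FINDSTR (Windows only) in search_root; return matching paths."""
    result = subprocess.run(args, cwd=search_root,
                            capture_output=True, text=True)
    # Exit code 1 just means "no match", so it is not treated as an error.
    return [line for line in result.stdout.splitlines() if line.strip()]
```

Saving the output of one pass to a text file and passing that file's name as `file_list` restricts the next keyword to those files.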

 

The installation totals 3 megabytes and can be downloaded from http://www.anytimenow.com/guest . Enter the username <alexweir> and the password <password> (yes!), then click on the <files> link. There are 3 files to download: WfindS2.cab, WfindS3.cab and WFIunzip.zip. Use the <save to hard drive> option (not <run from this location>). Download all 3 files into the same directory, unzip WFIunzip.zip into that directory, then double-click setup.exe. If the download location changes, my website http://www24.brinkster.com/alexweir/WFindStr will point you to the current download site.

 

It can search .htm, .doc, .pdf, .txt or .* (all) files, or any of the 15 combinations of the four types .htm, .doc, .pdf and .txt. You can set the file type(s) to search and store that as a setting, which then becomes the default the next time you start the application.
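
The figure of 15 is simply the number of non-empty subsets of the four types: 2^4 - 1 = 15. A short Python sketch that enumerates them; the underscore-joined, alphabetical option names (DOC, DOC_HTM, ..., DOC_HTM_PDF_TXT) are inferred from option names such as HTM_PDF mentioned in this document and may not match every label in the program.

```python
from itertools import combinations

# The four selectable types, in alphabetical order (assumed naming order).
TYPES = ("doc", "htm", "pdf", "txt")


def type_combinations():
    """All non-empty combinations of the four types: 2**4 - 1 = 15."""
    return [combo
            for size in range(1, len(TYPES) + 1)
            for combo in combinations(TYPES, size)]


def option_name(combo):
    """Underscore-joined label, e.g. ('htm', 'pdf') -> 'HTM_PDF'."""
    return "_".join(t.upper() for t in combo)


def wildcard_patterns(combo):
    """File patterns to search for one combination, e.g. ['*.htm', '*.pdf']."""
    return ["*." + t for t in combo]


if __name__ == "__main__":
    for combo in type_combinations():
        print(option_name(combo), wildcard_patterns(combo))
```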

 

It can search on up to 8 “AND” conditions, so it is quite powerful; the Windows search utility can really only search on one condition at a time. Put each condition in any one of the 8 text boxes (press the HELP button to get an example), then press the SEARCH button to run the search. If you are using more than one condition, put the keywords or key phrases you expect to be rarest in the left-hand box or boxes to maximize search speed.
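
The advice to put the rarer keywords first makes sense if each condition is only checked against the files that survived the earlier ones; that reading is an assumption, not a description of WFindStr's internals. A small Python sketch of such successive narrowing, with files held in memory as a dict of path to text and a hypothetical `and_search` helper that also counts how many file checks were needed:

```python
def and_search(files, conditions, max_conditions=8):
    """Return (matching paths, number of file checks) for an AND search.

    files: dict of path -> text. Blank conditions (empty text boxes) are
    ignored. Matching is case-insensitive substring matching.
    """
    needles = [c.strip().lower() for c in conditions if c.strip()]
    if len(needles) > max_conditions:
        raise ValueError("at most %d AND conditions" % max_conditions)
    candidates = list(files)
    checks = 0
    for needle in needles:  # left-hand box first
        checks += len(candidates)
        candidates = [p for p in candidates if needle in files[p].lower()]
        if not candidates:
            break  # nothing left, so later conditions need not be checked
    return candidates, checks
```

With three files that all mention Java but only one that mentions Oracle, `and_search(files, ["oracle", "java"])` needs 4 file checks while `["java", "oracle"]` needs 6; both return the same single file.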

 

If a normal search with 2 or more keywords or key phrases returns zero results, you can press the ADV SEARCH button to get a file count for each keyword or key phrase you entered. That information helps you modify your search so as to get some matches (e.g. by removing any keyword with a zero file count). Better still, do a FUZZY SEARCH: this goes down up to 4 levels to find partial matches for the keywords and key phrases you entered, and can save you literally hours of re-entering keywords. You can also enter, in the text box next to the FUZZY SEARCH button, the minimum number of matches (hits) that is acceptable to you on a fuzzy search. The default is 1, but you may wish to set the cut-off at 5, 20 or more. On the results page, the matches nearer the top usually match more keywords; check the explanation on the results page, which describes this in detail.
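
How the fuzzy search works internally is not described beyond the points above, so the following Python sketch is one plausible reading, not WFindStr's algorithm: `keyword_file_counts` mimics the ADV SEARCH report, and `fuzzy_search` treats each "level" as allowing one more missing keyword (all keywords, then all but one, and so on, for at most 4 levels), stopping as soon as the minimum number of hits is reached. Files matching more keywords are ranked first. Both function names are hypothetical.

```python
def keyword_file_counts(files, keywords):
    """ADV SEARCH-style report: number of files containing each keyword."""
    return {k: sum(k.lower() in text.lower() for text in files.values())
            for k in keywords}


def fuzzy_search(files, keywords, min_hits=1, max_levels=4):
    """Return [(path, keywords matched)], best matches first."""
    kws = [k.lower() for k in keywords]
    scores = {p: sum(k in text.lower() for k in kws)
              for p, text in files.items()}
    # Most keywords matched first; ties broken by path for a stable order.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    hits = []
    for level in range(max_levels):
        threshold = len(kws) - level  # keywords a file must contain here
        if threshold < 1:
            break
        hits = [p for p in ranked if scores[p] >= threshold]
        if len(hits) >= min_hits:
            break
    return [(p, scores[p]) for p in hits]
```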

 

Each condition can be a single word or several words. Note, however, that if the words of a phrase are split over more than one line of the htm code, the search will miss the phrase.
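
To see why a phrase split across lines is missed, assume (as with FINDSTR) that matching is done one line at a time. A minimal Python sketch with a hypothetical `phrase_in_file` helper:

```python
def phrase_in_file(text, phrase):
    """True if the phrase occurs within a single line (case-insensitive)."""
    needle = phrase.lower()
    return any(needle in line.lower() for line in text.splitlines())
```

`phrase_in_file("Senior project\nmanager", "project manager")` returns False even though both words are present, because they sit on different lines.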

 

The time taken by each search is shown above the search results. Typically, searching 13 megabytes of htm files (185 files) for up to 8 keywords or key phrases takes about 1 second from a CD in a 12x DVD drive, while searching 160 megabytes in 12,000 files on a CD in the same drive with up to 8 keywords takes about 170 seconds. Searches are of course faster from a hard drive. Check the speeds in your own environment.

 

Each .doc, .htm, .pdf or .txt file in the search results is a direct link. The link opens in a normal pop-up browser window, so you can use the Find on This Page command in Internet Explorer to locate exactly any or all of the keywords you used.

 

You can easily change the drive and directory to search; all searches automatically include subdirectories. In this version you can specify only one drive and top-level directory. If there is demand for searching multiple drives and directories, this could be built into a future version of the utility; email me at alexweir@email.com. To make a new drive and directory your default when the application restarts, press the “Store Drive + Directory” button after changing them.
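
The stored default can be pictured as a key in WfindStr.ini. The section name `Settings` and key name `SearchRoot` below are hypothetical (the layout of the .ini file is not documented here, apart from the AsciConv setting); the sketch uses Python's standard `configparser` and rewrites the file without dropping keys that are already there.

```python
import configparser

SECTION = "Settings"  # hypothetical section name


def _new_config():
    # interpolation=None stores '%' in a path literally;
    # optionxform=str keeps key case ("SearchRoot", not "searchroot").
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str
    return cfg


def store_search_root(ini_path, search_root):
    """Save the drive + directory to search, keeping any other keys."""
    cfg = _new_config()
    cfg.read(ini_path)  # a missing file is simply ignored
    if not cfg.has_section(SECTION):
        cfg.add_section(SECTION)
    cfg.set(SECTION, "SearchRoot", search_root)
    with open(ini_path, "w") as f:
        cfg.write(f)


def load_search_root(ini_path, default="D:\\"):
    """Return the stored drive + directory, or the default if none is stored."""
    cfg = _new_config()
    cfg.read(ini_path)
    return cfg.get(SECTION, "SearchRoot", fallback=default)
```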

 

You can copy the htm code of any search result by pressing the “Copy Result” button, and then build your own index htm page(s) of commonly used searches if you wish (a bit like FAQs). Because these “pre-searches” are plain htm pages, CDs or DVDs designed for use on Windows, Mac and Linux can all benefit from WFindStr, even though WFindStr itself runs only on Windows.
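
A Python sketch of the kind of htm fragment a search result might contain, so that it can be pasted into a hand-made index page. The exact markup WFindStr produces is not shown in this document; the heading text, the file:/// link form and `results_page` itself are assumptions. `target="_blank"` asks the browser to open each document in a new window, which is what makes Find on This Page usable on it.

```python
import html
import ntpath
from urllib.parse import quote


def results_page(query, paths):
    """Return an HTML fragment with one pop-up link per matching file."""
    items = []
    for path in paths:
        # file:/// URL with forward slashes; spaces etc. are percent-encoded.
        url = "file:///" + quote(path.replace("\\", "/"), safe="/:@")
        name = ntpath.basename(path)
        # target="_blank" asks the browser to open a new window.
        items.append('<li><a href="%s" target="_blank">%s</a></li>'
                     % (html.escape(url), html.escape(name)))
    return ("<h3>Results for: %s</h3>\n<ul>\n%s\n</ul>\n"
            % (html.escape(query), "\n".join(items)))
```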

 

If there is demand, I can extend this product to allow 2 or 3 “OR” searches, each with up to 8 AND conditions; email me at alexweir@email.com
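
The proposed OR feature amounts to an OR of up to 3 groups, each an AND of up to 8 conditions. Since it does not exist yet, the following Python sketch (hypothetical names, in-memory files, case-insensitive substring matching) only illustrates the intended logic:

```python
def matches_all(text, conditions):
    """True if every condition occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return all(c.lower() in lowered for c in conditions)


def or_of_and_search(files, groups, max_groups=3, max_conditions=8):
    """Files matching at least one group of AND conditions.

    files: dict of path -> text; groups: e.g. [["java", "oracle"], ["cobol"]].
    """
    groups = [g for g in groups if g]  # ignore empty groups
    if len(groups) > max_groups or any(len(g) > max_conditions for g in groups):
        raise ValueError("at most %d OR groups of %d AND conditions"
                         % (max_groups, max_conditions))
    return [path for path, text in files.items()
            if any(matches_all(text, g) for g in groups)]
```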

 

Similarly, I could add a proximity search if there is demand, i.e. a search for several keywords on the same or nearby lines.
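
A proximity search could work as a sliding window over the lines of each file. In this Python sketch (a hypothetical helper, not an existing feature), `window=2` means that all keywords must fall within 2 lines of each other, as proposed in the specification; `window=0` requires them on the same line.

```python
def proximity_match(text, keywords, window=2):
    """True if all keywords occur within `window` lines of each other.

    Slides a block of window + 1 consecutive lines down the file;
    window=0 means every keyword must be on the same line.
    Matching is case-insensitive.
    """
    lines = [line.lower() for line in text.splitlines()]
    needles = [k.lower() for k in keywords]
    for start in range(len(lines)):
        block = lines[start:start + window + 1]
        if all(any(n in line for line in block) for n in needles):
            return True
    return False
```

For a file whose first line mentions Java and whose fourth line mentions Oracle, `proximity_match(text, ["java", "oracle"], window=2)` is False and `window=3` is True.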

 

If this product comes into wider use and some frequently asked questions (FAQs) emerge, I will answer them on my website: http://www24.brinkster.com/alexweir/WFindStr

 

 

Searching for non-ASCII characters, such as the accented letters (acute, grave, umlaut, etc.) found in French, German, Spanish and the Nordic languages, can cause problems. This product handles that for HTM files (but not yet for PDF or DOC files): all of the options that include HTM, such as HTM, HTM_PDF and DOC_HTM_PDF_TXT, do this conversion automatically when searching .htm files (but the ALL_files option does NOT). The WfindStr.ini file contains a setting called AsciConv, which is set to 1 by default; with that value, the search text is converted before the search, for example ä (a umlaut) to &#228. If that feature becomes inconvenient for any reason, reset the value to 0 manually using Notepad or another text editor. Handling .doc and .pdf files should also be possible if and when there is demand; please ask.
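
The conversion can be sketched in Python as replacing each non-ASCII character with its decimal numeric character reference, which is how htm pages often store such letters. The document shows the result as &#228 without the closing semicolon; this sketch emits the standard form &#228; so that a keyword with such a letter in the middle (such as Müller) still matches the page source, which is an assumption about what WFindStr does. Pages that use named entities such as &auml; would not match either form. `to_ncr` and `prepare_keyword` are hypothetical names; `ascii_conv` mirrors the AsciConv setting.

```python
def to_ncr(text):
    """Replace each non-ASCII character with a decimal reference, e.g. 'ä' -> '&#228;'."""
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


def prepare_keyword(keyword, ascii_conv=1):
    """Convert the keyword before searching .htm files when AsciConv is 1."""
    return to_ncr(keyword) if ascii_conv == 1 else keyword
```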

 

Searches are not case-sensitive: they do not distinguish between upper-case and lower-case letters.

 

There is a beep facility: by default, any search that takes longer than 20 seconds beeps when it completes. You can change this threshold to any value in the text box above the STORE button; a value of 0 means that it never beeps. This is convenient when searches take a long time: you can minimize the window and do something else useful, like making coffee.
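
The threshold rule is small enough to state as code. In this Python sketch, `should_beep` and `timed_search` are hypothetical names; the beep is approximated by writing the ASCII bell character, and on Windows `winsound.Beep` would be an alternative.

```python
import time


def should_beep(elapsed_seconds, threshold_seconds=20):
    """Beep only for searches longer than the threshold; 0 disables beeping."""
    return threshold_seconds > 0 and elapsed_seconds > threshold_seconds


def timed_search(search, *args, threshold_seconds=20):
    """Run search(*args), time it, and beep if it ran past the threshold."""
    start = time.perf_counter()
    result = search(*args)
    elapsed = time.perf_counter() - start
    if should_beep(elapsed, threshold_seconds):
        print("\a", end="", flush=True)  # ASCII BEL as a portable beep
    return result, elapsed
```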

 

The HELP button also loads 8 sample keywords and key phrases into the 8 text boxes; press the SEARCH button to run that search.

 

Note that the installation program is in French; if you don't speak the language, please just use your intuition. Near the end of the installation you may need to choose IGNORE and then YES.

 

Alex Weir, 2002.01.28

 

The following is an overview/specification which I wrote on 15.01.2002; some of the above reappears below.

 

 

String Search Utility

 

Need

 

  1. Windows find.exe and findstr.exe are fast and efficient, but do not allow searching for several words that must all occur in the same document or htm page.

  2. Index Server is limited to certain platforms (usually more expensive operating systems such as NT or Windows 2000 Enterprise), and it is unclear (?) how well Index Server works with CD-ROMs.

  3. When issuing information on CD-ROM, a packager such as Greenstone Library is useful and allows good, rapid searching, but it encapsulates the contents, which makes modular use and copying files off the CD problematic.

  4. If a CD-ROM is issued only in htm format, then searching on multiple words becomes a problem (as in (1) above), although modular use is simple.

  5. Some (many) organizations keep databases of people and/or CVs (resumes). It is often useful to them to be able to do free-text search, either as the main search method or as a backup (e.g. if a new, non-classified search word or phrase becomes necessary). Note that if free-text search is the main search method, there is no need for administrative work to pre-categorize CVs: they can simply be dropped into a directory or directories as .doc or .htm files.

 

Present products

 

  1. AltaVista used to have a free product called Discover or Discovery (or something similar). It has since been replaced by expensive commercial products.

  2. Freeware and shareware sites (see http://www24.brinkster.com/alexweir/ShareWareCD for a list of some of these sites) offer some products, which I last reviewed, downloaded and tested in August 2001 and found to be quite far from the requirements as I saw them.

  3. Ask Sam is possibly the best-known text search utility apart from Microsoft Index Server.

 

 

Specification

 

  1. Something that allows AND and OR searches.

  2. The results are presented as a web page with links, from which the docs or htm files can be opened immediately.

  3. That access should be through either the existing browser window or a pop-up window, so that at least the Find on This Page command can be used to locate one or several of the search keywords exactly.

  4. Something that allows at least 3, and maybe up to 8, AND combinations of keywords.

  5. Operation must be rapid.

  6. If possible, results should be able to be copied and pasted into other, more permanent and/or more usable index pages.

  7. The product should be freeware or shareware.

  8. Possibly the search on multiple keywords should operate on a proximity basis, e.g. with the keywords within +/- 2 lines of each other in the doc or htm. This should be a selectable (toggleable) feature, and the number of lines should also be specifiable.

  9. The drive and directory to be searched should be saveable in a config or .ini file, but of course be modifiable (and re-saveable) on each search.

  10. Possibly up to 3 drive+directory combinations should be searchable in each search.

  11. One should be able to search .doc or .htm or (.doc and .htm) files.

  12. Possibly additional user-specified formats, such as .txt files, should be included.

  13. Ideally, UNC addresses as well as normal local and/or mapped drives should be searchable.

  14. Possibly a database of word frequencies should be included, so that searching can be as fast as possible and a search that would return nothing can be flagged without any search having to take place.
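
The word-frequency idea in the last item can be sketched with an inverted index that maps each word to the set of files containing it. In this Python sketch (hypothetical helper names, single-word keywords only; phrases would need more work), `plan_query` orders the keywords rarest first and reports a guaranteed zero-result AND search before any file is scanned.

```python
import re
from collections import defaultdict


def build_index(files):
    """Map each lower-cased word to the set of files containing it."""
    index = defaultdict(set)
    for path, text in files.items():
        for word in set(re.findall(r"\w+", text.lower())):
            index[word].add(path)
    return index


def plan_query(index, keywords):
    """Return (keywords ordered rarest first, True if the AND search must be empty)."""
    kws = [k.lower() for k in keywords]
    # .get() avoids adding empty entries to the defaultdict.
    freqs = {k: len(index.get(k, ())) for k in kws}
    ordered = sorted(kws, key=lambda k: freqs[k])
    return ordered, any(f == 0 for f in freqs.values())
```

For example, if "cobol" occurs in no indexed file, `plan_query(index, ["java", "cobol"])` returns `(["cobol", "java"], True)` without opening a single document.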

 

 

Proposed Methodology

 

  1. Use the standard Windows FINDSTR.EXE command with a Visual Basic GUI wrapper.

 

Potential Users

 

  1. Academics self-publishing teaching and other materials on CD-ROM and/or DVD; computer and other recruitment agencies; large companies, for personnel records and application letters/CVs; production and marketing companies issuing product CD-ROMs or DVDs; etc.

 

Alex Weir

 

Acknowledgements:

 

Martin Parkes (currently on assignment in China), VITA USA (http://www.vita.org) and Michael Loots of Humaninfo Belgium (http://www.humaninfo.org) have all been instrumental (sometimes without knowing it) in convincing me that there is a need for something like WFindStr. Klaus Stelzl from Munich, Germany, provided the VB shelling code: thanks. Matthias Heuer of Cambridge Technology Partners, Frankfurt, Germany, drew my attention to language-specific issues: thanks. Verner Jensen of Danagro, Copenhagen, drew my attention to AltaVista Discover: thanks. Ian Mitchell suggested I try some fuzzy logic: thanks, Ian. Thanks also to http://www.anytimenow.com for running a free download site that seems to meet the requirements of programmers like me; there are many file download sites and free sites out there, but most have hard-to-discover and apparently stupid restrictions that make them unusable.

 

PS: I am a commercial programmer specializing in client-server and web database solutions using Visual Basic (VB), VB.NET, Visual Studio .NET, SQL Server, Oracle, Access, etc. I freelance and will work for anyone, anywhere; email me if interested, and/or view my resume at http://www24.brinkster.com/alexweir/resume. I am also interested in building other systems that are useful to mankind, since so little IT work has any socially beneficial impact; I am willing to build such systems free of charge, so please contact me.