[Hidden-tech] Text Manipulation Problem

Steven Brewer limako at bierfaristo.com
Sun Oct 22 09:21:54 EDT 2017


I see people have made all the obvious suggestions. Let me add that
NeoOffice can do search and replace with regular expressions.

But folks should also be aware of OpenRefine, a tool for cleaning up
messy data sets. It may be overkill for something like this, but maybe
not: it has a bunch of tools for identifying classes of problems (like
those that crop up with dodgy OCR) and correcting them all at once.
It's worth knowing about anyway.
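
Going back to the regular-expression route: if a scripting language is
an option, the same idea can be sketched in a few lines of Python. This
is only a rough sketch against the sample quoted below, not a tested
solution for the real scan. The function names (parse_records, to_csv)
are made up for illustration, and it assumes the OCR text is clean: one
name line, an optional '(413)' phone line, then an address line ending
in 'Greenfield 01301'. It also assumes the last word of the name line
is the surname, which will be wrong for some names.

```python
import csv
import io
import re

# Assumed patterns, taken from the sample in the original message.
PHONE = re.compile(r"^\(413\) \d{3}-\d{4}$")
ADDRESS = re.compile(r"^(?P<street>.+) (?P<city>Greenfield) (?P<zip>01301)$")


def parse_records(text):
    """Group name / optional phone / address lines into rows of six fields."""
    rows = []
    name, phone = None, ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        address = ADDRESS.match(line)
        if PHONE.match(line):
            phone = line
        elif address:
            # The address line closes the record. Assumption: the last
            # word of the name is the surname, the rest is the first name.
            first, _, last = (name or "").rpartition(" ")
            rows.append([first, last, phone,
                         address["street"], address["city"], address["zip"]])
            name, phone = None, ""
        else:
            # Anything that is neither a phone nor an address is a name.
            name = line
    return rows


def to_csv(rows):
    """Render the rows as CSV text, one record per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


SAMPLE = """\
John Doe
(413) 111-1111
123 First St Greenfield 01301
Jane Smith
456 So Main Ln Greenfield 01301
Jane Ann Smith
(413) 222-2222
78 Main Ct Greenfield 01301
"""

print(to_csv(parse_records(SAMPLE)), end="")
```

With real OCR output you would probably want to loosen the patterns
(stray spaces, misread digits) and eyeball any line that matched
neither pattern, since the sketch silently treats those as names.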

Good luck!

On 10/21/17 7:34 AM, David Greenberg wrote:
> I have a hard copy list of names, addresses and phone numbers. I can
> scan to PDF and then copy and paste to a text editor (BBEdit) or other
> file. I then need to manipulate the text so that I end up with a csv
> file that can be opened by a spreadsheet program. Tools that I have at
> my disposal include BBEdit (with Grep), a MAMP stack, NeoOffice (Mac
> version of OpenOffice) and FileMaker.
> 
> Input looks like this:
> 
> John Doe
> (413) 111-1111
> 123 First St Greenfield 01301
> Jane Smith
> 456 So Main Ln Greenfield 01301
> Jane Ann Smith
> (413) 222-2222
> 78 Main Ct Greenfield 01301
> 
> Note that all addresses will include 'Greenfield 01301' and, if the
> data includes a phone number, it will start with '(413)'.
> 
> Output should look like this:
> 
> John,Doe,(413) 111-1111,123 First St,Greenfield,01301
> Jane,Smith,,456 So Main Ln,Greenfield,01301
> Jane Ann,Smith,(413) 222-2222,78 Main Ct,Greenfield,01301
> 
> Any suggestions greatly appreciated. Thanks.
> 
> David
> 
> 

-- 
Steven D. BREWER <limako at bierfaristo.com>
http://blog.bierfaristo.com/
Cent jarojn silentis kaj subite sin prezentis.

