Since I do a lot of text handling for a number of projects, I'll add a few more comments:

1) *Programmed (perl, sed, script or saved regex) vs find & replace.*
I find any repeatable method far better than find/replace for a number of reasons. David didn't mention whether this is a one-time need or a repeating one; clearly a repeating need calls for a saved method. Even with a one-time need, F/R has a fatal flaw if you pick a bad pattern or just mistype, whereas with a saved technique you can test your method until it's right. A few of the comments about unexpected variations in the data reinforce this idea.

2) *Create tab delimited vs CSV.*
He didn't say which spreadsheet. Excel and most others will accept tab-delimited files, and using tabs does reduce a variety of bumps that extra commas produce.

3) *OpenRefine*
Leave it to HT (thanks, Steve) to add a tangential idea that is of use to others. I am working on a variety of text processing tasks, using OCR and various scripts, and that looks to be a useful tool. In one case, I have processed some 15 data sources into a common display system for Shaker community members over the 200 years of 17 communities and some 15,000 members. I am still working on an OCR of a 1970 microfilm data set of 16,000 more entries.
You can see some of the results at: http://memoirs.shakerpedia.com/
If there are any Shaker aficionados on HT, any help is welcome on that or http://shakerpedia.com/ in general.

4) If anyone has such conversion/scanning projects for community groups, especially historical societies, please contact me. We are now doing some work for ours: Historical Society of Greenfield.

Good luck to David

- Rich

On 10/22/2017 6:21 AM, Steven Brewer wrote:
> I see people have made all the obvious suggestions. Let me add that
> NeoOffice can do search and replace with regular expressions.
>
> But folks should also be aware of OpenRefine: It's a tool for taking
> messy data sets and cleaning them up.
> It's perhaps overkill for
> something like this, but maybe not: It has a bunch of tools for
> identifying classes of problems (like those that crop up with dodgy OCR)
> and being able to correct them all at once. It's worth being aware of
> anyway.
>
> Good luck!
>
> On 10/21/17 7:34 AM, David Greenberg wrote:
>> I have a hard copy list of names, addresses and phone numbers. I can
>> scan to PDF and then copy and paste to a text editor (BBEdit) or other
>> file. I then need to manipulate the text so that I end up with a csv
>> file that can be opened by a spreadsheet program. Tools that I have at
>> my disposal include BBEdit (with Grep), a MAMP stack, NeoOffice (Mac
>> version of OpenOffice) and FileMaker.
>>
>> Input looks like this:
>>
>> John Doe
>> (413) 111-1111
>> 123 First St Greenfield 01301
>> Jane Smith
>> 456 So Main Ln Greenfield 01301
>> Jane Ann Smith
>> (413) 222-2222
>> 78 Main Ct Greenfield 01301
>>
>> Note that all addresses will include 'Greenfield 01301' and, /if/ the
>> data includes a phone number, it will start with '(413)'.
>>
>> Output should look like this:
>>
>> John,Doe,(413) 111-1111,123 First St,Greenfield,01301
>> Jane,Smith,,456 So Main Ln,Greenfield,01301
>> Jane Ann,Smith,(413) 222-2222,78 Main Ct,Greenfield,01301
>>
>> Any suggestions greatly appreciated. Thanks.
>>
>> David
>>
>>
>> _______________________________________________
>> Hidden-discuss mailing list - home page: http://www.hidden-tech.net
>> Hidden-discuss at lists.hidden-tech.net
>>
>> You are receiving this because you are on the Hidden-Tech Discussion list.
>> If you would like to change your list preferences, Go to the Members
>> page on the Hidden Tech Web site.
>> http://www.hidden-tech.net/members
>>

--
Rich Roth
Webmaster/Steering Committee Member
Hidden-tech
http://www.hidden-tech.net
The Talent you need is right here, Join and share your skills
((Sponsored by Thrives Media))
http://www.thrivesmedia.com
http://www.welovemuseums.com
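
To illustrate the saved, repeatable approach from point 1 together with the tab-delimited option from point 2, here is a minimal Python sketch run against David's sample input. The names parse_records and to_delimited are hypothetical, not from any tool mentioned in the thread. It assumes every record is a name line, an optional phone line starting with '(413)', and an address line ending in 'Greenfield 01301', and that the last word of the name is the surname. Real OCR output would likely need more tolerant matching.

```python
import csv
import io
import re

# Assumption: every address line ends with the fixed town and ZIP code.
ADDRESS = re.compile(r"^(.*\S)\s+(Greenfield)\s+(01301)$")

SAMPLE = """\
John Doe
(413) 111-1111
123 First St Greenfield 01301
Jane Smith
456 So Main Ln Greenfield 01301
Jane Ann Smith
(413) 222-2222
78 Main Ct Greenfield 01301
"""


def parse_records(text):
    """Group name / optional phone / address lines into one row per person."""
    rows = []
    name, phone = None, ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        address = ADDRESS.match(line)
        is_phone = line.startswith("(413)")
        if name is None:
            # A record must begin with a name line.
            if address or is_phone:
                raise ValueError(f"expected a name, got: {line!r}")
            name = line
        elif is_phone:
            phone = line
        elif address:
            # Assumption: the last word is the surname, the rest is the first name.
            first, _, last = name.rpartition(" ")
            rows.append([first, last, phone, *address.groups()])
            name, phone = None, ""
        else:
            raise ValueError(f"expected a phone or address, got: {line!r}")
    if name is not None:
        raise ValueError(f"record for {name!r} has no address line")
    return rows


def to_delimited(rows, delimiter=","):
    """Serialize rows as CSV by default, or tab-delimited with delimiter='\\t'."""
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_delimited(parse_records(SAMPLE), delimiter="\t"), end="")
```

Because the sketch raises an error on any line that does not fit the expected pattern, unexpected variations in the data show up while you are testing rather than ending up silently in the spreadsheet.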