Regarding tabs-vs-commas: it's a real tragedy that more programs don't
make use of any of the *four* ASCII delimiter characters
(https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text) that have
been available since ASCII-1965. The whole world of character-escaping
problems that we programmers deal with in order to support CSV/TSV could
have been avoided!

Eli

On 22 Oct 2017, at 12:22, Rich Roth wrote:

> Since I do a lot of text handling for a number of projects, I'll add a
> few more comments:
>
> 1) *programmed (perl, sed, script or saved regex) vs find & replace.*
> I find any repeatable method far better than find/replace for a number
> of reasons.
> David didn't mention whether this is a one-time need or a repeating
> one; clearly a repeating need calls for a saved method.
> Even with a one-time need, F/R has a fatal flaw if you pick a bad
> pattern or just mistype, while with a saved technique
> you can test your method until it's right.
> A few comments about unexpected variations in data reinforce this
> idea.
>
> 2) *Create Tab delimited vs CSV*
> He didn't say which spreadsheet; Excel and most others will accept
> tab-delimited files, and using tabs avoids many of the bumps that
> extra commas produce.
>
> 3) *OpenRefine*
> Leave it to HT (thanks Steve) to add a tangent idea of use to
> others. I am working on a variety of text processing tasks, using
> OCR and various scripts, and that looks to be a useful tool.
>
> In one case, I have processed some 15 data sources into a common
> display system for Shaker community members over the 200 years of 17
> communities and some 15,000 members. I am still working on an OCR of
> a 1970 microfilm data set of 16,000 more entries. You can see some of
> the results at: http://memoirs.shakerpedia.com/
> If there are any Shaker aficionados on HT, any help is welcome on
> that or http://shakerpedia.com/ in general.
>
> 4) If anyone has such conversion/scanning projects for community
> groups, esp. historical societies, please contact me.
> We are now doing some work for ours: Historical society of Greenfield.
>
> Good luck to David - Rich
>
> On 10/22/2017 6:21 AM, Steven Brewer wrote:
>> I see people have made all the obvious suggestions. Let me add that
>> NeoOffice can do search and replace with regular expressions.
>>
>> But folks should also be aware of OpenRefine: It's a tool for taking
>> messy data sets and cleaning them up. It's perhaps overkill for
>> something like this, but maybe not: It has a bunch of tools for
>> identifying classes of problems (like those that crop up with dodgy
>> OCR) and being able to correct them all at once. It's worth being
>> aware of anyway.
>>
>> Good luck!
>>
>> On 10/21/17 7:34 AM, David Greenberg wrote:
>>> I have a hard copy list of names, addresses and phone numbers. I can
>>> scan to PDF and then copy and paste to a text editor (BBEdit) or
>>> other file. I then need to manipulate the text so that I end up with
>>> a csv file that can be opened by a spreadsheet program. Tools that I
>>> have at my disposal include BBEdit (with Grep), a MAMP stack,
>>> NeoOffice (Mac version of OpenOffice) and FileMaker.
>>>
>>> Input looks like this:
>>>
>>> John Doe
>>> (413) 111-1111
>>> 123 First St Greenfield 01301
>>> Jane Smith
>>> 456 So Main Ln Greenfield 01301
>>> Jane Ann Smith
>>> (413) 222-2222
>>> 78 Main Ct Greenfield 01301
>>>
>>> Note that all addresses will include 'Greenfield 01301' and, /if/ the
>>> data includes a phone number, it will start with '(413)'.
>>>
>>> Output should look like this:
>>>
>>> John,Doe,(413) 111-1111,123 First St,Greenfield,01301
>>> Jane,Smith,,456 So Main Ln,Greenfield,01301
>>> Jane Ann,Smith,(413) 222-2222,78 Main Ct,Greenfield,01301
>>>
>>> Any suggestions greatly appreciated. Thanks.
>>>
>>> David
>>>
>>> _______________________________________________
>>> Hidden-discuss mailing list - home page: http://www.hidden-tech.net
>>> Hidden-discuss at lists.hidden-tech.net
>>>
>>> You are receiving this because you are on the Hidden-Tech Discussion
>>> list.
>>> If you would like to change your list preferences, Go to the Members
>>> page on the Hidden Tech Web site.
>>> http://www.hidden-tech.net/members
>
> --
> Rich Roth
> Webmaster/Steering Committee Member
> Hidden-tech http://www.hidden-tech.net
> The Talent you need is right here,
> Join and share your skills
> ((Sponsored by Thrives Media))
> http://www.thrivesmedia.com
> http://www.welovemuseums.com
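
P.S. To make the "saved, testable method" idea concrete, here is a rough
Python sketch of the conversion David asked about. It is only one possible
approach, not anything proposed in the thread, and it leans on the rules
David stated: every record ends with an address line that ends in
'Greenfield 01301', and a phone line, if present, starts with '(413)'. It
additionally assumes that the last word of the name line is the surname,
which real data may violate. The function names (parse_records, to_csv,
to_ascii_delimited) are made up for illustration. The last function shows
the ASCII unit separator (0x1F) and record separator (0x1E) as an
alternative to commas.

```python
import csv
import io

CITY, ZIP_CODE = "Greenfield", "01301"
ADDRESS_SUFFIX = f" {CITY} {ZIP_CODE}"   # assumption: every address ends this way
PHONE_PREFIX = "(413)"                   # assumption: every phone starts this way
US, RS = "\x1f", "\x1e"                  # ASCII unit separator, record separator

SAMPLE = """
John Doe
(413) 111-1111
123 First St Greenfield 01301
Jane Smith
456 So Main Ln Greenfield 01301
Jane Ann Smith
(413) 222-2222
78 Main Ct Greenfield 01301
"""


def parse_records(text):
    """Group name / optional phone / address lines into six-field rows."""
    rows = []
    name, phone = "", ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(ADDRESS_SUFFIX):
            # The address line closes the current record.
            street = line[: -len(ADDRESS_SUFFIX)].strip()
            # Assumption: the last word of the name is the surname.
            first, _, last = name.rpartition(" ")
            rows.append([first, last, phone, street, CITY, ZIP_CODE])
            name, phone = "", ""
        elif line.startswith(PHONE_PREFIX):
            phone = line
        else:
            name = line
    return rows


def to_csv(rows):
    """Serialize rows as CSV; csv.writer quotes any field containing a comma."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def to_ascii_delimited(rows):
    """Serialize rows with US between fields and RS after each record."""
    return "".join(US.join(row) + RS for row in rows)


if __name__ == "__main__":
    print(to_csv(parse_records(SAMPLE)), end="")
```

Because the rules live in a script rather than in a one-off find/replace,
the sample above can be rerun after every tweak until the output matches.
OCR slips (a misread '(413)' or a mangled zip code) would break the line
classification, so lines that match none of the rules deserve a manual
check.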