[Hidden-tech] Text Manipulation Problem

Shel Horowitz shel at principledprofit.com
Mon Oct 23 07:31:46 EDT 2017


But those all look like strings that might show up for other reasons, which
makes global search-and-replace not an option. How do you work around that?


Shel Horowitz - "The Transformpreneur"(sm)
________________________________________________
Watch (and please share) my TEDx Talk,
"Impossible is a Dare: Business for a Better World"
http://www.ted.com/tedx/events/11809
(move your mouse to "event videos")

Contact me to bake in profitability while addressing hunger,
poverty, war, and catastrophic climate change

Twitter: @shelhorowitz

* First business ever to be Green America Gold Certified
* Inducted into the National Environmental Hall of Fame

http://goingbeyondsustainability.com
http://transformpreneur.com
mailto:shel at greenandprofitable.com * 413-586-2388
Award-winning, best-selling author of 10 books. Latest:
Guerrilla Marketing to Heal the World (co-authored with Jay Conrad Levinson)

_________________________________________________

On Sun, Oct 22, 2017 at 2:39 PM, Elijah Gwynn <eli at egwynn.com> wrote:

> Regarding tabs-vs-commas: it's a real tragedy that more programs don't
> make use of any of the *four* ASCII delimiter characters (
> https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text) that have
> been available since ASCII-1965. The whole world of character-escaping
> problems that we programmers deal with in order to support CSV/TSV could
> have been avoided!
>
> Eli
>
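To make Eli's point concrete, here is a minimal Python sketch of ASCII-delimited text. It assumes the unit separator (0x1F) goes between fields and the record separator (0x1E) between records; the function names and the sample rows are made up for illustration and do not come from the thread.

```python
# ASCII reserves control characters for delimiting data. This sketch uses
# two of them: the unit separator (0x1F) between fields and the record
# separator (0x1E) between records.
UNIT_SEP = "\x1f"
RECORD_SEP = "\x1e"


def dumps(rows):
    """Join rows of string fields into one ASCII-delimited string."""
    return RECORD_SEP.join(UNIT_SEP.join(fields) for fields in rows)


def loads(text):
    """Split an ASCII-delimited string back into rows of fields.

    An empty string is treated as zero rows.
    """
    if not text:
        return []
    return [record.split(UNIT_SEP) for record in text.split(RECORD_SEP)]


# Hypothetical sample data: commas, tabs, quotes and newlines inside a
# field need no escaping, because none of them is a delimiter here.
rows = [
    ["Jane Ann", "Smith", "(413) 222-2222", "78 Main Ct, Apt 2\tRear"],
    ["John", "Doe", "", 'He said "hi"\nthen left'],
]
assert loads(dumps(rows)) == rows
```

The catch is that the fields themselves must never contain 0x1E or 0x1F, and most editors and spreadsheets will not display or import such files without help, which may be part of why the format never caught on.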
> On 22 Oct 2017, at 12:22, Rich Roth wrote:
>
> Since I do a lot of text handling for a number of projects, I'll add a few
> more comments:
>
> 1) *programmed (perl,sed,script or saved regex) vs find & replace.*
> I find any repeatable method far better than find/replace for a number of
> reasons.
> David didn't mention whether this is a one-time need or a repeating one;
> clearly a repeating need calls for a saved method.
> Even with a one-time need, F/R has a fatal flaw if you pick a bad pattern
> or just mistype, whereas with a saved technique
> you can test your method until it's right.
> A few comments about unexpected variations in data reinforce this idea.
>
> 2) *Create tab-delimited vs CSV*
> He didn't say which spreadsheet; Excel and most others will accept
> tab-delimited files, and using tabs avoids a variety of bumps that
> extra commas produce.
> 3) *OpenRefine*
> Leave it to HT (thanks Steve) to add a tangential idea of use to others.  I
> am working on a variety of text processing tasks, using OCR and various
> scripts, and that looks to be a useful tool.
>
> In one case, I have processed some 15 data sources into a common display
> system for Shaker community members over the 200 years of 17 communities
> and some 15,000 members.  I am still working on an OCR of a 1970 microfilm
> data set of 16,000 more entries. You can see some of the results at:
> http://memoirs.shakerpedia.com/
> If there are any Shaker aficionados on HT, any help is welcome on that or
> http://shakerpedia.com/ in general.
>
> 4) If anyone has such conversion/scanning projects for community groups,
> especially historical societies, please contact me.
> We are now doing some work for ours: Historical Society of Greenfield.
>
> Good luck to David - Rich
>
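Rich's point 2 about tabs versus commas can be seen with Python's standard csv module. This is only a sketch: the "Last, First" name field is a hypothetical value chosen because it contains a comma, and render is a helper name invented here.

```python
import csv
import io

# Hypothetical row whose first field contains a comma.
row = ["Smith, Jane", "(413) 222-2222", "78 Main Ct", "Greenfield", "01301"]


def render(fields, delimiter):
    """Return one delimited line as the csv module would write it."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buffer.getvalue()


comma_line = render(row, ",")   # the first field has to be quoted
tab_line = render(row, "\t")    # no field contains a tab, so nothing is quoted
print(comma_line, end="")
print(tab_line, end="")
```

With commas the writer has to wrap the first field in double quotes, and whatever reads the file has to understand that quoting. With tabs every field is written as-is, as long as no field contains a tab itself.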
> On 10/22/2017 6:21 AM, Steven Brewer wrote:
>
> I see people have made all the obvious suggestions. Let me add that
> NeoOffice can do search and replace with regular expressions.
>
> But folks should also be aware of OpenRefine: It's a tool for taking
> messy data sets and cleaning them up. It's perhaps overkill for
> something like this, but maybe not: It has a bunch of tools for
> identifying classes of problems (like those that crop up with dodgy OCR)
> and being able to correct them all at once. It's worth being aware of
> anyway.
>
> Good luck!
>
> On 10/21/17 7:34 AM, David Greenberg wrote:
>
> I have a hard copy list of names, addresses and phone numbers. I can
> scan to PDF and then copy and paste to a text editor (BBEdit) or other
> file. I then need to manipulate the text so that I end up with a csv
> file that can be opened by a spreadsheet program. Tools that I have at
> my disposal include BBEdit (with Grep), a MAMP stack, NeoOffice (Mac
> version of OpenOffice) and FileMaker.
>
> Input looks like this:
>
> John Doe
> (413) 111-1111
> 123 First St Greenfield 01301
> Jane Smith
> 456 So Main Ln Greenfield 01301
> Jane Ann Smith(413) 222-2222
> 78 Main Ct Greenfield 01301
>
> Note that all addresses will include 'Greenfield 01301' and, *if* the
> data includes a phone number, it will start with '(413)'.
>
> Output should look like this:
>
> John,Doe,(413) 111-1111,123 First St,Greenfield,01301
> Jane,Smith,,456 So Main Ln,Greenfield,01301
> Jane Ann,Smith,(413) 222-2222,78 Main Ct,Greenfield,01301
>
> Any suggestions greatly appreciated. Thanks.
>
> David
>
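One way to approach David's conversion as a saved, testable script rather than a global search-and-replace is sketched below in Python. It is not anyone's posted solution. It assumes that every record ends with an address line ending in 'Greenfield 01301', that a phone number always has the form '(413) ddd-dddd', and that the last word of the name line is the surname. The function and pattern names are invented for illustration, and real OCR output will likely need more cases.

```python
import csv
import io
import re

# The patterns are anchored to where the strings are expected, so
# "Greenfield" or "(413)" appearing elsewhere is not touched.
PHONE = re.compile(r"\(413\) \d{3}-\d{4}")
ADDRESS = re.compile(r"^(?P<street>.+) (?P<city>Greenfield) (?P<zip>01301)$")


def parse(text):
    """Turn scanned lines into [first, last, phone, street, city, zip] rows."""
    rows = []
    name, phone = "", ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        address = ADDRESS.match(line)
        if address:
            # An address line closes the current record.
            first, _, last = name.rpartition(" ")
            rows.append([first, last, phone, address["street"],
                         address["city"], address["zip"]])
            name, phone = "", ""
            continue
        found = PHONE.search(line)
        if found:
            # The phone may sit on its own line or be glued to the name.
            phone = found.group()
            line = (line[:found.start()] + line[found.end():]).strip()
        if line:
            name = line
    return rows


def to_csv(rows):
    """Write the rows as CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


SAMPLE = """\
John Doe
(413) 111-1111
123 First St Greenfield 01301
Jane Smith
456 So Main Ln Greenfield 01301
Jane Ann Smith(413) 222-2222
78 Main Ct Greenfield 01301
"""

EXPECTED = """\
John,Doe,(413) 111-1111,123 First St,Greenfield,01301
Jane,Smith,,456 So Main Ln,Greenfield,01301
Jane Ann,Smith,(413) 222-2222,78 Main Ct,Greenfield,01301
"""

assert to_csv(parse(SAMPLE)) == EXPECTED
```

The final assertion checks the script against the sample input and output from the original post, so the patterns can be adjusted and re-run until the result is right before touching the full list.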
>
> _______________________________________________
> Hidden-discuss mailing list - home page: http://www.hidden-tech.net
> Hidden-discuss at lists.hidden-tech.net
>
> You are receiving this because you are on the Hidden-Tech Discussion list.
> If you would like to change your list preferences, Go to the Members
> page on the Hidden Tech Web site.
> http://www.hidden-tech.net/members
>
>
> --
> Rich Roth
> Webmaster/Steering Committee Member
> Hidden-tech http://www.hidden-tech.net
> The Talent you need is right here,
> Join and share your skills
> ((Sponsored by Thrives Media))
> http://www.thrivesmedia.com
> http://www.welovemuseums.com
>