[Hidden-tech] Text Manipulation Problem

Elijah Gwynn eli at egwynn.com
Mon Oct 23 15:07:31 EDT 2017


Deep down the rabbit whole of technicalities… these characters can be 
"typed" (tediously) by using your OS' "character picker" (or 
copied/pasted) and used in Excel as field delimiters. However, they 
can't be used as row delimiters. Excel can't seem to tolerate 
non-newline row delimiters when importing data. Indeed, the real 
"tragedy" I wanted to point out earlier is that a set of dedicated 
hierarchical delimiters was added to ASCII decades ago and now they're 
borderline useless. :-\

Eli

On 23 Oct 2017, at 14:24, Don Lesser wrote:

> Unless you can type them, you cant select them as delimiters in Excel. 
> The Text to Columns operation needs you to type in the delimiter in 
> order to recognize it.
>
>
>
> Don Lesser
>
> Pioneer Training, Inc.
>
> 139B Damon Road, Ste 8
>
> Northampton, MA 01060
>
> (413) 387-1040
>
>  <mailto:dlesser at ptraining.com> dlesser at ptraining.com
>
>  <http://www.ptraining.com/> www.ptraining.com
>
>
>
> From: hidden-discuss-bounces at lists.hidden-tech.net 
> [mailto:hidden-discuss-bounces at lists.hidden-tech.net] On Behalf Of 
> Elijah Gwynn
> Sent: Monday, October 23, 2017 9:33 AM
> To: Shel Horowitz <shel at principledprofit.com>
> Cc: Hidden-Tech Tech <hidden-discuss at lists.hidden-tech.net>
> Subject: Re: [Hidden-tech] Text Manipulation Problem
>
>
>
> ASCII’s FS, GS, RS, and US chars almost certainly won’t be in the 
> text you’re dealing with. They’re not printable so they’d look 
> really out of place in a text doc. That said, I don’t know if 
> they’re actually well enough supported (e.g., in Excel) to make 
> their use “a good idea”. The theoretical use-case would be that 
> you could use them instead of their CSV “counterpart” delimiters 
> (commas and newlines) in a program generating your new file, without 
> worrying about any commas/newlines in the source text screwing up your 
> table. Search/replace would probably not work with them.
>
>
>
> Eli
>
>
>
> Sent from my iPhone
>
>
> On Oct 23, 2017, at 7:31 AM, Shel Horowitz <shel at principledprofit.com 
> <mailto:shel at principledprofit.com> > wrote:
>
> But those all look like strings that might show up for other reasons, 
> which makes global search-and-replace not an option. How do yo work 
> around that?
>
>
>
>
>
> Shel Horowitz - "The Transformpreneur"(sm)
>
> ________________________________________________
>
> Watch (and please share) my TEDx Talk,
>
> "Impossible is a Dare: Business for a Better World"
>
>  <http://www.ted.com/tedx/events/11809> 
> http://www.ted.com/tedx/events/11809 
> <http://www.ted.com/tedx/events/11809>
>
> (move your mouse to "event videos")
>
>
>
> Contact me to bake in profitability while addressing hunger,
>
> poverty, war, and catastrophic climate change
>
>
>
> Twitter: @shelhorowitz
>
>
>
> * First business ever to be Green America Gold Certified
>
> * Inducted into the National Environmental Hall of Fame
>
>
>
>  <http://goingbeyondsustainability.com/> 
> http://goingbeyondsustainability.com
>
>  <http://transformpreneur.com/> http://transformpreneur.com
>
> mailto: <mailto:shel at greenandprofitable.com> 
> shel at greenandprofitable.com *  <tel:413-586-2388> 413-586-2388
>
> Award-winning, best-selling author of 10 books. Latest:
>
> Guerrilla Marketing to Heal the World (co-authored with Jay Conrad 
> Levinson)
>
>
>
> _________________________________________________
>
>
>
> On Sun, Oct 22, 2017 at 2:39 PM, Elijah Gwynn <eli at egwynn.com 
> <mailto:eli at egwynn.com> > wrote:
>
> Regarding tabs-vs-commas: it's a real tragedy that more programs don't 
> make use of any of the four ASCII delimiter characters ( 
> <https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text> 
> https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text) that 
> have been available since ASCII-1965. The whole world of 
> character-escaping problems that we programmers deal with in order to 
> support CSV/TSV could have been avoided!
>
> Eli
>
> On 22 Oct 2017, at 12:22, Rich Roth wrote:
>
> Since I do a lot of text handling for a number of projects, I'll add a 
> few more comments:
>
> 1) programmed (perl,sed,script or saved regex) vs find & replace.
> I find any repeatable method far better then find/replace for a number 
> of reasons.
> David didn't mention if this is a one time need or repeating, clearly 
> repeating requires more of a saved method.
> Even with a one time need, F/R has a fatal flaw if you pick a bad 
> pattern or just mis-type, while using a saved technique
> you can test your method until it's right.
> A few comments about unexpected variations in data re-enforce this 
> idea.
>
> 2) Create Tab delimited vs CSV
> He didn't say which spreadsheet, Excel and most others will accept tab 
> delimited and using tabs does reduce a variety of bumps that extra 
> commas produce.
>
> 3) OpenRefine
> Leave it to HT (thanks Steve) to add a tangent idea of use to others.  
> I am working on a variety of text processing tasks, using OCR and 
> various scripts and that looks to be a useful tools.
>
> In one case, I have processed some 15 data sources into a common 
> display system for Shaker community members over the 200 years of 17 
> communities and some 15,000 members.  I still am working on a OCR of a 
> 1970 microfilm data set of 16,000 more entries. You can see some of 
> the results at: http://memoirs.shakerpedia.com/
> If any Shaker aficionados on HT, any help is welcome on that or 
> http://shakerpedia.com/ in general.
>
> 4) If anyone has such conversion/scanning projects for community 
> groups, esp historical society, please contact me.
> We are now doing some work for ours: Historical society of Greenfield.
>
> Good luck to David - Rich
>
> On 10/22/2017 6:21 AM, Steven Brewer wrote:
>
> I see people have made all the obvious suggestions. Let me add that
> NeoOffice can do search and replace with regular expressions.
>
> But folks should also be aware of OpenRefine: It's a tool for taking
> messy data sets and cleaning them up. It's perhaps overkill for
> something like this, but maybe not: It has a bunch of tools for
> identifying classes of problems (like those that crop up with dodgy 
> OCR)
> and being able to correct them all at once. It's worth being aware of
> anyway.
>
> Good luck!
>
> On 10/21/17 7:34 AM, David Greenberg wrote:
>
> I have a hard copy list of names, addresses and phone numbers. I can
> scan to PDF and then copy and paste to a text editor (BBEdit) or other
> file. I then need to manipulate the text so that I end up with a csv
> file that can be opened by a spreadsheet program. Tools that I have at
> my disposal include BBEdit (with Grep), a MAMP stack, NeoOffice (Mac
> version of OpenOffice) and FileMaker.
>
> Input looks like this:
>
> John Doe
> (413) 111-1111
> 123 First St Greenfield 01301
> Jane Smith
> 456 So Main Ln Greenfield 01301
> Jane Ann Smith
> (413) 222-2222 <tel:(413)%20222-2222>
> 78 Main Ct Greenfield 01301
>
> Note that all addresses will include 'Greenfield 01301' and, /if /the
> data includes a phone number, it will start with '(413)'.
>
> Output should look like this:
>
> John,Doe,(413) 111-1111,123 First St,Greenfield,01301
> Jane,Smith,,456 So Main Ln,Greenfield,01301
> Jane Ann,Smith,(413) 222-2222,78 Main Ct,Greenfield,01301
>
> Any suggestions greatly appreciated. Thanks.
>
> David
>
>
> _______________________________________________
> Hidden-discuss mailing list - home page: http://www.hidden-tech.net
> Hidden-discuss at lists.hidden-tech.net 
> <mailto:Hidden-discuss at lists.hidden-tech.net>
>
> You are receiving this because you are on the Hidden-Tech Discussion 
> list.
> If you would like to change your list preferences, Go to the Members
> page on the Hidden Tech Web site.
> http://www.hidden-tech.net/members
>
>
>
>
>
>
> -- 
> Rich Roth
> Webmaster/Steering Committee Member
> Hidden-tech http://www.hidden-tech.net
> The Talent you need is right here,
> Join and share your skills
> ((Sponsored by Thrives Media))
> http://www.thrivesmedia.com
> http://www.welovemuseums.com
>
> _______________________________________________
> Hidden-discuss mailing list - home page:  <http://www.hidden-tech.net> 
> http://www.hidden-tech.net
> Hidden-discuss at lists.hidden-tech.net 
> <mailto:Hidden-discuss at lists.hidden-tech.net>
>
> You are receiving this because you are on the Hidden-Tech Discussion 
> list.
> If you would like to change your list preferences, Go to the Members
> page on the Hidden Tech Web site.
>  <http://www.hidden-tech.net/members> 
> http://www.hidden-tech.net/members
>
>
> _______________________________________________
> Hidden-discuss mailing list - home page: http://www.hidden-tech.net
> Hidden-discuss at lists.hidden-tech.net 
> <mailto:Hidden-discuss at lists.hidden-tech.net>
>
> You are receiving this because you are on the Hidden-Tech Discussion 
> list.
> If you would like to change your list preferences, Go to the Members
> page on the Hidden Tech Web site.
> http://www.hidden-tech.net/members


-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.hidden-tech.net/pipermail/hidden-discuss/attachments/20171023/953b229d/attachment-0001.html 


Google

More information about the Hidden-discuss mailing list