[Hidden-tech] Text Manipulation Problem

Don Lesser dlesser at ptraining.com
Mon Oct 23 14:24:26 EDT 2017


Unless you can type them, you cant select them as delimiters in Excel. The Text to Columns operation needs you to type in the delimiter in order to recognize it.

 

Don Lesser

Pioneer Training, Inc.

139B Damon Road, Ste 8

Northampton, MA 01060

(413) 387-1040

 <mailto:dlesser at ptraining.com> dlesser at ptraining.com

 <http://www.ptraining.com/> www.ptraining.com

 

From: hidden-discuss-bounces at lists.hidden-tech.net [mailto:hidden-discuss-bounces at lists.hidden-tech.net] On Behalf Of Elijah Gwynn
Sent: Monday, October 23, 2017 9:33 AM
To: Shel Horowitz <shel at principledprofit.com>
Cc: Hidden-Tech Tech <hidden-discuss at lists.hidden-tech.net>
Subject: Re: [Hidden-tech] Text Manipulation Problem

 

ASCII’s FS, GS, RS, and US chars almost certainly won’t be in the text you’re dealing with. They’re not printable so they’d look really out of place in a text doc. That said, I don’t know if they’re actually well enough supported (e.g., in Excel) to make their use “a good idea”. The theoretical use-case would be that you could use them instead of their CSV “counterpart” delimiters (commas and newlines) in a program generating your new file, without worrying about any commas/newlines in the source text screwing up your table. Search/replace would probably not work with them.

 

Eli

 

Sent from my iPhone


On Oct 23, 2017, at 7:31 AM, Shel Horowitz <shel at principledprofit.com <mailto:shel at principledprofit.com> > wrote:

But those all look like strings that might show up for other reasons, which makes global search-and-replace not an option. How do yo work around that?





Shel Horowitz - "The Transformpreneur"(sm)

________________________________________________

Watch (and please share) my TEDx Talk,

"Impossible is a Dare: Business for a Better World"

 <http://www.ted.com/tedx/events/11809> http://www.ted.com/tedx/events/11809 <http://www.ted.com/tedx/events/11809>  

(move your mouse to "event videos")

 

Contact me to bake in profitability while addressing hunger, 

poverty, war, and catastrophic climate change

 

Twitter: @shelhorowitz

 

* First business ever to be Green America Gold Certified

* Inducted into the National Environmental Hall of Fame

 

 <http://goingbeyondsustainability.com/> http://goingbeyondsustainability.com 

 <http://transformpreneur.com/> http://transformpreneur.com

mailto: <mailto:shel at greenandprofitable.com> shel at greenandprofitable.com *  <tel:413-586-2388> 413-586-2388

Award-winning, best-selling author of 10 books. Latest:

Guerrilla Marketing to Heal the World (co-authored with Jay Conrad Levinson)

 

_________________________________________________

 

On Sun, Oct 22, 2017 at 2:39 PM, Elijah Gwynn <eli at egwynn.com <mailto:eli at egwynn.com> > wrote:

Regarding tabs-vs-commas: it's a real tragedy that more programs don't make use of any of the four ASCII delimiter characters ( <https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text> https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text) that have been available since ASCII-1965. The whole world of character-escaping problems that we programmers deal with in order to support CSV/TSV could have been avoided!

Eli

On 22 Oct 2017, at 12:22, Rich Roth wrote:

Since I do a lot of text handling for a number of projects, I'll add a few more comments:

1) programmed (perl,sed,script or saved regex) vs find & replace.
I find any repeatable method far better then find/replace for a number of reasons.
David didn't mention if this is a one time need or repeating, clearly repeating requires more of a saved method.
Even with a one time need, F/R has a fatal flaw if you pick a bad pattern or just mis-type, while using a saved technique
you can test your method until it's right.  
A few comments about unexpected variations in data re-enforce this idea.

2) Create Tab delimited vs CSV
He didn't say which spreadsheet, Excel and most others will accept tab delimited and using tabs does reduce a variety of bumps that extra commas produce.

3) OpenRefine
Leave it to HT (thanks Steve) to add a tangent idea of use to others.  I am working on a variety of text processing tasks, using OCR and various scripts and that looks to be a useful tools.  

In one case, I have processed some 15 data sources into a common display system for Shaker community members over the 200 years of 17 communities and some 15,000 members.  I still am working on a OCR of a 1970 microfilm data set of 16,000 more entries. You can see some of the results at: http://memoirs.shakerpedia.com/
If any Shaker aficionados on HT, any help is welcome on that or http://shakerpedia.com/ in general.

4) If anyone has such conversion/scanning projects for community groups, esp historical society, please contact me.
We are now doing some work for ours: Historical society of Greenfield.

Good luck to David - Rich

On 10/22/2017 6:21 AM, Steven Brewer wrote:

I see people have made all the obvious suggestions. Let me add that
NeoOffice can do search and replace with regular expressions.
 
But folks should also be aware of OpenRefine: It's a tool for taking
messy data sets and cleaning them up. It's perhaps overkill for
something like this, but maybe not: It has a bunch of tools for
identifying classes of problems (like those that crop up with dodgy OCR)
and being able to correct them all at once. It's worth being aware of
anyway.
 
Good luck!
 
On 10/21/17 7:34 AM, David Greenberg wrote:

I have a hard copy list of names, addresses and phone numbers. I can
scan to PDF and then copy and paste to a text editor (BBEdit) or other
file. I then need to manipulate the text so that I end up with a csv
file that can be opened by a spreadsheet program. Tools that I have at
my disposal include BBEdit (with Grep), a MAMP stack, NeoOffice (Mac
version of OpenOffice) and FileMaker.
 
Input looks like this:
 
John Doe
(413) 111-1111
123 First St Greenfield 01301
Jane Smith
456 So Main Ln Greenfield 01301
Jane Ann Smith
(413) 222-2222 <tel:(413)%20222-2222> 
78 Main Ct Greenfield 01301
 
Note that all addresses will include 'Greenfield 01301' and, /if /the
data includes a phone number, it will start with '(413)'.
 
Output should look like this:
 
John,Doe,(413) 111-1111,123 First St,Greenfield,01301
Jane,Smith,,456 So Main Ln,Greenfield,01301
Jane Ann,Smith,(413) 222-2222,78 Main Ct,Greenfield,01301
 
Any suggestions greatly appreciated. Thanks.
 
David
 
 
_______________________________________________
Hidden-discuss mailing list - home page: http://www.hidden-tech.net
Hidden-discuss at lists.hidden-tech.net <mailto:Hidden-discuss at lists.hidden-tech.net> 
 
You are receiving this because you are on the Hidden-Tech Discussion list.
If you would like to change your list preferences, Go to the Members   
page on the Hidden Tech Web site.
http://www.hidden-tech.net/members
 





-- 
Rich Roth
Webmaster/Steering Committee Member
Hidden-tech http://www.hidden-tech.net
The Talent you need is right here,
Join and share your skills
((Sponsored by Thrives Media))
http://www.thrivesmedia.com
http://www.welovemuseums.com

_______________________________________________
Hidden-discuss mailing list - home page:  <http://www.hidden-tech.net> http://www.hidden-tech.net
Hidden-discuss at lists.hidden-tech.net <mailto:Hidden-discuss at lists.hidden-tech.net> 

You are receiving this because you are on the Hidden-Tech Discussion list.
If you would like to change your list preferences, Go to the Members
page on the Hidden Tech Web site.
 <http://www.hidden-tech.net/members> http://www.hidden-tech.net/members


_______________________________________________
Hidden-discuss mailing list - home page: http://www.hidden-tech.net
Hidden-discuss at lists.hidden-tech.net <mailto:Hidden-discuss at lists.hidden-tech.net> 

You are receiving this because you are on the Hidden-Tech Discussion list.
If you would like to change your list preferences, Go to the Members
page on the Hidden Tech Web site.
http://www.hidden-tech.net/members

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.hidden-tech.net/pipermail/hidden-discuss/attachments/20171023/8c99dec8/attachment-0001.html 


Google

More information about the Hidden-discuss mailing list