Ok Excel boffins, thinking caps on please

babs

Can't get enough of FH
Joined
Dec 30, 2003
Messages
1,595
I have a series of large text files holding details about organisations and businesses (they're basically search dumps from a website) which I want to parse into a spreadsheet ideally (database at a push). When saved as a text file each record on the page appears like this..
Code:
*ACNE SUPPORT GROUP* *Address: *P.O.Box 9, NEWQUAY, TR9 6WG. *Tel: *08708 xxxxxx. *Email: *xxxxxxxx@btopenworld.com. *Website: *www.stopspots.org. *LAST UPDATED:* 16/01/2007 ------------------------------------------------------------------------

Every record is essentially the same, but some do not have a contact name, and some don't have an email address etc.

Is there a semi-easy way to parse this information into a spreadsheet, using column headers such as company, contact, address, etc.? I imagine we're talking some form of macro/scripting, but as familiar as I am with Excel I've never got into that side of things.

Any ideas then to save me from inane googling and headaches? My best thought so far was to use a find/replace to replace the ------------ with a carriage return, and read it in as a text file using * as a delimiter and then do some manual tidying :(
 

MYstIC G

Official Licensed Lump of Coal™ Distributor
Staff member
Moderator
FH Subscriber
Joined
Dec 22, 2003
Messages
12,362
My quick and jiffy hack:
  • Import into word
  • Replace the long dash with a paragraph mark
  • Convert text to table using * as the separator
  • Cut & paste into Excel
  • Delete erroneous columns and add headers.
Hope that's of some use.
 

babs

Can't get enough of FH
Joined
Dec 30, 2003
Messages
1,595
My quick and jiffy hack:
  • Import into word
  • Replace the long dash with a paragraph mark
  • Convert text to table using * as the separator
  • Cut & paste into Excel
  • Delete erroneous columns and add headers.
Hope that's of some use.
Yup, during the day I've done exactly that too :)

The problem is the records have varying numbers of fields, so sometimes where there are gaps some columns no longer line up, as the data gets shunted to the left where you would want a blank cell to appear. Sure if it was a small number this would be doable by hand, but we're talking about files which will have 10,000+ records.

Really I'd want something which read in the field names and entered the data into the columns resepctively, leaving blanks where there was no data.

Thanks though :)
 

MYstIC G

Official Licensed Lump of Coal™ Distributor
Staff member
Moderator
FH Subscriber
Joined
Dec 22, 2003
Messages
12,362
How are the text files created? Can you not force that to produce a record with a set character when there is no specific data entered?
 

babs

Can't get enough of FH
Joined
Dec 30, 2003
Messages
1,595
It's an HTML page (created in asp) which dumps the results as a page of text. There's no choice over the output type.
 

MYstIC G

Official Licensed Lump of Coal™ Distributor
Staff member
Moderator
FH Subscriber
Joined
Dec 22, 2003
Messages
12,362
Is the HTML source code any better?
 

smurkin

Can't get enough of FH
Joined
Dec 22, 2003
Messages
560
You can import text to columns using Data>Import External Data> import data > wizrd starts [point to txt file or other format} > delimited text [here you specify how the columns will be defined, eg with tabs, colons, commas, whatever] > in the next wizzard page you can have a text qualifier > then to column format.

This is all very useful, but your main problem is sometime a field will be missing so that all of the columns to the right will be out of alignment. I'm willing to bet you can work around this by using the text qualifier (in your example, try "*" and to sacrify some of the extra fields, pushing the the extra into the previous column [idf that makes sense]

I guess what I'm saying is just have a play.
 

Jupitus

Old and short, no wonder I'm grumpy!
Staff member
Moderator
FH Subscriber
Joined
Dec 14, 2003
Messages
3,285
If it's not too late PM me and I'll let you mail the file to me to try and do this... I've done similar stuff alot in the past...
 

Users who are viewing this thread

Top Bottom