Dave's FileMaker® Pro Newsletter
FileMaker Data Parsing:
Separation, Commonality, and Distinction

by Dave Dowling, FileMaker database consultant

<< Page 2 | Page 4 >>

Distinction: What does the target field have that no other fields have?

In our example, the "target field" is the First Name field. This is the field we want to isolate so that we can add it to our form letter's salutation line. Here's our example database again:

 
FileMaker Pro Consulting & Training
 
 

· FileMaker Consulting, Training & Programming

 
 

· 15+ Years' Experience

 
 

· Expert FileMaker Training Online

 
  · FileMaker Calendar Solution free to download and try  
spacer
  Contact us for a free, no obligation consultation  
spacer

"John", "Sanders", "555-1234"
"George", "Bailey", ""
"Melissa", "Smith Waters", "555-7890"
"Billy Bob", "", "555-5678"

Let's look at another view of the database; a view that displays invisible data. When I say "invisible", I'm talking about carriage returns, as well as the beginning and end of the database. In this new view of the database, the beginning, or top, of the database will be represented by "<TOP>", carriage returns will be represented by "<CR>", and the end of the database will be represented by "<END>". Note that these are representations of invisible items, or those items that are not evident in a standard CSV data set.

<TOP>
"John", "Sanders", "555-1234" <CR>
"George", "Bailey", "" <CR>
"Melissa", "Smith Waters", "555-7890" <CR>
"Billy Bob", "", "555-5678" <END>

Now let's look at a third view of the database; a delimiter-only view. This view contains only our separators, both visible and invisible.

<TOP>
, , <CR>
, , <CR>
, , <CR>
, , <END>

Notice that all of the records have some things in common:

1. Every record has two commas in it. This tells us that each record has three fields, since each field is separated by a comma.
2. Every record ends in either a carriage return <CR> or the end of the database <END>.
3. Every record also begins in either a carriage return <CR> or the top of the database <TOP>. Here's another view that makes this clear (keep in mind that each <CR> marks the end of one record as well as the beginning of the next):

<TOP> , ,
<CR> , ,
<CR> , ,
<CR> , ,
<END>

Now that we've displayed the "skeleton" of our CSV database, we can see that each record's delimiters form identical patterns, and that those patterns are reliable throughout the database. The only pattern breakers are the beginning and end of the database, where our <CR>s do not appear. No problem there, though, since we have <TOP> and <END>, which can act as delimiters for our first and last records.

"What does this have to do with the concept of distinction?", you might ask. I'm glad you asked! Since we can identify identical repeating patterns in each record (our <CR>s and commas), we can easily identify any field by its position among the delimiters in a record. Here's a look at a sample record from our test database, again with invisibles displayed:

"Melissa", "Smith Waters", "555-7890" <CR>

Let's say we want to isolate field 3 in this record, which is the Phone field. Count all of the commas in this record which precede the Phone field. I perceive you will answer that there are two commas... and you are correct! Now count the commas that follow the Phone field and that precede the end of the record. Again, I perceive that you have guessed that zero commas follow the Phone field. Correct again! What does this tell us? I'll answer this one: field 3 of every record will always be preceded by two commas, and will always be followed directly by a <CR>.

What does all of this mean? It means that we can always locate field 3 in any record in the database by finding the second comma in the record. Everything between the second comma and the end of the record is part of the Phone field. Similarly, we know that field 2, the Last Name field, will always be found after the first comma, and will be followed by one comma before the end of the record. Finally, field 1, the First Name field, will have no preceding commas, and will always be followed by two commas before the end of the record.

Now, if you've had your Wheaties, or if you're just having a good brain day, you might be thinking, "What if there are commas in the field itself -- won't this change the number of commas in the record?" Well, there are ways CSV gets around commas, and even extra quotation marks, but that goes beyond where we want to go in this article. It's highly unlikely that you'll have to parse any CSV, since any good database application should handle CSV parsing for you in its file import functions. The reason we've been looking at CSV together is that it's a very common format, and it's a good vehicle for demonstrating the basics of parsing. We're about to move on from CSV to a couple of typical parsing examples that you might encounter in your database programming experience.

<< Page 2 | Page 4 >>

Copyright 2003, Dave Dowling. All rights reserved.

 
 
 
  FileMaker® Pro Data Parsing
Convert that tangled web into usable data!
 
FileMaker Web Publishing
CDML, Lasso (LDML) & Instant Web Publishing (IWP)