Pick For Professionals, Part 5: The ABC's of GFE's
Jonathan E. Sisk
In future releases of AP, specifically releases 6.1 and higher with the new "Halt Tolerance" feature, Group Format Errors will be a thing of the past. In the present, however, they still turn up from time to time.
GFE's happen for a reason. As a "software" kind of guy, I can assure you that they are usually caused by hardware. But power ranks way up there, too. If power drops to a busy Pick System - and there is no battery backup (UPS) - you are practically guaranteed to have GFE's when you get the system back up.
I have heard the scheme by which Pick Systems 6.1 release eliminates GFE's, and it is mostly the power-related types that are "eliminated", but this is a separate story.
This article deals with what happens when a GFE occurs in AP systems which do not yet have "Halt Tolerance." Bear in mind that there are technically several dozen types of GFE's, ranging from simple ones where the byte count field (a counter embedded in the item header that indicates the length of the item) is for some reason invalid, to the worst type: a bad linkage field (when a frame points to an invalid frame, or worse, to a frame that does not exist.)
A lot of work went into the "Automatic GFE Handler" in Advanced Pick. This routine does its best to determine the type of GFE and then attempts to fix it. Unfortunately, it's not quite that simple. Letting the system "fix" a GFE usually results in the loss of at least one item, and perhaps a bunch of items, depending upon where it occurs.
The worst case scenario is getting a GFE in the mds file, where each "item" represents an entire account, meaning that the loss of any item means the loss of an entire account. We have seen this happen. Ironically - and most recently - this was caused by a software glitch, but we have reported this to Pick and it will undoubtedly be corrected in a future release.
Locating Suspected GFE's:
The only certain method of determining if your system has GFE's is to perform a "fake save". We recommend that you do this after any situation in which your system is taken down abruptly, that is, without the chance to go through the normal shutdown sequence. This is very easy and uses the TCL command:
:save (f
The save command walks through the entire file system and checks the integrity of every item in every file. The "f" option displays the file names as they are checked. Without the "f" option, only the account names are displayed, limiting the usefulness of the procedure.
1 2 mds [The mds file. If it's here, good luck.]
0 123 mds > production [Account MD for "production"]
0 128 mds > production > orders [Account = "production", file = DICT orders]
0 129 mds > production > orders > orders [Same account, file = DATA orders]
0 130 mds > production > orders > archive [Same account, file = DATA orders,archive]
0 131 mds > production > invoices
0 132 mds > production > invoices > invoices
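The bracketed notes in the listing follow the depth of the path, so the level of a file can be worked out mechanically. The following is a minimal Python sketch, assuming each listing line looks like the sample above (two leading numbers, then names separated by " > "). The function name describe_level is hypothetical and is not part of Pick.

```python
def describe_level(line):
    """Describe which file level a fake-save listing line refers to.

    Assumes the layout of the sample listing: two leading numbers,
    then names separated by " > ", optionally followed by a [note].
    """
    text = line.split("[", 1)[0].strip()        # drop any bracketed note
    fields = text.split(None, 2)                # two numbers, then the path
    names = [name.strip() for name in fields[2].split(">")]
    if len(names) == 1:
        return "the mds file itself"
    account = names[1]
    if len(names) == 2:
        return f"account MD for {account}"
    dict_name = names[2]
    if len(names) == 3:
        return f"account {account}, DICT {dict_name}"
    data_name = names[3]
    if data_name == dict_name:
        return f"account {account}, DATA {dict_name}"
    return f"account {account}, DATA {dict_name},{data_name}"


print(describe_level("0 130 mds > production > orders > archive"))
```

If the last line displayed before the GFE prompt has four names, the damage is in a data section; if it has only one, it is in the mds.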
If no GFE's are encountered, you are in luck, and it's a fairly safe bet that your system is intact. Otherwise, if a GFE is found, you automatically enter:
The Automatic GFE Handler:
When a GFE is encountered, several things occur invisibly before reporting the GFE to the lucky recipient (whoever happens to stumble upon it in the course of wandering through or updating a file).
First, and this is important, a "snapshot" (just like doing a "dump" of the frame; both are discussed later in this article) is placed into a data section of the errors,gfe file using an item-id consisting of a system date/time stamp. This is effectively a "before" picture of the actual frame in which the GFE is found.
Next, an item is placed into the normal data section of the errors file. This item can be reviewed using the list-errors command. Note that this command is actually a "pause-type" macro and when executed, displays the actual TCL command that it is about to execute. This means that you could go to the end of the TCL command, using a Ctrl+g and add " (p" or " lptr", which would provide you with a hard-copy.
:list-errors
:sort dm,errors, by-dsnd date by-dsnd stime timdat r error.message last-tcl-entry userpib abs-vers-pgm id-supp dbl-spc (p
Page 1 dm,errors, 09:01:14 02 Jun 1994
time. r error.message................. last tcl entry... pib/user md. abs-fid. date abs-date pgm-ctr
09:00 gfe count save (f 12 jes prod 000018 06/02 Group FID Error FID Frwrd FID 01/12/94 004488 004488 000000 2FE.3D7
Probably the two most important pieces of information provided in the sample output are the type of GFE (a bad count field, in this case), and the location (frame "4488" in this case), which is the hexadecimal address of the frame in which the GFE occurred. The hexadecimal frame address can be converted to its decimal equivalent using the "xtd" command at TCL:
:xtd 4488
17544
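If you want to double-check the arithmetic away from the Pick system, the same conversion is a one-liner in most languages. Here is a small Python sketch; the helper names are hypothetical and only mimic what the "xtd" command reports.

```python
def hex_to_decimal(hex_text):
    """Convert a hexadecimal frame address to decimal."""
    return int(hex_text, 16)


def decimal_to_hex(number):
    """Convert a decimal frame address back to uppercase hexadecimal."""
    return format(number, "X")


# 4*4096 + 4*256 + 8*16 + 8 = 17544
print(hex_to_decimal("4488"))
print(decimal_to_hex(17544))
```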
Who Should Be Permitted to Fix a GFE?
Before we go any further in discussing the ramifications of dealing with GFE's, we should take a side trip to the users file, which contains an attribute called "options". This can be reviewed or changed from the dm account using the "u" (or "update") command:
:u users user-id name options privilege
users 'user-id' size = nnn
name user's name
options
privilege sys2
The options attribute allows a variety of single alphabetic codes, one of which deals with fixing GFE's. This is the "g" option. If a user's options attribute contains a "g", they are not allowed to "fix" GFE's. This is a good idea for the majority of users on your system. Only someone who is trained in GFE repair - by virtue of having attended a class, learning it on their own, or having at least skimmed this article - should be allowed to do this. Note that the default setting for users does not contain the "g" option.
To place a "g" in the options attribute, press [Return] to advance to the options attribute, type the letter g, then Ctrl+xf to file the item. (Note that you can also use the good old "ED" command and do this the old-fashioned way. Options are in attribute 9.) Now, back to our regularly scheduled programming.
Okay, We Found a GFE. Now What Happens?
When a GFE is encountered, one of two prompts appears, according to the user's option code. If they have the letter "g", they get the "No Fixing Allowed" prompt. If they don't have the "g", they get the "Way Dangerous" prompt.
The "No Fixing Allowed" Prompt:
**** GFE encountered @gfe.sr - Bad count field
If the user privilege level is less than 2 (meaning either 1 or 0), that's all they see. They immediately return to TCL - and probably call you. If their privilege level is 2, they get the additional prompt:
**** GFE encountered @gfe.sr - Bad count field
O=log off/C=continue/Q=Quit?
You get to decide the action you want them to take, but be advised that the "c" response to this prompt usually does not work.
If the user privilege level is 2, and the "g" option is not set, the lucky GFE-winner receives:
The "Way-Dangerous" Prompt:
This is the prompt that allows the GFE handler to resolve the GFE:
**** GFE encountered @gfe.sr - Bad count field
F-Fix item/T=Truncate group & quit/Q=just quit/=go to debugger?
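The rules for who sees which prompt can be summarized as a small decision function. This is only a Python sketch of the rules as described in this article, not the handler's actual logic. The function name and the numeric privilege argument are assumptions, and the one combination the article does not cover (no "g" with a privilege level below 2) is deliberately left undecided.

```python
def gfe_prompt(options, privilege):
    """Return what a user sees after the GFE message, per the rules above.

    options   -- contents of the user's options attribute, e.g. "g"
    privilege -- privilege level as a number (0, 1 or 2); an assumption,
                 since the users file shows it as text such as sys2
    """
    if "g" in options:
        # The "No Fixing Allowed" case.
        if privilege < 2:
            return "message only, then back to TCL"
        return "O=log off/C=continue/Q=Quit?"
    if privilege == 2:
        # The "Way Dangerous" case.
        return "F-Fix item/T=Truncate group & quit/Q=just quit/=go to debugger?"
    return None  # not described in this article


print(gfe_prompt("g", 1))
print(gfe_prompt("", 2))
```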
This is where the fun begins. Before getting into the next phase of what you should do here, I would like to stress something you should not do, namely, restore your old file-save at this point. It still surprises me how many people are simply told - usually by their software vendor or some other harried soul who just wants the user off their back for a while - that when they get a GFE they should just restore from the last file-save. This is kind of drastic, and, while it's not entirely out of the question, it may be overkill to simply abandon all hope and lose everything you've done since the last save.
A number of decisions need to be made before answering this question. The first, and most important decision is whether or not the data contained in the frame with the GFE is on the most-recent file-save tape. If it is, we usually just let the GFE Handler "fix" the GFE and then sel-restore the file from the last file-save. Otherwise, it's decision time.
Getting a "Hard-Copy" of the Frame in Which the GFE Occurred:
Even if you intend to let the system "fix" the GFE, this step is a good idea. At TCL - on a different port than the one with the GFE message - issue the command:
:sort-item dm,errors,gfe by-dsnd a0 (p
Page 1 errors,gfe 10:14:32 02 Jun 1994
965032436
001 0000 00000000 11001001 6C697374 FE7661FE :........list^va^:
002 0010 61FD3537 FEFF0000 00001500 1001736F :a]57^_........so:
003 0020 7274FE76 61FE61FD 3564FEFF 00000000 :rt^va^a]5d^_....:
004 0030 17001001 636F756E 74FE7662 FE61FD35 :....count^vb^a]5:
005 0040 63FEFFFF FFFFFFFF FFFFFFFF FFFFFFFF :c^______________:
This provides a hard-copy of the "before" snapshot of the frame that the GFE Handler placed there automatically, as described earlier in this article.
Note that the frame-id address ("fid") of the frame in which the GFE occurred is not part of the display and that the actual display you get will contain more stuff than is shown here. You can confirm that the frame-id derived earlier in this article (x'4488' or d'17544') is the same by using the dump command as follows:
:dump xp 17544
or
:dump xp .4488
fid: 17544 : 0 0 0 0 ( 4488 : 0 0 0 0 )
0000 00000000 11001001 6C697374 FE7661FE 000 :........list^va^:
0010 61FD3537 FEFF0000 00001500 1001736F 016 :a]57^_........so:
0020 7274FE76 61FE61FD 3564FEFF 00000000 032 :rt^va^a]5d^_....:
0030 17001001 636F756E 74FE7662 FE61FD35 048 :....count^vb^a]5:
0040 63FEFFFF FFFFFFFF FFFFFFFF FFFFFFFF 064 :c^______________:
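The right-hand column of the dump is simply the same bytes shown as characters, with stand-ins for anything unprintable. The Python sketch below reproduces that column. The display characters for FF, FE and FD are inferred from the sample dumps in this article, and the function name is hypothetical.

```python
# Display characters inferred from the sample dumps in this article.
MARKS = {0xFF: "_", 0xFE: "^", 0xFD: "]"}


def ascii_column(data):
    """Render bytes the way the right-hand column of the dump shows them."""
    chars = []
    for byte in data:
        if byte in MARKS:
            chars.append(MARKS[byte])      # high delimiter bytes
        elif 0x20 <= byte <= 0x7E:
            chars.append(chr(byte))        # printable ASCII
        else:
            chars.append(".")              # everything else
    return "".join(chars)


# First row of the sample frame.
row = bytes.fromhex("00000000 11001001 6C697374 FE7661FE")
print(ascii_column(row))
```

Being able to read the hex side this way helps when the character column alone is not enough to recognize the data.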
Decision Time
In comparing the two printouts, we see that the data shown on the rightmost column (which contains the "ASCII", or human-readable portion) is the same. This is a good sign. It means that no additional GFE's have been encountered since we began this article.
The next step - and this is one that is not so easy - is to decide which file the data belongs to. Given essentially a one-frame sampling, you need to be able to recognize the data before making the decision to let the system fix the GFE. If the GFE was encountered during the "fake save", the filename in which the GFE was found should be the last one before the GFE Handler prompt. See the notes next to the filenames listed under the "fake save" for determining which level of a file the GFE is in. And pray that it's not your mds.
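Comparing two printouts by eye works for a handful of lines, but it gets tedious fast. If both listings are captured as text, the comparison can be automated. This is a minimal Python sketch, assuming the two layouts shown above (eight-digit uppercase hex words, with the character column starting at " :"); the function names are hypothetical.

```python
import re

# An eight-digit hex word such as 6C697374; offsets like 0010 are too short.
HEX_WORD = re.compile(r"\b[0-9A-F]{8}\b")


def frame_bytes(listing):
    """Collect the frame contents from hex-dump lines as a bytes object."""
    data = bytearray()
    for line in listing.splitlines():
        hex_part = line.split(" :", 1)[0]      # ignore the character column
        for word in HEX_WORD.findall(hex_part):
            data += bytes.fromhex(word)
    return bytes(data)


def first_difference(before, after):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (old, new) in enumerate(zip(before, after)):
        if old != new:
            return offset
    if len(before) != len(after):
        return min(len(before), len(after))
    return None


snapshot = "001 0000 00000000 11001001 6C697374 FE7661FE :........list^va^:"
dumped = "0000 00000000 11001001 6C697374 FE7661FE 000 :........list^va^:"
print(first_difference(frame_bytes(snapshot), frame_bytes(dumped)))
```

A result of None means the two pictures match; any number is the byte offset where they first disagree.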
The guideline we usually go by is whether it occurred in a dynamic file, meaning a file that we are constantly updating, or in a (relatively) static file, meaning one that doesn't change too often (like the documentation file). If it's a static file, we generally let the system fix it. If it's a dynamic file, then my staff usually calls me to get in there and fix it.
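Put together with the earlier question about the file-save tape, our rule of thumb boils down to two yes/no questions. The Python sketch below is just that rule of thumb written out; the function name and the wording of the answers are mine, and it is no substitute for actually looking at the frame.

```python
def recommend_action(on_last_save, file_is_static):
    """Suggest how to handle a GFE, following the guidelines above.

    on_last_save   -- True if the damaged data is on the latest file-save
    file_is_static -- True if the file rarely changes
    """
    if on_last_save:
        return "let the handler fix it, then sel-restore from the file-save"
    if file_is_static:
        return "let the handler fix it"
    return "have someone trained in GFE repair fix it by hand"


print(recommend_action(on_last_save=False, file_is_static=False))
```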
Hopefully, it's not too late to mention that this article will not deal with the actual incantations involved in fixing GFE's by hand. This would require roughly 50,000 words, or approximately a full day of class.
Our sample frame contained three items that we put there strictly for the purposes of writing this article. The three items placed into the file were simply verbs copied in from the MD of the account.
Letting The System Fix It:
Let's say that you have decided to let the system fix it. The options presented to you are:
F-Fix item/T=Truncate group & quit/Q=just quit/=go to debugger?
The "F" option does its best to resolve the GFE. For the record, this is what we used on our sample GFE. If you choose this option, you should get an "after" picture. Go back to the previous step of getting a hard-copy of the frame using the dump command so that you can compare the "before" picture with the "after" picture. After choosing the "F" option, our sample frame ended up looking like this:
:dump x 17544
fid: 17544 : 0 0 0 0 ( 4488 : 0 0 0 0 )
0000 FF000000 11001001 6C697374 FE7661FE 000 :_.......list^va^:
0010 61FD3537 FEFF0000 00001500 1001736F 016 :a]57^_........so:
0020 7274FE76 61FE61FD 3564FEFF 00000000 032 :rt^va^a]5d^_....:
0030 17001001 636F756E 74FE7662 FE61FD35 048 :....count^vb^a]5:
0040 63FEFFFF FFFFFFFF FFFFFFFF FFFFFFFF 064 :c^______________:
Note that the first character in the second column of output is a hexadecimal FF, also known in Pick lingo as a "Segment Mark". In the corresponding position in the "ASCII" display, it looks like an underscore "_". The presence of this character at this position indicates that the entire group is empty. As it turns out, the "T" (Truncate group) option would have had the same effect. At least we know. Now, we can get the last file-save mounted and ready to restore the lost items, providing that they were on the backup tape. This is accomplished by logging into the account where the file that formerly had the GFE resides and issuing the command:
:sel-restore GFE-filename *
Block size: nnnnn
Restore from F)ull/Account, I)ncremental or T)ransaction log (f/i/t):f
account name on tape:GFE-account
file name:GFE-filename
The "GFE-account" is the account which contains the file in which the GFE was found. The "GFE-filename" is the name of the file in which the GFE occurred. The "*" indicates all items. Without any additional options to the sel-restore command, this will only restore the items that are not already in the file. Be aware that this will also restore items which have deliberately been deleted since the last save.
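The side effect with deleted items is easy to overlook, so here is a toy model of it in Python. It treats the file and the tape as plain dictionaries keyed by item-id, which is an illustration of the behavior described above and not how sel-restore is implemented. All names and sample items are made up.

```python
def restore_missing(current, tape):
    """Model a restore that adds only the items not already in the file."""
    merged = dict(current)
    for item_id, body in tape.items():
        if item_id not in merged:      # existing items are left alone
            merged[item_id] = body
    return merged


tape = {"1001": "old order", "1002": "lost in the GFE", "1003": "deleted on purpose"}
current = {"1001": "updated order"}    # what survived in the file

restored = restore_missing(current, tape)
print(sorted(restored))
print(restored["1001"])
```

Item 1001 keeps its newer contents, item 1002 comes back as intended, and item 1003 comes back too even though it was deleted deliberately, so plan on cleaning up afterwards.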
In summary, Advanced Pick's GFE Handler "fixes" usually amount to losing some data. For more information on actually getting into the System Debugger and fixing them yourself, JES & Associates offers an intense, comprehensive three-day class on System Administration and Maintenance.
