Tech Note

Filename:	Why Don't We Just Do It with FTP?.doc
Title:	Why Don't We Just Do It with FTP?
Applies To:	Windows or UNIX
Subject:	Polling via FTP or other ad hoc solutions
Keywords:	ACM MLINK polling FTP copy
Author:	Paul Neary
Date Created:	28 May 2003
Date Modified:	2 June 2003

Introduction

In these times of tight IT budgets, it becomes tempting to consider any solution that is cheap or free. In addition, the emergence of open software such as Linux, GNU, and other "freeware" has engendered an expectation that software should be free, and any solution with a price tag of $0.00 gets serious consideration over costly solutions from traditional software vendors.

As many discover, the old saw that "there is no free lunch" is true more often than not. Many nominally "free" solutions actually wind up costing more because of a higher total cost of ownership (TCO), driven by issues such as reliability and manageability.

In this Tech Note, I will examine why there is no free lunch for polling: the transfer of data files between a central server and many remote sites on a regularly scheduled, usually daily, basis. Since G&Z has been selling polling solutions for over a decade now, beginning with the MLINK product family, we've encountered many alternative schemes that have been cooked up to avoid spending hard-earned cash on a polling environment.

While we'll go into the specifics shortly, the nub of the argument for a "homegrown" solution is usually couched in a statement like this:

Why should I have to pay for software that transfers files? FTP transfers files just great. It's fast, and best of all, it's free, or almost free...

At first glance, it's a compelling statement. For computers connected by a persistent LAN or WAN, FTP does an excellent job of transferring files. In years past, when many more networks were dial-up, products like CA-MLINK provided, in addition to file transfer, a way to manage the dialing of the modem, something no one really wanted to code in-house. As more companies have moved to persistent connections, the perception is that, with the need to dial gone, a polling product like CA-MLINK loses its reason for being. It's not true.

Preparation: Introducing PollView and MLINK

Before we get into the details, I'd like to explain the buzzwords that you'll see below. If you are familiar with CA-MLINK, CA-ACM, and G&Z's PollView, then you can safely skip ahead to the section entitled Two "Low-Budget" Solutions. If not, hang on for a whirlwind tour of the polling solution that we currently offer, called PollView. (More details are available on the PollView pages of this website.)

PollView Components

PollView is the name of a family of closely integrated CA and G&Z products that provide the premier polling solution for a corporation with many remote branches or outlets. The major components of PollView are:

CA-MLINK - the file transfer component. (This provides similar functionality to an FTP client and server, but with some added features.)
CA-ACM - the "brains" of the polling operation. It maintains a database of systems or remote sites, the schedules for contacting those sites, and the list of files to transfer, and processes to execute, when those sites are contacted.
CA-Unicenter - provides a console logging and event notification facility for all polling messages as well as the "2D Map" interface for managing the ACM database.
G & Z PollView Explorer - an alternative to the 2D Map, it provides an explorer-style interface for managing the ACM database.

PollView Terminology

Although these terms are specific to PollView (and ACM), in any polling solution, you will need to consider these kinds of entities:

Sites - these are the systems or remote sites to which files are sent, and from which files are received.
Task Lists - these are the lists of files to transfer to or from the sites. Think of them as scripts that are executed when a site is contacted. A good polling package like PollView will also allow some kind of process or command to be included in a task list. This might be a remote command, which runs on the site, or a local command, which runs at the central polling server.
Sessions - probably more easily thought of as schedules, these provide the timeframe, or time window, during which the sites are contacted and the task lists assigned to them are performed.
Ports - a way to control how many sites are called simultaneously. In the days of dial-up, you simply had one port per modem. With a TCP/IP network, ports lose their correspondence to a concrete object such as a modem, but are still needed to control the amount of work that the polling server tries to do at one time.
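To make these four entities concrete, here is a minimal Python sketch of how a homegrown poller might model them. It is not how PollView or ACM stores them; the class names (Task, Site, Session) and the poll_site callback are hypothetical. The "ports" setting is modeled as the max_workers of a thread pool, which caps how many sites are contacted at once.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    # One task-list entry, e.g. Task("receive", "SALES.DAT") or Task("local", "cmd").
    action: str
    target: str


@dataclass
class Site:
    name: str
    address: str                      # host name or IP address
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Session:
    # A schedule: a time window in local hours (not enforced in this sketch)
    # plus the number of ports, i.e. how many sites may be polled at once.
    start_hour: int
    end_hour: int
    ports: int
    sites: List[Site] = field(default_factory=list)


def run_session(session: Session, poll_site: Callable[[Site], bool]) -> Dict[str, bool]:
    """Contact every site in the session, with at most `ports` sites at once.

    poll_site performs one site's task list and returns True on success.
    """
    with ThreadPoolExecutor(max_workers=session.ports) as pool:
        outcomes = list(pool.map(poll_site, session.sites))
    return {site.name: ok for site, ok in zip(session.sites, outcomes)}
```

Everything a real product adds (window enforcement, retries, logging) would hang off run_session and poll_site.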

Two "Low-Budget" Solutions

Here I sketch out two proposed solutions that use "free" software to provide polling in a TCP/IP network. The first involves using FTP, while the second assumes the ability to use a copy command to transfer files over network shares.

The FTP Solution

The design points of this solution usually run something like this:

We'll run an FTP server on our central host.
At a particular time of the day or night, the remote site will execute a script (e.g. BAT, WSH, shell or Perl). This script might even be kicked off manually, e.g. by some end-of-day process.
That script will contain line commands that will send and/or receive the needed files by using the command-line FTP client available on the system (usually the ftp command).
Just so we can tell whether we've missed any sites, we'll send a dummy "flag file" to a common directory on the central host with a name like SITEnnnnOK.txt, where "nnnn" is the site number or name.
We can probably get this up and running in a week or two.
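The site-side script in this design might look something like the following minimal sketch. It uses Python's standard ftplib module as a stand-in for the BAT or shell script driving the ftp command; the helper names (flag_name, push_files, missing_sites), the flag directory, and the host name and credentials in the usage comment are hypothetical. Note how little it does: there is no retry, no restart, and no logging, so a site that fails partway simply never writes its flag file.

```python
import io
from ftplib import FTP
from pathlib import Path
from typing import Iterable, List


def flag_name(site: str) -> str:
    # Naming convention from the design: SITEnnnnOK.txt
    return f"SITE{site}OK.txt"


def push_files(ftp: FTP, site: str, files: Iterable[Path], flag_dir: str) -> None:
    """Site side: upload each file, then drop an empty flag file on the host."""
    for path in files:
        with path.open("rb") as fp:
            ftp.storbinary(f"STOR {path.name}", fp)
    # Only reached if every upload succeeded (storbinary raises on error replies).
    ftp.cwd(flag_dir)
    ftp.storbinary(f"STOR {flag_name(site)}", io.BytesIO(b""))


def missing_sites(flag_dir: Path, sites: Iterable[str]) -> List[str]:
    """Host side: the sites whose flag file never arrived."""
    return [s for s in sites if not (flag_dir / flag_name(s)).exists()]


# Typical use at a site (host name and credentials are placeholders):
# with FTP("pollhost.example.com") as ftp:
#     ftp.login("site0042", "password")
#     push_files(ftp, "0042", [Path("SALES.DAT")], "/flags")
```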

There is something to note here before we critique this design. This environment supports only what we refer to as site-initiated polling, a perfectly valid design used by many of our customers. Its complement is host-initiated polling, where the central host controls the timing of the file transfer sessions. Each method has its pluses and minuses, and usually a particular customer's business needs will clearly point to one over the other. The problem with the FTP solution is that it can support only site-initiated polling, unless you run an FTP server on every remote site, which may not be practical for resource or security reasons. For example, an FTP server may consume resources needed for other applications, or you may not wish to permit FTP access to those systems.

The `copy` Solution

If host-initiated polling is required, and an FTP server on each remote site is not practical, then a solution that uses a copy command might fit the bill. The proposal for this might be:

On each remote site, we'll share the directory or directories we need access to via Windows or NFS shares.¹
On the central server, we will kick off a script at a certain time of day or night that will go down a list of remote site host names or IP addresses, and call a second script, which will copy files to or from the shared directories.
If we get an error copying a file, we'll write a message to a text log file called Pollmmddyy.txt. You can look at this file to see where the problems were.
We can probably get this up and running in a week or two.
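The host-side half of the copy design can be sketched in Python, with shutil.copy2 standing in for the copy command and only the "receive" direction shown. Assumptions, all hypothetical: a share_root function maps a site to its shared directory (on Windows, perhaps a UNC path such as \\store0042\POLL), and each arriving file is renamed after its site so the sites don't overwrite one another. Failures go to a Pollmmddyy.txt log, as the proposal describes.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Callable, Iterable


def log_name(day: date) -> str:
    # Pollmmddyy.txt, e.g. Poll060203.txt for 2 June 2003
    return f"Poll{day:%m%d%y}.txt"


def collect(sites: Iterable[str], share_root: Callable[[str], Path],
            filename: str, dest: Path, log_dir: Path, day: date) -> int:
    """Copy `filename` from each site's share into `dest`, logging failures.

    Returns the number of sites that failed.
    """
    failures = 0
    log_path = log_dir / log_name(day)
    for site in sites:
        src = share_root(site) / filename
        try:
            # Rename on arrival (SALES.DAT -> 0042.DAT) so sites don't collide.
            shutil.copy2(src, dest / f"{site}{Path(filename).suffix}")
        except OSError as exc:
            failures += 1
            with log_path.open("a") as log:
                log.write(f"{site}: {exc}\n")
    return failures


# e.g. share_root = lambda s: Path(rf"\\store{s}\POLL")   (hypothetical share name)
```

The log records what went wrong, but nothing here re-polls a failed site; someone still has to read the file and act on it.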

One gigantic assumption here is that the shares are already in place, or they can be created easily. That may not be the case. Creating new shares on hundreds of systems requires either significant manual labor or some scripting. Furthermore, the network infrastructure may not be in place to support shares over the WAN, or shares may be discouraged by network administration or security.

So, What's Wrong with This Picture?

At first glance, there seems to be nothing wrong, at least nothing that can't be fixed by adding a few more bells and whistles to the proposal. But there are dangerous problems that can only be addressed by writing serious amounts of code. Some problems may not be evident until the system is in production or until the number of sites hits a certain threshold. Then you wind up adding more scripts and more code until you find yourself in the embarrassing position of having reinvented the proverbial wheel.

Problems with these designs lurk in three areas:

Features - the most obvious area. You'll have to give up most of the features of the commercial product. It's easy to dismiss some of them at first as unnecessary, until problems in the other two areas are noticed.
Reliability - a product that has been competing successfully in this market for decades is more battle-hardened than anything that can be developed by even a crack programming staff.
Manageability and Scalability - these are two sides of the same coin. An ad hoc solution may work reasonably well for a handful of remote sites. However, after about two dozen sites, it starts to become tedious to figure out just what went wrong with sites that did not poll or polled partially, and to re-poll them. At around fifty sites, the task becomes error-prone, and after a hundred or so, it starts to consume a significant amount of labor that has not been figured into the TCO for this application. With many hundreds of sites, the management becomes essentially impossible.

I'll examine each of these areas in detail and point out the pitfalls awaiting in each.²

Problem #1: Features

Two main features of the MLINK file transfer method are the ability to restart an interrupted transfer from the last successful checkpoint (checkpoint-restart) and the ability to compress the data blocks before they are put onto the wire.

Both the FTP protocol (RFC 959) and the Windows copy command provide a restart capability in the event of a network outage. FTP's restart must be supported by both client and server, but it is up to the client to declare a restarted transfer. Neither the Windows FTP command-line client nor the WinInet API supports this feature, so you would be forced to code the FTP protocol yourself at the socket level, an unappealing task by any measure. The restart feature for copy (via the /Z option) seems easy to use and works well. Unfortunately, copy brings extra overhead, since the NetBIOS traffic must be encapsulated in TCP/IP over the WAN.
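For comparison, Python's standard ftplib (a scripting library, not the Windows ftp command discussed above) does let the client send REST, through the rest argument of retrbinary. A minimal resume-download sketch follows, assuming the server supports byte-offset restart in stream mode (standardized later in RFC 3659, not guaranteed everywhere) and that any partial local file is a valid prefix of the remote file; resume_download is a hypothetical name. Even with library help, the bookkeeping of what is partial and what is complete is still yours to write.

```python
from ftplib import FTP
from pathlib import Path


def resume_download(ftp: FTP, remote_name: str, local: Path) -> int:
    """Fetch remote_name into local, continuing from any partial copy.

    Returns the byte offset the transfer restarted from (0 for a fresh start).
    """
    offset = local.stat().st_size if local.exists() else 0
    # retrbinary switches to binary (TYPE I) itself; with rest set, ftplib
    # sends "REST <offset>" before RETR so the server skips the bytes we have.
    with local.open("ab") as out:
        ftp.retrbinary(f"RETR {remote_name}", out.write, rest=offset or None)
    return offset
```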

The FTP protocol does provide an optional compression mode (MODE C), but it is just a simple run-length encoding (RLE) scheme, and I've never seen it used in the wild. The copy command has no compression capabilities.
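To see why a simple run-length scheme does little for typical polling data, here is a generic byte-level RLE in Python. It illustrates the technique only and is not the MODE C wire format, which frames runs inside FTP's block structure. This naive (count, byte) encoding shrinks long runs of identical bytes, such as padding or blank fields, but doubles the size of data with no repeated adjacent bytes.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (count, byte) pairs, with each run capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, byte) pairs back into the original bytes."""
    out = bytearray()
    for count, value in zip(encoded[::2], encoded[1::2]):
        out += bytes((value,)) * count
    return bytes(out)
```

A record padded with 300 spaces encodes to 8 bytes, while a short CSV line like b"SALES,1234,19.5\r\n" grows from 17 bytes to 34. Compressors that model repeated strings rather than repeated bytes do far better on typical business data.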

The feature that might be missed most is remote command execution.³ In a host-initiated environment, its absence means you cannot execute commands at the remote site as part of the task list; in a site-initiated environment, you cannot execute them at the host. While there are alternatives that might help (e.g. REXEC), their security concerns are serious enough to make them undesirable.

Problem #2: Reliability

MLINK and ACM were originally developed to support dial-up networks. Few things are more unreliable than dialing over POTS lines. You have to deal with interrupted calls, noisy lines, garbled data, and finicky equipment. The success of MLINK and ACM in such an environment is testimony to the reliability of their retry and error-detection abilities.

Yes, a persistent TCP/IP connection is a reliable link, but across a WAN, there are still many bumps: downed links and routers, propagation delay, and congestion. Moreover, there are always potential hardware or software problems at the remote site to consider. Any homegrown solution must still be designed with some recovery and retry features, so the developers of this solution must ask some hard questions, such as:

How many times do I retry a site, and how long do I wait in between tries?
If a task list was interrupted, do I pick up from where I left off, or must I start at the beginning?
If a file I need is not there, can I try to get it later on?
If a file I need is not there, can I do something else, like send an email?
Can I easily re-poll just the sites that failed?
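Answering even the first two questions takes real code. The following minimal Python sketch assumes a hypothetical transfer(task) callable that raises OSError on failure. Each task is retried a fixed number of times with a doubling delay, and completed tasks are recorded in a checkpoint set so that a re-run picks up where the last one left off. The attempt count and delays are arbitrary placeholders, and a real poller would persist the checkpoint to disk.

```python
import time
from typing import Callable, Iterable, Set


def run_task_list(tasks: Iterable[str], transfer: Callable[[str], None],
                  done: Set[str], attempts: int = 3, first_delay: float = 30.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run tasks in order, skipping any already recorded in `done`.

    Each task gets `attempts` tries, doubling the wait after each failure.
    Returns True only if every task completed; `done` survives for a re-run.
    """
    for task in tasks:
        if task in done:
            continue  # completed on an earlier run: pick up where we left off
        delay = first_delay
        for attempt in range(1, attempts + 1):
            try:
                transfer(task)
                done.add(task)
                break
            except OSError:
                if attempt == attempts:
                    return False  # give up on this site for now
                sleep(delay)
                delay *= 2
    return True
```

The other three questions (a missing file, a notification, a re-poll list) each need similar logic of their own.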

As these questions are answered, the developers will start to realize that the devil truly is in the details, and their breezy first estimates need some significant revision. Worse, testing reliability is difficult, and sometimes it's only in production that flaws in the retry logic are noticed.

Problem #3: Manageability and Scalability

Here's where the homegrown solution finally meets its Waterloo. Once the number of remote sites reaches a score or so, it becomes a nuisance to figure out how well polling went yesterday or last night. It also becomes difficult to track how a particular site did over the course of a week or a month. Once you're into the hundreds of sites, some reporting capability becomes vital. Many fine reporting products can be purchased, and some will even feed off "flat" ASCII text or CSV files, but a critical application requires a database product such as Access or SQL Server. Even with the superb tools available (e.g., Visual Basic), this substantially increases the amount of development required.
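Even the smallest database behind such reports is a design task of its own. The sketch below uses Python's built-in sqlite3 module as a stand-in for Access or SQL Server; the table name, column names, and status values are hypothetical. One row per site per polling run is enough to answer "how did last night go?" and "how has this site done this month?".

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS poll_result (
    site     TEXT NOT NULL,
    run_date TEXT NOT NULL,          -- ISO date, e.g. '2003-06-02'
    status   TEXT NOT NULL CHECK (status IN ('ok', 'partial', 'failed')),
    PRIMARY KEY (site, run_date)
)
"""


def record(db: sqlite3.Connection, site: str, run_date: str, status: str) -> None:
    # INSERT OR REPLACE so a successful re-poll overwrites the earlier failure
    db.execute("INSERT OR REPLACE INTO poll_result VALUES (?, ?, ?)",
               (site, run_date, status))


def problem_sites(db: sqlite3.Connection, run_date: str) -> list:
    """Sites that failed or polled partially on one run: the re-poll list."""
    rows = db.execute("SELECT site FROM poll_result WHERE run_date = ? "
                      "AND status != 'ok' ORDER BY site", (run_date,))
    return [site for (site,) in rows]


def failure_counts(db: sqlite3.Connection, start: str, end: str) -> dict:
    """Per-site count of non-ok runs between two ISO dates, inclusive."""
    rows = db.execute("SELECT site, SUM(status != 'ok') FROM poll_result "
                      "WHERE run_date BETWEEN ? AND ? GROUP BY site", (start, end))
    return dict(rows)
```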

Aside from the reporting needs, with hundreds of sites, the control and scheduling becomes onerous as well. A large corporation likely has sites dispersed over many time zones, and to properly schedule host-initiated polling, the sites should be grouped accordingly so they can be scheduled as a unit. When reviewing the polling just completed, a way to quickly identify and isolate the sites that failed is critical.
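Grouping by time zone is simple in isolation; the difficulty is that it is one more piece to build and maintain. A minimal sketch, assuming each site record carries a fixed UTC offset in whole hours (a simplification that ignores daylight saving time, which real scheduling would have to handle) and a hypothetical policy of starting each group's session at 22:00 local time; sessions_by_zone is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sessions_by_zone(sites: Dict[str, int],
                     local_start: int = 22) -> List[Tuple[int, List[str]]]:
    """Group sites by UTC offset; return (UTC start hour, sites) per group.

    `sites` maps site name -> UTC offset in hours (e.g. -5 for US Eastern
    standard time). The result is sorted by UTC start hour.
    """
    groups = defaultdict(list)
    for name, offset in sites.items():
        groups[offset].append(name)
    sessions = []
    for offset, names in groups.items():
        utc_start = (local_start - offset) % 24  # local time = UTC + offset
        sessions.append((utc_start, sorted(names)))
    return sorted(sessions)
```

For sites at -5, -6, and -8 hours, the three sessions start at 03:00, 04:00, and 06:00 UTC respectively.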

From an operational aspect, some logging needs to be provided so that a person who is supposedly monitoring the polling process can have an idea of what is going on. Otherwise, that operator can only look for files to show up in a particular directory, seemingly by magic. Of course the low-budget way is to just write to that good, old ASCII text file, and let the operator view it in Notepad, but that's too primitive to consider.

One scalability issue applies only to site-initiated polling: there is no way to control how many sites are simultaneously sending files to the host. For example, if most branches close around 9:00 PM local time, there will be a tsunami of connections clustered around 9:00 PM, and another every hour after that as sites in successive western time zones hit their closing times.

Probably the worst management issue though is the way task lists are built and maintained. There are two problems here: the way files are named when collected into a central location, and how changes to a task list are propagated to all sites, or a large group of sites.

The first problem is illustrated by this common situation: from every site you receive a particular file, say SALES.DAT, into a common directory on the polling server, say C:\SALES. Obviously, the files cannot all be saved at the host under the same name, so you need to rename them as they arrive, probably with a site number or name in the filename, e.g. nnnn.DAT. This means that the task list for each site must be unique. Since building and maintaining a unique list by hand for every site is impractical, your scripting must now include the capability to generate a unique task list on the fly, given a site number or name.
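Generating the per-site list on the fly usually means keeping one prototype task list with a placeholder for the site and expanding it for each site. A minimal sketch using Python's string.Template; the $site placeholder, the (action, source, destination) layout, PRICES.DAT, and the C:\OUTBOUND path are hypothetical, and only SALES.DAT and C:\SALES come from the example above.

```python
from string import Template
from typing import List, Tuple

# One prototype for every site: (action, source file, destination file).
# $site is replaced per site, so SALES.DAT from site 0042 lands as 0042.DAT.
PROTOTYPE = [
    ("receive", "SALES.DAT", r"C:\SALES\$site.DAT"),
    ("send", r"C:\OUTBOUND\PRICES.DAT", "PRICES.DAT"),
]


def task_list_for(site: str, prototype=PROTOTYPE) -> List[Tuple[str, str, str]]:
    """Expand the prototype into the unique task list for one site."""
    return [(action,
             Template(src).substitute(site=site),
             Template(dst).substitute(site=site))
            for action, src, dst in prototype]
```

With this approach, adding a file for every site means editing the prototype once rather than hundreds of individual lists.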

So then what happens when you want to, say, start sending a new file to each site every day? You need to change the task lists for every site. If you're using host-initiated polling, this may not be a problem, assuming you're using the same task list or prototype task list for all sites. For site-initiated, this could be impossible, if you have to replace the task list on each remote site. In this case, some auto-update feature would have to be built in at the start to accommodate this kind of change.
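For site-initiated polling, such an auto-update feature could be as simple as having each site's script fetch a versioned task list from the host before it runs, replacing its local copy only when the host's version is newer. A minimal sketch; the JSON layout with a "version" field and the update_task_list name are hypothetical, and fetching the text from the host (e.g. over FTP) is left out.

```python
import json
import os
from pathlib import Path


def update_task_list(local: Path, fetched_text: str) -> bool:
    """Install fetched_text as the local task list if its version is newer.

    Returns True if the local copy was replaced.
    """
    new = json.loads(fetched_text)  # a malformed download fails here, before any write
    current = json.loads(local.read_text())["version"] if local.exists() else -1
    if new["version"] <= current:
        return False
    tmp = local.with_suffix(".tmp")
    tmp.write_text(fetched_text)
    os.replace(tmp, local)  # swap in with one rename instead of rewriting the live file
    return True
```

Even this small piece has to be in place on every site before the first change is needed, which is why it must be designed in at the start.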

Conclusion

At this point, it should be clear that many hidden requirements lurk within the apparently quick and easy designs described above. It's not a "week or two" deal. There is no free lunch. You're faced with the classic "Build or Buy?" decision:

Can your programmers rise to the challenge and produce a system tailor-made for your corporation that meets all your needs and winds up costing less than buying a polling solution from G & Z?
Are the programming resources available to this project now?
Are you willing to endure the growing pains of a new product in your environment?
Can you commit to the open-ended support and maintenance of this code?
Are you prepared to lose some sleep?

It might be that you can answer "Yes" to all of the above. But if not, consider that at G&Z we've been implementing, customizing, developing, and designing polling solutions for companies like yours over the last decade, and we can deliver a solution for you that will meet your needs and be cost-effective.

1 Depending on whether this is a Windows or UNIX environment.
2 To simplify the discussion a bit I will assume a Windows network in the examples.
3 FTP does support a small set of remote commands: CHDIR, MKDIR, RMDIR, DEL, DIR, and RENAME.

This document © Copyright 2003 G & Z Systems, Inc., is intended for use solely by owners of legally licensed copies of the above mentioned software. No part of this documentation may be copied, photocopied, reproduced, translated, microfilmed, or otherwise duplicated without the express consent of G & Z Systems, Inc. Please email us at info@g-and-z.com for permissions. All product and service names listed on this and any other pages of this site are either registered trademarks or service marks or common law trademarks or service marks of Computer Associates International, Inc., or G & Z Systems, Inc. All other product names referenced herein are trademarks or service marks of their respective companies.
