Speed
Often, when dealing with large amounts of data, transfer speed will be important. Here are some things to consider before riding off into the sunset:
- Where possible, compress the data you are working with. If you’re mainly dealing with JPEGs or other pre-compressed data, this will be a waste of time and will slow down your exporting systems, but for raw data it could save a lot of time. Unlike HTTP, where the server can compress content on the fly, SCP and SFTP only offer SSH’s own stream compression (for example, the -C option to scp), which is switched off by default, so you will need to decide the best approach.
- Ensure that there is sufficient bandwidth at both endpoints to process the amount of data you need in the time required.
- Decide whether you are implementing a continuous or ‘batch mode’ process. If you have a lot of data to transfer, you may not be able to afford the downtime between batch transmissions.
- Consider whether you will allow concurrent transfers. If you send each file as soon as it is ready (continuous), or your previous batch has not finished by the time the next one begins, you may be splitting your available bandwidth between several transfers and making each file take many times longer than it should. Implementing a first-in, first-out (FIFO) queue for your transfers may cause some files to arrive a little later, but at a more predictable time from the point of export. With a queue, you can tell a customer when their changes will be ‘implemented’ by checking their file’s position in it. Concurrent transfers can also create locking and race hazard issues (see below).
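As a rough sketch of the compression point, the following Python gzips a file before transfer but skips extensions that are usually already compressed. The extension list and the prepare_for_transfer name are illustrative assumptions, not a fixed rule; for mixed data it is worth measuring the actual savings first.

```python
import gzip
import shutil
from pathlib import Path

# Illustrative assumption: extensions whose contents are usually already compressed.
PRECOMPRESSED = {".jpg", ".jpeg", ".png", ".gz", ".zip", ".mp4"}

def prepare_for_transfer(path: Path) -> Path:
    """Return the file to send: a .gz copy for raw data, the original otherwise."""
    if path.suffix.lower() in PRECOMPRESSED:
        # Compressing again would cost CPU time and save little or nothing.
        return path
    target = path.with_name(path.name + ".gz")
    # Stream the file through gzip so large files are not read into memory at once.
    with path.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The receiving side would then decompress with gzip.open(..., "rb") before importing.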
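The queue-position idea can be sketched in Python as a single-worker FIFO queue. Because only one file is on the wire at a time, a rough completion estimate for any queued file is the bytes ahead of it, plus its own, divided by the link speed. The TransferQueue class and its bytes_per_second parameter are hypothetical, and real links rarely sustain a constant rate, so the estimate is only a guide.

```python
from collections import deque

class TransferQueue:
    """Hypothetical single-worker FIFO transfer queue (one file on the wire at a time)."""

    def __init__(self, bytes_per_second: float):
        self.bytes_per_second = bytes_per_second
        self._queue = deque()  # (file name, size in bytes), oldest first

    def enqueue(self, name: str, size: int) -> None:
        self._queue.append((name, size))

    def next_file(self):
        """Remove and return the oldest file (FIFO), or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None

    def position(self, name: str) -> int:
        """1-based position of a file in the queue."""
        for i, (queued, _) in enumerate(self._queue, start=1):
            if queued == name:
                return i
        raise KeyError(name)

    def eta_seconds(self, name: str) -> float:
        """Rough time until `name` has been sent: everything ahead of it, plus itself."""
        total = 0
        for queued, size in self._queue:
            total += size
            if queued == name:
                return total / self.bytes_per_second
        raise KeyError(name)
```

If the total queued bytes divided by the link speed regularly exceeds your delivery window, that is a sign the bandwidth at one of the endpoints is not sufficient.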
Locking down your Data
Security comes as a given with SCP and SFTP, since both run over SSH, but to automate these processes you need to exchange keys first: not a major job, but another point of failure. SSH on UNIX systems is designed to refuse a private key file whose permissions are too permissive, and the server will likewise ignore an authorized_keys file that other users could modify.
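A minimal pre-flight check in Python, assuming an OpenSSH client: OpenSSH refuses a private key that the group or other users can access, so this sketch flags any of those permission bits before an automated transfer fails. The key_permissions_ok name is hypothetical; chmod 600 on the key is the usual fix.

```python
import os
import stat

def key_permissions_ok(key_path: str) -> bool:
    """True if the private key grants no permissions at all to group or others."""
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    # 0o077 covers every group and other read/write/execute bit.
    return (mode & 0o077) == 0
```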
To confuse things even further, the name ‘SFTP’ has been applied to several different protocols. If you are working with a company who claim to support SFTP, make sure they mean the SSH File Transfer Protocol rather than FTP over SSL/TLS (properly called FTPS) or the long-obsolete Simple File Transfer Protocol, as these need different clients and configuration and will be tricky to navigate.
You should also make sure that if you are giving users access to file drop locations, you have taken suitable steps to stop them ‘traversing’ the filesystem. On UNIX systems, this can be accomplished by dropping them into a chroot environment or a FreeBSD jail. A chroot confines the user to a single directory tree, while a jail goes further and behaves like a small virtual operating system within the host system. Even if the user can traverse their filesystem, they will find nothing but a basic set of distribution binaries.
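If you use OpenSSH’s own ChrootDirectory option for this, sshd refuses the session unless every component of the chroot path is a root-owned directory that is not writable by group or others. A Python sketch of a check for that condition follows; the chroot_problems name and the example path are hypothetical.

```python
import os
import stat
from pathlib import Path

def chroot_problems(chroot_dir: str) -> list[str]:
    """List reasons sshd's ChrootDirectory ownership rules would reject this path."""
    problems = []
    path = Path(os.path.abspath(chroot_dir))
    # Check the directory itself and every parent up to "/".
    for component in [path, *path.parents]:
        st = os.stat(component)
        if not stat.S_ISDIR(st.st_mode):
            problems.append(f"{component}: not a directory")
        if st.st_uid != 0:
            problems.append(f"{component}: not owned by root")
        if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append(f"{component}: writable by group or others")
    return problems
```

For example, chroot_problems("/srv/sftp/alice") should return an empty list before that path is used as a chroot. One consequence of these rules is that the user cannot write to the chroot root itself, so uploads usually go into a user-owned subdirectory.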
Whatever security method you use, you will need to decide how strict your key checking will be. Most processes will fail outright if the host keys have changed, which is a symptom of a reformatted server but is also exactly what a man-in-the-middle attack looks like. Many systems will also fail if the certificates you are using have expired.
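For unattended SCP transfers, one way to make the strictness decision explicit rather than leaving it to defaults is to pass the OpenSSH options BatchMode=yes (fail instead of prompting) and StrictHostKeyChecking=yes (refuse unknown or changed host keys) through scp -o. This Python sketch only builds and runs the command; the helper names, host and paths are placeholders.

```python
import subprocess

def build_scp_command(local_file: str, remote: str, key_file: str,
                      compress: bool = False) -> list[str]:
    """Build an scp command that fails loudly instead of prompting or guessing."""
    cmd = [
        "scp",
        "-i", key_file,                     # the shared private key
        "-o", "BatchMode=yes",              # never wait for a password or prompt
        "-o", "StrictHostKeyChecking=yes",  # abort on unknown or changed host keys
    ]
    if compress:
        cmd.append("-C")                    # enable SSH's on-the-fly compression
    cmd += [local_file, remote]
    return cmd

def send(local_file: str, remote: str, key_file: str) -> None:
    # check=True raises CalledProcessError on a non-zero exit, so failures are not silent.
    subprocess.run(build_scp_command(local_file, remote, key_file), check=True)
```

With StrictHostKeyChecking=yes, the remote host key must already be in known_hosts, so a rebuilt server requires a deliberate, manual key update rather than silently being trusted.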
Testing your Import/Export System
As discussed earlier in this article, there are myriad factors involved in testing a full import/export system. Some other general points to consider are:
- Check that the system is not overly reliant on DNS. If your import server performs DNS lookups on the systems connecting to it, a slow or unavailable DNS server could cause delays or transfer failures.
- Don’t just test on your development system. If the process is truly critical, any changes will also need to be tested on the live system. Make sure your import process can recognise test data and knows not to act on it, and never use real credit card numbers or data that could confuse system users.
Race Hazards and Error Factors
Again, draw flowcharts for the whole process, and ask yourself if you’ve dealt with the following cases:
- A customer uploads a supposedly unique file with the same file hash as one previously received. If your customer call center sends you two identical sales files in one day, then without further checking, each of those customers will receive twice as many products and be charged twice as much as they expected. The call center thought they were doing the ‘right thing’, as the transfer reported an error. In reality, your security certificate had expired, but the transfer went through anyway.
- A customer uploads a blank file.
- A customer uploads a file in which most records are correct, but two can’t be read properly, or some records do not contain all the data you were expecting. Do you reject the file as a whole, or import only the valid records?
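A minimal Python sketch of the duplicate and blank-file cases, assuming you keep the SHA-256 hash of every file already imported (here an in-memory set; in practice a database table). Duplicates and empty files are held back for a human to confirm rather than silently imported. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Hash the file in chunks so large uploads are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def check_new_upload(path: Path, seen_hashes: set[str]) -> str:
    """Return 'empty', 'duplicate' or 'new'; only 'new' files are recorded as seen."""
    if path.stat().st_size == 0:
        return "empty"
    digest = file_sha256(path)
    if digest in seen_hashes:
        return "duplicate"
    seen_hashes.add(digest)
    return "new"
```

In a real system the check and the recording of the hash need to happen atomically (for example, a unique constraint on the hash column); otherwise two concurrent uploads of the same file can both pass the check, which is exactly the kind of race hazard this section is about.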
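For the partially valid file, one possible approach is to validate every record first and then apply an explicit policy, rather than letting the import stop part-way through. This Python sketch assumes CSV input with a hypothetical set of required columns and supports both options raised above: reject the whole file, or import only the valid records.

```python
import csv
import io

REQUIRED = ("customer_id", "product", "quantity")  # hypothetical required columns

def validate_records(text: str):
    """Split CSV rows into (valid, errors); errors hold (source line number, reason)."""
    valid, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        # Short rows get None for missing columns, so treat None like an empty field.
        missing = [col for col in REQUIRED if not (row.get(col) or "").strip()]
        if missing:
            errors.append((reader.line_num, "missing " + ", ".join(missing)))
            continue
        if not row["quantity"].strip().isdigit():
            errors.append((reader.line_num, "quantity is not a whole number"))
            continue
        valid.append(row)
    return valid, errors

def records_to_import(text: str, all_or_nothing: bool = True):
    valid, errors = validate_records(text)
    if errors and all_or_nothing:
        return [], errors   # reject the whole file and report every problem
    return valid, errors    # import what we can and report the rest
```

Whichever policy you choose, reporting the line numbers of the rejected records back to the customer makes it much easier for them to send a corrected file.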