contrib/zlib/examples/gzlog.c

00001 /*
00002  * gzlog.c
00003  * Copyright (C) 2004, 2008 Mark Adler, all rights reserved
00004  * For conditions of distribution and use, see copyright notice in gzlog.h
00005  * version 2.0, 25 Apr 2008
00006  */
00007 
00008 /*
00009    gzlog provides a mechanism for frequently appending short strings to a gzip
00010    file that is efficient both in execution time and compression ratio.  The
00011    strategy is to write the short strings in an uncompressed form to the end of
00012    the gzip file, only compressing when the amount of uncompressed data has
00013    reached a given threshold.
00014 
00015    gzlog also provides protection against interruptions in the process due to
00016    system crashes.  The status of the operation is recorded in an extra field
00017    in the gzip file, and is only updated once the gzip file is brought to a
00018    valid state.  The last data to be appended or compressed is saved in an
00019    auxiliary file, so that if the operation is interrupted, it can be completed
00020    the next time an append operation is attempted.
00021 
00022    gzlog maintains another auxiliary file with the last 32K of data from the
00023    compressed portion, which is preloaded for the compression of the subsequent
00024    data.  This minimizes the impact to the compression ratio of appending.
00025  */
00026 
00027 /*
00028    Operations Concept:
00029 
00030    Files (log name "foo"):
00031    foo.gz -- gzip file with the complete log
00032    foo.add -- last message to append or last data to compress
00033    foo.dict -- dictionary of the last 32K of data for next compression
00034    foo.temp -- temporary dictionary file for compression after this one
00035    foo.lock -- lock file for reading and writing the other files
00036    foo.repairs -- log file for log file recovery operations (not compressed)
00037 
00038    gzip file structure:
00039    - fixed-length (no file name) header with extra field (see below)
00040    - compressed data ending initially with empty stored block
00041    - uncompressed data filling out originally empty stored block and
00042      subsequent stored blocks as needed (16K max each)
00043    - gzip trailer
00044    - no junk at end (no other gzip streams)
00045 
00046    When appending data, the information in the first three items above plus the
00047    foo.add file are sufficient to recover an interrupted append operation.  The
00048    extra field has the necessary information to restore the start of the last
00049    stored block and determine where to append the data in the foo.add file, as
00050    well as the crc and length of the gzip data before the append operation.
00051 
00052    The foo.add file is created before the gzip file is marked for append, and
00053    deleted after the gzip file is marked as complete.  So if the append
00054    operation is interrupted, the data to add will still be there.  If due to
00055    some external force, the foo.add file gets deleted between when the append
00056    operation was interrupted and when recovery is attempted, the gzip file will
00057    still be restored, but without the appended data.
00058 
00059    When compressing data, the information in the first two items above plus the
00060    foo.add file are sufficient to recover an interrupted compress operation.
00061    The extra field has the necessary information to find the end of the
00062    compressed data, and contains both the crc and length of just the compressed
00063    data and of the complete set of data including the contents of the foo.add
00064    file.
00065 
00066    Again, the foo.add file is maintained during the compress operation in case
00067    of an interruption.  In the unlikely event that the foo.add file with the data
00068    to be compressed is missing due to some external force, a gzip file with
00069    just the previous compressed data will be reconstructed.  In this case, all
00070    of the data that was to be compressed is lost (approximately one megabyte).
00071    This will not occur if all that happened was an interruption of the compress
00072    operation.
00073 
00074    The third state that is marked is the replacement of the old dictionary with
00075    the new dictionary after a compress operation.  Once compression is
00076    complete, the gzip file is marked as being in the replace state.  This
00077    completes the gzip file, so an interrupt after being so marked does not
00078    result in recompression.  Then the dictionary file is replaced, and the gzip
00079    file is marked as completed.  This state prevents the possibility of
00080    restarting compression with the wrong dictionary file.
00081 
00082    All three operations are wrapped by a lock/unlock procedure.  In order to
00083    gain exclusive access to the log files, first a foo.lock file must be
00084    exclusively created.  When all operations are complete, the lock is
00085    released by deleting the foo.lock file.  If when attempting to create the
00086    lock file, it already exists and the modify time of the lock file is more
00087    than five minutes old (set by the PATIENCE define below), then the old
00088    lock file is considered stale and deleted, and the exclusive creation of
00089    the lock file is retried.  To assure that there are no false assessments
00090    of the staleness of the lock file, the operations periodically touch the
00091    lock file to update the modified date.
00092 
00093    Following is the definition of the extra field with all of the information
00094    required to enable the above append and compress operations and their
00095    recovery if interrupted.  Multi-byte values are stored little endian
00096    (consistent with the gzip format).  File pointers are eight bytes long.
00097    The crc's and lengths for the gzip trailer are four bytes long.  (Note that
00098    the length at the end of a gzip file is used for error checking only, and
00099    for large files is actually the length modulo 2^32.)  The stored block
00100    length is two bytes long.  The gzip extra field two-byte identification is
00101    "ap" for append.  It is assumed that writing the extra field to the file is
00102    an "atomic" operation.  That is, either all of the extra field is written
00103    to the file, or none of it is, if the operation is interrupted right at the
00104    point of updating the extra field.  This is a reasonable assumption, since
00105    the extra field is within the first 52 bytes of the file, which is smaller
00106    than any expected block size for a mass storage device (usually 512 bytes or
00107    larger).
00108 
00109    Extra field (35 bytes):
00110    - Pointer to first stored block length -- this points to the two-byte length
00111      of the first stored block, which is followed by the two-byte, one's
00112      complement of that length.  The stored block length is preceded by the
00113      three-bit header of the stored block, which is the actual start of the
00114      stored block in the deflate format.  See the bit offset field below.
00115    - Pointer to the last stored block length.  This is the same as above, but
00116      for the last stored block of the uncompressed data in the gzip file.
00117      Initially this is the same as the first stored block length pointer.
00118      When the stored block gets to 16K (see the MAX_STORE define), then a new
00119      stored block is added, at which point the last stored block length pointer
00120      is different from the first stored block length pointer.  When they are
00121      different, the first bit of the last stored block header is eight bits, or
00122      one byte back from the block length.
00123    - Compressed data crc and length.  This is the crc and length of the data
00124      that is in the compressed portion of the deflate stream.  These are used
00125      only in the event that the foo.add file containing the data to compress is
00126      lost after a compress operation is interrupted.
00127    - Total data crc and length.  This is the crc and length of all of the data
00128      stored in the gzip file, compressed and uncompressed.  It is used to
00129      reconstruct the gzip trailer when compressing, as well as when recovering
00130      interrupted operations.
00131    - Final stored block length.  This is used to quickly find where to append,
00132      and allows the restoration of the original final stored block state when
00133      an append operation is interrupted.
00134    - First stored block start as the number of bits back from the first stored
00135      block first length byte.  This value is in the range of 3..10, and is
00136      stored as the low three bits of the final byte of the extra field after
00137      subtracting three (0..7).  This allows the last-block bit of the stored
00138      block header to be updated when a new stored block is added, for the case
00139      when the first stored block and the last stored block are the same.  (When
00140      they are different, the number of bits back is known to be eight.)  This
00141      also allows for new compressed data to be appended to the old compressed
00142      data in the compress operation, overwriting the previous first stored
00143      block, or for the compressed data to be terminated and a valid gzip file
00144      reconstructed on the off chance that a compression operation was
00145      interrupted and the data to compress in the foo.add file was deleted.
00146    - The operation in process.  This is the next two bits in the last byte (the
00147      bits under the mask 0x18).  They are interpreted as 0: nothing in process,
00148      1: append in process, 2: compress in process, 3: replace in process.
00149    - The top three bits of the last byte in the extra field are reserved and
00150      are currently set to zero.
00151 
00152    Main procedure:
00153    - Exclusively create the foo.lock file using the O_CREAT and O_EXCL modes of
00154      the system open() call.  If the modify time of an existing lock file is
00155      more than PATIENCE seconds old, then the lock file is deleted and the
00156      exclusive create is retried.
00157    - Load the extra field from the foo.gz file, and see if an operation was in
00158      progress but not completed.  If so, apply the recovery procedure below.
00159    - Perform the append procedure with the provided data.
00160    - If the uncompressed data in the foo.gz file is 1MB or more, apply the
00161      compress procedure.
00162    - Delete the foo.lock file.
00163 
00164    Append procedure:
00165    - Put what to append in the foo.add file so that the operation can be
00166      restarted if this procedure is interrupted.
00167    - Mark the foo.gz extra field with the append operation in progress.
00168    + Restore the original last-block bit and stored block length of the last
00169      stored block from the information in the extra field, in case a previous
00170      append operation was interrupted.
00171    - Append the provided data to the last stored block, creating new stored
00172      blocks as needed and updating the stored blocks' last-block bits and
00173      lengths.
00174    - Update the crc and length with the new data, and write the gzip trailer.
00175    - Write over the extra field (with a single write operation) with the new
00176      pointers, lengths, and crc's, and mark the gzip file as not in process.
00177      Though there is still a foo.add file, it will be ignored since nothing
00178      is in process.  If a foo.add file is leftover from a previously
00179      completed operation, it is truncated when writing new data to it.
00180    - Delete the foo.add file.
00181 
00182    Compress and replace procedures:
00183    - Read all of the uncompressed data in the stored blocks in foo.gz and write
00184      it to foo.add.  Also write foo.temp with the last 32K of that data to
00185      provide a dictionary for the next invocation of this procedure.
00186    - Rewrite the extra field marking foo.gz with a compression in process.
00187    * If there is no data provided to compress (due to a missing foo.add file
00188      when recovering), reconstruct and truncate the foo.gz file to contain
00189      only the previous compressed data and proceed to the step after the next
00190      one.  Otherwise ...
00191    - Compress the data with the dictionary in foo.dict, and write to the
00192      foo.gz file starting at the bit immediately following the last previously
00193      compressed block.  If there is no foo.dict, proceed anyway with the
00194      compression at slightly reduced efficiency.  (For the foo.dict file to be
00195      missing requires some external failure beyond simply the interruption of
00196      a compress operation.)  During this process, the foo.lock file is
00197      periodically touched to assure that that file is not considered stale by
00198      another process before we're done.  The deflation is terminated with a
00199      non-last empty static block (10 bits long), that is then located and
00200      written over by a last-bit-set empty stored block.
00201    - Append the crc and length of the data in the gzip file (previously
00202      calculated during the append operations).
00203    - Write over the extra field with the updated stored block offsets, bits
00204      back, crc's, and lengths, and mark foo.gz as in process for a replacement
00205      of the dictionary.
00206    @ Delete the foo.add file.
00207    - Replace foo.dict with foo.temp.
00208    - Write over the extra field, marking foo.gz as complete.
00209 
00210    Recovery procedure:
00211    - If not a replace recovery, read in the foo.add file, and provide that data
00212      to the appropriate recovery below.  If there is no foo.add file, provide
00213      a zero data length to the recovery.  In that case, the append recovery
00214      restores the foo.gz to the previous compressed + uncompressed data state.
00215      For the compress recovery, a missing foo.add file results in foo.gz
00216      being restored to the previous compressed-only data state.
00217    - Append recovery:
00218      - Pick up append at + step above
00219    - Compress recovery:
00220      - Pick up compress at * step above
00221    - Replace recovery:
00222      - Pick up compress at @ step above
00223    - Log the repair with a date stamp in foo.repairs
00224  */
00225 
00226 #include <sys/types.h>
00227 #include <stdio.h>      /* rename, fopen, fprintf, fclose */
00228 #include <stdlib.h>     /* malloc, free */
00229 #include <string.h>     /* strlen, strrchr, strcpy, strncpy, strcmp, memcmp */
00230 #include <fcntl.h>      /* open */
00231 #include <unistd.h>     /* lseek, read, write, close, unlink, sleep, */
00232                         /* ftruncate, fsync */
00233 #include <errno.h>      /* errno */
00234 #include <time.h>       /* time, ctime */
00235 #include <sys/stat.h>   /* stat */
00236 #include <sys/time.h>   /* utimes */
00237 #include "zlib.h"       /* crc32 */
00238 
00239 #include "gzlog.h"      /* header for external access */
00240 
00241 #define local static
00242 typedef unsigned int uint;
00243 typedef unsigned long ulong;
00244 
00245 /* Macro for debugging to deterministically force recovery operations */
00246 #ifdef DEBUG
00247     #include <setjmp.h>         /* longjmp */
00248     jmp_buf gzlog_jump;         /* where to go back to */
00249     int gzlog_bail = 0;         /* which point to bail at (1..8) */
00250     int gzlog_count = -1;       /* number of times through to wait */
00251 #   define BAIL(n) do { if ((n) == gzlog_bail && gzlog_count-- == 0) \
00252                             longjmp(gzlog_jump, gzlog_bail); } while (0)
00253 #else
00254 #   define BAIL(n)
00255 #endif
00256 
00257 /* how old the lock file can be in seconds before considering it stale */
00258 #define PATIENCE 300
00259 
00260 /* maximum stored block size in Kbytes -- must be in 1..63 */
00261 #define MAX_STORE 16
00262 
00263 /* number of stored Kbytes to trigger compression (must be >= 32 to allow
00264    dictionary construction, and <= 204 * MAX_STORE, in order for >> 10 to
00265    discard the stored block headers contribution of five bytes each) */
00266 #define TRIGGER 1024
00267 
00268 /* size of a deflate dictionary (this cannot be changed) */
00269 #define DICT 32768U
00270 
00271 /* values for the operation (2 bits) */
00272 #define NO_OP 0
00273 #define APPEND_OP 1
00274 #define COMPRESS_OP 2
00275 #define REPLACE_OP 3
00276 
00277 /* macros to extract little-endian integers from an unsigned byte buffer */
00278 #define PULL2(p) ((p)[0]+((uint)((p)[1])<<8))
00279 #define PULL4(p) (PULL2(p)+((ulong)PULL2(p+2)<<16))
00280 #define PULL8(p) (PULL4(p)+((off_t)PULL4(p+4)<<32))
00281 
00282 /* macros to store integers into a byte buffer in little-endian order */
00283 #define PUT2(p,a) do {(p)[0]=a;(p)[1]=(a)>>8;} while(0)
00284 #define PUT4(p,a) do {PUT2(p,a);PUT2(p+2,a>>16);} while(0)
00285 #define PUT8(p,a) do {PUT4(p,a);PUT4(p+4,a>>32);} while(0)
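
/* Illustrative sketch, not part of gzlog.c: the PULL and PUT macros above
   read and write little-endian integers one byte at a time.  The same idea
   can be written as loop-based helpers, which avoids evaluating a macro
   argument more than once.  The names put_le() and pull_le() are
   hypothetical and exist only for this example. */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low n bytes of val at p, least significant byte first. */
static void put_le(unsigned char *p, uint64_t val, size_t n)
{
    size_t i;

    assert(n <= 8);
    for (i = 0; i < n; i++) {
        p[i] = (unsigned char)(val & 0xff);
        val >>= 8;
    }
}

/* Read an n-byte little-endian unsigned integer starting at p. */
static uint64_t pull_le(const unsigned char *p, size_t n)
{
    uint64_t val = 0;
    size_t i;

    assert(n <= 8);
    for (i = n; i > 0; i--)
        val = (val << 8) | p[i - 1];
    return val;
}
```

/* With these helpers, put_le(ext, first, 8) would play the role of
   PUT8(ext, log->first), and pull_le(buf, 2) the role of PULL2(buf). */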
00286 
00287 /* internal structure for log information */
00288 #define LOGID "\106\035\172"    /* should be three non-zero characters */
00289 struct log {
00290     char id[4];     /* contains LOGID to detect inadvertent overwrites */
00291     int fd;         /* file descriptor for .gz file, opened read/write */
00292     char *path;     /* allocated path, e.g. "/var/log/foo" or "foo" */
00293     char *end;      /* end of path, for appending suffixes such as ".gz" */
00294     off_t first;    /* offset of first stored block first length byte */
00295     int back;       /* location of first block id in bits back from first */
00296     uint stored;    /* bytes currently in last stored block */
00297     off_t last;     /* offset of last stored block first length byte */
00298     ulong ccrc;     /* crc of compressed data */
00299     ulong clen;     /* length (modulo 2^32) of compressed data */
00300     ulong tcrc;     /* crc of total data */
00301     ulong tlen;     /* length (modulo 2^32) of total data */
00302     time_t lock;    /* last modify time of our lock file */
00303 };
00304 
00305 /* gzip header for gzlog */
00306 local unsigned char log_gzhead[] = {
00307     0x1f, 0x8b,                 /* magic gzip id */
00308     8,                          /* compression method is deflate */
00309     4,                          /* there is an extra field (no file name) */
00310     0, 0, 0, 0,                 /* no modification time provided */
00311     0, 0xff,                    /* no extra flags, no OS specified */
00312     39, 0, 'a', 'p', 35, 0      /* extra field with "ap" subfield */
00313                                 /* 35 is EXTRA, 39 is EXTRA + 4 */
00314 };
00315 
00316 #define HEAD sizeof(log_gzhead)     /* should be 16 */
00317 
00318 /* initial gzip extra field content (52 == HEAD + EXTRA + 1) */
00319 local unsigned char log_gzext[] = {
00320     52, 0, 0, 0, 0, 0, 0, 0,    /* offset of first stored block length */
00321     52, 0, 0, 0, 0, 0, 0, 0,    /* offset of last stored block length */
00322     0, 0, 0, 0, 0, 0, 0, 0,     /* compressed data crc and length */
00323     0, 0, 0, 0, 0, 0, 0, 0,     /* total data crc and length */
00324     0, 0,                       /* final stored block data length */
00325     5                           /* op is NO_OP, last bit 8 bits back */
00326 };
00327 
00328 #define EXTRA sizeof(log_gzext)     /* should be 35 */
00329 
00330 /* initial gzip data and trailer */
00331 local unsigned char log_gzbody[] = {
00332     1, 0, 0, 0xff, 0xff,        /* empty stored block (last) */
00333     0, 0, 0, 0,                 /* crc */
00334     0, 0, 0, 0                  /* uncompressed length */
00335 };
00336 
00337 #define BODY sizeof(log_gzbody)
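
/* Illustrative sketch, not part of gzlog.c: concatenating the three arrays
   above gives the image of an empty log file.  The sketch below assembles
   that image in memory to show why the initial stored block length offset is
   52 (16-byte header + 35-byte extra field + 1 byte of stored block header).
   The byte values are copied from log_gzhead, log_gzext and log_gzbody; the
   names sketch_empty_log() and SK_* are hypothetical. */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SK_HEAD = 16, SK_EXTRA = 35, SK_BODY = 13 };

/* Write the empty-log image to out (capacity cap) and return its size. */
static size_t sketch_empty_log(unsigned char *out, size_t cap)
{
    static const unsigned char head[SK_HEAD] = {
        0x1f, 0x8b, 8, 4, 0, 0, 0, 0, 0, 0xff, 39, 0, 'a', 'p', 35, 0
    };
    /* offsets of first and last stored block length are both 52; the final
       byte 5 means NO_OP with the block header 3 + 5 = 8 bits back */
    static const unsigned char ext[SK_EXTRA] = {
        [0] = 52, [8] = 52, [34] = 5
    };
    /* last empty stored block, then zero crc and zero length trailer */
    static const unsigned char body[SK_BODY] = {
        1, 0, 0, 0xff, 0xff, 0, 0, 0, 0, 0, 0, 0, 0
    };

    assert(cap >= SK_HEAD + SK_EXTRA + SK_BODY);
    memcpy(out, head, SK_HEAD);
    memcpy(out + SK_HEAD, ext, SK_EXTRA);
    memcpy(out + SK_HEAD + SK_EXTRA, body, SK_BODY);
    return SK_HEAD + SK_EXTRA + SK_BODY;
}
```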
00338 
00339 /* Exclusively create foo.lock in order to negotiate exclusive access to the
00340    foo.* files.  If the modify time of an existing lock file is greater than
00341    PATIENCE seconds in the past, then consider the lock file to have been
00342    abandoned, delete it, and try the exclusive create again.  Save the lock
00343    file modify time for verification of ownership.  Return 0 on success, or -1
00344    on failure, usually due to an access restriction or invalid path.  Note that
00345    if stat() or unlink() fails, it may be due to another process noticing the
00346    abandoned lock file a smidge sooner and deleting it, so those are not
00347    flagged as an error. */
00348 local int log_lock(struct log *log)
00349 {
00350     int fd;
00351     struct stat st;
00352 
00353     strcpy(log->end, ".lock");
00354     while ((fd = open(log->path, O_CREAT | O_EXCL, 0644)) < 0) {
00355         if (errno != EEXIST)
00356             return -1;
00357         if (stat(log->path, &st) == 0 && time(NULL) - st.st_mtime > PATIENCE) {
00358             unlink(log->path);
00359             continue;
00360         }
00361         sleep(2);       /* relinquish the CPU for two seconds while waiting */
00362     }
00363     close(fd);
00364     if (stat(log->path, &st) == 0)
00365         log->lock = st.st_mtime;
00366     return 0;
00367 }
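
/* Illustrative sketch, not part of gzlog.c: a non-blocking variant of the
   locking idea in log_lock() above.  It uses the same exclusive create and
   the same staleness test on the lock file modify time, but makes at most
   one retry instead of sleeping in a loop.  The names lock_is_stale() and
   try_lock() are hypothetical, and the return convention (0 acquired,
   1 busy, -1 error) is an assumption of this example. */

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Return 1 if a lock last modified at mtime is more than patience seconds
   old as of now, otherwise 0. */
static int lock_is_stale(time_t now, time_t mtime, long patience)
{
    return difftime(now, mtime) > (double)patience;
}

/* Try to take the lock at path.  Return 0 if acquired, 1 if the lock is held
   and not stale (or changed hands while being examined), -1 on other errors. */
static int try_lock(const char *path, long patience)
{
    struct stat st;
    int fd, attempt;

    assert(path != NULL);
    for (attempt = 0; attempt < 2; attempt++) {
        fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd >= 0) {
            close(fd);
            return 0;
        }
        if (errno != EEXIST)
            return -1;
        if (attempt == 0 && stat(path, &st) == 0 &&
            lock_is_stale(time(NULL), st.st_mtime, patience)) {
            unlink(path);       /* stale: remove and retry the create once */
            continue;
        }
        return 1;
    }
    return 1;
}
```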
00368 
00369 /* Update the modify time of the lock file to now, in order to prevent another
00370    task from thinking that the lock is stale.  Save the lock file modify time
00371    for verification of ownership. */
00372 local void log_touch(struct log *log)
00373 {
00374     struct stat st;
00375 
00376     strcpy(log->end, ".lock");
00377     utimes(log->path, NULL);
00378     if (stat(log->path, &st) == 0)
00379         log->lock = st.st_mtime;
00380 }
00381 
00382 /* Check the lock file modify time against what is expected.  Return true if
00383    this is not our lock.  If it is our lock, touch it to keep it. */
00384 local int log_check(struct log *log)
00385 {
00386     struct stat st;
00387 
00388     strcpy(log->end, ".lock");
00389     if (stat(log->path, &st) || st.st_mtime != log->lock)
00390         return 1;
00391     log_touch(log);
00392     return 0;
00393 }
00394 
00395 /* Unlock a previously acquired lock, but only if it's ours. */
00396 local void log_unlock(struct log *log)
00397 {
00398     if (log_check(log))
00399         return;
00400     strcpy(log->end, ".lock");
00401     unlink(log->path);
00402     log->lock = 0;
00403 }
00404 
00405 /* Check the gzip header and read in the extra field, filling in the values in
00406    the log structure.  Return op on success or -1 if the gzip header was not as
00407    expected.  op is the current operation in progress last written to the extra
00408    field.  This assumes that the gzip file has already been opened, with the
00409    file descriptor log->fd. */
00410 local int log_head(struct log *log)
00411 {
00412     int op;
00413     unsigned char buf[HEAD + EXTRA];
00414 
00415     if (lseek(log->fd, 0, SEEK_SET) < 0 ||
00416         read(log->fd, buf, HEAD + EXTRA) != HEAD + EXTRA ||
00417         memcmp(buf, log_gzhead, HEAD)) {
00418         return -1;
00419     }
00420     log->first = PULL8(buf + HEAD);
00421     log->last = PULL8(buf + HEAD + 8);
00422     log->ccrc = PULL4(buf + HEAD + 16);
00423     log->clen = PULL4(buf + HEAD + 20);
00424     log->tcrc = PULL4(buf + HEAD + 24);
00425     log->tlen = PULL4(buf + HEAD + 28);
00426     log->stored = PULL2(buf + HEAD + 32);
00427     log->back = 3 + (buf[HEAD + 34] & 7);
00428     op = (buf[HEAD + 34] >> 3) & 3;
00429     return op;
00430 }
00431 
00432 /* Write over the extra field contents, marking the operation as op.  Use fsync
00433    to assure that the device is written to, and in the requested order.  This
00434    operation, and only this operation, is assumed to be atomic in order to
00435    assure that the log is recoverable in the event of an interruption at any
00436    point in the process.  Return -1 if the write to foo.gz failed. */
00437 local int log_mark(struct log *log, int op)
00438 {
00439     int ret;
00440     unsigned char ext[EXTRA];
00441 
00442     PUT8(ext, log->first);
00443     PUT8(ext + 8, log->last);
00444     PUT4(ext + 16, log->ccrc);
00445     PUT4(ext + 20, log->clen);
00446     PUT4(ext + 24, log->tcrc);
00447     PUT4(ext + 28, log->tlen);
00448     PUT2(ext + 32, log->stored);
00449     ext[34] = log->back - 3 + (op << 3);
00450     fsync(log->fd);
00451     ret = lseek(log->fd, HEAD, SEEK_SET) < 0 ||
00452           write(log->fd, ext, EXTRA) != EXTRA ? -1 : 0;
00453     fsync(log->fd);
00454     return ret;
00455 }
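
/* Illustrative sketch, not part of gzlog.c: log_head() and log_mark() above
   unpack and pack the final byte of the extra field.  The low three bits
   hold back - 3 (back is in 3..10), the next two bits (mask 0x18) hold the
   operation, and the top three bits are zero.  The helper names below are
   hypothetical; the bit layout follows the code above. */

```c
#include <assert.h>

enum { SK_NO_OP = 0, SK_APPEND_OP = 1, SK_COMPRESS_OP = 2, SK_REPLACE_OP = 3 };

/* Pack the bits-back value and the operation into the final extra byte. */
static unsigned char ext_flags_pack(int back, int op)
{
    assert(back >= 3 && back <= 10);
    assert(op >= 0 && op <= 3);
    return (unsigned char)((back - 3) | (op << 3));
}

/* Recover the bits-back value (3..10) from the final extra byte. */
static int ext_flags_back(unsigned char flags)
{
    return 3 + (flags & 7);
}

/* Recover the operation in process (0..3) from the final extra byte. */
static int ext_flags_op(unsigned char flags)
{
    return (flags >> 3) & 3;
}
```

/* For example, the initial value 5 in log_gzext decodes to back == 8 and
   op == NO_OP. */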
00456 
00457 /* Rewrite the last block header bits and subsequent zero bits to get to a byte
00458    boundary, setting the last block bit if last is true, and then write the
00459    remainder of the stored block header (length and one's complement).  Leave
00460    the file pointer after the end of the last stored block data.  Return -1 if
00461    there is a read or write failure on the foo.gz file */
00462 local int log_last(struct log *log, int last)
00463 {
00464     int back, len, mask;
00465     unsigned char buf[6];
00466 
00467     /* determine the locations of the bytes and bits to modify */
00468     back = log->last == log->first ? log->back : 8;
00469     len = back > 8 ? 2 : 1;                 /* bytes back from log->last */
00470     mask = 0x80 >> ((back - 1) & 7);        /* mask for block last-bit */
00471 
00472     /* get the byte to modify (one or two back) into buf[0] -- don't need to
00473        read the byte if the last-bit is eight bits back, since in that case
00474        the entire byte will be modified */
00475     buf[0] = 0;
00476     if (back != 8 && (lseek(log->fd, log->last - len, SEEK_SET) < 0 ||
00477                       read(log->fd, buf, 1) != 1))
00478         return -1;
00479 
00480     /* change the last-bit of the last stored block as requested -- note
00481        that all bits above the last-bit are set to zero, per the type bits
00482        of a stored block being 00 and per the convention that the bits to
00483        bring the stream to a byte boundary are also zeros */
00484     buf[1] = 0;
00485     buf[2 - len] = (*buf & (mask - 1)) + (last ? mask : 0);
00486 
00487     /* write the modified stored block header and lengths, move the file
00488        pointer to after the last stored block data */
00489     PUT2(buf + 2, log->stored);
00490     PUT2(buf + 4, log->stored ^ 0xffff);
00491     return lseek(log->fd, log->last - len, SEEK_SET) < 0 ||
00492            write(log->fd, buf + 2 - len, len + 4) != len + 4 ||
00493            lseek(log->fd, log->stored, SEEK_CUR) < 0 ? -1 : 0;
00494 }
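
/* Illustrative sketch, not part of gzlog.c: the bit arithmetic in log_last()
   above, isolated into pure functions.  Deflate packs bits starting at the
   least significant bit of each byte, so a header that starts "back" bits
   before the length bytes has its last-block bit at mask
   0x80 >> ((back - 1) & 7), in the byte one or two positions back.  The
   sk_* names are hypothetical. */

```c
#include <assert.h>

/* Mask of the last-block bit within the byte that holds it. */
static int sk_last_mask(int back)
{
    assert(back >= 3 && back <= 10);
    return 0x80 >> ((back - 1) & 7);
}

/* Number of bytes before the length bytes at which the header bit lives. */
static int sk_header_bytes_back(int back)
{
    return back > 8 ? 2 : 1;
}

/* Keep the bits of prev below the last-block bit (they belong to earlier
   compressed data), set or clear the last-block bit, and zero everything
   above it (stored block type 00 plus padding to the byte boundary). */
static unsigned char sk_set_last(unsigned char prev, int back, int last)
{
    int mask = sk_last_mask(back);

    return (unsigned char)((prev & (mask - 1)) | (last ? mask : 0));
}
```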
00495 
00496 /* Append len bytes from data to the locked and open log file.  len may be zero
00497    if recovering and no .add file was found.  In that case, the previous state
00498    of the foo.gz file is restored.  The data is appended uncompressed in
00499    deflate stored blocks.  Return -1 if there was an error reading or writing
00500    the foo.gz file. */
00501 local int log_append(struct log *log, unsigned char *data, size_t len)
00502 {
00503     uint put;
00504     off_t end;
00505     unsigned char buf[8];
00506 
00507     /* set the last block last-bit and length, in case recovering an
00508        interrupted append, then position the file pointer to append to the
00509        block */
00510     if (log_last(log, 1))
00511         return -1;
00512 
00513     /* append, adding stored blocks and updating the offset of the last stored
00514        block as needed, and update the total crc and length */
00515     while (len) {
00516         /* append as much as we can to the last block */
00517         put = (MAX_STORE << 10) - log->stored;
00518         if (put > len)
00519             put = (uint)len;
00520         if (put) {
00521             if (write(log->fd, data, put) != put)
00522                 return -1;
00523             BAIL(1);
00524             log->tcrc = crc32(log->tcrc, data, put);
00525             log->tlen += put;
00526             log->stored += put;
00527             data += put;
00528             len -= put;
00529         }
00530 
00531         /* if we need to, add a new empty stored block */
00532         if (len) {
00533             /* mark current block as not last */
00534             if (log_last(log, 0))
00535                 return -1;
00536 
00537             /* point to new, empty stored block */
00538             log->last += 4 + log->stored + 1;
00539             log->stored = 0;
00540         }
00541 
00542         /* mark last block as last, update its length */
00543         if (log_last(log, 1))
00544             return -1;
00545         BAIL(2);
00546     }
00547 
00548     /* write the new crc and length trailer, and truncate just in case (could
00549        be recovering from partial append with a missing foo.add file) */
00550     PUT4(buf, log->tcrc);
00551     PUT4(buf + 4, log->tlen);
00552     if (write(log->fd, buf, 8) != 8 ||
00553         (end = lseek(log->fd, 0, SEEK_CUR)) < 0 || ftruncate(log->fd, end))
00554         return -1;
00555 
00556     /* write the extra field, marking the log file as done, delete .add file */
00557     if (log_mark(log, NO_OP))
00558         return -1;
00559     strcpy(log->end, ".add");
00560     unlink(log->path);          /* ignore error, since may not exist */
00561     return 0;
00562 }
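
/* Illustrative sketch, not part of gzlog.c: the bookkeeping of the append
   loop in log_append() above, without any file I/O.  Each time a stored
   block fills up, the offset of the next block's first length byte is the
   current one plus 4 (length and complement), plus the stored data, plus 1
   (the next block's header byte).  The struct sk_pos and sk_append() names
   are hypothetical; SK_MAX_STORE mirrors MAX_STORE. */

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_STORE 16     /* stored block limit in Kbytes, as MAX_STORE */

struct sk_pos {
    long long last;         /* offset of last stored block length bytes */
    unsigned stored;        /* bytes currently in the last stored block */
};

/* Return the position after appending len bytes starting from pos. */
static struct sk_pos sk_append(struct sk_pos pos, size_t len)
{
    const unsigned cap = SK_MAX_STORE << 10;

    assert(pos.stored <= cap);
    while (len) {
        unsigned put = cap - pos.stored;    /* room left in this block */

        if (put > len)
            put = (unsigned)len;
        pos.stored += put;
        len -= put;
        if (len) {                          /* start a new, empty block */
            pos.last += 4 + pos.stored + 1;
            pos.stored = 0;
        }
    }
    return pos;
}
```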
00563 
00564 /* Replace the foo.dict file with the foo.temp file.  Also delete the foo.add
00565    file, since the compress operation may have been interrupted before that was
00566    done.  Returns -2 if memory could not be allocated, or -1 if reading or
00567    writing foo.gz fails, or if the rename fails for some reason other than
00568    foo.temp not existing.  foo.temp not existing is a permitted error, since
00569    the replace operation may have been interrupted after the rename is done,
00570    but before foo.gz is marked as complete. */
00571 local int log_replace(struct log *log)
00572 {
00573     int ret;
00574     char *dest;
00575 
00576     /* delete foo.add file */
00577     strcpy(log->end, ".add");
00578     unlink(log->path);         /* ignore error, since may not exist */
00579     BAIL(3);
00580 
00581     /* rename foo.temp to foo.dict, replacing foo.dict if it exists */
00582     strcpy(log->end, ".dict");
00583     dest = malloc(strlen(log->path) + 1);
00584     if (dest == NULL)
00585         return -2;
00586     strcpy(dest, log->path);
00587     strcpy(log->end, ".temp");
00588     ret = rename(log->path, dest);
00589     free(dest);
00590     if (ret && errno != ENOENT)
00591         return -1;
00592     BAIL(4);
00593 
00594     /* mark the foo.gz file as done */
00595     return log_mark(log, NO_OP);
00596 }
00597 
00598 /* Compress the len bytes at data and append the compressed data to the
00599    foo.gz deflate data immediately after the previous compressed data.  This
00600    overwrites the previous uncompressed data, which was stored in foo.add
00601    and is the data provided in data[0..len-1].  If this operation is
00602    interrupted, it picks up at the start of this routine, with the foo.add
00603    file read in again.  If there is no data to compress (len == 0), then we
00604    simply terminate the foo.gz file after the previously compressed data,
00605    appending a final empty stored block and the gzip trailer.  Return -1 if
00606    reading or writing the foo.gz file failed, or -2 if there was a memory
00607    allocation failure. */
00608 local int log_compress(struct log *log, unsigned char *data, size_t len)
00609 {
00610     int fd;
00611     uint got, max;
00612     ssize_t dict;
00613     off_t end;
00614     z_stream strm;
00615     unsigned char buf[DICT];
00616 
00617     /* compress and append compressed data */
00618     if (len) {
00619         /* set up for deflate, allocating memory */
00620         strm.zalloc = Z_NULL;
00621         strm.zfree = Z_NULL;
00622         strm.opaque = Z_NULL;
00623         if (deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED, -15, 8,
00624                          Z_DEFAULT_STRATEGY) != Z_OK)
00625             return -2;
00626 
00627         /* read in dictionary (last 32K of data that was compressed) */
00628         strcpy(log->end, ".dict");
00629         fd = open(log->path, O_RDONLY, 0);
00630         if (fd >= 0) {
00631             dict = read(fd, buf, DICT);
00632             close(fd);
00633             if (dict < 0) {
00634                 deflateEnd(&strm);
00635                 return -1;
00636             }
00637             if (dict)
00638                 deflateSetDictionary(&strm, buf, (uint)dict);
00639         }
00640         log_touch(log);
00641 
00642         /* prime deflate with last bits of previous block, position write
00643            pointer to write those bits and overwrite what follows */
00644         if (lseek(log->fd, log->first - (log->back > 8 ? 2 : 1),
00645                 SEEK_SET) < 0 ||
00646             read(log->fd, buf, 1) != 1 || lseek(log->fd, -1, SEEK_CUR) < 0) {
00647             deflateEnd(&strm);
00648             return -1;
00649         }
00650         deflatePrime(&strm, (8 - log->back) & 7, *buf);
00651 
00652         /* compress, finishing with a partial non-last empty static block */
00653         strm.next_in = data;
00654         max = (((uint)0 - 1) >> 1) + 1; /* in case int smaller than size_t */
00655         do {
00656             strm.avail_in = len > max ? max : (uint)len;
00657             len -= strm.avail_in;
00658             do {
00659                 strm.avail_out = DICT;
00660                 strm.next_out = buf;
00661                 deflate(&strm, len ? Z_NO_FLUSH : Z_PARTIAL_FLUSH);
00662                 got = DICT - strm.avail_out;
00663                 if (got && write(log->fd, buf, got) != got) {
00664                     deflateEnd(&strm);
00665                     return -1;
00666                 }
00667                 log_touch(log);
00668             } while (strm.avail_out == 0);
00669         } while (len);
00670         deflateEnd(&strm);
00671         BAIL(5);
00672 
00673         /* find start of empty static block -- scanning backwards the first one
00674            bit is the second bit of the block, if the last byte is zero, then
00675            we know the byte before that has a one in the top bit, since an
00676            empty static block is ten bits long */
00677         if ((log->first = lseek(log->fd, -1, SEEK_CUR)) < 0 ||
00678             read(log->fd, buf, 1) != 1)
00679             return -1;
00680         log->first++;
00681         if (*buf) {
00682             log->back = 1;
00683             while ((*buf & ((uint)1 << (8 - log->back++))) == 0)
00684                 ;       /* guaranteed to terminate, since *buf != 0 */
00685         }
00686         else
00687             log->back = 10;
00688 
00689         /* update compressed crc and length */
00690         log->ccrc = log->tcrc;
00691         log->clen = log->tlen;
00692     }
00693     else {
00694         /* no data to compress -- fix up existing gzip stream */
00695         log->tcrc = log->ccrc;
00696         log->tlen = log->clen;
00697     }
00698 
00699     /* complete and truncate gzip stream */
00700     log->last = log->first;
00701     log->stored = 0;
00702     PUT4(buf, log->tcrc);
00703     PUT4(buf + 4, log->tlen);
00704     if (log_last(log, 1) || write(log->fd, buf, 8) != 8 ||
00705         (end = lseek(log->fd, 0, SEEK_CUR)) < 0 || ftruncate(log->fd, end))
00706         return -1;
00707     BAIL(6);
00708 
00709     /* mark as being in the replace operation */
00710     if (log_mark(log, REPLACE_OP))
00711         return -1;
00712 
00713     /* execute the replace operation and mark the file as done */
00714     return log_replace(log);
00715 }
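
The backwards scan in log_compress() that locates the start of the empty static block can be looked at on its own. The sketch below is an illustration only: empty_block_back is a hypothetical name, and the function is meant to return the same value the loop above stores in log->back for a given final byte, assuming that byte really ends with an empty static block.

```c
#include <assert.h>

/* Hypothetical helper: given the last byte written by deflate after a
   partial flush, return how many bits back from the end of that byte the
   empty static block starts.  The highest one bit is the second bit of the
   block, so the block starts one bit below it.  A zero byte means the one
   bit is the top bit of the previous byte, giving the full ten bits. */
static int empty_block_back(unsigned char last)
{
    int bit, back = 10;

    for (bit = 7; bit >= 0; bit--)
        if (last & (1u << bit)) {
            back = 9 - bit;     /* bits above it, the bit, and one before */
            break;
        }
    assert(back >= 2 && back <= 10);
    return back;
}
```

For example, a final byte of 0x80 gives 2, a final byte of 0x01 gives 9, and a zero byte gives 10.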
00716 
00717 /* log a repair record to the .repairs file */
00718 local void log_log(struct log *log, int op, char *record)
00719 {
00720     time_t now;
00721     FILE *rec;
00722 
00723     now = time(NULL);
00724     strcpy(log->end, ".repairs");
00725     rec = fopen(log->path, "a");
00726     if (rec == NULL)
00727         return;
00728     fprintf(rec, "%.24s %s recovery: %s\n", ctime(&now), op == APPEND_OP ?
00729             "append" : (op == COMPRESS_OP ? "compress" : "replace"), record);
00730     fclose(rec);
00731     return;
00732 }
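
log_log() relies on ctime() returning a fixed-width string whose first 24 characters are the timestamp, with the newline in position 25, which is why the format uses %.24s. A small sketch of the same formatting into a buffer is shown here. format_record is a hypothetical name, and taking the operation as a string rather than an op code is a simplification.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: build one repair-log line in out.  Returns the
   length snprintf() reports, or -1 if the time could not be converted. */
static int format_record(char *out, size_t size, time_t when,
                         const char *op, const char *record)
{
    const char *stamp = ctime(&when);   /* "Www Mmm dd hh:mm:ss yyyy\n" */

    assert(out != NULL && op != NULL && record != NULL);
    if (stamp == NULL)
        return -1;
    /* %.24s keeps the timestamp and drops the newline ctime() appends */
    return snprintf(out, size, "%.24s %s recovery: %s\n", stamp, op, record);
}
```

The timestamp text depends on the local time zone, but its width does not, so the operation name always starts at the same column.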
00733 
00734 /* Recover the interrupted operation op.  First read foo.add for recovering an
00735    append or compress operation.  Return -1 if there was an error reading or
00736    writing foo.gz or reading an existing foo.add, or -2 if there was a memory
00737    allocation failure. */
00738 local int log_recover(struct log *log, int op)
00739 {
00740     int fd, ret = 0;
00741     unsigned char *data = NULL;
00742     size_t len = 0;
00743     struct stat st;
00744 
00745     /* log recovery */
00746     log_log(log, op, "start");
00747 
00748     /* load foo.add file if expected and present */
00749     if (op == APPEND_OP || op == COMPRESS_OP) {
00750         strcpy(log->end, ".add");
00751         if (stat(log->path, &st) == 0 && st.st_size) {
00752             len = (size_t)(st.st_size);
00753             if (len != st.st_size || (data = malloc(st.st_size)) == NULL) {
00754                 log_log(log, op, "allocation failure");
00755                 return -2;
00756             }
00757             ret = (fd = open(log->path, O_RDONLY, 0)) < 0;
00758             if (!ret) {
00759                 ret = (size_t)read(fd, data, len) != len;
00760                 close(fd);
00761             }
00762             if (ret) {
00763                 free(data);     /* do not leak the buffer on failure */
00764                 log_log(log, op, ".add file read failure");
00765                 return -1;
00766             }
00767             log_log(log, op, "loaded .add file");
00768         }
00769         else
00770             log_log(log, op, "missing .add file!");
00771     }
00772 
00773     /* recover the interrupted operation */
00774     switch (op) {
00775     case APPEND_OP:
00776         ret = log_append(log, data, len);
00777         break;
00778     case COMPRESS_OP:
00779         ret = log_compress(log, data, len);
00780         break;
00781     case REPLACE_OP:
00782         ret = log_replace(log);
00783     }
00784 
00785     /* log status */
00786     log_log(log, op, ret ? "failure" : "complete");
00787 
00788     /* clean up */
00789     if (data != NULL)
00790         free(data);
00791     return ret;
00792 }
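
The loading of foo.add in log_recover() follows a common pattern: stat the file, allocate a buffer of that size, read it in one call, and release the buffer on any failure. A self-contained sketch of that pattern follows. load_file and its return convention are hypothetical, and, like the code above, it assumes a single read() returns the whole file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read all of path into a malloc'd buffer.  Returns 0
   on success (*data is NULL and *len is 0 if the file is missing or empty),
   -1 on an i/o error, or -2 on an allocation failure. */
static int load_file(const char *path, unsigned char **data, size_t *len)
{
    struct stat st;
    unsigned char *buf;
    size_t size;
    int fd, bad;

    assert(path != NULL && data != NULL && len != NULL);
    *data = NULL;
    *len = 0;
    if (stat(path, &st) != 0 || st.st_size == 0)
        return 0;                       /* nothing to load */
    size = (size_t)st.st_size;
    if ((off_t)size != st.st_size || (buf = malloc(size)) == NULL)
        return -2;                      /* too big for size_t or no memory */
    fd = open(path, O_RDONLY, 0);
    bad = fd < 0;
    if (!bad) {
        bad = (size_t)read(fd, buf, size) != size;
        close(fd);
    }
    if (bad) {
        free(buf);                      /* release the buffer on failure */
        return -1;
    }
    *data = buf;
    *len = size;
    return 0;
}
```

The caller owns the returned buffer and frees it when done, as log_recover() does with data.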
00793 
00794 /* Close the foo.gz file (if open) and release the lock. */
00795 local void log_close(struct log *log)
00796 {
00797     if (log->fd >= 0)
00798         close(log->fd);
00799     log->fd = -1;
00800     log_unlock(log);
00801 }
00802 
00803 /* Open foo.gz, verify the header, and load the extra field contents, after
00804    first creating the foo.lock file to gain exclusive access to the foo.*
00805    files.  If foo.gz does not exist or is empty, then write the initial header,
00806    extra, and body content of an empty foo.gz log file.  If there is an error
00807    creating the lock file due to access restrictions, or an error reading or
00808    writing the foo.gz file, or if the foo.gz file is not a proper log file for
00809    this object (e.g. not a gzip file or does not contain the expected extra
00810    field), then return true.  If there is an error, the lock is released.
00811    Otherwise, the lock is left in place. */
00812 local int log_open(struct log *log)
00813 {
00814     int op;
00815 
00816     /* release open file resource if left over -- can occur if lock lost
00817        between gzlog_open() and gzlog_write() */
00818     if (log->fd >= 0)
00819         close(log->fd);
00820     log->fd = -1;
00821 
00822     /* negotiate exclusive access */
00823     if (log_lock(log) < 0)
00824         return -1;
00825 
00826     /* open the log file, foo.gz */
00827     strcpy(log->end, ".gz");
00828     log->fd = open(log->path, O_RDWR | O_CREAT, 0644);
00829     if (log->fd < 0) {
00830         log_close(log);
00831         return -1;
00832     }
00833 
00834     /* if new, initialize foo.gz with an empty log, delete old dictionary */
00835     if (lseek(log->fd, 0, SEEK_END) == 0) {
00836         if (write(log->fd, log_gzhead, HEAD) != HEAD ||
00837             write(log->fd, log_gzext, EXTRA) != EXTRA ||
00838             write(log->fd, log_gzbody, BODY) != BODY) {
00839             log_close(log);
00840             return -1;
00841         }
00842         strcpy(log->end, ".dict");
00843         unlink(log->path);
00844     }
00845 
00846     /* verify log file and load extra field information */
00847     if ((op = log_head(log)) < 0) {
00848         log_close(log);
00849         return -1;
00850     }
00851 
00852     /* check for interrupted process and if so, recover */
00853     if (op != NO_OP && log_recover(log, op)) {
00854         log_close(log);
00855         return -1;
00856     }
00857 
00858     /* touch the lock file to prevent another process from grabbing it */
00859     log_touch(log);
00860     return 0;
00861 }
00862 
00863 /* See gzlog.h for the description of the external methods below */
00864 gzlog *gzlog_open(char *path)
00865 {
00866     size_t n;
00867     struct log *log;
00868 
00869     /* check arguments */
00870     if (path == NULL || *path == 0)
00871         return NULL;
00872 
00873     /* allocate and initialize log structure */
00874     log = malloc(sizeof(struct log));
00875     if (log == NULL)
00876         return NULL;
00877     strcpy(log->id, LOGID);
00878     log->fd = -1;
00879 
00880     /* save path and end of path for name construction */
00881     n = strlen(path);
00882     log->path = malloc(n + 9);              /* allow for ".repairs" */
00883     if (log->path == NULL) {
00884         free(log);
00885         return NULL;
00886     }
00887     strcpy(log->path, path);
00888     log->end = log->path + n;
00889 
00890     /* gain exclusive access and verify log file -- may perform a
00891        recovery operation if needed */
00892     if (log_open(log)) {
00893         free(log->path);
00894         free(log);
00895         return NULL;
00896     }
00897 
00898     /* return pointer to log structure */
00899     return log;
00900 }
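
gzlog_open() allocates the path once with room for the longest suffix and keeps a pointer to the end of the base name, so that every later strcpy(log->end, ".xxx") rewrites the suffix in place. The sketch below isolates that technique. The struct name and the functions name_init and name_with are hypothetical, not part of gzlog.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the path and end members of struct log */
struct name {
    char *path;     /* base name followed by room for a suffix */
    char *end;      /* points at the terminating nul of the base name */
};

/* Allocate room for base plus the longest suffix.  Returns -1 if the
   allocation fails, 0 otherwise. */
static int name_init(struct name *nm, const char *base)
{
    size_t n = strlen(base);

    nm->path = malloc(n + 9);       /* ".repairs" is 8 characters, plus nul */
    if (nm->path == NULL)
        return -1;
    strcpy(nm->path, base);
    nm->end = nm->path + n;
    return 0;
}

/* Overwrite the current suffix in place and return the full path */
static const char *name_with(struct name *nm, const char *suffix)
{
    assert(strlen(suffix) <= 8);    /* must fit in the reserved space */
    strcpy(nm->end, suffix);
    return nm->path;
}
```

With a base of "foo", asking for ".gz", then ".repairs", then ".add" yields "foo.gz", "foo.repairs", and "foo.add" from the same buffer, with no further allocation.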
00901 
00902 /* gzlog_compress() return values:
00903     0: all good
00904    -1: file i/o error (usually access issue)
00905    -2: memory allocation failure
00906    -3: invalid log pointer argument */
00907 int gzlog_compress(gzlog *logd)
00908 {
00909     int fd, ret;
00910     uint block;
00911     size_t len, next;
00912     unsigned char *data, buf[5];
00913     struct log *log = logd;
00914 
00915     /* check arguments */
00916     if (log == NULL || strcmp(log->id, LOGID))
00917         return -3;
00918 
00919     /* see if we lost the lock -- if so get it again and reload the extra
00920        field information (it probably changed), recover last operation if
00921        necessary */
00922     if (log_check(log) && log_open(log))
00923         return -1;
00924 
00925     /* create space for uncompressed data */
00926     len = ((size_t)(log->last - log->first) & ~(((size_t)1 << 10) - 1)) +
00927           log->stored;
00928     if ((data = malloc(len)) == NULL)
00929         return -2;
00930 
00931     /* do statement here is just a cheap trick for error handling */
00932     do {
00933         /* read in the uncompressed data */
00934         if (lseek(log->fd, log->first - 1, SEEK_SET) < 0)
00935             break;
00936         next = 0;
00937         while (next < len) {
00938             if (read(log->fd, buf, 5) != 5)
00939                 break;
00940             block = PULL2(buf + 1);
00941             if (next + block > len ||
00942                 read(log->fd, (char *)data + next, block) != block)
00943                 break;
00944             next += block;
00945         }
00946         if (lseek(log->fd, 0, SEEK_CUR) != log->last + 4 + log->stored)
00947             break;
00948         log_touch(log);
00949 
00950         /* write the uncompressed data to the .add file */
00951         strcpy(log->end, ".add");
00952         fd = open(log->path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
00953         if (fd < 0)
00954             break;
00955         ret = write(fd, data, len) != len;
00956         if (ret | close(fd))
00957             break;
00958         log_touch(log);
00959 
00960         /* write the dictionary for the next compress to the .temp file */
00961         strcpy(log->end, ".temp");
00962         fd = open(log->path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
00963         if (fd < 0)
00964             break;
00965         next = DICT > len ? len : DICT;
00966         ret = write(fd, (char *)data + len - next, next) != next;
00967         if (ret | close(fd))
00968             break;
00969         log_touch(log);
00970 
00971         /* roll back to compressed data, mark the compress in progress */
00972         log->last = log->first;
00973         log->stored = 0;
00974         if (log_mark(log, COMPRESS_OP))
00975             break;
00976         BAIL(7);
00977 
00978         /* compress and append the data (clears mark) */
00979         ret = log_compress(log, data, len);
00980         free(data);
00981         return ret;
00982     } while (0);
00983 
00984     /* broke out of do above on i/o error */
00985     free(data);
00986     return -1;
00987 }
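
The read loop in gzlog_compress() walks a run of deflate stored blocks, each with a five-byte header: one byte holding the block type bits, then a two-byte little-endian length and its two-byte one's complement. A sketch of the same walk over an in-memory buffer follows. unpack_stored is a hypothetical name, and the complement check is an addition of this sketch that the code above does not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the payloads of consecutive stored blocks in
   src to dst.  Returns the number of bytes copied, or (size_t)-1 if a block
   is truncated, fails the length check, or does not fit in dst. */
static size_t unpack_stored(const unsigned char *src, size_t srclen,
                            unsigned char *dst, size_t dstlen)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < srclen) {
        unsigned len, nlen;

        if (srclen - in < 5)
            return (size_t)-1;          /* incomplete header */
        /* src[in] holds the type bits and is not examined here */
        len = src[in + 1] | ((unsigned)src[in + 2] << 8);
        nlen = src[in + 3] | ((unsigned)src[in + 4] << 8);
        in += 5;
        if ((len ^ nlen) != 0xffff || len > srclen - in || len > dstlen - out)
            return (size_t)-1;
        memcpy(dst + out, src + in, len);
        in += len;
        out += len;
    }
    return out;
}
```

Two blocks carrying "abc" and "de" unpack to the five bytes "abcde".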
00988 
00989 /* gzlog_write() return values:
00990     0: all good
00991    -1: file i/o error (usually access issue)
00992    -2: memory allocation failure
00993    -3: invalid log pointer argument */
00994 int gzlog_write(gzlog *logd, void *data, size_t len)
00995 {
00996     int fd, ret;
00997     struct log *log = logd;
00998 
00999     /* check arguments */
01000     if (log == NULL || strcmp(log->id, LOGID))
01001         return -3;
01002     if (data == NULL || len == 0)
01003         return 0;
01004 
01005     /* see if we lost the lock -- if so get it again and reload the extra
01006        field information (it probably changed), recover last operation if
01007        necessary */
01008     if (log_check(log) && log_open(log))
01009         return -1;
01010 
01011     /* create and write .add file */
01012     strcpy(log->end, ".add");
01013     fd = open(log->path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
01014     if (fd < 0)
01015         return -1;
01016     ret = write(fd, data, len) != len;
01017     if (ret | close(fd))
01018         return -1;
01019     log_touch(log);
01020 
01021     /* mark log file with append in progress */
01022     if (log_mark(log, APPEND_OP))
01023         return -1;
01024     BAIL(8);
01025 
01026     /* append data (clears mark) */
01027     if (log_append(log, data, len))
01028         return -1;
01029 
01030     /* check to see if it's time to compress -- if not, then done */
01031     if (((log->last - log->first) >> 10) + (log->stored >> 10) < TRIGGER)
01032         return 0;
01033 
01034     /* time to compress */
01035     return gzlog_compress(log);
01036 }
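
The compression trigger at the end of gzlog_write() counts whole kilobytes in two places and adds them, so two partial kilobytes do not add up to one. A sketch of that arithmetic is given here. should_compress is a hypothetical name, the threshold is passed as a parameter because the value of TRIGGER is not shown in this part of the listing, and the use of long long for file offsets is an assumption.

```c
#include <assert.h>

/* Hypothetical helper: return nonzero when the uncompressed data, counted
   in whole kilobytes, has reached trigger_k.  first and last are offsets in
   the file and stored is the length of the last stored block. */
static int should_compress(long long first, long long last,
                           unsigned stored, unsigned trigger_k)
{
    assert(first >= 0 && last >= first);
    /* each term is truncated to whole kilobytes before the sum */
    return ((last - first) >> 10) + (stored >> 10) >= trigger_k;
}
```

For instance, with a threshold of 1, a span of 1023 bytes plus a 1023-byte stored block does not trigger compression, while a span of exactly 1024 bytes does.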
01037 
01038 /* gzlog_close() return values:
01039     0: ok
01040    -3: invalid log pointer argument */
01041 int gzlog_close(gzlog *logd)
01042 {
01043     struct log *log = logd;
01044 
01045     /* check arguments */
01046     if (log == NULL || strcmp(log->id, LOGID))
01047         return -3;
01048 
01049     /* close the log file and release the lock */
01050     log_close(log);
01051 
01052     /* free structure and return */
01053     if (log->path != NULL)
01054         free(log->path);
01055     strcpy(log->id, "bad");
01056     free(log);
01057     return 0;
01058 }

Generated on Wed Oct 20 2010 11:12:17 for APBS by  doxygen 1.7.2