By Lee Zion lzion@plains.net
Early every morning, the RootsWeb server that stores the USGenWeb Archives runs a program to walk through the USGenWeb Archives FTP directory and generate a tabular listing of the total number of disk file blocks* used in each of the standard state subdirectories. Another part of the program checks the "last modified" date on each of the files found and tabulates the number of file blocks used in each of the standard subdirectories by files that are less than 30 days old.
The daily report generated by the program is posted for public view at http://www.rootsweb.com/~usgenweb/newstats.html. The upper section of the daily report reflects the size (in file blocks) of the files that were found in each standard subdirectory. The bottom section of the report, the 30 day utilization section, reflects the number of file blocks used by files that were found to have a recent "last modified" date. A subdirectory found with a name that doesn't match one of the "standard names" is counted in the miscellaneous column of both sections.
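For readers who want to see the idea in concrete terms, here is a rough sketch in Python of the kind of tabulation described above. It is not the actual program that runs on the RootsWeb server. The function names, the "misc" key and the list of standard names passed in are all made up for illustration, and the sketch assumes 1024-byte blocks and estimates block usage from each file's size rather than asking the disk.

```python
import math
import os
import time

BLOCK_SIZE = 1024                     # assumed bytes per disk file block
RECENT_SECONDS = 30 * 24 * 60 * 60    # the 30 day "last modified" window


def blocks_for(size_bytes):
    """Whole blocks needed to hold a file of the given size (an estimate)."""
    return math.ceil(size_bytes / BLOCK_SIZE)


def tabulate_state(state_dir, standard_names, now=None):
    """Return {column: [total_blocks, recent_blocks]} for one state directory.

    standard_names is a hypothetical set of "standard" subdirectory names;
    any other subdirectory is counted under the "misc" column.
    """
    now = time.time() if now is None else now
    table = {name: [0, 0] for name in standard_names}
    table["misc"] = [0, 0]
    for entry in os.listdir(state_dir):
        top = os.path.join(state_dir, entry)
        if not os.path.isdir(top):
            continue  # this sketch ignores loose files at the top level
        row = table[entry] if entry in standard_names else table["misc"]
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                info = os.stat(os.path.join(dirpath, name))
                blocks = blocks_for(info.st_size)
                row[0] += blocks                      # upper section: all files
                if now - info.st_mtime < RECENT_SECONDS:
                    row[1] += blocks                  # bottom section: recent files
    return table
```

Calling tabulate_state once per state directory would give one row of each section of the report, with the first number in each pair going to the upper section and the second to the 30 day utilization section.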
On the first day of each month, we grab an "end of previous month" copy of the report and post it in the Archives Monthly Stats directory. I use the data from the report to generate the graphs you see on these pages.
(* The server "disk file block" is 1024 bytes = 1 kbyte.)
Read the holdings reported and view the graphs with a very large dose of salt.
Since the report tabulates the number of blocks used by each file, and not the true size of the file stored in those blocks, it inflates the size of the holdings by the amount of space used for "disk overhead" in each file block and by the space in the partly empty block at the end of each file. How much the total size of our holdings is inflated depends on a number of factors, including the number of files and the distribution of file sizes in each subdirectory.
Informal tests comparing true file size totals with total file blocks reported in individual state directories having a large number of files with a "normal size distribution" indicate the true size of our total holdings probably runs about 80-85% of the total size reported in the tabulation.
On the other hand, states that differ from the norm and have a large number of very small files may have true holdings of less than 25% of the total reported size of the file blocks used.
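The effect of the partly empty last block can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name is made up, the block size is taken to be 1024 bytes, and the "disk overhead" inside each block is ignored, so real figures would come out somewhat lower.

```python
import math

BLOCK_SIZE = 1024  # assumed bytes per disk file block


def true_size_ratio(file_sizes):
    """Fraction of the reported (block-based) size that is real file data.

    file_sizes is a list of true file sizes in bytes. Only the unused space
    in the last block of each file is modeled here.
    """
    reported = sum(math.ceil(size / BLOCK_SIZE) for size in file_sizes) * BLOCK_SIZE
    if reported == 0:
        return 1.0  # nothing stored, so nothing is inflated
    return sum(file_sizes) / reported


# Hypothetical directory of 1,000 tiny files of 200 bytes each:
# every file still occupies a whole block, so the ratio is 200/1024.
print(true_size_ratio([200] * 1000))
```

With these made-up sizes, each 200-byte file fills less than a fifth of its block, which is how a directory of very small files can end up with true holdings well under 25% of the reported total.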
30 day utilization section.
The 30 day numbers suffer from the same true size vs. file block size inflation factors as do the total numbers. The problem is compounded by our perception that "last modified" equates to "new." A file that is simply modified to change the e-mail address of the contributor also shows up in the 30 day numbers and can't be distinguished from a brand new upload.
My graph showed the January 1, 2004 holdings as 9.67 Gbytes, but the report shows 10,144,884 1k-blocks. There are two items that are factored into the total holdings to generate the monthly total shown: