Welcome back to our study of data sources.
Our first order of business today is to make a correction: Both the BetterInvesting version of Take Stock AND the ICLUBcentral version of Take Stock use the BetterInvesting Stock Data Service. The StockCentral version of Take Stock uses the StockCentral data.
What first aroused our interest, and then provided the keen motivation for this study, was how radically two screens differed. The first is the Complete Roster of Quality Companies (CRQC), a monthly stock screen published by ICLUBcentral and based upon the Take Stock Quality Index Rating (QI). At the time of this study the CRQC used BetterInvesting S&P data; it has since been changed to use Hemscott data from StockCentral. The second is the list of companies that the StockCentral Screener rates as either desirable or at least acceptable. What differed so vastly was the Quality Index (QI), a metric used in both cases to score the quality of a company on a scale from 1 to 10.
Being certain that the algorithms in the two screening processes were the same (one, in fact, was copied directly from the other), we concluded that the data that produced the QI must have differed sufficiently to alter the result.
Since the data items that were used to produce that index were the most significant items in our data files, and since the calculations used to make up the components of the QI would tend to magnify any differences in the actual data points, we determined that a good sample of data to use for the study would be the data for those companies that were included in either the CRQC or the StockCentral Screening product.
Typically, there would be upwards of 120 companies in either group. However, when we combined the two lists, the differences were sufficient to make the combined list grow to more than 160 companies. With roughly 120 on each list and 160 combined, only about 80 companies appeared on both, which meant that approximately 40 companies on each list did not appear on the other! Put another way, fully a third of the companies on one list did not make the cut for the other. They were too small, too young, or unavailable, or the data used in the calculation of the QI were sufficiently different to produce a poor result in one instance and a good result in the other.
This was a serious concern, as it would undermine any normal user’s confidence in these products, in the validity of those lists, and ultimately in the quality of the data used for them. It was also difficult to overcome a bias: the NAIC data had been used for so long by so many, having replaced the Value Line data as the “gold standard” for most users. Thus, the StockCentral data, as the “new kid on the block,” would likely be viewed as the most flawed.
It was necessary, therefore, to come up with a valid, unbiased, and completely objective way to measure the quality of each data source. This is the result of that effort.
Basis for Comparison
There are three basic areas where the data may differ:
- Static Data – Data that is reported by the company with little or no opportunity for changes to be made by the provider. This includes such items as revenues, pre-tax profit, and balance sheet items. The differences that can occur in the static data include:
  - accuracy – typos, etc.
  - quantity of data – the number of years, quarters reported
  - whether or not the data is restated or reclassified
  - “freshness” – how long before it reaches the user
  - interpretation or definition of items – (Example: whether or not consolidated income statements should include, as their own, revenues from companies a majority of whose stock they own, just show the equity in that company as an asset, or show the market value of their stock as an asset.)
  - currency reported – some ADRs are reported by some providers in the native currency, others consistently report in US Dollars.
- Dynamic Data – Data whose inclusion is at the discretion of the distributor, such as data from discontinued operations, extraordinary items, number of shares applied for dilution, etc. Such choices, made by StockCentral.com, BetterInvesting, or AAII, as to which line items should comprise the data presented in the files, are applied consistently in each data product but may differ from product to product and can often be found by examining the expanded data of each.
- Subjective Data – Data about which the providers’ analysts’ opinions may differ as to which line items are non-recurring (infrequent or unusual), which are extraordinary (infrequent and unusual), and how that data affects pre-tax profit, net available to the shareholders, and ultimately earnings per share. These are usually applied consistently by each electronic data provider but may differ between them and can also be spotted by examining the expanded data of each.
The data files we use have a common layout. That is to say, the SSG data file for each company consists of 537 individual data items or fields, listed in the same order (See the attached file, Exhibit A). For example, the first ten fields of each file contain ten years of high prices beginning with the oldest and ending with the most recent year. The second ten fields contain ten years of the lowest prices in the same order.
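To make the layout concrete, here is a minimal Python sketch of slicing the first two blocks out of one record. It assumes the record has already been read into a list of 537 values. The names `FIELD_COUNT`, `HIGH_PRICES`, `LOW_PRICES`, and `price_blocks` are hypothetical, and the on-disk format (delimiters, encoding) is not described in this article, so reading the file is left out.

```python
# Illustrative only: assumes one company's record is already a list of 537 values.
FIELD_COUNT = 537
HIGH_PRICES = slice(0, 10)   # fields 1-10: yearly high prices, oldest first
LOW_PRICES = slice(10, 20)   # fields 11-20: yearly low prices, oldest first


def price_blocks(fields):
    """Return the ten yearly highs and ten yearly lows from one SSG record."""
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    return {
        "high": list(fields[HIGH_PRICES]),
        "low": list(fields[LOW_PRICES]),
    }


# Placeholder record whose values are simply their positions in the file.
record = list(range(FIELD_COUNT))
blocks = price_blocks(record)
most_recent_high = blocks["high"][-1]  # the most recent year sits last in its block
```

Because every provider uses the same field order, the same slices can be applied to each provider's file for a given company.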
The original files contained, perhaps, 100+ fields, since they were required to provide the data for the SSG alone. Now, with more products doing more things such as portfolio management and balance sheet analysis, the number and variety of data items that comprise these files has grown to where it is today. So there is a great deal more opportunity for data differences to occur.
We have concerned ourselves with only the data that is significant for arriving at a conclusion about the quality of a company as defined by our common methodology, since this is the most important determination we make. When George Nicholson bestowed upon us his wonderful gift—this simple approach for analyzing companies—the criteria he proposed evaluating were essentially the strength and stability of the growth of sales, pre-tax profit, and earnings per share, and the trend in profit margins and return on equity. This was based upon the important distinction he made between the need to study just the results a company’s management achieved, versus the common, intimidating misconception that one had to also look at all of the tools management used to achieve those results. This makes our job much easier.
As it happens, the Quality Index, a metric produced by the Take Stock software, evaluates just those items. We have therefore focused on differences in the Quality Index produced by the various data sources to unearth the differences in data we have analyzed. And, we have selected for our sample only those companies surviving either of two screens: the engine producing a “Complete Roster of Quality Companies” from NAIC’s data (the Complete Roster has since this study been changed to use StockCentral data from Hemscott), or StockCentral’s “Screener,” for companies with either a satisfactory or a desirable Quality Index.
Note that we have not attempted to fine-tune those differences beyond the basic categories: desirable (green), acceptable (yellow), unacceptable (red), or not available (black). The last category includes a) data that is not included in the database, b) data that indicates that a company is too small or too young to be of interest, and c) companies that might be in the database but which are identified by different ticker symbols.
And, we have not considered as “different” basic data items whose values depart from those of their counterparts by less than 5 percent.
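The 5 percent rule can be sketched in a few lines of Python. The article does not say which value the percentage is measured against, so this sketch assumes the larger of the two absolute values as the base; the function names are hypothetical.

```python
def relative_difference(a, b):
    """Difference between two values as a fraction of the larger magnitude.

    Assumption: the base is max(|a|, |b|); the article does not specify it.
    """
    base = max(abs(a), abs(b))
    if base == 0:
        return 0.0  # both values are zero, so they agree
    return abs(a - b) / base


def is_different(a, b, threshold=0.05):
    """Treat two data items as different only at or beyond the threshold."""
    return relative_difference(a, b) >= threshold


small_gap = is_different(100.0, 103.0)   # about 2.9 percent apart
large_gap = is_different(100.0, 110.0)   # about 9.1 percent apart
```

With this choice of base, the comparison is symmetric: swapping the two providers' values gives the same answer.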
The Method
At first, we simply imported the data files from StockCentral.com, BetterInvesting, and AAII into a spreadsheet side by side and visually inspected them, highlighting those fields that differed from either or both of the others by more than 5 percent. It soon became apparent that visually analyzing all the differences for 160 companies would take far too long, and that the differences were greatly magnified by timing. Because we used data from around the time that most companies were reporting the end of their fiscal years, too many differences arose simply because some companies had already reported new data and others had not.
While we waited for the providers to catch up with each other, we built a mechanism using Microsoft’s Excel that passes each provider’s data through an identical process by which the Quality Index is calculated, detects and measures all of the comparative parameters, and produces a grade for each data file for each company. It then weights and aggregates the results and produces a grade for each provider. It also points to most of the anomalies in each file, so we can reasonably quickly answer the question, “Which was incorrect and why?” A new data sample can be entered and the run completed in about an hour.
The files are graded on the following parameters:
Quantity and Timeliness of Data
Number of Data Years – Ideally, the user would have ten years of data to work with. Some companies have not reported that many years. Providers are graded according to the number of years below the maximum reported by any provider.
Freshness of Data (Annual) – Some differences between data results arise due to the difference in policy regarding the speed with which the data is published. Providers lose points when their data is a year behind the timeliest.
Freshness of Data (Quarterly) – Essentially the same as annual data except points are taken for each quarter behind. Preliminary data is rewarded; but the provider is penalized when preliminary data is not complete.
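All three of these parameters measure how far a provider falls behind the best of the three. A minimal Python sketch of that measurement follows. It computes only the shortfall (in years or quarters); the number of points deducted per year or quarter is not stated here, so no point values are assigned. The function names and the sample figures are invented for illustration.

```python
def shortfall_from_best(values):
    """Map each provider to how far it trails the best value among all providers."""
    best = max(values.values())
    return {provider: best - value for provider, value in values.items()}


def quarter_index(year, quarter):
    """Turn (year, quarter 1-4) into a single number so quarters can be subtracted."""
    return year * 4 + (quarter - 1)


# Invented figures, for illustration only.
data_years = {"StockCentral": 10, "BetterInvesting": 10, "AAII": 8}
latest_quarter = {
    "StockCentral": quarter_index(2007, 1),
    "BetterInvesting": quarter_index(2006, 4),
    "AAII": quarter_index(2006, 3),
}

years_short = shortfall_from_best(data_years)          # years below the maximum
quarters_behind = shortfall_from_best(latest_quarter)  # quarters behind the timeliest
```

Measuring against the best provider, rather than against a fixed ten years, avoids penalizing every provider for a company that simply has a short history.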
Peer Agreement (Raw Data)
- Sales
- Pretax Profit
- Earnings per Share
Each provider’s data is compared with the data from the other two sources. Every data point that differs by more than 5 percent from either of its peers is counted against that provider. When all agree, each receives a 10.
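The peer comparison can be sketched as follows in Python. This is an illustration under stated assumptions, not the Excel mechanism itself: the 5 percent test uses the larger absolute value as its base, "either of its peers" is read literally (a point counts against a provider if it disagrees with at least one peer), and only the counts are computed, since the mapping from counts to points below 10 is not given. All names and sample figures are invented.

```python
def differs(a, b, threshold=0.05):
    """True when two values are more than `threshold` apart, relative to the larger."""
    base = max(abs(a), abs(b))
    return base > 0 and abs(a - b) / base > threshold


def disagreement_counts(series_by_provider):
    """Count, per provider, the data points that differ from at least one peer."""
    names = list(series_by_provider)
    length = min(len(series) for series in series_by_provider.values())
    counts = {name: 0 for name in names}
    for i in range(length):
        for name in names:
            mine = series_by_provider[name][i]
            peers = [series_by_provider[p][i] for p in names if p != name]
            if any(differs(mine, other) for other in peers):
                counts[name] += 1
    return counts


# Invented sales figures: the third provider departs in the final year.
sales = {
    "StockCentral": [100.0, 110.0, 121.0],
    "BetterInvesting": [100.0, 110.0, 121.0],
    "AAII": [100.0, 110.0, 140.0],
}
counts = disagreement_counts(sales)
all_agree = all(count == 0 for count in counts.values())  # would earn each a 10
```

Note that under the literal reading a single outlier is counted against all three providers, because each of the other two also differs from one of its peers. Identifying which provider is actually wrong is the job of the anomaly review described earlier.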
Peer Agreement (Derived and Calculated Data)
- EPS R-Squared (7-yr.)
- Relevant Historical Sales Growth
- Relevant Historical EPS Growth
- Recent (TTM) Sales Growth
- Recent (TTM) EPS Growth
- Profit Margin Trends
- Return on Equity Strength
The items that comprise the Quality portion of Take Stock are roughly compared. Using certain thresholds, each is assigned a color identifying it as “Desirable,” “Acceptable,” “Unacceptable,” or “Other” (missing, too young, or too small). Each provider’s result is then compared with the best of the three and is penalized 5 points for each color step it falls below that best result.
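The color-step penalty can be sketched in Python as below. The ordering of the four categories follows the order in which they are listed above, with "Other" placed lowest; that placement is an assumption, as are the names `COLOR_RANK` and `color_penalties` and the sample ratings. The thresholds that assign the colors are not given here, so the sketch starts from colors already assigned.

```python
# Assumed ordering, best to worst, following the order the categories are listed in.
COLOR_RANK = {"Desirable": 3, "Acceptable": 2, "Unacceptable": 1, "Other": 0}


def color_penalties(colors_by_provider, points_per_step=5):
    """Penalty per provider: 5 points for each color step below the best result."""
    best = max(COLOR_RANK[color] for color in colors_by_provider.values())
    return {
        provider: points_per_step * (best - COLOR_RANK[color])
        for provider, color in colors_by_provider.items()
    }


# Invented ratings for one derived item, such as Profit Margin Trends.
ratings = {
    "StockCentral": "Desirable",
    "BetterInvesting": "Acceptable",
    "AAII": "Other",
}
penalties = color_penalties(ratings)
```

Because the comparison is against the best of the three results rather than a fixed target, three providers that all rate an item "Unacceptable" incur no penalty: they agree with one another, which is what this parameter measures.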
This is the analysis methodology. Tomorrow we will discuss the results of the analysis.