Wednesday 24 January 2007

Estimating Value : Now and Tomorrow (Part 1)

When you get a cab home after a night out, do you ever get the feeling - maybe five minutes after you're home and comfortable - that perhaps the £20 you spent wasn't quite worth it? I mean, you could have got the night-bus for £2 or walked for free, but you didn't. You valued the comfort, warmth or security of a cab over twenty pounds. Or at least, you did half an hour ago. Once you're home such things don't necessarily seem as clear.

Or perhaps you've bought something ludicrously expensive on an impulse. Maybe a stereo, or some other shiny electrical goods. Seconds beforehand it all seemed like a good idea, but now...who knows? I suppose the question is basically : how much are things really worth?

These things have certainly happened to me. And trying to determine what something is worth is something we as individuals struggle with on a daily basis. Beyond the personal we have to do the same in relationships and at work. And beyond the mundane and day-to-day, this topic is at the heart of the most challenging debates in economics and philosophy.

I do not claim to be able to shed much light on things as an economist or philosopher. I do, however, wish to consider things from the perspective of an individual, or an individual business, when an investment is being made.

Consider recruitment. In some places, recruitment is a long and arduous journey, fraught with perils, difficulties and barely comprehensible procedures. My workplace is no exception. As part of this journey a question is asked - perhaps the most important of all : "How much does it pay?"

And so my department have just started recruiting for two new posts. And, as expected, the topic of salary inevitably arises. Now, the final rate the successful applicants end up getting will depend on their specifics, but obviously we need some indication now so we can advertise the job. As usual this will depend on input from management, HR and finance. And in this case it also needs input from yours truly.

Can't We Use A Random Number Generator?

To be quite honest with you, I'm not sure I know how to approach the question.

How much is this position "worth"? God knows. Of course, analytically I can think of a few approaches.

1. Market Analysis
We write the job description and person spec and then simply compare with the rest of the market. What do other organisations pay for similar roles? What do similar jobs in our own organisation pay?

But then what does that mean in terms of individual variance? What premium should we pay for the better candidate? And how do we know the market has got this right? Sure, if other companies are struggling to recruit or retain certain positions then we might assume they're not paying enough. But what if they're paying "too much"? How might this be determined? The overall impact on their cost structure might not be significant, and it's not like many businesses publish statistics on how many staff are "too good" for the role they do.

2. Utility Value
If we find that similar jobs are paying £100k then we would of course abandon recruitment. "It just wouldn't be worth it" we might say. But what do we mean by this?

Well, one way of looking at it would be that the value generated by such a position would not justify the expense of hiring someone. But again, what do we mean by "value" here?

Sure, with some jobs it's easier - a salesman might be able to show how much revenue he or she has personally produced over their peers. What about where it's not so easy?

The only simple answer is that we make an educated guess. Are we likely to boost revenues by hiring person x? Cut other costs? Increase productivity? Meet statutory obligations? Reduce risk? Etc. etc.

3. Labour Value
One way to assess what an individual's work is worth would be the cost involved in someone else replicating that work with freely available resources. Picasso's paintings are worth so much partially because they are impossible to properly duplicate. Conversely, a bricklayer will find his work easily copied by most people with basic training. The "utility" produced by a staff member is less valuable if anyone else can do the same.

And so the value of a man's work will depend on how much other men will do it for, or what an outsourcing company will do it for, or even what a machine can do it for - this is his labour-value.

4. Morality
It is silly to assume that discussions of pay do not involve some notion of morality - that is not the case in any real-world example I have encountered. At least rhetorically the level of pay a person receives will depend on :

- How much they need to live on
- The presumed difficulty of what they do
- The presumed unpleasantness of what they do
- What other people get (both doing similar jobs and beyond)
- What their pay could alternatively be providing
- Their personal characteristics

Some of these will be more common in some environments than in others but I think it suffices to say that "moral" arguments over pay will probably be more important in the public sector or indeed anywhere with a particularly high profile.

Of course, all of these arguments will bleed together. If I feel my pay is unfairly low then I will probably think this for a given reason - i.e. that my work is very important to the company, or that other people in the same sector earn much more for similar jobs, or that my pay is not enough to live on. Similarly, even if you generate a vast amount of utility-value for your employer, if there are many thousands of other people willing to do your job for very low pay then it is likely you will not earn much.

How much is data worth?

So much for employees. But what about other things? In my last entry I discussed buying OS map data at work. I suppose my overall point was that I didn't feel it was worth it in the form the data is currently supplied. This topic was discussed in a bit more detail in another blog, on which you can find some of my comments. See : http://giscussions.blogspot.com/2007/01/gi-is-worthless.html

But can the schema above help us to evaluate how much other investments like this are worth? How much is this mapping data really "worth"?

By definition, the market rate is the price we were quoted - £16k. For reasons discussed elsewhere, this is probably too high. For our purposes here we'll use a much lower £5k value.

Would this be worth it? Well, analysing the investment in terms of utility-value, we are back to the problems of measurement. We might assume (or hope) that costs can be lowered by GIS data - but how do we quantify this? The OS's own copy suggests we might expect to save £2 for every £1 spent in the first year. Which sounds pretty good but, as I've said, this is what everyone says. Realistically, can we prove it?

Probably not. However, given that our overall budget is many, many times larger than £5k, we would only need a modest reduction in costs for our investment to be repaid. If, for instance, we had a 0.05% reduction in maintenance costs across the board then this would represent double our spend saved in one year. For our purposes here we'll say this is feasible, although I have some real concerns about where time saved "goes" after changes in processes and systems. That's a subject for another time, however.
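To make the arithmetic concrete, here's a quick sketch. Note the £20m maintenance budget is a number I've invented for illustration - all we actually know is that the real budget is many times larger than £5k :

```python
# Rough break-even sketch for the £5k data licence.
# The £20m maintenance budget is a hypothetical figure for illustration;
# the real budget is simply "many times larger" than the licence cost.
licence_cost = 5_000             # £, per two-year licence
maintenance_budget = 20_000_000  # £ per year (assumed)
saving_rate = 0.0005             # a 0.05% across-the-board reduction

annual_saving = maintenance_budget * saving_rate
print(f"Annual saving: £{annual_saving:,.0f}")                           # £10,000
print(f"Multiple of licence cost: {annual_saving / licence_cost:.1f}x")  # 2.0x
```

On these (made-up) numbers, a 0.05% saving repays the licence twice over in a single year.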

What about labour value? Well, hopefully here the link with my previous entry should be clear. Costs quoted by a company for a dataset like this must be compared not only to commercial rivals and the cost saving expected but also the cost of internally producing the data (or doing a similar thing in a different way).

Indeed, almost any dataset like this could be evaluated by the total number of labour hours it would take to make a satisfactory reproduction of it (not everything is reproducible of course).

I will provide a very crude guess and say that mapping our housing stock to the quality we realistically require would take at least a thousand hours of someone's time (this number should not be taken as a serious analysis, I hasten to add). Which we might redefine as £20,000. And in this very simplistic example the £5k investment wins easily.
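Or, spelled out (remembering the thousand-hour figure is explicitly not serious, and the £20/hour rate is simply what the £20,000 total implies) :

```python
# The crude labour-value comparison from the text. The 1,000-hour figure
# is explicitly not a serious estimate; £20/hour is implied by the £20k total.
hours_to_reproduce = 1_000
hourly_rate = 20        # £/hour (implied)
licence_cost = 5_000    # £

in_house_cost = hours_to_reproduce * hourly_rate
print(f"In-house reproduction: £{in_house_cost:,}")                # £20,000
print(f"Licence is cheaper by £{in_house_cost - licence_cost:,}")  # £15,000
```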

The morality of the investment is perhaps not the same as with recruitment, but there are still concerns. I would personally suggest it is immoral to spend such a sum on data which should be in the public domain, or which has so many restrictions on it. These can likely be ignored here. Other moral concerns (as to whether we should "waste" money on maps at all) could easily be rebuffed if we could demonstrate likely savings as a result.

Ongoing Costs

And so it all seems straightforward. The £5k figure is, we're saying, a market-generated figure, and it's easily justified when looking at the labour-cost of the project or the likely utility we'll receive.

Unfortunately, things do not end here. We have barely considered half the issue. For a start, there will be implementation costs and the like, but we shall ignore those for now. What we have not considered is the ongoing cost of the data itself.

The £5k is not to buy the data. It is to licence it. Or, to put it another way - to borrow it. After two years we have to pay £5k again. And then again, two years later. This is not layaway - the data never becomes ours, and if we stop paying we lose all functionality immediately.

And so our cost-benefit analysis becomes more complex. The utility-value will remain largely unaffected while we keep paying, but the labour-value changes.

Essentially, we did not want to do it in-house because the costs were too high. Our own map would have cost £20k. Hugely uncompetitive. But, looking forward :

          In House    External Map Data
Year 0    £20k        £5k
Year 2    £21k        £10k
Year 4    £22k        £15k
Year 6    £23k        £20k
Year 8    £25k        £25k


...and so on. The specifics of the table are of course completely speculative, but one can see the point in any case.
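The same comparison as a sketch, using the table's (completely speculative) figures - the year-10 in-house number is my own extrapolation of the pattern :

```python
# Cumulative spend per the speculative table: in-house is a one-off build
# plus modest upkeep; the licence is £5k every two years, forever.
# The year-10 in-house figure (£26k) is an extrapolation, not from the table.
years = [0, 2, 4, 6, 8, 10]
in_house = [20, 21, 22, 23, 25, 26]           # £k, cumulative
licensed = [5 * (y // 2 + 1) for y in years]  # £k: 5, 10, 15, ...

for y, ih, lic in zip(years, in_house, licensed):
    print(f"Year {y:2}: in-house £{ih}k, licensed £{lic}k")
# Level by year 8; at every renewal after that, licensing costs more.
```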

Simply put, the issue is that after the initial production of the dataset (drawing the map in this case) the labour-cost drops. Maps do require investment to keep them up-to-date but, except for specialist maps, this investment will not be the same amount as it took to draw them in the first place. Places change, but not that fast. To repeat : most of the estates we own and manage have had the same street names in and on them since the 1960s and 1970s. Where there are changes in road names or the location of green areas these would almost always :
- involve us anyway
- be well publicised
- come with a large amount of prior notice

And so for these types of amendments (surely 1-2% of the full dataset per year) we would be well placed to make changes ourselves.

To take another example : the Royal Mail address manager software is licensed at £1,250 for the first year and £500 for subsequent years. But how many of Royal Mail's postcodes change a year? For our new properties there are new postcodes certainly, but this is handled through a separate arrangement when they're built. So of the remainder, how many change on our estates? 5%? Less? I suspect the figure is closer to half a percent.

To look at it another way, if we buy the data to check our own database, we might find we make corrections to 10% of all records in the first year. In the second, barring some training or data-integrity issues, we're unlikely to make corrections to more than half a percent of all our addresses (and there's no reason to assume we couldn't get it down to 0.01%). Paying £1 to correct each wrong address (as in the first year) is probably justifiable. Paying more than £20 per wrong address seems less so.
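As a sketch - the 12,500-record address stock below is a number I've picked so that the first-year figure comes out at £1 per correction; the real per-correction cost depends on the actual stock size and error rate, and passes £20 quickly as the error rate falls further :

```python
# Cost per corrected address under the Royal Mail licence fees.
# The 12,500-record stock is hypothetical, chosen to match the
# "£1 per correction" first-year figure in the text.
records = 12_500
fees = {1: 1_250, 2: 500}      # £ licence fee by year
errors = {1: 0.10, 2: 0.005}   # fraction of records corrected that year

for year in (1, 2):
    per_fix = fees[year] / (records * errors[year])
    print(f"Year {year}: £{per_fix:.2f} per corrected address")
# Year 1: £1.00; Year 2: £8.00 - and climbing fast as the error rate falls.
```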

Whichever analysis you feel is more convincing (or relevant), you hopefully get the idea that products like these become less valuable as time goes on - in terms of both labour and utility-value. The cost also, thankfully, declines with the Royal Mail software, but nowhere near as quickly as the value does.

Subscription Models

The above examples might be thought of as rather silly but I'm sure you can see the point. If payment for something does not reflect either (or potentially both) the value obtained from it or the (labour) costs to the supplier then irrespective of our personal morality the product will become less attractive to clients.

Now, if you are fortunate enough to be a monopoly or state backed entity this might not be a concern. The BBC's value might have declined for some customers in recent years but while you can still be imprisoned for not paying the fee this may not be as important for the BBC Governors as it otherwise might be.

Where you do not enjoy such state protection or where consumers enjoy choice things will be different. And by "choice" here, I do not necessarily just mean competition from other commercial entities offering a similar service or product. That is a common misunderstanding.

For example, suppose you make a widget which allows corked wine bottles to be opened easily without the use of a corkscrew. You enjoy revenues of £100m. This revenue is potentially threatened not just by competition but by choice in a range of areas :

- Consumers could buy another firm's widget which does the same thing.
- Consumers could buy another device which achieves the same thing (e.g. a corkscrew)
- Producers of wine could stop using corks in their bottles.
- Consumers could switch to beer instead of wine.
- Consumers could switch to Islam and forsake alcohol completely.
- Consumers could borrow their friend's widget and not buy their own.
- Consumers could make something in their own home which achieves the same thing as your widget.
...and so on.

And so our mapping dataset is not just threatened (as we've seen) by amateurish, slightly crazed DIY projects like mine. There are already projects producing public domain maps of varying quality. With a steep decline in the price of basic GPS equipment, combined with better free software to generate maps, user-produced maps are much more of a viable option. If an organisation were to seriously contribute to such an effort then mapping data for a given area could feasibly be collected in months. And as stated, such Open Source projects already exist.

The analogy here with Wikipedia is of course obvious. And Wikipedia has many detractors, but their complaints usually centre on how easy it is to make amendments and how (theoretically) anything could be wrong at any moment. I personally feel these concerns are misplaced, but if we so desired we could easily avoid them here. Wikipedia's "problem" is that it needs (or at least permits) very fast updates to account for changes in current events, etc. With a map of a local authority there is little reason to think things will change very quickly at all, short of the Rapture.

And with such conditions we could restrict updates to certain persons, or require all updates first be approved by a given set of individuals (again, if we found this necessary). And speaking selfishly, for those of us who live in high population density areas the task goes from merely achievable to almost trivial. London boroughs (in which many thousands of people live and work, and whose local authorities have many millions to spend) usually only cover something like 20 to 60 square miles - which, even given the patchwork of streets, would be easy if even a few people from each area got involved.

Of course, you could argue that the map would never be as up-to-date as a professionally produced centrally authorised map. While this might be true, I would point out that the maps we use in my place of work were produced in the 1970s. And they suffice - we simply do not need (for day-to-day purposes) information that is razor accurate. Similarly, there might be a higher error rate in an open source map, but for any non-trivial query a surveyor, architect or planner would visit a site and make their own measurements. People do not build houses without visiting sites (or at least, they shouldn't) and we shouldn't over-emphasise how accurate such data needs to be.

And in this way open source projects can reduce the commercial value of certain products to nothing. Unless things change, I am unlikely to buy a commercial video file player for my home PC because the open source (and thus free) VideoLAN (VLC) is by far the best product I have tried. The utility value of such a player remains as high as ever, but the market value has collapsed; or to put it another way, the labour-value of installing a competitor has been reduced to zero.

Of course, with data (and software) this model will not necessarily work for everything. I would not want stock quotes that were possibly out-of-date or edited incorrectly. Now, I'm sure almost any project could work given the right set of people, but it will be made much harder by the level of accuracy required and the volatility of the dataset in question on one side, versus the ease of collecting data and the number of likely reliable volunteers on the other.

Where the model does work, a small irony might be enjoyed. Open source projects (whose principles are variously described as either libertarian or communist) will help, over time, to seriously diminish the "profits" of government suppliers through competitive market forces.

Who'd have thought, hey?

In the next article I want to look at how these sorts of arguments affect the procurement of software, and also how musicians and other workers are paid.

Postscript : Additional Notes On Terminology
I use the term 'Open Source' here to refer to projects where anyone can make a contribution, rather than the more common software reference to visible source code. In most datasets there is no "source code" as such; we are only interested in raw data. I use 'Open Source' here because it is a commonly understood term.

However, as Richard Stallman and others are quick to point out, "Open Source" is not the same as "Free Software" (which is about guaranteeing users' freedom to run, study, modify and redistribute software). Stallman also emphasises the importance of terminology when discussing these issues. I would agree with this general point, but I think it suffices, when discussing things in a non-technical sense, to use any term so long as it is properly defined.

In general I am referring to (and saluting) projects which :
- do not charge for their end product (regardless of whether they receive money)
- put no substantial restrictions on the use of their datasets
- are developed at least partially through contributions from users.

In the above by "substantial" I am not including restrictions placed by licences like the General Public Licence or some of the Creative Commons licences.

While I understand how important legally the licence issue is, I do not find it an intellectually stimulating issue. I feel in a sane society most licences over data would simply be unenforceable and as such meaningless.