Big data and urban rail: uneven partners? An answer.

Urban rail measurement vehicles produce a lot of data. But is a lot really big? An answer after 15 years of experience.

Now, big data. Consumers complete, for example, on average 10,000 credit card transactions every second (The Top 5 Trends for Big Data in Financial Services). That's quite a lot, but it is still data, not information. Still an amount that good analytical software (SAS, R, etc.) can eventually handle. So what is the news on big data for the railway sector, where huge amounts of measurements have been common for decades? Is it quality? Quantity? Both? Or is it something not yet mentioned?

In my opinion it's all of it, more or less. My experience with the Viennese tram and metro network over the last 15 years shows the following:

First, the amount is neither the real problem nor a big deal. The amount calls for robust algorithms, compression, filtering and clever storage management, but there is no additional value in the amount itself. For example, our rail measurement cars, running a few times a year and measuring track geometry every 25 cm with an output of 16 channels, produce about 2 bn records each over their 15-year lifetime.
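As a back-of-the-envelope check of that 2 bn figure: the sample spacing, channel count and lifetime are from the text, while the network length and runs per year are my own illustrative assumptions, not numbers from the article.

```python
# Rough sanity check of the ~2 bn record figure.
# network_km and runs_per_year are assumed values for illustration.

SAMPLE_SPACING_M = 0.25   # one measurement every 25 cm (from the text)
CHANNELS = 16             # output channels per sample (from the text)
YEARS = 15                # vehicle lifetime (from the text)

network_km = 500          # assumed measured track length per run
runs_per_year = 4         # "a few times a year"

samples_per_run = network_km * 1000 / SAMPLE_SPACING_M * CHANNELS
lifetime_records = samples_per_run * runs_per_year * YEARS

print(f"{lifetime_records:.2e} records")  # prints 1.92e+09 records
```

With those assumptions the result lands right at the order of magnitude quoted, around 2 bn records per vehicle.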

Still, you can handle this with Excel, SPSS or R: produce time series, comparison plots, etc. This is not big data at all. The same data, automatically localised and matched to a reference network and combined with operational characteristics (timetables, annual load), vehicle fleet data (wheelsets, type of car), construction dates, estimates of useful life spans, maintenance activities (grinding, tamping) and wear rates, tells you a lot more, but still not the whole truth.
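A minimal sketch of that kind of data fusion: localise raw geometry samples on a reference network, then join operational attributes per segment. All line names, segment IDs and values below are invented for illustration; a real setup would of course use the operator's asset registry.

```python
# Sketch: match measurements to reference segments, then enrich
# them with operational data. All values are made up.
import pandas as pd

# raw geometry samples: position along a line (metres) + one channel
measurements = pd.DataFrame({
    "line": ["U1", "U1", "U1"],
    "pos_m": [120.0, 380.0, 910.0],
    "gauge_mm": [1435.2, 1436.0, 1434.8],
})

# reference network: segments with start position and asset attributes
segments = pd.DataFrame({
    "line": ["U1", "U1"],
    "seg_start_m": [0.0, 500.0],
    "segment_id": ["U1-S01", "U1-S02"],
    "built_year": [1978, 1982],
})

# operational data per segment (annual load, million gross tonnes)
load = pd.DataFrame({
    "segment_id": ["U1-S01", "U1-S02"],
    "annual_mgt": [12.5, 9.8],
})

# snap each sample to the segment it falls on
# (nearest segment start at or before the sample position)
located = pd.merge_asof(
    measurements.sort_values("pos_m"),
    segments.sort_values("seg_start_m"),
    left_on="pos_m", right_on="seg_start_m",
    by="line", direction="backward",
)

# enrich with operational characteristics
combined = located.merge(load, on="segment_id")
print(combined[["pos_m", "segment_id", "built_year", "annual_mgt"]])
```

Once every sample carries a segment ID, build date and load figure, questions like "how does wear rate depend on annual tonnage and track age" become simple group-by queries instead of manual cross-referencing.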

[Figure: wear of grooved rail over time; from data to first information. Data: Paul Mittermayer.]

BIG here means a BIG effort on techniques to combine lots of data sources in a consistent way, so that the result tells you more than the sum of all individual datasets alone. The example above opened up a new insight for us because of the proper setup: annual replacement rates over longer periods (20 years) and annual peaks that were maybe not visible before. Even more, it builds a basis for scenarios, so the mentioned replacement rates can be drawn inside a minimum/maximum box. This tells us what to expect and how to deal with, for example, peaks in negotiations with our sponsor.
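The minimum/maximum box can be sketched very simply: run several what-if scenarios and take the per-year envelope. The scenario names and replacement rates below are invented numbers; in practice they would come out of the fused dataset described above.

```python
# Sketch: a min/max "box" around annual replacement rates
# across what-if scenarios. All rates below are invented.

scenarios = {
    "low wear":  [1.8, 2.0, 2.1, 2.4, 2.2],  # km of rail replaced / year
    "expected":  [2.2, 2.5, 2.7, 3.1, 2.8],
    "high wear": [2.6, 3.0, 3.4, 3.9, 3.5],
}

years = range(2025, 2030)
lo = [min(vals) for vals in zip(*scenarios.values())]
hi = [max(vals) for vals in zip(*scenarios.values())]

for y, a, b in zip(years, lo, hi):
    print(f"{y}: expect between {a} and {b} km replaced")
```

The appeal is exactly what the text describes: once the data pipeline exists, producing an alternative envelope for a new assumption is a parameter change, not a new study.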

BIG is, for us, another word for complete, consistent, properly related data that can (semi-)automatically generate information that supports and widens our professional knowledge.

The way to this kind of process and software landscape is long and, at first sight, expensive. But the measurement vehicles themselves only sum up to maybe 1/7 of the 15-year TCO.

But the information delivered pays the bill more than seven times over. Ever produced an alternative scenario for a 20-year replacement plan in 3 hours at no extra cost? Believe me, a professional engineer's bill has a lot of zeroes for such a task. Not to forget the internal work and the unique character of the results. So big data is more of a big question demanding simple answers.
