Tuesday, February 14, 2012

Relational vs Key/Value Databases and their potential application to construction

"The reasons for the dominance of relational databases are not trivial. They have continually offered the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility in managing generic data."


- "Is the Relational Database Doomed?", Bain, Tony


Jumping right in this week... The first article that I read offered a pretty unique, though possibly outdated, perspective on the topic of databases. In 2009, and still today to a certain extent, relational databases were the most effective type of database for most applications. As the quote above demonstrates, these widely popular relational databases provided the most well-rounded data management service. The "relational logic" (as I'll call it) behind the database is quite complex, needing to trace back and forth between tables that house different characteristics corresponding to the same entity. To be clear, the coding required to successfully map between characteristics is complex, but the logic/methodology is quite simple. It's essentially an adaptation of how we normally store data: in tables, for quick referential inquiry or checking. However, the need to house each characteristic separately, though linked using the primary key, is where the weaknesses of the relational database lie. As more and more data is stored in the database, the necessary mapping between data becomes more and more cumbersome, not to mention the sheer size of the database being enough to slow even the most powerful server. Since data storage is the main premise of a database, and users generally hope to keep putting more and more data in without sacrificing speed, larger relational databases demand more and more infrastructure and hardware. This defeats the purpose of cloud computing, and it is the premise of the first article: the scalability limitations of relational databases led to the emergence of "key/value", primarily object-oriented, databases.
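To make the "relational logic" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns, and data are hypothetical, invented just to show how characteristics housed in separate tables are stitched back together through a primary key:

```python
import sqlite3

# Toy relational schema: characteristics of the same entity live in
# separate tables, linked by a primary key (all names are made up).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE buildings (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE dimensions (building_id INTEGER, floors INTEGER, "
            "FOREIGN KEY (building_id) REFERENCES buildings(id))")
cur.execute("INSERT INTO buildings VALUES (1, 'Office Tower')")
cur.execute("INSERT INTO dimensions VALUES (1, 24)")

# Recovering the full picture of one entity requires a join -- the
# back-and-forth mapping between tables described above.
cur.execute("SELECT b.name, d.floors FROM buildings b "
            "JOIN dimensions d ON d.building_id = b.id")
print(cur.fetchone())  # ('Office Tower', 24)
```

As more tables (characteristics) are added, every full lookup needs another join, which is exactly the mapping overhead that grows with the database.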
The methodology of these databases is quite different from that of relational databases. Several parameters or characteristics can be stored under the same key. Also, the same key can exist twice in the database, whereas normalization techniques remove duplicates from relational databases. The best analogy I can provide is a function in mathematics: a relation is a function if there is one y for every x. Relational databases are like functions in this case, while key/value databases can have multiple y's for each x: many entries for the same key, each storing different data. Data storage in these key/value databases is "cheaper", or less consumptive, and fewer references are needed between data, allowing them to be much larger in the amount of data they store. Around 2009 and through today, this is the most important thing, especially as cloud computing becomes more desirable; we want infinitely large databases that will chug along no matter how much data we throw into them. The article serves to discuss the advantages and disadvantages of relational and key/value databases. My premise is different and more general, and I intend to tie it in with my other article: databases with better scalability are, I feel, more useful for CAE applications.
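The "many y's per x" analogy can be sketched with a plain Python mapping standing in for a key/value store. The keys and fields here are invented for illustration; real key/value systems differ in their details, but the core idea is the same:

```python
from collections import defaultdict

# Toy key/value store: unlike a normalized relational table, the same
# key may hold several independent entries (all data is hypothetical).
store = defaultdict(list)

# Two separate entries stored under one key -- no uniqueness constraint.
store["building:1"].append({"name": "Office Tower", "floors": 24})
store["building:1"].append({"revision": 2, "floors": 26})

# A lookup returns every value stored under the key, with no joins needed.
print(store["building:1"])
print(len(store["building:1"]))  # 2
```

Because each entry is self-contained, the store can be split across many machines by key, which is where the scalability advantage comes from.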


The second article was a technical paper presented at the 25th International Symposium on Automation and Robotics in Construction. Several Korean students from Seoul National University developed a database that allowed for more accurate cost estimating of buildings, with the intention that it be useful in determining cost estimates for complex buildings. The database was loaded with defining characteristics, such as unit size, number of floors, general size and shape, and void space, as well as the material estimates and costs for completed projects. When estimates for new projects were desired, the same defining characteristics were loaded into the database for comparison with the existing data. The inquiry would return the cost estimates for projects that matched the current characteristics. If perfect matches were not available, then approximate values or data that matched certain parameters would be returned, sort of like an 'accuracy average'. The concept is pretty neat, and the four students were actually able to show a good deal of accuracy. On top of that, the database stores all new inquiries and results for use in future inquiries. Thus, theoretically, the database would become more accurate the more traffic it got.
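A rough sketch of that lookup idea, not the authors' actual algorithm: match a new project's defining characteristics against completed projects and, failing an exact match, average the costs of the closest ones. All field names, data, and the distance measure are invented for illustration:

```python
# Hypothetical records of completed projects (characteristics -> cost).
completed = [
    {"floors": 10, "unit_size": 80, "cost": 12.0},
    {"floors": 12, "unit_size": 85, "cost": 14.5},
    {"floors": 30, "unit_size": 60, "cost": 40.0},
]

def estimate(query, projects, k=2):
    """Return the mean cost of the k projects whose characteristics
    are closest to the query -- a simple 'accuracy average'."""
    def distance(p):
        # Sum of absolute differences over the queried characteristics.
        return sum(abs(p[field] - query[field]) for field in query)
    nearest = sorted(projects, key=distance)[:k]
    return sum(p["cost"] for p in nearest) / len(nearest)

print(estimate({"floors": 11, "unit_size": 82}, completed))  # 13.25
```

Storing each new query and its result back into `completed`, as the paper describes, is what would let the estimates improve with traffic.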


Tying this all in to what we have discussed so far in the course, such a database seems like it would work pretty seamlessly with applications like Revit or other BIM software. Object schedules and area and volume measurements are defining characteristics of these parametric packages. Transferring that data to a database chock-full of cost estimates from completed projects could generate a pretty accurate cost estimate, which could automatically update as the parameters of the model are changed, all during the design phase. This could help reduce the time that is consumed on project estimating, and help smooth the transition between design and construction even more.


Reading 1: Is the Relational Database Doomed?


Reading 2: Cost Estimate Methodology Using Database Layer in Construction Projects

