Thursday, February 09, 2006

EAV Returns: The Concrete Elephant approach

Anyone who has read Tales Of The Oak Table, not to mention Tony Andrews' blog or any of the countless articles and discussions on the subject on AskTom and elsewhere, will know two things about the fabled "Entity-Attribute-Value" approach to database design, in which you model all "things" in one table with a "thing ID" and a "thing type", plus a second table holding one row per "attribute", and thus create an application that can model any conceivable type of thing, ever:

  1. It seems like a clever idea at first.
  2. It isn't.

But wait. A poster on OTN forums ("SIMPLE Database Design Problem") has solved the major problems inherent in the original Entity-Attribute-Value approach, by simply denormalising away the Attribute-Value part.

Now the ENTITIES table will have all the columns for every entity type. Maybe a lot of them will be null because "INVOICE" rows will use mainly different columns to "TROPICAL_DISEASE" rows, but disk space is cheap, and look at the simplification we have achieved by not having to babysit all those old-fashioned tables. And it's not a generic design any more, is it? It's concrete.

The table would look something like this:

ENTITYID ENTITYTYPE NAME      PRICE DIET  COLOUR ANNUAL_TURNOVER
-------- ---------- --------- ----- ----- ------ ---------------
1        PERSON     William
2        FRUIT      Banana                Yellow
3        COMPANY    Megacorp                     100000000
4        ANIMAL     Fruitbat        Fruit
5        SNACK      Snickers  0.4

accompanied by a generic RELATIONS table like this:

ENTITY1 ENTITY2 RELATIONSHIP
------- ------- ------------
3       1       EMPLOYS
1       2       EATS
1       5       EATS

Want to list the snacks eaten by Megacorp employees? Simple:

SELECT emp.entityid, emp.name, snack.name, snack.price
FROM   entities emp
       JOIN relations emprel
       ON  emprel.entity2 = emp.entityid
       AND emprel.relationship = 'EMPLOYS'
 
       JOIN entities com
       ON com.entityid = emprel.entity1
       AND com.entitytype = 'COMPANY'
 
       JOIN relations snrel
       ON  snrel.entity1 = emp.entityid
       AND snrel.relationship = 'EATS'
 
       JOIN entities snack
       ON  snack.entityid = snrel.entity2
       AND snack.entitytype = 'SNACK'
 
WHERE  emp.entitytype = 'PERSON'
AND    com.name = 'Megacorp';

Want to make FRUITBAT an employee of SNICKERS?

INSERT INTO relations VALUES (5, 4, 'EMPLOYS');

The thread becomes increasingly surreal as more and more posters suggest likely issues, from performance (he's prototyped it and the slowdown is insignificant) to complexity (the code will be generated dynamically from an object library) and the limited number of columns per table in Oracle (he might go with MySQL) while Erdem remains cheerfully confident that it will work (it won't).

My thanks to 3360 for sharing this. Send your WTFs to us at OracleWTF@bigfoot.com.

2 comments:

Rab Boyce said...

There is a genuinely difficult problem that guy is trying to solve here.

Not all application development is bespoke to a clients needs, so in the situations where flexibility is required while maintaining some sort of 'core' code base, some sort of extensibility needs to be built into the solution. It just isn't always practical or desirable to redesign the application schema for every additional column.

One of the key things that customers nearly always say in their requirements or RFPs is that they require the flexibility without resorting to the vendor for code changes.

The 'Concrete Elephant' solution proposed here is clearly nonsense, however an approach where the core entities, their attributes and entities are modelled in the schema itself, but whereby some felixibility is introduced by allowing the addition of attributes using an attribute-value approach does have some merit.

Anonymous said...

Comment