Thursday, April 3, 2014

Entity Attribute Value Patterns: A Toy App Exploration

Lately I have had to work with EAV (Entity Attribute Value) patterns in Python, and one thing that has bothered me is the number of types defined in the code base. There is a whole inheritance hierarchy based on an Entity type that is only three generations deep but contains nearly 30 subtypes. Then there is a namespace concept mixed in with the hierarchy, and the Entity type at the root of the hierarchy is a kind of attribute hash. The final product is very inelegant, with many layers of code and an insufficient interface that does little to encapsulate the complexity of the architecture. I have been pounding away at this code base for two months and I am beyond burnt out, so last night I decided to sharpen the saw and work out my frustration by toying around with the concepts I have been grappling with.

The first concept is the EAV pattern itself. It is a simple enough idea that can become complicated, but the basic idea of EAV, when considering persisted types, is that the table has been pivoted. Instead of the attributes of the type aligning horizontally with the names of the columns in the database

Id  Name   Age  Description
1   Jason  36   Python Developer
2   ...    ...  ...

the attributes align vertically with the entries in the 'Attribute' column.

Id  Entity  Attribute    Value
1   person  name         Jason
2   person  age          36
3   person  description  Python Developer
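
The pivot between the two layouts can be sketched in a few lines of Python. This is only an illustration: the helper names to_eav_rows and from_eav_rows are hypothetical, and every value is stored as a string, which is an assumption that matches a single generic value column.

```python
def to_eav_rows(entity, record):
    """Pivot one wide record (a dict) into (entity, attribute, value) rows."""
    return [(entity, attribute, str(value)) for attribute, value in record.items()]


def from_eav_rows(rows, entity):
    """Fold the rows belonging to one entity back into a dict."""
    return {attribute: value for ent, attribute, value in rows if ent == entity}


wide = {"name": "Jason", "age": 36, "description": "Python Developer"}
rows = to_eav_rows("person", wide)
person = from_eav_rows(rows, "person")
```

Note that the round trip is lossy for types: the age comes back as the string "36", not the integer 36.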

With this pattern a new attribute for the type person can be created and defined without having to alter the table; we just insert a new row

Id  Entity  Attribute      Value
1   person  name           Jason
2   person  age            36
3   person  description    Python Developer
4   person  favorite_song  The Chicken Dance

and now we are all aware of the author's questionable taste in music. There are some limitations to this design as used here: we can only have one person in the table, because nothing distinguishes one person's rows from another's, and there will be performance concerns as the table grows longer with every new entity that is defined and added. I still find this pattern interesting and want to explore it further, and in this quest I created the following class in Python.

class EAVType(object):
    def __init__(self, entity, attribute, value, is_method=False):
        '''Stable state constructor, the combination of all members constitutes the unique
        constraint. The addition of the namespace member allows for the duplication of the
        other members.

        Arguments:
        entity,     The entity the instance will represent.
        attribute,  The attribute the instance will represent.
        value,      The value the instance will represent.
        is_method,  A boolean indicating if the attribute is a method.
        '''
        self.entity = entity
        self.attribute = attribute
        self.value = value
        self.is_method = is_method

    def __repr__(self):
        return "%s:%s='%s'" % (self.entity, self.attribute, self.value)

    def __str__(self):
        return "%s:%s='%s'" % (self.entity, self.attribute, self.value)

    def __eq__(self, other):
        return (self.entity == other.entity
                and self.attribute == other.attribute
                and self.value == other.value)

I don't find this class too compelling on its own, but it becomes more compelling as I start to do things with it (the boolean field hints at what I might try). First I map the class to a table using SQLAlchemy. Here is the table definition.

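from sqlalchemy import Boolean, Column, Integer, String, Table

# 'metadata' is a sqlalchemy MetaData instance defined elsewhere in the package.
eav_type = Table(
    'eav_types', metadata,
    Column('id', Integer, primary_key=True),
    Column('entity', String),
    Column('attribute', String),
    Column('value', String),
    Column('is_method', Boolean),
)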
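
For readers without SQLAlchemy at hand, roughly the same table can be sketched with the standard library's sqlite3 module. This is not the dyneav code: the SQL column types, the in-memory database and the sample rows are all assumptions chosen to mirror the definition above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE eav_types (
           id INTEGER PRIMARY KEY,
           entity TEXT,
           attribute TEXT,
           value TEXT,
           is_method INTEGER
       )"""
)

# Each attribute of the entity is one row; is_method is 0 for plain values.
sample_rows = [
    ("person", "name", "Jason", 0),
    ("person", "age", "36", 0),
    ("person", "description", "Python Developer", 0),
]
conn.executemany(
    "INSERT INTO eav_types (entity, attribute, value, is_method) VALUES (?, ?, ?, ?)",
    sample_rows,
)
conn.commit()

# Collect every attribute/value pair stored for one entity.
cursor = conn.execute(
    "SELECT attribute, value FROM eav_types WHERE entity = ? ORDER BY id",
    ("person",),
)
person = dict(cursor.fetchall())
```

Reading an entity back is always the same query filtered on the entity column, whatever attributes it happens to have.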

The column types the attributes map to are simple. There are no foreign keys pointing to other persisted types; foreign keys would couple the class strongly to the database, and I want to keep the layers here as loose as possible. This code is part of a package I have created called dyneav, whose source is on GitHub. In the source is an example script with the following code.

import pickle
import types

# DBSession, metadata and EAVType come from the dyneav package
# (their import lines are not shown here).

class DynamicType(object):
    '''A type that can dynamically set its attributes based on the values set in a
    EAV table for a given entity.
    '''
    def __init__(self, entity):
        '''Constructor that sets the type of the instance.

        Arguments:
        entity,  The object that indicates the type of the instance.
        '''
        self.entity = entity

    def __repr__(self):
        return str(self.entity)

    def __str__(self):
        return str(self.entity)

    def __eq__(self, other):
        try:
            return self.entity == other.entity
        except AttributeError:
            return False

    @classmethod
    def with_uri(cls, uri):
        '''Creates an object using a URI to retrieve the attributes of the object from the DB.

        Arguments:
        uri,  String URI
        '''
        instance = cls(uri)
        attributes = DBSession.query(EAVType).filter_by(entity=uri).all()
        for attribute in attributes:
            if attribute.is_method:
                # Unpickle the function and bind it to this instance as a method.
                method = types.MethodType(pickle.loads(attribute.value), instance)
                setattr(instance, attribute.attribute, method)
            else:
                setattr(instance, attribute.attribute, attribute.value)
        return instance

if __name__ == "__main__":
    metadata.create_all()

    def do_something(self):
        return self.title

    # Load up the DB
    entry10 = EAVType("/rowlings/j/harry_potter/philosophers_stone","date","3/5/2014")
    entry11 = EAVType("/rowlings/j/harry_potter/philosophers_stone","title","The Philosopher's Stone")
    entry12 = EAVType("/rowlings/j/harry_potter/philosophers_stone","do_something",
                      pickle.dumps(do_something),is_method=True)
    entry20 = EAVType("/rowlings/j/harry_potter/chamber_of_secrets","date","3/5/2014")
    entry21 = EAVType("/rowlings/j/harry_potter/chamber_of_secrets","title","The Chamber of Secrets")
    entry30 = EAVType("/rowlings/j/harry_potter/prisoner_of_azkaban","date","3/5/2014")
    entry31 = EAVType("/rowlings/j/harry_potter/prisoner_of_azkaban","title","The Prisoner of Azkaban")
    entry40 = EAVType("/rowlings/j/harry_potter/goblet_of_fire","date","3/5/2014")
    entry41 = EAVType("/rowlings/j/harry_potter/goblet_of_fire","title","The Goblet of Fire")
    entry50 = EAVType("/rowlings/j/harry_potter/order_of_the_phoenix","date","3/5/2014")
    entry51 = EAVType("/rowlings/j/harry_potter/order_of_the_phoenix","title","The Order of the Phoenix")
    entry60 = EAVType("/rowlings/j/harry_potter/half_blood_prince","date","3/5/2014")
    entry61 = EAVType("/rowlings/j/harry_potter/half_blood_prince","title","The Half Blood Prince")
    entry70 = EAVType("/rowlings/j/harry_potter/deathly_hallows","date","3/5/2014")
    entry71 = EAVType("/rowlings/j/harry_potter/deathly_hallows","title","The Deathly Hallows")
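    # Stage the entries so the commit persists them (assumes the session
    # does not add new instances automatically).
    DBSession.add_all([entry10, entry11, entry12, entry20, entry21,
                       entry30, entry31, entry40, entry41, entry50,
                       entry51, entry60, entry61, entry70, entry71])
    DBSession.commit()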

    dynamic_book01 = DynamicType.with_uri("/rowlings/j/harry_potter/prisoner_of_azkaban")
    print(dynamic_book01)
    print(dynamic_book01.title)
    print(dynamic_book01.date)

    dynamic_book02 = DynamicType.with_uri("/rowlings/j/harry_potter/philosophers_stone")
    print(dynamic_book02)
    print(dynamic_book02.title)
    print(dynamic_book02.date)
    print(dynamic_book02.do_something())

The choice of String for the entity, attribute and value columns was deliberately generic and exposes my predilection for the style constraint of using the simplest and most generic types possible. I'm able to do a lot with EAV. I even dynamically add a method to a type and persist it in the table by using pickle to serialize the function object. Note that pickle stores a function as a reference to its module and name rather than as code, so the function must still be importable when the row is loaded. I then monkey-patch the method onto the instance during the instantiation of my dynamic type in the classmethod with_uri. I also have my entities defined as URIs, something else I get for free by using the string type. Using URIs allows me to add more than one instance of a type: each instance has a unique URI based off of a common root namespace. I don't couple my classes to the URI namespace idea explicitly, though, and I definitely don't want artifacts of namespaces in my database schema. If necessary I can put logic specific to URI parsing in my DynamicType. In REST, using URIs to uniquely identify all resources is a style constraint, and I can easily accommodate that constraint without any modification to my original EAVType class.
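
Here is a small sketch of the pickle and monkey-patching mechanics with no database involved. The Book class and the clone attribute are hypothetical names, and copy.copy merely stands in for any function that lives in an importable module; it is not what the example script stores.

```python
import copy
import pickle
import types


class Book(object):
    """Hypothetical stand-in for DynamicType; no database is involved."""

    def __init__(self, title):
        self.title = title


# pickle records a module-level function by module and name, not as code.
payload = pickle.dumps(copy.copy)
# Loading re-imports the module and looks the function up again.
restored = pickle.loads(payload)

book = Book("The Goblet of Fire")
# Bind the restored function to this one instance only.
book.clone = types.MethodType(restored, book)
twin = book.clone()  # same as calling copy.copy(book)
```

Because only a name is stored, the table row is useless on a machine where that module or function no longer exists, which is worth keeping in mind before persisting behavior this way.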

I'm kind of tickled with what I was able to do here, but I admit it is a solution in search of a problem, which is why I refer to it as a toy experiment. My experience is that dynamic typing schemes are often employed to deal with an inability to identify the types needed for a project, where changes in the types result in churn on your database schema with lots of painful data migrations. Even when you don't use EAV, it is possible to keep your model very flexible by not using types any more complex than they have to be. The use of URIs and namespaces is an example: I don't believe it is necessary to create a namespace class to handle URIs in most languages. A URI is a formatted string, and many languages have functionality to help you parse and manage URIs. Use the existing functionality of your platform, then concentrate on encapsulating your application-specific logic concerning URIs behind a contextual API that hides the complexity of your specific implementation.
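
As a rough illustration of leaning on the platform, Python's standard library can already split one of the entity URIs used earlier into a namespace and a name. The function split_entity_uri is a hypothetical helper, and treating everything before the last path segment as the namespace is my assumption for this sketch.

```python
import posixpath
from urllib.parse import urlsplit


def split_entity_uri(uri):
    """Return (namespace, name) for an entity URI."""
    # urlsplit isolates the path even if a scheme or host is present.
    path = urlsplit(uri).path
    # The last path segment is the name; the rest is the namespace.
    namespace, name = posixpath.split(path.rstrip("/"))
    return namespace, name


namespace, name = split_entity_uri("/rowlings/j/harry_potter/goblet_of_fire")
```

A helper like this can live behind DynamicType without any namespace class and without a namespace column in the schema.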
