The doubly-mutable antipattern
I have seen this in both python and C#:
class Spam: def __init__(self): self.eggs = list() ...more code here... def clear(): self.eggs = list()
What do I consider wrong here? The variable ‘eggs’ is what I call ‘doubly mutable’. It is a mutable attribute of a mutable type. I hate this because I don’t know a) what to expect and b) what the usage pattern for the variable is. This confusion applies to both people using your class (if they take references to the value of the attribute, either knowingly or unknowingly), as well as coders who are maintaining your class who will have to look every time to know which pattern you follow (clearing the list, or reassigning to a new list).
IMO you should have one of the following. This first example is preferred but as it involves an immutable collection, isn’t always practical as a replacement for this antipattern (which usually involves mutating a collection).
class Spam: def __init__(self): self.eggs = tuple() #Now immutable, so it must be reassigned to change ...more code here... def clear(): self.eggs = tuple()
in which case, you know it is a mutable attribute of an immutable type. So you can’t change the value of the instance, you must reassign it.
Or you can do:
class Spam: def __init__(self): self.eggs = list() ...more code here... def clear(): del self.eggs[:] #Empty the list without reassigning the attribute.
in which case, you understand it is to be considered an immutable attribute of a mutable type. So you would never re-set the attribute, you always mutate it. If you can’t use the tuple alternative, you should always use this alternative- there’s never a reason for the original example.
In C#, we could avoid this problem entirely by using the ‘readonly’ field keyword for any mutable collection fields. In python, since all attributes are mutable, we must do things by convention (which is fine by me).
I always fix or point out this pattern when I find it and I consider it an antipattern. Does it bother you as much, or at all?
del self.eggs[:]
This actually does a reassignment. See help about list. Check the id of eggs before and after.
Clearing the list can be done with:
del self.eggs[0:]
Check the id before and after to confirm.
It worth pointing out that fixing the issue you pointed out doesn’t necessarily fix the underlying issue: that the class does not properly define what operations invalidate previous values.
Consider a collection that provides an iterator for accessing the values stored in the collection. Neither of your “fixes” avoid potentially invalidating the iterator when the clear() method is called. If you replace the underlying storage, then the iterator continues to access the old storage (which may or may not be a bug). If you mutate the underlying storage, even if the collection holds the storage in an “immutable” pointer, then the iterator could return invalid data or break in other ways (think of what happens when the underlying collection is grown or reordered).
The only way to fix this is for the “clear()” method to explicitly state in the documentation what happens to values previously returned by the class. At which point, your particular antipattern becomes somewhat less important. It’s probably worth fixing in C# simply because the language allows you to encode your intent, but OTOH, a programmer who didn’t use it the first time around isn’t likely to do so in the future without some instruction.
Remember that readonly / final on a class attribute is only a local promise not to change what object the attribute refers to. This can be useful as a programming aid for the class author, but really little else. Clients can’t really determine anything of value from such a declaration alone.
It doesn’t bother me.
Keeping long-lived references to attributes of random objects would bother me, I think. I.e. don’t do
self.spammy_eggs = some_spam.eggs
if some_spam.eggs is expected to change during the lifetime of some_spam.
I think in a type-safe environment I feel stronger about it. But isn’t the Python approach that we’re all consenting adults? I’d think relying on the tuple’s immutability wouldn’t necessarily be an airtight approach for many purposes. For example, what if you need to assign a sorted sequence? You’d have to remember every time you assign to the attribute to cast it as a tuple.
some_instance = Spam()
tup = (x for x in some_sequence if some_comparison(x))
some_instance.eggs = tuple(sorted(tup))
It’s of course going to depend upon context, but I’d think if I were genuinely concerned then I’d make it an internal attribute and create only the specific interfaces I want to allow.
If I had to summarize my general approach, I guess I’d use a tuple if I only expect the collection to be read (e.g., it is a frequency distribution associated with specific input parameters when I created the instance), but I’d be fine with a list unless I need to rely on the immutability for some reason (e.g., I need a hashable object for some purpose like a set or dict on the instance). In most such cases though I’m only relying on the instance’s immutability itself, since I’m much more likely to use it than I am one of its attributes for this type of purpose. (I mean immutable in the pedantic sense here, inasmuch the instance’s value is its identity).
Why shouldn’t class users and maintainers just use already defined interface /i.e. clear()/ no matter how is it implemented?
Ol Redhead:
That’s not the behavior I am getting (CPython 2.6). What version are you using? On your version, even if the ID changes, does the result appear to work (ie, if you say ‘a=[]; b=a; a.append(1); del a[:]; print b’ does it print [1] or [])? I can’t find what you’re talking about in the docs.
Adam Skutt: Not entirely true. If a client gets an iterator from a tuple, even if you reassign it, why would that pointer become invalid? On the other hand, if a client gets an iterator from a list- bad of you to expose your internals or bad of the client to not make a copy of it (depending on docs/contracts). Regarding your last C# paragraph, I don’t understand or don’t agree- readonly is what allows you to make immutable types. ‘readonly’ becomes part of the interface (or rather, your type’s contract). It is much more than just a programming aid.
Metchley: If ‘eggs’ should be a sorted collection, that should be part of Spam’s contract, not the responsibility of the caller. Also, generally if I have an immutable value, I’d pass it into the initializer, not mutate the attr- hence my caveat about it not being a great replacement for the pattern in question.
—-
Remember this is a greatly pared down example. There are legitimate reasons you may expose some of your class members and not hide everything behind your own methods, namely that it increases time to write, and code size, for potentially little benefit (especially if this is internal code for your company). There are ways to avoid the issue, and TBH it doesn’t come up in a way that will likely cause problems very often. But I don’t see the point in allowing it since it is so easy to change (and makes the code make ‘more sense’ conceptually, at least to me).
It’s invalid because the contract of the class explicitly states you’re either always iterating over the most recent version of the storage, or your iterator is invalid. Your pattern is possibly indicative of a larger design flaw, and fixing the code in the fashion you suggest doesn’t necessarily fix the flaw. Moreover, it may not be a flaw at all: I can’t implement a mutable linked list without your antipattern, nor an ArrayList in most languages.
There’s nothing in my text that suggests I’m returning an iterator from an internal list whatsoever. The problem persists even if the iterator class is entirely my own: go look at how ArrayList is implemented in Java and/or C#, or std::vector is implemented in C++[1]. Likewise, the same problem occurs if the class were a linked list implementation.
It’s worth pointing out that your “doubly-mutable” antipattern is really just called a plain pointer (in C: T*). Plain pointers are obviously incredibly useful, though they should be used judiciously. It should also be readily apparent that they are sometimes necessary.
You’re quite mistaken about what readonly means in C#. ‘readonly’ creates an immutable variable, not an immutable object (type). It means we have a constant pointer (in C: T* const) to an object, not a pointer to a constant object (in C: T const*). The underlying object can be freely mutated through the variable, so an object created solely out of readonly fields is not immutable.
Consider a trivial example: I write a proxy class for collection objects, that provides synchronization so the underlying collection may be accessed safely from multiple threads. I can obviously use a readonly field to store the reference to the underlying collection. Has the collection somehow become immutable? Obviously not. Is the proxy immutable? I don’t think so, since it doesn’t provide the same guarantees as System.String[2], which is immutable.
An immutable object means that its state never changes after instantiation. Necessarily, this means the state of any objects it refers to cannot change either. As such, readonly really only has value to the class author, not users of the class.
This is pretty trivial to test in C#, I encourage you to write test code to verify this behavior. Likewise, final in Java has the exact same semantics as it relates to fields and local variables.
[1] Or even how list is implemented in Python, though it’s implemented in C and different from the others (with good reason). All provide custom iteration over a contiguous block of memory. In C++, the block is managed through a few pointers (typically); in C#/Java, an array and a few variables.
[2] Or java.lang.String, or Python’s str and friends.
I know how ArrayLists are implemented. I don’t suppose to lay down a global law to be (or that can be) followed in all circumstances. See the last line of the blog post. I consider this an antipattern because in most cases I am of the opinion it introduces unnecessary differences of usage. The pattern has legitimate usages. Obviously. Thank you for pointing them out explicitly. No need to rage about it. Also I still don’t understand your first sentence. Conceptually or actually invalid?
Please reread what I said about readonly. It allows you to make immutable types, it obviously does not make any reference/value declared with it immutable. As for ‘readonly really only has value to the class author, not users of the class,’ I think that’s patently untrue. There is significant value (especially in asynchronous/multithreaded environments) in knowing that a field will not be reassigned. Saying “this field will not be reassigned” is an important part of a contract. Anyway, I don’t want to argue this particular point with you any longer since you’ve stated your opinion and I am so far away from it we’re not going to come to any agreement on it here. Feel free to post on your blog about it or maybe I’ll do a post about it in the future.
I don’t know how to rephrase my first statement to be any clearer. If the contract of the class says, “Iterators are invalidated when clear()” is called, then your code fix will not ensure that is true.
“Please reread what I said about readonly. It allows you to make immutable types,”
No, it does not. It permits mutation of the underlying object, ergo the type itself is not ipso facto immutable. If it did, then how could my proxy example possibly work? Are you claiming my proxy example cannot work as described?
If what you were saying is true, then why do C and c++ support all four of: T*, T const*, T* const, T const* const?
“It allows you to make immutable types, it obviously does not make any reference/value declared with it immutable.”
Which means it doesn’t make an immutable type. Immutability means the state of an object never changes. That obviously includes its child objects. If what you said were true, then admonishments such as the one at http://docs.python.org/tutorial/datastructures.html#dictionaries:
“Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key.”
would be unnecessary.
“There is significant value (especially in asynchronous/multithreaded environments) in knowing that a field will not be reassigned.”
What is that value, exactly? I think there’s considerable more value in knowing that the value of the field will not change, even if the field itself can be reassigned. The only value I can think of has to do with specific features of the Java memory model and don’t apply elsewhere.
I don’t see a particular problem with the doubly-mutable example in Python, at least. Everything goes there, and frequently does, although I absolutely agree that there is value in explicitly stating your intent to either hold different objects under the same name or hold a single object that can change underneath you.
Naturally I’d rather nothing was mutable in the first place, but purity is hard for me to reconcile with making useful things.
The comparison to a plain pointer in C is confusing. The semantics for pointer comparison mean there’s a huge difference between fiddling with the thing to which you hold a pointer and fiddling with the pointer to point at a different thing, whilst the antipattern is about conflating these. The sensible thing to do would probably be to use a pointer (hopefully to a constant object) for the second case and a reference in the first. Whilst this doesn’t entirely capture the intent (because all programming languages are horrible), it provides more information than just throwing one’s hands in the air and proclaiming “It’s all just pointers anyway!”
I don’t see how your explanation is any more clear than stating (not comparing) that Python variables are pointers[1], especially seeing as your explanation uses the word “pointer” anyway. The antipattern is simply, “Using a plain pointer where it is unnecessary is bad.” You’re going to have a hard time being more succinct than that.
I should also point out that I realize plenty of languages define immutable collections as those whose properties cannot be changed, even if the elements can be changed. I think they’re fundamentally flawed for doing so, as it creates precisely the problem we’re having right now. It’s important to distinguish between the transitive and non-transitive case: this object and all of its children cannot change versus this object cannot change, but its children can. Objects of the former kind can be used as hashmap keys and other contexts where immutability is absolutely required. Objects of the latter kind cannot necessarily be used as hashmap keys and provide weaker guarantees overall.
[1] To further compound the matter, the collections really are immutable since they’re collections of T*! However, since all you can get in the languages in question is a T*, I consider that a pretty hallow explanation.
@Rob Galanakis
Sorry I realized what I wrote was ambiguous. I was thinking more in terms of “if order of the sequence matters, and I as a user happen to care that it is sorted in many cases” rather than “it is a usage requirement that the sequence be sorted.”
On the one hand, we could say “it’s incumbent upon the user to know it needs to be a tuple before reassigning if they’re going to be particular about it” but we could also say “if order matters, do we really care if the user can just shuffle the list around, or should we try to accomodate that use?”
As a general point though, I certainly wouldn’t disagree with your suggestion to just basically make it immutable unless it needs to be otherwise, since the number and complexity of problems it solves is probably much greater than the number and complexity of problems it introduces.