Why bother with python and config files?
I’ve never understood why people design systems in python to use config files. IMO there are two types of data and we can handle them both in dead simple ways:
First: Read/write data, like UI/user preferences. I say, just use python dictionaries, and serialize them however the hell you want (json, yaml, pickle, cPickle, I don’t care). I don’t understand why anyone would build anything more complex (except maybe a wrapper over this functionality), and especially why anyone would bother using a non-serialized format like ini or cfg. Can there be a good reason?
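As a minimal sketch of the "dictionary plus serializer" approach described above, using JSON from the standard library. The preference keys and the file name are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical user preferences; the keys are invented for this example.
prefs = {"theme": "dark", "font_size": 12, "recent_files": ["a.txt", "b.txt"]}


def save_prefs(path, data):
    """Write the preferences dictionary to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


def load_prefs(path):
    """Read the preferences dictionary back from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# A temporary directory stands in for the user's settings folder.
prefs_path = os.path.join(tempfile.mkdtemp(), "prefs.json")
save_prefs(prefs_path, prefs)
restored = load_prefs(prefs_path)
```

Swapping json for another serializer would only change the two helper functions, which is the point being made.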
Second: Read-only data, like application configuration (nothing that is edited by your app). This, too, is very commonly kept in config files with special formats like xml or ini. Well in this case I say, why bother even with serialized data at all? Just make it code! This data, by definition, does not have to be statically known and serialized, it is only needed at runtime. So why not make it a python module, have your code import/exec the python file, and access attributes from the python module? There’s a good deal of data we may only know at runtime, or we cannot statically represent, so we end up making the systems that consume this configuration data much more complex. (For an example, see Sphinx’s config, I think it does it right).
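One way this could look, as a sketch rather than a recommendation: the standard library's runpy.run_path executes a Python file given its path and returns the resulting globals as a dictionary. The file name appconfig.py and the setting names are assumptions made for the example.

```python
import os
import runpy
import tempfile

# Contents of a hypothetical config file. Note the value computed at load time,
# which a static format could not express directly.
CONFIG_SOURCE = '''
import os
DEBUG = False
DATA_DIR = os.path.join(os.path.expanduser("~"), "myapp_data")
WORKERS = max(1, (os.cpu_count() or 1) - 1)
'''

# Write the file somewhere temporary so the example is self-contained.
config_path = os.path.join(tempfile.mkdtemp(), "appconfig.py")
with open(config_path, "w", encoding="utf-8") as f:
    f.write(CONFIG_SOURCE)

# run_path executes the file and returns its globals as a plain dict.
settings = runpy.run_path(config_path)
```

The application then reads settings["DEBUG"], settings["WORKERS"] and so on. Because the file is executed, it has to be as trusted as the application's own code.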
An example of where I think python-as-configuration has huge power is with complex configuration data like for a plugin system. Instead of relying on some static functionality, the configuration can even be callables that return a plugin instance- so now you don’t have to refer to globals inside of your plugins, you can have your own initializer with a much larger signature, and the plugin system can have its well-defined initialization interface.
I’m not sure I did a great job explaining that example- it deserves its own post. We’ve used it extensively for new tools and systems we’ve developed and it seems to work great (and results in zero globals or singletons used by the plugins, which is an artifact of most plugin systems I’ve seen).
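A rough sketch of the plugin idea, with every class and name invented for illustration: the config holds callables, each of which builds a fully initialized plugin, so the plugin system only needs to call them with no arguments.

```python
class Database:
    """Stand-in for a shared resource that plugins need."""

    def __init__(self, url):
        self.url = url


class ExportPlugin:
    """A plugin whose initializer takes everything it needs explicitly."""

    def __init__(self, db, output_dir):
        self.db = db
        self.output_dir = output_dir

    def run(self):
        return "exporting from %s to %s" % (self.db.url, self.output_dir)


# --- what would live in the Python config file ---
db = Database("sqlite:///example.db")
PLUGINS = [
    # Each entry is a zero-argument callable returning a plugin instance.
    lambda: ExportPlugin(db, output_dir="/tmp/exports"),
]


# --- what the plugin system does ---
def load_plugins(factories):
    """Call every factory; the system never needs to know constructor details."""
    return [factory() for factory in factories]


plugins = load_plugins(PLUGINS)
```

The plugin never reaches for a global to find its database; the config author wires it in through the closure.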
These are my opinions and practices and I know there are plenty of good modules which rely on ini/xml/config files. Other than legacy decisions or to be compatible with pre-existing systems, why would someone choose a config file/format over 1) serialized python objects and/or 2) python code?
Granted it only applies to a couple of very specific situations – but where I work we have a very real need for text-based configuration files.
Due to policy / procedural issues –
1) Programmers aren’t allowed to know certain information that gets encapsulated in configuration files (database IDs and passwords, for example). The programmers can write the code, but the “elevation team” controls the contents of the configuration files.
2) Auditors and management need to be able to read / authorize / validate configuration files.
3) A common code base may need to work with multiple configurations, where the location of the configuration file needs to be passed to the code via command-line parameters.
4) Putting executable code in a directory reserved for configuration information is strictly prohibited. (Different groups are responsible for those different directories – it’s a separation-of-duties requirement.)
Again – you can probably consider this entire environment / situation to be an edge case – but it’s something we need to live with.
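Point 3 above could be sketched like this, assuming an INI file and a --config option. Both the option name and the file contents are invented for the example.

```python
import argparse
import configparser
import os
import tempfile


def parse_args(argv):
    """The code base stays the same; only the --config path changes per deployment."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True, help="path to the INI file")
    return parser.parse_args(argv)


def load_config(path):
    """Parse the INI file; open() fails loudly if the path is wrong."""
    config = configparser.ConfigParser()
    with open(path, encoding="utf-8") as f:
        config.read_file(f)
    return config


# Stand-in for a file maintained by a separate team.
config_path = os.path.join(tempfile.mkdtemp(), "prod.ini")
with open(config_path, "w", encoding="utf-8") as f:
    f.write("[database]\nhost = db.example.com\nport = 5432\n")

args = parse_args(["--config", config_path])
config = load_config(args.config)
db_host = config["database"]["host"]
db_port = config.getint("database", "port")
```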
Hi Rob,
Thanks for the awesome blog!
The reasons I can see are validation, error reporting, versioning and security. You’re building in-house tools so ignore security for now.
So if someone:
– Injects some white space into their config file, what feedback do they get?
– Puts in a value that is out of range. What happens?
– Wants to build a config editor, how do they know what values are acceptable? Or know they can display a colour picker for a certain field?
– If we add or remove a field. How do we default it to something reasonable? Or gracefully handle the missing field.
Some of this can be handled post-hoc: you validate the dictionary and give feedback. But if it fails to load, then you’re giving feedback via the python compiler, which would freak most non-coders out. And you have no way to add semantics/attributes to the data, although you could have a DDL that sits alongside the config file.
But totally agree that in most cases, especially for in-house tools, simple is good and you don’t need all this stuff.
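The post-hoc validation mentioned above might look roughly like this. The schema layout and the field names are assumptions for illustration, not an existing library.

```python
# Hypothetical schema: field name -> (expected type, default, validity check)
SCHEMA = {
    "volume": (int, 5, lambda v: 0 <= v <= 10),
    "username": (str, "guest", lambda v: len(v) > 0),
}


def validate(raw):
    """Return (clean_settings, problems) for a loaded settings dictionary."""
    clean, problems = {}, []
    for name, (expected_type, default, is_valid) in SCHEMA.items():
        if name not in raw:
            # Missing field: fall back to the default silently.
            clean[name] = default
            continue
        value = raw[name]
        if not isinstance(value, expected_type) or not is_valid(value):
            problems.append(
                "%s: bad value %r, using default %r" % (name, value, default)
            )
            value = default
        clean[name] = value
    for name in raw:
        if name not in SCHEMA:
            problems.append("%s: unknown setting ignored" % name)
    return clean, problems


settings, problems = validate({"volume": 99, "colour": "red"})
```

The same schema could carry extra attributes (for example a hint that a field wants a colour picker), which is the kind of metadata a bare dictionary lacks.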
Hmm. Good points and I’m not entirely sure why. I think I’ve always made Python ConfigParser style files because, well, “that’s how it’s done!” Then, to fight the shortcomings, I end up inventing whole new ways to include subconfigs, create “little languages” for setting values, etc. Plugins I usually just make simple package.module.Class values, but why not let Python resolve that for me normally rather than using __import__ and praying? Hmm.
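To make the contrast concrete, here is a small sketch of both styles. The resolve helper is hypothetical, and json.JSONEncoder is used only because it is a class guaranteed to be importable.

```python
import importlib


def resolve(dotted_path):
    """Turn a 'package.module.Class' string into the object it names."""
    module_name, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Text-config style: the class is named by a string and resolved at load time.
string_style = {"encoder": "json.JSONEncoder"}
encoder_cls = resolve(string_style["encoder"])

# Python-config style: an ordinary import does the resolving, and a typo
# fails with a normal ImportError at the line that contains it.
from json import JSONEncoder

python_style = {"encoder": JSONEncoder}
```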
I can think of one very compelling reason to use config files. Change Control. In an enterprise environment, it is far easier to modify config files than endure a potentially gruelling change control and code review process when all you want to do is change data or existing program behavior. Even a very minor change can cause a tidal wave of paperwork best avoided if possible. It’s far better to design an app with the maximum amount of flexibility possible so future code changes are minor and infrequent.
The answer to the first is simple: other people may want to modify configuration outside your application. There are tons of legitimate reasons to do this, and this ability (Group Policy) is one of the reasons Windows is so far ahead in enterprise desktop administration. Failing GP, text files are vastly easier to edit and manage than binary blobs.
The answer to the second is equally simple: just because something is read-only doesn’t make it constant forever. The application I wrote today doesn’t edit the database connection data supplied in its INI file, but the two alternatives: a) make the user supply it every time, or b) hard code it, are both plainly silly.
I think you also vastly overestimate the difficulty in using ini or xml configuration files in Python. It’s very simple, even if your needs are complicated. Lots of people have written useful and easy to use libraries if the built-in ones do not meet your needs.
I pretty much wanted to write what Adam said. The other reason is: ini/cfg files are simple!
Key = value is just so simple, there’s very little danger of messing it up if you manually edit it, unlike XML or JSON (oh no, a bracket is missing! ;) ).
I can just tell other people, even extremely un-technical artists, “hey open that file and edit value so-and-so”. It allows for easy troubleshooting and config changing.
Also since we use Qt a lot we can use QSettings, which works with ini files (or the registry). It’s a pretty neat system – set a flag and it uses the registry, or the user folder, or the program’s own folder. Works on OS X, Linux and Windows. Works the same on C++ and Python. I really like it.
There’s probably better and more sophisticated and elegant ways, I agree, but often simple is best.
I think that Adam Skutt misses the point:
1- serialization != binary
yaml (a text file) is as user friendly as any other ini file, and much more so than an xml
I fail to understand how this might (not) work with Windows’ Group Policy… In theory it should be exactly the same, but obviously every tool has its limitations (if you have such constraints you can design your application around them, but this doesn’t detract from Rob’s idea)
2- shipping a default config.py and importing from it isn’t hardcoding it at all imho… even if you’re obfuscating your python code, you can just keep the config.py in the clear to let the user modify it
I never heard anyone say something along the lines of: “I wish Django had an INI instead of its settings.py”
Did you guys read the article?
It doesn’t say to “hardcode” values throughout a program rather than use config files – rather, that config files should be .py files.
The main reason to use .ini etc. files is to enable non-developers to change the values. So if that is not required then yeah why not just use python files with simple dicts etc.
I have done both. I have used configuration files that contain
var_name = value
and used “exec” to read it. It’s dead simple, quick to introduce,
and allows you to do things e.g. dependent on user name, host name
with no effort, but people may not ever notice that it’s Python code at all. Same works for shell code too.
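A sketch of that exec approach, with the host and user names passed in explicitly so the example is reproducible (in real use they might come from socket.gethostname() and getpass.getuser()). The setting names are invented.

```python
# A config file that mostly looks like "var_name = value" but can branch.
CONFIG_TEXT = '''
log_level = "info"
if hostname.startswith("build"):
    log_level = "debug"
scratch_dir = "/scratch/" + username
'''


def load_exec_config(text, **predefined):
    """Execute config text with some names predefined; return what it set."""
    namespace = dict(predefined)
    exec(compile(text, "<config>", "exec"), namespace)
    # Drop the predefined names and the __builtins__ entry that exec adds.
    return {
        key: value
        for key, value in namespace.items()
        if not key.startswith("__") and key not in predefined
    }


settings = load_exec_config(CONFIG_TEXT, hostname="build07", username="kay")
desktop_settings = load_exec_config(CONFIG_TEXT, hostname="desk12", username="kay")
```

exec runs whatever the file contains, so this is only suitable for files that are as trusted as the program itself.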
I have also used configuration files that are “ini” files or XML, and I believe that XML sucks for configuration, but ini files have a nice Python module that handles them.
I found that XML and “ini” are better for structured configuration. If e.g. ini sections can reference one another in values, that’s nice, and with XML you can easily find lists, etc. using lxml and xpath queries, so it’s easier to consume.
One thing that code is typically very bad at is checking whether the configuration values actually have any effect. For that, you would have to check the values it produces, so that typos do not go unnoticed. And you typically cannot warn about e.g. a duplicate definition later in the file.
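The duplicate-definition point can be illustrated with a small sketch. It relies on the Python 3 configparser module, whose default strict mode rejects duplicate options within one source; the setting name is made up.

```python
import configparser

PY_CONFIG = "timeout = 30\ntimeout = 60\n"
INI_CONFIG = "[net]\ntimeout = 30\ntimeout = 60\n"

# As Python code, the second assignment simply wins and nobody is told.
namespace = {}
exec(PY_CONFIG, namespace)
py_timeout = namespace["timeout"]

# As an INI file, the parser notices the duplicate while reading.
parser = configparser.ConfigParser()  # strict=True is the Python 3 default
try:
    parser.read_string(INI_CONFIG)
    ini_error = None
except configparser.DuplicateOptionError as exc:
    ini_error = exc
```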
I believe configuration files have their place. But really the best is to avoid configuration entirely. For my compiler, I e.g. only have command line arguments. Put a call to it in a script, that is the configuration file format, that I consider sufficient.
Yours,
Kay
For IPython, we switched to using Python files for configuration, but I think this was a bit hasty. So long as the user is happy to edit a text file to change settings, Python code is fine. But if you want any way to change settings inside the application (like a preferences dialog), there’s no good way to automatically write a correct Python file. With INI files, the application can easily write out a modified form.
Of course, other formats like JSON can do that as well. But INI files are very easy to work with manually.
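A minimal sketch of the "application writes out a modified form" case, using configparser with an in-memory buffer in place of a real file. The section and option names are invented.

```python
import configparser
import io

ORIGINAL = """\
[ui]
theme = light
font_size = 11
"""

config = configparser.ConfigParser()
config.read_string(ORIGINAL)

# Pretend a preferences dialog changed one value.
config["ui"]["theme"] = "dark"

# Write the whole configuration back out; a real app would open the file.
buffer = io.StringIO()
config.write(buffer)
updated_text = buffer.getvalue()
```

One caveat: configparser does not keep comments from the original file when it writes, so hand-written annotations are lost on save.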
I’ve tried .py configuration files as a replacement for .ini files in the past. Without success.
First of all I ran into a problem because I used __import__ to load the user’s Python script. The script’s path was passed via the command line. I had a support case where the imported script didn’t seem to contain the expected settings. The reason was an old .pyc file with an identical name lying next to my start script.
That was the reason I switched to the “evil” exec.
But some time later I had another support request, with a user complaining that a setting like temp_file = “C:\temp\log.txt” doesn’t work! Why!? – The backslash needs to be escaped (or use raw-string syntax).
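The backslash problem is easy to reproduce. In the sketch below the config lines are held in raw strings so that they reach exec exactly as a user would have typed them; the load helper is hypothetical.

```python
import warnings

# What the user typed, and the raw-string variant that actually works.
config_text = r'temp_file = "C:\temp\log.txt"'
raw_text = r'temp_file = r"C:\temp\log.txt"'


def load(text):
    """Execute one config line and return the temp_file value it defines."""
    namespace = {}
    with warnings.catch_warnings():
        # Newer Python versions warn about the invalid "\l" escape.
        warnings.simplefilter("ignore")
        exec(compile(text, "<config>", "exec"), namespace)
    return namespace["temp_file"]


broken = load(config_text)   # "\t" was silently turned into a TAB character
intended = load(raw_text)    # backslashes preserved
```

The first form loads without an error, which is exactly why it is hard for a non-programmer to diagnose.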
There have been other issues like missing # coding: declarations if Non-ASCII characters are used.
Finally I discontinued the use of .py files as configuration files and didn’t have any trouble with this approach.
berdario,
Saying ‘serialization!=binary’ means you’re in tacit agreement with me anyway. It’s perfectly possible to serialize dictionaries into INI files. My point was hopefully obvious: how and where you store your configuration matters a lot: you can’t just shove serialized data wherever you want, in whatever format you please. I did say ‘text file’ in my post for a reason, after all. (Aside: YAML is a hot mess.)
Putting a .py file in /etc or $HOME is an obviously bad idea, which is why you don’t replace read-only configuration with a Python source file. Expecting an admin to go to /usr/lib/python2.6/site-packages/myapp/config.py just to edit the DB host because it was moved doesn’t do anything but make me a jerk. Especially when it doesn’t save me any code! So yes, putting it in a .py file is hardcoding and effectively making it “constant forever”.
Django’s settings.py isn’t meant for end-users, so it’s not the least bit relevant here.
Kay,
Reading a file and feeding it to exec is an awesome way to get hacked if the program does anything privileged. There are packages out there that allow you to have python expressions be safely limited and evaluated. The original was ‘unrepr’ which was integrated into various other things.
Code and data aren’t loaded the same way, so replacing a config file with a .py file is hardcoding the values, like it or not.
Adam, it is possible to serialize dictionaries into INI files. But I don’t think there is an equivalent of json.dumps() or json.loads(). Am I wrong? This introduces another layer – like I did here https://bitbucket.org/steko/totalopenstation/src/af7248e3e4f0/totalopenstation/utils/upref.py and I intend to get rid of it soon. Of course my code might not be perfect, but it tries to deal with the shortcomings of ConfigParser.
Dealing with the “shortcomings”, in this case, seems to take you about 20 lines, and even generalized, I think it wouldn’t be too much more. Writing a load/dump pair to take an INI file and convert to/from hierarchical dictionaries is easy.
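For reference, such a load/dump pair could be sketched as follows with the Python 3 configparser module. The names ini_loads and ini_dumps are invented here, by analogy with json.loads and json.dumps.

```python
import configparser
import io


def ini_loads(text):
    """Parse INI text into a {section: {option: value}} dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def ini_dumps(data):
    """Serialize a {section: {option: value}} dictionary to INI text."""
    parser = configparser.ConfigParser()
    parser.read_dict(data)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


data = {"ui": {"theme": "dark", "font_size": "11"}, "paths": {"temp": "/tmp"}}
ini_text = ini_dumps(data)
round_tripped = ini_loads(ini_text)
```

The limits show up quickly: only two levels of nesting, every value comes back as a string, and option names are lower-cased by default.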
The biggest “shortcoming” over JSON and a few other libraries is that ConfigParser will not perform automatic type conversion, but there are other libraries out there that will do that if you absolutely must have it. I’m not entirely sold on that as a must-have feature, but it’s hardly difficult to achieve.
Of course, the other libraries have their own shortcomings: ConfigParser has built-in support for loading configuration from multiple files, something I would have to write for the other libraries.
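The multiple-file support looks like this in practice. This sketch uses the Python 3 configparser module and invented file names; later files override earlier ones, and files that do not exist are skipped.

```python
import configparser
import os
import tempfile

tmp = tempfile.mkdtemp()
system_path = os.path.join(tmp, "system.ini")
user_path = os.path.join(tmp, "user.ini")
missing_path = os.path.join(tmp, "does_not_exist.ini")

# System-wide defaults.
with open(system_path, "w", encoding="utf-8") as f:
    f.write("[database]\nhost = db.example.com\nport = 5432\n")

# A per-user file that overrides a single option.
with open(user_path, "w", encoding="utf-8") as f:
    f.write("[database]\nhost = localhost\n")

config = configparser.ConfigParser()
# read() accepts a list of paths and returns the ones it could actually read.
files_read = config.read([system_path, user_path, missing_path], encoding="utf-8")
```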
Personally I just use YAML and writing the equivalent to the simple .dump and .load calls for every possible structure and set of data types I’ve used would have been a significant amount of work. Serialization is not an inherently easy feature to implement. I like YAML because with the Python library I get serialization built-in, plus it’s decently human readable/editable, and if I ever have to import them from an app that isn’t written in Python it’s in a standard format that probably won’t result in a lot more work writing a new parser. Like with anything else I am going to go with the solution that takes two lines of code over the solution that takes 20 lines of code if the 20 line version doesn’t have any significant benefits.
I’m not sure why you believe that you would have to write a dump and load call for every structure and datatype for an INI file. It’s only a significant amount of work because you’re almost certainly going about solving the problem the wrong way.
Plus, a YAML configuration file is not a magic bullet. Sure, you get automatic type marshaling, but you lack automatic merging of files (at least 10 lines you have to write, easily more depending on your needs), built-in defaults, and case-insensitive parsing (list is not exhaustive).
My real theory, though, is that whatever you’re serializing probably really isn’t configuration data or isn’t just configuration data.
Adam Skutt:
I’ll pick only this among the many things I don’t agree with you on (the others are less interesting/important or have already been dealt with by others):
you say that it is bad to have .py files inside /etc, I assume this is due to security reasons:
1- almost all the files inside /etc are sensitive
even if they’re not executable, changing something inside of them may expose sensitive data or be a security breach in some other way… just think about ssh, fstab, sudoers
2- for this reason, all the files have write permissions only for the root user
I agree that with a non-executable file you may have a smaller attack-surface, and I don’t think that it’s wise to use .py files everywhere willy-nilly
but in the end: I find that you should be able to trust your (read only) configuration files as much as you trust your executable code
The problem is more the fact you have no idea what you will end up adding to your search path. That’s bad. Loading an absolute path is problematic enough that I can’t recommend it (i.e., all the examples you get from a quick googling are utterly wrong) and it isn’t technically portable anyway.
As for trust, the “should” is the entire problem with that statement. Lots of websites out there about what happens when one “should” be able to trust something and actually cannot…
Even if I bought solutions to those issues, you’re still left with a file that’s a pain to manipulate externally, especially graphically. Regenerating python source is a PITA unless the source is very trivial. In which case, just about everything else (including an INI file) will be less code for everyone. They’re just simply an inferior solution. The only time they have any merit is for library “configuration” files which are never seen and edited by end-users/admins, but only developers.
I didn’t word that very clearly, but I meant that rolling my own dump and load for INI instead of using the already existing ones for YAML would have been more lines of code because I have used various random data types including dictionaries inside lists, all sorts of different types of literals that need to be parsed correctly, etc. Look at the code for YAML’s dump and load routines if you want proof of it doing things that you can’t do in 20 lines of code. My point is that dump/load for arbitrary Python structures is a feature that I am using and someone has already written code to do it so I might as well use it rather than trying to re-invent the wheel.
Right nothing’s a magic bullet, and since I’m using a general purpose YAML library for config files it doesn’t have any config file specific stuff in it. There’s always trade-offs. If I was doing more complicated types of things for config files that needed finer grained control I might look into ConfigParser and whatnot but generally I just need to dump my internal settings data into a file that I can edit so YAML works for that.
I’ve used YAML for all sorts of things in Python, the same way that I used XML for all sorts of things in C# and whatnot, but the serialization aspect makes for a lot less boilerplate code in most cases in general. If I can avoid writing parsing and serialization code in a lot of places where I would otherwise have to write it I might as well save myself the time. YAML’s a tool that’s there for me to use and one of the things I use it for is config files because it works pretty well for that for a lot of different structures and types of config files. If someday I run into a config file situation where YAML doesn’t work well then I’ll use something else or write some code to make YAML do it, whichever makes more sense to do, but for now it does what I need it to do and doesn’t get in my way.
As I already mentioned, there are several libraries out there that parse INI files and do type marshalling. cherrypy.reprconf is one. ConfigObj is another. If that’s your killer feature, it is trivial to have.
@Thomas Kluyver
After reading this comment I asked “How to save Python in-memory dictionary to a file as a Python source code?” question at Stackoverflow (http://stackoverflow.com/q/11594304/95735).
Ignacio Vazquez-Abrams answered:
“Assuming it uses only “basic” Python types, you can write out the repr() of the structure, and then use ast.literal_eval() to read it back in after.”
It looks like saving a config dict is rather easy…
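The answer quoted above can be sketched in a few lines. The settings dictionary is made up; the second part shows that ast.literal_eval refuses anything that is not a plain literal.

```python
import ast

settings = {
    "name": "demo",
    "sizes": [1, 2, 3],
    "ratio": 0.5,
    "enabled": True,
    "parent": None,
}

# Save: the repr of basic types is valid Python literal syntax.
saved_text = repr(settings)

# Load: literal_eval parses literals only and never runs code.
restored = ast.literal_eval(saved_text)

# Anything beyond a literal, such as a function call, is rejected.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

This covers the round trip for simple data, though the written file is a single expression rather than a hand-editable module with comments.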
@Ken Whitesell
The first 3 rules are being obeyed the same for configuration kept in python files as with files in any other format.
The opposite view: “Python is Not a Configuration File Format” by Roberto Alsina – http://lateral.netmanagers.com.ar/weblog/posts/python-is-not-a-configuration-file-format.html