Code, data and interactive programming

"Are code and data the same thing?" I haven't conducted a poll, but I think the following answers are the most popular ones:

  • "What?" (I don't know about universal Turing machines)
  • "Sure!" (I know about universal Turing machines)

My answer is "No". I'll now try to explain briefly why my opinion makes sense in general. After that, I plan to get to the point, which is how the code/data distinction matters in interactive programming environments.

I think that, um, everything is data, at least everything I can think about. I mean, the only things that can technically enter my brain are either data or cigarette smoke, because I don't do drugs. And I hope that the effect of passive smoking is negligible, so it's just data.

In particular, code is data. But not all data is code. Code is a special kind of data, that looks like this:

  • There are lots of blocks.
  • Each block defines the value of something.
  • The blocks depend on each other, and the dependencies can be cyclic.

What this means, and of course everybody knows it, is that you can't make any sense of code in the general case. That is, the only way to compute the values defined by the blocks of code is to "run" the code - keep chasing the links between the blocks, computing the values they define as you go. You can't even prove that this process will terminate given an arbitrary bulk of code, not to mention proving its correctness.
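To make the "blocks pointing at each other" picture concrete, here is a minimal Python sketch. The function names are made up for illustration. The first two blocks depend on each other cyclically, and the third is a loop that nobody has proven to terminate for every positive input (that is the Collatz conjecture), so the only general way to learn its value is to run it.

```python
# Two "blocks" whose definitions depend on each other (a cyclic dependency).
def is_even(n):
    return True if n == 0 else is_odd(n - 1)


def is_odd(n):
    return False if n == 0 else is_even(n - 1)


# A block whose termination for every positive n is an open question
# (the Collatz conjecture); to get a value, you chase the links and run it.
def collatz_steps(n):
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


print(is_even(10))       # True
print(collatz_steps(6))  # 8
```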

Now, an image, for example, isn't code. Well, philosophically, it is, because if they show you an image and it's really ugly, you'll say "ewww". So the image was in fact a program giving instructions to your brain. The output of your brain's image compiler is the following code in the human body assembly language:

MOV R0, dev_mouth
MOV R1, disgust_string
JMP write
RET
disgust_string:
.asciz "ewww"

More interestingly, you can write a program that processes images, and this particular image may be the one that makes your program so confused that it never terminates. However, this doesn't mean that the image itself is "code". The image doesn't have interconnected blocks defining values. Even if the image is a screenshot of code.

An image is a two-dimensional array of pixels, a nice, regular data structure. You don't have to "run" it in order to do useful things with it, like computing its derivatives or e-mailing it to your friends so they'll go "ewww". And programs doing that can be proven to terminate, unless you have an infinitely slow connection to the outgoing mail server.
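For contrast, here is a small sketch of processing "plain data". The toy image and the function name are assumptions made up for this example. The derivative is just two nested loops over a finite array, so you can see that it terminates without running anything.

```python
# A tiny grayscale "image": a plain list of rows of pixel intensities.
image = [
    [0, 0, 10, 10],
    [0, 5, 10, 10],
    [0, 0, 10, 20],
]


def horizontal_derivative(img):
    """Difference between each pixel and its left neighbour, row by row."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]


# Two nested loops over a finite array: termination is obvious by inspection.
print(horizontal_derivative(image))
```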

So what I'm saying is, code is a special kind of data, having blocks which define values and depend on each other. Does it really matter whether a particular piece of data is "code" according to this definition? I think it does. One reason is the above-mentioned fact that you can't really make sense of code. Many people realize the practical drawbacks of this, and so in many contexts, they use data-driven programming instead of the arguably more natural "code-driven" programming.

Everything you represent as "data" can be processed by many different programs, which is good. Everything you represent as "code" can only be processed by a kind of interpreter, which is bad. I'm not talking about the difficulty of parsing the syntax, which doesn't exist with Lisp or Forth, isn't a very big deal with C or Java and is a full-featured nightmare with C++ or Perl. I'm talking about the semantics - for many purposes, you can't really "understand" what you've parsed without running it, and this is common to all Turing-complete languages.

But this isn't going to be about the inability to analyze code. This is going to be about the somewhat more basic problem with code - that of blocks which point to each other. In order to explain what I mean, I'll use the example of 3 interactive programming environments - Matlab, Unix shells and Python, listed in decreasing order of quality (as interactive environments, not programming languages).

Interactive programming is the kind of programming where the stuff you define is kept around without much effort on your behalf. The other kind of programming is when you compile and run your code and it computes things and exits and they are gone. Clearly interactive programming is nicer, because it makes looking at data and trying out code on it easy.

Or so it should be; in practice, it looks like more people prefer "batch programming", so there might be some drawbacks in the actual interactive environments out there. What makes for a good interactive environment, and what spoils the fun? Let's look at some well-known gotchas with existing environments.

Some of the most upset people I've seen near computers were the ones that had a Matlab session running for months when their machine crashed. It turned out that they had a load of data there - measurements, results of heavy computations, symbolic equations and their solutions - and now it's all gone. GAAA!! This doesn't happen with batch programming that much, because you send the output of programs to persistent storage.

This problem, nasty as it may be, looks easy to fix - just have the system periodically save the workspace in the background. Perhaps Matlab already has this. I wouldn't know, because I tend to manually save things every few minutes, ever since my childhood trauma of losing a file I loved. Anyway, this doesn't look like an inherent problem of interactive computing, just an awfully common implementation problem. For example, do Unix shells, by default, save the command history of each separate concurrent session you run? I think you know the answer.

Speaking of Unix shells. Ever had the pleasure of typing "rm -rf *" in the wrong directory because of command completion from history? GAAA!! OK. Ought to calm down. Let's do Fault Analysis. Why did this happen? The command string with "rm" in it is, basically, code; shell code invokes processes. This code depends on another piece of code, the one that determines the current directory. The command string doesn't have a fixed meaning - you must run getcwd in order to figure it out.
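The dependence on the current directory can be shown without deleting anything. The sketch below is an illustration in Python rather than shell, with made-up file names and a hypothetical helper expand_in. It expands the same pattern "*" in two temporary directories and gets two different answers.

```python
import glob
import os
import tempfile


def expand_in(directory, pattern="*"):
    """Return what `pattern` matches when the current directory is `directory`."""
    previous = os.getcwd()
    os.chdir(directory)
    try:
        return sorted(glob.glob(pattern))
    finally:
        os.chdir(previous)  # always restore the caller's directory


with tempfile.TemporaryDirectory() as scratch, tempfile.TemporaryDirectory() as precious:
    for name in ("a.tmp", "b.tmp"):
        open(os.path.join(scratch, name), "w").close()
    open(os.path.join(precious, "thesis.tex"), "w").close()

    # The same string, "*", names different files depending on hidden state.
    in_scratch = expand_in(scratch)
    in_precious = expand_in(precious)

print(in_scratch)   # ['a.tmp', 'b.tmp']
print(in_precious)  # ['thesis.tex']
```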

The shell couldn't really warn us about the problem, either. That's because the meaning of "rm" is defined by the code at /bin/rm (or by some other program in the $PATH which happens to be called "rm"). Since the shell can't understand that code without running it, it doesn't have an estimation of the potential danger. And if the shell warned us about all commands completed from history that originally ran in a different directory than the current one, the completion would likely be more annoying than useful.

At some point I got fed up with Unix shells, and attempted to switch to a Python shell. I tried IPython and pysh, and I still use IDLE at home on my XP box. I ought to say that Python shells suck, and I don't just mean "suck as a replacement for a Unix shell", but also "suck as a way to develop Python code". The single biggest problem is that when you change your code, you must reload modules. It's unclear which modules should be reloaded, there's no way to just reload everything, and ultimately you end up with a mix of old code and new code, which does something, but you aren't quite sure what exactly.
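Here is a minimal sketch of how the old/new mix comes about, written for Python 3 (importlib.reload). The module name geometry_demo and its area function are hypothetical, created in a temporary directory just for the demonstration. A name imported with "from ... import" keeps pointing at the old function object after the module is reloaded.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # keep the demo independent of cached .pyc files

workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)
module_path = os.path.join(workdir, "geometry_demo.py")  # hypothetical module


def write_module(body):
    with open(module_path, "w") as f:
        f.write(body)
    importlib.invalidate_caches()  # make sure the import system sees the change


write_module("def area(r):\n    return 3 * r * r\n")
import geometry_demo
from geometry_demo import area as old_area  # a direct pointer to the function

write_module("def area(r):\n    return 3.14159 * r * r\n")
importlib.reload(geometry_demo)

print(geometry_demo.area(1))  # 3.14159, looked up by name after the reload
print(old_area(1))            # 3, the stale pointer still runs the old code
```

Any other module that did its own "from geometry_demo import area" is in the same position as old_area here, which is why it is hard to tell what needs reloading.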

Die-hard Pythonistas refuse to acknowledge there's a problem, though they do bend their programming style to work around it. What they do is they write all of their code in one file, and use execfile instead of import to make sure everything is indeed redefined, the Zen of Python with its love of namespaces be damned. Sure, an interesting project in Python can be just 5000 lines worth of code, but I don't like to navigate in a file that big. And sometimes you do need more lines, you know.

Another thing they do is implement __repr__ in their classes so that print displays their objects, and they'll invest a lot of effort into making eval(repr(obj)) work. The fact that eval'able strings aren't necessarily the most readable way to produce debug prints doesn't seem to bother them. Nor do the contortions they have to go through to solve the prosaic problem of making references to other objects display reasonably. One way to do it is to use dictionary keys instead of pointers, so that member object references aren't expanded into a full object description when they are printed. If you don't know why they're doing this, you'll find their code fairly puzzling.
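For readers who haven't seen this style, a small sketch follows. The Employee class and the staff dictionary are invented for the example. The manager is stored as a dictionary key rather than as a pointer to another object, so the repr stays short and eval(repr(obj)) rebuilds an equal object.

```python
class Employee:
    def __init__(self, name, manager_id=None):
        self.name = name
        # A dictionary key, not a pointer to another Employee object.
        self.manager_id = manager_id

    def __repr__(self):
        return f"Employee({self.name!r}, manager_id={self.manager_id!r})"

    def __eq__(self, other):
        return isinstance(other, Employee) and vars(self) == vars(other)


staff = {
    "e1": Employee("Ada"),
    "e2": Employee("Bob", manager_id="e1"),
}

print(staff["e2"])               # Employee('Bob', manager_id='e1')
clone = eval(repr(staff["e2"]))  # the repr is a valid constructor call
print(clone == staff["e2"])      # True
```

The price is that following a reference now takes a lookup, staff[obj.manager_id], instead of a plain attribute access.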

I find the struggle to make interactive Python programming work very depressing. It reminds me of me, before the invincible idiocy of C++ crushed my spirit. People have a tendency to assume that things are well thought-out and hence should work.

We have this extremely widespread language called C++, and it's centered around type-based static binding. And it's easy to see how this could help a compiler spot errors and optimize the code. Therefore, this programming style can be a good way of writing software, if applied consistently. Ha!

We have this Python language, and several shells for it. Quite obviously, interactive programming is a good way to speed up the development cycle. Therefore, adapting our Python code for interactive programming will pay off, if we do it consistently. Ha!

But I digress. This isn't about the trusting nature of software developers, nor is it a comparison between C++ and Python, mind you. They're hard to compare, since they are very different beasts: Python is a programming language, and C++ is a karmic punishment. So I should get back to the topic of interactive programming.

Here's my opinion on the example programming environments I used in this entry.

Matlab is a great one, unless you lose your workspace. I used it for a while several times and it just never itched, and nothing went wrong.

Unix shells are good in terms of their ability to preserve your data (everything is a flat, self-contained string of bytes). I'd love them if they didn't suck so badly as programming languages. Since they do, I only use shell scripting for one-shot throwaway things, like debugging (fiddling with log files and core dumps).

Python is awful. So when I'm on Unix, I run Python processes from the shell, and sometimes use Python's reflection to make my batch programming just a bit more interactive. For example, if you have a Python function f(a,b,c), you can have your command line parser introspecting its arguments and adding the command line options -a, -b and -c.
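The introspection trick could look roughly like this in Python 3, using inspect.signature and argparse. The helpers parser_for and call_from_argv are hypothetical names, and the sketch assumes every argument is an integer and gives c a default value just to show an optional flag.

```python
import argparse
import inspect


def f(a, b, c=3):
    """Hypothetical function whose parameters become command line options."""
    return a + b + c


def parser_for(func):
    """Build an argparse parser with one option per parameter of `func`."""
    parser = argparse.ArgumentParser(description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        has_default = param.default is not inspect.Parameter.empty
        parser.add_argument(
            f"-{name}" if len(name) == 1 else f"--{name}",
            dest=name,
            type=int,  # assumption: every argument is an integer in this sketch
            required=not has_default,
            default=param.default if has_default else None,
        )
    return parser


def call_from_argv(func, argv):
    """Parse `argv` into keyword arguments and call `func` with them."""
    args = parser_for(func).parse_args(argv)
    return func(**vars(args))


print(call_from_argv(f, ["-a", "1", "-b", "2"]))              # 6
print(call_from_argv(f, ["-a", "1", "-b", "2", "-c", "10"]))  # 13
```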

So much for specific examples. What's the generic rule? I think it's this: pointer-happy systems can't be interactive. That's because interactive programming is about saving your data objects. And this is only useful when the current value of a preserved object is clear to you. Otherwise, you can't use the object, so what's the point?

When you have pointers in your objects, the objects aren't self-contained, and when the pointed objects are redefined, it isn't clear what should happen with the pointing objects. Should they point to the new thing or the old thing? Either answer can be counter-intuitive to you, and the whole point of interactive programming is to let you enter a state of flow, and if you scratch your head and can't easily guess what the old object means right now, you aren't in a state of flow.

In particular, pointers to code are the worst kind of pointers, because code is the most intertwined data of your program, and a pointer to a single block of code basically points to the entire code base. When an object points to an old function, and the function was redefined, and the system keeps the old definition around, you may easily get a call sequence with both the new function and the old function, which is probably ridiculous. And if you make the old object point to the new function, the function might simply fail to work with that object, and you just can't tell whether it will work or not without running it, remember?

For example, Python is a good interactive calculator, because arithmetic expressions are self-contained. Even if they do contain references to variables, it's fairly clear what happens when you change a variable - all expressions mentioning it will now recompute differently. Note that arithmetic expressions aren't Turing-complete and can't have cyclic references. Now, if you use Python's object-oriented features, then you have objects which point to their class definition which is a bunch of code pointers, and now when you reload the module defining the class, what good are your old objects?
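The question about old objects can be reproduced in a few lines, without even involving modules. In this sketch (Vector is a made-up class), running the class statement a second time stands in for reloading the module that defines it.

```python
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm(self):
        return self.x + self.y  # bug: not a Euclidean norm


old = Vector(3, 4)


# "Reloading" amounts to running the class statement again, which simply
# rebinds the name Vector to a brand new class object.
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5


new = Vector(3, 4)
print(new.norm())               # 5.0
print(old.norm())               # 7, the old object still points to the old class
print(isinstance(old, Vector))  # False
```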

This is why I don't believe in Numeric Python. The whole point of using Python is to use its pointer-happy features, like classes and hairy data structures and dynamically defined functions and stuff. Numeric programming of the kind you do in Matlab tends to use flat, simple objects, which has the nice side-effect of making interactive programming work very well. If you use a numeric library inside a pointer-happy language like Python, quite soon the other libraries you use will make interactive programming annoying. So you'll either move to batch programming or suffer in denial like the die-hard Python weenie you are. Someone using Matlab will be better off, since interactive programming is more productive than batch programming, when it works well.

So at the bottom line, I think that interactive programming has limited applicability, since "general-purpose" programming environments pretty much have to be pointer-happy. That is, if a language doesn't make it very easy to create a huge intertwined mess of code and data pointers, I don't see how it can be usable outside of a fairly restricted domain. And even in the "flat" environments like Matlab or Unix, while old data objects can be useful, old commands are, and ought to be, a two-edged sword. Because they are code, and code is never self-contained and thus has a great potential to do the wrong thing when applied in a new context.

This whole claim is one of those things I'm not quite sure about. From my experience, it took me quite some time to realize which interactive features help me and which get in the way with each environment I tried. So I can't know what happens in Lisp or Smalltalk or Tcl or Excel or Emacs, in terms of (1) applicability to "general-purpose" tasks, (2) the amount of self-contained data compared to the kind with pointers, especially pointers to code and (3) the extent to which the thing is itchy and annoying at times. So comments are most welcome. In particular, if you know of an environment that, put simply, isn't more itchy than Matlab but isn't less general-purpose than Python, that would be very interesting.

18 comments ↓

#1 cmm on 02.16.08 at 10:12 am

when the pointed objects are redefined, it isn’t clear what should happen with the pointing objects. Should they point to the new thing or the old thing? Either answer can be counter-intuitive to you

Weeeell. I was completely with you until this passage. This passage resembles the following typical sentiment oftentimes uttered by FP weenies: "Side effects are awful! You never know what will change as a result of a side effect. Should the change affect the running code or should it not? Either answer can be counter-intuitive to you!". Well: nope. If side-effects are properly contained (or, say, wrapped in transactions or in a dataflow-based consistency-preserving mechanism, or how about just being explicit and clearly labeled?), there is no problem.

Same with redefinition. Redefinition can have clearly defined semantics. Imagine that: it can be that you can know exactly what will happen when you redefine something!

I don't know how things are in the Python world, but in the reportedly autistic and ghettoish but still quite comfortable Lisp micro-universe you have basically two semantic models of redefinition.

One model is based on the concept of symbols. For instance, calling a function by name (as opposed to FUNCALL'ing an object, which is slightly more syntactically cumbersome - the cornerstone of usability is having right defaults, see) always indirects through the symbol function slot (I'm glossing over some insignificant performance-oriented details here, but never mind). This means that when you redefine a function, all the call sites magically switch to using the new definition. Simple, convenient, interaction-friendly.

The other model is used in CLOS. It in turn has two facets:

Class redefinition is explicitly supported, by having all existing instances become of the new class. All of them, effective immediately. The process is customizable through a protocol. The referrers, of course, see the changes immediately.

Likewise, you can change the class of a specific instance. This, too, will be immediately reflected everywhere the instance is used, and this process too can be customized to your liking.

The take-home message is, I suppose, that many seemingly scary things can work just fine if they are explicitly designed into the language/environment/whatever.

That said, I do value the ability to restart my Lisp session once in a while. It's not the redefinition semantics that is the problem here, but rather the fact that starting over can often be radically simpler than cleaning the mess up from inside, as it were. Not to mention the need to check whether your stuff will even load up given a clean starting image.

#2 Yossi Kreinin on 02.16.08 at 10:43 am

Foist, a bunch of random reactions, and then, to the point.

* Side effects are evil :) Technically, I ain't no FP weenie because everything I actually use has side effects. But when you do compilers or debug stuff, side effects sure are damn evil. It's a problem… I'm still thinking about it.

* Lisp isn't Python or Unix or Matlab; Lisp isn't punk technology (hence the lack of popularity). I'll blog about it some day. Anyway, the fact that CLOS has a "customizable protocol" to help redefine classes during interactive development convinces me again that no, it ain't no punk technology… I think that what happens in Python basically is, everything is an object and when there are no references to it, it's garbage-collected; classes are just one kind of object, and there's no such thing as "redefinition" - "class A" is basically assignment, just like "A = 5". Makes interactive work a problem, but is very straightforward to implement. Three-chord music. Welcome to the future.

Now, to the point:

First of all, I think that Lisp is less pointer-happy than Python, Perl and their ilk; AFAIK, class objects don't point to methods, for one thing. But, what does #'sym do? Doesn't it give you a handle to use for funcall, and doesn't it force Lisp to keep the old code block around? Or did you actually mean to say just that (funcall effectively calls by pointer, normal calls by name?)

I don't really understand how code works in Lisp; I know (or feel that I know) the C model (where you compile and link everything and don't even dream about "redefining" anything at run time), and the Python model (where code is one kind of object, basically, and you can point to it and the system collects the garbage). Is the Lisp model much different from the Python model? It seems to be, if CLOS can speak about "redefining". Or is it about the same, with CLOS classes simply keeping track of living objects and intercepting redefinition and running code on those objects upon redefinition?

BTW, I think that Python handles function calls just the way Lisp does according to your description: when you call functions by name, it locates the most updated definition at run time, and when you push function references to data structures (the way you'd do with #'sym), it pushes a reference to the definition rather than the name.
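A minimal sketch of that behaviour, with made-up function names: a call by name picks up the latest definition at run time, while a reference stored in a data structure keeps pointing at whatever function object existed when it was stored.

```python
def greet():
    return "hello"


def caller():
    return greet()  # the global name is looked up each time caller() runs


callbacks = [greet]  # stores a reference to the current function object


def greet():  # redefinition: the name now refers to a new function object
    return "hi there"


print(caller())        # 'hi there'
print(callbacks[0]())  # 'hello'
```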

So currently the picture to me is:

* Lisp is less pointer-happy than Python because objects don't point to code blocks.
* Lisp is otherwise similar to Python in the sense that you can end up having multiple definitions of the function hanging around; maybe it's not a problem because you don't have that many function-valued variables.

Is this about right?

#3 cmm on 02.16.08 at 11:13 am

Lisp isn’t Python or Unix or Matlab; Lisp isn’t punk technology (hence the lack of popularity). I’ll blog about it some day.

Just link to Worse is Better, that's gonna save you the trouble. :)

Anyway, the fact that CLOS has a “customizable protocol” to help redefine classes during interactive development

Well, the protocol is mostly a show-off item (and the result of the fact that the CLOS designers were being rather unusually thorough). Pretty much no practical situation needs it, because things Just Work.

But, what does #'sym do? Doesn’t it give you a handle to use for funcall, and doesn’t it force Lisp to keep the old code block around?

#'sym gives you whatever sits in the function slot of 'sym. You can also use FUNCALL on 'sym itself, though, if you'd like possible redefinitions of the function to be picked up at that particular call site.

Is this about right?

Yes. Indeed, Lisp's runtime model is basically the same as Python's (manifestly-typed objects, lots of pointers, GC, the works); the interactive usability differences are mostly due to peculiarities of the evaluation semantics and the object system.

(I'm not sure how the fact that in Python methods "belong" to classes changes the picture one way or the other, though, but perhaps the understanding will strike me a minute after I submit this comment. Happens all the time, no reason to wait).

#4 Yossi Kreinin on 02.16.08 at 11:37 am

If (funcall 'sym) works, what's the point of having #'sym - to get in the way of interactive work? :)

The thing with Python classes that I meant wasn't the single-dispatch thing (one way to interpret "methods belong to classes"), but the fact that objects point to classes and classes point to code blocks, so when you "redefine" classes, the old objects keep pointers to the old code blocks.

#5 Yossi Kreinin on 02.16.08 at 11:57 am

Now that I think of it. So CLOS updates all objects upon class redefinition (I wonder how it keeps track of them and what happens if you have threads in the background, but it's just perverse, forget it).

Well, I add a slot to a class vector3d, which previously kept x, y, z, and in this slot I cache the vector magnitude. How will CLOS know to set the new slot to the right value in the old object? What if I had a bug in the magnitude computation code, and now I reload the definition? How will it know to recompute that slot? I mean, how is it going to reestablish class invariants for old instances when class invariants change? I don't think it can be mechanically solved. Is it "really" a non-problem, or just seems to be a non-problem in practice?

The thing is, if you have untyped data, then "it means the same thing all the time", but if you have types and type-based binding, no matter how you do it, it basically means that objects /point/ to code which expects certain things and these things are no longer true and stuff. And it gets in the way of redefining. I think.

#6 cmm on 02.16.08 at 12:02 pm

If (funcall 'sym) works, what’s the point of having #'sym - to get in the way of interactive work?

I see the smilie, but let me indulge in being boring: there are anonymous functions (made with LAMBDA), there are locally-scoped functions (introduced by FLET etc.) - in short, it makes no sense at all to restrict FUNCALL to just symbols.

when you “redefine” classes, the old objects keep pointers to the old code blocks.

Er. Don't they redirect through some sort of vtable that sits in the class object?

#7 Yossi Kreinin on 02.16.08 at 12:11 pm

Oh yeah. LAMBDAs. Should have thought about those. They ought to be kept in multiple versions, BTW, since you can't redirect via persistent something like a symbol, right?

Python classes - yeah, they redirect via the class dictionary, so instance objects keep indirect pointers to code blocks. Nobody keeps track of the class objects; if something did, it could change the class object without changing all the instance objects.

#8 cmm on 02.16.08 at 12:25 pm

They ought to be kept in multiple versions, BTW, since you can’t redirect via persistent something like a symbol, right?

Sorry, who ought to be kept in multiple versions by whom?  You mean what happens if you "redefine" an anonymous function?  But you cannot redefine something that is anonymous.  By, um, definition.  You can only make a new object, and perhaps strike it lucky and make sure to change all the references to the old one by references to the new one. :)

If you are gently prodding me to admit that Lisp cannot support every possible redefinition one can think of - well, that's obviously true.  All it does is make sure that when you want something to be redefinable, you can easily have that, and that the syntax and evaluation rules are designed to make the redefinable cases the natural (that is, more concise and readable than the alternative) choice in most cases.

they redirect via the class dictionary, so instance objects keep indirect pointers to code blocks. Nobody keeps track of the class objects; if something did, it could change the class object without changing all the instance objects.

So basically the main missing thing is a way to redefine a class while keeping the identity of the class dictionary?
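One way to picture "keeping the identity" in Python is to copy the new methods onto the existing class object instead of rebinding the class name. This is only a sketch under assumptions: update_class_in_place is a hypothetical helper, not something Python or its shells provide, and it ignores base classes, metaclasses and slots.

```python
def update_class_in_place(old_cls, new_cls):
    """Copy methods and ordinary (non-dunder) attributes from new_cls onto
    old_cls, so that old_cls keeps its identity and its existing instances."""
    for name, value in vars(new_cls).items():
        if callable(value) or not name.startswith("__"):
            setattr(old_cls, name, value)
    return old_cls


class Account:
    def __init__(self, balance):
        self.balance = balance

    def describe(self):
        return f"balance {self.balance}"


acct = Account(10)


class _NewAccount:  # the edited definition, built under a temporary name
    def __init__(self, balance):
        self.balance = balance

    def describe(self):
        return f"balance: {self.balance:.2f}"


update_class_in_place(Account, _NewAccount)
print(acct.describe())        # balance: 10.00
print(type(acct) is Account)  # True
```

Even in this toy form there is a catch: a copied method that uses zero-argument super() would still be bound to _NewAccount and fail on Account instances, and nothing here re-establishes invariants for existing objects.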

#9 Yossi Kreinin on 02.16.08 at 12:31 pm

If you are gently prodding me to admit that Lisp cannot support every possible redefinition one can think of — well, that’s obviously true.

No, I sincerely wanted to make sure I understand what happens :)

So basically the main missing thing is a way to redefine a class while keeping the identity of the class dictionary?

Yep. But then I think you have more complicated problems, that of updating the old instances according to the new class invariants, and I wonder how CLOS can handle that; I sneaked in a comment about it above. The bottom line is that I think type-based binding is basically a way of linking objects to code, and it can make objects "lose meaning" at the logical level no matter how redefinition is handled.

#10 cmm on 02.16.08 at 12:57 pm

(Found no way to reply to an individual comment.  Oh, and some sort of comment notification thing would be great.  Hey, ever heard of LiveJournal? :))

So CLOS updates all objects upon class redefinition (I wonder how it keeps track of them and what happens if you have threads in the background, but it’s just perverse, forget it).

No "keeping track" needed.  The usual implementation strategy is to check whether the class is current upon each access to an instance, so you get a certain penalty for working with CLOS (which is why frobbing slots in inner loops is not a very good idea) but most of the magic is lazy.

Well, I add a slot to a class vector3d, which previously kept x, y, z, and in this slot I cache the vector magnitude. How will CLOS know to set the new slot to the right value in the old object?

This is where that fancy redefinition protocol comes in.  This margin is sort of too small for the details to fit, but basically you have four main entry points: fresh object initialization, reinitialization, update for redefined class and update for a different class.  You are told what class changed and to what (where applicable), which slots are initialized explicitly, etc.
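As a loose Python analogy for the lazy strategy (not CLOS itself, and without any real version check), the sketch below upgrades an old instance the first time the new slot is touched. The Vector3d class follows the example quoted above. The way the "old" instance is built, bypassing __init__, is a trick used only to simulate an object created before the magnitude slot existed.

```python
import math


class Vector3d:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z
        self.magnitude = math.sqrt(x * x + y * y + z * z)  # the new cached slot

    def __getattr__(self, name):
        # Called only when normal lookup fails, e.g. on an instance created
        # before the 'magnitude' slot existed. This is the lazy update hook.
        if name == "magnitude":
            value = math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)
            self.magnitude = value  # cache it, so the hook runs once per instance
            return value
        raise AttributeError(name)


# Simulate an instance that predates the new slot (assumption of this sketch).
old = Vector3d.__new__(Vector3d)
old.__dict__.update(x=1, y=2, z=2)

print("magnitude" in vars(old))  # False, nothing has been upgraded yet
print(old.magnitude)             # 3.0, computed on first access
print("magnitude" in vars(old))  # True
```

This covers the added-slot case only. If the cached value was computed by buggy code and the code is then fixed, nothing in this sketch knows that the stored value is stale.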

What if I had a bug in the magnitude computation code, and now I reload the definition? How will it know to recompute that slot? I mean, how is it going to reestablish class invariants for old instances when class invariants change? I don’t think it can be mechanically solved. Is it “really” a non-problem, or just seems to be a non-problem in practice?

I don't know whether this can be objectively measured to be a problem or a non-problem, but of course it's not done by magic.  You need a dataflow mechanism to automate it (such things exist, and even come with convenient CLOS metaclasses).  I think Excel gets this right out of the box.

it basically means that objects /point/ to code which expects certain things and these things are no longer true and stuff. And it gets in the way of redefining. I think.

The general redefinition problem is, of course, intractable.  I think it is not a good reason to give up on interactive programming altogether, because certain kinds of disciplined redefinition are tractable, useful, can be made convenient and happen to cover most of the need.

#11 Yossi Kreinin on 02.16.08 at 1:02 pm

I guess the redefinition protocol is for the cases where you perform a surgery on a living system that you really don't want to restart - otherwise I'd probably restart the shell at this point… It's cool that it is supported though, for the surgical cases - no good way to do that in most (all?) environments.

#12 kragen on 04.24.08 at 5:25 am

So in general the code-update problem is the same as the serialization problem; a brute-force way to do a code update is to serialize your universe and then deserialize it. In fact, that's pretty much the normal way that people upgrade software in the C world.

There's a tradeoff between losing too much state at upgrade time and having to write schema migration scripts for too many classes.

The equivalence works the other way too; if you can locate all of the objects that somebody might want to update to new code, and run a migration script on each one with introspective knowledge of what was in the object before, then you can serialize the objects in the migration script.
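A brute-force version of "serialize your universe and then deserialize it" can be sketched in Python with JSON. Everything named here (PointV1, PointV2, migrate, the label field) is hypothetical. The point is only that the migration script is where the schema change has to be spelled out.

```python
import json


class PointV1:
    def __init__(self, x, y):
        self.x, self.y = x, y


class PointV2:
    """Hypothetical new version of the class: adds a 'label' field."""

    def __init__(self, x, y, label):
        self.x, self.y, self.label = x, y, label


def dump_universe(objects):
    """Serialize every object's state as plain, code-free data."""
    return json.dumps({name: vars(obj) for name, obj in objects.items()})


def migrate(state):
    """Schema migration script: fill in what the old version did not have."""
    state.setdefault("label", "unlabeled")
    return state


def load_universe(text):
    """Rebuild every object with the new code."""
    return {name: PointV2(**migrate(state)) for name, state in json.loads(text).items()}


universe = {"origin": PointV1(0, 0), "corner": PointV1(3, 4)}
upgraded = load_universe(dump_universe(universe))
print(type(upgraded["corner"]).__name__, vars(upgraded["corner"]))
```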

Anyway, I think your proposed "pointer-happy systems can't be interactive" rule is bosh. It's true that it can be pretty hard to figure out how an object got into a weird state after you change the code underneath it a few times, but that's a pretty rare frustration, in my limited experience of Squeak and SBCL. Squeak and Erlang are two of the most pointer-happy programming environments, and also two of the most interactive. Elisp is full of objects that refer to each other, but it's radically interactive. Supposedly the old LispM systems were even more interactive without being any less pointer-happy.

I'd like to suggest an alternative explanation. Maybe interactivity is a sine qua non for novice programming environments, and so the best interactive programming environments happen to be novice programming environments; and novices are allergic to pointers. (At least that's what Joel claims, and he helped design one of the best ones: Excel.)

#13 Yossi Kreinin on 04.24.08 at 11:21 am

Well, there are different kinds of pointer; I meant, first and foremost, code pointers, which have the biggest redefinition/serialization problems. I don't think Erlang is that pointer-happy in this regard.

I've worked a lot with Matlab, tcsh and IDLE/IPython. The redefinition problems in Python tend to grow as your program grows, unless you bend and twist your style to work around it, and even then it still sucks. Allergic to pointers I hardly am.

#14 kragen on 04.24.08 at 10:21 pm

In Squeak, almost all of the code pointers are in classes; although it has first-class closures, they almost never live very long. Classes live in a global dictionary, so they're easy to find. When you define a menu item that invokes some arbitrary code, you generally define it by giving it a class and a method name. So changing the definition of a method pretty much just does the right thing.

Similarly, in Lisp, most functions live in some module namespace, and are referred to with an indirection through that namespace (just as in Python). This makes function calling slow but means that you can change what function gets called.

In Erlang, which is a language I don't actually know, I think you're encouraged to pass module names and function names, instead of actual functions, when you define long-running processes, specifically so that if you reload the module, the long-running process will start running the new version of the code after the next time it loops at the top level.
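The idea of passing names instead of functions can be imitated in Python. This is an analogy only, not a description of Erlang's actual semantics. The handlers module object and the worker generator are made up for the sketch. The loop resolves the function by name at the top of every iteration, so rebinding the name changes what the running loop does next.

```python
import types

handlers = types.ModuleType("handlers")  # hypothetical stand-in for a reloadable module
handlers.step = lambda n: n + 1


def worker(module, func_name, state):
    """A long-running loop that is given names, not function objects."""
    while True:
        state = getattr(module, func_name)(state)  # late lookup, every iteration
        yield state


w = worker(handlers, "step", 0)
print(next(w))  # 1
print(next(w))  # 2
handlers.step = lambda n: n * 10  # "reload": rebind the name inside the module
print(next(w))  # 20
```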

#15 wtetzner on 05.03.09 at 6:26 pm

"The newbie realizes that the difference between code and data is trivial. The expert realizes that all code is data. And the true master realizes that all data is code."

http://blogs.msdn.com/sriram/archive/2006/01/15/lisp_is_sin.aspx

#16 Peufeu on 09.28.09 at 1:22 am

It comes down to the fact that Python has no become() method, like Smalltalk has, so if you change a class at runtime, new instantiations will find the old class, but already instantiated objects will point to the old code.

So, reloading code works if you access this code through its name (module.attribute) but not otherwise (if a pointer to the old code was kept).
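Here is a small sketch of what happens to a live object when a class name is rebound. The `Counter` class is hypothetical. Assigning to `__class__` is shown only as a rough, partial analogue of Smalltalk's become: it re-points one object at a time and works only when the two classes have compatible layouts.

```python
class Counter:
    def __init__(self):
        self.n = 0

    def step(self):
        self.n += 1
        return self.n


old_obj = Counter()           # created before the redefinition


class Counter:                # rebinds the name; the first class object still exists
    def __init__(self):
        self.n = 0

    def step(self):
        self.n += 10
        return self.n


new_obj = Counter()           # created through the name, after the redefinition

old_result = old_obj.step()   # runs the first definition
new_result = new_obj.step()   # runs the second definition

# Re-point the existing object by hand; its instance data is kept as-is.
old_obj.__class__ = Counter
migrated_result = old_obj.step()
```

Nothing here finds the existing objects for you; you still have to track them down yourself, which is the hard part in a long-running program.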

Even so, there are problems:

- What if you update the code for a class, and add a data member? Are you going to update all the existing objects? Are you going to create a runtime code versioning scheme? What if the information to put in this data member is only available at object construction?

- What if you update 2 different pieces of code that depend on each other and you change the interface between them? Old bits of this code that still linger and want to call the new code will fail. New code that happens to call an old bit of code will fail. And there is the question of doing the update atomically.
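The data member problem from the first point can be sketched like this. `Session`, `SessionV2`, the `ip` field and `migrate` are all invented for illustration. The new field is only known at construction time, so an object built by the old code simply doesn't have it:

```python
class Session:                      # version 1 of the code
    def __init__(self, user):
        self.user = user


live = Session("alice")             # object created before the update


class SessionV2:                    # version 2 adds a field set only in __init__
    def __init__(self, user, ip):
        self.user = user
        self.ip = ip

    def describe(self):
        return f"{self.user}@{self.ip}"


live.__class__ = SessionV2          # re-point the object; __init__ is not re-run
try:
    outcome = live.describe()
except AttributeError:
    outcome = "no ip on old object"


def migrate(obj, default_ip=None):
    # One possible (lossy) workaround: fill the missing field with a placeholder.
    if not hasattr(obj, "ip"):
        obj.ip = default_ip
    return obj


migrated = migrate(live).describe()
```

The placeholder keeps the new code from failing, but the real value is gone for good, which is why this kind of update tends to need an explicit versioning scheme.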

So it looks good in practice, until you consider these evil dependencies.

Reloading code is very easy in Python, I've used it successfully several times, but it only works well in certain circumstances:

- you don't change interfaces, or you reload atomically all the stuff that depends on each other

- whatever contains pointers is short lived and will very soon use the new code

For instance: in a Python web server, you can easily reload the code for a page, and it's essential that you can do so (who would want to restart the server every time a page is modified?). Lock all requests for this page (and wait for current requests to finish), reload the code, unlock.
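A rough sketch of that lock / drain / reload / unlock sequence, using only the standard `threading` module. `ReloadablePage` and `load_new_handler` are hypothetical names, not part of any real web framework. In a real server, `load_new_handler` might call `importlib.reload()` on the page's module and return its handler function.

```python
import threading


class ReloadablePage:
    """Hypothetical holder for one page handler that can be swapped at runtime."""

    def __init__(self, handler):
        self._handler = handler
        self._cond = threading.Condition()
        self._in_flight = 0
        self._reloading = False

    def handle(self, request):
        with self._cond:
            while self._reloading:          # requests wait while a reload runs
                self._cond.wait()
            self._in_flight += 1
            handler = self._handler
        try:
            return handler(request)
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def reload(self, load_new_handler):
        with self._cond:
            while self._reloading:          # only one reload at a time
                self._cond.wait()
            self._reloading = True          # lock: new requests now block
            while self._in_flight:          # wait for current requests to finish
                self._cond.wait()
            try:
                self._handler = load_new_handler()
            finally:
                self._reloading = False     # unlock
                self._cond.notify_all()


page = ReloadablePage(lambda req: "v1 " + req)
r1 = page.handle("/stats")
page.reload(lambda: (lambda req: "v2 " + req))
r2 = page.handle("/stats")
```

This only works because each request holds the handler for a short time; anything that keeps a handler reference across requests would still run the old code.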

You can't easily reload the web server logic itself, though, because it lives for a long time.

And if you have session objects stored in RAM, you can't reload the code for them either, unless you implement code versioning, or your session objects are very simple mappings (with no code like class instances in them).

I wrote a bittorrent tracker as an asynchronous HTTP server. Some parts of it are reloadable at run-time (error strings, stats pages, the parts that control the policy applicable to users, etc). For simplicity I didn't put this reloadable stuff in classes, they're just a bunch of functions. Since functions are called by name (module.attribute) in Python, and I didn't keep function pointers around, it worked very well.

I wrote this 4 years ago. It could handle more than 1000 HTTP requests per second. Yeah I could have written it in C++, lol, what for? In Python it took me a week, and once debugged, the thing just runs, never crashes (Python is a managed environment, it will log an exception, not crash). I left this team 4 years ago, and none of the guys know Python. All they know is how to stop and restart the tracker when there is a server migration, lol. Last time I checked, this python code has been running for 6 months straight without crashing or manual restart. Try that, C++.

In Python, you could pickle (serialize) your entire environment, save it, reload your code, and reload all your environment. Code is not saved: class instances are serialized with the module.classname in them to be able to instantiate them back. If you want to update the code for all current in-memory objects, you could do it like that (and it provides a simple way to save a running session).
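A small sketch of that round trip, assuming a made-up module `tracker_state` with a made-up `Peer` class. The module is built in memory here so the example is self-contained; in real use it would be an ordinary .py file that gets edited and reloaded.

```python
import pickle
import sys
import types

# Hypothetical module, registered by hand so pickle can find it by name.
mod = types.ModuleType("tracker_state")
sys.modules["tracker_state"] = mod

V1 = '''
class Peer:
    def __init__(self, host):
        self.host = host

    def label(self):
        return "v1:" + self.host
'''
V2 = V1.replace('"v1:"', '"v2:"')      # the "edited" source

exec(V1, mod.__dict__)                 # load the first version of the code
peer = mod.Peer("10.0.0.1")
blob = pickle.dumps(peer)              # stores the module and class names plus the data

exec(V2, mod.__dict__)                 # "reload": the name Peer now means new code
restored = pickle.loads(blob)          # class is looked up by name at load time

old_label = peer.label()               # the original object still runs old code
new_label = restored.label()           # the unpickled copy runs the new code
```

Note that unpickling restores the saved attributes without calling `__init__` again, so the data member problem described earlier still applies if the new class expects fields the old one never set.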

Don't wait until PyPy catches up, though, it can serialize a running thread and teleport it to another box…

#17 Peufeu on 09.28.09 at 1:23 am

Oops, I mixed "old" and "new" in some places.

Pickling replaces pointers to code with name-based references, that's what you need I guess.
