Code, data and interactive programming
"Are code and data the same thing?" I haven't conducted a poll, but I think the following answers are the most popular
ones:
- "What?" (I don't know about universal Turing machines)
- "Sure!" (I know about universal Turing machines)
My answer is "No". I'll now try to explain briefly why my opinion makes sense in general. After that, I plan to get to the
point, which is how the code/data distinction matters in interactive programming environments.
I think that, um, everything is data, at least everything I can think about. I mean, the only things that can technically
enter my brain are either data or cigarette smoke, because I don't do drugs. And I hope that the effect of passive smoking is
negligible, so it's just data.
In particular, code is data. But not all data is code. Code is a special kind of data, that looks like this:
- There are lots of blocks.
- Each block defines the value of something.
- The blocks depend on each other, and the dependencies can be cyclic.
What this means, and of course everybody knows it, is that you can't make any sense of code in the general case. That is, the
only way to compute the values defined by the blocks of code is to "run" the code – keep chasing the links between the blocks,
computing the values they define as you go. You can't even prove that this process will terminate given an arbitrary bulk of
code, not to mention proving its correctness.
Now, an image, for example, isn't code. Well, philosophically, it is, because if they show you an image and it's really ugly,
you'll say "ewww". So the image was in fact a program giving instructions to your brain. The output of your brain's image
compiler is the following code in the human body assembly language:
MOV R0, dev_mouth
MOV R1, disgust_string
JMP write
RET
disgust_string:
.asciz "ewww"
More interestingly, you can write a program that processes images, and this particular image may be the one that makes your
program so confused that it never terminates. However, this doesn't mean that the image itself is "code". The image doesn't have
interconnected blocks defining values. Even if the image is a screenshot of code.
An image is a two-dimensional array of pixels, a nice, regular data structure. You don't have to "run" it in order to do
useful things with it, like computing its derivatives or e-mailing it to your friends so they'll go "ewww". And programs doing
that can be proven to terminate, unless you have an infinitely slow connection to the outgoing mail server.
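To make the "regular data structure" point concrete, here is a minimal sketch, assuming a grayscale image stored as a list of rows (the name horizontal_derivative is made up for this illustration). Every loop is bounded by the image size, so termination is obvious by inspection:

```python
def horizontal_derivative(image):
    """Finite differences along each row of a 2D list of pixel values."""
    return [
        [row[x + 1] - row[x] for x in range(len(row) - 1)]
        for row in image
    ]

image = [
    [0, 0, 10, 10],
    [0, 5, 10, 10],
]
edges = horizontal_derivative(image)
print(edges)  # [[0, 10, 0], [5, 5, 0]]
```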
So what I'm saying is, code is a special kind of data, having blocks which define values and depend on each other. Does it
really matter whether a particular piece of data is "code" according to this definition? I think it does. One reason is the
above-mentioned fact that you can't really make sense of code. Many people realize the practical drawbacks of this, and so in
many contexts, they use data-driven programming instead of the arguably more natural "code-driven" programming.
Everything you represent as "data" can be processed by many different programs, which is good. Everything you represent as
"code" can only be processed by a kind of interpreter, which is bad. I'm not talking about the difficulty of parsing the syntax,
which doesn't exist with Lisp or Forth, isn't a very big deal with C or Java and is a full-featured nightmare with C++ or Perl.
I'm talking about the semantics – for many purposes, you can't really "understand" what you've parsed without running it, and
this is common to all Turing-complete languages.
But this isn't going to be about the inability to analyze code. This is going to be about the somewhat more basic problem
with code – that of blocks which point to each other. In order to explain what I mean, I'll use the example of 3 interactive
programming environments – Matlab, Unix shells and Python, listed in decreasing order of quality (as interactive
environments, not programming languages).
Interactive programming is the kind of programming where the stuff you define is kept around without much effort on your
behalf. The other kind of programming is when you compile and run your code and it computes things and exits and they are gone.
Clearly interactive programming is nicer, because it makes looking at data and trying out code on it easy.
Or so it should be; in practice, it looks like more people prefer "batch programming", so there might be some drawbacks in
the actual interactive environments out there. What makes for a good interactive environment, and what spoils the fun? Let's
look at some well-known gotchas with existing environments.
Some of the most upset people I've seen near computers were the ones that had a Matlab session running for months when their
machine crashed. It turned out that they had a load of data there – measurements, results of heavy computations, symbolic
equations and their solutions – and now it's all gone. GAAA!! This doesn't happen with batch programming that
much, because you send the output of programs to persistent storage.
This problem, nasty as it may be, looks easy to fix – just have the system periodically save the workspace in the background.
Perhaps Matlab already has this. I wouldn't know, because I tend to manually save things every few minutes, since my
childhood trauma of losing a file I loved. Anyway, this doesn't look like an inherent problem of interactive computing, just an
awfully common implementation problem. For example, do Unix shells, by default, save the command history of each separate
concurrent session you run? I think you know the answer.
Speaking of Unix shells. Ever had the pleasure of typing "rm -rf *" in the wrong directory because of command completion from
history? GAAA!! OK. Ought to calm down. Let's do
Fault Analysis. Why did this happen? The command string with "rm" in it is, basically, code; shell code invokes processes. This
code depends on another piece of code, the one that determines the current directory. The command string doesn't have a fixed
meaning – you must run getcwd in order to figure it out.
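A harmless way to see the same dependency, sketched in Python rather than shell: the pattern "*" is a fixed string, but what it denotes is decided by the current directory at the moment it is expanded. The helper name expand and the file names are invented for the illustration.

```python
import glob
import os
import tempfile

def expand(pattern, directory):
    """Return what `pattern` matches when the current directory is `directory`."""
    previous = os.getcwd()
    os.chdir(directory)
    try:
        return sorted(glob.glob(pattern))
    finally:
        os.chdir(previous)  # always restore the original directory

scratch = tempfile.mkdtemp()
precious = tempfile.mkdtemp()
for name in ("a.tmp", "b.tmp"):
    open(os.path.join(scratch, name), "w").close()
open(os.path.join(precious, "thesis.tex"), "w").close()

in_scratch = expand("*", scratch)    # ['a.tmp', 'b.tmp']
in_precious = expand("*", precious)  # ['thesis.tex']
print(in_scratch, in_precious)
```

Same string, two meanings; nothing in the string itself tells you which one you will get.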
The shell couldn't really warn us about the problem, either. That's because the meaning of "rm" is defined by the code at
/bin/rm (or by some other program in the $PATH which happens to be called "rm"). Since the shell can't understand that code
without running it, it doesn't have an estimation of the potential danger. And if the shell warned us about all commands
completed from history that originally ran in a different directory than the current one, the completion would likely be more
annoying than useful.
At some point I got fed up with Unix shells, and attempted to switch to a Python shell. I tried IPython and pysh, and I
still use IDLE at home on my XP box. I ought to say that Python shells suck, and I don't just mean "suck as a replacement for a
Unix shell", but also "suck as a way to develop Python code". The single biggest problem is that when you change your code, you
must reload modules. It's unclear which modules should be reloaded, there's no way to just reload everything, and ultimately you
end up with a mix of old code and new code, which does something, but you aren't quite sure what exactly.
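Here is a small sketch of how the mix of old and new code comes about, written against modern Python's importlib.reload. The module names demo_helper and demo_app are hypothetical and are created in a temporary directory just for the demonstration.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # keep stale .pyc files out of this quick demo

tmp = pathlib.Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp))

# demo_helper defines a function; demo_app copies a reference to it at import time.
(tmp / "demo_helper.py").write_text("def version():\n    return 'old'\n")
(tmp / "demo_app.py").write_text(
    "from demo_helper import version\n"
    "def run():\n"
    "    return version()\n"
)
importlib.invalidate_caches()

import demo_helper
import demo_app

# "Edit" the helper, then reload only that module.
(tmp / "demo_helper.py").write_text("def version():\n    return 'new, improved'\n")
importlib.reload(demo_helper)

fresh = demo_helper.version()  # looked up through the reloaded module
stale = demo_app.run()         # demo_app still holds the old function object
print(fresh, "|", stale)       # new, improved | old
```

Reloading demo_app as well would fix this particular case, but you have to know that demo_app is the one holding the stale reference.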
Die-hard Pythonistas refuse to acknowledge there's a problem, though they do bend their programming style to work around it.
What they do is they write all of their code in one file, and use execfile instead of import to make sure everything is indeed
redefined, the Zen of Python with its love of namespaces be damned. Sure, an interesting project in Python can be just 5000
lines worth of code, but I don't like to navigate in a file that big. And sometimes you do need more lines, you know.
Another thing they do is implement __repr__ in their classes so that print displays their objects, and they'll invest a lot
of effort into making eval(repr(obj)) work. The fact that eval'able strings aren't
necessarily the most readable way to produce debug prints doesn't seem to bother them. Nor do the contortions they have to go
through to solve the prosaic problem of making references to other objects display reasonably. One way to do it is to use
dictionary keys instead of pointers, so that member object references aren't expanded into a full object description when they
are printed. If you don't know why they're doing this, you'll find their code fairly puzzling.
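For readers who haven't seen the style, a minimal sketch of it follows. The Employee class and the staff dictionary are invented for the illustration; the point is that manager holds a dictionary key rather than a reference to another object, so repr stays short and eval'able.

```python
class Employee:
    def __init__(self, name, manager=None):
        self.name = name
        self.manager = manager  # a key into `staff`, not a pointer to an Employee

    def __repr__(self):
        return "Employee(%r, manager=%r)" % (self.name, self.manager)

    def __eq__(self, other):
        return (isinstance(other, Employee)
                and (self.name, self.manager) == (other.name, other.manager))

staff = {
    "ann": Employee("ann"),
    "bob": Employee("bob", manager="ann"),
}

text = repr(staff["bob"])
print(text)                  # Employee('bob', manager='ann')
clone = eval(text)           # round-trips because repr is a constructor call
boss = staff[clone.manager]  # following the "pointer" is an explicit lookup
```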
I find the struggle to make interactive Python programming work very depressing. It reminds me of me, before the invincible
idiocy of C++ crushed my spirit. People have a tendency to assume that things are well thought-out and hence should work.
We have this extremely widespread language called C++, and it's centered around type-based static binding. And it's easy to
see how this could help a compiler spot errors and optimize the code. Therefore, this programming style can be a good way of
writing software, if applied consistently. Ha!
We have this Python language, and several shells for it. Quite obviously, interactive programming is a good way to speed up
the development cycle. Therefore, adapting our Python code for interactive programming will pay off, if we do it consistently.
Ha!
But I digress. This isn't about the trusting nature of software developers, nor is it a comparison between C++ and Python,
mind you. They're hard to compare, since they are very different beasts: Python is a programming language, and C++ is a karmic
punishment. So I should get back to the topic of interactive programming.
Here's my opinion on the example programming environments I used in this entry.
Matlab is a great one, unless you lose your workspace. I used it for a while several times and it just never itched, and
nothing went wrong.
Unix shells are good in terms of their ability to preserve your data (everything is a flat, self-contained string of bytes).
I'd love them if they didn't suck so badly as programming languages. Since they do, I only use shell scripting for one-shot
throwaway things, like debugging (fiddling with log files and core dumps).
Python is awful. So when I'm on Unix, I run Python processes from the shell, and sometimes use Python's reflection to make my
batch programming just a bit more interactive. For example, if you have a Python function f(a,b,c), you can have your
command line parser introspect its arguments and add the command line options -a, -b and -c.
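A minimal sketch of that trick, assuming the standard inspect and argparse modules. The helper names make_parser and call_from_argv are made up, and treating every argument as a float is an assumption made to keep the example short.

```python
import argparse
import inspect

def f(a, b, c):
    return a + b * c

def make_parser(func):
    """Build a parser with one -<name> option per parameter of func."""
    parser = argparse.ArgumentParser(description=func.__doc__)
    for name in inspect.signature(func).parameters:
        # Assumes single-letter parameter names other than "h" (reserved for help).
        parser.add_argument("-" + name, type=float, required=True)
    return parser

def call_from_argv(func, argv):
    args = make_parser(func).parse_args(argv)
    return func(**vars(args))

result = call_from_argv(f, ["-a", "1", "-b", "2", "-c", "3"])
print(result)  # 7.0
```

In a real script you would pass sys.argv[1:] instead of a literal list.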
So much for specific examples. What's the generic rule? I think it's this: pointer-happy systems can't be
interactive. That's because interactive programming is about saving your data objects. And this is only useful when the
current value of a preserved object is clear to you. Otherwise, you can't use the object, so what's the point?
When you have pointers in your objects, the objects aren't self-contained, and when the pointed objects are redefined, it
isn't clear what should happen with the pointing objects. Should they point to the new thing or the old thing? Either answer can
be counter-intuitive to you, and the whole point of interactive programming is to let you enter a state of flow, and if you
scratch your head and can't easily guess what the old object means right now, you aren't in a state of flow.
In particular, pointers to code are the worst kind of pointers, because code is the most intertwined data of your
program, and a pointer to a single block of code basically points to the entire code base. When an object points to an old
function, and the function was redefined, and the system keeps the old definition around, you may easily get a call sequence
with both the new function and the old function, which is probably ridiculous. And if you make the old object point to the new
function, the function might simply fail to work with that object, and you just can't tell whether it will work or not
without running it, remember?
For example, Python is a good interactive calculator, because arithmetic expressions are self-contained. Even if they do
contain references to variables, it's fairly clear what happens when you change a variable – all expressions mentioning it will
now recompute differently. Note that arithmetic expressions aren't Turing-complete and can't have cyclic references. Now, if you
use Python's object-oriented features, then you have objects which point to their class definition which is a bunch of code
pointers, and now when you reload the module defining the class, what good are your old objects?
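A short sketch of what happens to old objects. The class statement is executed twice here to stand in for a module reload; the Account class is invented for the illustration.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

    def report(self):
        return "balance: %d" % self.balance

old_obj = Account(10)

# "Reload": the name Account is simply rebound to a brand new class object.
class Account:
    def __init__(self, balance):
        self.balance = balance

    def report(self):
        return "balance: $%.2f" % self.balance

new_obj = Account(10)

print(old_obj.report())              # balance: 10
print(new_obj.report())              # balance: $10.00
print(isinstance(old_obj, Account))  # False
```

The old object still works, but it runs the old code and is no longer an instance of anything you can name.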
This is why I don't believe in Numeric Python. The whole point of using Python is to use its pointer-happy features, like
classes and hairy data structures and dynamically defined functions and stuff. Numeric programming of the kind you do in Matlab
tends to use flat, simple objects, which has the nice side-effect of making interactive programming work very well. If you use a
numeric library inside a pointer-happy language like Python, quite soon the other libraries you use will make
interactive programming annoying. So you'll either move to batch programming or suffer in denial like the die-hard Python weenie
you are. Someone using Matlab will be better off, since interactive programming is more productive than batch
programming, when it works well.
So at the bottom line, I think that interactive programming has limited applicability, since "general-purpose" programming
environments pretty much have to be pointer-happy. That is, if a language doesn't make it very easy to create a huge intertwined
mess of code and data pointers, I don't see how it can be usable outside of a fairly restricted domain. And even in the "flat"
environments like Matlab or Unix, while old data objects can be useful, old commands are, and ought to be, a
two-edged sword. Because they are code, and code is never self-contained and thus has a great potential to do the wrong thing
when applied in a new context.
This whole claim is one of those things I'm not quite sure about. From my experience, it took me quite some time to realize
which interactive features help me and which get in the way with each environment I tried. So I can't know what happens in Lisp
or Smalltalk or Tcl or Excel or Emacs, in terms of (1) applicability to "general-purpose" tasks, (2) the amount of
self-contained data compared to the kind with pointers, especially pointers to code and (3) the extent to which the thing is
itchy and annoying at times. So comments are most welcome. In particular, if you know of an environment that, put simply, isn't
more itchy than Matlab but isn't less general-purpose than Python, that would be very interesting.
when the pointed objects are redefined, it isn’t clear what should
happen with the pointing objects. Should they point to the new
thing or the old thing? Either answer can be counter-intuitive to
you
Weeeell. I was completely with you until this passage. This passage
resembles the following typical sentiment oftentimes uttered by FP
weenies: "Side effects are awful! You never know what will change as a
result of a side effect. Should the change affect the running code or
should it not? Either answer can be counter-intuitive to you!". Well:
nope. If side-effects are properly contained (or, say, wrapped in
transactions or in a dataflow-based consistency-preserving mechanism, or
how about just being explicit and clearly labeled?), there is no
problem.
Same with redefinition. Redefinition can have clearly defined
semantics. Imagine that: it can be that you can know exactly what will
happen when you redefine something!
I don't know how things are in the Python world, but in the
reportedly autistic and ghettoish but still quite comfortable Lisp
micro-universe you have basically two semantic models of
redefinition.
One model is based on the concept of symbols. For instance, calling a
function by name (as opposed to FUNCALL'ing an object, which is slightly
more syntactically cumbersome — the cornerstone of usability is having
right defaults, see) always indirects through the symbol function slot
(I'm glossing over some insignificant performance-oriented details here,
but never mind). This means that when you redefine a function, all the
call sites magically switch to using the new definition.
Simple, convenient, interaction-friendly.
The other model is used in CLOS. It in turn has two facets:
Class redefinition is explicitly supported, by having all existing
instances become of the new class. All of them, effective immediately.
The process is customizable through a protocol. The referers, of course,
see the changes immediately.
Likewise, you can change the class of a specific instance. This, too,
will be immediately reflected everywhere the instance is used, and this
process too can be customized to your liking.
The take-home message is, I suppose, that many seemingly scary things
can work just fine if they are explicitly designed into the
language/environment/whatever.
That said, I do value the ability to restart my Lisp session once in
a while. It's not the redefinition semantics that is the problem here,
but rather the fact that starting over can often be radically simpler
than cleaning the mess up from inside, as it were. Not to mention the
need to check whether your stuff will even load up given a clean
starting image.
Foist, a bunch of random reactions, and then, to the point.
* Side effects are evil :) Technically, I ain't no FP weenie because
everything I actually use has side effects. But when you do compilers or
debug stuff, side effects sure are damn evil. It's a problem... I'm
still thinking about it.
* Lisp isn't Python or Unix or Matlab; Lisp isn't punk technology
(hence the lack of popularity). I'll blog about it some day. Anyway, the
fact that CLOS has a "customizable protocol" to help redefine classes
during interactive development convinces me again that no, it ain't no
punk technology... I think that what happens in Python basically is,
everything is an object and when there are no references to it, it's
garbage-collected; classes are just one kind of object, and there's no
such thing as "redefinition" – "class A" is basically assignment, just
like "A = 5". Makes interactive work a problem, but is very
straightforward to implement. Three-chord music. Welcome to the
future.
Now, to the point:
First of all, I think that Lisp is less pointer-happy than Python,
Perl and their ilk; AFAIK, class objects don't point to methods, for one
thing. But, what does #'sym do? Doesn't it give you a handle to use for
funcall, and doesn't it force Lisp to keep the old code block around? Or
did you actually mean to say just that (funcall effectively calls by
pointer, normal calls by name?)
I don't really understand how code works in Lisp; I know (or feel
that I know) the C model (where you compile and link everything and
don't even dream about "redefining" anything at run time), and the
Python model (where code is one kind of object, basically, and you can
point to it and the system collects the garbage). Is the Lisp model much
different from the Python model? It seems to be, if CLOS can speak about
"redefining". Or is it about the same, with CLOS classes simply keeping
track of living objects and intercepting redefinition and running code
on those objects upon redefinition?
BTW, I think that Python handles function calls just the way Lisp
does according to your description: when you call functions by name, it
locates the most updated definition at run time, and when you push
function references to data structures (the way you'd do with #'sym), it
pushes a reference to the definition rather than the name.
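A tiny sketch of the Python side of this claim. The function area is invented; the second def plays the role of retyping a definition in the shell.

```python
def area(r):
    return 3 * r * r  # first, sloppy definition

by_name = lambda r: area(r)  # looks up the name "area" on every call
by_ref = area                # captures the current function object

def area(r):                 # redefinition
    return 3.14159 * r * r

print(by_name(1))  # 3.14159, the call by name sees the new definition
print(by_ref(1))   # 3, the stored reference still runs the old one
```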
So currently the picture to me is:
* Lisp is less pointer-happy than Python because objects don't point
to code blocks.
* Lisp is otherwise similar to Python in the sense that you can end up
having multiple definitions of the function hanging around; maybe it's
not a problem because you don't have that many function-valued
variables.
Is this about right?
Lisp isn’t Python or Unix or Matlab; Lisp isn’t punk technology
(hence the lack of popularity). I’ll blog about it some day.
Just link to Worse is Better, that's gonna save you the trouble.
:)
Anyway, the fact that CLOS has a “customizable protocol” to help
redefine classes during interactive development
Well, the protocol is mostly a show-off item (and the result of the
fact that the CLOS designers were being rather unusually thorough).
Pretty much no practical situation needs it, because things Just
Work.
But, what does #’sym do? Doesn’t it give you a handle to use for
funcall, and doesn’t it force Lisp to keep the old code block
around?
#'sym gives you whatever sits in the function slot of 'sym. You can
also use FUNCALL on 'sym itself, though, if you'd like possible
redefinitions of the function to be picked up at that particular call
site.
Is this about right?
Yes. Indeed, Lisp's runtime model is basically the same as Python's
(manifestly-typed objects, lots of pointers, GC, the works); the
interactive usability differences are mostly due to peculiarities of the
evaluation semantics and the object system.
(I'm not sure how the fact that in Python methods "belong" to classes
changes the picture one way or the other, though, but perhaps the
understanding will strike me a minute after I submit this comment.
Happens all the time, no reason to wait).
If (funcall 'sym) works, what's the point of having #'sym – to get in
the way of interactive work? :)
The thing with Python classes that I meant wasn't the single-dispatch
thing (one way to interpret "methods belong to classes"), but the fact
that objects point to classes and classes point to code blocks, so when
you "redefine" classes, the old objects keep pointers to the old code
blocks.
Now that I think of it. So CLOS updates all objects upon class
redefinition (I wonder how it keeps track of them and what happens if
you have threads in the background, but it's just perverse, forget
it).
Well, I add a slot to a class vector3d, which previously kept x, y,
z, and in this slot I cache the vector magnitude. How will CLOS know to
set the new slot to the right value in the old object? What if I had a
bug in the magnitude computation code, and now I reload the definition?
How will it know to recompute that slot? I mean, how is it going to
reestablish class invariants for old instances when class invariants
change? I don't think it can be mechanically solved. Is it "really" a
non-problem, or just seems to be a non-problem in practice?
The thing is, if you have untyped data, then "it means the same thing
all the time", but if you have types and type-based binding, no matter
how you do it, it basically means that objects /point/ to code which
expects certain things and these things are no longer true and stuff.
And it gets in the way of redefining. I think.
If (funcall ’sym) works, what’s the point of having #’sym – to get in
the way of interactive work?
I see the smilie, but let me indulge in being boring: there are
anonymous functions (made with LAMBDA), there are locally-scoped
functions (introduced by FLET etc.) — in short, there is no sense at all
to restrict FUNCALL to just symbols.
when you “redefine” classes, the old objects keep pointers to the old
code blocks.
Er. Don't they redirect through some sort of vtable that sits in the
class object?
Oh yeah. LAMBDAs. Should have thought about those. They ought to be
kept in multiple versions, BTW, since you can't redirect via persistent
something like a symbol, right?
Python classes – yeah, they redirect via the class dictionary, so
instance objects keep indirect pointers to code blocks. Nobody keeps
track of the class objects; if something did, it could change the class
object without changing all the instance objects.
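A rough sketch of what "changing the class object" could look like in Python. This is a workaround under assumptions, not a built-in facility: the helper update_class_in_place is hypothetical, and it copies only plain functions from the edited definition onto the existing class object.

```python
import inspect

class Vec:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm(self):
        return abs(self.x) + abs(self.y)  # bug: Manhattan, not Euclidean

v = Vec(3, 4)
before = v.norm()  # 7

class EditedVec:  # stands in for the edited source of Vec
    def __init__(self, x, y):
        self.x, self.y = x, y

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

def update_class_in_place(old_cls, new_cls):
    """Copy methods from new_cls onto old_cls, keeping old_cls's identity."""
    for name, value in vars(new_cls).items():
        if inspect.isfunction(value):
            setattr(old_cls, name, value)

update_class_in_place(Vec, EditedVec)
after = v.norm()  # 5.0, the old instance now runs the new code
print(before, after)
```

This handles methods only; it does nothing about instances whose data no longer matches what the new code expects.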
They ought to be kept in multiple versions, BTW, since you can’t
redirect via persistent something like a symbol, right?
Sorry, who ought to be kept in multiple versions by whom? You mean
what happens if you "redefine" an anonymous function? But you cannot
redefine something that is anonymous. By, um, definition. You can only
make a new object, and perhaps strike it lucky and make sure to change
all the references to the old one by references to the new one. :)
If you are gently prodding me to admit that Lisp cannot support every
possible redefinition one can think of – well, that's obviously true.
All it does is make sure that when you want something to be
redefinable, you can easily have that, and that the syntax and
evaluation rules are designed to make the redefinable cases the natural
(that is, more concise and readable than the alternative) choice in most
cases.
they redirect via the class dictionary, so instance objects keep
indirect pointers to code blocks. Nobody keeps track of the class
objects; if something did, it could change the class object without
changing all the instance objects.
So basically the main missing thing is a way to redefine a class
while keeping the identity of the class dictionary?
If you are gently prodding me to admit that Lisp cannot support every
possible redefinition one can think of — well, that’s obviously
true.
No, I sincerely wanted to make sure I understand what happens :)
So basically the main missing thing is a way to redefine a class
while keeping the identity of the class dictionary?
Yep. But then I think you have more complicated problems, that of
updating the old instances according to the new class invariants, and I
wonder how CLOS can handle that; I sneaked in a comment about it above.
The bottom line is that I think type-based binding is basically a way of
linking objects to code, and it can make objects "lose meaning" at the
logical level no matter how redefinition is handled.
(Found no way to reply to an individual comment. Oh, and some sort
of comment notification thing would be great. Hey, ever heard of
LiveJournal? :))
So CLOS updates all objects upon class redefinition (I wonder how it
keeps track of them and what happens if you have threads in the
background, but it’s just perverse, forget it).
No "keeping track" needed. The usual implementation strategy is to
check whether the class is current upon each access to an instance, so
you get a certain penalty for working with CLOS (which is why frobbing
slots in inner loops is not a very good idea) but most of the magic is
lazy.
Well, I add a slot to a class vector3d, which previously kept x, y,
z, and in this slot I cache the vector magnitude. How will CLOS know to
set the new slot to the right value in the old object?
This is where that fancy redefinition protocol comes in. This margin
is sort of too small for the details to fit, but basically you have four
main entry points: fresh object initialization, reinitialization, update
for redefined class and update for a different class. You are told what
class changed and to what (where applicable), which slots are
initialized explicitly, etc.
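To make the lazy strategy concrete for Python readers, here is a loose analogue and not CLOS itself. Everything in it (Upgradable, schema_version, update_instance) is invented for the sketch: each instance carries a version stamp that is checked on attribute access, and a stale instance is passed to an update hook once.

```python
import math

class Upgradable:
    schema_version = 1

    def update_instance(self, old_version):
        pass  # default hook: nothing to migrate

    def __getattribute__(self, name):
        state = object.__getattribute__(self, "__dict__")
        cls = type(self)
        stamp = state.get("_schema_version", 1)
        if stamp != cls.schema_version:
            state["_schema_version"] = cls.schema_version  # stamp first, avoids recursion
            cls.update_instance(self, stamp)
        return object.__getattribute__(self, name)

class Vector3D(Upgradable):
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z
        self._schema_version = type(self).schema_version

old = Vector3D(1, 2, 2)

# "Redefine" the class: add a cached magnitude slot and say how to fill it in.
def update_instance(self, old_version):
    if old_version < 2:
        self.magnitude = math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)

Vector3D.update_instance = update_instance
Vector3D.schema_version = 2

print(old.magnitude)  # 3.0, computed on first access after the redefinition
```

The hook is where the human has to say how the new invariant is established; the check on access only decides when it runs.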
What if I had a bug in the magnitude computation code, and now I
reload the definition? How will it know to recompute that slot? I mean,
how is it going to reestablish class invariants for old instances when
class invariants change? I don’t think it can be mechanically solved. Is
it “really” a non-problem, or just seems to be a non-problem in
practice?
I don't know whether this can be objectively measured to be a problem
or a non-problem, but of course it's not done by magic. You need a
dataflow mechanism to automate it (such things exist, and even come with
convenient CLOS metaclasses). I think Excel gets this right out of the
box.
it basically means that objects /point/ to code which expects certain
things and these things are no longer true and stuff. And it gets in the
way of redefining. I think.
The general redefinition problem is, of course, intractable. I think
it is not a good reason to give up on interactive programming
altogether, because certain kinds of disciplined redefinition are
tractable, useful, can be made convenient and happen to cover most of
the need.
I guess the redefinition protocol is for the cases where you perform
a surgery on a living system that you really don't want to restart –
otherwise I'd probably restart the shell at this point... It's cool that
it is supported though, for the surgical cases – no good way to do that
in most (all?) environments.
So in general the code-update problem is the same as the
serialization problem; a brute-force way to do a code update is to
serialize your universe and then deserialize it. In fact, that's pretty
much the normal way that people upgrade software in the C world.
There's a tradeoff between losing too much state at upgrade time and
having to write schema migration scripts for too many classes.
The equivalence works the other way too; if you can locate all of the
objects that somebody might want to update to new code, and run a
migration script on each one with introspective knowledge of what was in
the object before, then you can serialize the objects in the migration
script.
Anyway, I think your proposed "pointer-happy systems can't be
interactive" rule is bosh. It's true that it can be pretty hard to
figure out how an object got into a weird state after you change the
code underneath it a few times, but that's a pretty rare frustration, in
my limited experience of Squeak and SBCL. Squeak and Erlang are two of
the most pointer-happy programming environments, and also two of the
most interactive. Elisp is full of objects that refer to each other, but
it's radically interactive. Supposedly the old LispM systems were even
more interactive without being any less pointer-happy.
I'd like to suggest an alternative explanation. Maybe interactivity
is a sine qua non for novice programming environments, and so the best
interactive programming environments happen to be novice programming
environments; and novices are allergic to pointers. (At least that's
what Joel claims, and he helped design one of the best ones: Excel.)
Well, there are different kinds of pointer; I meant, first and
foremost, code pointers, which have the biggest
redefinition/serialization problems. I don't think Erlang is that
pointer-happy in this regard.
I've worked a lot with Matlab, tcsh and IDLE/IPython. The
redefinition problems in Python tend to grow as your program grows,
unless you bend and twist your style to work around it, and even then it
still sucks. Allergic to pointers I hardly am.
In Squeak, almost all of the code pointers are in classes; although
it has first-class closures, they almost never live very long. Classes
live in a global dictionary, so they're easy to find. When you define a
menu item that invokes some arbitrary code, you generally define it by
giving it a class and a method name. So changing the definition of a
method pretty much just does the right thing.
Similarly, in Lisp, most functions live in some module namespace, and
are referred to with an indirection through that namespace (just as in
Python). This makes function calling slow but means that you can change
what function gets called.
In Erlang, which is a language I don't actually know, I think you're
encouraged to pass module names and function names, instead of actual
functions, when you define long-running processes, specifically so that
if you reload the module, the long-running process will start running
the new version of the code after the next time it loops at the top
level.
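The same idea sketched in Python rather than Erlang: the long-running loop is handed a module and a function name, and resolves the name again each time round. The module demo_handlers is hypothetical, and exec into its namespace stands in for reloading it.

```python
import types

handlers = types.ModuleType("demo_handlers")
exec("def step(n):\n    return n + 1\n", handlers.__dict__)

def worker(module, func_name, state):
    """A long-running loop that re-resolves its step function every iteration."""
    while True:
        step = getattr(module, func_name)  # name lookup, not a stored function
        state = step(state)
        yield state

w = worker(handlers, "step", 0)
a = next(w)  # 1
b = next(w)  # 2

# "Reload" the module while the worker is still alive.
exec("def step(n):\n    return n * 10\n", handlers.__dict__)
c = next(w)  # 20, the next iteration picks up the new code
print(a, b, c)
```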
"The newbie realizes that the difference between code and data is
trivial. The expert realizes that all code is data. And the true master
realizes that all data is code."
http://blogs.msdn.com/sriram/archive/2006/01/15/lisp_is_sin.aspx
It comes down to the fact that Python has no become() method, like
Smalltalk has, so if you change a class at runtime, new instantiations
will find the old class, but already instantiated objects will point to
the old code.
So, reloading code works if you access this code through its name
(module.attribute) but not otherwise (if a pointer to the old code was
kept).
Even so, there are problems :
- What if you update the code for a class, and add a data member ?
Are you going to update all the existing objects ? Are you going to
create a runtime code versioning scheme ? What if the information to put
in this data member is only available at object construction ?
- What if you update 2 different pieces of code that depend on each
other and you change the interface between them ? Old bits of this code
that still linger and want to call the new code will fail. New code that
happens to call an old bit of code will fail. And there is the question
of doing the update atomically.
So it looks good in practice, until you consider these evil
dependencies.
Reloading code is very easy in Python, I've used it successfully
several times, but it only works well in certain circumstances :
- you don't change interfaces, or you reload atomically all the stuff
that depends on each other
- whatever contains pointers is short lived and will very soon use
the new code
For instance : in a Python web server, you can easily reload the code
for a page, and it's essential that you can do so (who would want to
restart the server every time a page is modified ?). Lock all requests
for this page (and wait for current requests to finish), reload the
code, unlock.
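A bare-bones sketch of the lock, reload, unlock sequence, assuming a threaded server. The class ReloadablePage is made up, and a single lock serializes all requests to the page, which a real server would probably want to avoid.

```python
import threading

class ReloadablePage:
    def __init__(self, render):
        self._render = render
        self._lock = threading.Lock()

    def handle(self, request):
        # Requests hold the lock while rendering, so a reload waits for them.
        with self._lock:
            return self._render(request)

    def reload(self, new_render):
        # Taking the lock blocks new requests until the swap is done.
        with self._lock:
            self._render = new_render

page = ReloadablePage(lambda request: "v1 " + request)
first = page.handle("/stats")
page.reload(lambda request: "v2 " + request)
second = page.handle("/stats")
print(first, "|", second)  # v1 /stats | v2 /stats
```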
You can't easily reload the web server logic itself, though, because
it lives for a long time.
And if you have session objects stored in RAM, you can't reload the
code for them either, unless you implement code versioning, or your
session objects are very simple mappings (with no code like class
instances in them).
I wrote a bittorrent tracker as an asynchronous HTTP server. Some
parts of it are reloadable at run-time (error strings, stats pages, the
parts that control the policy applicable to users, etc). For simplicity
I didn't put this reloadable stuff in classes, they're just a bunch of
functions. Since functions are called by name (module.attribute) in
Python, and I didn't keep function pointers around, it worked very
well.
I wrote this 4 years ago. It could handle more than 1000 HTTP
requests per second. Yeah I could have written it in C++, lol, what for
? In Python it took me a week, and once debugged, the thing just runs,
never crashes (Python is a managed environment, it will log an
exception, not crash). I left that team 4 years ago, and none
of the guys know Python. All they know is how to stop and restart the
tracker when there is a server migration, lol. Last time I checked, this
python code has been running for 6 months straight without crashing or
manual restart. Try that, C++.
In Python, you could pickle (serialize) your entire environment, save
it, reload your code, and reload all your environment. Code is not saved
: class instances are serialized with the module.classname in them to be
able to instantiate them back. If you want to update the code for all
current in-memory objects, you could do it like that (and it provides a
simple way to save a running session).
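A small sketch of that round trip, assuming the standard pickle module. The module demo_shapes and the class Square are invented, and exec into the module's namespace stands in for reloading the code.

```python
import pickle
import sys
import types

mod = types.ModuleType("demo_shapes")  # hypothetical module
sys.modules["demo_shapes"] = mod       # pickle finds classes through sys.modules

OLD_SOURCE = """
class Square:
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side * 4   # bug: this is the perimeter
"""
NEW_SOURCE = """
class Square:
    def __init__(self, side):
        self.side = side
    def area(self):
        return self.side ** 2
"""

exec(OLD_SOURCE, mod.__dict__)
session = {"s": mod.Square(3)}
before = session["s"].area()    # 12

blob = pickle.dumps(session)    # instance data plus the name demo_shapes.Square
exec(NEW_SOURCE, mod.__dict__)  # "reload" the code
session = pickle.loads(blob)    # instances are rebuilt against the current class
after = session["s"].area()     # 9
print(before, after)
```

The pickled bytes contain no code, only the class name, which is why the rebuilt object picks up the fixed method.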
Don't wait until PyPy catches up, though, it can serialize a running
thread and teleport it to another box...
Oops, I mixed "old" and "new" in some places.
Pickling replaces pointers to code with name-based references, that's
what you need I guess.