IHateCamelCase
Yeah, naming conventions. Looks like my brain won't do any better today; those 5 drafts will have to wait. If you aren't in a
mood for a trivial subject, skip this.
I think that the best naming convention out there is the Lisp one: case-insensitive-dash-separated. It just doesn't get any
better:
- You never have to hit Shift or Caps Lock in the middle of a name, which makes typing easier. This is especially important
for names because you use auto-completion with names. Auto-completion requires you to press things like Ctrl-Space or Alt-Slash
or Ctrl-P. Together with another Shift needed for the name to be completed correctly, auto-completion is much more likely to
cause repetitive strain injury.
- You never have to think about the case. Figuring out the case of a letter in a case-sensitive naming convention can be
non-trivial; more on that later.
- The dash-as-a-separator-convention is used in English, so your names look natural.
Unfortunately, most languages use C-style identifiers for names, the dreaded [A-Za-z_][A-Za-z_0-9]*, because
their infix parsers can't tell a dash from a minus. So you can't use this convention.
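To make the dash-versus-minus point concrete, here is a small sketch in Python (picked only for illustration; the helper names is_c_style_identifier and parses_as_subtraction are made up for this example). It checks a name against the pattern above and then asks Python's own infix parser how it reads a dash-separated name.

```python
import ast
import re

# The C-style identifier pattern quoted above.
C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_c_style_identifier(name):
    """Return True if the whole string matches the C-style pattern."""
    return C_IDENTIFIER.fullmatch(name) is not None


def parses_as_subtraction(source):
    """Return True if Python's parser reads the text as a subtraction."""
    node = ast.parse(source, mode="eval").body
    return isinstance(node, ast.BinOp) and isinstance(node.op, ast.Sub)


print(is_c_style_identifier("open_tcp_socket"))  # True
print(is_c_style_identifier("open-tcp-socket"))  # False
print(parses_as_subtraction("open-tcp-socket"))  # True
```

The last line is the whole problem: to an infix parser, open-tcp-socket is three names and two subtractions.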
This leads to two problems:
- How do we separate between subsequent words in an identifier?
- When do we capitalize letters?
Of course, we could use a lowercase_underscore_separated convention. It would solve both problems in a simple way, having all
the benefits of the Lisp convention except for the no-Shifts-and-Caps-Locks part. But (1) Caps Lock is available for
capital letters, but not for underscore, and Shift is less healthy for your hands, and (2) if we have case sensitivity in our
language, we'll of course use it, won't we? OK, let's kill those underscores.
There are two anti-underscore schools: alllowercase and CamelCase. alllowercase looks lame – it makes it easy to know when to
capitalize letters (never), but chooses to ignore the word separation problem completely. I used to sneer at it. However, it has
two huge benefits: it's very typing-friendly, and it discourages the use of long names. Long names, people, are a frigging
nightmare.
HaveYouEverSeenANameTakingHalfAScreen? This is awful. Awful!! I can't lock my eyes on the damned thing. I can only
focus on a tiny part of it. My eyes nervously jump around the line, which mentions the moronic mega-identifier twice
(at both parts of an assignment). I'm looking for differences, small differences in the names... You know, it could be
BlahBlahABlah on the left and BlahBlahBBlah on the right... AAAARGH!
Reading this kind of code is pure mental pain. I prefer mental pain to physical pain on any day, and that's why I'm in the
software industry, but still, this sucks. The good news is that alllowercasenametakinghalfascreen is so ridiculous that even
the most clueless pseudo-orderly person won't emit it.
Now, CamelCase, which is basically the winner, because it's used in all major languages and libraries, is probably the worst
possible naming convention. It fails to solve both problems created by the lack of a good word separator in the [A-Za-z_0-9]
languages:
- You don't really know when one word ends and the next word starts.
- You don't really know when a letter should be capitalized.
The problems of camel case come from using capital letters for word separation. This interferes with the other uses of case
in natural language. The problems are amplified by the brilliant idea to assign even more semantic payload to case:
functionsLookLikeThis, but ClassesLookLikeThis, etc. Let's look at some examples.
English has words like TCP, DNA and WTF. Should a TCP socket class be called TCPSocket or TCPsocket? What about a
TCPIPSocket? What if we need a tcpOpen method – should we call it TCPOpen, like a class, to preserve the natural case of an
acronym, or should it be TCPopen, so that the lowercase "o" conveys the fact that it's a function?
Oh, I know, it should be "openTCP"! No, no, why are you using "openTcp" – this is ugliness for its own sake! The
only important thing is to get the first letter of a name right, and then you can use natural capitalization! Unless, of course,
it's "openTCPIPSocket", and then we have a problem again. "openTcpIpSocket"?.. Some people just can't handle it and resort to
underscores: openTCP_IPsocket, open_TCP_IP_socket... It's no use. It's ugly no matter what you do.
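The acronym trouble shows up as soon as a program tries to recover the words from a CamelCase name. Below is a rough sketch using one common splitting heuristic; the regular expression and the function name split_camel_case are my own assumptions for illustration, not a standard rule.

```python
import re

# A splitting heuristic (an assumption, not a standard). Alternatives, in order:
#   1. a run of capitals that is directly followed by a capitalized word,
#   2. a lowercase word with an optional leading capital,
#   3. any remaining run of capitals,
#   4. a run of digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_camel_case(name):
    """Split a CamelCase name into its guessed words."""
    return _WORD.findall(name)


print(split_camel_case("openTcpIpSocket"))  # ['open', 'Tcp', 'Ip', 'Socket']
print(split_camel_case("openTCPIPSocket"))  # ['open', 'TCPIP', 'Socket']
print(split_camel_case("TCPsocket"))        # ['TC', 'Psocket']
```

Only the ugly spelling splits cleanly. With natural capitalization the boundary between TCP and IP is gone, and TCPsocket gets read as "TC" followed by "Psocket".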
Capital letters coming from the natural language, like those in acronyms and names of people, are the smaller part of the
problem – Tcp looks ugly, but you know what it means. The other part of the problem is the capital letters coming from
formal languages, such as mathematical notation.
For example, in computer vision it's common to denote 3D coordinates with uppercase X,Y,Z, and 2D coordinates with lowercase
x,y. In a case-sensitive language, it's damn natural to follow this convention, and it works very well for a local variable X or
x (including the case when you use both in the same function). It doesn't work so well when you try to name functions or classes
after their arguments/coordinate systems.
Does xySomething start with a lowercase x because it's a function, or because it really accepts x values of 2D coordinates?
What about xYSomething – is the Y capitalized because "y" is a word and we always capitalize the first letter of a word, or
maybe the function expects Y values of 3D coordinates?
You can have a function working with 3D X coordinates and 2D y coordinates, you know. I think it's better to call it
XySomething than xYSomething, because meaning is more important than convention. But did the author of the function think so,
too? Of course, we can use an underscore to "clarify" the intent: something_Xy. The underscore clearly shows that the
part after the underscore doesn't follow standard naming conventions, so it must be according to the computer-vision-specific
convention.
So what happens is that CamelCase code deteriorates to the following state:
- You have ugly names like tcpIpOpen.
- Since you also have names like TCP_IP_Open, your real naming convention is "camel case with underscores". Which is
equivalent to "any identifier that compiles".
Maybe there's a good way to augment CamelCase with rules that make it work well. I probably wouldn't know. I ought to say
that I'm not that good at naming conventions in particular and in Best Practices in general. But I doubt there's a good
case-sensitive naming convention out there.
Just look at the Python naming convention. You basically have
everything. thingslikethis, ThingsLikeThis, things_like_this, thingsLikeThis, and they're all attached to different
types of object (module, class, function, method). And every time your language entity convention disagrees with the common
sense (class TCPIPSocket), you've got yourself an ugly name. And in a way, this is a good convention, because it at
least tries to be consistent with the common conventions used in C, C++ and Java.
The annoying part of this is the slowdown. "Um, how should I spell this name?.." There are actual capitalization
trade-offs here. Programming is almost exclusively about making decisions and choosing trade-offs. It's quite tiring,
really. Nobody wants to be making some more pointless decisions on the way just for the fun of it. Maybe it's just me and the
kind of people I've worked with, but I've always, always bumped into lots and lots of names which looked like a
compromise. Somebody was thinking hard here. And it looks ugly anyway.
Barf.
Aesthetics and typing ease are just part of the problem – Hungarian notation.
A while back I considered modifying emacs to convert names-like-this
to names_like_this automatically when in C or Python mode, but didn't
know enough elisp to do so. I still don't, but maybe someday...
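Not elisp, but the core of such a conversion might look like the Python sketch below. It assumes that a dash squeezed directly between identifier characters is a word separator and that a real minus always has whitespace around it; the function name dashes_to_underscores is hypothetical. It would also rewrite dashes inside string literals and comments, so treat it as an idea rather than a tool.

```python
import re

# A dash counts as a word separator only when it sits directly between an
# identifier character and a letter or underscore. A dash preceded by
# whitespace is left alone as a minus. This spacing rule is an assumption.
_NAME_DASH = re.compile(r"(?<=[A-Za-z0-9_])-(?=[A-Za-z_])")


def dashes_to_underscores(source):
    """Rewrite names-like-this as names_like_this in a piece of source text."""
    return _NAME_DASH.sub("_", source)


print(dashes_to_underscores("total-size = header-size - 1"))
# total_size = header_size - 1
```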
If there is anything good to be said about C++ it's that the long and
frequent build times give your fingers a chance to rest. Writing python
using CamelCase leaves my pinky fingers aching by the end of the day. I
hate to think of what they'll be like in a few years.
I always figured that CamelCase was a result of the dumb decision to
make _ be ← in Smalltalk, and that all the other languages where you see
it are that way as a result of aping the styles of former Smalltalk
programmers.
PEP 8, the most recent version of the style guide, has the
bastardCamelCase eliminated. Method names use the same convention as
function_names.
http://www.python.org/dev/peps/pep-0008/
Great! Now all the people who took the old convention seriously will
feel bad for a variety of reasons. I know they will.
I hate coding conventions. And I hate camel case.
Interestingly, they've converged to the same thing I pretty much did
– I write everything_like_this except for TypeNames, which would look
ugly This_Way and it kinda sucks to spell them this_way because they're
capitalized, like, everywhere, and the type_name_t convention from C
really sucks.
Considerations. It's all very deep.
This is the reason why Nimrod is case insensitive and ignores
underscores in identifiers. It also solves the macro problem that you
refer to in one of your other articles by basing its AST on a single
node type. Check it out, you might like it!
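As a rough illustration of what "case insensitive and ignores underscores" means for name lookup, here is a Python sketch. It follows only the one-line description in the comment above; the actual language's comparison rules may differ in detail, and both function names are made up.

```python
def normalize_identifier(name):
    """Fold case and drop underscores, as the comment describes."""
    return name.replace("_", "").lower()


def same_identifier(a, b):
    """Two spellings name the same thing if they normalize to the same string."""
    return normalize_identifier(a) == normalize_identifier(b)


print(same_identifier("tcp_open", "tcpOpen"))    # True
print(same_identifier("TCP_Open", "tcpopen"))    # True
print(same_identifier("tcp_open", "tcp_close"))  # False
```

Under such a rule the TCPOpen versus tcpOpen versus tcp_open debate stops mattering to the compiler, though not necessarily to the people reading the code.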
Your blog contains great articles, by the way.
Thanks! Regarding the language – interesting stuff; a bunch of things
reminiscent of other languages suggest that you know plenty :) [almost
certainly more than myself...]
I'll refrain from detailed comments as I've been neglecting the part
of the brain which makes sense of PL information for quite some time, so
these days I'm a very low-end commenter :) In particular, I lost hope in
combining efficiency and elegance to the point where I'm disgustingly
pessimistic compared to most people with practical interest in PL
design.
I certainly agree that hyphen-separated is the way to go, which is
why I made my language accept it. But a name may not start with a
hyphen, which solves the unary-minus problem. (This means you need
whitespace around a binary minus, which I think looks much better
anyway. Also, identifiers are case-sensitive. And the Smalltalk
camel-case legacy survives as the convention for (capitalized) class
names, my one concession to history/Hungarianism.)
An interesting compromise, although it could bite whoever copies
expressions verbatim from another infix language – not that an error [as
opposed to silently doing the wrong thing] is that bad a bite.
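The hyphen rule from the comment above, and the copy-paste hazard from the reply, can be sketched with a toy tokenizer in Python. The token pattern is my guess at what such a rule might look like, not the commenter's actual lexer, and tokenize is a hypothetical name.

```python
import re

_TOKEN = re.compile(
    r"[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*"  # name: starts with a letter, may contain inner hyphens
    r"|[0-9]+"                                  # integer literal
    r"|[-+*/()=]"                               # single-character operators, including minus
)


def tokenize(source):
    """Return the tokens of a line; findall skips whitespace and anything unmatched."""
    return _TOKEN.findall(source)


print(tokenize("-x"))       # ['-', 'x']
print(tokenize("a-b - c"))  # ['a-b', '-', 'c']
print(tokenize("n - 1"))    # ['n', '-', '1']
print(tokenize("n-1"))      # ['n-1']
```

A leading hyphen is always an operator, so unary minus survives. The last line is the bite: n-1 pasted from another infix language becomes a single name, which would presumably show up as an unknown-name error rather than a silent miscalculation.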
Choose different starting symbols, e.g.:
types: $int $real
vars: a b c (none)
enum-items: #sunday #monday
keywords: `if `else
kragen: It's quite likely that Smalltalk chose to treat "_" as "←"
because early Teletypes were conflicted about what that code point
should print; on early Teletypes, it actually did print a back-arrow
(such as you might use to indicate you've corrected a previously-typed
character) whereas on later Teletypes (as ASCII became more standardized)
it printed an underscore.
I like camel case because it is compact. I am a slow typist and hence
my hands never hurt (after 20 years of writing on all kinds of
keyboards).
I find natural language inefficient and badly designed, therefore I do
not let it interfere more than it needs to with the well crafted
programming language grammar.
By publishing and abiding by a few simple rules, including limiting the
maximum size of an identifier, camel case can be quite effective.
I always hated the "m_" convention for members. To me a Hungarian-style
m would suffice. Underscore is hard to type for me, and so is dash. I
tend to be much better about numbers and digits at the center of the
keyboard. My early limited keyboards with only 40 keys would require
lots of shifts to type even dashes.
For a long while I favored single letter identifiers. They were so
much faster to type.
Know what's also unreadable? opentcpipsocket. Your arguments against
camel-case also apply to your ideal case. Retarded argument. Since we
are talking about naming conventions, forbid openTCPIPSocket and require
only the first letter of each acronym to be capitalized:
openTcpIpSocket
His ideal case is lowercase with dashes, i.e. "open-tcp-ip-socket".
How do the arguments apply to that?
I find djb's code the easiest to read. He has a great appreciation
for brevity and succinctness.
The more text you can get to be left-aligned and the shorter you can
keep the line width, the easier reading becomes. Also, the more the text
follows a pattern, the easier it is to digest, e.g.
a
aa
aaa
aaaa
or
bbbb bbbb
bbbb bbbb
bbbb bbbb
Code as ASCII art.
Maybe that's why print book and magazine publishers often used two
columns. It's easier to digest.
Why not use abbreviations, one-letter identifiers if you wish, and
provide an index showing what each represents, as a comment? vi-style
markers let you jump to the index quickly.
I too hate CamelCase. However, I have great respect for the
smallness, speed and simplicity of TurboPascal, which is heavy on
CamelCase. Some people just like it, and I guess Wirth is among
them.
Why they like it is a mystery.
Yes, I can't stand CamelCase, especially when the first letter is
lower case for methods or variables. So I tend to piss off the other
devs by using_underscore_variables. CamelCase makes sense for
class/struct names, but that's about it. Laziness doesn't count as a
criterion for me, and monitors these days are all big enough for even
ridiculously long names.
Actually, I prefer short names, not because of screen limitations,
but because of cognitive limitations – a_long_variable_name is hard to
absorb in one glance, and to distinguish from
a_longer_variable_name.
Good post. Upper case is an abomination in both code and file names,
especially nowadays where every system is case sensitive (except of
course the retarded Windows) and one typo can lead to a disaster.
A simple proof that CamelCase doesn't work is to look at code and file
names around you. It's all over the place.
I prefer camel case over anything else. Maybe if I had to deal more
with 2D vs 3D coordinates it would be a different story, but I hate
having to hit out-of-the-way keys like the dash. I never took an
official typing class, so when I hit special characters (like the dash)
my entire wrist moves instead of just the pinky finger like I believe
you're supposed to.
Pretty much all of the points you gave against camelCase are wrong
when you are in the OOP world:
TCPSocket::open();
TCPIPSocket::open();
And FFS who uses lower-case x, y for 2D coordinates and upper-case X,
Y, Z for 3D ones? As far as I can remember in OpenGL (and pretty much
everything graphics related) there is just Vector2f, Vector3f etc...
Also, it's pretty simple to spot that you have 3 arguments in 3D and 2
in 2D.
Erm... why not TCP_IP_Socket? What if it's CustomTCPIPSocket – how is
that different from openTCPIPSocket? Why can't a class conceivably have
an openTCPIPSocket method?..
"Who uses x,y/X,Y to distinguish between coordinates" – a common
computer vision convention; in vision you mix 2D (image) and 3D (world)
coordinates in expressions so it's not about spotting the number of
arguments that's different in different contexts.
Well put Yossi, well put.
Additionally, I'd like to point out an interesting article about how
humans interpret written words. http://en.wikipedia.org/wiki/Typoglycemia
Removing the spacing between words effectively messes with the brain's
ability to easily interpret them, and also reinforces your points about
lengthy names being bad for your mental health.
So many interesting points. Thank you.
atlant: yes, Smalltalk, even though its development started around
1970, used the older ASCII-1963 interpretations of ← and ↑ for _ and ^,
which it interpreted, respectively, as assignment and return. ASCII-1963
didn't mandate lower-case, but it did leave a big open space for
lower-case that ASCII-1967 filled in.
But why would a language whose first identified version was Smalltalk
72 in 1972 be using the ASCII-1963 character set instead of the
ASCII-1967 character set? Probably because they wanted ← and ↑, and in
1972 using your own weird character set was a reasonable thing to do,
even though looking back 42 years later (or 36 years later, when I wrote
my comment above that you replied to) it seems like an obviously dumb
idea. I mean, APL was still popular in 1972, IBM mainframes with EBCDIC
were sort of dominant, PDP-10s could handle bytes of any size but
usually used their own weird six-bit character set, and there were even
UNIVAC mainframes using FIELDATA, although they were less popular than
IBM's mainframes and DEC's mega-minis.
Smalltalk, also, was noted for its tendency to prioritize doing cool
things over compatibility. And that's why it's still marginally relevant
42 years later, while EBCDIC, FIELDATA, UNIVAC, and PDP-10s and SIXBIT
have been consigned to the dustbin of history.
...or "that's why it's only marginally relevant while worse languages
from those days are still extremely relevant"... :)
I "solve" the problem with CamelCase by using a custom text renderer
that detects such names when highlighting and changes the color intensity
of the capitals, thus making it very easy to see the separate words. I
could put a space between the words, but I use a monospaced font and I
didn't bother to play with the kerning.
Wow! Cool stuff. Does it work in vim? :)
Your not-having-to-use-Shift argument is actually a bad one: you
have to press the Shift key when you want to type an underscore, which
is one of the most distant characters on the keyboard.
A better key would have been the \ on the left side of the keyboard,
without Caps Lock or Shift.
But if you as a programmer use long words, then please use the _ or
just mix them; it is much better.
Left to my own devices, I've decided this style works for me:
* functions_and_variable_names
* ClassNames
* CONSTANTS_AND_MACROS
I don't even bother with lowerCamelCase, as it's not visually
distinct enough from UpperCamelCase.
I also prefer to treat initialisms as 'words' for the purposes of
capitalization, so something like this is representative:
XmlParser xml_parser = SpecificXmlParser(SOME_XML_CONSTANT);
I do a lot of camel cased programming, but it rarely bothers me too
much. Part of the reason is probably auto-correction that pretty much
eliminates the need to actually type most of the characters in any
already defined identifier. This is triggered automatically, so I also
don't have to type Ctrl-Space or anything like that. I still think that
Camel Case is the best solution to all of the problems specified in the
article.
Back in the late 80's we were taught that what you all call
'CamelCase' now was actually (for us veterans) 'camelCase'. It's funny to
me seeing the word 'CamelCase' over and over, since it ignores the very
reason it got that name. Back then we had ProperCase, lowercase, UPPERCASE,
MiXeDcAsE (sometimes called interleaved case, stupid right?) and then,
of course, camelCase. The reasoning is that a camel has a bump __IN THE
MIDDLE__, so suggesting that ProperCase is CamelCase now is basically like
putting a big lump on the poor old camel's head. "Okay, who punched the
camel!?" **Not that it matters, but it was just bothering me to read
about it so much, so I drop this on you guys right here because it was
the first easy place to do it :) **
-O
Long Live PunchCard Programming!!!
CamelCase and PascalCase are cancer. Just remap the underscore key to
another key and you're good to go.
Python often uses C/C++ libraries, and there is a useful mixed
convention. Snake case is used for python objects and camel case is used
for calls in the external library. It's important to keep track of the
difference in many cases. Object Destructors are handled entirely
differently, for example. (Ex: In PyQt, you would call QtObject.delete()
to remove it from the Qt memory, and 'del QtObject' to delete the
python.)
You missed an ambiguity. At work, we are forced to use an annoying
camel case style, where "only the first letter" is capitalised, so
TCPOpen would be TcpOpen. But hold on! TCP is an abbreviation of
Transmission Control Protocol, so that should be
TransmissionControlProtocolOpen, so if we remove the extra letters that
becomes TCPOpen. At least, this is how I interpret it, because TcpOpen
is just so darned UGLY. I hate camel case with a vengeance.