I've always preferred to use long integers as primary keys in databases, for simplicity and (assumed) speed. But when using a REST or Rails-like URL scheme for object instances, I'd then end up with URLs like this:
And then the assumption is that there are also users with IDs of 782, 781, ..., 2, and 1. Assuming that the web app in question is secure enough to prevent people entering other numbers to view other users without authorization, a simple sequentially-assigned surrogate key also "leaks" the total number of instances (older than this one), in this case users, which might be privileged information. (For instance, I am user #726 in stackoverflow.)
Would a UUID/GUID be a better solution? Then I could set up URLs like this:
Not exactly succinct, but there's less implied information about users on display. Sure, it smacks of "security through obscurity" which is no substitute for proper security, but it seems at least a little more secure.
Is that benefit worth the cost and complexity of implementing UUIDs for web-addressable object instances? I think that I'd still want to use integer columns as database PKs just to speed up joins.
There's also the question of in-database representation of UUIDs. I know MySQL stores them as 36-character strings. Postgres seems to have a more efficient internal representation (128 bits?) but I haven't tried it myself. Anyone have any experience with this?
Update: for those who asked about just using the user name in the URL (e.g., http://example.com/user/yukondude), that works fine for object instances with names that are unique, but what about the zillions of web app objects that can really only be identified by number? Orders, transactions, invoices, duplicate image names, stackoverflow questions, ...
But UUIDs are great for n-tier applications.
PK generation can be decentralized: each client generates its own PK without any practical risk of collision.
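To illustrate the decentralized generation described above, here is a minimal sketch using Python's standard uuid module. The two "offices" and the new_row helper are made up for the example; the point is only that each side can assign keys on its own and the rows can be merged later. A collision between random UUIDs is astronomically unlikely rather than strictly impossible.

```python
import uuid


def new_row(client_name: str, payload: str) -> dict:
    """Hypothetical helper: the client assigns the primary key itself."""
    return {"id": uuid.uuid4(), "client": client_name, "payload": payload}


# Two clients create rows independently, with no shared counter or database.
office_a = [new_row("office_a", f"order {i}") for i in range(1000)]
office_b = [new_row("office_b", f"order {i}") for i in range(1000)]

# Merging by key keeps every row, because no two keys are the same.
merged = {row["id"]: row for row in office_a + office_b}
print(len(merged))
```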
And the speed difference is generally small.
Make sure your database supports an efficient storage datatype (16 bytes, 128 bits).
At the very least, you can encode the UUID's 16 raw bytes in base64 (dropping the padding) and use char(22).
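As a rough sketch of the sizes involved, assuming Python's standard uuid and base64 modules: the raw value is 16 bytes, the usual hyphenated text form is 36 characters, and the base64 form of the raw bytes is 22 characters once the two padding characters are dropped. The function names are made up for the example, and the URL-safe alphabet is my own choice so the result can go straight into a path.

```python
import base64
import uuid


def uuid_to_short(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes as URL-safe base64 and drop the '==' padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def short_to_uuid(s: str) -> uuid.UUID:
    """Restore the padding and decode back to the original UUID."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(s + "=="))


u = uuid.uuid4()
short = uuid_to_short(u)

# Sizes: raw bytes, hyphenated string, base64 string.
print(len(u.bytes), len(str(u)), len(short))
```

The round trip short_to_uuid(uuid_to_short(u)) gives back the same UUID, so the short form can be used in URLs while the database keeps the 16-byte value.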
I've used them extensively with Firebird and do recommend them.
That's not to say displaying a GUID is a bad idea, but as others have pointed out, joining on them and indexing them is not going to be anywhere near as fast as with integers.
The reason is that when using NEWID() the value generated is not sequential.
SQL Server 2005 added the NEWSEQUENTIALID() function to remedy that. One way to use both a GUID and an int is to have a table with both columns, so that the GUID maps to the int.
The GUID is used externally and the int internally in the DB. For example:
457180FB-C2EA-48DF-8BEF-458573DA1C10    1
9A70FF3C-B7DA-4593-93AE-4A8945943C8A    2
1 and 2 will be used in joins and the GUIDs in the web app.
This table will be pretty narrow and should be pretty fast to query.
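Here is a small sketch of that mapping idea using Python's built-in sqlite3 module rather than SQL Server. The table and column names (user_key, orders, guid) are invented for the example. The web layer only ever sees the GUID, while the join runs on the integer.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_key (
        id   INTEGER PRIMARY KEY,   -- internal key, used in joins
        guid TEXT NOT NULL UNIQUE   -- external key, shown in URLs
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES user_key(id),
        item    TEXT NOT NULL
    );
""")


def add_user(conn: sqlite3.Connection) -> str:
    """Create a user and return the GUID that the web app would expose."""
    guid = str(uuid.uuid4())
    conn.execute("INSERT INTO user_key (guid) VALUES (?)", (guid,))
    return guid


def orders_for(conn: sqlite3.Connection, guid: str) -> list:
    """Look up by GUID once, then join on the integer key."""
    rows = conn.execute(
        """SELECT o.item
           FROM orders o
           JOIN user_key k ON k.id = o.user_id
           WHERE k.guid = ?
           ORDER BY o.id""",
        (guid,),
    )
    return [row[0] for row in rows]


guid = add_user(conn)
user_id = conn.execute(
    "SELECT id FROM user_key WHERE guid = ?", (guid,)
).fetchone()[0]
conn.execute(
    "INSERT INTO orders (user_id, item) VALUES (?, ?)", (user_id, "widget")
)
print(orders_for(conn, guid))
```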
A lot of blog software does that: the exposed ID of the entry is a 'slug', and the numeric ID is hidden away inside the system. The added benefit here is that you now have a really nice URL structure, which is good for SEO.
Obviously for a transaction this is not a good thing, but for something like stackoverflow, it is important (see URL up top...).
Getting uniqueness isn't that difficult.
If you are really concerned, store a hash of the slug inside a table somewhere, and do a lookup before insertion. Edit: Stack Overflow doesn't quite use the system I describe; see Guy's comment below.
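A minimal sketch of the "look up before insertion" step, with a Python set standing in for the table of existing slugs. The slugify rules and the numeric suffix for duplicates are my own assumptions, not something a particular blog engine is known to do.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def unique_slug(title: str, taken: set) -> str:
    """Return a slug not yet in `taken`, adding -2, -3, ... on a clash."""
    base = slugify(title)
    slug = base
    n = 2
    while slug in taken:  # the lookup before insertion
        slug = f"{base}-{n}"
        n += 1
    taken.add(slug)
    return slug


taken = set()
first = unique_slug("Hello, World!", taken)
second = unique_slug("Hello World", taken)
print(first, second)
```

In a real database the same check would be a query against a unique index on the slug (or its hash) column, so that two concurrent inserts cannot both succeed.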
Makes it very easy when the client suddenly opens an office in another part of the world.
For example, you could take the 32 bits of the sequential ID and rearrange them with a fixed scheme (for example, bit 1 becomes bit 6, bit 2 becomes bit 15, etc.).
This will be a bidirectional encryption, and you will be sure that two different IDs will always have different encryptions.
It would obviously be easy to decode if one takes the time to generate enough IDs and work out the scheme, but, if I understand your problem correctly, you just want to not give away information too easily.
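The bit-rearranging idea can be sketched as follows. The particular permutation (bit i moves to position (5 * i + 7) mod 32) is an arbitrary choice made for the example; any fixed one-to-one mapping of bit positions would do. As noted above, this is obfuscation and not real cryptography.

```python
BITS = 32

# Hypothetical fixed scheme: bit i moves to position (5 * i + 7) % 32.
# Because 5 and 32 share no common factor, every position is used exactly once.
PERM = [(i * 5 + 7) % BITS for i in range(BITS)]

# Inverse table: the bit now at position PERM[i] goes back to position i.
INVERSE = [0] * BITS
for src, dst in enumerate(PERM):
    INVERSE[dst] = src


def _permute(value: int, table: list) -> int:
    """Move each set bit of `value` to the position given by `table`."""
    out = 0
    for src, dst in enumerate(table):
        if (value >> src) & 1:
            out |= 1 << dst
    return out


def scramble(n: int) -> int:
    """Turn a sequential 32-bit ID into its rearranged form."""
    if not 0 <= n < (1 << BITS):
        raise ValueError("ID must fit in 32 bits")
    return _permute(n, PERM)


def unscramble(n: int) -> int:
    """Recover the original sequential ID."""
    return _permute(n, INVERSE)


for user_id in (1, 2, 3, 783):
    public_id = scramble(user_id)
    print(user_id, public_id, unscramble(public_id))
```

Since the mapping is a permutation of bit positions, two different IDs always scramble to two different values, and unscramble reverses it exactly.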
Users hate long, incomprehensible URLs. Create a shorter ID that you can map to the URL, or enforce a unique user name convention (http://example.com/user/brianly).
The guys at 37Signals would probably mock you for worrying about something like this when it comes to a web app. Incidentally, you can force your database to start creating integer IDs from a base value.
Why not have:
Which is friendlier to humans and doesn't leak that tiny bit of information?
For n-tier apps GUIDs/UUIDs are simpler to implement and are easier to port between different databases.
To produce integer keys, some databases support a sequence object natively and some require custom construction of a sequence table. Integer keys probably (I don't have numbers) provide an advantage for query and indexing performance as well as space usage.
Direct DB querying is also much easier using numeric keys: there is less copy/paste, as they are easier to remember.
They have a table which holds the next unique ID. Although this is probably a good idea from an architectural point of view, it makes working with it on a daily basis difficult.
Sometimes there is a need to do bulk inserts, and having a UUID makes this very difficult, usually requiring writing a cursor instead of a simple SELECT INTO statement.
It takes up more space but it's more secure.
I would just say use what you prefer.
In 99% of systems it will not matter which type of key you use, so the benefits (stated in the other posts) of using one sort over the other will never be an issue.
You do lots of compares during queries that involve joins on your key.
On the application side, if every table uses a UUID as its primary key, then you can uniquely identify a row by the ID alone.
This makes the identity mapping easier.
Thinking of security by obscurity, they fit well when forming obscure URIs. And when building normalised DBs with table-, record- and column-level security, you can't go wrong with GUIDs; try doing that with integer-based IDs.