Comments on "Que es Bhaskar: Comments in response to Don Anderson's 3n+1 benchmark"

Bhaskar (2011-02-02 13:54):
I responded to Dan Weinreb's comments with another post (http://ksbhaskar.blogspot.com/2011/02/responses-to-dan-weinrebs-comments-of.html) because it was too long for a reply.

-- Bhaskar

Dan Weinreb (2011-02-01 11:49):
I have a few comments.
First: I do not know what you mean by "recoverable". If the transactions are not durable, then you cannot recover them. "Recoverable" in the database literature means that you can recover the state as it was after all transactions that committed, in the face of any "stop", including a total system crash. The log file must be persisted when a (write) transaction commits. Omitting this can speed things up a lot; real durability isn't cheap if you have to actually write the log to rotating magnetic storage. (Less so if you can use flash, or your disk has a battery-backed RAM buffer, or you're copying the data to main memory in two servers with independent failure modes, or something like that.)

Is this really a database benchmark? You use the word "NoSQL" in the name. But since this is just a computation that ends promptly, getting the job done does not need durability anyway; it just runs to completion. That is, unless you want the application to complete even if a server crashes, but then measuring the execution time would not make sense. I don't even see why there's any need for a log at all.

Second: This benchmark appears to operate on key/value pairs that are very small, containing small character strings (naming integers) rather than, say, images. Don Anderson says that the record sizes are typically 20-40 bytes. Some of the "NoSQL" systems are clearly designed for much larger K/V pairs. I am pretty sure that Riak (with Bitcask) is like this, as well as Google's BigTable and its clones (HBase in Hadoop, and Cassandra).

Third: Many of the NoSQL systems are specifically designed to use replication, to the point where using them without replication is simply not done, and therefore not optimized for. This might make a difference.

Fourth: About tuning: this is a problem that always arises in benchmark comparisons.
I have been involved in many such comparisons in my career, especially the "OO7" benchmark for object-oriented database systems. "Apples to apples" gets to be very hard to define fairly. This isn't your fault; it's an inherent problem in comparative benchmarks.

It's a good idea to have well-specified benchmarks, and this is a good start!

-- Dan Weinreb

Rob Tweed (2011-02-01 07:04):
GT.M can also be accessed via Javascript (using Node.js) - see https://github.com/robtweed/node-mwire

Rob
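For readers unfamiliar with the workload being discussed: the "3n+1" problem is the Collatz iteration, and a benchmark built around it typically counts how many steps each integer in a range takes to reach 1. The sketch below is illustrative only; the function names (`collatz_steps`, `max_steps`) are not taken from Don Anderson's benchmark, and the real benchmark stores intermediate results in a key/value database rather than computing in memory.

```python
def collatz_steps(n):
    """Count iterations of the 3n+1 map until n reaches 1."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

def max_steps(lo, hi):
    """Longest Collatz sequence length over lo..hi inclusive."""
    return max(collatz_steps(n) for n in range(lo, hi + 1))

# Example: 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 takes 8 steps
print(collatz_steps(6))   # 8
print(max_steps(1, 10))   # 19 (reached by n = 9)
```

Dan's observation follows directly: this computation "ends promptly" on its own, so any durability work a database does per update is pure overhead as far as the benchmark result is concerned.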
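Dan's point about the cost of durable commits can be made concrete. A durable commit requires not just writing the log record but forcing it to stable storage before acknowledging the transaction; skipping the force is what makes non-durable runs so much faster. The sketch below is a generic illustration, not code from GT.M or any of the systems mentioned, and `durable_append` is a hypothetical helper name.

```python
import os

def durable_append(path, record, durable=True):
    """Append one log record; with durable=True, force it to
    stable storage before returning (i.e., before the commit
    could be acknowledged)."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()              # push from user-space buffer to the OS
        if durable:
            os.fsync(f.fileno())  # force the OS to write to the device
```

With `durable=False` the record sits in OS buffers and can be lost in a crash, which is exactly why "recoverable but not durable" needs careful definition: after a total system crash, only records that were fsync'd are guaranteed to be on disk.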