02 6 / 2013

nosql:

My thanks to 28.io for sponsoring myNoSQL for two consecutive weeks to promote their work on a query platform for MongoDB based on the open JSONiq spec.

Document, schema-free, or JSON databases have posed an interesting challenge to API and query designers. Pretty much everyone in the NoSQL space uses their own query language, while many suggest going back to SQL, which, even if interesting, would probably take a lot of work and bring new corner-case inconsistencies to SQL. The promise of a common query language for JSON databases sounds interesting, and we’ll see who gets it. With its current backing (Oracle, EMC, IBM, 28.io), JSONiq is indeed a strong contender.

Original title and link: 28.io and JSONiq (NoSQL database©myNoSQL)

31 5 / 2013

nosql:

Salvatore Sanfilippo about a new feature coming with Redis 2.8 simplifying Redis configuration management:

I believe that now that CONFIG REWRITE somewhat completes the triad of the configuration API, users will greatly benefit from that, both in the case of small users that will do configuration changes from redis-cli in a very reliable way, without a restart, without the possibility of configuration errors in redis.conf, and for big users of course where scripting a large farm of Redis instances can be very useful.

✚ Salvatore says that this new feature completes the configuration API. One future extra feature could be automatically replicating some of the master's configuration options to its slaves.
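Here is a minimal sketch of the workflow Salvatore describes: change a setting on a live instance, then persist it. It assumes a client object exposing `config_set` and `config_rewrite` methods (redis-py's `redis.Redis` does). The helper name `apply_and_persist`, the `RecordingClient` stand-in, and the example settings are made up for illustration.

```python
def apply_and_persist(client, settings):
    """Change settings on a running Redis instance, then persist them.

    `client` is assumed to expose config_set(name, value) and
    config_rewrite(), as redis-py's redis.Redis does. The helper name
    and the settings used below are illustrative only.
    """
    for name, value in settings.items():
        # CONFIG SET changes the running server, no restart needed.
        client.config_set(name, value)
    # CONFIG REWRITE (Redis 2.8+) writes the running config back to redis.conf.
    client.config_rewrite()


class RecordingClient:
    """Stand-in for a real connection; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def config_set(self, name, value):
        self.calls.append(("CONFIG SET", name, value))

    def config_rewrite(self):
        self.calls.append(("CONFIG REWRITE",))


demo = RecordingClient()
apply_and_persist(demo, {"maxmemory": "100mb", "appendonly": "yes"})
```

Against a real server, a `redis.Redis(...)` connection would take the place of `RecordingClient`, and scripting a farm of instances comes down to calling the helper once per connection.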

Original title and link: Redis configuration rewriting (NoSQL database©myNoSQL)

31 5 / 2013

nosql:

Recently, I helped a customer optimize his database. Write lock on the database was running consistently at 95%. CPU was spiking consistently, and making for a poor experience.

How long until we’ll see profitable consulting businesses focused on optimizing MongoDB? Wait… we already have them.

Original title and link: MongoDB Indexes - I helped a customer optimize his MongoDB (NoSQL database©myNoSQL)

31 5 / 2013

nosql:

Brandon Li, co-author of the HDFS NFS Gateway proposal (PDF) tracked on HDFS-4750:

With NFS access to HDFS, you can mount the HDFS cluster as a volume on client machines and have native command line, scripts or file explorer UI to view HDFS files and load data into HDFS. NFS thus enables file-based applications to perform file read and write operations directly to Hadoop. This greatly simplifies data management in Hadoop and expands the integration of Hadoop into existing toolsets. […] Bringing the full capability of NFS to HDFS is an important strategic initiative for us.

So besides browsing the files stored in HDFS, which to me doesn’t sound too exciting, you’ll be able to upload or even stream data directly to HDFS. Now, that’s cool!
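To make the "load data into HDFS" part concrete, here is a small sketch that relies on nothing but ordinary file I/O. It assumes the gateway is already mounted somewhere on the client. The mount point, directory names, and the `load_into_hdfs` helper are hypothetical, and the demo runs against a temporary directory instead of a real cluster.

```python
import shutil
import tempfile
from pathlib import Path


def load_into_hdfs(local_file, mount_point, hdfs_dir):
    """Copy a local file into HDFS through an NFS mount using plain file I/O.

    `mount_point` is wherever the gateway is mounted on the client
    (for example a hypothetical /mnt/hdfs); no Hadoop client library is used.
    """
    target_dir = Path(mount_point) / hdfs_dir
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(local_file).name
    shutil.copyfile(local_file, target)  # sequential write, like `cp`
    return target


# Demo against temporary directories so the sketch runs without a cluster.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "events.log"
    source.write_text("first event\n")
    fake_mount = Path(workdir) / "mnt-hdfs"
    fake_mount.mkdir()
    copied = load_into_hdfs(source, fake_mount, "user/demo/incoming")
    copied_text = copied.read_text()
    copied_relative = copied.relative_to(fake_mount).as_posix()
```

The point is that any tool that can write a file could do the same, which is exactly what makes the proposal attractive.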

Original title and link: NFS access to HDFS (NoSQL database©myNoSQL)

31 5 / 2013

nosql:

Colin Scott extracted some data about WAN vs. datacenter network reliability:

According to a study by Turner et al. [1], wide area network links have an average of 1.2 to 2.7 days of downtime per year. […] The results are interesting: out of all links types, the average downtime was 0.3 days. […] Intuitively this makes sense. WAN links are much more prone to drunken hunters, bulldozers, wild dogs, ships dropping anchor and the like than links within a secure datacenter.

I’ve been able to find the two papers online:

  1. California Fault Lines: Understanding the Causes and Impact of Network Failures - Daniel Turner, Kirill Levchenko, Alex C. Snoeren, Stefan Savage
  2. Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications - Phillipa Gill, Navendu Jain, Nachiappan Nagappan
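For a sense of scale, the downtime figures in the quote can be converted into yearly availability. This is plain arithmetic on the quoted numbers, reading the 0.3 days as the datacenter figure (which is how the quote's conclusion frames it). The function and variable names are mine.

```python
DAYS_PER_YEAR = 365


def availability_percent(downtime_days_per_year):
    """Share of the year a link is up, given its yearly downtime in days."""
    return 100.0 * (1 - downtime_days_per_year / DAYS_PER_YEAR)


wan_worst = availability_percent(2.7)   # upper end of the quoted WAN range
wan_best = availability_percent(1.2)    # lower end of the quoted WAN range
datacenter = availability_percent(0.3)  # quoted average, read as datacenter links

for label, value in [("WAN (2.7 days)", wan_worst),
                     ("WAN (1.2 days)", wan_best),
                     ("Datacenter (0.3 days)", datacenter)]:
    print(f"{label}: {value:.2f}% available")
```

That works out to roughly 99.26% to 99.67% for WAN links versus about 99.92% inside the datacenter.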

Original title and link: WAN vs. Datacenter Link Reliability (NoSQL database©myNoSQL)

31 5 / 2013

nosql:

Peter Bailis:

tl;dr: You can perform non-blocking multi-object atomic reads and writes across arbitrary data partitions via some simple multi-versioning and by storing metadata regarding related items.

Without the time to go through all the details of the algorithm proposed by Peter Bailis and the various scenarios of a distributed system where the algorithm would have to work, my head was cycling between:

  1. could this actually be expanded to a read/write scenario? at what costs?
  2. isn’t this a form of a (weaker) XA implementation?

Luckily, Peter Bailis is already answering some of these questions in his post¹:

If you’re a distributed systems or database weenie like me, you may be curious how NBTA related to well-known problems like two-phase commit.

In case you are familiar with XA, you could start reading the post with the “So what just happened?” section and then dive into the details of the algorithm and possible extensions.
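To get my own head around the tl;dr, here is a toy, single-process sketch of the two ingredients it names: multi-versioning and per-write metadata about related items. This is not the algorithm from Peter's post, only an illustration under heavy assumptions: no networking, no failure handling, no garbage collection. Every class and function name is invented, and the `commit_only` parameter exists purely to simulate a writer stalling halfway through.

```python
import itertools


class Partition:
    """One data partition keeping several versions of each item."""

    def __init__(self):
        self.pending = {}  # (key, ts) -> (value, sibling keys); not yet visible
        self.good = {}     # key -> {ts: (value, sibling keys)}; visible

    def prepare(self, key, ts, value, siblings):
        self.pending[(key, ts)] = (value, siblings)

    def commit(self, key, ts):
        # The pending copy is kept so readers can still fetch it by timestamp.
        self.good.setdefault(key, {})[ts] = self.pending[(key, ts)]

    def read_latest(self, key):
        versions = self.good.get(key, {})
        if not versions:
            return None
        ts = max(versions)
        value, siblings = versions[ts]
        return ts, value, siblings

    def read_version(self, key, ts):
        return self.pending[(key, ts)][0]


_clock = itertools.count(1)  # stand-in for unique transaction timestamps


def atomic_write(partitions, updates, commit_only=None):
    """Write every item in `updates` as one unit; `partitions` maps key -> Partition."""
    ts = next(_clock)
    siblings = frozenset(updates)
    for key, value in updates.items():
        partitions[key].prepare(key, ts, value, siblings)
    # Versions become visible only after every partition holds its pending copy.
    for key in updates:
        if commit_only is None or key in commit_only:
            partitions[key].commit(key, ts)
    return ts


def atomic_read(partitions, keys):
    """Read `keys` so that no write is observed only partially."""
    first = {key: partitions[key].read_latest(key) for key in keys}
    result = {}
    for key in keys:
        own_ts = first[key][0] if first[key] else 0
        # Newest timestamp among the versions we saw that list `key` as a sibling.
        needed = max([own_ts] + [r[0] for r in first.values() if r and key in r[2]])
        if needed > own_ts:
            result[key] = partitions[key].read_version(key, needed)
        else:
            result[key] = first[key][1] if first[key] else None
    return result


parts = {"x": Partition(), "y": Partition()}
atomic_write(parts, {"x": "x1", "y": "y1"})
atomic_write(parts, {"x": "x2", "y": "y2"}, commit_only={"x"})  # stalls before y
naive = {key: parts[key].read_latest(key)[1] for key in ("x", "y")}
snapshot = atomic_read(parts, ["x", "y"])
```

The naive read mixes the two writes (`x2` with `y1`), while `atomic_read` notices from the metadata on `x2` that `y` has a matching version and fetches it, without ever waiting on the stalled writer.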


  1. Thanks, Peter, for stopping my head from spinning!

Original title and link: Non-blocking transactional atomicity (NoSQL database©myNoSQL)

31 5 / 2013

nosql:

The Aurelius team describing an advanced benchmark of Titan, a massive scale property graph allowing real-time traversals and updates, sponsored by Pearson, developed and run over 5 months:

The 10 terabyte, 121 billion edge graph was loaded into the cluster in 1.48 days at a rate of approximately 1.2 million edges a second with 0 failed transactions. These numbers were possible due to new developments in Titan 0.3.0 whereby graph partitioning is achieved using a domain-based byte order partitioner.

✚ The answer to why Titan is built on Cassandra can be found in this interview between Aurelius CTO Matthias Broecheler and DataStax co-founder Matt Pfeil:

[…] we don’t have to worry about things like replication, backup, and snapshots because all of that stuff is handled by Cassandra. We really just focus on: “How do you distribute a graph?”, “How do you represent a graph efficiently in a big table model?”, “How do you do things like etched compression and other things that are very graph specific in order to make the database fast?” And, lastly, “How do you build intelligent index structures so that the graph traversals, which are the core of any graph database, so that those are as fast as possible?”

Original title and link: Titan: Data Loading and Transactional Benchmark (NoSQL database©myNoSQL)

31 5 / 2013

nosql:

Jonathan Ellis:

More qualitatively but perhaps even more important, this addresses the paradox of choice we’ve had in the Cassandra Java world: multiple driver choices provide another barrier to newcomers, where each must evaluate the options for applicability to his project. Having just done such an evaluation to settle on Cassandra itself, this is the last thing they want to spend time on.

And that’s the best-case scenario. More often, a fragmented landscape leads to many solutions, each of which solve a different 80% of the problem. Better to have a single, well-thought-out solution, that lets people get started writing their application immediately.

This is the best argument ever for having official drivers.

✚ In the early days, and over the long run, it’s quite difficult for a company to offer only official drivers. But there’s a solution for that too: recommend one. And support its maintainers.

Original title and link: Best argument for official drivers (NoSQL database©myNoSQL)

31 5 / 2013

nosql:

This is a sample code snippet from the Getting started guide for the recently announced Google Cloud Datastore:

def WriteEntity():
    req = datastore.BlindWriteRequest()
    entity = req.mutation.upsert.add()
    path = entity.key.path_element.

31 5 / 2013

nosql:

Rob Klopp summarizes a whitepaper published by Cloudera and Teradata:

Simply put, Hadoop becomes the staging area for “raw data streams” while the EDW stores data from “operational systems”. Hadoop then analyzes the raw data and shares the results with the EDW. […] The paper then positions…

31 5 / 2013

nosql:

Two links for those interested in seeing what an automation API for Hadoop would look like:

  1. Ambari API reference v1
  2. Cloudera Manager API v1

At first glance, both APIs support the same range of resources/endpoints.
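As a rough illustration of that similarity, the sketch below builds (but does not send) the same "list clusters" request for both products. It assumes both expose a `clusters` collection under `/api/v1` with HTTP basic authentication, which is my reading of the two references. The host names, ports, and credentials are placeholders, and `cluster_list_request` is a made-up helper.

```python
import base64
import urllib.request


def cluster_list_request(base_url, user, password):
    """Build (but do not send) a GET request for <base_url>/api/v1/clusters."""
    url = base_url.rstrip("/") + "/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", "Basic " + token)
    return request


# Placeholder hosts, ports, and credentials; adjust to the actual installation.
ambari = cluster_list_request("http://ambari.example.com:8080", "admin", "admin")
cloudera = cluster_list_request("http://cm.example.com:7180/", "admin", "admin")

# Sending is left out on purpose; with a reachable server it would be:
#   with urllib.request.urlopen(ambari) as response:
#       print(response.read().decode())
```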

Cloudera Manager comes in two editions: free and enterprise…

31 5 / 2013

nosql:

The best visualization of SQL JOINs, by C.L. Moffatt:

SQL JOINs
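For those who prefer rows to circles, here is a small runnable companion using Python's built-in `sqlite3` module. The tables `a` and `b` and their contents are invented for the example, and only three of the join variants are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, name TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# INNER JOIN: only rows whose id appears in both tables.
inner = conn.execute(
    "SELECT a.id, b.id FROM a INNER JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()

# LEFT JOIN: every row of a, with NULLs where b has no match.
left = conn.execute(
    "SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()

# LEFT JOIN excluding matches: rows of a that have no counterpart in b.
left_only = conn.execute(
    "SELECT a.id, b.id FROM a LEFT JOIN b ON a.id = b.id "
    "WHERE b.id IS NULL ORDER BY a.id"
).fetchall()

conn.close()
```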

Original title and link: SQL JOINs visualized (NoSQL database©myNoSQL)

28 5 / 2013

thedailyweb:

Mastering Sass

28 5 / 2013

thedailyweb:

A true beauty - Codrops once again showcases exceptionally well-done CSS3 with these page-to-page transitions.

28 5 / 2013

thedailyweb:

This looks promising. It’s also very compact (almost 7 KB gzipped), and with the introduction of more and more hi-DPI displays, it’s highly possible that SVG’s time has finally arrived.