Sunday, January 4, 2015

Follow up on HDFS Client Bandwidth Utilisation

In a previous post, I explained the source of unexpected bandwidth consumption in the HDFS client.  This post is a follow-up on that topic.  Sadly, I still have no new solution for keeping bandwidth utilization low for random "small" reads with the HDFS client, but I do have new insight into how the HDFS client protocol works.

Thursday, January 1, 2015

Unexpected High Bandwidth Consumption in HDFS (Hadoop 2.0.0 and Cloudera 4.x)

I am trying to use HDFS as a backend for data storage through the Java API.  The details of this data store will be published in a future post and are irrelevant to this discussion.  During my experiments, I ran into unexpectedly high bandwidth consumption on the client (reading) nodes.  My findings are shared below.