This is the latest version of our course on big data. Watch the full course on Coursera at … [...]
The following diagram shows a typical Big Data Infrastructure Design. This is from one of Allied Consultant’s Big Data [...]
Cluster | Expected volume | Benchmark hardware | Project hardware requirements: Cores | RAM | # nodes | Disk
Source | 6 million records/month (~3 records per second)
HDFS | 6 million/month | 1 namenode, 20 datanodes, 2 CPU/node, 64 GB RAM/node | 1 | 6G | 1 master, 3 slaves | 120% of 6G = 7.2 GB/month
Kafka | 4 [...]
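The sizing figures in the excerpt above can be checked with a little arithmetic. The following is a minimal sketch, not the authors' sizing method: it assumes a 30-day month and reads "120%" as a 20% headroom factor on the monthly raw volume. The function names are hypothetical and chosen only for illustration.

```python
import math

# Assumption: a 30-day month; the excerpt does not say which month length was used.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def records_per_second(records_per_month: int) -> float:
    """Average ingest rate implied by a monthly record count."""
    return records_per_month / SECONDS_PER_MONTH


def disk_per_month_gb(raw_gb: float, headroom_factor: float = 1.2) -> float:
    """Monthly disk need, assuming '120%' means raw volume plus 20% headroom."""
    return raw_gb * headroom_factor


rate = records_per_second(6_000_000)   # about 2.31 records per second on average
rate_rounded_up = math.ceil(rate)      # rounding up gives the excerpt's "~3"
disk = disk_per_month_gb(6.0)          # 120% of 6 GB

print(f"{rate:.2f} records/s (rounded up: {rate_rounded_up}), {disk:.1f} GB/month")
```

Under these assumptions the average rate is a little over 2 records per second, which rounds up to the "~3 records per second" quoted in the excerpt, and the disk figure matches the 7.2 GB/month shown there.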
With the help of big data, small businesses can gain the competitive edge they need to stay ahead of the curve. For beginners: big data comprises large data sets that can reveal insights about your customers and help you make valuable business decisions. Data can help you to [...]
Data scientists are known for their knack for statistics and data analysis, which they use to understand and draw insights from a given dataset, usually a very large one. Here are some fundamental data science skills that every data scientist needs. This [...]
Resource Management in Information Technology
A whole host of technology is available nowadays to ensure that your IT hardware resources are managed efficiently. You may have an in-house data center and a few cloud nodes, services, or apps, which together constitute your investment in hardware. [...]
Synchronous vs Async pipelines
Synchronous big data pipelines are a series of data processing components that are triggered when a user invokes an action on a screen, such as clicking a button. The user typically waits until a response arrives with the results. In contrast, in [...]
There is a lot of hype about “Big Data” solutions among most of our customers. I first looked at it a few years ago and found most things to be at a very early stage, with little genuine intent from customers to implement. However, in the recent past, I have seen an increase in the number of [...]
You can also access Part I & Part II of this series, “Laying the foundation of a data-driven enterprise with Hadoop”. The Hadoop platform has performed well in batch, interactive, and real-time data processing when the core is Apache Hadoop. Recently, Hortonworks launched a [...]