If you’re in business, you have data. And if you’re like a lot of
businesses, you have a lot of it, coming not only from your customers but
also from other business units, partners, in-house applications, the cloud,
hardware logs, and more. That data could make you better at your business,
if only you had the right solution for accessing it in ways that deliver
quantifiable value.
One solution is to build an enterprise data hub (EDH) through which all your
data flows for processing. Many IT professionals turn to Apache Hadoop as the
core component of an EDH, but other technologies can be complementary. For
example, a NoSQL database can play an important role in an EDH to help manage
the complexities of processing and storing structured data for your
business. While RDBMSs still have plenty of useful functions, consider a
NoSQL database alongside them.
Putting Analytics into the Decision-Making Workflow with Apache Spark
Data-driven businesses use analytics to inform and support their decisions.
In many companies, marketing, sales, finance, and operations departments tend
to be the earliest adopters of data analytics, with the rest of the business
lagging behind. The goal for many organizations now is to make analytics a
natural part of the daily workflow of most, if not all, employees. Achieving
that objective typically requires a shift in corporate culture and ready
access to user-friendly data analytics tools.
Apache Spark continues to gain a lot of traction as companies launch or
expand their big data initiatives. There is no doubt that it’s finding a
place in corporate IT strategies.
The open-source cluster computing framework was developed in the AMPLab at
the University of California, Berkeley, in 2009 and entered the Apache
Software Foundation’s incubator in 2013. By early 2014, Spark had
become one of the foundation’s top-level projects, and today it is one of
the most active projects managed by Apache.
Because Spark was optimized to run in-memory, it is capable of processing
data far faster than frameworks that must read from and write to disk
between steps.
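To make that concrete, here is a minimal PySpark sketch, not from the
article itself, of the in-memory caching behind that speed; the session
setup and the events.log file name are assumptions for illustration:

    # Minimal caching sketch; assumes PySpark is installed, and
    # "events.log" is a placeholder input file, not from the article.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cache-demo").getOrCreate()

    lines = spark.read.text("events.log")   # hypothetical input file
    errors = lines.filter(lines.value.contains("ERROR")).cache()

    # The first action reads from disk and materializes the result in
    # memory; the second is served from the cache instead of rereading.
    print(errors.count())
    print(errors.count())

    spark.stop()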
If you’re running big data applications, you’re going to want to look at
some kind of distributed processing system. Hadoop is one of the best-known
clustering systems, but how are you going to process all of your data in a
reasonable time frame? Apache Spark offers services that go beyond standard
batch processing.
A choice of job styles
MapReduce has become a standard, perhaps the standard, for processing data
on distributed file systems. While it’s a great system already, it’s really
geared toward batch use, with jobs queued up and results delivered later.
This can severely hamper your flexibility.
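As a rough sketch of the extra flexibility Spark offers, the classic word
count below can be pasted into the interactive pyspark shell for immediate
results, or saved to a file and submitted as a batch job with spark-submit;
the input path input.txt is a placeholder:

    # Word count, runnable interactively (pyspark shell) or in batch
    # (spark-submit wordcount.py); "input.txt" is a placeholder path.
    from operator import add
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount").getOrCreate()
    sc = spark.sparkContext

    counts = (sc.textFile("input.txt")
                .flatMap(lambda line: line.split())  # split lines into words
                .map(lambda word: (word, 1))         # pair each word with 1
                .reduceByKey(add))                   # sum counts per word

    for word, n in counts.take(10):                  # results return right away
        print(word, n)

    spark.stop()

Run interactively, each step returns as soon as it finishes, with no batch
queue in between.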
You might have looked at some of the articles on Apache Spark on the Web and
wondered if you could try it out for yourself. Because Spark and Hadoop are
designed for clusters, you might assume you need lots of nodes to get
started.
If you wanted to see what you could do with Spark, you could set up a home
lab with a few servers from eBay. But there’s no rule saying that you need
more than one machine just to learn Spark. Today’s multi-core processors
are like having a cluster already on your desk. Even better, with a laptop,
you can pick up your cluster and take it with you. Try doing that with a
rack of servers.
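For example, here is a small sketch, my own rather than the article’s, of
running Spark in local mode so that every core on your laptop acts as a
worker; it assumes PySpark is installed (e.g., via pip install pyspark):

    # Local-mode sketch: "local[*]" tells Spark to use every core on this
    # machine, turning a multi-core laptop into a tiny cluster.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[*]")
             .appName("laptop-cluster")
             .getOrCreate())

    # Quick sanity check: spread a million numbers across the cores and
    # sum them.
    rdd = spark.sparkContext.parallelize(range(1_000_000))
    print(rdd.sum())   # 499999500000

    spark.stop()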