Wednesday, April 1, 2015

Moore's Law & Cloud Computing

Moore’s Law:


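Moore’s law is a term that originated around 1970. The simplified version of this law states that processor speed, or the overall processing power of computers, will double every two years. Strictly speaking, Gordon Moore's original observation concerned the number of transistors on a chip.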
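
As a rough illustration of what doubling every two years implies, the following sketch computes the projected growth factor over time. The function name and the starting value of 1.0 are assumptions made for this example only.

```python
def projected_power(initial: float, years: float, doubling_period: float = 2.0) -> float:
    """Return the capacity after `years` if it doubles every `doubling_period` years."""
    return initial * 2 ** (years / doubling_period)


# Print the growth factor relative to a starting power of 1.0
for years in (2, 4, 10, 20):
    factor = projected_power(1.0, years)
    print(f"After {years:>2} years: {factor:g}x the starting power")
```

Under this simplified model, ten years corresponds to five doublings, and twenty years to ten doublings.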



Cloud Computing:



Cloud computing is a type of Internet-based computing, where services such as servers, storage and applications are delivered to an organization's computers and devices through the Internet.

Cloud Computing Service Providers:


Microsoft Azure:


It is a cloud computing platform provided by Microsoft.






Features:

IaaS + PaaS

  • Azure is the only major cloud platform ranked by Gartner as an industry leader for both infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS)
Hybrid ready

  • Does not make you choose between your datacenter and the public cloud; it gives you the best of both worlds
  • Easier to build applications that span both on-premises and the cloud
Open and flexible

  • Supports any operating system, language, tool, and framework
Always up, always on

  • Offers a 99.95% availability SLA, 24x7 tech support, and round-the-clock service health monitoring
Economical and scalable

  • Azure can quickly scale up or down to match demand, so you only pay for what you use
Everywhere

  • Azure runs on a growing global network of Microsoft-managed datacenters across 19 regions, giving you a wide range of options for running applications and ensuring great performance
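
To put the 99.95% availability figure in perspective, here is a minimal sketch that converts an availability percentage into permitted downtime. The 30-day month and the function name are assumptions for illustration; the actual SLA terms define how availability is measured.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float = 30.0) -> float:
    """Return the downtime (in minutes) that an availability SLA permits over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


# Assumption: a 30-day month
minutes = allowed_downtime_minutes(99.95)
print(f"A 99.95% SLA allows about {minutes:.1f} minutes of downtime per 30 days")
```
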
Xerox, a global leader in business processes and document management, wanted to shift its employee learning focus from a traditional approach focused on classroom-based learning to a more needs-driven approach. Recognizing both the power of video in learning and the fact that its workforce is increasingly mobile, Xerox worked with Microsoft Azure Circle partner Ravnur to implement a Microsoft Azure cloud-based video content management solution that delivers learning content to mobile devices anywhere and anytime.

Google Cloud Platform:



Features:
  • Google has one of the largest and most advanced computer networks
  • Data is automatically mirrored across storage devices in multiple locations
  • Takes care of database administration, server configuration, sharding and load balancing
  • Integrated with several development tools and a command line interface, which makes it easier to build applications
  • Provides auto scale-up during times of heavy traffic
  • Managed services also scale down, so you don’t pay for what you don’t use
  • Provides great performance and excellent support

In 2008, a team of Best Buy developers launched Giftag, a social application that lets users make online wish lists to share with friends through email, Facebook, Twitter and other social media sites. The app was promising, but the development effort was substantial – it took eight developers more than a year to create the app. Once it was developed, adding new features and scaling it for heavy use were onerous.

Once the Best Buy team switched to Google App Engine, the payoff was immediate. Best Buy developers rewrote the Giftag application from scratch – improving on the original code – while only having to do an extract, transform and load on the existing data. The entire process took four and a half developers just 11 weeks – or roughly half the original development team, in 25% of the original time. The time savings enabled Best Buy to relaunch the app just in time for the 2008 holiday shopping season.
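
The comparison above can be checked with some quick arithmetic. This sketch approximates "more than a year" as 52 weeks, which is an assumption; the real original effort was larger, so the actual ratios would be even lower.

```python
def person_weeks(developers: float, weeks: float) -> float:
    """Total effort expressed in person-weeks."""
    return developers * weeks


# Figures from the case study; 52 weeks is an assumed lower bound for "more than a year"
original_effort = person_weeks(8, 52)
rewrite_effort = person_weeks(4.5, 11)

team_ratio = 4.5 / 8                            # share of the original team size
time_ratio = 11 / 52                            # share of the original calendar time
effort_ratio = rewrite_effort / original_effort  # share of the original total effort

print(f"Team size: {team_ratio:.0%} of the original")
print(f"Calendar time: at most {time_ratio:.0%} of the original")
print(f"Total effort: at most {effort_ratio:.0%} of the original")
```

Measured in person-weeks, the rewrite took well under a fifth of the original effort under this assumption.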

Advantages of cloud computing for BI users:


Ease of use
  • Easy to operate and set up, which reduces IT involvement and costs

Deployment speed
  • Cloud BI applications are easy to deploy, since they require no additional hardware or software installations

Scalability and elasticity
  • Can be rapidly scaled to accommodate an increase in the number of users in an organization.

Accessibility
  • Can be accessed on any web browser or on any mobile device

Wednesday, March 4, 2015

Presentation and Visualization Methods

Data Visualization?



E-Commerce industry trends:


To observe the trend of how online retailers have been performing, we can use various visualization techniques such as pie charts and line charts.

A pie chart can be useful to show how much market share has been captured by a particular online retailer. 

A bar chart is also an option, but it is less suited to showing a trend over a period of time.


Recommendation: Line Chart

A line chart can be used to display trends over a period of time and also provide an easy way to compare online retailers in a particular year.

A line chart connects individual data points and is used to visualize a sequence of values.



From the visualization given above we can gain the following insights:
  • Individual sales and comparison of the sales of various online retailers
  • Rate of growth of each online retailer
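
The rate of growth that a line chart makes visible can also be computed directly. Below is a minimal sketch using made-up sales figures; the retailer names and numbers are hypothetical placeholders, not data from the chart.

```python
def yoy_growth(sales):
    """Return year-over-year growth in percent for a sequence of yearly sales."""
    return [(curr - prev) * 100 / prev for prev, curr in zip(sales, sales[1:])]


# Hypothetical yearly sales (illustrative values only)
yearly_sales = {
    "Retailer A": [100, 120, 150],
    "Retailer B": [80, 88, 110],
}

for retailer, sales in yearly_sales.items():
    growth = ", ".join(f"{g:.1f}%" for g in yoy_growth(sales))
    print(f"{retailer}: {growth}")
```
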

Education industry:


To analyze the per-pupil expenditure for public elementary and secondary education in the United States, we can use visualization techniques such as a bar chart or a map.

A bar chart can depict the amount spent in each state in a very easy way. The highest and lowest amount can be easily identified by looking at the height of the bars. 

A pie chart could also be used, but since there are too many sectors it would look cluttered.

Recommendation: Map

When a map is used for this kind of representation, it becomes more intuitive and the business can directly look at an overview of the amount spent by state.

If high and low values are important, then a bar chart is the optimal method. If location-wise distribution is important, then a map is the best solution for visualization.

Following is a map that visualizes the current per-pupil expenditure for public elementary and secondary education by state:

Source:

http://nces.ed.gov/edfin/graph_topic.asp?INDEX=1

Telecommunications industry:


In today’s era, mobile phones are slowly replacing desktops. Mobile phones are getting more powerful and most work can usually be done using them. Accessing social media applications is probably the most frequently performed operation from a mobile device.

Two pie charts can be used with each one visualizing details about social media usage on mobile devices and desktop respectively. 

A table can be used with two columns, one each for mobile phone and desktop, and the usage statistics for a particular social networking platform in each row. But this is not very intuitive.

Recommendation: Stacked bar

To compare social media usage using mobile devices and desktops, a stacked bar would be the best option.

Stacked bars:
  • Display data on top of each other
  • Provide an intuitive means to compare two components against each other
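
To make the idea of stacking concrete, here is a small text-only sketch that renders each platform as one bar split into a mobile part (M) and a desktop part (D). The platform names and usage numbers are hypothetical placeholders.

```python
def stacked_bar(mobile: float, desktop: float, width: int = 20) -> str:
    """Render one stacked bar: 'M' for the mobile share, 'D' for the desktop share."""
    mobile_cells = round(width * mobile / (mobile + desktop))
    return "M" * mobile_cells + "D" * (width - mobile_cells)


# Hypothetical usage minutes per platform (illustrative values only)
usage = {
    "Platform A": (75, 25),
    "Platform B": (50, 50),
}

for platform, (mobile, desktop) in usage.items():
    print(f"{platform:<12}{stacked_bar(mobile, desktop)}")
```

Because every bar has the same width, the boundary between the two segments shows at a glance which component dominates.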



Source:


From the visualization given above we can gain the following insights:
  • Most of the popular social networking applications are used from the mobile devices
  • LinkedIn and Tumblr can focus on improving their mobile applications further


Thus, visualizations enable businesses to look at the data and make better decisions in less time.





Thursday, February 19, 2015

Structured Data v/s Unstructured Data

Data:

Data is information that has been translated into a form that is more convenient to move or process.



Structured Data:
  • Data that resides in fixed fields within a record or file
  • Well defined content: displayed in rows and columns
  • Can be easily organized and processed by data mining tools
  • Everything is labelled and easy to access
  • Can be stored in RDBMS

Examples:
  • Databases
  • XML Data
  • Enterprise Systems (ERP, CRM)

Unstructured Data:
  • Has no identifiable internal structure
  • Data does not have a pre-defined data model
  • Is not organized in a pre-defined manner
  • Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it harder for traditional computer programs to understand than structured data
  • An RDBMS is not a good fit for storing it

Examples:
  • Word documents
  • Email messages
  • Audio/ Video files
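
The difference can be seen in a minimal sketch. The structured records below have labelled fields that can be read by name, while the same facts inside free text have to be guessed at with pattern matching. The sample records and the message are made up for illustration.

```python
import csv
import io
import re

# Structured: fixed, labelled fields in rows and columns
structured = "order_id,customer,amount\n1001,Alice,250.00\n1002,Bob,99.50\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(float(row["amount"]) for row in rows)
print("Total from structured data:", total)

# Unstructured: the same facts buried in free text, with no data model
unstructured = "Hi team, Alice paid 250.00 for order 1001 yesterday; Bob still owes 99.50."
# A regular expression is only a heuristic: it finds numbers that look like
# amounts but cannot tell who paid and who still owes.
amounts = [float(match) for match in re.findall(r"\d+\.\d{2}", unstructured)]
print("Amounts guessed from text:", amounts)
```
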



Data warehouse:

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.


Types of data:

Historical Data:
  • Typically contains several years of historical data
  • Amount of data depends on disk size
Derived Data:
  • Generated from existing data using a mathematical operation or a data transformation
  • Generated at run time in response to a query
Metadata:
  • It describes the data and the schema objects
  • Used by applications to fetch and compute the data directly
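
As a small illustration of derived data, the sketch below stores only units and unit price and computes revenue when it is asked for. The `Sale` record and its field names are hypothetical, chosen just for this example.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    """A stored record (hypothetical schema)."""
    units: int
    unit_price: float


def revenue(sale: Sale) -> float:
    """Derived value: not stored, computed from existing fields on request."""
    return sale.units * sale.unit_price


sales = [Sale(units=3, unit_price=19.5), Sale(units=10, unit_price=2.0)]
print([revenue(sale) for sale in sales])
```
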

Data Warehouse Architecture:

Basic:
  • End users directly access the data derived from several source systems through the data warehouse
With a staging area:
  • Operational data must be cleaned and processed before putting it into the warehouse
  • Staging area is used to accomplish this as it cleanses and consolidates the operational data coming from multiple source systems
With a staging area and data marts:
  • Used to customize warehouse architecture
  • Data marts are systems designed for a particular line of business
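
What a staging area does can be sketched in a few lines: records from several source systems are cleaned into one format and consolidated before loading. The source names, field names, and the simple de-duplication rule below are assumptions for illustration, not a description of any particular warehouse product.

```python
def clean(record):
    """Normalize one raw record: trim and title-case the name, parse the amount."""
    return {
        "customer": record["customer"].strip().title(),
        "amount": float(record["amount"]),
    }


def stage(*sources):
    """Clean records from all sources and drop exact duplicates."""
    seen = set()
    staged = []
    for source in sources:
        for record in source:
            row = clean(record)
            key = (row["customer"], row["amount"])
            if key not in seen:
                seen.add(key)
                staged.append(row)
    return staged


# Hypothetical exports from two operational systems
crm_export = [{"customer": " alice ", "amount": "250.00"}]
billing_export = [
    {"customer": "Alice", "amount": "250"},
    {"customer": "bob", "amount": "99.5"},
]

warehouse_rows = stage(crm_export, billing_export)
print(warehouse_rows)
```
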

Limitations of Data Warehousing:

Extra Reporting Work:
  • The larger the organization, the greater the amount of data
  • Each business division generates the data needed in the warehouse
  • Not easy to generate reports, requires significant effort
Cost benefit ratio:
  • Involves a lot of man-hours
  • Requires a large investment for the implementation
Huge maintenance cost:
  • The cost of updating the warehouse to adapt to the changing business needs is too high
Data Ownership Concerns:
  • Data warehouses are often, but not always, Software as a Service implementations or cloud services applications. Hence, security is always a concern
  • A data warehouse that leaks customer data is a privacy and public relations nightmare
Data flexibility:
  • The data is normally days or weeks old before it is actually used
  • Due to the ad hoc nature of the queries, it is difficult to tune them for processing speed and query speed

Future of Data Warehousing:
  • Hadoop will serve as a great companion to the data warehouse and will be used to share the heaviest workloads and larger volumes of data
  • A data warehouse of customer information can be used for sentiment analysis, personalization, marketing automation, sales, and customer service  
  • Data warehouses hold some of the most valuable data for any organization to grow and stay competitive. Thus, organizations will depend on them far more, and data warehousing will play a major role in decision making
  • Enterprise data warehouses will face huge changes from the world of data warehouse automation. Just like we no longer “hand code” ETL scripts, 2015 is foreseen as the year in which data modeling and database administration are productized to speed up “time to implementation”
  • Processing data and analytics in the cloud will become a requirement


Monday, February 2, 2015

BI tools evaluation

The following BI tools have been evaluated in this blog:

  • Tableau
  • Qlik
  • Microstrategy
  • Oracle
  • SAS


Tableau:


Tableau is a streamlined, user-friendly business intelligence solution that provides a simple, quick way for non-experts to access data and create their own dashboards in just a few clicks. Tableau is tailored to meet the needs of anyone looking to analyze and explore business data. It provides business intelligence that is actionable and insightful. One can learn the tool very quickly just by looking at the video tutorials and exploring the tool.

Pros:
  • Has a very intuitive UI with drag and drop tools that allow non-technical people to use it with ease
  • Easy to learn
  • It can be integrated with R
  • It has ready-made drivers for many databases
  • Helps create instant, real-time dashboards
  • Can connect to cube-based data sources
  • Has an active online community
Cons:

  • The in-memory engine is not the fastest
  • The manual mapping of datatypes that are not recognized is cumbersome
  • Has trouble when working with large datasets
  • There is no option to create custom groups for different dimensions

Qlik:


Qlik is a self-service access BI tool built for non-technical professionals that utilizes both engaging graphics and data consolidation from multiple sources into a single place to greatly simplify data analysis.

Pros:
  • Good in-memory processor which speeds up the application
  • Integrates with data sources with ease
  • Can be easily deployed and configured
  • Has a large number of partners
Cons:

  • The menus have too many tabs that lack logical structure
  • The visuals are not as intuitive to drag and drop as in Tableau
  • The online community is not very active
  • Support is not that good
  • Qlik Applications are constrained by how much RAM can be addressed in a single hardware box

Microstrategy:


Microstrategy is an enterprise BI application software vendor. It allows interactive dashboards, easy and intuitive control on data layout, alerts, automated reports, and supports web, desktop as well as mobile interfaces.

Pros:
  • Scalable and can be used across all platforms like mobile, desktop, cloud etc.
  • Capable of handling complex enterprise requirements
  • The SDK allows customization of applications
  • Supports offline access to data
Cons:
  • The online community is dormant
  • The development speed is slow
  • The graphics that are obtained are unusable and formatting them to be presentable takes a long time

Oracle:


Oracle provides an all-in-one BI solution featuring eight components so that users don't have to worry about multiple software packages or higher costs.

Pros:
  • Supports big data capability
  • Very good training is provided
  • Can analyze large sets of data in a short time
  • Has a user-friendly interface
Cons:

  • Customizing the software requires significant investment of time
  • Has issues with respect to integration


SAS:


SAS Visual Analytics is an in-memory data visualization tool that works well with both big and small data, providing robust query and reporting features, alerts, and predictive analytics.

Pros:
  • Integration is powerful
  • Has a huge market share
  • Extremely fast and efficient
  • Deals easily with large amounts of data
  • Support for R programming language
Cons:

  • Does not have a user-friendly interface
  • Cost is on the higher side
  • The visualizations that are provided are not very aesthetic

Criteria:


  • Ease of Use
The users should be able to use the application easily with minimal support or training.
  • Integration
The application should integrate seamlessly with multiple data sources.
  • Cost
The application should be cost effective.
  • Customer support/ Online Community
The customer support should be responsive and problems should be resolved in a minimal number of calls. The online support community should be large, and solutions to the issues users face should be easy to find.
  • Performance
The application should give high performance and not crash while dealing with large datasets.

Criteria                            Weight  Tableau  Qlik  Microstrategy  Oracle  SAS
Ease of use                         40%     10       9     7              7       7
Integration                         10%     9        9     8              7       9
Cost                                20%     10       8     7.5            7       6.5
Customer Support/ Online Community  15%     10       8     6              7       7
Performance                         15%     8        8     7              8       9
Points                                      9.6      8.5   7.05           7.15    7.4
Rank                                        1        2     5              4       3


                     



Based on the criteria above, Tableau claims the number one spot among the BI tools that have been evaluated.
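
The points in the evaluation table are weighted sums of the individual scores. The following sketch reproduces them from the weights and scores given in the table; the variable and function names are just illustrative choices.

```python
# Weights per criterion, in the order used in the table
weights = {
    "Ease of use": 0.40,
    "Integration": 0.10,
    "Cost": 0.20,
    "Customer support/ Online community": 0.15,
    "Performance": 0.15,
}

# Scores per tool, in the same criterion order as `weights`
scores = {
    "Tableau":       [10, 9, 10, 10, 8],
    "Qlik":          [9, 9, 8, 8, 8],
    "Microstrategy": [7, 8, 7.5, 6, 7],
    "Oracle":        [7, 7, 7, 7, 8],
    "SAS":           [7, 9, 6.5, 7, 9],
}


def weighted_total(tool_scores):
    """Sum of score * weight over all criteria, rounded to two decimals."""
    return round(sum(w * s for w, s in zip(weights.values(), tool_scores)), 2)


totals = {tool: weighted_total(tool_scores) for tool, tool_scores in scores.items()}
ranking = sorted(totals, key=totals.get, reverse=True)

for rank, tool in enumerate(ranking, start=1):
    print(f"{rank}. {tool}: {totals[tool]}")
```

Changing the weights in one place makes it easy to see how sensitive the ranking is to the importance given to each criterion.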