Wednesday, October 12, 2016

Visualisation using d3.js based Sunburst with Apache Zeppelin

Zeppelin provides a few default visual components (pie, bar, stacked, area, and line charts, etc.).
If users want more, they can either add a new default component or create a visualisation using the AngularJS interpreter.

I tried creating a d3-based Sunburst to prepare a report in Apache Zeppelin.

It is easy and quick.

The Apache Zeppelin display system adds additional div(s), which creates some blank area on the screen.

You can see this as the blank area between the sunburst and the breadcrumbs at the bottom.


















The notebook for the above visualisation is available here and can be imported into Apache Zeppelin.
It contains the AngularJS source for the sunburst visual.


The data file, nsecombinedreport.csv, which I downloaded from the NSE historical data section and transformed for demo purposes, can be downloaded from here.
The report lists the Instrument Type, the Security, and the amount traded for a day.
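
A sunburst needs nested data, so flat rows like the ones in this report have to be grouped into a tree first. The following is only a minimal sketch of that step, not the notebook's actual code: the function name `build_hierarchy`, the root label, the sample rows, and the `name`/`children`/`size` keys (a shape commonly used in d3 sunburst examples) are all assumptions for illustration.

```python
import json


def build_hierarchy(rows, root_name="NSE"):
    """Nest flat (instrument_type, security, amount) rows into a name/children tree."""
    root = {"name": root_name, "children": []}
    groups = {}  # instrument type -> its node in the tree
    for instrument_type, security, amount in rows:
        node = groups.get(instrument_type)
        if node is None:
            # First time we see this instrument type: create its ring segment.
            node = {"name": instrument_type, "children": []}
            groups[instrument_type] = node
            root["children"].append(node)
        # Each security becomes a leaf sized by the amount traded.
        node["children"].append({"name": security, "size": amount})
    return root


# Hypothetical sample rows, not taken from nsecombinedreport.csv.
sample_rows = [
    ("FUTIDX", "NIFTY", 120.5),
    ("FUTIDX", "BANKNIFTY", 80.0),
    ("OPTSTK", "INFY", 15.25),
]
tree = build_hierarchy(sample_rows)
print(json.dumps(tree, indent=2))
```

The printed JSON could then be handed to the d3 code in the notebook paragraph; how it is passed depends on how the notebook is set up.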

Sunday, May 15, 2016

Focusing on implementing govt policies using the big data tool Zeppelin

It was good to learn that the government has published a lot of data, collected over a period of time, at https://data.gov.in/

I picked the amenities data about villages from https://data.gov.in/catalog/village-amenities-census-2011 to do some analysis.

I believe the government is doing sufficient analysis to find where, and with what force, it should use its machinery to promote its schemes.

I have been doing some analysis using Apache Spark and the ecosystem around it, but I was interested in a quick visualization that would help me understand the data quickly. One option would be R, as I wanted to build the reports quickly. I explored some of the capabilities of R and the Shiny app in my earlier post on Cluster Analysis of banking data.

Recently I came to know about a fantastic tool, Zeppelin. It is a web-based notebook with built-in support for Apache Spark and for multiple languages such as Scala, Python, and Spark SQL, and, most importantly, it is open source.
I picked one of the CSV files from the whole data set, the one for Gulbarga district in Karnataka state, and started doing some analysis.

Loading the data into the dataframe/table


It is also easy to accommodate Spark SQL in the notebook paragraphs/sections.
The following is a very simple query to show the population spread across the villages of Gulbarga district.



The government makes policies, spends money on them, and judges their effectiveness by the results. We can use the collected data to understand where the schemes should reach most, i.e. to find the villages that need the government schemes most. For example, if the government initiates policies to reduce the gap in the male-female ratio, the available data can show where the focus should be greatest.





I changed the minbenchmark to 80% and the result was updated on the fly.
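
The benchmark filter described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the query used in the notebook: it assumes minbenchmark means the minimum acceptable female-to-male ratio (0.8 for 80%), and the function name `villages_below_benchmark`, the tuple layout, and the sample villages are made up.

```python
def villages_below_benchmark(villages, min_benchmark=0.8):
    """Return (name, ratio) pairs for villages whose female-to-male ratio
    is below min_benchmark, lowest ratio first."""
    flagged = []
    for name, males, females in villages:
        if males == 0:
            continue  # ratio is undefined without a male population
        ratio = females / males
        if ratio < min_benchmark:
            flagged.append((name, round(ratio, 3)))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical villages as (name, male population, female population).
sample = [
    ("Village A", 500, 380),
    ("Village B", 400, 390),
    ("Village C", 0, 10),
    ("Village D", 1000, 700),
]
result = villages_below_benchmark(sample, min_benchmark=0.8)
print(result)
```

Changing `min_benchmark` and re-running plays the same role as editing the minbenchmark value in the notebook.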



I have started analysing this data to check the education facilities in the villages. That work is in progress, and I will publish the findings in later posts.

Installation details:
a) Zeppelin was deployed on Ubuntu in VirtualBox, with Windows as the host.
b) Set your Java home (1.7) before starting Zeppelin.
c) To start, execute 'zeppelin-daemon.sh start' in ZEPPELIN_HOME/bin.