Keerthi T

  • Hadoop Developer
  • Detroit, MI
  • Member Since Jun 14, 2023


Keerthi T

Professional Summary:

·         Over 5 years of professional IT experience, including around 3 years of hands-on Hadoop experience with the Cloudera and Hortonworks distributions.

·         Experience in installation, configuration, management and deployment of Hadoop Cluster, HDFS, Map Reduce, Pig, Hive, Sqoop, Apache Storm, Flume, Oozie, HBase and Zookeeper.

·         In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce and Spark.

·         Expertise in setting up Hadoop in a pseudo-distributed environment, along with Hive, Pig, HBase and Sqoop, on the Ubuntu operating system.

·         Extensive experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like Cassandra and HBase.

·         Expertise in running batch-style jobs over live data streams with Spark Streaming (see the sketch after this list).

·         Excellent hands-on experience working with different Hadoop file formats such as SequenceFile, RCFile, ORC, Avro and Parquet.

·         Analyzed data using HiveQL, Pig Latin and MapReduce programs in Java; extended Hive and Pig core functionality by implementing custom UDFs.

·         Hands-on experience implementing partitioning and bucketing in Hive for more efficient querying of data.

·         Responsible for troubleshooting and development on Hadoop technologies like HDFS, Hive, Pig, Flume, MongoDB, Sqoop, Zookeeper, Spark, MapReduce2, YARN, HBase, Tez, Kafka and Storm.

·         Experience in importing and exporting data between Teradata/RDBMS and HDFS using Sqoop.

·         Extensive practical experience with incremental imports using Sqoop metastore jobs.

·         Experience in using Apache Flume to collect, aggregate and move large volumes of high-velocity streaming data from application servers into HDFS.

·         Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like Flat files, XML files and Databases.

·         Used Informatica for ETL processing based on business needs and extensively used Oozie workflow engine to run multiple Hive and Pig jobs.

·         Solid experience in developing Oozie workflows for running MapReduce jobs and Hive queries.

·         Experience in managing and reviewing Hadoop log files.

·         Responsible for performing advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.

·         Strong expertise in Unix Scripting and running Python & Scala Scripts.

·         Proficient in programming with Java/J2EE and strong experience in technologies such as JSP, Servlets, Struts, Spring, Hibernate, EJBs, Session Beans, JDBC, JavaScript, JavaScript libraries, HTML and Web Services.

·         Experience in Requirements Gathering/Analysis, Design, Development, Versioning, Integration, Documentation, Testing, Build and Deployment.

·         Efficient in packaging & deploying J2EE applications using ANT, Maven & Cruise Control on WebLogic, WebSphere & JBoss.

·         Experience with Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.

·         Experience in using Jenkins and Maven to compile the package and deploy to the Application Servers.

·         Deployment and implementation of distributed enterprise applications in a J2EE environment.

·         Good understanding of Bootstrap, Spring REST and Spring Integration.

·         Strong Knowledge of Version Control Systems like SVN, GIT & CVS.

·         Familiar with multiple software systems, ability to learn quickly new technologies, adapt to new environments, self-motivated, team player, focused, adaptive and quick learner with excellent interpersonal, technical and communication skills.
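
Below is a minimal, illustrative Scala sketch of the Spark Streaming pattern referenced in the summary above: each micro-batch of a live stream is processed like an ordinary batch job and the result is written to HDFS. The socket source, host/port, batch interval, field layout and output path are placeholders rather than details of any project listed below.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamBatchJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamBatchJob")
    // Each 60-second micro-batch is handled like an ordinary Spark batch job.
    val ssc = new StreamingContext(conf, Seconds(60))

    // Hypothetical text source; in practice this could be Flume, Kafka or a socket feed.
    val lines = ssc.socketTextStream("stream-host", 9999)

    // Batch-style aggregation applied to every micro-batch:
    // count events per event type, assuming a comma-separated first field.
    val counts = lines
      .map(line => (line.split(",")(0), 1L))
      .reduceByKey(_ + _)

    // Persist each micro-batch result to HDFS as text files.
    counts.saveAsTextFiles("hdfs:///tmp/stream-counts/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}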

Technical Skills:

 

Big Data Technologies

Hadoop 1.x/2.x(Yarn), HDFS, MapReduce, Pig, Hive, HBase, Cassandra, Zookeeper, Oozie, Sqoop, Flume, HCatalog, Apache Spark, Scala, Impala, Kafka, Storm, Tez, Ganglia, Nagios

Hadoop Distributions

Cloudera, Hortonworks, AWS

Operating Systems

Windows, Macintosh, Linux, Ubuntu, Unix, CentOS.

Programming Languages

C, Java, J2EE, SQL, Pig Latin, HiveQL, Scala, Python, Unix Shell Scripting

Java Technologies

JSP, Servlets, Spring, Hibernate, Maven

Databases

MS-SQL, Oracle, MS-Access, NoSQL, MySQL

Reporting Tools/ETL Tools

Tableau, Informatica, DataStage, Talend, Pentaho, Power View

Methodologies

Agile/Scrum, Waterfall, DevOps

Development Tools

Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office Suite (Word, Excel, PowerPoint, Access)

 

Professional Experience:

Client: GM/OnStar, Detroit, MI                                                                                    Feb 2017 to Aug 2017

Role: Hadoop Developer                                                      

Description: OnStar Corporation is a subsidiary of General Motors that provides subscription-based communications, in-vehicle security, hands-free calling, turn-by-turn navigation and remote diagnostic systems throughout the United States.

 

Responsibilities:

·         Responsible for installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop Cluster.

·         Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig Scripts on data.

·         Designed workflows and coordinators in Oozie to automate and parallelize Hive and Pig jobs in Cloudera Hadoop (CDH 5.8.0).

·         Gained familiarity with both the Hue UI and the Hive CLI for accessing HDFS files and data.

·         Involved in developing Hive DDL to create, alter and drop Hive tables, and worked with Storm and Kafka.

·         Developed a data pipeline using Kafka and Storm to store data into HDFS.

·         Developed Hive UDF to parse the staged raw data to get the item details from a specific store.

·         Built re-usable Hive UDF libraries for business requirements which enabled users to use these UDF’s in Hive querying.

·         Designed workflow by scheduling Hive processes for Log file data which is streamed into HDFS using Flume.

·         Developed Hive (0.11.0.2) and Impala (2.1.0 and 1.3.1) queries for end-user/analyst requirements to perform ad hoc analysis.

·         Involved in building the runnable jars for the module framework through Maven clean & Maven dependencies.

·         Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.

·         Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed formats.

·         Developed SQL scripts to compare all the records for every field and table at each phase of the data movement process from the original source system to the final target.

·         Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala, and gained experience with Spark-Shell and Spark Streaming (see the sketch after this list).

·         Responsible for continuous monitoring and managing the Hadoop Cluster using Cloudera Manager.

·         Loaded and transformed large sets of structured, semi-structured and unstructured data.

·         Participated in regular stand-up meetings, status calls and business-owner meetings with stakeholders and risk-management teams in an Agile environment.

·         Supported code/design analysis, strategy development and project planning.

·         Followed a Scrum implementation of scaled Agile methodology for the entire project.
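
A minimal Scala sketch of the Hive-to-Spark conversion referenced above, written in spark-shell style (where sc is already defined). It re-expresses a HiveQL aggregation of the form "SELECT store_id, COUNT(*) FROM items GROUP BY store_id" as RDD transformations; the HDFS paths, delimiter and column positions are hypothetical.

// Assumes the Spark shell, where the SparkContext `sc` is predefined.
// Raw item records previously queried through Hive, read directly from HDFS.
val items = sc.textFile("hdfs:///data/onstar/items")

// Equivalent of: SELECT store_id, COUNT(*) FROM items GROUP BY store_id
val countsByStore = items
  .map(_.split("\t"))
  .filter(_.length > 1)            // drop malformed rows
  .map(fields => (fields(0), 1L))  // key on store_id (first column)
  .reduceByKey(_ + _)

countsByStore.saveAsTextFile("hdfs:///data/onstar/item_counts_by_store")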

 

Environment: Cloudera Hadoop Cluster, Unix Servers, Shell Scripting, Java Map Reduce, Hive, Storm, Sqoop, Flume, Oozie, Kafka, Git, Eclipse, Tableau.

 

Client: Blue Shield of California, San Francisco, CA                                                  May 2016 to Dec 2016

Role: Hadoop Developer

Description: Blue Shield of California is a health plan provider that serves over 4 million health plan members and nearly 65,000 physicians across the state. It is a non-profit health plan dedicated to providing Californians with access to high-quality health care at an affordable price.

 

Responsibilities:

·         Worked on a live 60-node Hadoop cluster running CDH 5.4.4, CDH 5.2.0 and CDH 5.2.1.

·         Worked on the Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive and MapReduce.

·         Developed simple to complex MapReduce streaming jobs in Python, used alongside Hive and Pig implementations.

·         Implemented data access jobs through Pig, Hive, HBase (0.98.0), Storm (0.91)

·         Involved in loading data from the Linux file system into HDFS.

·         Imported and exported data into HDFS and Hive using Sqoop.

·         Altered existing Scala programs to enhance performance and obtain partitioned results in Spark (see the sketch after this list).

·         Worked on processing unstructured data using Pig and Hive.

·         Collected and aggregated large amounts of log data using Apache Flume and staging data in HDFS for further analysis.

·         Used Impala to read, write and query the Hadoop data in HDFS or HBase.

·         Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.

·         Developed Pig Latin Scripts to extract data from the web server output files to load into HDFS.

·         Responsible for backups and restoration of the Tableau repository.

·         Converted ETL operations to Hadoop system using Pig Latin operations, transformations and functions.

·         Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.

·         Exported the result set from Hive to MySQL using Shell Scripts.

·         Actively involved in code review and bug fixing for improving the performance.
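
A minimal Scala sketch of the partitioning change referenced above, again in spark-shell style: a pair RDD is hash-partitioned by key and persisted so that downstream aggregations and joins reuse the layout instead of reshuffling. The paths, key choice and partition count are illustrative only.

import org.apache.spark.HashPartitioner
import org.apache.spark.storage.StorageLevel

// Assumes the Spark shell, where `sc` is predefined; key each record by its first column.
val claims = sc.textFile("hdfs:///data/claims")
  .map(_.split(","))
  .map(fields => (fields(0), fields))

// Hash-partition once and persist; later stages keyed the same way avoid a full shuffle.
val partitioned = claims
  .partitionBy(new HashPartitioner(48))
  .persist(StorageLevel.MEMORY_AND_DISK)

// Example downstream use: record counts per key, reusing the existing partitioning.
val countsPerKey = partitioned.mapValues(_ => 1L).reduceByKey(_ + _)
countsPerKey.saveAsTextFile("hdfs:///data/claims_counts_per_key")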

 

Environment: Hadoop, HDFS, Pig, Hive, Map Reduce, Sqoop, Storm, Kafka, LINUX, Hortonworks distribution, Bigdata, Java APIs, Java collection, SQL, NoSQL, MongoDB.

 

Client: Vanguard, Malvern, PA                                                                                  Sep 2015 to Apr 2016

Role: Hadoop Administrator/Developer

Description: The Vanguard Group is an American Investment management company which is the largest provider of mutual funds and exchange-traded funds in the world. Vanguard also provides brokerage services, asset management, educational account services and trust services.

 

Responsibilities:

·         Responsible for installation, configuration, maintenance, monitoring, performance tuning and troubleshooting Hadoop Clusters in different environments such as Development Cluster, Test Cluster and Production.

·         Used Job Tracker to assign MapReduce tasks to Task Tracker in cluster of nodes.

·         Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.

·         Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.

·         Implemented Kerberos security in all environments.

·         Defined file system layout and data set permissions.

·         Implemented Capacity Scheduler to share the resources of the cluster for the MapReduce jobs given by the users.

·         Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.

·         Involved in loading data from Linux and Unix file system to HDFS.

·         Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop export, HDFS get or copyToLocal.

·         Involved in cluster planning and setting up the multi-node cluster.

·         Used Ganglia to monitor the cluster and Nagios to send alerts around the clock.

·         Commissioned and Decommissioned nodes from time to time.

·         Involved in HDFS maintenance and administration through the HDFS Java API (see the sketch after this list).

·         Worked with Hadoop developers and designers in troubleshooting MapReduce job failures and issues.
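
A minimal Scala sketch of routine HDFS maintenance through the Hadoop FileSystem (HDFS Java) API referenced above: creating a dataset directory, restricting its permissions and reviewing its contents. The paths and permission bits are illustrative placeholders.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.{FsAction, FsPermission}

object HdfsMaintenance {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from the cluster classpath.
    val fs = FileSystem.get(new Configuration())
    val dataset = new Path("/data/warehouse/claims")

    // Lay out the dataset directory and restrict it to owner/group access (rwxr-x---).
    if (!fs.exists(dataset)) fs.mkdirs(dataset)
    fs.setPermission(dataset, new FsPermission(FsAction.ALL, FsAction.READ_EXECUTE, FsAction.NONE))

    // Quick review of what is on disk, e.g. while checking for small-file problems.
    fs.listStatus(dataset).foreach { status =>
      println(s"${status.getPath}\t${status.getLen} bytes")
    }

    fs.close()
  }
}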

 

Environment: Hadoop 1.2.1, MapReduce, HDFS, Pig, Hive, Sqoop, Cloudera Hadoop Distribution, HBase, Windows NT, LINUX, UNIX Shell Scripting.

 

Client: SumTotal Systems, Hyderabad, India                                                                 Oct 2014 to Jul 2015

Role: Hadoop Administrator/Developer

Description: SumTotal Systems, Inc. is a software company that provides human resource management software and services to private and public-sector organizations. The company delivers solutions through multiple-cloud based channels, including Software as a Service (SaaS), Hosted Subscription and premises-based licensure.

 

Responsibilities:

·         Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.

·         Imported and exported data into HDFS from Oracle database and vice versa using Sqoop.

·         Installed and configured Hadoop Cluster for major Hadoop distributions.

·         Used Hive and Pig as ETL tools for event joins, filters, transformations and pre-aggregations.

·         Created partitions and buckets keyed on state in Hive to handle structured data.

·         Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data such as removing personal information or merging many small files into a handful of very large, compressed files using Pig pipelines in the data preparation stage.

·         Involved in moving log files generated from various sources into HDFS through Kafka and Flume for further processing, and processed the files using PiggyBank.

·         Extensively used Pig to communicate with Hive via HCatalog and with HBase via storage handlers.

·         Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.

·         Used Spark SQL to read and write tables stored in Hive (see the sketch after this list).

·         Performed Sqoop-based file transfers through HBase tables for processing data into several NoSQL databases, including Cassandra and MongoDB.

·         Created tables, secondary indexes, join indexes and views in the Teradata development environment for testing.

·         Captured web server logs into HDFS using Flume for analysis.

·         Managed and reviewed Hadoop log files.
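
A minimal Spark 1.x-style Scala sketch of the Spark SQL usage referenced above: an RDD of case classes is converted to a DataFrame, written out as a Hive table and read back with HiveQL through the same HiveContext. The table name, columns and HDFS path are hypothetical.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Case class whose fields become the table schema.
case class Course(courseId: String, learners: Int)

object SparkSqlHiveJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkSqlHiveJob"))
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    // RDD of case classes -> DataFrame (the schema-RDD conversion).
    val courses = sc.textFile("hdfs:///data/courses.csv")
      .map(_.split(","))
      .map(fields => Course(fields(0), fields(1).toInt))
      .toDF()

    // Write the DataFrame out as a Hive table...
    courses.write.mode("overwrite").saveAsTable("courses")

    // ...and read it back with plain HiveQL through the same context.
    hiveContext.sql("SELECT courseId, learners FROM courses WHERE learners > 1000").show()
  }
}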

 

Environment: Hive, Pig, MapReduce, Spark, Sqoop, Oozie, Flume, Kafka, Storm, HBase, Unix, Linux, Python, SQL, Hadoop 1.x, HDFS, GitHub, Talend, Python Scripting.

 

Client: ZenQ, Hyderabad, India                                                                                      Aug 2012 to Sep 2014

Role: Java Developer                        

Description: ZenQ is the leading provider of pure-play software testing services to clients across the globe. The company offers high-quality, efficient solutions that help clients build quality products.

 

Responsibilities:

·         Involved in development of JavaScript code for client-side validations.

·         Developed the HTML based web pages for displaying the reports.

·         Developed front-end screens using JSP, HTML, jQuery, JavaScript and CSS.

·         Performed data validation in Struts using form beans and Action classes.

·         Developed dynamic content of presentation layer using JSP.

·         Accessed stored procedures and functions using JDBC CallableStatements (see the sketch after this list).

·         Involved in designing use-case diagrams, class diagrams and interaction diagrams using UML with Rational Rose.

·         Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.

·         Developed code using SQL and PL/SQL: queries, joins, views, procedures/functions, triggers and packages.

·         Developed rich internet web applications using Java applets and Silverlight.

·         Used JDBC for database access.

·         Played a key role in the high-level design for the implementation of the application.

·         Designed and established the process, mapping functional requirements to the workflow process.
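
A minimal sketch of invoking a stored procedure through a JDBC CallableStatement, as referenced above. It is written in Scala for consistency with the other sketches, though the underlying java.sql API is the same one used from Java; the JDBC URL, credentials and procedure name are hypothetical, and a suitable JDBC driver is assumed to be on the classpath.

import java.sql.{DriverManager, Types}

object StoredProcCall {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection details; replace with the real data source.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@db-host:1521:orcl", "app_user", "secret")
    try {
      // {call get_report_count(?, ?)}: IN report type, OUT row count.
      val stmt = conn.prepareCall("{call get_report_count(?, ?)}")
      stmt.setString(1, "MONTHLY")
      stmt.registerOutParameter(2, Types.INTEGER)
      stmt.execute()
      println(s"rows in report: ${stmt.getInt(2)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}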

 

Environment: Java, Servlets, Java Beans, JSP, EJB, J2EE, STRUTS, XML, XSLT, JavaScript, HTML, CSS, Spring 3.2, SQL, PL/SQL, MS Visio, Eclipse, JDBC, Windows XP.

 

Education:

·         Master's in Computer Science from Oklahoma City University, Oklahoma City, OK

·         Bachelor's in Information Technology from Kakatiya Institute of Technology and Science, India