Keerthi T
- Hadoop Developer
- Detroit, MI
Professional Summary:
· Over 5 years of professional IT experience, including around 3 years of hands-on experience in Hadoop using the Cloudera and Hortonworks distributions.
· Experience in installation, configuration, management and deployment of Hadoop clusters, HDFS, MapReduce, Pig, Hive, Sqoop, Apache Storm, Flume, Oozie, HBase and Zookeeper.
· In-depth understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce and Spark.
· Expertise in setting up Hadoop in a pseudo-distributed environment and Hive, Pig, HBase and Sqoop on the Ubuntu operating system.
· Extensive experience in Apache Spark, Spark Streaming, Spark SQL and NoSQL databases like Cassandra and HBase.
· Expertise in executing batch jobs over data streams using Spark Streaming.
· Excellent hands-on experience working with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro and Parquet.
· Analyzed data using HiveQL, Pig Latin and MapReduce programs in Java; extended Hive and Pig core functionality by implementing custom UDFs (a minimal UDF sketch follows this list).
· Hands-on experience implementing partitioning and bucketing in Hive for more efficient querying of data.
· Responsible for troubleshooting and development on Hadoop technologies like HDFS, Hive, Pig, Flume, MongoDB, Sqoop, Zookeeper, Spark, MapReduce2, YARN, HBase, Tez, Kafka and Storm.
· Experience in importing and exporting data between RDBMS sources such as Teradata and HDFS using Sqoop.
· Extensive practical experience with incremental imports by creating Sqoop metastore jobs.
· Experience using Apache Flume to collect, aggregate and move large amounts of data from application servers, handling high-variety, high-velocity streaming data.
· Experience in Extraction, Transformation and Loading (ETL) of data from multiple sources like Flat files, XML files and Databases.
· Used Informatica for ETL processing based on business needs and extensively used the Oozie workflow engine to run multiple Hive and Pig jobs.
· Solid experience developing Oozie workflows for running MapReduce jobs and Hive queries.
· Experience in managing and reviewing Hadoop log files.
· Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
· Strong expertise in Unix Scripting and running Python & Scala Scripts.
· Proficient in programming with Java/J2EE and strong experience in technologies such as JSP, Servlets, Struts, Spring, Hibernate, EJBs (Session Beans), JDBC, JavaScript, HTML, JavaScript libraries and Web Services.
· Experience in Requirements Gathering/Analysis, Design, Development, Versioning, Integration, Documentation, Testing, Build and Deployment.
· Efficient in packaging and deploying J2EE applications using Ant, Maven and CruiseControl on WebLogic, WebSphere and JBoss.
· Experience with Amazon AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
· Experience in using Jenkins and Maven to compile the package and deploy to the Application Servers.
· Deployment, distribution and implementation of enterprise applications in a J2EE environment.
· Good understanding of Bootstrap, Spring REST and Spring integration.
· Strong knowledge of version control systems such as SVN, Git and CVS.
· Familiar with multiple software systems; quick to learn new technologies and adapt to new environments; a self-motivated, focused team player with excellent interpersonal, technical and communication skills.
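As a concrete illustration of the custom-UDF work referenced above, the following is a minimal sketch of an old-style Hive UDF written in Java; the class name and the normalization it performs are illustrative rather than drawn from a specific project:

    // Illustrative Hive UDF: trims and lower-cases a string column.
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class TrimLower extends UDF {
        // Hive locates evaluate() by reflection in the old-style UDF API.
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;  // preserve SQL NULL semantics
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

Packaged into a jar, such a function would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION before being used in queries.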
Technical Skills:
Big Data Technologies: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, Pig, Hive, HBase, Cassandra, Zookeeper, Oozie, Sqoop, Flume, HCatalog, Apache Spark, Scala, Impala, Kafka, Storm, Tez, Ganglia, Nagios
Hadoop Distributions: Cloudera, Hortonworks, AWS
Operating Systems: Windows, Macintosh, Linux, Ubuntu, Unix, CentOS
Programming Languages: C, Java, J2EE, SQL, Pig Latin, HiveQL, Scala, Python, Unix shell scripting
Java Technologies: JSP, Servlets, Spring, Hibernate, Maven
Databases: MS SQL Server, Oracle, MS Access, MySQL, NoSQL
Reporting Tools/ETL Tools: Tableau, Informatica, DataStage, Talend, Pentaho, Power View
Methodologies: Agile/Scrum, Waterfall, DevOps
Development Tools: Eclipse, NetBeans, IntelliJ, Hue, Microsoft Office Suite (Word, Excel, PowerPoint, Access)
Professional Experience:
Client: GM/OnStar, Detroit, MI Feb 2017 to Aug 2017
Role: Hadoop Developer
Description: OnStar Corporation is a subsidiary of General Motors that provides subscription-based communications, in-vehicle security, hands-free calling, turn-by-turn navigation and remote diagnostic systems throughout the United States.
Responsibilities:
· Responsible for installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop Cluster.
· Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig Scripts on data.
· Designed workflows and coordinators in Oozie to automate and parallelize Hive and Pig jobs in Cloudera Hadoop (CDH 5.8.0).
· Used both the Hue UI and the Hive CLI for accessing HDFS files and data.
· Involved in developing Hive DDLs to create, alter and drop Hive tables, and worked with Storm and Kafka.
· Developed a data pipeline using Kafka and Storm to store data into HDFS.
· Developed Hive UDF to parse the staged raw data to get the item details from a specific store.
· Built re-usable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
· Designed workflows by scheduling Hive processes for log file data streamed into HDFS using Flume.
· Developed Hive (0.11.0.2) and Impala (2.1.0 and 1.3.1) queries for end-user/analyst requirements to perform ad hoc analysis.
· Involved in building runnable jars for the module framework using Maven clean and Maven dependencies.
· Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
· Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and other compressed formats (a representative MapReduce sketch follows this list).
· Developed SQL scripts to compare all the records for every field and table at each phase of the data movement process from the original source system to the final target.
· Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and gained experience in using Spark-Shell and Spark Streaming.
· Responsible for continuous monitoring and managing the Hadoop Cluster using Cloudera Manager.
· Loaded and transformed large sets of structured, semi-structured and unstructured data.
· Participated in regular stand-up meetings, status calls and business-owner meetings with stakeholders and risk-management teams in an Agile environment.
· Supported code/design analysis, strategy development and project planning.
· Followed Scrum implementation of scaled agile methodology for entire project.
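The following is a representative sketch of the Java MapReduce aggregation work described above, assuming CSV input whose first column is a store id; class names and the assumed layout are illustrative:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class StoreCount {
        // Emits (storeId, 1) for every CSV record.
        public static class StoreMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            private final Text storeId = new Text();
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                storeId.set(fields[0]);  // store id assumed to be the first column
                ctx.write(storeId, ONE);
            }
        }
        // Sums the counts per store; also reused as a combiner.
        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                ctx.write(key, new LongWritable(sum));
            }
        }
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "store-count");
            job.setJarByClass(StoreCount.class);
            job.setMapperClass(StoreMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }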
Environment: Cloudera Hadoop Cluster, Unix Servers, Shell Scripting, Java Map Reduce, Hive, Storm, Sqoop, Flume, Oozie, Kafka, Git, Eclipse, Tableau.
Client: Blue Shield of California, San Francisco, CA May 2016 to Dec 2016
Role: Hadoop Developer
Description: Blue Shield of California is a health plan provider that serves over 4 million health plan members and nearly 65,000 physicians across the state. It is a non-profit health plan dedicated to providing Californians with access to high-quality health care at an affordable price.
Responsibilities:
· Worked on a live 60-node Hadoop cluster running CDH 5.4.4, CDH 5.2.0 and CDH 5.2.1.
· Worked on the Hadoop cluster using different big data analytic tools, including Kafka, Pig, Hive and MapReduce.
· Developed simple to complex MapReduce streaming jobs in Python, complementing implementations in Hive and Pig.
· Implemented data access jobs through Pig, Hive, HBase (0.98.0) and Storm (0.9.1); an HBase client sketch follows this list.
· Involved in loading data from the Linux file system to HDFS.
· Imported and exported data into HDFS and Hive using Sqoop.
· Altered existing Scala programs to enhance performance and obtain partitioned results with the Spark tool.
· Worked on processing unstructured data using Pig and Hive.
· Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
· Used Impala to read, write and query the Hadoop data in HDFS or HBase.
· Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
· Developed Pig Latin Scripts to extract data from the web server output files to load into HDFS.
· Responsible for backups and restoration of the Tableau repository.
· Converted ETL operations to Hadoop system using Pig Latin operations, transformations and functions.
· Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
· Exported result sets from Hive to MySQL using shell scripts.
· Actively involved in code reviews and bug fixing to improve performance.
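The following is a minimal sketch of an HBase data access job of the kind noted above, using the HTable client API of the HBase 0.98 era; the table, column family, qualifier and row key are hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MemberLookup {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "members");  // table name is illustrative
            try {
                // Write one cell, then read it back.
                Put put = new Put(Bytes.toBytes("member-1001"));
                put.add(Bytes.toBytes("info"), Bytes.toBytes("plan"), Bytes.toBytes("PPO"));
                table.put(put);

                Result result = table.get(new Get(Bytes.toBytes("member-1001")));
                byte[] plan = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("plan"));
                System.out.println("plan = " + Bytes.toString(plan));
            } finally {
                table.close();
            }
        }
    }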
Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Storm, Kafka, Linux, Hortonworks distribution, Big Data, Java APIs, Java Collections, SQL, NoSQL, MongoDB.
Client: Vanguard, Malvern, PA Sep 2015 to Apr 2016
Role: Hadoop Administrator/Developer
Description: The Vanguard Group is an American Investment management company which is the largest provider of mutual funds and exchange-traded funds in the world. Vanguard also provides brokerage services, asset management, educational account services and trust services.
Responsibilities:
· Responsible for installation, configuration, maintenance, monitoring, performance tuning and troubleshooting Hadoop Clusters in different environments such as Development Cluster, Test Cluster and Production.
· Used the JobTracker to assign MapReduce tasks to TaskTrackers in the cluster of nodes.
· Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
· Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
· Implemented Kerberos security in all environments.
· Defined file system layout and data set permissions.
· Implemented Capacity Scheduler to share the resources of the cluster for the MapReduce jobs given by the users.
· Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
· Involved in loading data from Linux and Unix file system to HDFS.
· Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get or copyToLocal.
· Involved in cluster planning and setting up the multi-node cluster.
· Used Ganglia to monitor the cluster and Nagios to send alerts around the clock.
· Commissioned and decommissioned nodes from time to time.
· Involved in HDFS maintenance and administration through the HDFS Java API (a brief HDFS API sketch follows this list).
· Worked with Hadoop developers and designers in troubleshooting MapReduce job failures and issues.
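The following is a brief sketch of HDFS maintenance through the Java API, as referenced above; it reports owner, permissions and size for entries under a data set root, with the path chosen purely for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DatasetAudit {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS from the cluster configuration on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());
            for (FileStatus status : fs.listStatus(new Path("/data/warehouse"))) {
                System.out.printf("%s\t%s\t%s\t%d%n",
                        status.getPath().getName(),
                        status.getOwner(),
                        status.getPermission(),
                        status.getLen());
            }
            fs.close();
        }
    }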
Environment: Hadoop 1.2.1, MapReduce, HDFS, Pig, Hive, Sqoop, Cloudera Hadoop Distribution, HBase, Windows NT, LINUX, UNIX Shell Scripting.
Client: SumTotal Systems, Hyderabad, India Oct 2014 to Jul 2015
Role: Hadoop Administrator/Developer
Description: SumTotal Systems, Inc. is a software company that provides human resource management software and services to private and public-sector organizations. The company delivers solutions through multiple-cloud based channels, including Software as a Service (SaaS), Hosted Subscription and premises-based licensure.
Responsibilities:
· Installed and configured Hadoop MapReduce, HDFS, developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
· Imported and exported data into HDFS from Oracle database and vice versa using Sqoop.
· Installed and configured Hadoop Cluster for major Hadoop distributions.
· Used Hive and Pig as an ETL tool for event joins, filters, transformations and pre-aggregations.
· Created partitions and buckets by state in Hive to handle structured data.
· Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data such as removing personal information or merging many small files into a handful of very large, compressed files using Pig pipelines in the data preparation stage.
· Involved in moving log files generated from various sources to HDFS through Kafka and Flume for further processing, and processed the files using Piggybank (a Kafka producer sketch follows this list).
· Extensively used Pig to communicate with Hive via HCatalog and with HBase via storage handlers.
· Used the Spark SQL Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.
· Used Spark SQL to read and write tables stored in Hive.
· Performed Sqoop transfers through HBase tables to process data into several NoSQL databases, including Cassandra and MongoDB.
· Created tables, secondary indexes, join indexes and views in the Teradata development environment for testing.
· Captured web server logs into HDFS using Flume for analysis.
· Managed and reviewed Hadoop log files.
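The following is a minimal sketch of a Kafka producer of the kind used to ship log files toward HDFS, assuming the Java producer API introduced around Kafka 0.8.2; the broker address and topic name are hypothetical:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class LogShipper {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");  // hypothetical broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Topic name is illustrative; downstream consumers land these in HDFS.
                    producer.send(new ProducerRecord<>("app-logs", line));
                }
            } finally {
                producer.close();
            }
        }
    }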
Environment: Hive, Pig, MapReduce, Spark, Sqoop, Oozie, Flume, Kafka, Storm, HBase, Unix, Linux, Python scripting, SQL, Hadoop 1.x, HDFS, GitHub, Talend.
Client: ZenQ, Hyderabad, India Aug 2012 to Sep 2014
Role: Java Developer
Description: ZenQ is a leading provider of pure-play software testing services to clients across the globe. The company offers high-quality, efficient solutions that help clients build quality products.
Responsibilities:
· Involved in development of JavaScript code for client-side validations.
· Developed the HTML based web pages for displaying the reports.
· Developed front-end screens using JSP, HTML, jQuery, JavaScript and CSS.
· Performed data validation in Struts using form beans and Action classes.
· Developed dynamic content of presentation layer using JSP.
· Accessed stored procedures and functions using JDBC CallableStatements (a brief CallableStatement sketch follows this list).
· Involved in designing use-case diagrams, class diagrams and interaction diagrams using UML with Rational Rose.
· Implemented Hibernate to persist the data into Database and wrote HQL based queries to implement CRUD operations on the data.
· Developed coding using SQL, PL/SQL, Queries, Joins, Views, Procedures/Functions, Triggers and Packages.
· Developed rich internet web applications using Java applets and Silverlight.
· Used JDBC for database access.
· Played a key role in the high-level design for the implementation of the application.
· Designed and established the process and mapping the functional requirement to the workflow process.
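The following is a brief sketch of stored-procedure access through a JDBC CallableStatement, as mentioned above; the connection string, credentials and procedure signature are illustrative:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Types;

    public class OrderTotalClient {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@db-host:1521:orcl", "app_user", "secret");
                 CallableStatement stmt = conn.prepareCall("{call get_order_total(?, ?)}")) {
                stmt.setLong(1, 42L);                         // IN: order id
                stmt.registerOutParameter(2, Types.NUMERIC);  // OUT: computed total
                stmt.execute();
                System.out.println("total = " + stmt.getBigDecimal(2));
            }
        }
    }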
Environment: Java, Servlets, Java Beans, JSP, EJB, J2EE, STRUTS, XML, XSLT, JavaScript, HTML, CSS, Spring 3.2, SQL, PL/SQL, MS Visio, Eclipse, JDBC, Windows XP.
Education:
· Master's in Computer Science from Oklahoma City University, Oklahoma City, OK
· Bachelor's in Information Technology from Kakatiya Institute of Technology and Science, India