Harsha D

  • Sr. Hadoop Developer
  • St. Louis, MO

PROFESSIONAL SUMMARY:

·         Adept Hadoop developer with over 9 years of programming experience, including 5 years of proficiency in the Hadoop ecosystem and Big Data systems.

·         In-depth experience and solid working knowledge of HDFS, MapReduce, Hive, Pig, Sqoop, YARN/MRv2, Spark, Kafka, Impala, HBase and Oozie.

·         Currently working extensively with Spark and Spark Streaming, using Scala as the main programming language.

·         Used Spark DataFrames, Spark SQL and the RDD API for data transformations and dataset building (see the sketch after this summary).

·         Extensively worked with Spark Streaming and Apache Kafka to ingest live streaming data.

·         Strong fundamental understanding of distributed computing and distributed storage concepts for highly scalable data engineering.

·         Worked with Pig and Hive and developed custom UDFs for building various datasets.

·         Worked extensively on the MapReduce framework using Java.

·         Strong experience troubleshooting and performance-tuning Spark, MapReduce and Hive applications.

·         Worked extensively with clickstream data to derive visitor behavioral patterns, enabling the data science team to run predictive models.

·         Worked on NoSQL data stores, primarily HBase, using the HBase Java API and Hive integration.

·         Extensively worked on data migrations from diverse databases into HDFS and Hive using Sqoop.

·         Implemented dynamic partitions and buckets in Hive for efficient data access.

·         Significant experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.

·         Strong expertise in Unix shell script programming.

·         Expertise in creating shell scripts, regular expressions and cron automation.

·         Skilled in visualizing data using Tableau, QlikView, MicroStrategy and MS Excel.

·         Exposure to Mesos and ZooKeeper cluster environments for application deployments and Docker containers.

·         Knowledge of Enterprise Data Warehouse (EDW) architecture and data modeling concepts such as star schema and snowflake schema, as well as Teradata.

·         Highly proficient in Scala programming.

·         Experience with web technologies including HTML, CSS, JavaScript, Ajax and JSON, and frameworks such as J2EE, AngularJS and Spring.

·         Good knowledge of REST web services, SOAP, WSDL, XML parsers such as SAX and DOM, AngularJS, and responsive design with Bootstrap.

·         Familiar with Agile and Waterfall methodologies; handled several client-facing meetings with strong communication skills.

·         Good experience in customer support roles, including training and resolving production issues based on priority.
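
The following is an illustrative Scala sketch of the Spark DataFrame / Spark SQL / RDD usage described above; the table, column and threshold values are hypothetical examples rather than client artifacts.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamDatasetBuilder {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session; assumes a Hive metastore is configured on the cluster
    val spark = SparkSession.builder()
      .appName("clickstream-dataset-builder")
      .enableHiveSupport()
      .getOrCreate()

    // Read a (hypothetical) Hive table into a DataFrame via Spark SQL
    val clicks = spark.sql("SELECT visitor_id, page_url, event_ts FROM web.click_events")

    // DataFrame transformations: page-view counts per visitor
    val pageViews = clicks
      .filter(col("page_url").isNotNull)
      .groupBy(col("visitor_id"))
      .agg(count(lit(1)).as("page_views"))

    // Drop to the RDD API where row-level control is needed
    val topVisitors = pageViews.rdd
      .map(r => (r.getAs[String]("visitor_id"), r.getAs[Long]("page_views")))
      .filter { case (_, views) => views > 100 }

    topVisitors.take(10).foreach(println)
    spark.stop()
  }
}
```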

 

TECHNICAL SKILLS:

Hadoop Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, YARN, Oozie, ZooKeeper, Impala, Spark, Spark SQL, Spark Streaming, Storm, HUE, SOLR

Languages: C, C++, Java, Scala, Python, Swift, C#, SQL, PL/SQL

Frameworks: J2EE, Spring, Hibernate, AngularJS

Web Technologies: HTML, CSS, JavaScript, jQuery, Ajax, XML, WSDL, SOAP, REST API

NoSQL: HBase, Cassandra, MongoDB

Security: Kerberos, OAuth

Cluster Management and Monitoring: Cloudera Manager, Hortonworks Ambari, Apache Mesos

Relational Databases: Oracle 11g, MySQL, SQL Server, Teradata

Development Tools: Eclipse, NetBeans, Visual Studio, IntelliJ IDEA, Xcode

Build Tools: Ant, Maven, sbt, Jenkins

Application Servers: Tomcat 6.0, WebSphere 7.0

Business Intelligence Tools: Tableau, Informatica, Splunk, QlikView

Version Control: GitHub, Bitbucket, SVN

 

EDUCATION DETAILS:

Bachelor of Engineering, Osmania University

PROFESSIONAL EXPERIENCE:

Client                    :               Wells Fargo                                                                                     Nov 2015 - Present

Location               :               St. Louis, MO                                                

Role                       :              Sr. Hadoop Developer

 

Project Description: Wells Fargo & Company is an international banking and financial services holding company. The project involved analysis of huge datasets for utilization and optimization in the company's banking sector. I am part of the digital marketing group; we analyzed data from all transactions to identify profitable customer and company plans and services, which assists in determining new strategic plans for the market.

Responsibilities:

·         Ingested clickstream data from FTP servers and S3 buckets using custom input adaptors.

·         Designed and developed Spark jobs to enrich the clickstream data.

·         Implemented Spark jobs using Scala and used Spark SQL to access Hive tables in Spark for faster data processing.

·         Involved in performance tuning of Spark jobs by caching data and taking full advantage of the cluster environment.

·         Worked with Data-science team to gather requirements for data mining projects.

·         Developed a Kafka producer and Spark Streaming consumer for working with live clickstream feeds (see the sketch after this list).

·         Worked on different file formats (PARQUET, TEXTFILE) and different compression codecs (GZIP, SNAPPY, LZO).

·         Wrote complex Hive queries involving external, dynamically partitioned Hive tables that store a rolling time window of user viewing history.

·         Worked with the data science team to build various predictive models with Spark MLlib.

·         Experience in troubleshooting various Spark applications using spark-shell and spark-submit.

·         Good experience in writing MapReduce programs in Java in an MRv2/YARN environment.

·         Developed Java code to generate, compare and merge Avro schema files.

·         Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.

·         Designed and developed external and managed Hive tables with data formats such as Text, Avro, SequenceFile, RC, ORC and Parquet.

·         Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.

·         Implemented Sqoop jobs to perform full and incremental imports of data from relational tables into Hadoop and Hive tables in formats such as text, Avro and SequenceFile.

·         Developed ETL scripts for Data acquisition and Transformation using Talend.

·         Good hands-on experience writing HQL statements per requirements.

·         Involved in designing and developing tables in HBase and storing aggregated data from Hive tables.

·         Used cloud computing on a multi-node cluster, deployed the Hadoop application on AWS S3 and used Elastic MapReduce (EMR) to run MapReduce jobs.

·         Responsible for the analysis, design and testing phases, and for documenting technical specifications.

·         Coordinated effectively with the offshore team and managed project deliverables on time.

·         Used Impala and Tableau to create various reporting dashboards.
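
An illustrative Scala sketch of the Kafka-to-Spark-Streaming consumer pattern referenced in the bullet on live clickstream feeds above; the broker list, topic, consumer group and record layout are hypothetical, and the spark-streaming-kafka-0-10 direct-stream integration is assumed.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object ClickstreamStreamingConsumer {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("clickstream-streaming-consumer")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Hypothetical broker list, topic and consumer group
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-enrichment",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("click_events"), kafkaParams)
    )

    // Per-batch event counts by page, standing in for the real enrichment logic
    stream.map(_.value)
      .map(line => (line.split("\t")(1), 1L)) // assumes tab-delimited events with the page URL in field 1
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```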

Environment: Spark, Hive, Impala, Sqoop, HBase, Tableau, Scala, Talend, Eclipse, YARN, Oozie, Java, Cloudera Distribution, Kerberos.

 

Client                    :               USAA                                                                                               Aug 2014- Oct 2015

Location               :               San Antonio, TX                                              

Role                       :              Sr. Hadoop Developer

Project Description: United Services Automobile Association (USAA) is a Texas-based diversified financial services group of companies, comprising a Department of Insurance-regulated reciprocal inter-insurance exchange and subsidiaries that offer banking, investing and insurance products. The objective of the project was to migrate all data warehouse data to the Hadoop platform and perform ETL transformations.

Responsibilities:

·         Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.

·         Imported data from different sources such as HDFS and HBase into Spark RDDs.

·         Issued SQL queries via Impala to process the data stored in HBase and HDFS.

·         Used the Spark-Cassandra Connector to load data to and from Cassandra (see the sketch after this list).

·         Used Teradata FastLoad for loading data into empty tables.

·         Wrote Python scripts to parse XML documents and load the data into the database.

·         Good experience with Amazon AWS for accessing Hadoop cluster components.

·         Experienced in transferring data from different data sources into HDFS systems using Kafka producers, consumers, and Kafka brokers.

·         Implemented modules using core Java APIs and Java collections, and integrated the modules.

·         Loaded data from different sources (databases and files) into Hive using the Talend tool.

·         Used Oozie and ZooKeeper operational services for cluster coordination and workflow scheduling.

·         Knowledge of developing NiFi flow prototypes for data ingestion into HDFS.

·         Good experience in building highly scalable Big Data solutions using Hadoop and distributions such as Hortonworks.

·         Developed Custom Input Formats in MapReduce jobs to handle custom file formats and to convert them into key-value pairs.

·         Responsible for building scalable distributed data solutions using Hadoop.

·         Wrote custom Writable classes for Hadoop serialization and deserialization of time series tuples.

·         Developed Sqoop import Scripts for importing reference data from Netezza.

·         Used Shell scripting for Jenkins job automation with Talend.

·         Created Hive external tables on the MapReduce output, with partitioning and bucketing applied on top of them.

·         Comprehensive knowledge and experience in process improvement, normalization/denormalization, data extraction, data cleansing, Scrum and data manipulation.

·         Worked with the Data Governance team to ensure metadata management and best practices.

·         Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.

·         Provided cluster coordination services through ZooKeeper.

·         Worked with BI teams in generating reports and designing ETL workflows on Tableau.
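
An illustrative Scala sketch of the Spark-Cassandra Connector usage referenced above; the keyspace, table and column names are hypothetical, and the DataStax spark-cassandra-connector RDD API is assumed.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraLoadExample {
  def main(args: Array[String]): Unit = {
    // The connection host would normally come from cluster configuration
    val conf = new SparkConf()
      .setAppName("cassandra-load-example")
      .set("spark.cassandra.connection.host", "cassandra-host")
    val sc = new SparkContext(conf)

    // Read a (hypothetical) Cassandra table into an RDD of CassandraRow
    val policies = sc.cassandraTable("insurance", "policies")

    // Simple transformation: premium totals per member
    val premiumsByMember = policies
      .map(row => (row.getString("member_id"), row.getDouble("premium")))
      .reduceByKey(_ + _)

    // Write the aggregate back to another (hypothetical) table
    premiumsByMember.saveToCassandra("insurance", "premium_totals",
      SomeColumns("member_id", "total_premium"))

    sc.stop()
  }
}
```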

Environment: Apache Hadoop, Hive, Scala, Pig, HDFS, Hortonworks, Java MapReduce, Maven, Git, Jenkins, Eclipse, Oozie, Sqoop, Flume, SOLR, NiFi, OAuth, Teradata, FastLoad, MultiLoad, Netezza, ZooKeeper.

 

Client                    :               Cerner Corporation                                                                  Aug 2013 – July 2014

Location               :               Kansas City, MO                                              

Role                       :               Hadoop Developer

Project Description: Cerner Corporation is an American supplier of health information technology (HIT) solutions, services, devices and hardware. Cerner operates some of the world's largest health informatics properties, mediating petabyte-scale health data.

Responsibilities:

·         Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.

·         Involved in planning and implementation of an additional 10-node Hadoop cluster for data warehousing, historical data storage in HBase and sampling reports.

·         Used Sqoop extensively to import data from RDBMS sources into HDFS.

·         Performed Data transformations, cleaning, and filtering on imported data using Hive, Map Reduce, and loaded final data into HDFS.

·         Developed Pig UDFs to pre-process data for analysis.

·         Worked with business teams and created Hive queries for ad hoc processing.

·         Responsible for creating Hive tables and partitions, loading data and writing Hive queries.

·         Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.

·         Worked on Oozie to automate job flows.

·         Handled Avro and JSON data in Hive using Hive SerDes.

·         Integrated Elasticsearch and implemented dynamic faceted search.

·         Created MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files.

·         Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.

·         Worked in Agile Environment.

·         Communicated effectively and made sure that the business problem was solved.

·         Created files and tuned SQL queries in Hive using HUE.

·         Created Hive external tables using the Accumulo connector.

·         Generated summary reports utilizing Hive and Pig and exported these results via Sqoop for Business reporting and intelligence analysis.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Java, Eclipse, SQL Server, Apache Flume, Shell Scripting, Zookeeper.

 

Client                    :                Acxiom                                                                                        Nov 2012 – July 2013

Location               :               Conway, AR                                              

Role                       :               Hadoop Developer

Project Description: Acxiom Corporation is a provider of marketing technology and information management services, including multichannel marketing, addressable advertising and database management. The company wanted to retire its legacy SQL Server database due to an increasing customer base and growing data.

Responsibilities:

·         Developed complex MapReduce jobs in Java to perform data extraction, aggregation, transformation and performed rule checks on multiple file formats like XML, JSON, CSV.

·         Implemented schedulers on the JobTracker to share cluster resources among the MapReduce jobs submitted to the cluster.

·         Used Sqoop to import and export the data from HDFS.

·         Moved data from HDFS to Cassandra using MapReduce and BulkOutputFormat class.

·         Participated with the admin team in designing and migrating the cluster from CDH to HDP.

·         Developed helper classes that abstract the Cassandra cluster connection and act as a core toolkit.

·         Involved in Agile methodologies, daily Scrum meetings, Sprint planning.

·         Wrote query mappers and JUnit test cases; experienced with MQ.

·         Created dashboards in Tableau to create meaningful metrics for decision making.

·         Involved in designing the next generation data architecture for the unstructured and semi structured data.

·         Worked with the team that analyzes system failures, identifying root causes and taking necessary action.

Environment: HDFS, MapReduce, Cassandra, Pig, Hive, Sqoop, Maven, Log4j, Junit, Tableau

Client                    :                Polaris                                                                                          Jan 2009 – Oct 2012

Location               :               Chennai, India                                             

Role                       :               Java Developer

Project Description: Polaris provides financial technology products, legacy modernization services and consulting for core banking, corporate banking, wealth and asset management, and insurance. This is an FPX maintenance system for handling all internal processes such as users, role-based access and merchants, along with their module-wise charges and all reporting.

Responsibilities:

·         Involved in client meetings to gather the System requirements.

·         Generated Use case, class and sequence diagrams using Rational Rose.

·         Wrote JavaScript, HTML, CSS, Servlets and JSP to design the GUI of the application.

·         Strong hands-on knowledge of Core JAVA, Web-based Applications and OOPS concepts.

·         Developed the application using Agile/Scrum methodology which involves daily stand ups.

·         Developed Server-Side technologies using Spring, Hibernate, Servlets/JSP, Multi-threading.

·         Extensively worked with the retrieval and manipulation of data from the Oracle database by writing queries using SQL and PL/SQL.

·         Implemented the persistence layer using Hibernate, which uses POJOs to represent the persistent data.

·         Used JDBC to connect the J2EE server with the relational database.

·         Involved in the development of RESTful web services using JAX-RS in a Spring-based project.

·         Developed the web application by setting up the environment and configuring the application and the WebLogic Application Server.

·         Implemented back-end services using Spring annotations to retrieve user data from the database.

·         Involved in writing AJAX scripts so that requests are processed quickly.

·         Used the dependency injection and AOP features of Spring.

·         Implemented unit test cases using the JUnit framework.

·         Identified issues in the production system and provided the information to the app support team.

·         Involved in bug triage meetings with the QA and UAT teams.

Environment: Spring, Hibernate, CSS, AJAX, HTML, Java Script, Rational Rose, UML, Junit, Servlets, JDBC, RESTful API, JSF, JSP, Oracle, SQL, PL/SQL.

 

 

Client                    :                Ideas                                                                                         April 2008 – Dec 2008

Location               :                Bangalore, India                                             

Role                       :               Java Developer

Project Description: Ideas is a commercial credit and invoice management hub providing cash flow visibility to both businesses and lenders.

Responsibilities:

·         Involved in various SDLC phases like Requirements gathering and analysis, Design, Development and Testing.

·         Developed the business methods as per the IBM Rational Rose UML Model.

·         Extensively used Core Java, Servlets, JSP and XML.

·         Used HQL, native SQL and Criteria queries to retrieve data from the database.

·         Understood new CRs and service requests, provided development time estimates, and designed the database according to the business requirements.

·         Wrote client-side and server-side validations.