
Dec 30, 2013

Hadoop for the R Data Scientist

I don’t exactly know where to start. But after a very pleasant discussion with a former colleague, it seems there are many things around the Hadoop ecosystem and R that should be said for data scientists: people who may not know much about big data architecture, but who should know the essentials of the simple architecture that lets them run better analyses in the best conditions.

In any case, good knowledge of how R can use the Hadoop platform to run better analyses is very important.

A small plan:

- Hadoop ecosystem

- R and Hadoop

- Launching a big data job from R using Hadoop



Hadoop ecosystem


We use Hadoop when:

- you need agility

- you need to perform analysis on a diversity of data sources

- your architecture needs to evolve over time

- you need to reduce your costs

Hadoop is:

- an ecosystem

- designed for storage and analytical computation

- designed for running analyses in parallel

When we talk about Hadoop, we deal with:

HDFS (Hadoop Distributed File System): the core of the solution

Map Reduce: uses data from HDFS and executes algorithms based on the map-reduce paradigm

High-level languages (Pig, Hive): query languages that embed the map-reduce paradigm to solve complex problems

HBase: the layer on top of HDFS storage used to build and manipulate data when random access is needed.

Pig or Hive? I don’t know. As you feel.

You can form your own opinion by reading http://stackoverflow.com/questions/3356259/difference-between-pig-and-hive-why-have-both, but it’s important to remember that, as data scientists, it is always better to work with structured data. When we have to structure it, we should think about which language best suits the task. Both languages, for sure, run on the map-reduce paradigm, and every operation is reduced to map and reduce steps.

Another thing to remember:

- Hive: HQL is like SQL

- Pig: Pig Latin is a bit like Perl
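To make the difference concrete, here is a rough sketch (the table logs and its two fields are invented for the example): counting records per user is one declarative statement in HQL, but a step-by-step data flow in Pig Latin.

In Hive:

SELECT user, COUNT(*) FROM logs GROUP BY user;

In Pig:

logs = LOAD 'logs' AS (user:chararray, action:chararray);
grouped = GROUP logs BY user;
counts = FOREACH grouped GENERATE group, COUNT(logs);
DUMP counts;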

In my next post, I will focus on Hive and Pig. For now, I just want to point out that Hive and Pig are components of the Hadoop ecosystem, and data scientists should know how to deal with them.

There’s a good post on the Internet explaining how to install Hadoop and how to connect it with R. In my opinion, the best is Tutorial R and Hadoop, except the part explaining that we need Homebrew to set up the Hadoop environment.

Interact with Hadoop

In the terminal, we can run basic operations: for example, picking data up from the cluster and loading it into memory for analysis with R.
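As a sketch, assuming Hadoop is installed under ~/hadoop and that mag.csv sits in ~/hadoop/data (the same paths as in the system() call below), the terminal session could look like:

$ ~/hadoop/bin/hadoop fs -ls ~/hadoop/data
$ ~/hadoop/bin/hadoop fs -copyToLocal ~/hadoop/data/mag.csv ~/Documents/Recherches/test2.csv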

We can then load this data with an R command.
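For instance (a minimal sketch; the name mag is only illustrative, and we assume the copied file has a header row):

> mag <- read.csv("~/Documents/Recherches/test2.csv")
> head(mag)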

We can also do this from the R console, without dealing with the Mac’s terminal:
> system("~/hadoop/bin/hadoop fs -copyToLocal ~/hadoop/data/mag.csv ~/Documents/Recherches/test2.csv")

The main Hadoop commands can be found here: http://hadoop.apache.org/docs/r0.18.3/hdfs_shell.html

A small map-reduce job

Map-reduce is a programming pattern which aids in the parallel analysis of data.
The name comes from the two parts of the algorithm:

map = identify the subject of the data by key
reduce = group by the identified key and run the analysis
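As a loose analogy in plain, local R (not Hadoop code): map is like attaching a key to every record, and reduce is like tapply() computing one result per key:

> notes <- c(4.2, 9.6, 6.3, 9.3, 13.9)
> keys <- c("F", "V", "F", "V", "R")
> tapply(notes, keys, mean)
    F     R     V 
 5.25 13.90  9.45 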

There are many packages to run map-reduce jobs in R:

* HadoopStreaming
* Hive
* Rhipe
* RHadoop (with rmr2, rhbase, rhdfs), maintained by Revolution Analytics, which provides good functions for interacting with the Hadoop environment
etc. For sure, I overlook many other good packages.

Let us see how to run a simple map-reduce job using RHadoop. Suppose that we have this data:
> x=sample(rep(c("F","V","R"),10000),size=1000000,replace=T)
> df=data.frame(value=x, note=abs(9*rnorm(1000000)))
> head(df)
  value      note
1     F  4.209874
2     V  9.587087
3     F  6.323354
4     V  9.274668
5     R 13.886767
6     V  5.273159
> dim(df)
[1] 1000000       2

And we want to determine which values have a mean "note" greater than the mean over all values.
If we had to do this in plain R, we would run:
> meanAll = mean(df$note)
> meanAll
[1] 7.18068
> meanGroup<-aggregate(x=df$note,by=list(df$value),FUN=mean)
> meanGroup
  Group.1        x
1       F 7.170956
2       R 7.189213
3       V 7.181848
> index =meanGroup$x>=meanAll
> index
[1] FALSE  TRUE  TRUE
> meanGroup$Group.1[index]
[1] R V

If we want to do this with map-reduce, we do something like this:
library(rmr2)                          # RHadoop package providing mapreduce()
demo <- to.dfs(df)                     # push the data frame to the DFS
monMap <- function(k, v)
{
  # v is a chunk of df: key each note by its group in the "value" column
  keyval(v$value, v$note)
}
monReduce <- function(k, val)
{
  # all notes sharing one key arrive together: return their mean
  keyval(k, mean(val))
}
job <- mapreduce(input = demo, map = monMap, reduce = monReduce)
from.dfs(job)
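from.dfs() returns a key-value object; the rmr2 helpers keys() and values() extract the two sides. As a sketch, finishing the original question (which groups beat the overall mean) could look like:

out <- from.dfs(job)
meanGroup <- data.frame(value = keys(out), note = values(out))
meanAll <- mean(df$note)               # overall mean, computed locally
meanGroup$value[meanGroup$note >= meanAll]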

Some helpful literature around map-reduce:
http://www.asterdata.com/wp_mapreduce_and_the_data_scientist/
https://class.coursera.org/datasci-001/lecture/71
http://www.information-management.com/ad_includes/welcome_imt.html

In my next post, I will talk about Pig and Hive for preparing datasets before machine learning.




