Monday, September 7, 2015

Hadoop File Already Exists Exception : org.apache.hadoop.mapred.FileAlreadyExistsException

Hello folks!
The aim of this article is to make developers aware of an issue they are likely to hit while developing a MapReduce application. The error above, "org.apache.hadoop.mapred.FileAlreadyExistsException", is one of the most basic exceptions that almost every beginner encounters when writing a first MapReduce program:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/home/facebook/crawler-output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:269)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:142)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at com.wagh.wordcountjob.WordCount.main(WordCount.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Let's start from scratch.

To run a MapReduce job, you submit a command similar to the one below:

 $ hadoop jar {name_of_the_jar_file.jar} {fully_qualified_main_class} {hdfs_input_file_path} {output_directory_path}

Example: hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output

Pay attention to the {output_directory_path}, i.e. /home/facebook/crawler-output. If this directory already exists in HDFS, Hadoop will throw the exception "org.apache.hadoop.mapred.FileAlreadyExistsException". As the stack trace above shows, the check happens in FileOutputFormat.checkOutputSpecs() at job-submission time, before any map or reduce task runs.
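
To make this concrete, here is a minimal driver sketch in the spirit of the WordCount example above. The mapper and reducer setup is omitted for brevity, and the exact class layout is an assumption (only the class name com.wagh.wordcountjob.WordCount appears in the stack trace). The line to watch is FileOutputFormat.setOutputPath(): the path passed there must not exist yet.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            // ... setMapperClass / setReducerClass / output key-value classes go here ...
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // This path must NOT already exist in HDFS; if it does,
            // FileOutputFormat.checkOutputSpecs() throws
            // FileAlreadyExistsException at job-submission time.
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }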

Solution: Always specify a new output directory name at run time. Hadoop will create the directory automatically for you; you need not (and should not) create it in advance.

Continuing the example above, the same command can be run as follows: "hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1"

The output directory {crawler-output-1} will then be created at runtime by Hadoop.
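
If you would rather reuse the same directory name, remove the leftover directory before re-running the job, either from the shell with "hdfs dfs -rm -r /home/facebook/crawler-output" or programmatically in the driver. Below is a minimal sketch of the programmatic route; the class and method names (OutputCleaner, deleteIfExists) are made up for illustration, but FileSystem.get(), exists(), and delete() are the standard org.apache.hadoop.fs API. Be aware that this permanently discards the previous run's output.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OutputCleaner {
        // Recursively removes a leftover output directory so that
        // FileOutputFormat.checkOutputSpecs() accepts the job again.
        // Caution: the previous run's results are deleted for good.
        public static void deleteIfExists(Configuration conf, String dir) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path outputPath = new Path(dir);
            if (fs.exists(outputPath)) {
                fs.delete(outputPath, true); // true = delete contents recursively
            }
        }
    }

A driver would call OutputCleaner.deleteIfExists(conf, args[1]) just before FileOutputFormat.setOutputPath().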

Comments:

  1. Hi Rahul,

    I am facing the same problem with "yarn-cluster"; in local mode it works fine. In cluster mode the first node works fine, but after the first node finishes, the other nodes throw this exception - FileAlreadyExistsException.

    Any idea about this?

    Thanks,
    Ajeet

  2. Hi Ajeet,

    Going by your description, check whether the output directory already exists by the time the later attempts run - the first attempt will have created it, which is why the others throw the exception. Remove that directory, or specify a fresh output directory at run time. Hope this solves your problem.

    Regards
    Rahul Wagh

  3. Hi!

    Please can you help, you are my only hope...

    I am completely new to this, and even after following the advice above I still can't get the right file in my output. It's driving me crazy! Can you help?

    Replies
    1. Hello! Thanks for your comment.

      In my experience, the issue is most likely the output file path.
      Remember: never specify an output directory name that you have already created in HDFS.

      Example: /your_path/output_directory

      So "output_directory" should never pre exist in HDFS. Just give any random name which comes in your mind at runtime and you will not get exception.

      Let me know how it goes.

      Regards
      Rahul Wagh
