Monday, September 7, 2015

Hadoop File Already Exists Exception : org.apache.hadoop.mapred.FileAlreadyExistsException

Hello folks!
The aim of this article is to make developers aware of an issue they may run into while developing a MapReduce application. The error "org.apache.hadoop.mapred.FileAlreadyExistsException" is one of the most basic exceptions that every beginner faces when writing their first MapReduce program.

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/home/facebook/crawler-output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:269)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:142)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at com.wagh.wordcountjob.WordCount.main(WordCount.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Let's start from scratch.

To run a MapReduce job, you use a command similar to the following:

 $ hadoop jar {name_of_the_jar_file.jar} {fully_qualified_main_class} {hdfs_input_path} {output_directory_path}

Example: hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output

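Pay attention to {output_directory_path}, i.e. /home/facebook/crawler-output. If this directory already exists in HDFS, Hadoop throws "org.apache.hadoop.mapred.FileAlreadyExistsException" when you submit the job. As the stack trace above shows, the check happens in FileOutputFormat.checkOutputSpecs during job submission, before any map or reduce task runs, because Hadoop refuses to overwrite an existing output directory.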
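The check behind this error can be modeled in a few lines. This is a simplified illustration, not Hadoop's source: OutputSpecCheck, OutputExistsException and the exists predicate are hypothetical names, and a Set of paths stands in for HDFS. It mirrors the idea that the output directory is validated at submission time and an existing directory is rejected outright.

```java
import java.util.Set;
import java.util.function.Predicate;

/**
 * Simplified model (hypothetical, not Hadoop code) of the output check
 * that runs when a job is submitted, before any task starts.
 */
public final class OutputSpecCheck {

    /** Hypothetical stand-in for org.apache.hadoop.mapred.FileAlreadyExistsException. */
    public static final class OutputExistsException extends RuntimeException {
        public OutputExistsException(String message) {
            super(message);
        }
    }

    private OutputSpecCheck() {}

    /** Throws if no output directory is given or if it already exists. */
    public static void checkOutputSpecs(String outputDir, Predicate<String> exists) {
        if (outputDir == null || outputDir.isEmpty()) {
            throw new IllegalArgumentException("Output directory not set.");
        }
        if (exists.test(outputDir)) {
            // Refuse to run rather than overwrite the results of an earlier job.
            throw new OutputExistsException("Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) {
        // A Set of paths stands in for the HDFS namespace (assumption for illustration).
        Set<String> hdfs = Set.of("/home/facebook/crawler-output");

        checkOutputSpecs("/home/facebook/crawler-output-1", hdfs::contains); // passes silently

        try {
            checkOutputSpecs("/home/facebook/crawler-output", hdfs::contains);
        } catch (OutputExistsException e) {
            // prints "Output directory /home/facebook/crawler-output already exists"
            System.out.println(e.getMessage());
        }
    }
}
```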

Solution: always specify an output directory that does not exist yet when you run the job. Hadoop creates the output directory automatically, so you do not need to create it yourself.

For the example above, the same command can be run like this:

 hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1

Hadoop will then create the output directory crawler-output-1 at runtime.
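If you rerun the same job often, the driver could pick a fresh directory name itself instead of relying on you to type a new one each time. The following is a minimal sketch under stated assumptions: OutputDirPicker and pickFreeOutputDir are hypothetical names, not part of Hadoop, and the exists predicate stands in for a real filesystem lookup. It tries the base name first, then appends -1, -2, and so on, matching the crawler-output-1 naming used above.

```java
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical helper (not a Hadoop API) that picks an unused output directory name. */
public final class OutputDirPicker {

    private OutputDirPicker() {}

    /**
     * Returns base if it is free, otherwise base-1, base-2, ... up to maxAttempts.
     * The exists predicate is an assumed stand-in for an HDFS existence check.
     */
    public static String pickFreeOutputDir(String base, Predicate<String> exists, int maxAttempts) {
        if (!exists.test(base)) {
            return base;
        }
        for (int i = 1; i <= maxAttempts; i++) {
            String candidate = base + "-" + i;
            if (!exists.test(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException(
                "No free output directory found for " + base + " after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // A Set of paths stands in for HDFS (assumption for illustration).
        Set<String> existing = Set.of("/home/facebook/crawler-output");
        // prints "/home/facebook/crawler-output-1"
        System.out.println(pickFreeOutputDir("/home/facebook/crawler-output", existing::contains, 100));
    }
}
```

In a real driver, the predicate would typically wrap Hadoop's FileSystem.exists(Path); that method throws IOException, so it has to be wrapped before it can serve as a Predicate. Alternatively, if the previous results are no longer needed, you can delete the old directory before rerunning, for example with hdfs dfs -rm -r /home/facebook/crawler-output.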

Comments:

  1. Hi Rahul,

    I am facing the same problem with "yarn-cluster"; in local mode it works fine. In cluster mode, the first node works fine, but after it finishes, the other nodes throw this exception: FileAlreadyExistsException.

    Any idea about this?

    Thanks,
    Ajeet

  2. Hi Ajeet,

    Based on your problem description, check whether the directory already exists on the other nodes. Because of replication, the directory might exist on the other nodes. Try removing the directory from the other nodes and then specify the output directory at run time. I hope this solves your problem.

    Regards
    Rahul wagh

  3. Hi!

    Please can you help, you are my only hope...

    I am completely new to this, and even after reading the above I still can't get the right file in my output. It's driving me crazy! Can you help?

    1. Hello, thanks for your comment.

      In my experience, the issue is probably related to the output path.
      Always remember: never specify an output directory that you have already created in HDFS.

      Example : /your_path/output_directory

      So "output_directory" should never pre-exist in HDFS. Just give it any new name at runtime and you will not get the exception.

      Let me know how it goes.

      Regards
      Rahul wagh
