Monday, September 7, 2015

Hadoop File Already Exists Exception : org.apache.hadoop.mapred.FileAlreadyExistsException



Hello folks!
The aim of this article is to make developers aware of an issue they might face while developing a MapReduce application. The error "org.apache.hadoop.mapred.FileAlreadyExistsException" is one of the most basic exceptions that almost every beginner faces while writing their first MapReduce program.

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/home/facebook/crawler-output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:269)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:142)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at com.wagh.wordcountjob.WordCount.main(WordCount.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Let's start from scratch.

To run a MapReduce job, you have to write a command similar to the one below:

 $hadoop jar {name_of_the_jar_file.jar} {fully_qualified_main_class} {hdfs_file_path_on_which_you_want_to_perform_map_reduce} {output_directory_path}

Example: hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output

Just pay attention to the {output_directory_path}, i.e. /home/facebook/crawler-output. If this directory already exists in your HDFS, Hadoop will throw the exception "org.apache.hadoop.mapred.FileAlreadyExistsException".
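
For context, here is a minimal, illustrative sketch of what a driver class like com.wagh.wordcountjob.WordCount (seen in the stack trace above) typically looks like; the mapper/reducer wiring is omitted, and only the class name and the two path arguments come from this post:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            // Mapper, reducer and key/value classes would be configured here.
            FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /home/facebook/facebook-cocacola-page.txt
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /home/facebook/crawler-output
            // At submit time, FileOutputFormat.checkOutputSpecs() throws
            // FileAlreadyExistsException if args[1] already exists in HDFS.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Note that the check fails at job submission, before a single mapper has run, which is why the stack trace above goes through Job.waitForCompletion() and JobSubmitter.checkSpecs().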

Solution: Always specify a new output directory name at runtime (Hadoop will create the directory automatically for you; you do not need to worry about creating it).

As mentioned in the above example, the same command can be run in the following manner: "hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1"

The output directory {crawler-output-1} will then be created at runtime by Hadoop.
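
If you would rather reuse the same output path on every run, remove it before the job is submitted, either manually with "hdfs dfs -rm -r /home/facebook/crawler-output" or from the driver code itself. Below is a minimal sketch using Hadoop's FileSystem API (the helper class name OutputDirCleaner is just an illustration, not part of the original program):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OutputDirCleaner {
        // Deletes the given output directory from HDFS if it already exists,
        // so that a rerun of the job does not hit FileAlreadyExistsException.
        public static void deleteIfExists(Configuration conf, String dir) throws IOException {
            Path outputPath = new Path(dir);
            FileSystem fs = outputPath.getFileSystem(conf);
            if (fs.exists(outputPath)) {
                fs.delete(outputPath, true); // true = recursive; this discards the old results
            }
        }
    }

Call it from the driver before job.waitForCompletion(), e.g. OutputDirCleaner.deleteIfExists(conf, args[1]). Be careful with this approach: the previous results are deleted for good, which is exactly why Hadoop refuses to overwrite an existing output directory by default.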

6 comments:

  1. Hi Rahul,

    I am facing the same problem with "yarn-cluster"; in local mode it is working fine. In cluster mode, the first node works fine, but after the execution of the first node the other nodes throw this exception - FileAlreadyExistsException.

    Any idea about this?

    Thanks,
    Ajeet

  2. Hi Ajeet,

    As per your problem description, just check whether the directory already exists on the other nodes. Because of replication, the directory might exist on the other nodes. Try removing the directory from the other nodes and then specify the output directory at runtime. Hope this solves your problem.

    Regards
    Rahul wagh

  3. Hi!

    Please can you help, you are my only hope...

    I am completely new to this, and even after following the above I still can't get the right file in my output. It's driving me crazy! Can you help?

    Replies
    1. Hello! Thanks for your comment.

      In my experience, the issue might be related to the output file path.
      Always remember: never specify an output directory name that you have already created in HDFS.

      Example : /your_path/output_directory

      So "output_directory" should never pre exist in HDFS. Just give any random name which comes in your mind at runtime and you will not get exception.

      Let me know your end result

      Regards
      Rahul wagh

  4. Thanks for sharing this article. You may also refer to http://www.s4techno.com/blog/2016/07/11/hadoop-administrator-interview-questions/
