Monday, September 7, 2015

Hadoop File Already Exists Exception : org.apache.hadoop.mapred.FileAlreadyExistsException

Hello folks!
The aim of this article is to make developers aware of an issue they are likely to hit while developing a MapReduce application. The exception "org.apache.hadoop.mapred.FileAlreadyExistsException" is one of the most basic errors that almost every beginner runs into when writing a first MapReduce program:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/home/facebook/crawler-output already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(...)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(...)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(...)
    at org.apache.hadoop.mapreduce.Job.submit(...)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(...)
    at com.wagh.wordcountjob.WordCount.main(...)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(...)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(...)
    at java.lang.reflect.Method.invoke(...)
    at org.apache.hadoop.util.RunJar.main(...)

Let's start from scratch.

To run a MapReduce job, you issue a command of the following form:

 $ hadoop jar {jar_file_name.jar} {fully_qualified_main_class} {hdfs_input_path} {output_directory_path}

Example: hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output

 Pay attention to the {output_directory_path}, i.e. /home/facebook/crawler-output. If this directory already exists in HDFS, Hadoop will throw the exception "org.apache.hadoop.mapred.FileAlreadyExistsException" before the job even starts.
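Conceptually, the pre-flight check behind this exception is very simple. Here is a plain-shell sketch of the same logic, using a local directory as a stand-in for the HDFS path (an illustration only, not Hadoop's actual code, which performs this check in FileOutputFormat.checkOutputSpecs):

```shell
# Local stand-in for the HDFS output path from the article's example
OUT=./crawler-output

# Simulate a directory left over from a previous run
mkdir -p "$OUT"

# The same kind of check Hadoop performs before starting the job:
if [ -d "$OUT" ]; then
    echo "Output directory $OUT already exists"
    # a real job would abort here with FileAlreadyExistsException
fi
```

Because the check runs at job submission, no mapper or reducer ever executes; the job fails immediately.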

 Solution: Always specify a fresh output directory name at run time. Hadoop creates the directory for you automatically, so you need not (and must not) create it in advance; it simply has to not exist when the job is submitted.

 Continuing the example above, the same command can be run as follows: "hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount /home/facebook/facebook-cocacola-page.txt /home/facebook/crawler-output-1"

 The output directory {crawler-output-1} will then be created at runtime by Hadoop.
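Picking a new directory name on every run can be automated by stamping the path with the current time (a sketch; the paths and jar name come from the article's example, and the commented-out commands assume the hadoop/hdfs CLIs are available on a cluster):

```shell
# Build a fresh output directory name for this run, e.g.
# /home/facebook/crawler-output-20150907153000
OUTPUT="/home/facebook/crawler-output-$(date +%Y%m%d%H%M%S)"
echo "$OUTPUT"

# Alternative fix: remove the stale directory instead of renaming
# (-f suppresses the error if it does not exist):
# hdfs dfs -rm -r -f /home/facebook/crawler-output

# Submit the job with the fresh path:
# hadoop jar facebookCrawler.jar com.wagh.wordcountjob.WordCount \
#     /home/facebook/facebook-cocacola-page.txt "$OUTPUT"
```

The timestamp approach keeps earlier results around for comparison, while the delete approach reuses one well-known path; which is preferable depends on whether old output is still needed.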