Hadoop FileSystem API

spark

The Hadoop FileSystem API can access the cluster's default filesystem, such as HDFS, directly by using the XML configuration (for example core-site.xml) deployed on the cluster. It can also access other filesystems, such as ADLS, as long as they provide a Hadoop-compatible implementation.

The key point is that you must set the required credentials or other settings in the Hadoop configuration to authorize access to the new namespace. When you obtain the filesystem, resolve it from the fully qualified path rather than calling the static method FileSystem.get(conf). That method always returns the default filesystem configured for the cluster (fs.defaultFS), regardless of which path you intend to read.
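As a rough illustration of both points, the sketch below sets a credential for an ADLS Gen2 account and then resolves the filesystem from the path. It assumes a spark-shell session (so `spark` is already defined) with the hadoop-azure module on the classpath. The account name, container name, and the ADLS_ACCOUNT_KEY environment variable are hypothetical placeholders, and account-key authentication is only one of several options your environment might use.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopConf = spark.sparkContext.hadoopConfiguration

// Hypothetical storage account and container, for illustration only
val account = "myaccount"
val container = "mycontainer"

// Authorize the new namespace. The key is read from a hypothetical
// environment variable instead of being hard-coded.
hadoopConf.set(
  s"fs.azure.account.key.$account.dfs.core.windows.net",
  sys.env("ADLS_ACCOUNT_KEY"))

// Fully qualified path: scheme + authority + path
val adlsPath = new Path(s"abfss://$container@$account.dfs.core.windows.net/data")

// Resolved from the path's scheme and authority, so this targets ADLS
val adlsFs = adlsPath.getFileSystem(hadoopConf)

// Resolved from fs.defaultFS, so this still targets the cluster default
val defaultFs = FileSystem.get(hadoopConf)

println(s"from path:    ${adlsFs.getUri}")
println(s"from default: ${defaultFs.getUri}")
```

If the two printed URIs differ, operations on `defaultFs` would silently go to the cluster's own filesystem, which is why the path-based lookup is the safer habit.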

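import org.apache.hadoop.fs.Path

// The path must be fully qualified, including the scheme (for example hdfs).
// The segment right after "hdfs://" is parsed as the NameNode authority,
// so "namenode:8020" is a placeholder for your own cluster.
val path = "hdfs://namenode:8020/tmp/data"
val hadoopConf = spark.sparkContext.hadoopConfiguration
// Alternatively (sessionState is an unstable Spark API):
// val hadoopConf = spark.sessionState.newHadoopConf()
val qualifiedPath = new Path(path)
// Resolve the filesystem from the path's scheme and authority, not from fs.defaultFS
val fs = qualifiedPath.getFileSystem(hadoopConf)
// fs now points at the filesystem that owns the path's namespace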
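Once the filesystem handle is resolved, the usual FileSystem calls work the same way on any backend. The following is a minimal sketch that uses the local filesystem (a file: URI) so it can run without a cluster. The temporary directory and the file name hello.txt are made up for the example, and in spark-shell you would pass the Spark Hadoop configuration instead of a fresh Configuration.

```scala
import java.nio.file.Files
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

val conf = new Configuration()

// A throwaway local directory, addressed through a fully qualified file: URI
val dir = new Path(Files.createTempDirectory("fs-demo").toUri)
val fs = dir.getFileSystem(conf)

// Write a small file (overwrite = true)
val file = new Path(dir, "hello.txt")
val out = fs.create(file, true)
try out.write("hello".getBytes("UTF-8")) finally out.close()

// Inspect what was written
val fileExists = fs.exists(file)
val names = fs.listStatus(dir).map(_.getPath.getName)
names.foreach(println)

// Clean up recursively
fs.delete(dir, true)
```

The same calls (exists, create, listStatus, delete) apply unchanged when `dir` is an hdfs or abfss path, provided the configuration carries the settings that filesystem needs.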