Slow LS operation on directories with a large number of immediate descendants

Description

The LS operation is slow on directories with a large number of immediate descendants.

In HDFS, the contents of a large directory are returned to the client in small batches (the default batch size is 1000). After receiving a batch, the client sends another 'ls' request with an index indicating the files/directories it has already received. This works well for HDFS because it keeps the file metadata in memory.

Currently, this batching technique is poorly ported to HopsFS. In HopsFS, every 'ls' request reads all of the directory's files from the database and then returns only a small batch (I know this is silly), which slows down the LS operation. ClusterJ supports 'limits', so we can start with that. We will also have to sort the rows; otherwise ClusterJ will return rows in arbitrary order, and the "already received" index would not identify a stable position in the listing.
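
A rough sketch of the intended behaviour, in plain Java with no ClusterJ calls. An in-memory sorted set stands in for the database table, and the names DirectoryTable, readBatch, and listAll are made up for illustration; they are not HopsFS or ClusterJ APIs. The point is only that each request reads at most one batch, in sorted order, starting after the last name the client already has:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class ListingSketch {

    // Hypothetical stand-in for the table that holds the children of one directory.
    // The TreeSet keeps names sorted, playing the role of an ordered scan.
    static final class DirectoryTable {
        private final NavigableSet<String> children = new TreeSet<>();

        void add(String name) {
            children.add(name);
        }

        // Returns at most `limit` names strictly greater than `startAfter`, in sorted order.
        // Only the requested batch is collected; the rest of the directory is not copied.
        List<String> readBatch(String startAfter, int limit) {
            List<String> batch = new ArrayList<>();
            for (String name : children.tailSet(startAfter, false)) {
                if (batch.size() == limit) {
                    break;
                }
                batch.add(name);
            }
            return batch;
        }
    }

    // Client-side loop: keep asking for the next batch, passing the last name received.
    // The empty string means "start from the beginning".
    static List<String> listAll(DirectoryTable table, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> all = new ArrayList<>();
        String startAfter = "";
        while (true) {
            List<String> batch = table.readBatch(startAfter, batchSize);
            all.addAll(batch);
            if (batch.size() < batchSize) {
                break; // a short batch means the directory is exhausted
            }
            startAfter = batch.get(batch.size() - 1);
        }
        return all;
    }
}
```

In the real fix, readBatch would presumably become a ClusterJ query that applies both a sort order and a limit, so the database returns only one batch per request. Resuming from the last name received, rather than from a numeric offset, is a design choice made in this sketch, not something this ticket decides.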

Status

Assignee

Unassigned

Reporter

Salman Niazi

Labels

Fix versions

Affects versions

2.8.2

Priority

Medium