The problem here is that for workspace 1 and workspace 2 only 64 IP addresses are allocated, of which 5 are reserved for internal usage, so only 57 IP addresses are usable per subnet. Don't ask me why this was set to 64 initially; it was set up a while ago, and let's not step into that. This directly relates to the underlying clusters provisioned by the workspace, which means we can only provision up to 57 nodes/clusters for that workspace.

Now we are in a situation where workspace 1 is used quite extensively with large data sets and requires more nodes to process the data quickly. Since we are tied directly to the number of IP addresses in the attached subnets, we cannot exceed 57 nodes, and when the Databricks cluster tries to scale up it fails with an error message saying there are not enough IP addresses.

This is where we wanted to expand the number of IPs in private and public subnets 1. We cannot simply expand those subnets, because the next range in the sequence is already assigned to private and public subnets 2 for workspace 2. At the time of writing this article, the KB article suggests that changing the VNET is not possible without recreating the workspace, but there are no details about changing the CIDR, so I assumed I could make changes to the underlying subnets. I was thinking of making the changes below to the subnets so that we can expand subnets 1.

Proposed Network Architecture Description

If you notice, I wanted the IP range 10.1.0.x (256 IP address space) to be used solely by workspace 1, and 10.1.1.x to be assigned to workspace 2. Since the subnets are part of the same VNET, they use the same NSG, and the firewall rules are not changed. Necessary access is granted to the service endpoints for all the subnets through the NSG.

I started with workspace 2, as workspace 1 is critical and I wanted to avoid taking a chance with it. I moved workspace 2 to the proposed IP range successfully; however, I ended up with another issue. The cluster started successfully and I could run queries using a notebook. Listing files and folders in the mounted data lake storage worked perfectly fine, but when the data was read (even a small data set) it never completed and kept running forever.

Upon investigating further I didn't see any errors in the cluster logs; however, I could see the warning below continuously written to the logs:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I had seen this warning before, when a slave node is not ready to accept jobs. This led me to the assumption that, for some reason, the control plane is unable to communicate with the data plane, and I suspected that the NSG created internally by Microsoft in the managed resource group might be the problem. To validate this further, I picked one of the slave nodes and tried pinging it from the notebook (which is initiated from the master node); below is the query to ping the slave node from the master node through the notebook.
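As a stand-in for the original snippet, here is a minimal sketch of that kind of connectivity check, run from a notebook cell on the driver; the worker IP is a placeholder that would normally be taken from the cluster UI. (The same check can be done with the %sh magic in a Databricks notebook, e.g. %sh ping -c 4 <worker-ip>.)

```python
# Hypothetical sketch: ping one slave (worker) node from the master (driver)
# node inside a notebook cell. The IP address below is a placeholder.
import subprocess

worker_ip = "10.1.1.68"  # placeholder: private IP of one worker node

# -c 4 sends four echo requests; -w 10 gives up after 10 seconds so the cell
# finishes even when there is no connectivity at all.
result = subprocess.run(
    ["ping", "-c", "4", "-w", "10", worker_ip],
    capture_output=True,
    text=True,
)
print(result.stdout)
print(result.stderr)
```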
As expected, there wasn't any connectivity to the slave nodes, which proves that the control plane is unable to communicate with the data plane. I rolled back the subnet changes to workspace 2, and after this I could read the data correctly from the data lake using the Databricks notebook. This is where I thought that changing the CIDR for the underlying subnets might not be supported, and that I would probably have to recreate the workspace. I raised a case with Microsoft support to find out whether this is possible. Simultaneously, I have also tested extending the CIDR for workspace 1 within the same IP range instead of spilling into the next range.
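For illustration only, this is roughly what such an in-place CIDR change looks like when scripted with the Azure Python SDK (azure-mgmt-network); it is not the script from this post, and every name and prefix in it is a placeholder.

```python
# Hypothetical sketch of widening a subnet's address prefix with the Azure
# Python SDK. All names and prefixes are placeholders, not values from this
# post. Requires the azure-identity and azure-mgmt-network packages.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"
resource_group = "databricks-rg"            # placeholder
vnet_name = "databricks-vnet"               # placeholder
subnet_name = "workspace1-public-subnet"    # placeholder

client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Fetch the existing subnet so its delegation and NSG association are kept,
# then widen the prefix (for example /26 -> /25) and push the update.
subnet = client.subnets.get(resource_group, vnet_name, subnet_name)
subnet.address_prefix = "10.1.0.0/25"
poller = client.subnets.begin_create_or_update(
    resource_group, vnet_name, subnet_name, subnet
)
print(poller.result().address_prefix)
```

Whether Databricks actually honours an in-place change like this is exactly the question the support case was meant to answer.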