Opened 16 months ago

Last modified 14 months ago

#2 assigned defect

DAV is unreasonably slow

Reported by: mbooth
Owned by: caolan
Priority: major
Milestone: Ansible Migration Leftovers
Component: service: dav
Keywords:
Cc:

Description (last modified by mbooth)

The DAV interface seems quite slow.

Can I assign this to Caolan as king of DAV?

Change History (13)

comment:1 Changed 16 months ago by mbooth

Description: modified
Owner: changed from somebody to caolan
Status: changed from new to assigned

comment:2 Changed 16 months ago by mbooth

Milestone: Ansible Migration Leftovers

comment:3 Changed 16 months ago by mbooth

Component: changed from misc to service: dav

comment:4 Changed 15 months ago by caolan

Here's the crux of it:

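<?php

// ldap_connect() only sets up the connection parameters; the network
// connection itself is opened by the first operation (the search below).
$conn = ldap_connect("ldaps://id.darkpeak.org/");
$basedn = "cn=users,cn=accounts,dc=darkpeak,dc=org";
// Look up a single user by uid and request only the uid attribute.
$filter = "(&(objectClass=person)(uid=caolan))";
$result = ldap_search($conn, $basedn, $filter, array("uid"));
$info = ldap_get_entries($conn, $result);
echo $info[0]['uid'][0] . "\n";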

On my development machine (vagrant):

root@stretch:~# time php ldap-test.php 
caolan

real	0m0.153s
user	0m0.020s
sys	0m0.004s

On the production box:

root@edale:~# time php ldap-test.php 
caolan

real	2m10.819s
user	0m0.032s
sys	0m0.008s

comment:5 Changed 15 months ago by caolan

Note that the Android client currently recommended by the wiki (DAVDroid) seems to time out during authentication, so calendar and contacts sync is effectively broken.

comment:6 Changed 15 months ago by caolan

Using ldap:// instead of ldaps:// does not affect the timing.

comment:7 Changed 15 months ago by caolan

Not sure yet whether this is specific to PHP + LDAP or affects LDAP in general. All unaffected services use either single-sign-on, SSH keys stored locally, or auth via PAM (effectively the same as single-sign-on).

comment:8 Changed 15 months ago by mbooth

Another thought: SSSD (the System Security Services Daemon, which is used by SSO and ZNC) could be caching credentials, which would make SSO appear much quicker than going to LDAP directly. We could try clearing the cache to see whether that slows SSO down on your next login.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/sssd-cache

Last edited 15 months ago by mbooth

comment:9 Changed 15 months ago by caolan

Looks like it's not PHP specific.

On my desktop:

caolan@caolan-desktop:~$ time ldapsearch -h id.darkpeak.org -x -s base "(&(objectClass=person)(uid=caolan))"
# extended LDIF
#
# LDAPv3
# base <> (default) with scope baseObject
# filter: (&(objectClass=person)(uid=caolan))
# requesting: ALL
#

# search result
search: 2
result: 0 Success

# numResponses: 1

real	0m0.048s
user	0m0.004s
sys	0m0.000s

On production:

root@edale:~# time ldapsearch -h id.darkpeak.org -x -s base "(&(objectClass=person)(uid=caolan))"
# extended LDIF
#
# LDAPv3
# base <dc=darkpeak,dc=org> (default) with scope baseObject
# filter: (&(objectClass=person)(uid=caolan))
# requesting: ALL
#

# search result
search: 2
result: 0 Success

# numResponses: 1

real	2m9.914s
user	0m0.008s
sys	0m0.004s
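
The near-zero user and sys times next to a wall-clock time of about 130 seconds suggest the client is mostly waiting on the network rather than doing work. One possible explanation (an assumption, not something measured on edale) is a TCP connection attempt to an address that silently drops packets: with the usual Linux defaults of net.ipv4.tcp_syn_retries=6 and a 1-second initial retransmission timeout that doubles on each retry, connect() gives up after roughly 127 seconds, which is close to the timings above. A rough sketch of that arithmetic:

```python
def syn_timeout_seconds(syn_retries=6, initial_rto=1.0):
    """Total time spent waiting before connect() gives up.

    Assumes the common Linux defaults (tcp_syn_retries=6, 1 s initial
    retransmission timeout) and that every SYN goes unanswered. The wait
    after the initial SYN and after each retry doubles: 1, 2, 4, ... seconds.
    """
    return sum(initial_rto * 2 ** attempt for attempt in range(syn_retries + 1))


print(syn_timeout_seconds())  # 127.0
```

The remaining few seconds would have to come from elsewhere (DNS, the LDAP exchange itself), so treat this as a hint rather than a diagnosis.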

comment:10 Changed 15 months ago by caolan

I can ping id.darkpeak.org on my desktop:

caolan@caolan-desktop:~$ ping id.darkpeak.org
PING id.darkpeak.org (213.138.110.5): 56 data bytes
64 bytes from 213.138.110.5: icmp_seq=0 ttl=56 time=12.737 ms
64 bytes from 213.138.110.5: icmp_seq=1 ttl=56 time=11.765 ms
64 bytes from 213.138.110.5: icmp_seq=2 ttl=56 time=10.787 ms
^C--- id.darkpeak.org ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 10.787/11.763/12.737/0.796 ms

If I try to ping id.darkpeak.org on the production box I don't get any responses:

root@edale:~# ping id.darkpeak.org
PING id.darkpeak.org(id.darkpeak.org (2001:41c8:51:505:fcff:ff:fe00:4489)) 56 data bytes
^C
--- id.darkpeak.org ping statistics ---
303 packets transmitted, 0 received, 100% packet loss, time 309251ms

Looks like id.darkpeak.org is resolving to an IPv6 address on the production box, but not on my local machine, so I tried pinging the IPv4 address directly:

root@edale:~# ping 213.138.110.5
PING 213.138.110.5 (213.138.110.5) 56(84) bytes of data.
64 bytes from 213.138.110.5: icmp_seq=1 ttl=62 time=11.8 ms
64 bytes from 213.138.110.5: icmp_seq=2 ttl=62 time=12.0 ms
64 bytes from 213.138.110.5: icmp_seq=3 ttl=62 time=12.7 ms
^C
--- 213.138.110.5 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 11.848/12.240/12.779/0.404 ms

The IPv4 address seems to work fine, but the IPv6 address does not respond.
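
Ping only tests ICMP, so it may be worth checking that TCP to the LDAP port behaves the same way on each address family. A minimal sketch, assuming plain LDAP on port 389 and a 5-second timeout; the function names here are made up for illustration and are not part of any existing tooling:

```python
import socket
import time


def addresses_by_family(host, port):
    """Return every address the resolver offers for host, grouped by family."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    found = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        label = names.get(family)
        if label and sockaddr[0] not in found[label]:
            found[label].append(sockaddr[0])
    return found


def time_connect(address, port, timeout=5.0):
    """Time a TCP connect to one literal address; None means it failed or timed out."""
    start = time.monotonic()
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def probe(host, port=389, timeout=5.0):
    """Return (family, address, seconds or None) for each resolved address."""
    results = []
    for label, addrs in addresses_by_family(host, port).items():
        for addr in addrs:
            results.append((label, addr, time_connect(addr, port, timeout)))
    return results
```

Calling probe("id.darkpeak.org") on both the desktop and edale and printing the result would show, per address, whether the connection succeeds and how long it takes. If the ping results carry over to TCP, the IPv6 entry on edale should come back as None.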

comment:11 Changed 15 months ago by caolan

I've deployed a temporary fix, which uses the IPv4 address directly for LDAP auth instead of id.darkpeak.org (there's a note in the Ansible config linking to this ticket).

CalDAV/CardDAV and ttrss now all seem to work fine. I'm able to sync calendar and contacts on my phone again.
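
If hard-coding the address turns out to be a maintenance burden, one alternative would be to resolve the hostname over IPv4 only and build the LDAP URI from the result. This is only a sketch of that idea, not what is deployed; ipv4_ldap_uri is a hypothetical helper, and port 389 with the ldap scheme are assumptions:

```python
import socket


def ipv4_ldap_uri(host, port=389, scheme="ldap"):
    """Resolve host over IPv4 only and return an LDAP URI using that address.

    Hypothetical helper: restricting the lookup to AF_INET means any AAAA
    record for the host is ignored.
    """
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    address = infos[0][4][0]  # first result; sockaddr is (address, port)
    return f"{scheme}://{address}:{port}/"
```

One caveat: with ldaps://, connecting by IP literal can fail certificate hostname verification, so this approach fits plain ldap:// (or a certificate that lists the address) better.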

comment:12 Changed 14 months ago by graenol

We should file a ticket with Bytemark about this (IPv6 timeouts). Who has permission to do that?

comment:13 Changed 14 months ago by ejs

We seem to have spun up darkpeak.org and id.darkpeak.org in different cities/datacenters. That can't be the whole problem, but it won't be helping the routing issue.

Definitely looks like something to raise with Bytemark.
