When querying against geometric columns, in particular coverage, use ADQL's CONTAINS or INTERSECTS functions, like this:
SELECT accref, seeing FROM cars.images WHERE 1=INTERSECTS(coverage, circle('ICRS', 34, -4, 2)) ORDER BY seeing
Of course, this concerns all SIAP and SSAP tables (↗cars.images only standing as an example here) as well as ↗ivoa.obscore.
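For a point-in-coverage lookup, CONTAINS works analogously; here is a minimal sketch against ↗ivoa.obscore, assuming the standard ObsCore coverage column s_region:
SELECT TOP 20 obs_publisher_did, s_ra, s_dec
FROM ivoa.obscore
WHERE 1=CONTAINS(POINT('ICRS', 34, -4), s_region)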
When you epoch-propagate positions, any positional indexes on the table become useless, and that means slow crossmatches. To use the indexes anyway, do a two-step operation: a rough match in the first step, and the narrow comparison with proper motions applied only in the second step.
This is particularly important when there is a large (larger than a million rows, say) table in the operation. Say you want to compare LSPM proper motions with the ones in Gaia, and you want to use Gaia's superior proper motions to do precise matches at LSPM's epoch, J2000. You could then write:
SELECT TOP 50
  l.id, l.pmra AS lpmra, l.pmde AS lpmde,
  g.source_id, g.pmra AS gpmra, g.pmdec AS gpmde
FROM lspm.main AS l
JOIN gaia.dr3lite AS g
  ON (DISTANCE(g.ra, g.dec, l.raj2000, l.dej2000)<0.01)  -- rough pre-selection
WHERE DISTANCE(
  ivo_epoch_prop_pos(
    g.ra, g.dec, g.parallax,
    g.pmra, g.pmdec, g.radial_velocity,
    2016, 2000),
  POINT(l.raj2000, l.dej2000))<0.0002  -- fine selection with PMs
(This contains some tutorial-style material)
Suppose you have developed an adaptive optics instrument that you want to use to observe radio-loud quasars; you don't really care which one, but you're sitting at the South Pole, you'll need a guide star, and it'd be nice if the object were in the redshift range of 0.5 to 1.
Say you have located our little list of QSOs ↗veronqsos.data (of course, you could use ↗sdssdr7.sources about as well, but that doesn't give radio information as readily), e.g., via the VO registry as queryable in WIRR.
You see there's a column specifying whether these quasars are detected in radio; also, you want the thing to be far enough south, and you specify the redshift:
SELECT TOP 100 raj2000, dej2000, name, z FROM veronqsos.data WHERE notRadio!='*' AND z BETWEEN 0.5 AND 1 AND dej2000<-40
The TOP 100 here says that we only want to see 100 items. Writing this is generally a good idea when you do not know how many rows to expect. Our service will happily serve you millions of rows, your browser may not like that, and of course it's nice not to put unnecessary load on our servers. However, without any TOP, our server will insert a TOP 3000 for you.
Our conditions are fairly straightforward. You can use the usual operators (where equality is "=" rather than C's "=="; SQL doesn't have any assignments in the C sense, so the equality sign isn't used otherwise). SQL has some nice additional operators like the "BETWEEN ... AND ..." shown above.
Now, if you actually run this query, you will get 100 rows; there even is a warning that your query limit kicked in, but most VOTable clients will not show it to you. You thus typically need to be on your guard yourself. In this case, it would be safe to run without a TOP at all, or with, say, a TOP 1000000. The query above results in 422 rows, which is still convenient to display.
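If you are unsure whether dropping the TOP clause is safe, a quick check (a small sketch reusing the conditions from above) is to count the matching rows first:
SELECT COUNT(*) AS nmatch
FROM veronqsos.data
WHERE notRadio!='*' AND z BETWEEN 0.5 AND 1 AND dej2000<-40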
Now, which of these objects have a "good" guide star? Say our device works best with guide stars in the magnitude range of 10 to 11 in V, and the guide star should be no farther away than 0.3 degrees. Consulting GloTS or the registry, you may come up with ↗ppmx.data. What you need is a crossmatch of PPMX with the little catalogue of QSOs relevant to you generated from the query above.
In ADQL's lingo, a crossmatch could look like this:
SELECT q.name, q.raj2000, q.dej2000, p.alphaFloat, p.deltaFloat
FROM (
  SELECT TOP 100 raj2000, dej2000, name, z
  FROM veronqsos.data
  WHERE notRadio!='*'
    AND z BETWEEN 0.5 AND 1
    AND dej2000<-40) AS q
JOIN ppmx.data AS p
  ON (1=CONTAINS(
    POINT('', q.raj2000, q.dej2000),
    CIRCLE('', p.alphaFloat, p.deltaFloat, 0.3)))
Note that most of the mess in here is the query for the QSOs we did above. In ADQL, a subquery can usually stand wherever a table can. You always need an AS clause to give the subquery a name, though.
The main new point here is the join, which basically means "bring together two tables". Now, a table in SQL is a set of tuples. When you have two sets of tuples, there are various ways to bring them together -- you can build the (flattened) cartesian product of the two (usually resulting in a huge set), you can stick together randomly drawn tuples, etc.
Most of these operations are supported by SQL's (and hence ADQL's) JOIN. The pattern above, however, is what you want for crossmatches: You write down the two tables, giving the aliases (with AS) for convenience and then join them. This happens by writing JOIN between the two table specifications and then giving a condition in parentheses after an ON behind the last table.
For crossmatching, this boils down to the ADQL CONTAINS function operating on an ADQL POINT and an ADQL CIRCLE, made up from the coordinates relevant to you. The radius of the circle is given in degrees; most of ADQL is leaning towards degrees, but not the trigonometric functions, which work in radians. CONTAINS is a numeric function, returning 1 when the point in the first argument is within the circle in the second argument, 0 otherwise.
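To illustrate the degrees-versus-radians point, here is a small sketch (purely for illustration, using the PPMX columns from above) that converts explicitly with RADIANS before applying a trigonometric function:
SELECT TOP 5 alphaFloat, deltaFloat,
  SIN(RADIANS(deltaFloat)) AS sin_delta
FROM ppmx.data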
Points and circles are constructed with a coordinate system specification in the first argument. The current ADQL implementation largely ignores this specification, so you could in principle put there whatever you like.
In the example above, we used qualified names, i.e., names of the form <table>.<column>. If a column name is unique, you can leave the qualification out, i.e., you could have written:
SELECT name, raj2000, dej2000, alphaFloat, deltaFloat...
above.
The result of the above query is a list of 3428 positions of quasars and possible guide stars of any magnitude. To keep only suitable guide stars, you could filter the results after the selection by appending something like WHERE vmag BETWEEN 10 AND 11. Equivalently, you could add the condition to the selection from PPMX, like this:
SELECT q.name, q.raj2000, q.dej2000, p.alphaFloat, p.deltaFloat
FROM (
  SELECT TOP 100 raj2000, dej2000, name, z
  FROM veronqsos.data
  WHERE notRadio!='*'
    AND z BETWEEN 0.5 AND 1
    AND dej2000<-40) AS q
JOIN (
  SELECT *
  FROM ppmx.data
  WHERE vmag BETWEEN 10 AND 11) AS p
  ON (1=CONTAINS(
    POINT('', q.raj2000, q.dej2000),
    CIRCLE('', p.alphaFloat, p.deltaFloat, 0.3)))
However, both of these queries will probably time out on you. Our system will kill queries coming from the web after 15 seconds and tell you that your query timed out. In that case, it may be worthwhile to try and reformulate it. Otherwise, just contact us and we will figure out some way to get your query to execute, possibly by adding more indices to our tables. In particular, any query on large-ish data sets (like the PPMX) not using at least one condition on a column with an index is bound to time out. Columns that are parts of indices are highlighted in the table descriptions.
It may not be obvious why adding the WHERE clause above should hurt so badly here, since the database would only have to check a couple of thousand rows, and that's a breeze for a modern computer. However, database engines contain a component called a query planner that should reduce all equivalent queries to the same, optimal form. In reality, this doesn't always work very well, which isn't surprising when you think about the amount of information required to find the optimal sequence of operations to a given result. This means that the machine might completely mess up your query, and that is what happens in this case.
There is a common workaround in SQL known as the "OFFSET 0" trick; ADQL's syntax does not allow it, though. As a workaround, you can say SELECT ALL, which internally does the same thing (of course, it's not nice to overload a no-op with an almost-no-op). The downside is that you need one more query level. The result then is:
SELECT * FROM (
  SELECT ALL q.name, q.raj2000, q.dej2000, p.alphaFloat, p.deltaFloat, p.vmag
  FROM (
    SELECT TOP 100 raj2000, dej2000, name, z
    FROM veronqsos.data
    WHERE notRadio!='*'
      AND z BETWEEN 0.5 AND 1
      AND dej2000<-40) AS q
  JOIN ppmx.data AS p
    ON (1=CONTAINS(
      POINT('', p.alphaFloat, p.deltaFloat),
      CIRCLE('', q.raj2000, q.dej2000, 0.3)))) AS f
WHERE vmag BETWEEN 10 AND 11
It may look a bit daunting, but it really is built up from simple queries, and it can be taken apart to reveal the simple building blocks.
Because of current limitations, the Postgres query planner easily gets confused when asked to work out how to do crossmatches, in particular when views ("virtual tables") are involved. This concerns tables like ↗gedr3dist.litewithdist or ↗gedr3mock.main.
In such cases, for the time being you have to “unwrap” the view and use its constituent tables (which usually are in the same schema), and then force the query planner to do your crossmatch first. The simplest way to force it is by using WITH subclauses (also known as common table expressions or CTEs). For instance, a match with an uploaded table like:
SELECT hipno, source_id, r_med_photogeo
FROM gedr3dist.litewithdist AS db
JOIN TAP_UPLOAD.t1 AS tc
  ON distance(db.ra, db.dec, tc.raj2000, tc.dej2000)<1./3600.
(t1 would have to have raj2000, dej2000, and hipno columns; use fk6.fk6join if you want to experiment) will be excruciatingly slow, as the planner deems it's faster to first join the huge tables making up litewithdist.
To fix this, force Postgres to do the fast and small join of your uploaded table with the Gaia catalogue first, and only then join in the distances. In this case, that is:
WITH innerresult AS (
  SELECT hipno, source_id
  FROM gaia.edr3lite AS db
  JOIN TAP_UPLOAD.t1 AS tc
    ON distance(db.ra, db.dec, tc.raj2000, tc.dej2000)<1./3600.)
SELECT innerresult.*, r_med_photogeo
FROM innerresult
JOIN gedr3dist.main USING (source_id)
Apologies for the complication; this is not easy to fix in the underlying software.
If unsure about the original join, feel free to contact the operators.
This query demonstrates how to filter objects in one table using another one; in this case, we filter objects with variable proper motion (due to non-resolved duplicity) from a field of PPMXL. Cf. ↗dmubin.main, ↗ppmxl.main:
SELECT * FROM (
  SELECT *
  FROM ppmxl.main
  WHERE 1=CONTAINS(
    POINT('ICRS', raj2000, dej2000),
    CIRCLE('ICRS', 121, 12, 0.3))) AS q
WHERE NOT EXISTS (
  SELECT *
  FROM dmubin.main AS d
  WHERE 1=CONTAINS(
    POINT('ICRS', d.raj2000, d.dej2000),
    CIRCLE('ICRS', q.raj2000, q.dej2000, 0.001)))
This example shows how to decode combined flags (i.e., flag-like numbers in which digits, or groups of digits, need to be extracted to allow interpretation). This is common practice in many historical tables; in the data center, it is rampant within ↗arigfh.id and the related tables. The principle here is to use the MOD function (which, in a pinch, can also help with binary, rather than decimal, multiflags):
SELECT decCat, raj2000, dej2000, epDec, eqDec FROM arigfh.id WHERE 4=MOD(decflags/10000, 10)
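The general pattern, shown here as a small sketch on the same table, is that the n-th decimal digit (counting from zero at the right) of decflags is MOD(decflags/10^n, 10); for binary multiflags, you would use powers of two instead:
SELECT TOP 20 decCat, decflags,
  MOD(decflags, 10)    AS digit0,
  MOD(decflags/10, 10) AS digit1
FROM arigfh.id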
Regrettably, ADQL has no notion of boolean values. Some tables this service exposes to TAP -- e.g., ↗amanda.nucand -- have boolean columns natively, and we dare give boolean as a datatype for those in the table metadata. To query those, you cannot say WHERE boolCol or some such, as in SQL dialects having native booleans. You must compare against something, and at least on this server it has to be 'True' or 'False' as string literals, as in
SELECT * FROM amanda.nucand WHERE atmonusubset='True'
In our ↗mcextinct.exts table we list extinctions within the Magellanic Clouds. To apply them to your own objects, use a TAP upload. For instance, to correct visual magnitudes for extinction, say:
SELECT vmag-ev_i*2.4 AS vmagcorr, mine.*
FROM mcextinct.exts AS exts
JOIN tap_upload.t1 AS mine
  ON (1=CONTAINS(
    POINT('ICRS', mine.raj2000, mine.dej2000),
    exts.bbox))
Here's a table adapted from 1998ApJ...500..525S that gives A/E(V-I), among other extinction ratios, for some common filters:
Filter | Band | λ_eff (Å) | A/A(V) | A/E(B-V) | A/E(V-I) |
---|---|---|---|---|---|
Landolt | U | 3372 | 1.66 | 5.43 | 3.94 |
Landolt | B | 4404 | 1.32 | 4.32 | 3.13 |
Landolt | V | 5428 | 1.02 | 3.32 | 2.4 |
Landolt | R | 6509 | 0.82 | 2.67 | 1.94 |
Landolt | I | 8090 | 0.59 | 1.94 | 1.41 |
Gunn | g | 5244 | 1.07 | 3.48 | 2.52 |
Gunn | r | 6707 | 0.79 | 2.59 | 1.88 |
Gunn | i | 7985 | 0.61 | 1.99 | 1.44 |
Gunn | z | 9055 | 0.47 | 1.54 | 1.12 |
Stromgren | u | 3502 | 1.6 | 5.23 | 3.79 |
Stromgren | b | 4676 | 1.24 | 4.05 | 2.93 |
Stromgren | v | 4127 | 1.39 | 4.55 | 3.3 |
Stromgren | beta | 4861 | 1.18 | 3.86 | 2.8 |
Stromgren | y | 5479 | 1 | 3.28 | 2.37 |
Sloan | u' | 3546 | 1.58 | 5.16 | 3.74 |
Sloan | g' | 4925 | 1.16 | 3.79 | 2.75 |
Sloan | r' | 6335 | 0.84 | 2.75 | 1.99 |
Sloan | i' | 7799 | 0.64 | 2.09 | 1.51 |
Sloan | z' | 9294 | 0.45 | 1.48 | 1.07 |
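For other bands, take the matching coefficient from the last column of the table; for instance, assuming your uploaded table also has a bmag column (a hypothetical name here), Landolt B would be corrected like this:
SELECT bmag-ev_i*3.13 AS bmagcorr, mine.*
FROM mcextinct.exts AS exts
JOIN tap_upload.t1 AS mine
  ON (1=CONTAINS(
    POINT('ICRS', mine.raj2000, mine.dej2000),
    exts.bbox))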
Some tables, for instance ↗antares10.data, have columns containing geometries (circles, polygons, etc.); in this case, this is because the positional uncertainty for neutrino observatories is so large.
Let's say you want to try some statistics on stars with "odd" spectra, perhaps those for which RAVE (↗rave.main) estimates RVs over 500 km/s. Here's how to do it:
SELECT raveid, n_hits, rv
FROM rave.main AS r
JOIN antares10.data
  ON (1=CONTAINS(POINT('', r.raj2000, r.dej2000), origin_est))
WHERE ABS(rv)>500
A similar, perhaps more sensible, but also longer-running query is given in the resource's documentation.
Queries that touch a large part of the rows present will take a long time on large tables (typically several tens of minutes or an hour per billion rows). Please only run such queries if you have previously validated on a smaller subset that they actually do what you expect. A good way to do that is with Common Table Expressions (CTEs). With these, you define temporary, named tables that can then be used as if they were normal tables. For instance:
WITH sample AS (
  SELECT *
  FROM gdr2mock.main
  WHERE DISTANCE(ra, dec, 66.73, 75.87)<1)
SELECT
  ROUND(teff_val*0.01)/0.01 AS bin,
  AVG(1/parallax) AS meandistance
FROM sample
GROUP BY bin
Note that CTEs are a planner fence, i.e., the query planner will usually actually execute the CTEs rather than mangle the relational expressions in order to save work. This means that when the query is ready for application to the whole table, you should normally remove the CTE, search for "sample" in the remaining query, and replace it with (in this case) "gdr2mock.main". This will (ideally) let the database find the optimal way to execute the query.
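In the example above, the full-table version would then read:
SELECT
  ROUND(teff_val*0.01)/0.01 AS bin,
  AVG(1/parallax) AS meandistance
FROM gdr2mock.main
GROUP BY bin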
The technique is particularly recommended on tables like ↗califadr3.fluxposv1200, ↗califadr3.fluxposv500, ↗gaia.edr3lite, ↗gdr2mock.main, ↗gps1.main, ↗hsoy.main, ↗ppmxl.main, ↗sdssdr7.sources, ↗supercosmos.sources, ↗twomass.data, ↗ucac5.main, ↗usnob.data, ↗wise.main.
Note that CTEs are not yet available on all TAP services. On services that do not have them, you can substitute them with subqueries in many situations.
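For instance, the sample CTE from above could equivalently be written as a subquery in the FROM clause:
SELECT
  ROUND(teff_val*0.01)/0.01 AS bin,
  AVG(1/parallax) AS meandistance
FROM (
  SELECT *
  FROM gdr2mock.main
  WHERE DISTANCE(ra, dec, 66.73, 75.87)<1) AS sample
GROUP BY bin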