Examples for GAVO Data Center TAP service

Crossmatch with proper motions

When you epoch-propagate positions, any positional indexes that are on the table will become useless, and that will mean slow crossmatches. To use indexes anyway, so a two-step operation where you do a rough match in the first step and do the narrow comparison with proper motions only in a second step.

This is particularly important when there is a large (larger than a million rows, say) table in the operation. Say you want to compare LSPM proper motions with the ones in Gaia, and you want to use Gaia's superior proper motions to do precise matches at LSPM's epoch, J2000. You could then write:

SELECT TOP 50 l.id, l.pmra as lpmra, l.pmde as lpmde,
  g.source_id, g.pmra as gpmra, g.pmdec as gpmde
FROM
  lspm.main as l
  JOIN gaia.dr3lite AS g
  ON (DISTANCE(g.ra, g.dec, l.raj2000, l.dej2000)<0.01) -- rough pre-selection
WHERE
  DISTANCE(
    ivo_epoch_prop_pos(
      g.ra, g.dec, g.parallax,
      g.pmra, g.pmdec, g.radial_velocity,
      2016, 2000),
    POINT(l.raj2000, l.dej2000)
  )<0.0002                            -- fine selection with PMs

Query against boolean columns

Regrettably, ADQL has no notion of boolean values. Some tables this service exposes to TAP -- e.g., amanda.nucand, have boolean columns natively, and we dare give boolean as a datatype for those in the table metadata. To query those, you cannot say WHERE boolCol or somesuch as in SQL dialects having native booleans. You must compare against something, and at least on this server, it has to be 'True' or 'False' as string literals, as in

SELECT * FROM amanda.nucand WHERE atmonusubset='True'

Query for CALIFA object properties

This example shows how to combine the califadr3.objects table of properties of CALIFA target galaxies with the califadr3.cubes table of data cubes to select bright, early-type spirals:

SELECT target_name, accref, hubtyp, magg
FROM califadr3.cubes
  NATURAL JOIN califadr3.objects
WHERE hubtyp in ('S d', 'S cd', 'S c')
  AND magg<13

Make a color map from CALIFA cubes

Here is a query showing how to use the califadr3.fluxv500 (or califadr3.fluxv1200) tables to obtain maps of color indices:

SELECT TOP 20000 xindex, yindex, flux3807-flux3839 as color
FROM (
SELECT califaid, xindex, yindex, flux as flux3807
FROM califadr3.fluxv500
WHERE lambda=3807) AS t1
NATURAL JOIN (
SELECT califaid, xindex, yindex, flux as flux3839
FROM califadr3.fluxv500
WHERE lambda=3839) AS t2
WHERE califaid=124

Here, we (virtually) create two tables that have single-lambda maps of fluxes for a wavelength each. Note how we use column aliasing here to allow natural joining the tables and to have a nice expression in the select clause.

Note that it's fairly easy to turn this into "broadband" colors; you'll just have to adjust the inner selects, e.g., like this:

SELECT obsid, SUM(flux)/count(*) AS flux3839
  FROM califadr3.fluxv500
  WHERE lambda BETWEEN 3850 AND 3900
  GROUP BY obsid

Note that queries like this can run for quite a while, which means you probably will want to run them in async mode. Further note that NULL values in flux tend to make such images "spotty". To make things more robust, you could filter by WHERE flux IS NOT NULL in yet another inner select.

The result is in pixel coordinates. To turn them into real coordinates, you can use califadr3.spectra, which as physical coordinates for triples of xindex, yindex, and califaid; while we're at it, we add links to the spectra, too, so you can easily check out interesting points:

select raj2000, dej2000, color, accref
from califadr3.spectra
join tap_upload.t1
using (xindex, yindex)
where califaid=124

We've worked with califaids here, because they're more natural for these low-level tables. To turn object names or positions to califaids, use califadr3.objects.

Query against coverage

When querying against geometric columns, in particular coverage, use ADQL's contains or intersect functions, like this:

SELECT accref, seeing
  FROM cars.images
      WHERE 1=INTERSECTS(coverage, circle('ICRS', 34, -4, 2))
      ORDER BY seeing

Of course, this concerns all SIAP and SSAP tables (cars.images only standing as an example here) as well as ivoa.obscore.

Crossmatch for a guide star

(This contains some tutorial-style material)

Suppose you have developed an adaptive optics instrument that you want to use to observe radio-loud quasars; you don't really care which one, but you're sitting on the south pole, and you'll need a guide star, and it'd be nice if the object were in the redshift range of 0.5 to 1.

Say you have located our little list of QSOs veronqsos.data (of course, you could use sdssdr7.sources about as well, but that doesn't give radio information as readily), e.g., via the VO registry as queriable in WIRR.

You see there's a column specifying whether these quasars are detected in radio; also, you want the thing to be far enough south, and you specify the redshift:

SELECT TOP 100 raj2000, dej2000, name, z
      FROM veronqsos.data
  WHERE notRadio!='*'
  AND z BETWEEN 0.5 AND 1
  AND dej2000<-40

The TOP 100 here says that we only want to see 100 items. Writing this is generally a good idea when you do not know how much to expect. Our service will happily serve you millions of rows, and your browser may not like that, and of course it's nice not to put unnecessary load on our servers. However, without any TOP, our server will insert a TOP 3000 for you.

Our conditions are fairly straightforward. You can use the usual operators (where equality is "=" rather than C's "=="; SQL doesn't have any assignments in the C sense, so the equality sign isn't used otherwise). SQL has some nice additional operators like the "BETWEEN ... AND ..." shown above.

Now, if you actually run this query, you will get 100 rows; there even is a warning that your query limit kicked it, but you will not usually see it in most VOTable clients. You thus typically need to be on your ward yourself. In this case, it would be safe to run without a TOP, or even a TOP 1000000. The query above results in 422 rows, which is still convenient to display.

Now, which of these objects have a "good" guide star? Say our device works best of guide stars in the magnitude range of 10 to 11 in V, and the guide star should be no farther away than 0.3 degrees. Consulting GloTS or the registry, you may come up with ppmx.data. What you need is a crossmatch of PPMX with the little catalogue of QSOs relevant to you generated from the query above.

In ADQL's lingo, a crossmatch could look like this:

SELECT q.name, q.raj2000, q.dej2000, p.alphaFloat, p.deltaFloat FROM (
  SELECT TOP 100 raj2000, dej2000, name, z FROM veronqsos.data
    WHERE notRadio!='*'
    AND z BETWEEN 0.5 AND 1
    AND dej2000<-40) AS q JOIN
  ppmx.data AS p ON (1=CONTAINS(
    POINT('', q.raj2000, q.dej2000),
    CIRCLE('', p.alphaFloat, p.deltaFloat, 0.3)))

Note that most of the mess in here is the query for the QSOs we did above. Queries can usually stand in for wherever tables can stand in ADQL. You always need an AS clause to give the subquery a name, though.

The main new point here is the join, which basically means "bring together two tables". Now, a table in SQL is a set of tuples. When you have two sets of tuples, there are various ways to bring them together -- you can build the (flattened) cartesian product of the two (usually resulting in a huge set), you can stick together randomly drawn tuples, etc.

Most of these operations are supported by SQL's (and hence ADQL's) JOIN. The pattern above, however, is what you want for crossmatches: You write down the two tables, giving the aliases (with AS) for convenience and then join them. This happens by writing JOIN between the two table specifications and then giving a condition in parentheses after an ON behind the last table.

For crossmatching, this boils down to the ADQL CONTAINS function operating on an ADQL POINT and and ADQL CIRCLE, made up from the coordinates relevant to you. The radius of the circle is given in degrees; most of ADQL is leaning towards degrees, but not the trigonometric functions, which work in radians. CONTAINS is a numeric function, returning 1 when the point in the first argument is within the circle in the second argument, 0 otherwise.

Points and circles are constructed with a coordinate system specification in the first argument. The current ADQL implementation largely ignores this specification, so you could in principle put there whatever you like.

In the example above, we used qualified names, i.e., names of the form <table>.<column>. If a column name is unique, you can leave the qualification out, i.e., you could have written:

SELECT name, raj200, dej2000, alphaFloat, deltaFloat...

above.

The result of the above query is a list of 3428 positions of quasars and possible guide stars of any magnitude. To select only guide stars with, you could filter the results after the selection by appending something like WHERE vmag BETWEEN 10 AND 11. Equivalently, you could add the condition to the selection from PPMX, like this:

SELECT q.name, q.raj2000, q.dej2000, p.alphaFloat, p.deltaFloat FROM (
  SELECT TOP 100 raj2000, dej2000, name, z FROM veronqsos.data
    WHERE notRadio!='*'
    AND z BETWEEN 0.5 AND 1
    AND dej2000<-40) AS q
  JOIN (
  SELECT * FROM ppmx.data WHERE vmag BETWEEN 10 AND 11) AS p
  ON (1=CONTAINS(
    POINT('', q.raj2000, q.dej2000),
    CIRCLE('', p.alphaFloat, p.deltaFloat, 0.3)))

However, both of these queries will probably time out on you. Our system will kill queries coming from the web after 15 seconds and tell you that your query timed out. In that case, it may be worthwhile to try and reformulate it. Otherwise, just contact us and we will figure out some way to get your query to execute, possibly by adding more indices to our tables. In particular, any query on large-ish data sets (like the PPMX) not using at least one condition on a column with an index is bound to time out. Columns that are parts of indices are highlighted in the table descriptions.

It may not be obvious why adding the WHERE clause above should hurt so badly here, since the database would only have to check a couple of thousand rows, and that's a breeze for a modern computer. However, database engines contain a component called a query planner that should reduce all equivalent queries to the same, optimal form. In reality, this doesn't always work very well, which isn't surprising when you think about the amount of information required to find the optimal sequence of operations to a given result. This means that the machine might completely mess up your query, and that is what happens in this case.

There is a common workaround in SQL, known as the "OFFSET 0" trick; this is not possible in ADQL since its syntax doesn't allow this. As a workaround, you can say SELECT ALL, which internally does the same thing (of course, it's not nice to overload a no-op with an almost-no-op). The downside is that you need one more query level. The result then is:

SELECT * FROM (
  SELECT ALL
    q.name,
    q.raj2000,
    q.dej2000,
    p.alphaFloat,
    p.deltaFloat,
    p.vmag FROM (
      SELECT TOP 100 raj2000, dej2000, name, z
      FROM veronqsos.data
      WHERE notRadio!='*'
        AND z BETWEEN 0.5 AND 1
        AND dej2000<-40) AS q
      JOIN ppmx.data AS p ON (1=CONTAINS(
        POINT('', p.alphaFloat, p.deltaFloat),
        CIRCLE('', q.raj2000, q.dej2000, 0.3)))) AS f
WHERE vmag BETWEEN 10 and 11

It may look a bit daunting, but it really built up from simple queries, and it can be taken apart to reveal the simple building blocks.

Dissecting combined flags

This example shows how to decode combined flags (i.e., flags-like numbers in which digits (or groups of digits) need to be extracted to allow interpretation. This is common practice in many historical tables; in the data center, it is rampart within arigfh.id and the related tables. The principle here is to use the mod function (which, in a pinch, can also help with binary, rather than decimal, multiflags):

SELECT decCat, raj2000, dej2000, epDec, eqDec
FROM arigfh.id
WHERE 4=MOD(decflags/10000, 10)

Apply ICRS corrections

The tables ppmxl.usnocorr and ucac3.icrscorr give mean corrections in 1-degree-sized squares in RA and Dec each to bring positions (and possibly proper motions) based on USNO-B1 and UCAC3 (closer) to ICRS. Consider the following query:

SELECT
   u.raj2000+d_alpha+d_pmalpha/cos(radians(u.dej2000))*(u.epoch-2000)
     AS ra_icrs,
   u.dej2000+d_delta+d_pmdelta*(u.epoch-2000) AS de_cicrs,
   u.pmra+d_pmalpha AS pmra_icrs,
   u.pmde+d_pmdelta AS pmde_icrs,
   u.*
FROM
  TAP_UPLOAD.T1 AS u
  JOIN ucac3.icrscorr AS c
  ON (c.alpha=FLOOR(u.raj2000)+0.5 and c.delta=FLOOR(u.dej2000)+0.5)

This (approximately; in reality, the corrections would need to be applied in the tangetial plane) applies corrections to a table you upload (i.e., a VOTable or something you can load into TOPCAT). As you can see, we assume the positions and proper motions in your table are in the raj2000, dej2000, pmra, and pmde columns, and epoch contains the epoch the data has been reduced to using the proper motions, in Julian years.

You will also have to adjust the string after TAP_UPLOAD according to whatever name you give to your upload. The query given would work when your table is the first table within TOPCAT.

The equations are taken from the service info; again, they are not exact and will fail miserably near the poles, for epochs far away from J2000., and they do not take into account crossing the RA=0 line.

Tricking the query planner

Because if current limitations, the postgres query planner easily gets confused when it is asked to work out how to do crossmatches, in particular when views (“virtual tables“) are involved. This concerns tables like gedr3dist.litewithdist or gedr3mock.main.

In such cases, for the time being you have to “unwrap” the view and use its constituent tables (which usually are in the same schema), and then force the query planner to do your crossmatch first. The simplest way to force it is by using WITH subclauses (also known as common table expressions or CTEs). For instance, a match with an uploaded table like:

SELECT hipno, source_id, r_med_photogeo
FROM gedr3dist.litewithdist AS db
JOIN TAP_UPLOAD.t1 AS tc
ON distance(db.ra, db.dec, tc.raj2000, tc.dej2000)<1./3600.

(t1 would have to have raj2000, dej2000, and hipno columns; use fk6.fk6join if you want to experiment) will be excruciatingly slow, as the planner deems it's faster to first join the huge tables making up litewithdist.

To fix this, force postgres to do the fast and small join of your uploaded table with the gaia catalogue first and only then join the distances in. In this case, that is:

WITH innerresult AS (
      SELECT hipno, source_id
      FROM gaia.edr3lite AS db
      JOIN TAP_UPLOAD.t1 AS tc
      ON distance(db.ra, db.dec, tc.raj2000, tc.dej2000)<1./3600.)

SELECT innerresult.*, r_med_photogeo
FROM innerresult
JOIN gedr3dist.main
USING (source_id)

Apologies for the complication; this is not easy to fix in the underlying software.

If unsure about the original join, feel free to contact the operators.

Katkat bibliography

To search for title (or other) words in katkat.katkat's source field or in some other sort of bibliographic query, use the ivo_hasword "user defined function" (an ADQL extension which, in this case is also used by the relational registry). This basically works a bit like you'd expect from search engines: case-insensitive, and oblivious to any context.

SELECT *
FROM katkat.katkat
WHERE 1=ivo_hasword(source, 'variable')
  AND minEpoch<1920

Filtering by non-match

This query demonstrates how to filter objects in one table using another one; in this case, we filter objects with variable proper motion (due to non-resolved duplicity) from a field of PPMXL. Cf. dmubin.main, ppmxl.main:

select *
from (select *
  from ppmxl.main
  where
    1=contains(point('ICRS', raj2000, dej2000),
      circle('ICRS', 121, 12, 0.3))) as q
where not exists (
  select * from dmubin.main as d
  where
    1=contains(point('ICRS', d.raj2000, d.dej2000),
      circle('ICRS',q.raj2000, q.dej2000, 0.001)))

Correct for extinction

In our mcextinct.exts table we list extinctions within the magellanic clouds. To apply them, use TAP. For instance, to correct visual magnitudes for extinctions, say:

SELECT vmag-ev_i*2.4 as vmagcorr, mine.* FROM
        mcextinct.exts AS exts
        JOIN tap_upload.t1 AS mine
        ON (1=CONTAINS(POINT('ICRS', mine.raj2000,
                mine.dej2000),exts.bbox))

Here's a table adapted from 1998ApJ...500..525S that gives A/E(V-I) for some common filters:

Filter name λ_eff A/A(V) A/E(B-V) A/E(V-I)
Landolt U 3372 1.66 5.43 3.94
Landolt B 4404 1.32 4.32 3.13
Landolt V 5428 1.02 3.32 2.4
Landolt R 6509 0.82 2.67 1.94
Landolt I 8090 0.59 1.94 1.41
Gunn g 5244 1.07 3.48 2.52
Gunn r 6707 0.79 2.59 1.88
Gunn i 7985 0.61 1.99 1.44
Gunn z 9055 0.47 1.54 1.12
Stromgren u 3502 1.6 5.23 3.79
Stromgren b 4676 1.24 4.05 2.93
Stromgren v 4127 1.39 4.55 3.3
Stromgren beta 4861 1.18 3.86 2.8
Stromgren y 5479 1 3.28 2.37
Sloan u' 3546 1.58 5.16 3.74
Sloan g' 4925 1.16 3.79 2.75
Sloan r' 6335 0.84 2.75 1.99
Sloan i' 7799 0.64 2.09 1.51
Sloan z' 9294 0.45 1.48 1.07

Matching neutrino search areas with normal tables

Some tables, for instance antares10.data, have columns containing geometries (circles, polygons, etc.); in this case, this is because the positional uncertainty for neutrino observatories is so large.

Let's say you want to try some statistics with stars with "odd" spectra, perhaps those with RAVE (rave.main) estimated RVs over 500 km/s. Here's how to do it:

SELECT raveid, n_hits, rv
FROM rave.main as r
JOIN antares10.data
ON (1=CONTAINS(POINT('', r.raj2000, r.dej2000), origin_est))
WHERE ABS(rv)>500

A similar, perhaps more sensible, but also longer-running query is given in in the resource's documentation.

Find resources for a set of points

Several (but unfortunately by no means all) service operators define the spatial coverage of their services. This information is havested into the proposed new relational registry table rr.stc_spatial. One way to use this information is to match the resources against the points in an upload and perhaps see if they are in the infrared band, like this:

SELECT ivoid FROM
rr.resource
NATURAL JOIN
(SELECT DISTINCT
 ivoid
 FROM rr.stc_spatial AS db
 JOIN TAP_UPLOAD.t1 AS tc
 ON 1=CONTAINS(POINT('ICRS', tc.ra, tc.dec), coverage)) AS spate
WHERE 1=ivo_hashlist_has(waveband, 'Infrared')

As usual, you will have to adapt the index after TAP_UPLOAD. To find out access URLs of these services, NATURAL JOIN with rr.interface. For more information, see also the RegTAP specification

Make a HEALPIX map for something

This service understands the the ivo_healpix_index extension function (and its inverse, ivo_healpix_center, returning the center of a given HEALPix). This is nifty for computing properties over equal-sized parts of the sky, in particular when operating over large datasets (like gaia.edr3lite, ppmxl.main, twomass.data, hsoy.main). Here's a simple example over a small table with large HEALPixes; for large tables, it'll take this machine between 30 minutes and an hour to sift through 1 billion objects in this way:

SELECT
        MAX(parallax)/AVG(parallax) AS obs,
        ivo_healpix_index(4, ra, dec) AS hpx
FROM hipparcos.main
GROUP BY hpx

Note that you need to raise MAXREC to get back the whole sky, and be sure to check out TOPCAT's HEALPix plot (see Graphics/Sky Plot, and then Layers/Add Healpix Control).

Using CTEs to test queries on large tables

Queries that touch a large part of the rows present will take a long time on large tables (typically several 10s of minutes or an hour per billion rows). Please only run such queries if you have previously validated on a smaller subset that such queries actually do what you expect. A good way to do that is with Common Table Expressions (CTEs). With these, you define temporary, named tables that can then be used as if they were normal tables. For instance:

WITH sample AS (
        SELECT * FROM gdr2mock.main
        WHERE distance(ra, dec, 66.73, 75.87)<1)
SELECT
        ROUND(teff_val*0.01)/0.01 AS bin, avg(1/parallax) AS meandistance
FROM sample
GROUP BY bin

Note that CTEs are a planner fence, i.e., the query planner will usually actually execute the CTEs rather than mangle the relational expressions in order to save work. This means that when the query is ready for application on the whole table, you should normally remove the CTE and search "sample" in the remaining query to replace it with (in this case) "gdr2mock.main". This will (ideally) let the database find the optimal way to execute the query.

The technique is particularly recommended on tables like califadr3.fluxposv1200, califadr3.fluxposv500, gaia.edr3lite, gdr2mock.main, gps1.main, hsoy.main, ppmxl.main, sdssdr7.sources, supercosmos.sources, twomass.data, ucac5.main, usnob.data, wise.main).

Note that CTEs are not yet available on all TAP services. On services that do not have them, you can substitute them with subqueries in many situations.

Using ivo_histogram

While you will usually use simple GROUP statements to create histograms, in some use cases it is convenient to have histograms in cells. To support this, DaCHS has an array-valued ivo_histogram(col, low, high, n) function. It will return an array, where the first and last elements count under- and overflows, respectively (i.e., in how many rows col was below low or above high). Between these, there are n bins of width (high-low)/n with the respective counts.

A simple example, in this case exploiting that Gaia's source_id (as in gaia.edr3lite and gaia.dr2light) contains healpix numbers, would be:

SELECT
  ivo_histogram(phot_g_mean_mag, 0, 20, 20) as hist,
  ROUND(source_id/140737488355328) as hpx
FROM
  gaia.edr3lite
WHERE source_id BETWEEN 281474976710656000 AND 288511851128422400
GROUP BY hpx

In comparison, the relationally-preferred (and portable) way to do the same thing would be:

SELECT
  ROUND(phot_g_mean_mag) as bin,
  ROUND(source_id/140737488355328) as hpx
FROM
  gaia.dr2light
WHERE source_id BETWEEN 281474976710656000 AND 288511851128422400
GROUP BY hpx, bin

-- in this case, the aggregation by healpix needs to be done on the client side.

More Examples